WO2023077121A1 - Rna quantitative amplicon sequencing for gene expression quantitation - Google Patents


Info

Publication number
WO2023077121A1
Authority
WO
WIPO (PCT)
Prior art keywords
umi
target rna
rna
sequence
sequencing
Application number
PCT/US2022/078978
Other languages
French (fr)
Inventor
Peng Dai
Angela V. SERRANO
Jinny X. ZHANG
Original Assignee
Nuprobe Usa, Inc.
Application filed by Nuprobe Usa, Inc.
Publication of WO2023077121A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6851 Quantitative amplification

Definitions

  • The present disclosure relates to the fields of molecular biology and bioinformatics. More particularly, it relates to methods for analyzing RNA samples to quantify gene expression.
  • RNA expression level quantitation is important, as the set of RNAs transcribed and their molecule count reflect the current state of the cell and may reveal pathological mechanisms underlying diseases. Many methods have been developed for RNA expression level quantitation, including standard RNA Seq, nanoString, microarray, and quantitative reverse transcription PCR (RT-qPCR).
  • RNA QASeq utilizes PCR-based molecular barcoding to improve quantitation accuracy, and uses a targeted amplicon approach to focus the sequencing reads to genes of interest for cost reduction.
  • RNA QASeq requires lower input than nanoString, has better quantitation accuracy than microarray, and exhibits higher multiplexing ability than RT-qPCR; compared with standard RNA Seq, it is a targeted approach that reduces dropout of low expression level species.
  • This disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) a second set of primers; (iii) a first DNA polymerase; and (iv) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension; (d) removing non-extended UMI Primers to generate a product; (e) preparing a sequencing library using the product; (f) subjecting the sequencing library to
  • This disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) optionally, a second set of primers; (iii) a first DNA polymerase; and (iv) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension; (d) removing non-extended UMI Primers to generate a product; (e) preparing a sequencing library using the product; (f) subjecting the
  • This disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) a first DNA polymerase; and (iii) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a first product; (d) contacting the first product with a second set of primers; (e) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a second product; (f)
  • Figure 1 depicts a schematic of the one-pot RNA Quantitative Amplicon sequencing (QASeq) workflow.
  • The sequencing library preparation consists of three PCR reaction steps: unique molecular identifier (UMI) PCR (“PCR 1”), nested PCR (“PCR 2”), and index PCR (“PCR 3”).
  • Figure 2 depicts a schematic of an expression level calculation formula.
  • Figure 3 depicts RNA QASeq quantitation accuracy validation with the External RNA Controls Consortium (ERCC) spike-in reference sample.
  • Figure 4 depicts RNA QASeq quantitation reproducibility, using 10 ng of total liver RNA as input, shown in technical replicates.
  • Figure 5 depicts reduced RNA QASeq quantitation variability based on the number of amplicons per gene, shown in replicates. As the number of amplicons per gene increases, the standard deviation of the relative expression level shrinks.
  • Figure 6 depicts a comparison of relative RNA expression levels of genes obtained from RNA QASeq, nanoString, microarray, and standard RNA Seq, using a breast cancer formalin-fixed, paraffin-embedded (FFPE) RNA sample.
  • The expression level values observed for RNA QASeq are similar to those observed using nanoString and standard RNA Seq.
  • The expression level values observed using microarray show poor correlation with the expression level values observed using RNA QASeq, nanoString, and standard RNA Seq.
  • Figure 7 comprises panels A, B, C, D, E, F, G, H, I and J.
  • Figure 7 depicts a comparison of RNA expression levels obtained from RNA QASeq and nanoString, using various samples as input: Breast Cancer (BC) 1 replicate 1 (panel A), BC 1 replicate 2 (panel B), BC 1 with 10 ng of input (panel C), BC 2 replicate 1 (panel D), BC 2 replicate 2 (panel E), BC 2 with 10 ng of input (panel F), BC 3 replicate 1 (panel G), BC 3 replicate 2 (panel H), Lung Cancer (LC) replicate 1 (panel I), and LC replicate 2 (panel J).
  • Figure 8 comprises panels A, B, C, D, E, F, G, H, I and J.
  • Figure 8 depicts a comparison of RNA expression levels obtained from RNA QASeq and microarray human transcriptome array (HTA), using various samples as input: BC 1 replicate 1 (panel A), BC 1 replicate 2 (panel B), BC 1 with 5 ng of input (panel C), BC 2 replicate 1 (panel D), BC 2 replicate 2 (panel E), BC 2 with 1 ng of input (panel F), BC 3 replicate 1 (panel G), BC 3 replicate 2 (panel H), LC replicate 1 (panel I), and LC replicate 2 (panel J).
  • Figure 9 comprises panels A, B, C, D, E, F, G, and H.
  • Figure 9 depicts a comparison of RNA expression levels obtained from RNA QASeq and standard RNA Seq, using various samples as input: BC 1 (panel A), BC 2 (panel B), BC 3 (panel C), commercial total liver RNA (panel D), total RNA from human blood sample (panel E), placenta FFPE 1 (panel F), placenta FFPE 2 (panel G), and placenta FFPE 3 (panel H).
  • About 20 million reads are assigned to standard RNA Seq, which is performed with ribosomal RNA depletion.
  • Low expression level genes may be dropped out in standard RNA Seq (for example, dashed boxes in panels A and C), especially in FFPE samples. In general, the expression levels correlate well between RNA QASeq and standard RNA Seq.
  • Figure 10 depicts a comparison of RNA expression levels of three target genes (BAG1, MMP11, and BIRC5) obtained from RNA QASeq and RT-qPCR, using 10 ng of total human liver RNA as input for each RNA QASeq library preparation or each RT-qPCR well.
  • The expression levels of the three target genes are normalized to five reference genes (TFRC, GUSB, RPLP0, ACTB, and GAPDH).
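The text names the five reference genes but does not state the normalization formula. The Python sketch below assumes one common convention, dividing each target gene's molecule count by the geometric mean of the reference-gene counts; this is a swapped-in technique for illustration, not necessarily the calculation behind Figure 10. The function name `normalize_to_reference` and the example counts are hypothetical.

```python
import math

# Reference genes named in the Figure 10 description.
REFERENCE_GENES = ("TFRC", "GUSB", "RPLP0", "ACTB", "GAPDH")


def normalize_to_reference(counts, reference_genes=REFERENCE_GENES):
    """Divide each target gene's count by the geometric mean of the reference genes.

    `counts` maps gene symbol -> molecule count. Geometric-mean normalization is
    an assumed convention here, not a formula taken from the source.
    """
    ref_counts = [counts[gene] for gene in reference_genes]
    if any(c <= 0 for c in ref_counts):
        raise ValueError("reference gene counts must be positive")
    # Geometric mean computed in log space to avoid overflow with large counts.
    geo_mean = math.exp(sum(math.log(c) for c in ref_counts) / len(ref_counts))
    return {
        gene: count / geo_mean
        for gene, count in counts.items()
        if gene not in reference_genes
    }


# Hypothetical counts, for illustration only.
example = {"TFRC": 100, "GUSB": 100, "RPLP0": 100, "ACTB": 100, "GAPDH": 100,
           "BAG1": 50, "MMP11": 25, "BIRC5": 200}
print(normalize_to_reference(example))
```

A geometric mean keeps one highly expressed reference gene from dominating the normalization factor; an arithmetic mean or a per-gene ratio would be equally easy to substitute.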
  • Figure 11 depicts RNA expression level quantitation for the same FFPE RNA sample obtained from nanoString, at different input amounts.
  • Figure 12 comprises panels A, B, C, D, and E.
  • Figure 12 depicts a comparison of RNA expression levels obtained from nanoString and microarray HTA, using various samples as input: BC 1 (panel A), BC 2 (panel B), BC 3 (panel C), LC (panel D), and Placenta (panel E).
  • Figure 13 depicts relative RNA expression levels measured by RNA QASeq in four clinical FFPE samples and three healthy placenta FFPE samples. Hierarchical clustering indicated that the expression patterns are most similar among the healthy placenta samples.
  • Figure 14 comprises panels A and B.
  • Figure 14 depicts consistency of UMI count (panel A) and total reads (panel B) between the open-tube RNA QASeq workflow and the one-pot RNA QASeq workflow.
  • The open-tube and one-pot methods provide similar results for each metric.
  • When a grouping of alternatives is presented, any and all combinations of the members that make up that grouping are specifically envisioned. For example, if an item is selected from a group consisting of A, B, C, and D, the inventors specifically envision each alternative individually (e.g., A alone, B alone, etc.), as well as combinations such as A, B, and D; A and C; B and C; etc.
  • The term “and/or”, when used in a list of two or more items, means any one of the listed items by itself or in combination with any one or more of the other listed items.
  • The expression “A and/or B” is intended to mean either or both of A and B, i.e., A alone, B alone, or A and B in combination.
  • The expression “A, B and/or C” is intended to mean A alone, B alone, C alone, A and B in combination, A and C in combination, B and C in combination, or A, B, and C in combination.
  • A range is understood to be inclusive of the edges of the range as well as any number between the defined edges of the range.
  • For example, “between 1 and 10” includes any number between 1 and 10, as well as the number 1 and the number 10.
  • The term “a compound” or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • “Plural” refers to any number greater than one.
  • DNA refers to deoxyribonucleic acid. DNA can be either single-stranded or double-stranded. DNA typically comprises four nucleotides: cytosine (C), guanine (G), adenine (A), and thymine (T). In an aspect, the sequence of a DNA molecule provided herein comprises one or more degenerate nucleotides. As used herein, a “degenerate nucleotide” refers to a nucleotide that can perform the same function or yield the same output as a structurally different nucleotide.
  • Non-limiting examples of degenerate nucleotides include a C, G, or T nucleotide (B); an A, G, or T nucleotide (D); an A, C, or T nucleotide (H); a G or T nucleotide (K); an A or C nucleotide (M); any nucleotide (N); an A or G nucleotide (R); a G or C nucleotide (S); an A, C, or G nucleotide (V); an A or T nucleotide (W), and a C or T nucleotide (Y).
  • RNA refers to ribonucleic acid. RNA can be either single-stranded or double-stranded. RNA typically comprises four nucleotides: cytosine (C), guanine (G), adenine (A), and uracil (U). In an aspect, the sequence of an RNA molecule provided herein comprises one or more degenerate nucleotides. As used herein, a “degenerate nucleotide” refers to a nucleotide that can perform the same function or yield the same output as a structurally different nucleotide.
  • Non-limiting examples of degenerate nucleotides include a C, G, or U nucleotide (B); an A, G, or U nucleotide (D); an A, C, or U nucleotide (H); a G or U nucleotide (K); an A or C nucleotide (M); any nucleotide (N); an A or G nucleotide (R); a G or C nucleotide (S); an A, C, or G nucleotide (V); an A or U nucleotide (W), and a C or U nucleotide (Y).
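The degenerate nucleotide codes listed above amount to a lookup table from each code to the set of concrete bases it allows. As a minimal sketch, assuming U can be treated as T so that one table covers both the DNA and RNA lists, the hypothetical helpers `degenerate_to_regex` and `matches` below turn a degenerate pattern into a regular expression and test whether a concrete sequence is encoded by it.

```python
import re

# Degenerate codes listed above, mapped to the concrete bases each allows.
# U is folded into T so one table serves both DNA and RNA sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT", "M": "AC",
    "N": "ACGT", "R": "AG", "S": "CG", "V": "ACG", "W": "AT", "Y": "CT",
}


def degenerate_to_regex(pattern):
    """Compile a degenerate pattern such as 'NNWS' into an equivalent regex."""
    normalized = pattern.upper().replace("U", "T")
    return re.compile("".join("[" + IUPAC[code] + "]" for code in normalized))


def matches(sequence, pattern):
    """Return True if `sequence` is one of the concrete sequences `pattern` encodes."""
    seq = sequence.upper().replace("U", "T")
    return degenerate_to_regex(pattern).fullmatch(seq) is not None


print(matches("ACGU", "RYKK"))  # True (U is read as T)
```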
  • One nucleic acid molecule can be complementary to a second nucleic acid molecule (e.g., a Target RNA Region subsequence).
  • The sequence of a nucleic acid molecule need not be 100% complementary to that of its target nucleic acid molecule to be specifically hybridizable or hybridizable.
  • A polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure).
  • For example, an antisense nucleic acid molecule in which 18 of its 20 nucleotides are complementary to a target region, and which would therefore specifically hybridize, would represent 90 percent complementarity.
  • The remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST® programs (basic local alignment search tools) and PowerBLAST programs known in the art (see Altschul et al., J. Mol.
  • A gene-specific sequence is 100% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 99.5% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 99% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 98% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 97% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 96% complementary to a Target RNA Region subsequence.
  • A gene-specific sequence is at least 95% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 94% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 93% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 92% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 91% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 90% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 85% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 80% complementary to a Target RNA Region subsequence.
  • “Percent complementarity”, as used herein in reference to two nucleotide sequences, is similar to the concept of percent identity but refers to the percentage of nucleotides of a query sequence that optimally base-pair or hybridize to nucleotides of a subject sequence when the query and subject sequences are linearly arranged and optimally base-paired without secondary folding structures, such as loops, stems, or hairpins.
  • Such a percent complementarity can be between two DNA strands, two RNA strands, or a DNA strand and an RNA strand.
  • The “percent complementarity” can be calculated by (i) optimally base-pairing or hybridizing the two nucleotide sequences in a linear and fully extended arrangement (i.e.
  • Optimal base pairing of two sequences can be determined based on the known pairings of nucleotide bases, such as G-C, A-T, and A-U, through hydrogen bonding.
  • The percent identity is determined by dividing the number of complementary positions between the two linear sequences by the total length of the reference sequence.
  • The “percent complementarity” for the query sequence is equal to the number of base-paired positions between the two sequences divided by the total number of positions in the query sequence over its length, which is then multiplied by 100%.
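The percent-complementarity calculation described above (base-paired positions divided by query length, times 100%) can be sketched in Python. This sketch assumes both sequences are written 5' to 3', have equal length, and are aligned antiparallel without gaps or secondary structure; only the G-C, A-T, and A-U pairings named in the text are counted. The function name `percent_complementarity` is hypothetical, and a real primer-design tool would also handle offsets and gapped alignments.

```python
# Base pairings named in the text: G-C, A-T, and A-U (either orientation).
PAIRS = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A")}


def percent_complementarity(query: str, subject: str) -> float:
    """Percent of query positions that base-pair with an antiparallel subject.

    Assumes both sequences are written 5'->3', have equal length, and are
    aligned linearly without gaps or secondary folding structures.
    """
    if len(query) != len(subject):
        raise ValueError("sketch assumes equal-length, ungapped sequences")
    q = query.upper()
    s = subject.upper()[::-1]  # antiparallel: read the subject 3'->5'
    paired = sum((a, b) in PAIRS for a, b in zip(q, s))
    return 100.0 * paired / len(q)


# The 18-of-20 example from the text: two mismatched positions out of 20.
print(percent_complementarity("A" * 20, "T" * 18 + "C" * 2))  # 90.0
```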
  • A DNA molecule comprises a UMI.
  • An RNA molecule comprises a UMI.
  • A primer comprises a UMI.
  • A “unique molecular identifier” (UMI) refers to a unique nucleotide sequence that serves as a molecular barcode for an individual molecule. UMIs are often attached to DNA molecules in a sample library to uniquely tag each molecule. UMIs enable error correction and increased accuracy during sequencing of DNA molecules.
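In general terms, UMI-based counting treats reads that share an amplicon and a UMI as PCR copies of one original molecule, so the molecule count is the number of distinct UMIs rather than the number of reads. The Python sketch below is a generic illustration of that idea, not the QASeq pipeline; the `(amplicon_id, umi)` input format, the function name `count_molecules`, and the `min_reads_per_umi` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def count_molecules(reads, min_reads_per_umi=1):
    """Count unique molecules per amplicon from (amplicon_id, umi) pairs.

    Reads sharing an amplicon and UMI are assumed to be PCR copies of one
    original molecule. `min_reads_per_umi` is a hypothetical filter that drops
    UMI families supported by too few reads (e.g., likely sequencing errors).
    """
    family_sizes = Counter(reads)  # (amplicon_id, umi) -> number of reads
    molecules = defaultdict(int)
    for (amplicon_id, _umi), n_reads in family_sizes.items():
        if n_reads >= min_reads_per_umi:
            molecules[amplicon_id] += 1
    return dict(molecules)


# Hypothetical reads: 4 reads but only 2 molecules for BAG1_a1.
reads = ([("BAG1_a1", "ACGTACGTAC")] * 3 + [("BAG1_a1", "TTGGCCAATT")]
         + [("GAPDH_a1", "ACGTACGTAC")] * 2)
print(count_molecules(reads))                        # {'BAG1_a1': 2, 'GAPDH_a1': 1}
print(count_molecules(reads, min_reads_per_umi=2))   # {'BAG1_a1': 1, 'GAPDH_a1': 1}
```

Counting UMIs instead of reads removes amplification bias: a molecule copied many times by PCR still contributes one count.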
  • A UMI sequence comprises between 7 nucleotides and 30 nucleotides. In an aspect, a UMI sequence comprises between 5 nucleotides and 40 nucleotides. In an aspect, a UMI sequence comprises between 10 nucleotides and 20 nucleotides. In an aspect, a UMI sequence comprises at least 5 nucleotides. In an aspect, a UMI sequence comprises at least 7 nucleotides. In an aspect, a UMI sequence comprises at least 10 nucleotides. In an aspect, a UMI sequence comprises at least 15 nucleotides. In an aspect, a UMI sequence comprises fewer than 50 nucleotides. In an aspect, a UMI sequence comprises fewer than 40 nucleotides. In an aspect, a UMI sequence comprises fewer than 30 nucleotides. In an aspect, a UMI sequence comprises fewer than 20 nucleotides.
  • A UMI sequence comprises between 7 degenerate nucleotides and 30 degenerate nucleotides. In an aspect, a UMI sequence comprises between 5 degenerate nucleotides and 40 degenerate nucleotides. In an aspect, a UMI sequence comprises between 10 degenerate nucleotides and 20 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 5 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 7 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 10 degenerate nucleotides.
  • A UMI sequence comprises at least 15 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 50 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 40 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 30 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 20 degenerate nucleotides.
  • Each degenerate nucleotide in a UMI sequence is individually selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
  • A UMI sequence comprises between 7 degenerate nucleotides and 30 degenerate nucleotides, where each degenerate nucleotide is selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
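The UMI length and choice of degenerate codes determine how many distinct barcodes a design can encode: the product, over positions, of the number of bases each code allows. The Python sketch below computes that diversity and, assuming each allowed base is incorporated with equal probability (an idealization), the expected number of distinct UMIs observed when a given number of molecules are tagged. The function names and example patterns are hypothetical.

```python
import math

# Number of concrete bases each code allows (U treated like T).
CODE_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def umi_diversity(pattern):
    """Number of distinct UMI sequences a degenerate pattern can encode."""
    return math.prod(CODE_SIZE[code] for code in pattern.upper())


def expected_distinct_umis(diversity, n_molecules):
    """Expected number of distinct UMIs when n_molecules draw uniformly at random.

    Assumes every UMI allowed by the pattern is equally likely (idealized
    equimolar synthesis); real oligo pools deviate from this.
    """
    return diversity * (1 - (1 - 1 / diversity) ** n_molecules)


print(umi_diversity("N" * 10))      # 1048576
print(umi_diversity("NNNNWSNNNN"))  # 262144
```

When the expected number of distinct UMIs falls noticeably below the number of molecules, two molecules of the same amplicon are likely to share a UMI and be undercounted, which is one reason to prefer longer or fully degenerate designs for highly expressed targets.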
  • A method comprises removal of sequencing reads where the UMI sequences of the sequencing reads do not comprise a predefined UMI degenerate base design pattern.
  • A “predefined UMI degenerate base design pattern” refers to a UMI sequence comprising the expected number of degenerate bases and the expected type of degenerate bases for a given method. Non-limiting examples of inappropriate degenerate base designs include UMI sequences comprising too many or too few degenerate bases.
  • A method comprises removal of at least one sequencing read where the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern.
  • A method comprises removal of at least two sequencing reads where the UMI sequences of the at least two sequencing reads do not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least three sequencing reads where the UMI sequences of the at least three sequencing reads do not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least four sequencing reads where the UMI sequences of the at least four sequencing reads do not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least five sequencing reads where the UMI sequences of the at least five sequencing reads do not comprise a predefined UMI degenerate base design pattern.
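Removing reads whose UMI does not fit the predefined degenerate base design pattern can be sketched as a position-by-position check: the UMI must have the expected length, and each base must be allowed by the code at that position. The Python sketch below assumes, hypothetically, that the UMI occupies the first `len(pattern)` bases of each read; the actual read structure depends on the primer design, and the names `umi_fits_pattern` and `filter_reads_by_umi` are illustrative.

```python
# Allowed bases for each degenerate code (U folded into T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT", "M": "AC",
    "N": "ACGT", "R": "AG", "S": "CG", "V": "ACG", "W": "AT", "Y": "CT",
}


def umi_fits_pattern(umi, pattern):
    """True if every UMI base is allowed by the code at the same pattern position."""
    umi = umi.upper().replace("U", "T")
    pattern = pattern.upper().replace("U", "T")
    if len(umi) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(umi, pattern))


def filter_reads_by_umi(reads, pattern):
    """Split reads into (kept, removed) by whether their UMI fits `pattern`.

    Hypothetical layout: the UMI occupies the first len(pattern) bases of
    each read. Real read structures depend on the primer design.
    """
    kept, removed = [], []
    for read in reads:
        umi = read[: len(pattern)]
        (kept if umi_fits_pattern(umi, pattern) else removed).append(read)
    return kept, removed


reads = ["ACACGTGGGG", "ACATGTTTTT", "ACNCGTAAAA"]
kept, removed = filter_reads_by_umi(reads, "NNWSNN")
print(kept)     # ['ACACGTGGGG']
print(removed)  # ['ACATGTTTTT', 'ACNCGTAAAA']
```

Here the second read fails because T is not allowed by S at the fourth position, and the third read fails because an ambiguous base call (N in the read) is not allowed by W.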
  • A “Target RNA Region” refers to an RNA region of interest.
  • A Target RNA Region comprises a gene sequence.
  • A Target RNA Region comprises an exon sequence.
  • A Target RNA Region comprises an intron sequence.
  • A Target RNA Region comprises a 5' untranslated region (UTR) sequence.
  • A Target RNA Region comprises a 3' UTR sequence.
  • A Target RNA Region comprises at least 5 nucleotides. In an aspect, a Target RNA Region comprises at least 25 nucleotides. In an aspect, a Target RNA Region comprises at least 50 nucleotides. In an aspect, a Target RNA Region comprises at least 100 nucleotides. In an aspect, a Target RNA Region comprises at least 500 nucleotides. In an aspect, a Target RNA Region comprises at least 1000 nucleotides. In an aspect, a Target RNA Region comprises at least 5000 nucleotides.
  • A Target RNA Region comprises between 5 nucleotides and 10,000 nucleotides. In an aspect, a Target RNA Region comprises between 5 nucleotides and 5,000 nucleotides. In an aspect, a Target RNA Region comprises between 5 nucleotides and 1,000 nucleotides. In an aspect, a Target RNA Region comprises between
  • A Target RNA Region comprises between 25 nucleotides and 500 nucleotides. In an aspect, a Target RNA Region comprises between 25 nucleotides and 100 nucleotides. In an aspect, a Target RNA Region comprises between 50 nucleotides and 500 nucleotides. In an aspect, a Target RNA Region comprises between 50 nucleotides and 100 nucleotides. In an aspect, a Target RNA Region comprises between 100 nucleotides and 500 nucleotides.
  • An RNA sample provided herein comprises between 1 Target RNA Region and 10,000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 100,000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 1000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 500 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 100 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 10 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 5 Target RNA Regions and 10 Target RNA Regions.
  • An RNA sample provided herein comprises between 5 Target RNA Regions and 50 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 5 Target RNA Regions and 100 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 10 Target RNA Regions and 50 Target RNA Regions.
  • An RNA sample provided herein comprises at least 1 Target RNA Region. In an aspect, an RNA sample provided herein comprises at least 2 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 5 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 10 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 25 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 50 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 100 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 1000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 10,000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 100,000 Target RNA Regions.
  • A method comprises calculating Target RNA transcript expression level based on a mean molecule count of the Target RNA Regions within the same transcript.
  • A “Target RNA transcript” refers to an RNA transcript comprising one or more Target RNA Regions.
  • “Mean molecule count” refers to the average number of times a copy of a given molecule is present in a sample, solution, product, or mixture.
  • 1 Target RNA Region is amplified from 1 Target RNA transcript.
  • 2 Target RNA Regions are amplified from 1 Target RNA transcript.
  • between 1 and 3 Target RNA Regions are amplified from 1 Target RNA transcript.
  • between 1 and 4 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 5 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 6 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 7 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 8 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 9 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 10 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 15 Target RNA Regions are amplified from 1 Target RNA transcript.
  • Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 25 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 30 Target RNA Regions are amplified from 1 Target RNA transcript.
  • At least 1 Target RNA Region is positioned within a single Target RNA transcript.
  • At least 2 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 3 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 4 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 5 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 6 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 7 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 8 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 9 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 10 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 11 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 12 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 13 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 14 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 15 Target RNA Regions are positioned within a single Target RNA transcript.
  • At least 20 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 25 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 30 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 35 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 40 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 45 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 50 Target RNA Regions are positioned within a single Target RNA transcript.
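Calculating a transcript's expression level from the mean molecule count of its Target RNA Regions amounts to grouping region-level counts by transcript and averaging them. The Python sketch below assumes region counts are already available (for example, as distinct UMIs per amplicon) and that each region maps to exactly one transcript; the function name `transcript_expression`, the region identifiers, and the counts are hypothetical. Averaging over several regions per transcript is consistent with the reduced variability reported for more amplicons per gene in Figure 5.

```python
from collections import defaultdict
from statistics import mean


def transcript_expression(region_counts, region_to_transcript):
    """Average molecule count over the Target RNA Regions of each transcript.

    region_counts: Target RNA Region id -> molecule (UMI) count.
    region_to_transcript: Target RNA Region id -> transcript id.
    """
    counts_by_transcript = defaultdict(list)
    for region, count in region_counts.items():
        counts_by_transcript[region_to_transcript[region]].append(count)
    return {tx: mean(counts) for tx, counts in counts_by_transcript.items()}


# Hypothetical region ids and counts, for illustration only.
region_counts = {"BAG1_r1": 90, "BAG1_r2": 110, "BAG1_r3": 100, "MMP11_r1": 40}
region_to_transcript = {"BAG1_r1": "BAG1", "BAG1_r2": "BAG1",
                        "BAG1_r3": "BAG1", "MMP11_r1": "MMP11"}
print(transcript_expression(region_counts, region_to_transcript))
```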
  • RNA is present in, or obtained from, a sample (e.g, an “RNA sample”).
  • a sample refers to any biological material that is capable of being analyzed by or subjected to the methods and/or compositions provided herein. Any suitable method known in the art can be used to obtain a nucleic acid (e.g, an RNA molecule) from a sample.
  • a sample comprises RNA.
  • a sample comprises RNA and DNA.
  • a sample is obtained from a subject.
  • a “subject” refers to an animal (e.g, without being limiting, a mammal, reptile, bird, fish, amphibian) or other organism, such as, without being limiting, a plant or fungus.
  • a subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy.
  • the terms “individual” and “subject” are used interchangeably.
  • a subject is a eukaryote.
  • a subject is a prokaryote.
  • a subject is a virus. In an aspect, a subject is an animal. In an aspect, a subject is a plant. In an aspect, a subject is a fungus. In an aspect, a subject is a mammal. In an aspect, a subject is a rodent. In an aspect, a subject is a mouse. In an aspect, a subject is a rat. In an aspect, a subject is a rabbit. In an aspect, a subject is a cat. In an aspect, a subject is a dog. In an aspect, a subject is a horse. In an aspect, a subject is a cow. In an aspect, a subject is a pig. In an aspect, a subject is a primate.
  • a subject is a monkey. In an aspect, a subject is a chimpanzee. In an aspect, a subject is a human. In an aspect, a subject is a bird. In an aspect, a subject is a chicken. In an aspect, a subject is a fish. In an aspect, a subject is a reptile. In an aspect, a subject is an amphibian. In an aspect, a subject is an insect. In an aspect, a subject is an arachnid. In an aspect, a subject is a crustacean. In an aspect, a subject is a mollusk. In an aspect, a subject is a nematode. In an aspect, a subject is an annelid.
  • a subject has, or is suspected of having, cancer.
  • a subject has, or is suspected of having, colorectal cancer.
  • a subject has, or is suspected of having, gastric cancer.
  • a subject has, or is suspected of having, endometrial cancer.
  • a subject has, or is suspected of having, a genetic-based disease, disorder, or condition.
  • RNA can originate from and/or be isolated from any type of cancer for use with the methods and compositions provided herein.
  • Samples can be obtained from any type of cancer.
  • cancers include biliary tract cancer, bladder cancer, transitional cell carcinoma, urothelial carcinoma, brain cancer, gliomas, astrocytomas, breast carcinoma, metaplastic carcinoma, cervical cancer, cervical squamous cell carcinoma, rectal cancer, colorectal carcinoma, colon cancer, hereditary nonpolyposis colorectal cancer, colorectal adenocarcinomas, gastrointestinal stromal tumors (GISTs), endometrial carcinoma, endometrial stromal sarcomas, esophageal cancer, esophageal squamous cell carcinoma, esophageal adenocarcinoma, ocular melanoma, uveal melanoma, gallbladder carcinomas, gallbladder adenocarcinoma, renal cell carcinoma, prostate cancer, prostate adenocarcinoma, skin cancer, melanoma, malignant melanoma, cutaneous melanoma, small intestine carcinomas, stomach cancer, gastric carcinoma, gastrointestinal stromal tumor (GIST), uterine cancer, and uterine sarcoma.
  • a sample comprises a cell.
  • a sample comprises a tissue.
  • a sample comprises an organ.
  • a sample comprises blood.
  • a sample comprises plasma.
  • a sample comprises urine.
  • a sample comprises feces. Additional non-limiting examples of samples include serum, sputum, semen, vaginal fluid, synovial fluid, spinal fluid, and saliva.
  • an RNA sample provided herein is obtained from a source selected from the group consisting of formalin-fixed paraffin-embedded tissue, whole blood, plasma, and fresh tissue.
  • an RNA sample provided herein is obtained from formalin-fixed paraffin-embedded tissue. In an aspect, an RNA sample provided herein is obtained from whole blood. In an aspect, an RNA sample provided herein is obtained from plasma. In an aspect, an RNA sample provided herein is obtained from fresh tissue. In an aspect, an RNA sample provided herein is a human RNA sample. In an aspect, an RNA sample provided herein is an animal RNA sample.
  • an RNA sample provided herein comprises less than or equal to 100 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 75 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 50 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 25 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 20 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 15 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 10 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 5 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 1 ng of RNA.
  • a sample provided herein comprises less than or equal to 100 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 75 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 50 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 25 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 20 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 15 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 10 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 5 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 1 ng of RNA.
  • a “UMI Family” refers to a group of sequencing reads that comprise identical UMI sequences and also align to the same amplicon.
  • a UMI Family comprises at least 1 sequencing read.
  • a UMI Family comprises at least 2 sequencing reads.
  • a UMI Family comprises at least 5 sequencing reads.
  • a UMI Family comprises at least 10 sequencing reads.
  • a UMI Family comprises at least 50 sequencing reads.
  • a UMI Family comprises at least 100 sequencing reads.
  • a UMI Family comprises at least 500 sequencing reads.
  • a UMI Family comprises at least 1000 sequencing reads.
  • a UMI Family comprises at least 2500 sequencing reads.
  • a UMI Family comprises at least 5000 sequencing reads.
  • a UMI Family comprises at least 10,000 sequencing reads.
  • a UMI Family comprises between 1 sequencing read and 10,000 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 5,000 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 1000 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 500 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 100 sequencing reads.
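The UMI Family definition above can be illustrated with a minimal Python sketch that groups already-aligned reads by the pair (amplicon, UMI sequence) and counts the reads in each group; the count is the UMI Family size. The input layout, a list of `(amplicon_id, umi)` tuples, and the amplicon names are hypothetical, and read alignment and UMI extraction are assumed to have been done upstream.

```python
from collections import Counter
from typing import Iterable, Tuple


def umi_family_sizes(reads: Iterable[Tuple[str, str]]) -> Counter:
    """Return {(amplicon_id, umi): read_count}; each key is one UMI Family.

    Assumes each read is a hypothetical (amplicon_id, umi) tuple produced by
    an upstream alignment and UMI-extraction step.
    """
    return Counter((amplicon_id, umi) for amplicon_id, umi in reads)


# Hypothetical aligned reads: the same UMI on a different amplicon
# belongs to a different UMI Family.
reads = [
    ("ERBB2_amp1", "ACGTACGTAC"),
    ("ERBB2_amp1", "ACGTACGTAC"),
    ("ERBB2_amp1", "TTGCATGCAA"),
    ("GAPDH_amp3", "ACGTACGTAC"),
]
families = umi_family_sizes(reads)
```

Here the family `("ERBB2_amp1", "ACGTACGTAC")` has a size of 2, while the same UMI on `GAPDH_amp3` forms a separate family of size 1.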
  • an “amplicon” refers to a copy of DNA made via PCR.
  • a “UMI Primer” is an oligonucleotide molecule comprising a UMI sequence and a gene-specific sequence that is complementary to a Target RNA Region subsequence.
  • a UMI Primer is capable of generating an amplicon of a Target RNA Region to which it hybridizes.
  • a gene-specific sequence is 100% complementary to a Target RNA Region subsequence.
  • a gene-specific sequence is at least 99% complementary to a Target RNA Region subsequence.
  • a gene-specific sequence is at least 98% complementary to a Target RNA Region subsequence.
  • a gene-specific sequence is at least 97% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 96% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 95% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 90% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 85% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 80% complementary to a Target RNA Region subsequence.
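Percent complementarity between a gene-specific sequence and a Target RNA Region subsequence can be sketched as the fraction of positions at which the primer matches the reverse complement of the target. This is a simplified illustration, assuming both sequences are written 5' to 3', have equal length, and pair in an ungapped antiparallel alignment; the function names are hypothetical.

```python
# Base-pairing partners for DNA; uracil in an RNA target is mapped to thymine.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    dna = seq.upper().replace("U", "T")
    return "".join(_COMPLEMENT[base] for base in reversed(dna))


def percent_complementarity(gene_specific: str, target_subseq: str) -> float:
    """Percent of positions at which the primer pairs with the target.

    Both sequences are 5'->3'. Equal lengths and an ungapped, antiparallel
    alignment are assumed (a simplification for illustration).
    """
    if len(gene_specific) != len(target_subseq):
        raise ValueError("sequences must have equal length")
    expected = reverse_complement(target_subseq)
    matches = sum(p == e for p, e in zip(gene_specific.upper(), expected))
    return 100.0 * matches / len(expected)
```

For the hypothetical target `AUGGCAUC`, the primer `GATGCCAT` is 100% complementary, while `GATGCCAA` (one 3'-terminal mismatch) scores 87.5%, which would satisfy the "at least 85%" aspect but not the "at least 90%" aspect.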
  • a “Target RNA Region subsequence” refers to any portion of a Target RNA Region that is at least one nucleotide shorter than the full-length Target RNA Region.
  • a UMI Primer binds to a Target RNA Region subsequence.
  • a Target RNA Region subsequence comprises at least 1 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 2 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 3 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 4 fewer nucleotides as compared to a full-length Target RNA Region.
  • a Target RNA Region subsequence comprises at least 5 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 10 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 25 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 50 fewer nucleotides as compared to a full-length Target RNA Region.
  • a Target RNA Region subsequence comprises at least 5% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 10% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 25% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 50% fewer nucleotides as compared to a full-length Target RNA Region.
  • a Target RNA Region subsequence comprises at least 75% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 90% fewer nucleotides as compared to a full-length Target RNA Region.
  • a Target RNA Region subsequence comprises at least 5 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 15 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 25 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 35 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 50 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 75 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 100 nucleotides.
  • a Target RNA Region subsequence comprises between 5 and 500 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 250 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 100 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 50 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 35 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 15 and 35 nucleotides.
  • a UMI Primer comprises, in the order from 5' to 3', (a) a first universal region; (b) an optional second region comprising a length of between 1 nucleotide and 50 nucleotides; (c) a third region comprising a UMI sequence; and (d) a fourth region comprising a gene-specific sequence that is complementary to a Target RNA Region subsequence.
  • a “universal region” refers to a sequence that remains the same in UMI Primers designed for different Target RNA Regions.
  • a universal region comprises at least 1 nucleotide. In an aspect, a universal region comprises at least 2 nucleotides. In an aspect, a universal region comprises at least 3 nucleotides. In an aspect, a universal region comprises at least 4 nucleotides. In an aspect, a universal region comprises at least 5 nucleotides. In an aspect, a universal region comprises at least 6 nucleotides. In an aspect, a universal region comprises at least 7 nucleotides. In an aspect, a universal region comprises at least 8 nucleotides. In an aspect, a universal region comprises at least 9 nucleotides. In an aspect, a universal region comprises at least 10 nucleotides.
  • a universal region comprises at least 15 nucleotides. In an aspect, a universal region comprises at least 20 nucleotides. In an aspect, a universal region comprises at least 25 nucleotides. In an aspect, a universal region comprises at least 30 nucleotides. In an aspect, a universal region comprises at least 35 nucleotides. In an aspect, a universal region comprises at least 40 nucleotides. In an aspect, a universal region comprises at least 45 nucleotides. In an aspect, a universal region comprises at least 50 nucleotides.
  • a universal region comprises between 1 nucleotide and 50 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 40 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 30 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 20 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 10 nucleotides.
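The four-region UMI Primer layout described above (universal region, optional second region, UMI, gene-specific sequence, 5' to 3') can be sketched as a small Python data class that concatenates the regions in order. The class name, the example sequences, and the choice to enforce the 1 to 50 nt aspect for the universal region are illustrative assumptions; other aspects above allow longer universal regions.

```python
from dataclasses import dataclass


@dataclass
class UmiPrimer:
    """UMI Primer regions in 5'->3' order (hypothetical helper)."""

    universal: str      # (a) first universal region, shared across targets
    spacer: str         # (b) optional second region; "" if absent
    umi: str            # (c) UMI sequence, e.g. an IUPAC degenerate pattern
    gene_specific: str  # (d) complementary to a Target RNA Region subsequence

    def sequence(self) -> str:
        # Length limits follow one aspect described above (1-50 nt);
        # this is an assumption, not a requirement of every aspect.
        if not 1 <= len(self.universal) <= 50:
            raise ValueError("universal region must be 1-50 nt")
        if len(self.spacer) > 50:
            raise ValueError("second region must be at most 50 nt")
        return self.universal + self.spacer + self.umi + self.gene_specific


# Two hypothetical primers for different targets share the universal region.
p1 = UmiPrimer("ACGTTCAGGA", "TT", "NNNNNNNN", "GATGCCAT")
p2 = UmiPrimer("ACGTTCAGGA", "TT", "NNNNNNNN", "CCTGAAGT")
```

Because only the gene-specific region differs, a single Universal Forward or Reverse Primer matching the shared 5' region could, in principle, amplify products from every UMI Primer in a panel.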
  • a “Universal Forward Primer” refers to a primer comprising a universal region.
  • a “Universal Reverse Primer” refers to a primer comprising a universal region.
  • non-extended UMI Primers are removed from a mixture via a method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
  • non-extended UMI Primers are removed from a mixture via solid phase reversible immobilization purification.
  • non-extended UMI Primers are removed from a mixture via column purification.
  • non-extended UMI Primers are removed from a mixture via enzymatic digestion.
  • non-extended UMI Primers are removed from a mixture using any suitable method known in the art.
  • a method comprises the use of a reverse transcription primer.
  • a “reverse transcription primer” refers to a primer used in a reverse transcription reaction, where RNA in an RNA sample is converted to complementary DNA (cDNA).
  • complementary DNA or “cDNA” refers to a DNA copy of a messenger RNA (mRNA) molecule produced by a reverse transcriptase.
  • a reverse transcription primer comprises at least 1 degenerate nucleotide.
  • a reverse transcription primer comprises at least 2 degenerate nucleotides.
  • a reverse transcription primer comprises at least 3 degenerate nucleotides.
  • a reverse transcription primer comprises at least 4 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 5 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 6 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 7 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 8 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 9 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 10 degenerate nucleotides. In an aspect, a reverse transcription primer comprises between 1 and 5 degenerate nucleotides.
  • a reverse transcription primer comprises between 1 and 10 degenerate nucleotides. In an aspect, a reverse transcription primer comprises between 1 and 15 degenerate nucleotides. In an aspect, a reverse transcription primer comprises a random hexamer. As is known in the art, a “hexamer” comprises six nucleotides. In an aspect, a reverse transcription primer comprises a polyT string.
  • a “polyT string” refers to at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, or at least 30 consecutive thymine nucleobases.
  • a method comprises the introduction of a set of Outer Primers and a set of Inner Primers, where between 3 nucleotides and 20 nucleotides positioned at the 3' end of the Inner Primer are not subsequences of the set of Outer Primers.
  • “Outer Primers” refers to primers that flank a set of “Inner Primers” on a Target RNA Region.
  • a first (e.g., forward) Outer Primer is positioned 5' to a first (e.g., forward) Inner Primer and a second (e.g., reverse) Outer Primer is positioned 3' to a second (e.g., reverse) Inner Primer.
  • for a set of Outer Primers and a set of Inner Primers, between 3 nucleotides and 10 nucleotides positioned at the 3' end of the Inner Primer are not subsequences of the set of Outer Primers. In an aspect, for a set of Outer Primers and a set of Inner Primers, between 3 nucleotides and 30 nucleotides positioned at the 3' end of the Inner Primer are not subsequences of the set of Outer Primers.
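The Outer/Inner Primer constraint above can be checked with a short Python sketch: take the last k nucleotides at the 3' end of an Inner Primer and confirm that this string does not occur in any Outer Primer. The function name, the default k = 8, and the example sequences are hypothetical; "subsequence" is read here as a contiguous substring, and only the written strand of each Outer Primer is searched.

```python
from typing import List


def inner_tail_is_unique(inner: str, outers: List[str], k: int = 8) -> bool:
    """True if the 3'-terminal k nt of an Inner Primer occur in no Outer Primer.

    "Subsequence" is treated as a contiguous substring, and only the written
    strand of each Outer Primer is checked (simplifying assumptions).
    """
    if not 3 <= k <= 20:
        raise ValueError("k must be between 3 and 20 nucleotides")
    tail = inner.upper()[-k:]
    return all(tail not in outer.upper() for outer in outers)
```

With the hypothetical Outer Primers `TTGACCATGGCATTCAG` and `CCTAGGATCCAAT`, the Inner Primer `GGCATTCAGTCCAAGCTA` passes (its tail `CCAAGCTA` is absent), whereas `AACCATGGCATTCAG` fails because its tail `GCATTCAG` is contained in the first Outer Primer.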
  • this disclosure provides at least one DNA polymerase.
  • a “DNA polymerase” refers to an enzyme that is capable of catalyzing the synthesis of a DNA molecule from nucleoside triphosphates.
  • DNA polymerases add nucleotides to the 3' end of a DNA strand one at a time, creating a new DNA strand that is antiparallel to the template DNA strand.
  • DNA polymerases are unable to begin a new DNA molecule de novo; they require a primer to which they can add a first new nucleotide.
  • a “reagent” refers to any substance or compound added to a mixture to cause a chemical reaction or to test if a chemical reaction occurs.
  • a reagent comprises a component selected from the group consisting of magnesium, at least one dNTP, phosphatase, betaine, dimethyl sulfoxide (DMSO), and tetramethylammonium chloride (TMAC).
  • Non-limiting examples of reagents and buffers needed for DNA polymerase extension include Tris-HCl, potassium chloride, magnesium chloride, oligonucleotide primers, deoxynucleotides (dNTPs), betaine, and dimethyl sulfoxide.
  • DNA polymerases can extend primers at different temperatures, depending on the DNA polymerase.
  • a DNA polymerase extends primers at a temperature of at least 40°C.
  • a DNA polymerase extends primers at a temperature of at least 50°C.
  • a DNA polymerase extends primers at a temperature of at least 55°C.
  • a DNA polymerase extends primers at a temperature of at least 60°C.
  • a DNA polymerase extends primers at a temperature of at least 65°C.
  • a DNA polymerase extends primers at a temperature of at least 70°C.
  • a DNA polymerase extends primers at a temperature of at least 75°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 80°C.
  • Primers can bind, or anneal, to a complementary part of a Target RNA Region subsequence at a variety of temperatures, depending on the structure and length of the sequences involved. In an aspect, primer binding occurs at a temperature of at least 35°C. In an aspect, primer binding occurs at a temperature of at least 40°C. In an aspect, primer binding occurs at a temperature of at least 45°C. In an aspect, primer binding occurs at a temperature of at least 50°C. In an aspect, primer binding occurs at a temperature of at least 55°C. In an aspect, primer binding occurs at a temperature of at least 60°C. In an aspect, primer binding occurs at a temperature of at least 65°C. In an aspect, primer binding occurs at a temperature of at least 70°C.
  • DNA polymerase extension and primer binding occur at different temperatures. In an aspect, DNA polymerase extension and primer binding occur at the same temperature.
  • a DNA polymerase is a thermostable DNA polymerase.
  • a “thermostable DNA polymerase” refers to DNA polymerases that can function at high temperatures (e.g., greater than 65°C) and can survive higher temperatures (e.g., up to about 100°C). Thermostable DNA polymerases often have maximal catalytic activity at temperatures between 70°C and 80°C.
  • a thermostable DNA polymerase is selected from the group consisting of Taq DNA polymerase, Phusion® DNA polymerase, Q5® DNA polymerase, and KAPA High Fidelity DNA polymerase.
  • a DNA polymerase is a non-thermostable DNA polymerase.
  • a “non-thermostable DNA polymerase” refers to DNA polymerases that cannot function at high temperatures.
  • a non-thermostable DNA polymerase is selected from the group consisting of phi29 DNA polymerase and Bst DNA polymerase.
  • a method comprises high-throughput sequencing.
  • a method comprises subjecting a sequencing library to high-throughput sequencing.
  • “high-throughput sequencing” refers to any sequencing method that is capable of sequencing multiple (e.g., tens, hundreds, thousands, millions, hundreds of millions) DNA molecules in parallel.
  • Sanger sequencing is not high-throughput sequencing.
  • high-throughput sequencing comprises the use of a sequencing-by-synthesis (SBS) flow cell.
  • SBS flow cell is selected from the group consisting of an Illumina SBS flow cell and a Pacific Biosciences (PacBio) SBS flow cell.
  • high-throughput sequencing is performed via electrical current measurements in conjunction with an Oxford nanopore.
  • high-throughput DNA sequencing comprises sequencing-by-synthesis or nanopore-based sequencing.
  • a “sequencing read” refers to a nucleotide sequence of a single nucleic acid molecule generated via a high-throughput sequencing method.
  • sequencing reads are provided in a FASTX or FASTQ file type.
  • a sequencing read comprises a UMI sequence.
  • a sequencing read comprises a sequence from a gene.
  • a sequencing read comprises a UMI sequence and a sequence from a gene.
  • a sequencing read comprises at least 10 nucleotides. In an aspect, a sequencing read comprises at least 25 nucleotides. In an aspect, a sequencing read comprises at least 50 nucleotides. In an aspect, a sequencing read comprises at least 100 nucleotides. In an aspect, a sequencing read comprises at least 250 nucleotides. In an aspect, a sequencing read comprises at least 500 nucleotides. In an aspect, a sequencing read comprises at least 1000 nucleotides.
  • a sequencing read comprises between 10 nucleotides and 10,000 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 5000 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 1000 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 500 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 100 nucleotides. In an aspect, a sequencing read comprises between 25 nucleotides and 150 nucleotides.
  • a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is determined as Y% of the mean value for the largest Z UMI Family size(s) in the Target RNA Region.
  • Y is between 1% and 20% and Z is between 1 and 20.
  • Y is between 1% and 50% and Z is between 1 and 50.
  • Y is between 1% and 75% and Z is between 1 and 75.
  • Y is greater than 1% and Z is greater than 1.
  • Y is greater than 5% and Z is greater than 5.
  • Y is greater than 10% and Z is greater than 10.
  • Y and Z are the same integer. In an aspect, Y and Z are different integers. In an aspect, X and Y are the same integer. In an aspect, X and Y are different integers. In an aspect, X and Z are the same integer. In an aspect, X and Z are different integers. In an aspect, X, Y, and Z are the same integer. In an aspect, X, Y, and Z are different integers.
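The Y%-of-top-Z filtering rule above can be sketched in Python for a single Target RNA Region: sort the UMI Family sizes, average the Z largest, take Y% of that mean as X, and discard families smaller than X. The function names, the dictionary keyed by UMI, and the example sizes are hypothetical.

```python
from typing import Dict, List


def top_z_threshold(sizes: List[int], y_percent: float, z: int) -> float:
    """X = Y% of the mean of the Z largest UMI Family sizes in one region."""
    top = sorted(sizes, reverse=True)[:z]
    if not top:
        return 0.0
    return y_percent * (sum(top) / len(top)) / 100.0


def filter_families(family_sizes: Dict[str, int], y_percent: float, z: int) -> Dict[str, int]:
    """Remove UMI Families whose size is less than X (keyed here by UMI)."""
    x = top_z_threshold(list(family_sizes.values()), y_percent, z)
    return {umi: n for umi, n in family_sizes.items() if n >= x}


# Hypothetical family sizes for one Target RNA Region, with Y = 10 and Z = 3:
# the mean of the 3 largest is 180, so X = 18 and the families of size 12 and 3 are removed.
sizes = {"U1": 200, "U2": 180, "U3": 160, "U4": 12, "U5": 3}
kept = filter_families(sizes, y_percent=10, z=3)
```

Scaling X to the largest families, rather than using a fixed cutoff, lets the threshold follow the sequencing depth of each region.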
  • a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is determined as Y% of the mean value for the largest Z UMI Family size(s) for all UMI families with the exact same genotype in the Target RNA Region.
  • Y is between 1% and 20% and Z is between 1 and 20.
  • Y is between 1% and 50% and Z is between 1 and 50.
  • Y is between 1% and 75% and Z is between 1 and 75.
  • Y is greater than 1% and Z is greater than 1.
  • Y is greater than 5% and Z is greater than 5.
  • Y is greater than 10% and Z is greater than 10.
  • Y and Z are the same integer.
  • Y and Z are different integers.
  • X and Y are the same integer.
  • X and Y are different integers.
  • X and Z are the same integer.
  • X and Z are different integers.
  • X, Y, and Z are the same integer.
  • X, Y, and Z are different integers.
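The genotype-specific variant of the rule can be sketched by grouping a region's UMI Families by genotype and applying the Y%-of-top-Z threshold within each group. The `(umi, genotype, size)` tuple layout and the genotype labels are hypothetical. In the example, a region-wide threshold with Y = 10 and Z = 2 would be X = 29 and would discard the variant families of size 20 and 18; computed per genotype, only the size-1 variant family falls below its threshold of 1.9.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Family = Tuple[str, str, int]  # (umi, genotype, family_size), hypothetical layout


def filter_by_genotype(families: List[Family], y_percent: float, z: int) -> List[Family]:
    """Apply the Y%-of-top-Z size threshold separately within each genotype."""
    groups: Dict[str, List[Family]] = defaultdict(list)
    for fam in families:
        groups[fam[1]].append(fam)
    kept: List[Family] = []
    for members in groups.values():
        top = sorted((size for _, _, size in members), reverse=True)[:z]
        x = y_percent * (sum(top) / len(top)) / 100.0
        kept.extend(f for f in members if f[2] >= x)
    return kept


families = [
    ("a", "WT", 300), ("b", "WT", 280), ("c", "WT", 260),
    ("d", "G>A", 20), ("e", "G>A", 18), ("f", "G>A", 1),
]
kept_by_genotype = filter_by_genotype(families, y_percent=10, z=2)
```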
  • a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is determined based on a Gaussian fit to a histogram of log2(UMI Family size) for the Target RNA Region, where log2(X) = (the center of the histogram peak) - (3 × the standard deviation of the histogram).
  • X is between 1 and 10.
  • X is between 1 and 20.
  • X is between 1 and 50.
  • X is between 1 and 75.
  • X is between 1 and 100.
  • X is greater than 1.
  • X is greater than 5.
  • X is greater than 10.
  • X is greater than 50.
  • X is greater than 100.
  • X is greater than 250.
  • X is greater than 500.
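The log2-based threshold can be sketched in Python as X = 2^(center - 3 × SD) computed on log2(UMI Family size). This is a stand-in, not a histogram curve fit: the center and standard deviation are estimated here from the sample mean and population standard deviation of the log2 sizes, so a heavy tail of small error families will widen the SD relative to a fitted peak. The function name is hypothetical.

```python
import math
import statistics
from typing import List


def gaussian_log2_threshold(sizes: List[int]) -> float:
    """X such that log2(X) = center - 3 * SD of log2(UMI Family size).

    Assumption: center and SD come from moments of the log2 sizes rather
    than from a Gaussian curve fitted to the histogram peak.
    """
    logs = [math.log2(s) for s in sizes if s > 0]
    center = statistics.fmean(logs)
    sd = statistics.pstdev(logs)
    return 2 ** (center - 3 * sd)
```

Families with a size below the returned X would then be removed from consideration for that Target RNA Region.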
  • a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is a fixed number.
  • X is between 2 and 10.
  • X is between 2 and 20.
  • X is between 2 and 30.
  • X is between 2 and 40.
  • X is between 2 and 50.
  • X is between 2 and 75.
  • X is between 2 and 100.
  • X is greater than 1.
  • X is greater than 5.
  • X is greater than 10.
  • X is greater than 15. In an aspect, X is greater than 20.
  • X is greater than 25. In an aspect, X is greater than 50. In an aspect, X is greater than 75. In an aspect, X is greater than 100.
  • as used herein, the number of unique UMI sequences “N” is the total count of different UMI sequences at one locus, which corresponds to the number of original molecules of the Target RNA Regions multiplied by the barcoding yield.
  • a “barcoding yield” is the UMI attachment yield, and refers to the percentage of original molecules that can be attached with a UMI.
  • the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
  • molecule count refers to the number of copies of a given nucleic acid molecule present in a sample, solution, product, or mixture.
  • the terms “barcoding yield” and “conversion yield” are used interchangeably.
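Since N equals the number of original molecules multiplied by the barcoding yield, the calibration and quantitation steps reduce to two divisions, sketched below in Python. The function names and all numbers are hypothetical, and a separate yield would be calibrated for each Target RNA Region.

```python
def calibrate_barcoding_yield(n_unique_umis: int, known_molecules: int) -> float:
    """Barcoding yield = N observed in a calibration sample / known molecule count."""
    if known_molecules <= 0:
        raise ValueError("known molecule count must be positive")
    return n_unique_umis / known_molecules


def estimate_original_molecules(n_unique_umis: int, barcoding_yield: float) -> float:
    """Original molecules of a Target RNA Region = N / barcoding yield."""
    return n_unique_umis / barcoding_yield


# Hypothetical calibration: 1,000 known molecules of a region give N = 400,
# so the barcoding yield is 0.4 (40%).
yield_region = calibrate_barcoding_yield(400, 1000)
# A test sample with N = 250 for the same region then implies about 625 molecules.
estimate = estimate_original_molecules(250, yield_region)
```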
  • a method comprises identifying a UMI Family Sequence.
  • UMI Family Sequence refers to the most frequent nucleotide sequence within a UMI Family.
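Identifying the UMI Family Sequence as the most frequent read sequence within a family can be sketched with `collections.Counter`. One plausible benefit is that a PCR or sequencing error present in a minority of reads is outvoted by the majority sequence. The tie-breaking rule (first sequence encountered wins, which is the ordering `Counter.most_common` uses for equal counts) is an assumption, not something stated above.

```python
from collections import Counter
from typing import List


def umi_family_sequence(read_sequences: List[str]) -> str:
    """Most frequent read sequence within one UMI Family.

    Ties go to the sequence encountered first (assumed tie rule).
    """
    if not read_sequences:
        raise ValueError("a UMI Family must contain at least one read")
    return Counter(read_sequences).most_common(1)[0][0]
```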
  • a method comprises removal of a first UMI sequence that differs by a fixed number of nucleotides from a second UMI sequence, where fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
  • a first UMI sequence differs from a second UMI sequence by one nucleotide.
  • a first UMI sequence differs from a second UMI sequence by two nucleotides.
  • a first UMI sequence differs from a second UMI sequence by three nucleotides.
  • a first UMI sequence differs from a second UMI sequence by four nucleotides.
  • a first UMI sequence differs from a second UMI sequence by five nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by one nucleotide or two nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by between one nucleotide and three nucleotides.
  • for example, the sequence 5'-AATG-3' differs from the sequence 5'-AATC-3' by one nucleotide.
  • for example, the sequence 5'-AATG-3' differs from the sequence 5'-AAAC-3' by two nucleotides.
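The UMI error-removal step can be sketched with Hamming distance: a UMI is dropped if some other UMI with strictly more supporting reads differs from it by between one and `max_dist` nucleotides, which corresponds to the "between one nucleotide and N nucleotides" aspects. The function names are hypothetical, all UMIs are assumed to share one designed length, and the all-pairs comparison is quadratic, which is adequate only for illustration.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def remove_umi_errors(umi_counts: Dict[str, int], max_dist: int = 1) -> Dict[str, int]:
    """Drop each UMI that lies within max_dist of another UMI with more reads."""
    return {
        umi: n
        for umi, n in umi_counts.items()
        if not any(
            other != umi and umi_counts[other] > n and hamming(umi, other) <= max_dist
            for other in umi_counts
        )
    }


# Hypothetical read counts per UMI, reusing the example sequences above.
counts = {"AATG": 50, "AATC": 3, "GGCA": 40, "AAAC": 2}
cleaned = remove_umi_errors(counts, max_dist=1)
```

Here `AATC` is removed (one mismatch from the more abundant `AATG`), and `AAAC` is removed because it is one mismatch from `AATC`, which has more reads; `GGCA` is kept.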
  • a Target RNA Region is positioned within a BAG1 gene. In an aspect, a Target RNA Region is positioned within a PGR gene. In an aspect, a Target RNA Region is positioned within an ERBB2 gene. In an aspect, a Target RNA Region is positioned within a GRB7 gene. In an aspect, a Target RNA Region is positioned within a TFRC gene. In an aspect, a Target RNA Region is positioned within an MKI67 gene. In an aspect, a Target RNA Region is positioned within an MMP11 gene. In an aspect, a Target RNA Region is positioned within an MYBL2 gene. In an aspect, a Target RNA Region is positioned within a BIRC5 gene.
  • a Target RNA Region is positioned within an ESR1 gene. In an aspect, a Target RNA Region is positioned within a CD68 gene. In an aspect, a Target RNA Region is positioned within a CTSV gene. In an aspect, a Target RNA Region is positioned within a BCL2 gene. In an aspect, a Target RNA Region is positioned within a CCNB1 gene. In an aspect, a Target RNA Region is positioned within a GUSB gene. In an aspect, a Target RNA Region is positioned within a SCUBE2 gene. In an aspect, a Target RNA Region is positioned within an RPLP0 gene. In an aspect, a Target RNA Region is positioned within an ACTB gene.
  • a Target RNA Region is positioned within an AURKA gene. In an aspect, a Target RNA Region is positioned within a GAPDH gene. In an aspect, a Target RNA Region is positioned within one or more genes selected from the group consisting of: BAG1, PGR, ERBB2, GRB7, TFRC, MKI67, MMP11, MYBL2, BIRC5, ESR1, CD68, CTSV, BCL2, CCNB1, GUSB, SCUBE2, RPLP0, ACTB, AURKA, and GAPDH.
  • a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample comprising:
  • UMI: Unique Molecular Identifier.
  • step (a) comprises the use of a reverse transcription primer.
  • reverse transcription primer comprises at least one degenerate nucleotide.
  • RNA sample is obtained from a source selected from the group consisting of formalin-fixed paraffin-embedded tissue, whole blood, plasma, and fresh tissue.
  • RNA sample is a human RNA sample.
  • UMI sequence comprises between 7 and 30 degenerate nucleotides.
  • each of the between 7 and 30 degenerate nucleotides is selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
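The number of distinct UMI sequences a degenerate design can encode is the product of the number of bases each IUPAC symbol allows (N = 4; B, D, H, V = 3; S, W, Y, R, M, K = 2; fixed A, C, G, T = 1). The Python sketch below computes this; the function name and example designs are hypothetical. Assuming uniform base incorporation, the chance that two particular molecules receive the same UMI is 1 divided by this number.

```python
import math

# Number of bases each IUPAC symbol can represent; fixed bases count as 1.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "N": 4,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "S": 2, "W": 2, "Y": 2, "R": 2, "M": 2, "K": 2,
}


def umi_diversity(design: str) -> int:
    """Distinct UMI sequences encoded by a degenerate design pattern."""
    return math.prod(IUPAC_DEGENERACY[b] for b in design.upper())
```

For example, the minimum of 7 N positions gives 4^7 = 16,384 possible UMIs, and the hypothetical design `NNNNWSNNNN` gives 4^8 × 2 × 2 = 262,144.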
  • in the method of any one of embodiments 1-3, the first DNA polymerase is a thermostable DNA polymerase.
  • thermostable DNA polymerase is selected from the group consisting of Taq DNA polymerase, Phusion® DNA polymerase, Q5® DNA polymerase, and KAPA High Fidelity DNA polymerase.
  • step (d) comprises a purification method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
  • step (1) comprises a purification method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
  • step (g) comprises removal of at least one sequencing read where the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern prior to the calculating.
  • step (i) comprises removal of at least one sequencing read where the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern prior to the calculating.
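Removing reads whose UMI does not fit the predefined degenerate base design pattern can be sketched by translating each IUPAC symbol into a regular-expression character class and keeping only UMIs that match the whole pattern. The design `NNNNWSNNNN` and the function names are hypothetical. A read with a no-call (`N`) in its UMI, a base outside the allowed set, or the wrong length is rejected.

```python
import re
from typing import Iterable, List

# Regex character class for each IUPAC symbol.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "S": "[CG]", "W": "[AT]", "Y": "[CT]", "R": "[AG]", "M": "[AC]", "K": "[GT]",
}


def design_regex(design: str) -> "re.Pattern":
    """Compile a degenerate UMI design (e.g. 'NNNNWSNNNN') into a regex."""
    return re.compile("".join(IUPAC_CLASS[b] for b in design.upper()))


def umis_matching_design(umis: Iterable[str], design: str) -> List[str]:
    """Keep only UMIs that fully match the design; the rest would be removed."""
    rx = design_regex(design)
    return [u for u in umis if rx.fullmatch(u)]
```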
  • step (g) comprises removal of a first UMI sequence that differs by only 1 or 2 nucleotides from a second UMI sequence, where fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
  • step (i) comprises removal of a first UMI sequence that differs by only 1 or 2 nucleotides from a second UMI sequence, where fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
  • step (g) comprises:
  • step (v) dividing N by a barcoding yield, to identify the number of original molecules of the Target RNA Regions, where the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
  • step (v) dividing N by a barcoding yield, to identify the number of original molecules of the Target RNA Regions, where the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
  • step (g) further comprises identifying a UMI Family Sequence.
  • step (i) further comprises identifying a UMI Family Sequence.
  • step (i) further comprises introduction of a set of Outer Primers
  • the second set of primers comprises a set of Inner Primers, where between 3 and 20 nucleotides at the 3' end of the Inner Primer are not subsequences of the Outer Primers.
  • RNA sample comprises less than or equal to 10 ng of RNA.
  • RNA QASeq involves the construction of targeted panels for gene expression quantification.
  • the panel design is accomplished by running a fully automated python pipeline denominated ‘pynab.’ Pynab generates highly optimized primer sequences (forward primer (fP), reverse inner primer (rPin), and reverse outer primer (rPout)) targeting sequence(s) of interest within a given gene(s).
  • the regions of interest include exons (transcripts) or specific hotspot locations with variant information such as single nucleotide polymorphisms (SNPs), indels, etc.
  • the primers generated by the pipeline can be multiplexed in a single PCR reaction because they are designed to be chemically compatible.
  • the automated design pipeline is composed of four modules:
  • Module 1. Gathering gene information. This step involves obtaining genomic information for the sequence(s) of interest (e.g., Target RNA Regions), even if this is a single nucleotide position in a given exon. The input is a list of gene names or gene IDs or, alternatively, a list of hotspot coordinates with start and stop positions and the corresponding gene name.
  • Module 2. Designing context sequences. The second step entails designing amplicons around the region of interest. There are two design strategies: 1) All-exon tiling and 2) hotspot.
  • the all-exon tiling strategy includes adding padding regions to exons, merging neighboring padded exons into sequences of interest (SOI), covering SOIs with inserts, adding primer design regions and conducting BLAST checks for each context sequence against a reference human genome to evaluate the number of hits and avoid including primers that will likely produce non-specific amplification.
  • the hotspot panel design strategy only includes covering the SOI with inserts and adding extended regions for primer design.
  • BLAST check is also included in the hotspot panel design.
  • Context sequence design for both strategies can be fully customized through a configuration file to produce long (for example, ~190 nucleotides) or short (for example, ~70 nucleotides) amplicons.
  • Module 3. Generating primer candidates.
  • a group of forward and reverse primer candidates are generated from the previously designated primer design regions. Initially, forward primers (fPs) from the fP design region will be generated. At least one fP is produced.
  • reverse primers (rPs) from the rP design region are generated. At least two rPs are produced, with a minimum separation of 4 nucleotides between them.
  • Module 4. Optimizing primers. In this final step, the set of candidate primers is evaluated pairwise to estimate the likelihood of primer-dimer formation and, if required, problematic primers are replaced with better candidates. Initially, fP and rPout are optimized, and then rPin is optimized based on the existing optimized fP and rPout sequences.
  • RNA sample is initially reverse transcribed to cDNA as input for RNA QASeq protocol.
  • RNA is mixed with dNTP (0.5 mM), Murine RNase Inhibitor (8 U), M-MuLV buffer (1X), M-MuLV Reverse Transcriptase (8 U), and random hexamer (6 µM).
  • the mixture is incubated at 25°C for 5 minutes and at 42°C for 60 minutes, and then is inactivated at 65°C for 20 minutes.
  • the reaction mixture is directly used as input for UMI PCR without purification.
  • the following sequencing library preparation consists of three PCR reactions: UMI PCR, nested PCR, and index PCR.
  • the cDNA sample is mixed with 1 U Phusion High-Fidelity DNA polymerase, Phusion HF buffer, forward and outer reverse primers (15 nM each), and dNTPs (0.2 mM each) to reach a total volume of 50 µL.
  • Thermal cycling starts with 30 seconds at 98°C, followed by 2 cycles of 10 seconds at 98°C, 30 minutes at 63°C, and 15 seconds at 72°C; then 2 cycles of 10 seconds at 98°C, 15 seconds at 63°C, and 15 seconds at 72°C; and finally 5 cycles of 10 seconds at 98°C and 30 seconds at 71°C.
  • 1.5 µM of each universal primer is added while keeping the reactions inside the thermal cycler. See Table 1.
  • 1.6X purification with AMPure XP beads is performed.
  • the purified eluate from the previous step is mixed with PowerUp SYBR Green Master Mix (1X final concentration) and 15 nM each inner reverse primer. Thermal cycling starts with 3 minutes at 95°C, followed by 2 cycles of 10 seconds at 95°C and 30 minutes at 60°C.
  • the PCR product is purified by 1.6X AMPure XP beads.
  • index PCR is performed.
  • the purified eluate from the previous step is mixed with iTaq Universal SYBR Green Supermix (1X final concentration) and 250 nM each NEBNext index primers.
  • RNA QASeq workflow UMI PCR step
  • a cDNA sample is mixed with 1 U Phusion High-Fidelity DNA polymerase, Phusion HF buffer, forward and outer reverse primers (15 nM each), universal primers (1.5 µM each), and dNTPs (0.2 mM each) to reach a total volume of 50 µL.
  • Thermal cycling starts with 30 seconds at 98°C, followed by 2 cycles of 10 seconds at 98°C, 30 minutes at 63°C, and 15 seconds at 72°C; then 2 cycles of 10 seconds at 98°C, 15 seconds at 63°C, and 15 seconds at 72°C; and finally 5 cycles of 10 seconds at 98°C and 30 seconds at 71°C.
  • PCR 1 step. There is no open-tube step for adding universal primers during the PCR reaction. After UMI PCR, the mixture is purified using 1.6X AMPure XP beads. Following this, nested PCR and index PCR are performed as described in Example 2.
  • RNA QASeq technology for RNA quantitation is demonstrated in a variety of samples including tumor tissue FFPE RNA, total blood RNA, and total liver RNA.
  • RNA samples are reverse transcribed to cDNA as input for RNA QASeq as described above. Random hexamer is chosen as the reverse transcription primer to be compatible with low-quality fragmented FFPE RNA.
  • RNA accession refers to GenBank (ncbi[dot]nlm[dot]nih[dot]gov/genbank) accession numbers.
  • Example 5. Validation of RNA quantitation accuracy.
  • RNA quantitation accuracy is first validated using an ERCC RNA spike-in mix. Sixteen ERCC sequences are selected as Target RNA Regions and amplified. See Table 3. The ERCC RNA sample is diluted and mixed with commercial human total liver RNA for a final expected molecule count between 3 and 100,000. The observed molecule count shows good correlation with the expected molecule count. See Figure 3. RNA QASeq quantitates RNA across five orders of magnitude, and as few as three expected molecules are detected. See Table 4.
  • The reproducibility of RNA QASeq expression level quantitation relative to reference genes is evaluated. Total liver RNA is assayed with a breast cancer panel in replicate, and consistent expression levels are observed. See Figure 4.
  • RNA QASeq reduces quantitation variability in the expression level.
  • the standard deviation of the relative expression level in triplicate experiments decreases as the number of amplicons per gene increases from one to five, with the median standard deviation falling from 0.44 to 0.21. An outlier is observed only when a single amplicon is considered. See Figure 5.
  • Example 7. Comparison of RNA QASeq with standard RNA Seq, nanoString, microarray, and RT-qPCR.
  • The RNA expression level from RNA QASeq is extensively compared with other technologies, including standard RNA Seq, nanoString, microarray, and RT-qPCR, using FFPE RNA from breast cancer and lung cancer tissue. The expression level is normalized in the same way relative to the five reference genes for all the methods and is summarized in Figure 6 for a breast cancer FFPE RNA.
  • RNA QASeq is consistent with standard RNA Seq and nanoString nCounter. Microarray, however, shows poor correlation with any of the other methods. RNA QASeq is further compared with these technologies in additional samples. See Figures 7-10.
  • NanoString shows high correlation with RNA QASeq in all samples but requires a much higher RNA input than RNA QASeq. See Figure 7. With nanoString, low expression level species are not observed when starting with 10 ng of RNA, compared with the typical input of 150 ng of RNA. See Figure 11. Microarray shows poor concordance with both RNA QASeq and nanoString in all samples. See Figures 8 and 12. Standard RNA Seq is consistent with RNA QASeq in most samples. See Figure 9. However, because standard RNA Seq is a non-targeted approach, most reads are spent on genes of no interest, and coverage uniformity issues lead to poor robustness in quantitating lowly expressed genes, as observed in two FFPE samples. Finally, RT-qPCR is consistent with UMI-based RNA QASeq quantitation but is limited by its low multiplexing ability. See Figure 10.
  • Example 8. Relative RNA expression levels in clinical samples.
  • RNA expression levels in four clinical FFPE and three healthy placenta FFPE samples are summarized. See Figure 13. Hierarchical clustering indicates that the expression patterns are the most similar among healthy placenta samples.
  • Example 9. Comparison of open-tube and one-pot RNA QASeq workflow protocols.
  • RNA QASeq library preparations are performed using two protocols, open-tube and one-pot (see Examples 2 and 3, respectively), on the same cDNA sample, which is freshly reverse transcribed from human liver RNA.
  • Open-tube and one-pot protocols show high concordance in read uniformity and UMI count, indicating good preservation of molecule conversion yield and robust RNA expression level analysis. See Figure 14.
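The all-exon tiling strategy of Module 2 (padding exons, merging neighboring padded exons into sequences of interest, and covering each SOI with inserts) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pynab implementation: the half-open coordinate convention, the padding and insert lengths, the rule for placing the last insert, and the function names `pad_and_merge` and `tile_soi` are all hypothetical.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on one chromosome (assumed convention).
Interval = Tuple[int, int]


def pad_and_merge(exons: List[Interval], padding: int) -> List[Interval]:
    """Pad each exon on both sides, then merge overlapping or touching
    padded exons into sequences of interest (SOIs)."""
    padded = sorted((max(0, start - padding), end + padding) for start, end in exons)
    sois: List[Interval] = []
    for start, end in padded:
        if sois and start <= sois[-1][1]:
            # Overlaps or abuts the previous SOI: extend that SOI.
            sois[-1] = (sois[-1][0], max(sois[-1][1], end))
        else:
            sois.append((start, end))
    return sois


def tile_soi(soi: Interval, insert_len: int) -> List[Interval]:
    """Cover an SOI with back-to-back inserts of `insert_len` bases.
    The last insert is shifted left to end at the SOI boundary, so it may
    overlap its neighbor (hypothetical placement rule)."""
    start, end = soi
    if end - start <= insert_len:
        return [(start, end)]
    tiles: List[Interval] = []
    pos = start
    while pos + insert_len < end:
        tiles.append((pos, pos + insert_len))
        pos += insert_len
    tiles.append((end - insert_len, end))
    return tiles


if __name__ == "__main__":
    exons = [(1000, 1120), (1150, 1300), (5000, 5090)]
    sois = pad_and_merge(exons, padding=20)
    print(sois)  # [(980, 1320), (4980, 5110)]
    for soi in sois:
        print(soi, tile_soi(soi, insert_len=100))
```

In this sketch the first two exons become one SOI because their padded intervals overlap; primer design regions and the BLAST check would then be applied to each tiled insert.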

Abstract

The present application discloses methods of quantitating RNA molecules using a multiplex amplicon sequencing system. RNA molecules are reverse transcribed into cDNA, then the cDNA is labeled with an oligonucleotide barcode sequence by polymerase chain reaction. Next, the region(s) of interest are amplified for high-throughput sequencing, and the region(s) of interest in the RNA sample are quantified. The provided methods reduce RNA quantitation variation by designing multiple amplicons per transcript, and are compatible with RNA from different sources including formalin-fixed paraffin-embedded RNA. As few as three input RNA molecules can be detected by RNA Quantitative Amplicon sequencing.

Description

RNA QUANTITATIVE AMPLICON SEQUENCING FOR GENE EXPRESSION QUANTITATION
CROSS-REFERENCE TO RELATED APPLICATIONS AND INCORPORATION OF SEQUENCE LISTING
[0001] This application claims the benefit of U.S. Provisional Application No. 63/274,270, filed November 1, 2021, which is incorporated by reference in its entirety herein. A sequence listing contained in the file named “P35112WOOO_SL.XML” which is 252,036 bytes (measured in MS-Windows®) and created on October 31, 2022, is filed electronically herewith and incorporated by reference in its entirety.
FIELD
[0002] The present disclosure relates to the fields of molecular biology and bioinformatics. More particularly, it relates to methods for analyzing RNA samples to quantify gene expression.
BACKGROUND
[0003] RNA expression level quantitation is important, as the set of RNAs transcribed and their molecule count reflect the current state of the cell and may reveal pathological mechanisms underlying diseases. Many methods have been developed for RNA expression level quantitation, including standard RNA Seq, nanoString, microarray, and quantitative reverse transcription PCR (RT-qPCR).
[0004] Here, we report new methods for RNA Quantitative Amplicon sequencing (RNA QASeq). RNA QASeq uses PCR-based molecular barcoding to improve quantitation accuracy and a targeted amplicon approach to focus sequencing reads on genes of interest, reducing cost. RNA QASeq requires lower input than nanoString, has better quantitation accuracy than microarray, and exhibits higher multiplexing ability than RT-qPCR; unlike standard RNA Seq, it is a targeted approach, which reduces dropout of low expression level species.
SUMMARY
[0005] In one aspect, this disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) a second set of primers; (iii) a first DNA polymerase; and (iv) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension; (d) removing non-extended UMI Primers to generate a product; (e) preparing a sequencing library using the product; (f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and (g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
[0006] In one aspect, this disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) optionally, a second set of primers; (iii) a first DNA polymerase; and (iv) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension; (d) removing non-extended UMI Primers to generate a product; (e) preparing a sequencing library using the product; (f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and (g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
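Step (g) can be illustrated with a minimal Python sketch that combines the read-processing operations described for the UMI analysis: discarding reads whose UMI does not fit the predefined degenerate base design pattern, removing a UMI that differs by only 1 or 2 nucleotides from a UMI supported by more reads, and dividing the resulting count N by a calibrated barcoding yield. The greedy most-abundant-first collapsing order, the Hamming distance metric (substitutions only), the example pattern `NNWNNW`, and the function names are assumptions made for illustration, not the disclosed implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# IUPAC codes (DNA alphabet) and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT", "M": "AC",
    "N": "ACGT", "R": "AG", "S": "CG", "V": "ACG", "W": "AT", "Y": "CT",
}


def matches_pattern(umi: str, pattern: str) -> bool:
    """True if every UMI base is allowed by the degenerate pattern at that position."""
    return len(umi) == len(pattern) and all(b in IUPAC[p] for b, p in zip(umi, pattern))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_umi_families(umis: Iterable[str], pattern: str, max_dist: int = 2) -> int:
    """Return N, the number of distinct UMIs kept for one Target RNA Region."""
    # Drop reads whose UMI does not fit the designed degenerate pattern.
    counts = Counter(u for u in umis if matches_pattern(u, pattern))
    kept: List[Tuple[str, int]] = []
    # Visit UMIs from most to fewest reads; remove a UMI that is within
    # `max_dist` mismatches of a kept UMI supported by strictly more reads.
    for umi, n_reads in counts.most_common():
        if not any(hamming(umi, k) <= max_dist and k_reads > n_reads for k, k_reads in kept):
            kept.append((umi, n_reads))
    return len(kept)


def original_molecules(n_families: int, barcoding_yield: float) -> float:
    """Divide N by the barcoding yield calibrated on a sample of known molecule count."""
    return n_families / barcoding_yield


if __name__ == "__main__":
    reads = ["ACATGA"] * 40 + ["ACATGT"] * 3 + ["GGTCCA"] * 25 + ["ACGTGA"] * 10
    n = count_umi_families(reads, pattern="NNWNNW")
    print(n, original_molecules(n, barcoding_yield=0.5))  # 2 4.0
```

Here `ACGTGA` is discarded because G is not allowed by W at the third position, and `ACATGT` is absorbed into the more abundant `ACATGA`. A UMI with the same read count as its neighbor is kept, mirroring the condition that the removed UMI is supported by fewer reads.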
[0007] In one aspect, this disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) a first DNA polymerase; and (iii) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a first product; (d) contacting the first product with a second set of primers; (e) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a second product; (f) removing non-extended UMI Primers from the second product to generate a third product; (g) preparing a sequencing library using the third product; (h) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and (i) calculating a molecular count for each Target RNA Region based on the UMI from the sequencing reads.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] Figure 1 depicts a schematic of the one-pot RNA Quantitative Amplicon sequencing (QASeq) workflow. The sequencing library preparation consists of three PCR reaction steps: unique molecular identifier (UMI) PCR (“PCR 1”), nested PCR (“PCR 2”), and index PCR (“PCR 3”).
[0009] Figure 2 depicts a schematic of an expression level calculation formula.
[0010] Figure 3 depicts RNA QASeq quantitation accuracy validation with an External RNA Controls Consortium (ERCC) spike-in reference sample.
[0011] Figure 4 depicts RNA QASeq quantitation reproducibility, using 10 ng of total liver RNA as input, shown in technical replicates.
[0012] Figure 5 depicts reduced RNA QASeq quantitation variability based on the number of amplicons per gene, shown in replicates. As the number of amplicons per gene increases, the standard deviation of the relative expression level decreases.
[0013] Figure 6 depicts a comparison of relative RNA expression levels of genes obtained from RNA QASeq, nanoString, microarray, and standard RNA Seq, using a breast cancer formalin-fixed, paraffin-embedded (FFPE) RNA sample. The expression level values observed for RNA QASeq are similar to those observed using nanoString and standard RNA Seq. However, the expression level values observed using microarray show poor correlation with the expression level values observed using RNA QASeq, nanoString, and standard RNA Seq.
[0014] Figure 7 comprises panels A, B, C, D, E, F, G, H, I and J. Figure 7 depicts a comparison of RNA expression levels obtained from RNA QASeq and nanoString, using various samples as input: Breast Cancer (BC) 1 replicate 1 (panel A), BC 1 replicate 2 (panel B), BC 1 with 10 ng of input (panel C), BC 2 replicate 1 (panel D), BC 2 replicate 2 (panel E), BC 2 with 10 ng of input (panel F), BC 3 replicate 1 (panel G), BC 3 replicate 2 (panel H), Lung Cancer (LC) replicate 1 (panel I), and LC replicate 2 (panel J). In panels A, B, D, E, G, H, I, and J, 150 ng of input RNA is used for nanoString; in panels C and F, 10 ng of input RNA is used for nanoString; in all panels, 20 ng of input RNA is used for RNA QASeq. In general, the expression levels correlate well between RNA QASeq and nanoString.
[0015] Figure 8 comprises panels A, B, C, D, E, F, G, H, I and J. Figure 8 depicts a comparison of RNA expression levels obtained from RNA QASeq and microarray human transcriptome array (HTA), using various samples as input: BC 1 replicate 1 (panel A), BC 1 replicate 2 (panel B), BC 1 with 5 ng of input (panel C), BC 2 replicate 1 (panel D), BC 2 replicate 2 (panel E), BC 2 with 1 ng of input (panel F), BC 3 replicate 1 (panel G), BC 3 replicate 2 (panel H), LC replicate 1 (panel I), and LC replicate 2 (panel J).
Although the expression levels correlate reasonably well between RNA QASeq and microarray methodologies, the correlation is not as strong as that seen between RNA QASeq and nanoString (see Figure 7), between RNA QASeq and standard RNA Seq (see Figure 9) or between RNA QASeq and RT-qPCR (see Figure 10).
[0016] Figure 9 comprises panels A, B, C, D, E, F, G, and H. Figure 9 depicts a comparison of RNA expression levels obtained from RNA QASeq and standard RNA Seq, using various samples as input: BC 1 (panel A), BC 2 (panel B), BC 3 (panel C), commercial total liver RNA (panel D), total RNA from human blood sample (panel E), placenta FFPE 1 (panel F), placenta FFPE 2 (panel G), and placenta FFPE 3 (panel H). About 20 million reads are assigned for standard RNA Seq, with ribosomal depletion. Low expression level genes may be dropped out in standard RNA Seq (for example, dashed boxes in panels A and C), especially in FFPE samples. In general, the expression levels correlate well between RNA QASeq and standard RNA Seq.
[0001] Figure 10 depicts a comparison of RNA expression levels of three target genes (BAG1, MMP11, and BIRC5) obtained from RNA QASeq and RT-qPCR, using 10 ng of total human liver RNA as input for each RNA QASeq library preparation or each RT-qPCR well. The expression levels of the three target genes are normalized to five reference genes (TFRC, GUSB, RPLP0, ACTB, and GAPDH). RNA QASeq and RT-qPCR experiments are both performed in triplicate, and the mean RNA expression levels are plotted, with a linear regression R² = 0.995.
[0017] Figure 11 depicts RNA expression level quantitation for the same FFPE RNA sample obtained from nanoString, at different input amounts.
[0018] Figure 12 comprises panels A, B, C, D, and E. Figure 12 depicts a comparison of RNA expression levels obtained from nanoString and microarray HTA, using various samples as input: BC 1 (panel A), BC 2 (panel B), BC 3 (panel C), LC (panel D), and Placenta (panel E).
[0019] Figure 13 depicts relative RNA expression levels measured by RNA QASeq in four clinical FFPE samples and three healthy placenta FFPE samples. Hierarchical clustering indicated the expression patterns are the most similar among healthy placenta samples.
[0020] Figure 14 comprises panels A and B. Figure 14 depicts consistency of UMI count (panel A) and total reads (panel B) between the open-tube RNA QASeq workflow and the one-pot RNA QASeq workflow. The open-tube and one-pot methods provide similar results for each metric.
DETAILED DESCRIPTION
[0021] Unless defined otherwise, all technical and scientific terms used have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Where a term is provided in the singular, the inventors also contemplate aspects of the disclosure described by the plural of that term. Where there are discrepancies in terms and definitions used in references that are incorporated by reference, the terms used in this application shall have the definitions given herein. Other technical terms used have their ordinary meaning in the art in which they are used, as exemplified by various art-specific dictionaries, for example, “The American Heritage® Science Dictionary” (Editors of the American Heritage Dictionaries, 2011, Houghton Mifflin Harcourt, Boston and New York), the “McGraw-Hill Dictionary of Scientific and Technical Terms” (6th edition, 2002, McGraw-Hill, New York), or the “Oxford Dictionary of Biology” (6th edition, 2008, Oxford University Press, Oxford and New York).
[0022] Any references cited herein, including, e.g., all patents, published patent applications, and non-patent publications, are incorporated herein by reference in their entirety.
[0023] Any composition provided herein is specifically envisioned for use with any applicable method provided herein.
[0024] When a grouping of alternatives is presented, any and all combinations of the members that make up that grouping of alternatives are specifically envisioned. For example, if an item is selected from a group consisting of A, B, C, and D, the inventors specifically envision each alternative individually (e.g., A alone, B alone, etc.), as well as combinations such as A, B, and D; A and C; B and C; etc.
[0025] The term “and/or” when used in a list of two or more items means any one of the listed items by itself or in combination with any one or more of the other listed items. For example, the expression “A and/or B” is intended to mean either or both of A and B - i.e., A alone, B alone, or A and B in combination. The expression “A, B and/or C” is intended to mean A alone, B alone, C alone, A and B in combination, A and C in combination, B and C in combination, or A, B, and C in combination.
[0026] When a range of numbers is provided herein, the range is understood to be inclusive of the edges of the range as well as any number between the defined edges of the range. For example, “between 1 and 10” includes any number between 1 and 10, as well as the number 1 and the number 10.
[0027] As used herein, the singular form “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a compound” or “at least one compound” may include a plurality of compounds, including mixtures thereof. As used herein, the term “plurality” refers to any number greater than one.
[0028] In an aspect, this disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) a second set of primers; (iii) a first DNA polymerase; and (iv) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension; (d) removing non-extended UMI Primers to generate a product; (e) preparing a sequencing library using the product; (f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and (g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
[0029] In an aspect, this disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) optionally, a second set of primers; (iii) a first DNA polymerase; and (iv) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension; (d) removing non-extended UMI Primers to generate a product; (e) preparing a sequencing library using the product; (f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and (g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
[0030] In an aspect, this disclosure provides a method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising: (a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules; (b) contacting the cDNA molecules with: (i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence; (ii) a first DNA polymerase; and (iii) reagents and buffers needed for DNA polymerase extension to generate a mixture; (c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a first product; (d) contacting the first product with a second set of primers; (e) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a second product; (f) removing non-extended UMI Primers from the second product to generate a third product; (g) preparing a sequencing library using the third product; (h) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and (i) calculating a molecular count for each Target RNA Region based on the UMI from the sequencing reads.
[0031] As used herein, “DNA” refers to deoxyribonucleic acid. DNA can be either single-stranded or double-stranded. DNA typically comprises four nucleotides: cytosine (C), guanine (G), adenine (A), and thymine (T). In an aspect, the sequence of a DNA molecule provided herein comprises one or more degenerate nucleotides. As used herein, a “degenerate nucleotide” refers to a nucleotide that can perform the same function or yield the same output as a structurally different nucleotide. Non-limiting examples of degenerate nucleotides include a C, G, or T nucleotide (B); an A, G, or T nucleotide (D); an A, C, or T nucleotide (H); a G or T nucleotide (K); an A or C nucleotide (M); any nucleotide (N); an A or G nucleotide (R); a G or C nucleotide (S); an A, C, or G nucleotide (V); an A or T nucleotide (W), and a C or T nucleotide (Y).
[0032] As used herein, “RNA” refers to ribonucleic acid. RNA can be either single-stranded or double-stranded. RNA typically comprises four nucleotides: cytosine (C), guanine (G), adenine (A), and uracil (U). In an aspect, the sequence of an RNA molecule provided herein comprises one or more degenerate nucleotides. As used herein, a “degenerate nucleotide” refers to a nucleotide that can perform the same function or yield the same output as a structurally different nucleotide. Non-limiting examples of degenerate nucleotides include a C, G, or U nucleotide (B); an A, G, or U nucleotide (D); an A, C, or U nucleotide (H); a G or U nucleotide (K); an A or C nucleotide (M); any nucleotide (N); an A or G nucleotide (R); a G or C nucleotide (S); an A, C, or G nucleotide (V); an A or U nucleotide (W), and a C or U nucleotide (Y).
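The degenerate nucleotide codes listed for DNA and RNA can be represented as a lookup table. The following Python sketch, offered only as an illustration, counts and enumerates the concrete sequences encoded by a degenerate string, which is one way to gauge how many distinct UMIs a degenerate UMI design can produce. The example strings and the function names `diversity` and `expand` are hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator

# Degenerate codes for DNA, as listed above.
DNA_CODES: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT", "M": "AC",
    "N": "ACGT", "R": "AG", "S": "CG", "V": "ACG", "W": "AT", "Y": "CT",
}
# RNA uses the same codes with U in place of T.
RNA_CODES: Dict[str, str] = {
    code.replace("T", "U"): bases.replace("T", "U") for code, bases in DNA_CODES.items()
}


def diversity(seq: str, rna: bool = False) -> int:
    """Number of concrete sequences encoded by a degenerate sequence."""
    codes = RNA_CODES if rna else DNA_CODES
    return prod(len(codes[c]) for c in seq.upper())


def expand(seq: str, rna: bool = False) -> Iterator[str]:
    """Yield every concrete sequence encoded by a degenerate sequence."""
    codes = RNA_CODES if rna else DNA_CODES
    for combo in product(*(codes[c] for c in seq.upper())):
        yield "".join(combo)


if __name__ == "__main__":
    print(diversity("NNWSNN"))           # 1024
    print(list(expand("RW")))            # ['AA', 'AT', 'GA', 'GT']
    print(list(expand("RW", rna=True)))  # ['AA', 'AU', 'GA', 'GU']
```

Restricting some positions to two-base codes such as W or S lowers diversity (here 4 × 4 × 2 × 2 × 4 × 4 = 1024) relative to an all-N design of the same length.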
[0033] In aspects of this disclosure, one nucleic acid molecule (e.g., a primer) can be complementary to a second nucleic acid molecule (e.g., a Target RNA Region subsequence). It is understood in the art that the sequence of a nucleic acid molecule need not be 100% complementary to that of its target nucleic acid molecule to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). For example, an antisense nucleic acid molecule in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST® programs (basic local alignment search tools) and PowerBLAST programs known in the art (see Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
[0034] For optimal alignment of sequences to calculate their percent identity, various pair-wise or multiple sequence alignment algorithms and programs are known in the art, such as ClustalW or Basic Local Alignment Search Tool® (BLAST®), that can be used to compare the sequence identity or similarity between two or more nucleotide or amino acid sequences. Although other alignment and comparison methods are known in the art, the alignment and percent identity between two sequences (including the percent identity ranges described above) can be determined by the ClustalW algorithm; see, e.g., Chenna et al., “Multiple sequence alignment with the Clustal series of programs,” Nucleic Acids Research 31: 3497-3500 (2003); Thompson et al., “Clustal W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research 22: 4673-4680 (1994); Larkin M.A. et al., “Clustal W and Clustal X version 2.0,” Bioinformatics 23: 2947-48 (2007); and Altschul et al., “Basic local alignment search tool,” J. Mol. Biol. 215: 403-410 (1990), the entire contents and disclosures of which are incorporated herein by reference.
[0035] In an aspect, a gene-specific sequence is 100% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 99.5% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 99% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 98% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 97% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 96% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 95% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 94% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 93% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 92% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 91% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 90% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 85% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 80% complementary to a Target RNA Region subsequence.
[0036] The term “percent complementary” as used herein in reference to two nucleotide sequences is similar to the concept of percent identity but refers to the percentage of nucleotides of a query sequence that optimally base-pair or hybridize to nucleotides of a subject sequence when the query and subject sequences are linearly arranged and optimally base paired without secondary folding structures, such as loops, stems or hairpins. Such a percent complementarity can be between two DNA strands, two RNA strands, or a DNA strand and an RNA strand. The “percent complementarity” can be calculated by (i) optimally base-pairing or hybridizing the two nucleotide sequences in a linear and fully extended arrangement (i.e., without folding or secondary structures) over a window of comparison, (ii) determining the number of positions that base-pair between the two sequences over the window of comparison to yield the number of complementary positions, (iii) dividing the number of complementary positions by the total number of positions in the window of comparison, and (iv) multiplying this quotient by 100% to yield the percent complementarity of the two sequences. Optimal base pairing of two sequences can be determined based on the known pairings of nucleotide bases, such as G-C, A-T, and A-U, through hydrogen bonding. If the “percent complementarity” is being calculated in relation to a reference sequence without specifying a particular comparison window, then the percent complementarity is determined by dividing the number of complementary positions between the two linear sequences by the total length of the reference sequence.
Thus, for purposes of the present application, when two sequences (query and subject) are optimally base-paired (with allowance for mismatches or non-base-paired nucleotides), the “percent complementarity” for the query sequence is equal to the number of base-paired positions between the two sequences divided by the total number of positions in the query sequence over its length, which is then multiplied by 100%.
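The calculation in [0036] can be illustrated with a short Python sketch. It assumes the two sequences are already held in a single ungapped, antiparallel register (finding the optimal register, or a gapped alignment of the kind BLAST or ClustalW would produce, is not shown), counts only Watson-Crick pairs, and divides by the full query length as stated above. The function name and its `offset` parameter are hypothetical. Applied to a hypothetical 20-nt query and a subject carrying two mismatches, it reproduces the 18-of-20, 90 percent example of [0033].

```python
# Watson-Crick pairs for DNA and RNA. G-U wobble pairs are not counted;
# that is an assumption of this sketch, not part of the definition.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(query: str, subject: str, offset: int = 0) -> float:
    """Percent of query positions that base-pair with the subject.

    Both sequences are written 5'->3'. The subject is reversed so the strands
    run antiparallel and are then held in one fixed, ungapped register:
    query[i] faces reversed_subject[i + offset]. `offset` is a hypothetical
    parameter; searching for the optimal register is not shown.
    """
    if not query:
        raise ValueError("query must be non-empty")
    q = query.upper()
    s_rev = subject.upper()[::-1]
    paired = 0
    for i, base in enumerate(q):
        j = i + offset
        # Query positions that overhang the subject count as unpaired.
        if 0 <= j < len(s_rev) and (base, s_rev[j]) in PAIRS:
            paired += 1
    # Denominator is the total number of positions in the query sequence.
    return 100.0 * paired / len(q)


# Hypothetical sequences (5'->3'): the subject is the reverse complement of
# the query with two substituted bases.
query = "ACGTACGTACGTACGTACGT"
subject = "CCGTAAGTACGTACGTACGT"
print(percent_complementarity(query, subject))  # 90.0
```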
[0037] In an aspect, this disclosure provides unique molecular identifiers (UMIs). In an aspect, a DNA molecule comprises a UMI. In an aspect, an RNA molecule comprises a UMI. In an aspect, a primer comprises a UMI. As used herein, a “unique molecular identifier” refers to a unique nucleotide sequence that serves as a molecular barcode for an individual molecule. UMIs are often attached to DNA molecules in a sample library to uniquely tag each molecule. UMIs enable error correction and increased accuracy during sequencing of DNA molecules.
[0038] In an aspect, a UMI sequence comprises between 7 nucleotides and 30 nucleotides. In an aspect, a UMI sequence comprises between 5 nucleotides and 40 nucleotides. In an aspect, a UMI sequence comprises between 10 nucleotides and 20 nucleotides. In an aspect, a UMI sequence comprises at least 5 nucleotides. In an aspect, a UMI sequence comprises at least 7 nucleotides. In an aspect, a UMI sequence comprises at least 10 nucleotides. In an aspect, a UMI sequence comprises at least 15 nucleotides. In an aspect, a UMI sequence comprises fewer than 50 nucleotides. In an aspect, a UMI sequence comprises fewer than 40 nucleotides. In an aspect, a UMI sequence comprises fewer than 30 nucleotides. In an aspect, a UMI sequence comprises fewer than 20 nucleotides.
[0039] In an aspect, a UMI sequence comprises between 7 degenerate nucleotides and 30 degenerate nucleotides. In an aspect, a UMI sequence comprises between 5 degenerate nucleotides and 40 degenerate nucleotides. In an aspect, a UMI sequence comprises between 10 degenerate nucleotides and 20 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 5 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 7 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 10 degenerate nucleotides. In an aspect, a UMI sequence comprises at least 15 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 50 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 40 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 30 degenerate nucleotides. In an aspect, a UMI sequence comprises fewer than 20 degenerate nucleotides.
[0040] In an aspect, each degenerate nucleotide in a UMI sequence is individually selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
[0041] In an aspect, a UMI sequence comprises between 7 degenerate nucleotides and 30 degenerate nucleotides, where each degenerate nucleotide is selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
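To illustrate how the degenerate codes of [0040] determine how many distinct UMIs a design can encode, the following Python sketch maps each IUPAC code to the bases it permits, multiplies the per-position choices, and draws one concrete UMI at random. The pattern `NNNNWNNNNS` and the function names are hypothetical examples, not designs taken from this disclosure, and uniform sampling at each position is an idealization of oligonucleotide synthesis.

```python
import random
from math import prod

# Standard IUPAC nucleotide codes (including those listed in [0040]),
# mapped to the bases each code allows at a position.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "S": "CG", "W": "AT", "Y": "CT", "R": "AG", "M": "AC", "K": "GT",
}


def umi_diversity(pattern: str) -> int:
    """Number of distinct concrete UMIs a degenerate pattern can encode."""
    return prod(len(IUPAC[code]) for code in pattern.upper())


def random_umi(pattern: str, rng: random.Random) -> str:
    """Draw one concrete UMI, choosing uniformly among allowed bases per position."""
    return "".join(rng.choice(IUPAC[code]) for code in pattern.upper())


pattern = "NNNNWNNNNS"          # hypothetical 10-position design
print(umi_diversity("N" * 10))  # 1048576
print(umi_diversity(pattern))   # 262144
print(random_umi(pattern, random.Random(0)))
```

A larger diversity lowers the chance that two different template molecules on the same amplicon receive the same UMI by chance, which is one consideration when choosing the number and type of degenerate positions.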
[0042] In an aspect, a method comprises removal of sequencing reads where the UMI sequences of the sequencing reads do not comprise a predefined UMI degenerate base design pattern. As used herein, a “predefined UMI degenerate base design pattern” refers to a UMI sequence comprising the expected number of degenerate bases and the expected type of degenerate bases for a given method. Non-limiting examples of inappropriate degenerate base designs include UMI sequences comprising too many or too few degenerate bases. In an aspect, a method comprises removal of at least one sequencing read where the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least two sequencing reads where the UMI sequences of the at least two sequencing reads do not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least three sequencing reads where the UMI sequences of the at least three sequencing reads do not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least four sequencing reads where the UMI sequences of the at least four sequencing reads do not comprise a predefined UMI degenerate base design pattern. In an aspect, a method comprises removal of at least five sequencing reads where the UMI sequences of the at least five sequencing reads do not comprise a predefined UMI degenerate base design pattern.
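A minimal sketch of the read-removal step in [0042], assuming each read's UMI has already been extracted as a string. A read is kept only if its UMI has the same length as the predefined pattern and every observed base is one the corresponding IUPAC code allows; reads with an `N` base call at any position are therefore removed, which is a design choice of this sketch. Function names, the pattern, and the reads are hypothetical.

```python
from typing import List, Tuple

# Standard IUPAC codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "S": "CG", "W": "AT", "Y": "CT", "R": "AG", "M": "AC", "K": "GT",
}


def matches_pattern(umi: str, pattern: str) -> bool:
    """True if the observed UMI has the pattern's length and each base is allowed."""
    if len(umi) != len(pattern):
        return False  # too many or too few positions
    # An observed 'N' base call is never in an allowed set, so such reads fail.
    return all(base in IUPAC[code]
               for base, code in zip(umi.upper(), pattern.upper()))


def filter_reads(reads: List[Tuple[str, str]],
                 pattern: str) -> Tuple[List[Tuple[str, str]], int]:
    """Keep (read_id, umi) pairs whose UMI conforms; also return the removed count."""
    kept = [(rid, umi) for rid, umi in reads if matches_pattern(umi, pattern)]
    return kept, len(reads) - len(kept)


pattern = "NNNNWNNNNS"  # hypothetical design
reads = [
    ("r1", "ACGTAGGTCC"),  # W position is A, S position is C: kept
    ("r2", "ACGTCGGTCC"),  # W position is C: removed
    ("r3", "ACGTAGGTC"),   # 9 nt, one position short: removed
    ("r4", "ACGTNGGTCG"),  # N call at the W position: removed
]
kept, removed = filter_reads(reads, pattern)
print(kept, removed)  # [('r1', 'ACGTAGGTCC')] 3
```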
[0043] As used herein, a “Target RNA Region” refers to an RNA region of interest. In an aspect, a Target RNA Region comprises a gene sequence. In an aspect, a Target RNA Region comprises an exon sequence. In an aspect, a Target RNA Region comprises an intron sequence. In an aspect, a Target RNA Region comprises a 5' untranslated region (UTR) sequence. In an aspect, a Target RNA Region comprises a 3' UTR sequence.
[0044] In an aspect, a Target RNA Region comprises at least 5 nucleotides. In an aspect, a Target RNA Region comprises at least 25 nucleotides. In an aspect, a Target RNA Region comprises at least 50 nucleotides. In an aspect, a Target RNA Region comprises at least 100 nucleotides. In an aspect, a Target RNA Region comprises at least 500 nucleotides. In an aspect, a Target RNA Region comprises at least 1000 nucleotides. In an aspect, a Target RNA Region comprises at least 5000 nucleotides.
[0045] In an aspect, a Target RNA Region comprises between 5 nucleotides and 10,000 nucleotides. In an aspect, a Target RNA Region comprises between 5 nucleotides and 5,000 nucleotides. In an aspect, a Target RNA Region comprises between 5 nucleotides and 1,000 nucleotides. In an aspect, a Target RNA Region comprises between 5 nucleotides and 500 nucleotides. In an aspect, a Target RNA Region comprises between 5 nucleotides and 100 nucleotides. In an aspect, a Target RNA Region comprises between 25 nucleotides and 500 nucleotides. In an aspect, a Target RNA Region comprises between 25 nucleotides and 100 nucleotides. In an aspect, a Target RNA Region comprises between 50 nucleotides and 500 nucleotides. In an aspect, a Target RNA Region comprises between 50 nucleotides and 100 nucleotides. In an aspect, a Target RNA Region comprises between 100 nucleotides and 500 nucleotides.
[0046] In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 10,000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 100,000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 1000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 500 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 100 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 1 Target RNA Region and 10 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 5 Target RNA Regions and 10 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 5 Target RNA Regions and 50 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 5 Target RNA Regions and 100 Target RNA Regions. In an aspect, an RNA sample provided herein comprises between 10 Target RNA Regions and 50 Target RNA Regions.
[0047] In an aspect, an RNA sample provided herein comprises at least 1 Target RNA Region. In an aspect, an RNA sample provided herein comprises at least 2 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 5 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 10 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 25 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 50 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 100 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 1000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 10,000 Target RNA Regions. In an aspect, an RNA sample provided herein comprises at least 100,000 Target RNA Regions.
[0048] In an aspect, a method comprises calculating a Target RNA transcript expression level based on a mean molecule count of the Target RNA Regions within the same transcript. As used herein, “Target RNA transcript” refers to an RNA transcript comprising one or more Target RNA Regions. As used herein, “mean molecule count” refers to the average number of times a copy of a given molecule is present in a sample, solution, product, or mixture.
[0049] In an aspect, 1 Target RNA Region is amplified from 1 Target RNA transcript. In an aspect, 2 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 3 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 4 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 5 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 6 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 7 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 8 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 9 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 10 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 15 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 20 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 25 Target RNA Regions are amplified from 1 Target RNA transcript. In an aspect, between 1 and 30 Target RNA Regions are amplified from 1 Target RNA transcript.
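The transcript-level calculation in [0048] can be sketched as a per-transcript average of region-level molecule counts. The sketch assumes the molecule count for each Target RNA Region has already been obtained (for example, as the number of UMI Families observed for that region's amplicon) and that a mapping from region to transcript is available; the function name, region and transcript names, and counts are hypothetical, and no normalization across samples is applied.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def transcript_expression(region_counts: Dict[str, int],
                          region_to_transcript: Dict[str, str]) -> Dict[str, float]:
    """Mean molecule count over the Target RNA Regions of each transcript."""
    per_transcript: Dict[str, List[int]] = defaultdict(list)
    for region, count in region_counts.items():
        per_transcript[region_to_transcript[region]].append(count)
    return {tx: float(mean(counts)) for tx, counts in per_transcript.items()}


# Hypothetical per-region molecule counts (e.g., UMI Families per amplicon).
region_counts = {"TX1_r1": 120, "TX1_r2": 100, "TX1_r3": 110, "TX2_r1": 40}
region_to_transcript = {"TX1_r1": "TX1", "TX1_r2": "TX1", "TX1_r3": "TX1",
                        "TX2_r1": "TX2"}
print(transcript_expression(region_counts, region_to_transcript))
# {'TX1': 110.0, 'TX2': 40.0}
```

Averaging over several regions of one transcript dampens the effect of any single region that amplifies unusually well or poorly; a median would be a more outlier-resistant alternative, but the text above specifies the mean.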
[0050] In an aspect, at least 1 Target RNA Region is positioned within a single Target RNA transcript. In an aspect, at least 2 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 3 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 4 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 5 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 6 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 7 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 8 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 9 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 10 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 11 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 12 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 13 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 14 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 15 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 20 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 25 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 30 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 35 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 40 Target RNA Regions are positioned within a single Target RNA transcript. 
In an aspect, at least 45 Target RNA Regions are positioned within a single Target RNA transcript. In an aspect, at least 50 Target RNA Regions are positioned within a single Target RNA transcript.
[0051] In an aspect, RNA is present in, or obtained from, a sample (e.g., an “RNA sample”). As used herein, a “sample” refers to any biological material that is capable of being analyzed by or subjected to the methods and/or compositions provided herein. Any suitable method known in the art can be used to obtain a nucleic acid (e.g., an RNA molecule) from a sample. In an aspect, a sample comprises RNA. In an aspect, a sample comprises RNA and DNA.
[0052] In an aspect, a sample is obtained from a subject. As used herein, a “subject” refers to an animal (e.g., without being limiting, a mammal, reptile, bird, fish, or amphibian) or other organism, such as, without being limiting, a plant or fungus. A subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. The terms “individual” and “subject” are intended to be interchangeable.
[0053] In an aspect, a subject is a eukaryote. In an aspect, a subject is a prokaryote. In an aspect, a subject is a virus. In an aspect, a subject is an animal. In an aspect, a subject is a plant. In an aspect, a subject is a fungus. In an aspect, a subject is a mammal. In an aspect, a subject is a rodent. In an aspect, a subject is a mouse. In an aspect, a subject is a rat. In an aspect, a subject is a rabbit. In an aspect, a subject is a cat. In an aspect, a subject is a dog. In an aspect, a subject is a horse. In an aspect, a subject is a cow. In an aspect, a subject is a pig. In an aspect, a subject is a primate. In an aspect, a subject is a monkey. In an aspect, a subject is a chimpanzee. In an aspect, a subject is a human. In an aspect, a subject is a bird. In an aspect, a subject is a chicken. In an aspect, a subject is a fish. In an aspect, a subject is a reptile. In an aspect, a subject is an amphibian. In an aspect, a subject is an insect. In an aspect, a subject is an arachnid. In an aspect, a subject is a crustacean. In an aspect, a subject is a mollusk. In an aspect, a subject is a nematode. In an aspect, a subject is an annelid.
[0054] In an aspect, a subject has, or is suspected of having, cancer. In an aspect, a subject has, or is suspected of having, colorectal cancer. In an aspect, a subject has, or is suspected of having, gastric cancer. In an aspect, a subject has, or is suspected of having, endometrial cancer. In an aspect, a subject has, or is suspected of having, a genetic-based disease, disorder, or condition.
[0055] RNA can originate from and/or be isolated from any type of cancer for use with the methods and compositions provided herein. Samples can be obtained from any type of cancer. Non-limiting examples of cancers include biliary tract cancer, bladder cancer, transitional cell carcinoma, urothelial carcinoma, brain cancer, gliomas, astrocytomas, breast carcinoma, metaplastic carcinoma, cervical cancer, cervical squamous cell carcinoma, rectal cancer, colorectal carcinoma, colon cancer, hereditary nonpolyposis colorectal cancer, colorectal adenocarcinomas, gastrointestinal stromal tumors (GISTs), endometrial carcinoma, endometrial stromal sarcomas, esophageal cancer, esophageal squamous cell carcinoma, esophageal adenocarcinoma, ocular melanoma, uveal melanoma, gallbladder carcinomas, gallbladder adenocarcinoma, renal cell carcinoma, clear cell renal cell carcinoma, transitional cell carcinoma, urothelial carcinomas, Wilms tumor, leukemia, acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), chronic myelomonocytic leukemia (CMML), liver cancer, liver carcinoma, hepatoma, hepatocellular carcinoma, cholangiocarcinoma, hepatoblastoma, lung cancer, non-small cell lung cancer (NSCLC), mesothelioma, B-cell lymphomas, non-Hodgkin lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, T-cell lymphomas, non-Hodgkin lymphoma, precursor T-lymphoblastic lymphoma/leukemia, peripheral T-cell lymphomas, multiple myeloma, nasopharyngeal carcinoma (NPC), neuroblastoma, oropharyngeal cancer, oral cavity squamous cell carcinomas, osteosarcoma, ovarian carcinoma, pancreatic cancer, pancreatic ductal adenocarcinoma, pseudopapillary neoplasms, acinar cell carcinomas, prostate cancer, prostate adenocarcinoma, skin cancer, melanoma, malignant melanoma, cutaneous melanoma, small intestine carcinomas, stomach cancer, gastric carcinoma, gastrointestinal stromal tumor (GIST), uterine cancer, and uterine sarcoma.
[0056] In an aspect, a sample comprises a cell. In an aspect, a sample comprises a tissue. In an aspect, a sample comprises an organ. In an aspect, a sample comprises blood. In an aspect, a sample comprises plasma. In an aspect, a sample comprises urine. In an aspect, a sample comprises feces. Additional non-limiting examples of samples include serum, sputum, semen, vaginal fluid, synovial fluid, spinal fluid, and saliva.
[0057] In an aspect, an RNA sample provided herein is obtained from a source selected from the group consisting of formalin-fixed paraffin-embedded tissue, whole blood, plasma, and fresh tissue. In an aspect, an RNA sample provided herein is obtained from formalin-fixed paraffin-embedded tissue. In an aspect, an RNA sample provided herein is obtained from whole blood. In an aspect, an RNA sample provided herein is obtained from plasma. In an aspect, an RNA sample provided herein is obtained from fresh tissue. In an aspect, an RNA sample provided herein is a human RNA sample. In an aspect, an RNA sample provided herein is an animal RNA sample.
[0058] In an aspect, an RNA sample provided herein comprises less than or equal to 100 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 75 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 50 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 25 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 20 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 15 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 10 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 5 ng of RNA. In an aspect, an RNA sample provided herein comprises less than or equal to 1 ng of RNA.
[0059] In an aspect, a sample provided herein comprises less than or equal to 100 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 75 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 50 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 25 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 20 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 15 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 10 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 5 ng of RNA. In an aspect, a sample provided herein comprises less than or equal to 1 ng of RNA.
[0060] As used herein, a “UMI Family” refers to a group of sequencing reads that comprise identical UMI sequences and align to the same amplicon. In an aspect, a UMI Family comprises at least 1 sequencing read. In an aspect, a UMI Family comprises at least 2 sequencing reads. In an aspect, a UMI Family comprises at least 5 sequencing reads. In an aspect, a UMI Family comprises at least 10 sequencing reads. In an aspect, a UMI Family comprises at least 50 sequencing reads. In an aspect, a UMI Family comprises at least 100 sequencing reads. In an aspect, a UMI Family comprises at least 500 sequencing reads. In an aspect, a UMI Family comprises at least 1000 sequencing reads. In an aspect, a UMI Family comprises at least 2500 sequencing reads. In an aspect, a UMI Family comprises at least 5000 sequencing reads. In an aspect, a UMI Family comprises at least 10,000 sequencing reads.
[0061] In an aspect, a UMI Family comprises between 1 sequencing read and 10,000 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 5,000 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 1000 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 500 sequencing reads. In an aspect, a UMI Family comprises between 1 sequencing read and 100 sequencing reads.
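A minimal Python sketch of forming UMI Families as defined in [0060]: reads are keyed by the pair (amplicon, UMI), so an identical UMI observed on two different amplicons forms two separate families, and the number of families per amplicon serves as a molecule count. The `min_reads` threshold is a hypothetical parameter reflecting the minimum family sizes listed above. The sketch assumes reads have already been aligned to amplicons and that UMIs must match exactly; error-tolerant clustering of UMIs that differ only by sequencing errors is not shown. Function names and reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def umi_families(reads: Iterable[Tuple[str, str]]) -> Counter:
    """Count reads per UMI Family, keyed by (amplicon, UMI)."""
    return Counter((amplicon, umi) for amplicon, umi in reads)


def molecule_counts(families: Counter, min_reads: int = 1) -> Dict[str, int]:
    """Number of UMI Families per amplicon that have at least min_reads reads."""
    counts: Dict[str, int] = {}
    for (amplicon, _umi), size in families.items():
        if size >= min_reads:
            counts[amplicon] = counts.get(amplicon, 0) + 1
    return counts


# Hypothetical (amplicon, UMI) pairs from aligned reads.
reads = [
    ("AMP1", "ACGTACGTAC"), ("AMP1", "ACGTACGTAC"), ("AMP1", "ACGTACGTAC"),
    ("AMP1", "TTGCAAGGCA"),
    ("AMP2", "ACGTACGTAC"), ("AMP2", "GGGCCCAAAT"), ("AMP2", "GGGCCCAAAT"),
]
families = umi_families(reads)
print(molecule_counts(families))               # {'AMP1': 2, 'AMP2': 2}
print(molecule_counts(families, min_reads=2))  # {'AMP1': 1, 'AMP2': 1}
```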
[0062] As used herein, an “amplicon” refers to a copy of DNA made via PCR.
[0063] In an aspect, this disclosure provides UMI Primers. As used herein, a “UMI Primer” is an oligonucleotide molecule comprising a UMI sequence and a gene-specific sequence that is complementary to a Target RNA Region subsequence. In an aspect, a UMI Primer is capable of generating an amplicon of a Target RNA Region to which it hybridizes. In an aspect, a gene-specific sequence is 100% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 99% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 98% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 97% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 96% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 95% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 90% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 85% complementary to a Target RNA Region subsequence. In an aspect, a gene-specific sequence is at least 80% complementary to a Target RNA Region subsequence.
[0064] As used herein, a “Target RNA Region subsequence” refers to any portion of a Target RNA Region that is at least one nucleotide shorter in length than a full-length Target RNA Region. In an aspect, a UMI Primer binds to a Target RNA Region subsequence.
[0065] In an aspect, a Target RNA Region subsequence comprises at least 1 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 2 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 3 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 4 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 5 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 10 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 25 fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 50 fewer nucleotides as compared to a full-length Target RNA Region.
[0066] In an aspect, a Target RNA Region subsequence comprises at least 5% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 10% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 25% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 50% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 75% fewer nucleotides as compared to a full-length Target RNA Region. In an aspect, a Target RNA Region subsequence comprises at least 90% fewer nucleotides as compared to a full-length Target RNA Region.
[0067] In an aspect, a Target RNA Region subsequence comprises at least 5 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 15 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 25 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 35 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 50 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 75 nucleotides. In an aspect, a Target RNA Region subsequence comprises at least 100 nucleotides.
[0068] In an aspect, a Target RNA Region subsequence comprises between 5 and 500 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 250 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 100 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 50 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 5 and 35 nucleotides. In an aspect, a Target RNA Region subsequence comprises between 15 and 35 nucleotides.
[0069] In an aspect, a UMI Primer comprises, in the order from 5' to 3', (a) a first universal region; (b) an optional second region comprising a length of between 1 nucleotide and 50 nucleotides; (c) a third region comprising a UMI sequence; and (d) a fourth region comprising a gene-specific sequence that is complementary to a Target RNA Region subsequence. As used herein, a “universal region” refers to a sequence that remains the same in UMI primers designed for different Target RNA Regions.
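The 5'-to-3' region order described in [0069] can be illustrated by concatenating hypothetical region sequences. The universal region, UMI pattern, and gene-specific sequences below are invented placeholders, the helper name is hypothetical, and the sketch does not check that the gene-specific sequence is actually complementary to any Target RNA Region subsequence. Reusing one universal region with two different gene-specific sequences mirrors the definition of a universal region above.

```python
from typing import Optional

# Characters allowed in this sketch: standard bases plus the IUPAC
# degenerate codes listed in [0040].
ALLOWED = set("ACGTNBDHVSWYRMK")


def build_umi_primer(universal: str, umi_pattern: str, gene_specific: str,
                     second_region: Optional[str] = None) -> str:
    """Concatenate UMI Primer regions 5'->3':
    (a) universal, (b) optional second region, (c) UMI, (d) gene-specific.
    """
    if second_region is not None and not 1 <= len(second_region) <= 50:
        raise ValueError("optional second region must be 1-50 nt long")
    primer = (universal + (second_region or "") + umi_pattern + gene_specific).upper()
    if not set(primer) <= ALLOWED:
        raise ValueError("primer contains a non-nucleotide character")
    return primer


# Hypothetical sequences for illustration only; not taken from the disclosure.
UNIVERSAL = "GACTGACTGACTGACT"
UMI_PATTERN = "NNNNWNNNNS"

primer = build_umi_primer(UNIVERSAL, UMI_PATTERN, "TGCATGCATGCATGCATG",
                          second_region="TT")
# A primer for a second (hypothetical) target reuses the same universal region.
primer2 = build_umi_primer(UNIVERSAL, UMI_PATTERN, "CCGGTTAACCGGTTAACC")
print(primer)  # GACTGACTGACTGACTTTNNNNWNNNNSTGCATGCATGCATGCATG
```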
[0070] In an aspect, a universal region comprises at least 1 nucleotide. In an aspect, a universal region comprises at least 2 nucleotides. In an aspect, a universal region comprises at least 3 nucleotides. In an aspect, a universal region comprises at least 4 nucleotides. In an aspect, a universal region comprises at least 5 nucleotides. In an aspect, a universal region comprises at least 6 nucleotides. In an aspect, a universal region comprises at least 7 nucleotides. In an aspect, a universal region comprises at least 8 nucleotides. In an aspect, a universal region comprises at least 9 nucleotides. In an aspect, a universal region comprises at least 10 nucleotides. In an aspect, a universal region comprises at least 15 nucleotides. In an aspect, a universal region comprises at least 20 nucleotides. In an aspect, a universal region comprises at least 25 nucleotides. In an aspect, a universal region comprises at least 30 nucleotides. In an aspect, a universal region comprises at least 35 nucleotides. In an aspect, a universal region comprises at least 40 nucleotides. In an aspect, a universal region comprises at least 45 nucleotides. In an aspect, a universal region comprises at least 50 nucleotides.
[0071] In an aspect, a universal region comprises between 1 nucleotide and 50 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 40 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 30 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 20 nucleotides. In an aspect, a universal region comprises between 1 nucleotide and 10 nucleotides.
[0072] As used herein, a “Universal Forward Primer” refers to a primer comprising a universal region. As used herein, a “Universal Reverse Primer” refers to a primer comprising a universal region.
[0073] In an aspect, non-extended UMI Primers are removed from a mixture via a method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion. In an aspect, non-extended UMI Primers are removed from a mixture via solid phase reversible immobilization purification. In an aspect, non-extended UMI Primers are removed from a mixture via column purification. In an aspect, non-extended UMI Primers are removed from a mixture via enzymatic digestion. In an aspect, non-extended UMI Primers are removed from a mixture using any suitable method known in the art.
[0074] In an aspect, a method comprises the use of a reverse transcription primer. As used herein, a “reverse transcription primer” refers to a primer used in a reverse transcription reaction, where RNA in an RNA sample is converted to complementary DNA (cDNA). As used herein, “complementary DNA” or “cDNA” refers to a DNA copy of a messenger RNA (mRNA) molecule produced by a reverse transcriptase. In an aspect, a reverse transcription primer comprises at least 1 degenerate nucleotide. In an aspect, a reverse transcription primer comprises at least 2 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 3 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 4 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 5 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 6 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 7 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 8 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 9 degenerate nucleotides. In an aspect, a reverse transcription primer comprises at least 10 degenerate nucleotides. In an aspect, a reverse transcription primer comprises between 1 and 5 degenerate nucleotides. In an aspect, a reverse transcription primer comprises between 1 and 10 degenerate nucleotides. In an aspect, a reverse transcription primer comprises between 1 and 15 degenerate nucleotides. In an aspect, a reverse transcription primer comprises a random hexamer. As is known in the art, a “hexamer” comprises six nucleotides. In an aspect, a reverse transcription primer comprises a polyT string. 
As used herein, a “polyT string” refers to at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, or at least 30 consecutive thymine nucleobases.
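As a non-limiting illustration, the degenerate-nucleotide and polyT terms above can be checked programmatically. The Python sketch below assumes the standard IUPAC nucleotide codes (the same letters listed for UMI design elsewhere in this disclosure); the function names `primer_diversity` and `longest_t_run` are hypothetical helpers, not part of any described pipeline.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_diversity(pattern: str) -> int:
    """Count the distinct concrete sequences encoded by a degenerate primer."""
    total = 1
    for code in pattern.upper():
        total *= len(IUPAC[code])  # a KeyError flags a non-IUPAC character
    return total


def longest_t_run(seq: str) -> int:
    """Length of the longest run of consecutive thymines (a polyT string)."""
    best = run = 0
    for base in seq.upper():
        run = run + 1 if base == "T" else 0
        best = max(best, run)
    return best


# A random hexamer (NNNNNN) encodes 4**6 = 4096 distinct sequences.
print(primer_diversity("NNNNNN"))
print(longest_t_run("ACTTTTTTTTG"))
```

Under this reading, a primer "comprises a polyT string of at least k thymines" when `longest_t_run(primer) >= k`.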
[0075] In an aspect, a method comprises the introduction of a set of Outer Primers and a set of Inner Primers, where between 3 nucleotides and 20 nucleotides positioned at the 3' end of the Inner Primer are not subsequences of the set of Outer Primers. As used herein, “Outer Primers” refers to primers that flank a set of “Inner Primers” on a Target RNA Region. For example, without being limiting, a first (e.g., forward) Outer Primer is positioned 5' to a first (e.g., forward) Inner Primer and a second (e.g., reverse) Outer Primer is positioned 3' to a second (e.g., reverse) Inner Primer. In an aspect, for a set of Outer Primers and a set of Inner Primers, between 3 nucleotides and 10 nucleotides positioned at the 3' end of the Inner Primer are not subsequences of the set of Outer Primers. In an aspect, for a set of Outer Primers and a set of Inner Primers, between 3 nucleotides and 30 nucleotides positioned at the 3' end of the Inner Primer are not subsequences of the set of Outer Primers.
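As a non-limiting illustration, the Inner/Outer Primer constraint can be screened as follows. This sketch assumes that primers are written 5' to 3' (so the 3' end is the last k characters), that “subsequence” means a contiguous stretch, and that only the written strand of each Outer Primer is checked; the function name and example sequences are hypothetical.

```python
from typing import Sequence


def inner_tail_is_unique(inner: str, outers: Sequence[str], k: int) -> bool:
    """True if the k 3'-terminal nucleotides of `inner` occur in no Outer Primer.

    Primers are assumed to be written 5'->3', so the 3' end is inner[-k:].
    """
    if not 3 <= k <= len(inner):
        raise ValueError("k must be at least 3 and no longer than the primer")
    tail = inner[-k:].upper()
    return all(tail not in outer.upper() for outer in outers)


outer_primers = ["TTTTACGTTGCA", "GGGCCCAAATTT"]
print(inner_tail_is_unique("ACGTTGCAAGGT", outer_primers, k=4))  # tail "AGGT"
print(inner_tail_is_unique("CCACGTTGCA", outer_primers, k=6))    # tail "GTTGCA"
```

The second call fails the check because the 6-nt tail of that Inner Primer also appears inside the first Outer Primer.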
[0076] In an aspect, this disclosure provides at least one DNA polymerase. As used herein, a “DNA polymerase” refers to an enzyme that is capable of catalyzing the synthesis of a DNA molecule from nucleoside triphosphates. DNA polymerases add nucleotides to the 3' end of a DNA strand one nucleotide at a time, creating a DNA strand that is antiparallel to the template DNA strand. DNA polymerases are unable to begin a new DNA molecule de novo; they require a primer to which they can add the first new nucleotide.
[0077] In an aspect, this disclosure provides reagents and buffers needed for DNA polymerase extension. As used herein, a “reagent” refers to any substance or compound added to a mixture to cause a chemical reaction or to test whether a chemical reaction occurs. In an aspect, a reagent comprises a component selected from the group consisting of magnesium, at least one dNTP, phosphatase, betaine, dimethyl sulfoxide (DMSO), and tetramethylammonium chloride (TMAC). Non-limiting examples of reagents and buffers needed for DNA polymerase extension include Tris-HCl, potassium chloride, magnesium chloride, oligonucleotide primers, deoxynucleoside triphosphates (dNTPs), betaine, and DMSO. Those of ordinary skill in the art recognize that different DNA polymerases and different Target RNA Regions can require different combinations of reagents and buffers.
[0078] DNA polymerases can extend primers at different temperatures, depending on the DNA polymerase. In an aspect, a DNA polymerase extends primers at a temperature of at least 40°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 50°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 55°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 60°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 65°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 70°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 75°C. In an aspect, a DNA polymerase extends primers at a temperature of at least 80°C.
[0079] Primers can bind, or anneal, to a complementary part of a Target RNA Region subsequence at a variety of temperatures, depending on the structure and length of the sequences involved. In an aspect, primer binding occurs at a temperature of at least 35°C. In an aspect, primer binding occurs at a temperature of at least 40°C. In an aspect, primer binding occurs at a temperature of at least 45°C. In an aspect, primer binding occurs at a temperature of at least 50°C. In an aspect, primer binding occurs at a temperature of at least 55°C. In an aspect, primer binding occurs at a temperature of at least 60°C. In an aspect, primer binding occurs at a temperature of at least 65°C. In an aspect, primer binding occurs at a temperature of at least 70°C.
[0080] In an aspect, DNA polymerase extension and primer binding occur at different temperatures. In an aspect, DNA polymerase extension and primer binding occur at the same temperature.
[0081] In an aspect, a DNA polymerase is a thermostable DNA polymerase. As used herein, a “thermostable DNA polymerase” refers to a DNA polymerase that can function at high temperatures (e.g., greater than 65°C) and can survive higher temperatures (e.g., up to about 100°C). Thermostable DNA polymerases often have maximal catalytic activity at temperatures between 70°C and 80°C. In an aspect, a thermostable DNA polymerase is selected from the group consisting of Taq DNA polymerase, Phusion® DNA polymerase, Q5® DNA polymerase, and KAPA High Fidelity DNA polymerase.
[0082] In an aspect, a DNA polymerase is a non-thermostable DNA polymerase. As used herein, a “non-thermostable DNA polymerase” refers to DNA polymerases that cannot function at high temperatures. In an aspect, a non-thermostable DNA polymerase is selected from the group consisting of phi29 DNA polymerase and Bst DNA polymerase.
[0083] In an aspect, a method comprises high-throughput sequencing. In an aspect, a method comprises subjecting a sequencing library to high-throughput sequencing. As used herein, “high-throughput sequencing” refers to any sequencing method that is capable of sequencing multiple (e.g., tens, hundreds, thousands, millions, or hundreds of millions of) DNA molecules in parallel. In an aspect, Sanger sequencing is not high-throughput sequencing. In an aspect, high-throughput sequencing comprises the use of a sequencing-by-synthesis (SBS) flow cell. In an aspect, an SBS flow cell is selected from the group consisting of an Illumina SBS flow cell and a Pacific Biosciences (PacBio) SBS flow cell. In an aspect, high-throughput sequencing is performed via electrical current measurements using an Oxford Nanopore sequencer.
[0084] In an aspect, high-throughput DNA sequencing comprises sequencing-by-synthesis or nanopore-based sequencing.
[0085] As used herein, a “sequencing read” refers to a nucleotide sequence of a single nucleic acid molecule generated via a high-throughput sequencing method. In an aspect, sequencing reads are provided in a FASTX or FASTQ file type. In an aspect, a sequencing read comprises a UMI sequence. In an aspect, a sequencing read comprises a sequence from a gene. In an aspect, a sequencing read comprises a UMI sequence and a sequence from a gene.
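As a non-limiting illustration, sequencing reads in FASTQ format (four lines per record: header, sequence, separator, qualities) can be parsed and split into a UMI sequence and a gene-derived sequence. This sketch assumes a hypothetical read layout in which the UMI occupies the first `umi_len` bases of the read; the actual position depends on the library structure (for example, universal region, optional spacer, UMI, and gene-specific region of the UMI Primer). The `Read` and `parse_fastq` names are illustrative only.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    umi: str     # leading bases assumed to hold the UMI (hypothetical layout)
    insert: str  # remaining bases, assumed to derive from the Target RNA Region


def parse_fastq(handle: TextIO, umi_len: int) -> Iterator[Read]:
    """Yield reads from a four-line-per-record FASTQ stream, splitting off the UMI."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        plus = handle.readline().rstrip()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header[1:].split()[0], seq[:umi_len], seq[umi_len:])


fastq_text = "@r1 lane1\nAAAACCCCGGGG\n+\nIIIIIIIIIIII\n@r2\nTTTTACGT\n+\nIIIIIIII\n"
for read in parse_fastq(io.StringIO(fastq_text), umi_len=4):
    print(read)
```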
[0086] In an aspect, a sequencing read comprises at least 10 nucleotides. In an aspect, a sequencing read comprises at least 25 nucleotides. In an aspect, a sequencing read comprises at least 50 nucleotides. In an aspect, a sequencing read comprises at least 100 nucleotides. In an aspect, a sequencing read comprises at least 250 nucleotides. In an aspect, a sequencing read comprises at least 500 nucleotides. In an aspect, a sequencing read comprises at least 1000 nucleotides.
[0087] In an aspect, a sequencing read comprises between 10 nucleotides and 10,000 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 5000 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 1000 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 500 nucleotides. In an aspect, a sequencing read comprises between 10 nucleotides and 100 nucleotides. In an aspect, a sequencing read comprises between 25 nucleotides and 150 nucleotides.
[0088] In an aspect, a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is determined as Y% of the mean value for the largest Z UMI Family size(s) in the Target RNA Region. In an aspect, Y is between 1% and 20% and Z is between 1 and 20. In an aspect, Y is between 1% and 50% and Z is between 1 and 50. In an aspect, Y is between 1% and 75% and Z is between 1 and 75. In an aspect, Y is greater than 1% and Z is greater than 1. In an aspect, Y is greater than 5% and Z is greater than 5. In an aspect, Y is greater than 10% and Z is greater than 10. In an aspect, Y and Z are the same integer. In an aspect, Y and Z are different integers. In an aspect, X and Y are the same integer. In an aspect, X and Y are different integers. In an aspect, X and Z are the same integer. In an aspect, X and Z are different integers. In an aspect, X, Y, and Z are the same integer. In an aspect, X, Y, and Z are different integers.
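As a non-limiting illustration, the Y%-of-top-Z cutoff can be computed as below, where a UMI family's size is the number of sequencing reads sharing that UMI in one Target RNA Region. The function names and family sizes are hypothetical; families with a size equal to X are retained because only sizes “less than X” are removed.

```python
from typing import Dict, Iterable


def top_z_threshold(family_sizes: Iterable[int], y_percent: float, z: int) -> float:
    """X = Y% of the mean of the Z largest UMI Family sizes in one Target RNA Region."""
    largest = sorted(family_sizes, reverse=True)[:z]
    if not largest:
        raise ValueError("no UMI families supplied")
    return (y_percent / 100.0) * sum(largest) / len(largest)


def filter_families(families: Dict[str, int], y_percent: float, z: int) -> Dict[str, int]:
    """Keep only families whose size (reads per UMI) is at least X."""
    x = top_z_threshold(families.values(), y_percent, z)
    return {umi: size for umi, size in families.items() if size >= x}


# Hypothetical family sizes for one region; Y = 10%, Z = 3 gives X = 0.1 * 90 = 9.
sizes = {"AATG": 100, "CCTA": 90, "GGAT": 80, "TTCA": 12, "ACAC": 5}
print(filter_families(sizes, y_percent=10, z=3))
```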
[0089] In an aspect, a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is determined as Y% of the mean value for the largest Z UMI Family size(s) for all UMI families with the exact same genotype in the Target RNA Region. In an aspect, Y is between 1% and 20% and Z is between 1 and 20. In an aspect, Y is between 1% and 50% and Z is between 1 and 50. In an aspect, Y is between 1% and 75% and Z is between 1 and 75. In an aspect, Y is greater than 1% and Z is greater than 1. In an aspect, Y is greater than 5% and Z is greater than 5. In an aspect, Y is greater than 10% and Z is greater than 10. In an aspect, Y and Z are the same integer. In an aspect, Y and Z are different integers. In an aspect, X and Y are the same integer. In an aspect, X and Y are different integers. In an aspect, X and Z are the same integer. In an aspect, X and Z are different integers. In an aspect, X, Y, and Z are the same integer. In an aspect, X, Y, and Z are different integers.
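As a non-limiting illustration, the genotype-specific variant applies the same Y%-of-top-Z cutoff separately within each genotype. This sketch assumes that a family's “genotype” is represented by a string such as its UMI Family Sequence; the tuple layout and names are hypothetical. In the example, a rare “alt” family of size 3 survives because it is compared only with other “alt” families, whereas a region-wide cutoff with the same Y and Z (X = 9) would remove it.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Family = Tuple[str, str, int]  # (UMI, genotype, family size)


def filter_by_genotype(families: Iterable[Family], y_percent: float, z: int) -> List[Family]:
    """Apply the Y%-of-top-Z size cutoff separately within each genotype."""
    by_genotype = defaultdict(list)
    for umi, genotype, size in families:
        by_genotype[genotype].append((umi, size))

    kept = []
    for genotype, members in by_genotype.items():
        largest = sorted((size for _, size in members), reverse=True)[:z]
        x = (y_percent / 100.0) * sum(largest) / len(largest)
        kept.extend((umi, genotype, size) for umi, size in members if size >= x)
    return kept


families = [("u1", "ref", 100), ("u2", "ref", 80), ("u3", "ref", 6),
            ("u4", "alt", 20), ("u5", "alt", 3)]
# ref: X = 0.1 * 90 = 9, so u3 is dropped; alt: X = 0.1 * 11.5 = 1.15, so u5 is kept.
print(filter_by_genotype(families, y_percent=10, z=2))
```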
[0090] In an aspect, a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is determined based on a Gaussian fit to a histogram of log2(UMI Family size) for the Target RNA Region, where log2(X) is (the center of the histogram peak) - (3 × the standard deviation of the histogram). In an aspect, X is between 1 and 10. In an aspect, X is between 1 and 20. In an aspect, X is between 1 and 50. In an aspect, X is between 1 and 75. In an aspect, X is between 1 and 100. In an aspect, X is greater than 1. In an aspect, X is greater than 5. In an aspect, X is greater than 10. In an aspect, X is greater than 50. In an aspect, X is greater than 100. In an aspect, X is greater than 250. In an aspect, X is greater than 500.
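As a non-limiting illustration, the Gaussian-fit cutoff can be sketched with NumPy. The fitting procedure is not specified here, so this sketch swaps in one common approach: because the logarithm of a Gaussian is a quadratic, a weighted quadratic is fitted to the natural log of the histogram counts, and the center (mu) and standard deviation (sigma) are read off its coefficients. The bin count and the simulated data are hypothetical.

```python
import math

import numpy as np


def gaussian_threshold(family_sizes, bins: int = 30) -> float:
    """Return X with log2(X) = mu - 3*sigma from a Gaussian fit to log2(family size).

    ln(count) = a*x**2 + b*x + c for a Gaussian, so sigma = sqrt(-1/(2a))
    and mu = -b/(2a).
    """
    log_sizes = np.log2(np.asarray(family_sizes, dtype=float))
    counts, edges = np.histogram(log_sizes, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    mask = counts > 0
    # Poisson-like counts: var(ln n) is about 1/n, so weight each bin by sqrt(n).
    a, b, _ = np.polyfit(centers[mask], np.log(counts[mask]), 2, w=np.sqrt(counts[mask]))
    if a >= 0:
        raise ValueError("histogram has no clear peak; Gaussian fit failed")
    mu = -b / (2 * a)
    sigma = math.sqrt(-1 / (2 * a))
    return 2 ** (mu - 3 * sigma)


# Simulated family sizes whose log2 is roughly normal with mean 6 and SD 0.5,
# so X should come out near 2 ** (6 - 1.5), i.e. about 23.
rng = np.random.default_rng(0)
sizes = np.maximum(1, np.round(2 ** rng.normal(6.0, 0.5, 20000)))
print(round(gaussian_threshold(sizes), 1))
```

A direct nonlinear least-squares fit of a Gaussian to the histogram would be an equally reasonable choice; the log-quadratic form is used here only because it needs nothing beyond `numpy.polyfit`.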
[0091] In an aspect, a method comprises removing from consideration, for each Target RNA Region, UMI families comprising a UMI family size of less than X, where X is a fixed number. In an aspect, X is between 2 and 10. In an aspect, X is between 2 and 20. In an aspect, X is between 2 and 30. In an aspect, X is between 2 and 40. In an aspect, X is between 2 and 50. In an aspect, X is between 2 and 75. In an aspect, X is between 2 and 100. In an aspect, X is greater than 1. In an aspect, X is greater than 5. In an aspect, X is greater than 10. In an aspect, X is greater than 15. In an aspect, X is greater than 20. In an aspect, X is greater than 25. In an aspect, X is greater than 50. In an aspect, X is greater than 75. In an aspect, X is greater than 100.
[0092] As used herein, the number of unique UMI sequences “N” is the total count of different UMI sequences at one locus, which reflects the number of original molecules of the Target RNA Region multiplied by the barcoding yield (N = number of original molecules × barcoding yield).
[0093] As used herein, a “barcoding yield” is the UMI attachment yield, and refers to the percentage of original molecules that are successfully tagged with a UMI. In an aspect, the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count. As used herein, “molecule count” refers to the number of times a copy of a given nucleic acid molecule is present in a sample, solution, product, or mixture. As used herein, “barcoding yield” and “conversion yield” are used interchangeably.
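As a non-limiting illustration, the relationship between the number of unique UMI sequences N, the barcoding yield, and the original molecule count can be written out as simple arithmetic. This sketch expresses the yield as a fraction rather than a percentage; the function names and numbers are hypothetical, and each Target RNA Region would carry its own calibrated yield.

```python
def calibrate_yield(unique_umis: int, known_molecules: float) -> float:
    """Barcoding (conversion) yield of one Target RNA Region from a calibration sample."""
    if known_molecules <= 0:
        raise ValueError("known molecule count must be positive")
    return unique_umis / known_molecules


def estimate_molecules(unique_umis: int, barcoding_yield: float) -> float:
    """Original molecule count of a Target RNA Region: N divided by the barcoding yield."""
    if not 0 < barcoding_yield <= 1:
        raise ValueError("barcoding yield must be in (0, 1]")
    return unique_umis / barcoding_yield


# Hypothetical numbers: 400 UMIs from 1000 known molecules gives a yield of 0.4,
# so 250 UMIs in a test sample correspond to about 625 original molecules.
region_yield = calibrate_yield(400, 1000)
print(estimate_molecules(250, region_yield))
```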
[0094] In an aspect, a method comprises identifying a UMI Family Sequence. As used herein, a “UMI Family Sequence” refers to the most frequent nucleotide sequence within a UMI Family.
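As a non-limiting illustration, reads from one locus can be grouped into UMI Families and each family's most frequent sequence taken as its UMI Family Sequence. This sketch assumes reads have already been reduced to (UMI, insert sequence) pairs; the function names are hypothetical, and ties between equally frequent sequences are resolved by first occurrence.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_families(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (UMI, insert sequence) pairs from one locus into UMI Families."""
    families = defaultdict(list)
    for umi, insert in reads:
        families[umi].append(insert)
    return dict(families)


def family_sequence(inserts: List[str]) -> str:
    """UMI Family Sequence: the most frequent insert; ties go to the first seen."""
    return Counter(inserts).most_common(1)[0][0]


reads = [("AAAA", "CGTAC"), ("AAAA", "CGTAC"), ("AAAA", "CGAAC"), ("TTTT", "GGCTA")]
for umi, inserts in build_families(reads).items():
    print(umi, len(inserts), family_sequence(inserts))
```

Taking the most frequent sequence suppresses isolated PCR or sequencing errors, such as the single "CGAAC" read in the "AAAA" family.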
[0095] In an aspect, a method comprises removal of a first UMI sequence that differs by a fixed number of nucleotides from a second UMI sequence, where fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence. In an aspect, a first UMI sequence differs from a second UMI sequence by one nucleotide. In an aspect, a first UMI sequence differs from a second UMI sequence by two nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by three nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by four nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by five nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by one nucleotide or two nucleotides. In an aspect, a first UMI sequence differs from a second UMI sequence by between one nucleotide and three nucleotides.
[0096] As a non-limiting example, the sequence 5'-AATG-3' differs from the sequence 5'-AATC-3' by one nucleotide. As a non-limiting example, the sequence 5'-AATG-3' differs from the sequence 5'-AAAC-3' by two nucleotides.
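As a non-limiting illustration, the removal of likely erroneous UMI sequences can be sketched with a Hamming distance (the number of mismatched positions), reusing the example sequences above. This sketch assumes all UMIs have equal length and follows a literal reading of the rule: a UMI is removed if any UMI supported by more reads lies within `max_dist` mismatches, so an error of an error is also removed. UMIs with equal read counts never remove each other. The function names are hypothetical.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def remove_error_umis(read_counts: Dict[str, int], max_dist: int = 1) -> Dict[str, int]:
    """Drop each UMI that is within max_dist mismatches of a UMI with more reads."""
    return {
        umi: n
        for umi, n in read_counts.items()
        if not any(
            other != umi and read_counts[other] > n and hamming(umi, other) <= max_dist
            for other in read_counts
        )
    }


counts = {"AATG": 100, "AATC": 3, "AAAC": 2, "GGCC": 50}
print(remove_error_umis(counts, max_dist=1))
```

The all-pairs comparison is quadratic in the number of UMIs per locus, which is acceptable for a sketch but may call for indexing in large datasets.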
[0097] In an aspect, a Target RNA Region is positioned within a BAG1 gene. In an aspect, a Target RNA Region is positioned within a PGR gene. In an aspect, a Target RNA Region is positioned within an ERBB2 gene. In an aspect, a Target RNA Region is positioned within a GRB7 gene. In an aspect, a Target RNA Region is positioned within a TFRC gene. In an aspect, a Target RNA Region is positioned within an MKI67 gene. In an aspect, a Target RNA Region is positioned within an MMP11 gene. In an aspect, a Target RNA Region is positioned within an MYBL2 gene. In an aspect, a Target RNA Region is positioned within a BIRC5 gene. In an aspect, a Target RNA Region is positioned within an ESR1 gene. In an aspect, a Target RNA Region is positioned within a CD68 gene. In an aspect, a Target RNA Region is positioned within a CTSV gene. In an aspect, a Target RNA Region is positioned within a BCL2 gene. In an aspect, a Target RNA Region is positioned within a CCNB1 gene. In an aspect, a Target RNA Region is positioned within a GUSB gene. In an aspect, a Target RNA Region is positioned within a SCUBE2 gene. In an aspect, a Target RNA Region is positioned within an RPLP0 gene. In an aspect, a Target RNA Region is positioned within an ACTB gene. In an aspect, a Target RNA Region is positioned within an AURKA gene. In an aspect, a Target RNA Region is positioned within a GAPDH gene. In an aspect, a Target RNA Region is positioned within one or more genes selected from the group consisting of: BAG1, PGR, ERBB2, GRB7, TFRC, MKI67, MMP11, MYBL2, BIRC5, ESR1, CD68, CTSV, BCL2, CCNB1, GUSB, SCUBE2, RPLP0, ACTB, AURKA, and GAPDH.
[0098] The following exemplary, non-limiting, embodiments are envisioned:
1. A method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising:
(a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules;
(b) contacting the cDNA molecules with:
(i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence;
(ii) a second set of primers;
(iii) a first DNA polymerase; and
(iv) reagents and buffers needed for DNA polymerase extension to generate a mixture;
(c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension;
(d) removing non-extended UMI Primers to generate a product;
(e) preparing a sequencing library using the product;
(f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and
(g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
2. A method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising:
(a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules;
(b) contacting the cDNA molecules with:
(i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence;
(ii) optionally, a second set of primers;
(iii) a first DNA polymerase; and
(iv) reagents and buffers needed for DNA polymerase extension to generate a mixture;
(c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension;
(d) removing non-extended UMI Primers to generate a product;
(e) preparing a sequencing library using the product;
(f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and
(g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
3. A method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising:
(a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules;
(b) contacting the cDNA molecules with:
(i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence;
(ii) a first DNA polymerase; and
(iii) reagents and buffers needed for DNA polymerase extension to generate a mixture;
(c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a first product;
(d) contacting the first product with a second set of primers;
(e) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a second product;
(f) removing non-extended UMI Primers from the second product to generate a third product;
(g) preparing a sequencing library using the third product;
(h) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and
(i) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
4. The method of embodiment 2, where the second set of primers is added during step (c) instead of during step (b).
5. The method of any one of embodiments 1-3, where the method further comprises: calculating Target RNA transcript expression level based on a mean molecule count of the Target RNA Regions within the same transcript.
6. The method of embodiment 5, where between 1 and 10 of the Target RNA Regions are amplified from one Target RNA transcript.
7. The method of any one of embodiments 1-3, where step (a) comprises the use of a reverse transcription primer.
8. The method of any one of embodiments 1-3, where the second set of primers comprises a Universal Forward Primer and a Universal Reverse Primer.
9. The method of embodiment 7, where the reverse transcription primer comprises at least one degenerate nucleotide.
10. The method of embodiment 7, where the reverse transcription primer comprises a random hexamer.
11. The method of embodiment 7, where the reverse transcription primer comprises a polyT string.
12. The method of any one of embodiments 1-3, where the RNA sample is obtained from a source selected from the group consisting of formalin-fixed paraffin-embedded tissue, whole blood, plasma, and fresh tissue.
13. The method of any one of embodiments 1-3, where the RNA sample is a human RNA sample.
14. The method of any one of embodiments 1-3, where the UMI sequence comprises between 7 and 30 degenerate nucleotides.
15. The method of embodiment 14, where each of the between 7 and 30 degenerate nucleotides is selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
16. The method of any one of embodiments 1-3, where the high-throughput sequencing comprises sequencing-by-synthesis or nanopore-based sequencing.
17. The method of any one of embodiments 1-3, where the first DNA polymerase is a thermostable DNA polymerase.
18. The method of embodiment 17, where the thermostable DNA polymerase is selected from the group consisting of Taq DNA polymerase, Phusion® DNA polymerase, Q5® DNA polymerase, and KAPA High Fidelity DNA polymerase.
19. The method of any one of embodiments 1-3, where the first DNA polymerase is a non-thermostable DNA polymerase.
20. The method of embodiment 19, where the first DNA polymerase is selected from the group consisting of phi29 DNA polymerase and Bst DNA polymerase.
21. The method of embodiment 1 or 2, where step (d) comprises a purification method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
22. The method of embodiment 3, where step (f) comprises a purification method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
23. The method of embodiment 1 or 2, where step (g) comprises removal of at least one sequencing read where the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern prior to the calculating.
24. The method of embodiment 3, where step (i) comprises removal of at least one sequencing read where the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern prior to the calculating.
25. The method of embodiment 1 or 2, where step (g) comprises removal of a first UMI sequence that differs by only 1 or 2 nucleotides from a second UMI sequence, where fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
26. The method of embodiment 3, where step (i) comprises removal of a first UMI sequence that differs by only 1 or 2 nucleotides from a second UMI sequence, where fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
27. The method of embodiment 1 or 2, where step (g) comprises:
(i) aligning the sequencing reads to the Target RNA Regions, and grouping the sequencing reads into locus-specific subgroups by the loci they are aligned to;
(ii) for each locus, grouping the sequencing reads by the UMI sequence to generate UMI Families;
(iii) removing UMI families comprising a UMI family size of less than X,
(A) where X is set as Y% of the mean value for the largest Z UMI Family size(s) in the Target RNA Region, where Y is between 1% and 20% and Z is between 1 and 20; or
(B) where X is set as Y% of the mean value for the largest Z UMI Family size(s) for all UMI families with the exact same genotype in the Target RNA Regions, where Y is between 1% and 20% and Z is between 1 and 20; or
(C) where X is set based on a Gaussian fitting on a histogram of log2(UMI Family size) for the Target RNA Regions, where log2(X) is (the center of the histogram peak) - (3 × the standard deviation of the histogram); or
(D) where X is set as a fixed number between 2 and 20;
(iv) counting the number of unique UMI sequences (N); and
(v) dividing N by a barcoding yield, to identify the number of original molecules of the Target RNA Regions, where the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
28. The method of embodiment 3, where step (i) comprises:
(i) aligning the sequencing reads to the Target RNA Regions, and grouping the sequencing reads into locus-specific subgroups by the loci they are aligned to;
(ii) for each locus, grouping the sequencing reads by the UMI sequence to generate UMI Families;
(iii) removing UMI families comprising a UMI family size of less than X,
(A) where X is set as Y% of the mean value for the largest Z UMI Family size(s) in the Target RNA Region, where Y is between 1% and 20% and Z is between 1 and 20; or
(B) where X is set as Y% of the mean value for the largest Z UMI Family size(s) for all UMI families with the exact same genotype in the Target RNA Regions, where Y is between 1% and 20% and Z is between 1 and 20; or
(C) where X is set based on a Gaussian fitting on a histogram of log2(UMI Family size) for the Target RNA Regions, where log2(X) is (the center of the histogram peak) - (3 × the standard deviation of the histogram); or
(D) where X is set as a fixed number between 2 and 20;
(iv) counting the number of unique UMI sequences (N); and
(v) dividing N by a barcoding yield, to identify the number of original molecules of the Target RNA Regions, where the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
29. The method of embodiment 1 or 2, where step (g) further comprises identifying a UMI Family Sequence.
30. The method of embodiment 3, where step (i) further comprises identifying a UMI Family Sequence.
31. The method of any one of embodiments 1-3, where the UMI Primers comprise from 5' to 3':
(i) a first universal region;
(ii) an optional second region comprising a length between 1 and 50 nucleotides;
(iii) a third region comprising a UMI sequence; and
(iv) a fourth region targeting a specific genomic region.
32. The method of any one of embodiments 1-3, where:
(i) step (b) further comprises introduction of a set of Outer Primers; and
(ii) where the second set of primers comprises a set of Inner Primers, where between 3 and 20 nucleotides at the 3' end of the Inner Primer are not subsequences of the Outer Primers.
33. The method of any one of embodiments 1-3, where the between 1 and 10,000 Target RNA Regions are positioned within one or more genes selected from the group consisting of: BAG1, PGR, ERBB2, GRB7, TFRC, MKI67, MMP11, MYBL2, BIRC5, ESR1, CD68, CTSV, BCL2, CCNB1, GUSB, SCUBE2, RPLP0, ACTB, AURKA, and GAPDH.
34. The method of any one of embodiments 1-3, where the RNA sample comprises less than or equal to 25 ng of RNA.
35. The method of any one of embodiments 1-3, where the RNA sample comprises less than or equal to 10 ng of RNA.
36. The method of any one of embodiments 1-3, where at least five of the Target RNA Regions are positioned within a single Target RNA transcript.
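As a non-limiting illustration of embodiments 14, 15, 23, and 24, reads whose UMI does not fit a predefined degenerate base design pattern can be screened by compiling the design into a regular expression. This sketch assumes the UMI has already been extracted from each read; the design string "NNNNSWNNNN" and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def design_regex(design: str) -> "re.Pattern[str]":
    """Compile a degenerate-base UMI design (e.g. 'NNNNSWNNNN') into a regex."""
    return re.compile("".join(f"[{IUPAC[code]}]" for code in design.upper()))


def matches_design(umi: str, pattern: "re.Pattern[str]") -> bool:
    """True if the observed UMI fits the design position by position."""
    return pattern.fullmatch(umi.upper()) is not None


design = design_regex("NNNNSWNNNN")  # hypothetical 10-nt UMI design
print(matches_design("ACGTGAACGT", design))  # S=G and W=A are allowed
print(matches_design("ACGTAAACGT", design))  # A is not allowed at the S position
```

Because `fullmatch` is used, a UMI that is one base too short or too long also fails the check, which catches insertion and deletion errors as well as substitutions at constrained positions.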
[0099] Having described the present disclosure in detail, it will be apparent that modifications, variations, and equivalent aspects are possible without departing from the spirit and scope of the present disclosure as described herein and in the appended claims. Furthermore, it should be appreciated that all examples in the present disclosure are provided as non-limiting examples.
EXAMPLES
Example 1. RNA QASeq amplicon panel design.
[0100] RNA QASeq involves the construction of targeted panels for gene expression quantification. Panel design is accomplished by running a fully automated Python pipeline named ‘pynab.’ Pynab generates highly optimized primer sequences (forward primer (fP), reverse inner primer (rPin), and reverse outer primer (rPout)) targeting one or more sequences of interest within one or more given genes. Generally, the regions of interest include exons (transcripts) or specific hotspot locations with variant information such as single nucleotide polymorphisms (SNPs), indels, etc. The primers generated by the pipeline can be multiplexed in a single PCR reaction because they are designed to be compatible with one another.
[0101] In one example, the automated design pipeline consists of four modules:
[0102] Module 1. Gathering gene information. This step involves obtaining genomic information for the sequence(s) of interest (e.g., Target RNA Regions), even if this is a single nucleotide position in a given exon. The input is a list of gene names or gene IDs or, alternatively, a list of hotspot coordinates with start and stop positions and the corresponding gene name.
[0103] Module 2. Designing context sequences. The second step entails designing amplicons around the region of interest. There are two design strategies: 1) all-exon tiling and 2) hotspot.
[0104] For a given gene, the all-exon tiling strategy includes adding padding regions to exons, merging neighboring padded exons into sequences of interest (SOIs), covering SOIs with inserts, adding primer design regions, and conducting BLAST checks of each context sequence against a human reference genome to evaluate the number of hits and to exclude primers that are likely to produce non-specific amplification.
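As a non-limiting illustration, the padding, merging, and tiling steps of the all-exon tiling strategy can be sketched with simple interval arithmetic. This sketch assumes half-open, gene-relative coordinates and uses hypothetical values for the padding length and maximum insert size (the actual values would come from the pipeline's configuration, which is not reproduced here); primer design regions and BLAST checks are not modeled.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one gene


def pad_and_merge(exons: List[Interval], pad: int) -> List[Interval]:
    """Pad each exon on both sides, then merge overlapping padded exons into SOIs."""
    padded = sorted((max(0, start - pad), end + pad) for start, end in exons)
    sois: List[Interval] = []
    for start, end in padded:
        if sois and start <= sois[-1][1]:
            sois[-1] = (sois[-1][0], max(sois[-1][1], end))  # overlap: extend the SOI
        else:
            sois.append((start, end))
    return sois


def tile_inserts(soi: Interval, max_insert: int) -> List[Interval]:
    """Cover an SOI with the fewest near-equal inserts no longer than max_insert."""
    start, end = soi
    n = math.ceil((end - start) / max_insert)
    bounds = [start + round(i * (end - start) / n) for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


sois = pad_and_merge([(100, 150), (170, 220), (500, 560)], pad=15)
print(sois)  # the first two padded exons overlap and merge
print(tile_inserts(sois[0], max_insert=70))
```

Splitting each SOI into near-equal inserts avoids a very short final insert, which would otherwise leave little room for primer placement.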
[0105] For a given gene (also referred to as the SOI), the hotspot panel design strategy only includes covering the SOI with inserts and adding extended regions for primer design. A BLAST check is also included in the hotspot panel design.
[0106] Context sequence design for both strategies (exon tiling or hotspot) can be fully customized through a configuration file to produce long (for example, ~190 nucleotides) or short (for example, ~70 nucleotides) amplicons.
[0107] Module 3. Generate primer candidates. In this step, a group of forward and reverse primer candidates is generated from the previously designated primer design regions. Initially, forward primers (fPs) are generated from the fP design region; at least one fP is produced. Next, reverse primers (rPs) are generated from the rP design region; at least two rPs are produced, with a minimum separation of 4 nucleotides between them.
[0108] Module 4. Optimize primers. In this final step, the set of candidate primers is evaluated in a pairwise fashion to estimate the likelihood of primer-dimer formation and, if required, problematic primers are replaced with better candidates. fP and rPout are optimized first, and rPin is then optimized based on the already optimized fP and rPout sequences.
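As a non-limiting illustration, a pairwise primer-dimer screen can be sketched with a crude string heuristic: for each ordered pair of primers, find the longest 3'-terminal stretch of one primer that can base-pair perfectly with some stretch of the other. The scoring actually used by pynab is not described here; this sketch is a hypothetical stand-in, and dedicated primer design tools typically estimate dimer stability thermodynamically instead. The `max_len` cutoff and example sequences are hypothetical.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_dimer_len(a: str, b: str) -> int:
    """Longest 3'-terminal stretch of primer a that can pair perfectly with part of b."""
    best = 0
    for k in range(1, len(a) + 1):
        if revcomp(a[-k:]) in b.upper():
            best = k
        else:
            break  # longer tails extend this one, so they cannot pair either
    return best


def flag_dimers(primers: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str, int]]:
    """Return primer pairs (self-pairs included) whose 3' complementarity exceeds max_len."""
    names = sorted(primers)
    flagged = []
    for i, x in enumerate(names):
        for y in names[i:]:
            score = max(three_prime_dimer_len(primers[x], primers[y]),
                        three_prime_dimer_len(primers[y], primers[x]))
            if score > max_len:
                flagged.append((x, y, score))
    return flagged


panel = {"fP": "TTTTTTGCAGG", "rPout": "AAAACCTGCAA"}
print(flag_dimers(panel))  # the 3' end of fP pairs with 7 nt of rPout
```

A flagged pair would then prompt replacement of one primer with another candidate from Module 3.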
Example 2. Experimental workflow - RNA QASeq (open-tube).
[0109] An RNA sample is initially reverse transcribed to cDNA as input for the RNA QASeq protocol. RNA is mixed with dNTPs (0.5 mM), Murine RNase Inhibitor (8 U), M-MuLV buffer (1X), M-MuLV Reverse Transcriptase (8 U), and random hexamer (6 μM). The mixture is incubated at 25°C for 5 minutes and at 42°C for 60 minutes, and is then inactivated at 65°C for 20 minutes. The reaction mixture is used directly as input for UMI PCR without purification. The subsequent sequencing library preparation consists of three PCR reactions: UMI PCR, nested PCR, and index PCR.
[0110] In UMI PCR, the cDNA sample is mixed with 1 U Phusion High-Fidelity DNA polymerase, Phusion HF buffer, forward and outer reverse primers (15 nM each), and dNTPs (0.2 mM each) to a total volume of 50 μL. Thermal cycling starts with 30 seconds at 98°C, followed by 2 cycles of 10 seconds at 98°C, 30 minutes at 63°C, and 15 seconds at 72°C; then 2 cycles of 10 seconds at 98°C, 15 seconds at 63°C, and 15 seconds at 72°C; and finally 5 cycles of 10 seconds at 98°C and 30 seconds at 71°C. During the last 5 minutes of the second 30-minute segment at 63°C, 1.5 μM of each universal primer is added while the reactions are kept inside the thermal cycler. See Table 1. After UMI PCR, 1.6X purification with AMPure XP beads is performed.
[0111] In nested PCR, the purified eluate from the previous step is mixed with PowerUp SYBR Green Master Mix (1X final concentration) and 15 nM of each inner reverse primer. Thermal cycling starts with 3 minutes at 95°C, followed by 2 cycles of 10 seconds at 95°C and 30 minutes at 60°C. The PCR product is purified with 1.6X AMPure XP beads.
[0112] Next, index PCR is performed. The purified eluate from the previous step is mixed with iTaq Universal SYBR Green Supermix (1X final concentration) and 250 nM of each NEBNext index primer. Thermal cycling starts with a 3-minute incubation at 95°C, followed by 25 cycles of 10 seconds at 95°C and 30 seconds at 65°C, and finally 2 minutes at 65°C. After index PCR, double-sided size selection (0.4X, 0.4X ratio) is performed. Finally, sequencing libraries are normalized to equimolar concentrations and loaded onto an Illumina sequencer.
Table 1: Oligonucleotide sequences of the Universal Primers.
Example 3. Experimental workflow - RNA QASeq (one-pot).
[0113] A schematic of the one-pot RNA QASeq workflow is shown in Figure 1.
[0114] In the UMI PCR step of the one-pot RNA QASeq workflow, a cDNA sample is mixed with 1 U Phusion High-Fidelity DNA polymerase, Phusion HF buffer, forward and outer reverse primers (15 nM each), universal primers (1.5 µM each), and dNTPs (0.2 mM each) in a total volume of 50 µL. Thermal cycling starts with 30 seconds at 98°C, followed by 2 cycles of 10 seconds at 98°C, 30 minutes at 63°C, and 15 seconds at 72°C; then 2 cycles of 10 seconds at 98°C, 15 seconds at 63°C, and 15 seconds at 72°C; and finally 5 cycles of 10 seconds at 98°C and 30 seconds at 71°C. See Figure 1, “PCR 1” step. Because the universal primers are present from the start, there is no open-tube step for adding them during the PCR reaction. After UMI PCR, the mixture is purified using 1.6X AMPure XP beads. Nested PCR and index PCR are then performed as described in Example 2.
Example 4. RNA QASeq quantitation in a variety of samples.
[0115] RNA QASeq technology for RNA quantitation is demonstrated in a variety of samples, including tumor tissue FFPE RNA, total blood RNA, and total liver RNA. RNA samples are reverse transcribed to cDNA as input for RNA QASeq as described above. Random hexamer is chosen as the reverse transcription primer to be compatible with low-quality, fragmented FFPE RNA.
[0116] A targeted multigene breast cancer panel, similar to the Oncotype DX panel, is built to cover 78 amplicons in 15 cancer-related genes and five reference genes. See Table 2. The expression of each gene is calculated from the molecule counts of its amplicons, which are derived from the UMI count and the conversion yield, and is further normalized, in log2 scale, relative to the expression level of the five reference genes. See Figure 2.
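The per-gene calculation can be sketched as follows. This is an illustrative reading, not the applicant's code: it assumes each amplicon's molecule count is its UMI count divided by a calibrated conversion yield, that a gene's level is the mean over its amplicons (as in claim 5), and that normalization subtracts the mean log2 level of the reference genes. Gene names and numbers are hypothetical.

```python
import math
from statistics import mean


def amplicon_molecules(umi_count: int, conversion_yield: float) -> float:
    """Estimated original molecules for one amplicon: UMI count / conversion yield."""
    if not 0 < conversion_yield <= 1:
        raise ValueError("conversion_yield must be in (0, 1]")
    return umi_count / conversion_yield


def gene_molecules(amplicons) -> float:
    """Mean estimated molecule count over a gene's amplicons.

    `amplicons` is a list of (UMI count, calibrated conversion yield) pairs.
    """
    return mean([amplicon_molecules(count, y) for count, y in amplicons])


def relative_log2_expression(gene_counts: dict, reference_genes) -> dict:
    """log2 molecule count of each gene minus the mean log2 count of the
    reference genes (assumed normalization; zero counts would need a
    pseudocount before taking logs)."""
    ref_level = mean([math.log2(gene_counts[g]) for g in reference_genes])
    return {g: math.log2(n) - ref_level for g, n in gene_counts.items()}


# Hypothetical numbers for one target gene and two reference genes.
counts = {
    "GENE_X": gene_molecules([(50, 0.5), (30, 0.5)]),
    "REF_1": gene_molecules([(10, 0.5)]),
    "REF_2": gene_molecules([(40, 0.5)]),
}
print(relative_log2_expression(counts, ["REF_1", "REF_2"]))
```

With these toy inputs GENE_X averages 80 molecules and sits one log2 unit above the reference mean; a real panel would pass all five reference genes.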
Table 2. Oligonucleotide sequences for a targeted multigene breast cancer panel. mRNA accession refers to GenBank (ncbi[dot]nlm[dot]nih[dot]gov/genbank) accession numbers.
Example 5. Validation of RNA quantitation accuracy.
[0117] RNA quantitation accuracy is first validated using an ERCC RNA spike-in mix. Sixteen ERCC sequences are selected as Target RNA Regions and amplified. See Table 3. The ERCC RNA sample is diluted and mixed with commercial human total liver RNA to give final expected molecule counts between 3 and 100,000. The observed molecule counts correlate well with the expected molecule counts. See Figure 3. RNA QASeq quantitates RNA across five orders of magnitude, and as few as three expected molecules are detected. See Table 4.
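Expected molecule counts for a diluted spike-in follow from its stock concentration, the volume assayed, and the dilution factor. The sketch below assumes concentrations are given in attomoles per microliter, as is common for ERCC mixes, and measures agreement as a Pearson correlation on log10 counts; both the units and the choice of metric are assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def expected_molecules(conc_amol_per_ul: float, volume_ul: float,
                       dilution_factor: float) -> float:
    """Expected molecules of one spike-in transcript in the assayed volume."""
    return conc_amol_per_ul * 1e-18 * AVOGADRO * volume_ul / dilution_factor


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log10_correlation(expected, observed) -> float:
    """Correlation of log10(observed) vs log10(expected), skipping any target
    where either count is zero (log undefined)."""
    pairs = [(math.log10(e), math.log10(o))
             for e, o in zip(expected, observed) if e > 0 and o > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

One attomole corresponds to about 602,214 molecules, so a 1 amol/µL stock diluted 100,000-fold leaves roughly six molecules per microliter. A log-scale metric keeps the many low-count targets from being swamped by the high-count ones.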
Table 3. Oligonucleotide sequences for targeting ERCC references.
Table 4. RNA QASeq quantitation of ERCC references.
[0118] Sensitivity at an expected molecule count of three is estimated using technical replicates. Two ERCC amplicons (ERCC-00075 and ERCC-00048), each with an expected molecule count of three in the diluted reference sample, are evaluated in three technical replicates. A non-zero molecule count is observed in five of the six cases, giving a sensitivity estimate of 83.33% (5/6) for three molecules.
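To put the 5/6 figure in context, one can compare it with a simple sampling model. The sketch below assumes, which the text does not state, that the number of input molecules is Poisson-distributed around the expected count and that each molecule is independently barcoded with a fixed yield; under that model even perfect conversion cannot always detect three expected molecules. Function names are hypothetical.

```python
import math


def empirical_sensitivity(detected: int, trials: int) -> float:
    """Fraction of replicate measurements with a non-zero molecule count."""
    return detected / trials


def poisson_detection_probability(expected_molecules: float,
                                  barcoding_yield: float = 1.0) -> float:
    """P(at least one barcoded molecule) if input molecules are Poisson with
    mean `expected_molecules` and each is barcoded independently with
    probability `barcoding_yield` (assumed model)."""
    return 1.0 - math.exp(-expected_molecules * barcoding_yield)


print(f"observed sensitivity: {empirical_sensitivity(5, 6):.2%}")                 # 83.33%
print(f"Poisson ceiling at 3 molecules: {poisson_detection_probability(3):.2%}")  # 95.02%
```

With only six trials the observed value carries wide uncertainty, so this comparison is indicative rather than a test of the model.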
Example 6. RNA QASeq quantitation reproducibility and variability.
[0119] The reproducibility of expression level quantitation relative to reference genes in RNA QASeq is evaluated. Total liver RNA is assayed with the breast cancer panel in replicate, and consistent expression levels are observed. See Figure 4.
[0120] Using multiple amplicons per gene in RNA QASeq reduces variability in the quantitated expression level. The standard deviation of the relative expression level across triplicate experiments decreases as the number of amplicons per gene increases from one to five, with the median standard deviation falling from 0.44 to 0.21. Outliers are observed only when a single amplicon is considered. See Figure 5.
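The replicate-variability analysis can be sketched as below, assuming a gene's level is the log2 of the mean molecule count over its first k amplicons and that variability is the sample standard deviation across replicates. Normalization to reference genes is omitted for brevity, and the counts are hypothetical.

```python
import math
from statistics import mean, stdev


def gene_log2_level(amplicon_counts, k: int) -> float:
    """log2 of the mean molecule count over the first k amplicons of a gene."""
    if not 1 <= k <= len(amplicon_counts):
        raise ValueError("k must be between 1 and the number of amplicons")
    return math.log2(mean(amplicon_counts[:k]))


def replicate_sd(replicates, k: int) -> float:
    """Sample SD of the gene-level log2 value across replicates.

    `replicates` holds one list of per-amplicon molecule counts per replicate,
    with amplicons listed in the same order in every replicate.
    """
    return stdev([gene_log2_level(counts, k) for counts in replicates])


reps = [[80, 120], [120, 80], [100, 100]]  # hypothetical triplicate, two amplicons
print(replicate_sd(reps, 1), replicate_sd(reps, 2))
```

The toy data are contrived so that amplicon-level errors cancel exactly; with independent errors, the spread of a k-amplicon mean shrinks only roughly as 1/sqrt(k).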
Example 7. Comparison of RNA QASeq with standard RNA Seq, nanoString, microarray, and RT-qPCR.
[0121] RNA expression levels from RNA QASeq are extensively compared with those from other technologies, including standard RNA Seq, nanoString, microarray, and RT-qPCR, using FFPE RNA from breast cancer and lung cancer tissue. For all methods, expression levels are normalized in the same way relative to the five reference genes; the results for a breast cancer FFPE RNA sample are summarized in Figure 6. RNA QASeq is consistent with standard RNA Seq and nanoString nCounter, whereas microarray shows poor correlation with all of the other methods. RNA QASeq is further compared with these technologies in additional samples. See Figures 7-10. NanoString shows high correlation with RNA QASeq in all samples but requires a much higher RNA input than RNA QASeq. See Figure 7. With nanoString, low-expression species are not observed when starting with 10 ng of RNA instead of the typical 150 ng input. See Figure 11. Microarray shows poor concordance with both RNA QASeq and nanoString in all samples. See Figures 8 and 12. Standard RNA Seq is consistent with RNA QASeq in most samples. See Figure 9. However, because standard RNA Seq is a non-targeted approach, most reads are spent on genes of no interest, and non-uniform coverage makes the quantitation of lowly expressed genes less robust, as observed in two FFPE samples. Finally, RT-qPCR is consistent with UMI-based RNA QASeq quantitation but is limited by its low multiplexing capacity. See Figure 10.
Example 8. Relative RNA expression levels in clinical samples.
[0122] The relative RNA expression levels in four clinical FFPE samples and three healthy placenta FFPE samples are summarized. See Figure 13. Hierarchical clustering indicates that expression patterns are most similar among the healthy placenta samples.
Example 9. Comparison of open-tube and one-pot RNA QASeq workflow protocols.
[0123] RNA QASeq library preparations are performed with two protocols, open-tube and one-pot (see Examples 2 and 3, respectively), on the same cDNA sample, freshly reverse transcribed from human liver RNA. The open-tube and one-pot protocols show high concordance in read uniformity and UMI count, indicating that molecule conversion yield is well preserved and that RNA expression level analysis is robust to the choice of protocol. See Figure 14.

Claims

1. A method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising:
(a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules;
(b) contacting the cDNA molecules with:
(i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence;
(ii) a second set of primers;
(iii) a first DNA polymerase; and
(iv) reagents and buffers needed for DNA polymerase extension to generate a mixture;
(c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension;
(d) removing non-extended UMI Primers to generate a product;
(e) preparing a sequencing library using the product;
(f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and
(g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
2. A method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising:
(a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules;
(b) contacting the cDNA molecules with:
(i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence;
(ii) optionally, a second set of primers;
(iii) a first DNA polymerase; and
(iv) reagents and buffers needed for DNA polymerase extension to generate a mixture;
(c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension;
(d) removing non-extended UMI Primers to generate a product;
(e) preparing a sequencing library using the product;
(f) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and
(g) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
3. A method for quantitating between 1 and 10,000 Target RNA Regions in an RNA sample, the method comprising:
(a) converting RNA molecules in the RNA sample to complementary DNA (cDNA) molecules;
(b) contacting the cDNA molecules with:
(i) a set of Unique Molecular Identifier (UMI) Primers, each UMI Primer of the set of UMI Primers comprising a unique molecular identifier sequence (UMI) and a gene-specific sequence that is complementary to a Target RNA Region subsequence;
(ii) a first DNA polymerase; and
(iii) reagents and buffers needed for DNA polymerase extension to generate a mixture;
(c) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a first product;
(d) contacting the first product with a second set of primers;
(e) subjecting the mixture to temperatures that allow primer binding and DNA polymerase extension to generate a second product;
(f) removing non-extended UMI Primers from the second product to generate a third product;
(g) preparing a sequencing library using the third product;
(h) subjecting the sequencing library to high-throughput DNA sequencing to generate sequencing reads; and
(i) calculating a molecule count for each Target RNA Region based on the UMI from the sequencing reads.
4. The method of claim 2, wherein the second set of primers is added during step (c) instead of during step (b).
5. The method of any one of claims 1-3, wherein the method further comprises: calculating Target RNA transcript expression level based on a mean molecule count of the Target RNA Regions within the same transcript.
6. The method of claim 5, wherein between 1 and 10 of the Target RNA Regions are amplified from one Target RNA transcript.
7. The method of any one of claims 1-3, wherein step (a) comprises the use of a reverse transcription primer.
8. The method of any one of claims 1-3, wherein the second set of primers comprises a Universal Forward Primer and a Universal Reverse Primer.
9. The method of claim 8, wherein the reverse transcription primer comprises at least one degenerate nucleotide.
10. The method of claim 8, wherein the reverse transcription primer comprises a random hexamer.
11. The method of claim 8, wherein the reverse transcription primer comprises a polyT string.
12. The method of any one of claims 1-3, wherein the RNA sample is obtained from a source selected from the group consisting of formalin-fixed paraffin-embedded tissue, whole blood, plasma, and fresh tissue.
13. The method of any one of claims 1-3, wherein the RNA sample is a human RNA sample.
14. The method of any one of claims 1-3, wherein the UMI sequence comprises between 7 and 30 degenerate nucleotides.
15. The method of claim 14, wherein each of the between 7 and 30 degenerate nucleotides is selected from the group consisting of N, B, D, H, V, S, W, Y, R, M, and K.
16. The method of any one of claims 1-3, wherein the high-throughput sequencing comprises sequencing-by-synthesis or nanopore-based sequencing.
17. The method of any one of claims 1-3, wherein the first DNA polymerase is a thermostable DNA polymerase.
18. The method of claim 17, wherein the thermostable DNA polymerase is selected from the group consisting of Taq DNA polymerase, Phusion® DNA polymerase, Q5® DNA polymerase, and KAPA High Fidelity DNA polymerase.
19. The method of any one of claims 1-3, wherein the first DNA polymerase is a nonthermostable DNA polymerase.
20. The method of claim 19, wherein the first DNA polymerase is selected from the group consisting of phi29 DNA polymerase and Bst DNA polymerase.
21. The method of claim 1 or 2, wherein step (d) comprises a purification method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
22. The method of claim 3, wherein step (f) comprises a purification method selected from the group consisting of solid phase reversible immobilization (SPRI) purification, column purification, and enzymatic digestion.
23. The method of claim 1 or 2, wherein step (g) comprises removal of at least one sequencing read wherein the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern prior to the calculating.
24. The method of claim 3, wherein step (i) comprises removal of at least one sequencing read wherein the UMI sequence of the at least one sequencing read does not comprise a predefined UMI degenerate base design pattern prior to the calculating.
25. The method of claim 1 or 2, wherein step (g) comprises removal of a first UMI sequence that differs by only 1 or 2 nucleotides from a second UMI sequence, wherein fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
26. The method of claim 3, wherein step (i) comprises removal of a first UMI sequence that differs by only 1 or 2 nucleotides from a second UMI sequence, wherein fewer sequencing reads contain the first UMI sequence as compared to the second UMI sequence.
27. The method of claim 1 or 2, wherein step (g) comprises:
(i) aligning the sequencing reads to the Target RNA Regions, and grouping the sequencing reads into locus-specific subgroups by the loci they are aligned to;
(ii) for each locus, grouping the sequencing reads by the UMI sequence to generate UMI Families;
(iii) removing UMI families comprising a UMI family size of less than X,
(A) wherein X is set as Y% of the mean value for the largest Z UMI Family size(s) in the Target RNA Region, wherein Y is between 1% and 20% and Z is between 1 and 20; or
(B) wherein X is set as Y% of the mean value for the largest Z UMI Family size(s) for all UMI families with the exact same genotype in the Target RNA Regions, wherein Y is between 1% and 20% and Z is between 1 and 20; or
(C) wherein X is set based on a Gaussian fitting on a histogram of log2(UMI Family size) for the Target RNA Regions, wherein log2(X) is (the center of the histogram peak) - (3 x the standard deviation of the histogram); or
(D) wherein X is set as a fixed number between 2 and 20;
(iv) counting the number of unique UMI sequences (N); and
(v) dividing N by a barcoding yield, to identify the number of original molecules of the Target RNA Regions, wherein the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
28. The method of claim 3, wherein step (i) comprises:
(i) aligning the sequencing reads to the Target RNA Regions, and grouping the sequencing reads into locus-specific subgroups by the loci they are aligned to;
(ii) for each locus, grouping the sequencing reads by the UMI sequence to generate UMI Families;
(iii) removing UMI families comprising a UMI family size of less than X,
(A) wherein X is set as Y% of the mean value for the largest Z UMI Family size(s) in the Target RNA Region, wherein Y is between 1% and 20% and Z is between 1 and 20; or
(B) wherein X is set as Y% of the mean value for the largest Z UMI Family size(s) for all UMI families with the exact same genotype in the Target RNA Regions, wherein Y is between 1% and 20% and Z is between 1 and 20; or
(C) wherein X is set based on a Gaussian fitting on a histogram of log2(UMI Family size) for the Target RNA Regions, wherein log2(X) is (the center of the histogram peak) - (3 x the standard deviation of the histogram); or
(D) wherein X is set as a fixed number between 2 and 20;
(iv) counting the number of unique UMI sequences (N); and
(v) dividing N by a barcoding yield, to identify the number of original molecules of the Target RNA Regions, wherein the barcoding yield for each Target RNA Region is calibrated using a sample with a known molecule count.
29. The method of claim 1 or 2, wherein step (g) further comprises identifying a UMI Family Sequence.
30. The method of claim 3, wherein step (i) further comprises identifying a UMI Family Sequence.
31. The method of any one of claims 1-3, wherein the UMI Primers comprise from 5' to 3':
(i) a first universal region;
(ii) an optional second region comprising a length between 1 and 50 nucleotides;
(iii) a third region comprising a UMI sequence; and
(iv) a fourth region targeting a specific genomic region.
32. The method of any one of claims 1-3, wherein:
(i) step (b) further comprises introduction of a set of Outer Primers; and
(ii) wherein the second set of primers comprises a set of Inner Primers, wherein between 3 and 20 nucleotides at the 3' end of the Inner Primer are not subsequences of the Outer Primers.
33. The method of any one of claims 1-3, wherein the between 1 and 10,000 Target RNA Regions are positioned within one or more genes selected from the group consisting of: BAG1, PGR, ERBB2, GRB7, TFRC, MKI67, MMP11, MYBL2, BIRC5, ESR1, CD68, CTSV, BCL2, CCNB1, GUSB, SCUBE2, RPLP0, ACTB, AURKA, and GAPDH.
34. The method of any one of claims 1-3, wherein the RNA sample comprises less than or equal to 25 ng of RNA.
35. The method of any one of claims 1-3, wherein the RNA sample comprises less than or equal to 10 ng of RNA.
36. The method of any one of claims 1-3, wherein at least five of the Target RNA Regions are positioned within a single Target RNA transcript.
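The read-level UMI processing recited in claims 14-15, 23, 25 and 27 can be strung together for a single Target RNA Region, as in the Python sketch below. It is an illustration under stated assumptions, not the applicant's implementation: reads are assumed to be already aligned and grouped by locus (step (i) of claim 27), only threshold option (A) is implemented, the defaults Y = 10% and Z = 5 are arbitrary picks from the claimed ranges, the barcoding yield is supplied rather than calibrated, and all function names are hypothetical.

```python
import math
from collections import Counter

# IUPAC degenerate base codes (claim 15) mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "S": "CG", "W": "AT", "Y": "CT", "R": "AG", "M": "AC", "K": "GT",
}


def matches_pattern(umi: str, pattern: str) -> bool:
    """True if every UMI base is allowed by the degenerate design pattern."""
    return len(umi) == len(pattern) and all(b in IUPAC[p] for b, p in zip(umi, pattern))


def umi_diversity(pattern: str) -> int:
    """Number of distinct UMI sequences the design pattern can encode."""
    return math.prod(len(IUPAC[p]) for p in pattern)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def remove_umi_errors(families: dict, max_mismatch: int = 2) -> dict:
    """Drop a UMI that differs by 1..max_mismatch bases from an already kept
    UMI supported by strictly more reads (cf. claim 25)."""
    kept = {}
    for umi, n in sorted(families.items(), key=lambda kv: (-kv[1], kv[0])):
        is_error = any(
            m > n and len(k) == len(umi) and hamming(k, umi) <= max_mismatch
            for k, m in kept.items()
        )
        if not is_error:
            kept[umi] = n
    return kept


def family_size_cutoff(sizes, y_percent: float = 10.0, z: int = 5) -> float:
    """Option (A) of claim 27: X = Y% of the mean of the Z largest family sizes."""
    top = sorted(sizes, reverse=True)[:z]
    return y_percent / 100.0 * sum(top) / len(top)


def count_molecules(umis, pattern: str, barcoding_yield: float,
                    y_percent: float = 10.0, z: int = 5) -> float:
    """Estimated original molecules for one Target RNA Region from the UMIs
    of the reads aligned to it."""
    families = Counter(u for u in umis if matches_pattern(u, pattern))  # claim 23 filter
    families = remove_umi_errors(families)                               # claim 25
    if not families:
        return 0.0
    x = family_size_cutoff(families.values(), y_percent, z)             # claim 27 (iii)(A)
    n_unique = sum(1 for size in families.values() if size >= x)        # claim 27 (iv)
    return n_unique / barcoding_yield                                    # claim 27 (v)


reads = (["AAAAAAAA"] * 50 + ["AAAAAAAT"] * 2 + ["CCCCCCCC"] * 40
         + ["GGGGGGGG"] * 1 + ["ACGTNACG"] * 10)  # hypothetical UMIs for one locus
print(count_molecules(reads, "NNNNNNNN", barcoding_yield=0.5))  # 4.0
```

In this toy input the UMI containing an N fails the pattern check, AAAAAAAT is removed as a one-base error of the much larger AAAAAAAA family, and the single-read GGGGGGGG family falls below the cutoff, leaving two molecules that are scaled up by the assumed 50% yield. Comparing each UMI only against UMIs already kept, rather than all observed UMIs, is one of several reasonable design choices.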
PCT/US2022/078978 2021-11-01 2022-10-31 Rna quantitative amplicon sequencing for gene expression quantitation WO2023077121A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163274270P 2021-11-01 2021-11-01
US63/274,270 2021-11-01

Publications (1)

Publication Number Publication Date
WO2023077121A1 true WO2023077121A1 (en) 2023-05-04

Family

ID=84365649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/078978 WO2023077121A1 (en) 2021-11-01 2022-10-31 Rna quantitative amplicon sequencing for gene expression quantitation

Country Status (1)

Country Link
WO (1) WO2023077121A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170067090A1 (en) * 2014-05-19 2017-03-09 William Marsh Rice University Allele-specific amplification using a composition of overlapping non-allele-specific primer and allele-specific blocker oligonucleotides
WO2019164885A1 (en) * 2018-02-20 2019-08-29 William Marsh Rice University Systems and methods for allele enrichment using multiplexed blocker displacement amplification
WO2020041702A1 (en) * 2018-08-24 2020-02-27 Swift Biosciences, Inc. Asymmetric targeted amplification methods
WO2020142631A2 (en) * 2019-01-04 2020-07-09 William Marsh Rice University Quantitative amplicon sequencing for multiplexed copy number variation detection and allele ratio quantitation

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
"McGraw-Hill Dictionary of Scientific and Technical Terms", 2002, MCGRAW-HILL
"Oxford Dictionary of Biology", 2008, OXFORD UNIVERSITY PRESS
"The American Heritage® Science Dictionary", 2011
ALTSCHUL ET AL.: "Basic local alignment search tool", J. MOL. BIOL., vol. 215, 1990, pages 403 - 410, XP002949123, DOI: 10.1006/jmbi.1990.9999
CHENNA ET AL.: "Multiple sequence alignment with the Clustal series of programs", NUCLEIC ACIDS RESEARCH, vol. 31, 2003, pages 3497 - 3500, XP002316493, DOI: 10.1093/nar/gkg500
LARKIN MA ET AL.: "Clustal W and Clustal X version 2.0", BIOINFORMATICS, vol. 23, 2007, pages 2947 - 48
SMITHWATERMAN, ADV. APPL. MATH., vol. 2, 1981, pages 482 - 489
THOMPSON ET AL.: "Clustal W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", NUCLEIC ACIDS RESEARCH, vol. 22, 1994, pages 4673 - 4680, XP002956304
WU LUCIA ET AL: "Ensemble of Nucleic Acid Absolute Quantitation Modules for Accurate Copy Number Variation Detection and Targeted RNA Profiling", RESEARCH SQUARE, 20 September 2021 (2021-09-20), XP093022744, Retrieved from the Internet <URL:https://www.researchsquare.com/article/rs-923491/v1.pdf?c=1649156930000> [retrieved on 20230210], DOI: 10.21203/rs.3.rs-923491/v1 *
WU LUCIA RUOJIA ET AL: "Ensemble of nucleic acid absolute quantitation modules for copy number variation detection and RNA profiling", NATURE COMMUNICATIONS, vol. 13, no. 1, 4 April 2022 (2022-04-04), XP093022712, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-022-29487-y.pdf> DOI: 10.1038/s41467-022-29487-y *
ZHANGMADDEN, GENOME RES., vol. 7, 1997, pages 649 - 656

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22814270

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE