WO2014201273A1 - High-throughput rna-seq - Google Patents

High-throughput RNA-seq

Info

Publication number
WO2014201273A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sequence
cells
cdna
nucleotides
Prior art date
Application number
PCT/US2014/042159
Other languages
French (fr)
Inventor
Tarjei MIKKELSEN
Magali SOUMILLON
Original Assignee
The Broad Institute, Inc.
President And Fellows Of Harvard College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Broad Institute, Inc. and President And Fellows Of Harvard College
Priority to US14/898,030 (US20160122753A1)
Publication of WO2014201273A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1065: Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation

Definitions

  • the present invention relates generally to methods for single-cell nucleic acid profiling, and nucleic acids useful in those methods.
  • it concerns using barcode sequences to track individual nucleic acids at single-cell resolution, utilizing template switching and sequencing reactions to generate the nucleic acid profiles.
  • the methods and compositions provided herein are also applicable to other starting materials, such as cell and tissue lysates or extracted/purified RNA.

Background of the Invention

  • transcriptome profiling is an important method for functional characterization of cells and tissues.
  • current technical limitations for whole transcriptome analysis limit the technique to either population averages or to a limited number of single cells.
  • These shortcomings limit the ability of transcriptome profiling to accurately assess stochastic variation in gene expression between individual cells and to analyze distinct subpopulations of cells, both of which have been proposed to be important factors driving cellular differentiation and tissue homeostasis.
  • current single-cell transcriptome profiling methods, in addition to being limited to a relatively low number of cells, are also expensive and labor-intensive. Improved methods are therefore required to fully characterize a cell population at single-cell resolution. Such improved methods also have utility in improving analysis of other starting materials, such as cell and tissue lysates or extracted/purified RNA.
  • the invention provides a nucleic acid comprising a 5' poly-isonucleotide sequence (for example, comprising an isocytosine, an isoguanosine, or both, such as an isocytosine-isoguanosine-isocytosine sequence), an internal adapter sequence, and a 3' guanosine tract.
  • the 3' guanosine tract can comprise two guanosines, three guanosines, four guanosines, five guanosines, six guanosines, seven guanosines, or eight guanosines.
  • the 3' guanosine tract comprises three guanosines.
  • the adapter sequence can be 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., an adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)).
  • the invention provides a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine.
  • the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length (e.g., an internal adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)).
  • the barcode sequence is 4 to 20 nucleotides in length, for example, 6 nucleotides in length. In certain embodiments, the UMI sequence is six to 20 nucleotides in length, for example, ten nucleotides in length. In some
  • the complementarity sequence is a poly(T) sequence, and may be 20 to 40 nucleotides in length, for example, 30 nucleotides in length.
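The component order described above (adapter, barcode, UMI, poly(T), 3' dinucleotide) can be sketched as a degenerate oligo string. This is a minimal illustration under stated assumptions, not the applicants' design procedure: the function names, the example barcode, and the use of IUPAC codes (`N` for any base, `V` for A, C, or G) are hypothetical choices, and the 5' blocking group is a chemical modification that a plain string cannot represent. The adapter shown is the sequence quoted in the text as SEQ ID NO: 1.

```python
# Sketch: lay out a cDNA synthesis primer as a degenerate (IUPAC) string.
# All names and the example barcode are hypothetical illustrations.
ADAPTER = "ACACTCTTTCCCTACACGACGC"  # sequence quoted in the text (SEQ ID NO: 1)


def build_primer(barcode: str, adapter: str = ADAPTER,
                 umi_len: int = 10, poly_t_len: int = 30) -> str:
    """Return adapter + barcode + UMI + poly(T) + 3' dinucleotide."""
    if not barcode or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be a non-empty string over A, C, G, T")
    umi = "N" * umi_len        # random bases, intended to differ per molecule
    poly_t = "T" * poly_t_len  # complementarity sequence for the poly(A) tail
    anchor = "VN"              # first base A, C, or G (never T); second base any
    return adapter + barcode + umi + poly_t + anchor


def anchor_dinucleotides() -> list:
    """Enumerate every concrete 3' dinucleotide allowed by the 'VN' rule."""
    return [first + second for first in "ACG" for second in "ACGT"]


primer = build_primer("ACGTAC")  # hypothetical 6-nt well barcode
```

Excluding thymine from the first anchor position gives 3 x 4 = 12 possible dinucleotides, none of which can pair entirely within the poly(A) tail.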
  • the invention provides a kit comprising one or more nucleic acids as described above, for example a) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, b) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine,
  • the kit comprises a plurality of the nucleic acids of b).
  • the UMI sequence of each nucleic acid in the plurality of nucleic acids is unique among the nucleic acids in the kit, and in still further embodiments, the plurality of nucleic acids comprises different populations of nucleic acid species.
  • each population of nucleic acid species may comprise a different barcode sequence that uniquely identifies a single population of nucleic acid species.
  • each population of nucleic acid species is in a separate container, and the bar code of each population of nucleic acid species differs by at least two nucleotides from the bar code of each other population of nucleic acid species.
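The requirement that barcodes differ from one another by at least two nucleotides can be checked with a pairwise Hamming distance. The sketch below is one way to do that; the function names and the example barcodes are hypothetical and are not taken from the application.

```python
# Sketch: verify that every pair of barcodes differs at two or more positions.
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def violating_pairs(barcodes, min_distance: int = 2):
    """Return the barcode pairs that are closer than min_distance."""
    return [(a, b) for a, b in combinations(barcodes, 2)
            if hamming(a, b) < min_distance]


# Hypothetical 6-nt barcodes; the last one is a single mismatch from the first.
example = ["AACCGG", "AAGGCC", "TTCCGG", "AACCGT"]
bad = violating_pairs(example)
```

With a minimum distance of two, a single sequencing error in a barcode cannot turn it into another valid barcode, so the error can be detected rather than misassigned.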
  • a kit of the invention may further comprise a third nucleic acid primer comprising 12 to 32 nucleotides (e.g., 22 nucleotides in length) and a 5' blocking group (e.g., biotin or an inverted nucleotide).
  • An exemplary sequence of such a primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2).
  • a kit may further comprise a nucleic acid comprising a barcode sequence, and optionally also comprise a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
  • the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length.
  • An exemplary sequence of a phosphorothioate bond-containing nucleic acid is
  • the kit further comprises a capture plate and/or a reverse transcriptase enzyme, such as a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase (e.g., SMARTscribe™ reverse transcriptase, SuperScript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase).
  • other kit components named include a DNA purification column, such as a DNA purification spin column, and a protease or proteinase, e.g., proteinase K.
  • the invention provides a method for gene profiling, comprising a) providing a plurality of single cells; b) releasing mRNA from each single cell to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell; c) reverse transcribing the individual mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA produced from the separate cells; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments.
  • the invention provides a method for gene profiling, comprising a) providing an isolated population of cells; b) releasing mRNA from the population of cells to provide one or more mRNA samples; c) reverse transcribing the one or more mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments.
  • the method further comprises separating a population of cells (e.g., by flow cytometry) to provide the plurality of single cells, for example, by separating them into a capture plate.
  • a population of cells can be sorted into a capture plate such that each well of the capture plate contains a smaller population of cells.
  • cell lysate or RNA samples can be divided into a capture plate.
  • the mRNA is released by cell lysis, for example, by freeze-thawing and/or contacting the cells with proteinase K.
  • c) comprises contacting each individual mRNA sample with one or more nucleic acids as described above, for example i) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, ii) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from aden
  • c) is carried out with a reverse transcriptase enzyme, for example, a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase such as SMARTscribe™ reverse transcriptase, SuperScript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase.
  • the cDNA purification of d) is carried out with a Zymo-Spin™ column.
  • the method further comprises treating the barcoded cDNA with an exonuclease, such as with Exonuclease I.
  • the amplification of e) utilizes an amplification primer comprising a 5' blocking group, such as biotin or an inverted nucleotide.
  • amplification primers are 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., as in the amplification primer having the sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2)).
  • the purification of f) may be carried out with magnetic beads, e.g., Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880), and/or may further comprise quantifying the purified cDNA.
  • the single cells are provided in a capture plate of individual wells (e.g., a 384 well plate), each well comprising a single cell.
  • a population of cells is provided in a capture plate, each well comprising a population of cells.
  • cell lysate or RNA samples can be provided in a capture plate.
  • where reference is made to a particular sample, such as a sample in a well of a plate, that sample may be a single cell or some other sample, such as a lysate or bulk RNA.
  • reference to a "well" or "sample" should be understood to refer to any of those types of samples.
  • reference to "cell/well" or "well/cell" is similarly used to reflect that a sample may be a single cell or some other sample.
  • the fragmentation of g) utilizes a transposase, and may further utilize a first fragmentation nucleic acid and a second fragmentation nucleic acid, wherein the first fragmentation nucleic acid comprises a barcode sequence.
  • An exemplary first fragmentation nucleic acid is 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3' (SEQ ID NO: 4), wherein [i7] represents a barcode sequence.
  • the [i7] sequence is four to 16 nucleotides in length, for example, eight nucleotides in length.
  • the [i7] sequence uniquely identifies a single population of nucleic acid species, for example, a population of nucleic acid species derived from a population of single cells from a capture plate.
  • the [i7] sequence is selected from: TCGCCTTA (SEQ ID NO: 5), CTAGTACG (SEQ ID NO: 6), TTCTGCCT (SEQ ID NO: 7), GCTCAGGA (SEQ ID NO: 8), AGGAGTCC (SEQ ID NO: 9), CATGCCTA (SEQ ID NO: 10), GTAGAGAG (SEQ ID NO: 11), CCTCTCTG (SEQ ID NO: 12), AGCGTAGC (SEQ ID NO: 13), CAGCCTCG (SEQ ID NO: 14), TGCCTCTT (SEQ ID NO: 15), and TCCTCTAC (SEQ ID NO: 16).
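The [i7] placeholder can be read as a simple substitution: each listed barcode is inserted between the two fixed flanks of the exemplary first fragmentation nucleic acid. The sketch below illustrates this; the function and variable names are hypothetical, the flanks are taken from the exemplary sequence quoted in the text, and the length check mirrors the stated range of four to 16 nucleotides.

```python
# Sketch: substitute each listed [i7] barcode into the exemplary first
# fragmentation nucleic acid. Names are hypothetical illustrations.
PREFIX = "CAAGCAGAAGACGGCATACGAGAT"  # 5' flank of the exemplary sequence
SUFFIX = "GTCTCGTGGGCTCGG"           # 3' flank of the exemplary sequence

I7_BARCODES = [
    "TCGCCTTA", "CTAGTACG", "TTCTGCCT", "GCTCAGGA",
    "AGGAGTCC", "CATGCCTA", "GTAGAGAG", "CCTCTCTG",
    "AGCGTAGC", "CAGCCTCG", "TGCCTCTT", "TCCTCTAC",
]


def fragmentation_oligo(i7: str) -> str:
    """Return the full oligo with the given barcode in the [i7] position."""
    if not 4 <= len(i7) <= 16:
        raise ValueError("[i7] must be 4 to 16 nucleotides long")
    if set(i7) - set("ACGT"):
        raise ValueError("[i7] must contain only A, C, G, T")
    return PREFIX + i7 + SUFFIX


oligos = {i7: fragmentation_oligo(i7) for i7 in I7_BARCODES}
```

With the eight-nucleotide barcodes listed, each assembled oligo is 47 nucleotides long (24 + 8 + 15).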
  • the barcode sequence of the first fragmentation nucleic acid is different than the barcode sequence of the nucleic acid described in ii) above.
  • the barcode sequence of the first fragmentation nucleic acid uniquely identifies a predetermined subset of cells, for example, a subset of cells contained in individual wells of a single capture plate. In further embodiments, the barcode sequence that uniquely identifies the predetermined subset of cells uniquely identifies the capture plate. In certain embodiments, the barcode sequence of the nucleic acid as described in ii) above uniquely identifies the cell within the predetermined subset of cells, which cell comprised the mRNA from which the barcoded cDNA of c) was produced. In further embodiments, the barcode sequence that uniquely identifies the cell within the predetermined subset of cells uniquely identifies an individual well in a capture plate, and in still further embodiments, the
  • the barcode sequence of the first fragmentation nucleic acid is 4 to 20 nucleotides in length, for example, 6 nucleotides in length.
  • the second fragmentation nucleic acid is a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
  • An exemplary second fragmentation nucleic acid is 48 to 68 nucleotides in length, e.g., 58 nucleotides in length, such as a second fragmentation nucleic acid with a sequence of 5'-
  • the purification of h) is carried out with magnetic beads, and may optionally further comprise separating the magnetic-bead purified cDNA on an agarose gel, excising cDNA corresponding to 300 to 800 nucleotides in length, and purifying the excised cDNA.
  • h) further comprises quantifying the purified cDNA.
  • the sequencing of i) is carried out using RNA-seq.
  • the method further comprises assembling a database of the sequences of the sequenced cDNA fragments of j), and may additionally comprise identifying the UMI sequences of the sequences of the database.
  • j) further comprises discounting duplicate sequences that share a UMI sequence, thereby assembling a set of sequences in which each sequence is associated with a unique UMI.
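Discounting duplicates that share a UMI amounts to counting distinct UMIs rather than reads. The following is a minimal sketch, assuming each read has already been assigned a cell barcode, a UMI, and a gene (the gene assignment would come from alignment, which is not shown). The function name and the example reads are hypothetical, and UMIs are compared by exact match only; handling of sequencing errors within UMIs is outside this sketch.

```python
# Sketch: collapse reads that share a UMI so each original molecule counts once.
from collections import defaultdict


def collapse_by_umi(reads):
    """reads: iterable of (cell_barcode, umi, gene) tuples.

    Returns {(cell_barcode, gene): number of distinct UMIs}.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads for illustration only.
reads = [
    ("AAACCC", "ACGTACGTAC", "LPL"),
    ("AAACCC", "ACGTACGTAC", "LPL"),    # duplicate of the read above
    ("AAACCC", "TTTTGGGGCC", "LPL"),
    ("GGGTTT", "ACGTACGTAC", "LPL"),    # same UMI, but a different cell
    ("AAACCC", "CCCCAAAATT", "FABP4"),
]
counts = collapse_by_umi(reads)
```

Keying on the cell barcode as well as the gene keeps identical UMIs from different cells separate.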
  • a) through h) are repeated before i) to produce a plurality of populations of cDNA fragments, and in particular embodiments, the populations of cDNA fragments are combined prior to i).
  • the barcode sequence of the first fragmentation nucleic acid and the barcode sequence of the nucleic acid as described in ii) above are used to correlate the sequencing data with the predetermined subset of cells and the individual cell.
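The two barcode levels act as a two-part key: the barcode of the first fragmentation nucleic acid selects the plate (the predetermined subset of cells), and the barcode from the nucleic acid of ii) selects the well, and therefore the cell, within that plate. A minimal lookup sketch follows; the function name, plate names, well names, and well barcodes are hypothetical, and the two i7 sequences are taken from the list given in the text.

```python
# Sketch: resolve a read to (plate, well) from its two barcodes.
def assign_cell(i7, well_barcode, plates, wells):
    """Return (plate, well), or None if either barcode is unknown."""
    plate = plates.get(i7)
    well = wells.get(well_barcode)
    if plate is None or well is None:
        return None
    return (plate, well)


# Plate and well names are hypothetical; well barcodes are illustrative 6-mers.
PLATES = {"TCGCCTTA": "plate_1", "CTAGTACG": "plate_2"}
WELLS = {"AACCGG": "A1", "TTGGCC": "A2"}

hit = assign_cell("CTAGTACG", "AACCGG", PLATES, WELLS)
miss = assign_cell("CTAGTACG", "GGGGGG", PLATES, WELLS)
```

Because the same set of well barcodes can be reused on every plate, the number of distinguishable cells is the product of the number of plate barcodes and the number of well barcodes.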
  • Figure 1 depicts incomplete differentiation of human adipose tissue-derived stromal/stem cells (hASCs) in vitro.
  • Figure 1A: cells at day 0.
  • Figure 1B: cells at day 7 (i.e., on the seventh day after the cells were induced to differentiate).
  • Figure 1C: cells at day 14 (i.e., on the fourteenth day after the cells were induced to differentiate).
  • Figure 2 depicts a flow chart of an exemplary method for single cell RNA sequencing.
  • Figure 3 depicts how a single cell digital gene expression library was constructed, including barcode sequences incorporating sequencing primer sequences, indicated by arrows, and regions that anneal to their complementary oligonucleotides on a flow cell during sequencing (P5 and P7).
  • N6: cell/well barcode index.
  • N10: Unique Molecular Identifier (UMI).
  • the sequencing primer with an i7 plate index is indicated by an arrow, and the two sequencing primers (read 1 and read 2) also are indicated by arrows.
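Given the N6 barcode and N10 UMI shown in Figure 3, the first read can be split by position. The sketch below assumes the barcode occupies the first six bases of read 1 and the UMI the next ten, following the order in which the components are listed in the text; that offset layout, the function name, and the example read are assumptions for illustration.

```python
# Sketch: split read 1 into the well barcode (N6) and the UMI (N10).
# Assumes the barcode comes first and the UMI immediately follows it.
from typing import NamedTuple


class Read1Tags(NamedTuple):
    well_barcode: str
    umi: str


def split_read1(seq: str, barcode_len: int = 6, umi_len: int = 10) -> Read1Tags:
    """Return the leading barcode and UMI from a read 1 sequence."""
    needed = barcode_len + umi_len
    if len(seq) < needed:
        raise ValueError(f"read 1 must be at least {needed} bases long")
    return Read1Tags(seq[:barcode_len], seq[barcode_len:needed])


# Hypothetical read: 6-nt barcode, 10-nt UMI, then the start of the poly(T).
tags = split_read1("AACCGGACGTACGTACTTTTTTTT")
```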
  • Figure 4 depicts a reduction in PCR bias through the use of Unique Molecular Identifier (UMI) sequences.
  • Figure 5 depicts distributions of expression levels of the key marker genes FABP4 (Figure 5A), SCD (Figure 5B), LPL (Figure 5C), and POSTN (Figure 5D) during adipocyte differentiation. In particular, Figure 5 depicts the expression level of each gene across the cells/wells over time, such that the position on the y axis shows the level of expression and the thickness of the bar shows the number of cells expressing at that level.
  • Figure 6 depicts gene detection in single cells. Approximately 3,000 to 4,000 unique genes were detected per cell and approximately 15,000 unique genes were detected across all cells. Gene expression was reliably detected at approximately 25 to 50 transcripts per cell, although bursty transcription
  • Figure 7 depicts GAPDH detection at day 0.
  • Figure 7A depicts a histogram showing the distribution of GAPDH expression among cells profiled at day 0 as an exemplification of a transcriptional burst.
  • Figure 7B depicts genes associated with GAPDH.
  • Figure 7C provides a pictorial representation of the cell cycle. GAPDH is considered to be a housekeeping gene and often is used as a reference gene for normalization.
  • Figure 8 depicts principal component analysis of an hASC population at day 0.
  • Figure 9 depicts principal component analysis of an hASC population at day 0 (black) and day 1 (gray).
  • Figure 10 depicts principal component analysis of an hASC population at day 0 (black) and day 2 (gray).
  • Figure 11 depicts principal component analysis of an hASC population at day 0 (black) and day 3 (gray).
  • Figure 12 depicts principal component analysis of an hASC population at day 0 (black) and day 7 (gray).
  • Figure 13 depicts principal component analysis of an hASC population at day 0 (black) and day 14 (gray).
  • Figure 14 depicts differentially expressed genes between day 0 (black) and day 14 (gray) hASC populations and between day 14 sub-populations.
  • Figure 15 depicts the expression of adipocyte genes correlating with G1 arrest. Genes that had similar expression levels at Day 14 and Day 0 (Figure 15A, label A) correspond to categories of genes involved in G1 arrest (Figure 15B, label A), indicating that those cells that did not fully differentiate may be stuck in the G0 phase. This reveals a correlation between differentiation state and cell cycle progression when gene expression is analyzed at the single cell level.
  • Figure 16 depicts the process of adipocyte differentiation in mouse (3T3-L1) and human (hASC) stem cells, and that an absence of clonal expansion of hASCs may limit adipogenesis.
  • Figure 17 depicts cell culture heterogeneity using single-cell sequencing.
  • Figure 17A depicts gene expression estimates from bulk cells compared to their corresponding means across single cell profiles.
  • UPM: unique molecular identifier (UMI) counts for one gene per million UMI counts for all genes.
  • Figures 17C and 17D depict single cell qRT-PCR validation and single molecule FISH validation, respectively, of the observed positive correlation between the LPL and G0S2 markers from separate cells also collected at day 7.
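The UPM unit defined above is a per-million normalization of UMI counts: for each gene, its UMI count is divided by the total UMI count across all genes and multiplied by one million. A minimal sketch follows; the function name and the example counts are hypothetical.

```python
# Sketch: convert per-gene UMI counts for one cell/sample into UPM.
def to_upm(umi_counts):
    """umi_counts: {gene: UMI count}. Returns {gene: UMIs per million}."""
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("no UMIs counted; UPM is undefined")
    return {gene: count * 1_000_000 / total
            for gene, count in umi_counts.items()}


# Hypothetical counts for illustration only.
upm = to_upm({"LPL": 1, "G0S2": 3})
```

Normalizing to a common total makes expression values comparable between cells that yielded different numbers of UMIs.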
  • Figure 18 depicts a comparison of RefSeq gene expression levels as estimated from the total number of raw aligned sequencing reads or the total number of unique UMIs. Each dot compares the mean raw counts across all profiled cells in the first time course (D1) to the mean UMI counts for the same gene. The raw and UMI counts are strongly correlated, but the UMI counts correct for a systematic bias in the raw expression levels of a subset of genes, which is likely caused by preferential PCR amplification or sequencing.
  • Figure 20 depicts a comparison of single-molecule RNA sequencing (Figure 20A) and single molecule FISH (smFISH, Figure 20B) data for LPL and G0S2 during the D3 time course.
  • Single-molecule RNA sequencing values are in UPM, while smFISH measurements are in mRNAs detected per cell.
  • the smFISH data confirm the positive correlation between LPL and G0S2 after 7 days of differentiation.
  • R: Pearson's correlation coefficient.
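Pearson's correlation coefficient, as used for the LPL and G0S2 comparison, can be computed directly from paired per-cell values. The sketch below uses only the standard library; the function name is hypothetical and the numbers are made-up placeholders, not data from the application.

```python
# Sketch: Pearson's correlation coefficient for paired measurements.
import math


def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: a perfectly linear relationship.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```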
  • Figure 21 depicts gene expression dynamics at single cell resolution. Each scatter plot depicts the first three principal components (PCs) of the initial hASC time course at the indicated time point (Figure 21A: day 0; Figure 21B: day 1; Figure 21C: day 2; Figure 21D: day 3; Figure 21E: day 5; Figure 21F: day 7; Figure 21G: day 9; Figure 21H: day 14). Black dots show cells collected at the indicated time point, while gray dots show cells collected at all previous time points.
  • Figure 21I depicts separately sorted cells with high and low lipid content from day 14 projected into the same PC space.
  • Figure 22 depicts distributions of weights for the top four PCs in an initial hASC time course and a lipid-based sorting.
  • selected genes and gene sets associated with positive and negative weights are provided. Percentages indicate the ratio of the total variance in the data set captured by each PC. Horizontal lines within each set of boxes indicate medians, boxes indicate the 1st and 3rd quartiles, and whiskers indicate the ranges.
  • the present invention provides nucleic acids, kits, and methods for transcriptome-wide profiling at single cell resolution.
  • the invention provides Unique Molecular Identifiers (UMIs) (e.g., polynucleotides comprising UMIs) that specifically tag individual cDNA species as they are created from mRNA, thereby acting as a robust guard against amplification biases.
  • Each UMI enables a sequenced cDNA to be traced back to a single particular mRNA molecule that was present in a cell.
  • the invention provides two levels of barcode-based multiplexing, allowing a sequenced cDNA to be traced to a particular cell from among a subset of cells.
  • the invention provides efficient transposon-based fragmentation, resulting in high yield cDNA libraries.
  • the invention provides sequencing of the 3'-end of mRNAs, limiting the sequencing coverage required to assess the gene expression level of each single cell transcriptome.
  • the methods allow the preparation of RNA-seq libraries in a manner that is not labor-intensive or time-consuming. Indeed, RNA-seq libraries of a thousand single cells can be easily prepared in two days. Any of the foregoing (or any of the nucleic acids, reagents, kits, and methods described herein) may be provided and/or used alone or in any combination.
  • the invention also provides nucleic acids, kits, and methods for sequencing of extracted/purified RNA (bulk RNA sequencing) or for analysis of an isolated population of cells (e.g., from an isolated population of cells or a tissue; analysis of a cell or tissue lysate).
  • any of the compositions, reagents, and methods described herein as applicable to single cells also are applicable to other sources of starting materials, such as extracted RNA, purified RNA, cell lysates, or tissue lysates, and such application is contemplated.
  • the present invention provides improved nucleic acids, kits, and methods capable of transcriptome-wide profiling at single cell resolution of tens of thousands of cells simultaneously and cost-effectively (approximately $2 per sample, as compared to approximately $80 per sample with a current method).
  • the methods and kits may include both customized nucleic acids and/or method steps that are themselves the subject of this application, as well as one or more commercially available reagents, kits, apparatuses, or method steps.
  • the methods of the invention provide a number of distinct advantages over existing methods. Some current methods require a polyA addition step prior to sequencing, but this step can be eliminated through the use of a Moloney Murine Leukemia Virus reverse transcriptase.
  • full-length cDNA amplification can be carried out using the suppression PCR principle, thereby enriching full length cDNAs, and the method can be applied directly to cells rather than requiring
  • the methods of the invention also provide an advantage in that they utilize at least two barcode sequences rather than one, allowing for the
  • the methods of the invention provide an advantage over current methods targeting the 3' end of mRNA that use linear mRNA amplification.
  • Linear mRNA amplification is time-consuming compared to template switching/suppression PCR amplification.
  • Linear mRNA amplification also is labor-intensive and limits the number of cells that can be processed to approximately 50 cells per day by a single person.
  • the methods of the invention can accommodate 384 cells in a single plate, allowing a single person to easily process up to 1152 cells per day.
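The throughput figures above can be related with simple arithmetic. The sketch below uses only the numbers stated in the text (384 cells per plate, up to 1152 cells per day, and approximately 50 cells per day for linear mRNA amplification); reading 1152 as three full plates is an inference from 1152 / 384, and the variable names are hypothetical.

```python
# Sketch: relate the stated throughput figures to one another.
WELLS_PER_PLATE = 384          # cells per plate, as stated in the text
CELLS_PER_DAY_PLATE = 1152     # stated daily maximum for one person
CELLS_PER_DAY_LINEAR = 50      # approximate figure for linear amplification

# Inferred: the daily maximum corresponds to a whole number of plates.
plates_per_day, leftover = divmod(CELLS_PER_DAY_PLATE, WELLS_PER_PLATE)

# Approximate fold increase in cells processed per person per day.
fold_increase = CELLS_PER_DAY_PLATE / CELLS_PER_DAY_LINEAR
```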
  • the use of UMIs also provides a distinct advantage over typical single-cell RNA-seq methods. Because of the very low starting amount of RNA in a single cell, several amplification steps are required during RNA-seq library preparation, and the UMIs protect against amplification biases.
  • the methods of the invention utilizing a transposase-based sequencing library preparation have the added advantage of eliminating a number of labor-intensive and costly steps in library preparation, including magnetic bead immobilization, separate fragmentation, end repair, dA-tailing, and adaptor ligation.
  • By eliminating the separate steps of chemical fragmentation and its purification, end repair, dA-tailing and adapter ligation, labor and cost are reduced, and the yield is much higher than with other techniques because there are fewer purification steps (during which material can be lost) and because this method to tag the fragment is much more efficient than by ligation with a regular ligase. Because less material is lost in the process, the methods of the invention can start with a much lower amount of starting cDNA.
  • the invention provides methods that are advantageous based on a number of improvements to existing methods.
  • a typical method provided by the invention is depicted in Figure 2, and starts with preparing a capture plate for cell sorting. Cells are then sorted into the plate (e.g., by fluorescence activated cell sorting), after which the plate may be frozen down for storage. For single cell analysis, one cell is sorted into each well of the plate.
  • One advantage of the nucleic acids provided herein is that the use of various barcodes permits the end user to correlate transcript expression back to a particular well and plate, and thus to a specific cell evaluated.
  • the plate can, in certain embodiments, be thawed from its frozen state.
  • the cells can be lysed using a proteinase or protease, such as proteinase K.
  • where the starting material is already RNA, the cell sorting and individual cell lysis steps can be skipped.
  • where the starting material is a population of cells, the population can be divided into a multi-well plate in preparation for lysis.
  • where the starting material is a lysate prepared from a population of cells or tissues, cell or tissue lysis may optionally occur in a prior step before introduction into the well, and the lysate itself may then be added to each well of a multi-well plate.
  • a population of cells can be sorted into lysis buffer and lysed (e.g., by freeze-thawing, proteinase K treatment, or a combination thereof) before the lysate is added to the plate.
  • the next steps are to reverse transcribe the mRNA that has been released from the cells and to perform a template switching step.
  • the reverse transcription and template switching can be performed using the nucleic acids of the invention, which efficiently perform these steps.
  • a cDNA synthesis primer comprising a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a
  • the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine
  • the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine
  • the 5' blocking group is used to ensure the correct directionality of cDNA synthesis and the adapter sequence provides a sequence annealing to a sequencing primer, so the first sequencing read will contain the barcode and UMI sequences.
  • the barcode sequence is used to track which well (and, thus, which cell) a particular cDNA was generated from.
  • a barcode can provide a reference for (and, thus, a way to identify) the sample or the pool (e.g., the well) rather than a single cell.
  • a UMI can be used in bulk RNA-seq and lysate sequencing to identify the transcript, and the i7 primer (which, in other embodiments, typically contains the barcode for the plate, e.g., for plate indexing, sometimes referred to as the plate barcode or the index) identifies the sample or pool (e.g., the well) rather than the single cell.
  • the UMI can be, for example, a 16mer UMI.
  • a combination of one or more barcodes and a UMI is used.
  • a UMI is used either alone or with a single barcode. In either way, the methods and compositions provide a mechanism for identifying where a particular transcript came from.
  • i7 is used for plate indexing (e.g., it is a barcode to identify a particular plate).
  • serves as a sample barcode.
  • the UMI provides a way to trace each cDNA produced to a particular mRNA derived from a cell/sample.
  • the complementarity sequence anneals to the mRNA, for example, to the poly(A) tail of an mRNA, although it also could anneal to a specific target sequence, such as the sequence of a particular mRNA, instead.
  • the 3' dinucleotide sequence targets the extremity of the poly(A) tail, i.e., the last two bases of the mRNA before the poly(A) tail.
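As a minimal sketch of the 3' dinucleotide described above (first nucleotide chosen from adenine, guanine, and cytosine; second nucleotide chosen from all four bases), the following Python enumerates the permitted dinucleotides. The function and constant names are illustrative assumptions, not part of the source.

```python
from itertools import product

FIRST_BASES = "AGC"    # first nucleotide: adenine, guanine, or cytosine (never thymine)
SECOND_BASES = "AGCT"  # second nucleotide: any of the four bases


def anchor_dinucleotides():
    """Return every dinucleotide permitted by the two selection rules."""
    return ["".join(pair) for pair in product(FIRST_BASES, SECOND_BASES)]


anchors = anchor_dinucleotides()
print(len(anchors))  # 12
print(anchors)
```

Because thymine is excluded from the first position, the first anchor base cannot simply extend a poly(T) complementarity sequence, which is consistent with the stated goal of positioning the primer at the junction between the transcript and its poly(A) tail.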
  • a template-switching oligonucleotide comprising a 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence, an internal adapter sequence, and a 3' guanosine tract can be used in the template switching step.
  • the 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence provides non-standard base pairs in the template switching oligo to prevent background cDNA synthesis.
  • nucleotide isomers inhibit reverse transcriptase, such as MMLV reverse transcriptase, from extending the cDNA beyond the template switching adapter, thus increasing cDNA yield by reducing formation of concatemers of the template switching adapter.
  • the adapter sequence provides the subsequence required for the suppression PCR, and the 3' guanosine tract is used to anneal to a polycytosine tract generated at the 3' end of the first strand of cDNA synthesized. These steps are useful in incorporating a barcode and a UMI into the resulting cDNAs.
  • the barcode introduced here helps track the individual well (and, therefore, cell/sample) that a cDNA population came from, while the UMI is unique for each mRNA that produces a cDNA.
  • the population of UMIs incorporated into the cDNAs provides a molecular "snapshot" of the mRNA population of the cell or sample at the time of lysis, because subsequent amplification steps do not alter the number of UMIs, making it possible to trace back each cDNA sequenced later to a particular mRNA released from the cell/sample.
  • the template switching step is selective for the creation of full-length cDNAs.
  • the wells can be pooled together and purified, followed by treatment with an exonuclease such as Exonuclease I.
  • the primer used for the suppression PCR can bind to the remaining adapters that are in excess from the template switching reaction, so the addition of an exonuclease, such as Exonuclease I, improves results.
  • the cDNAs then are amplified (e.g., via PCR), followed by subsequent purification and quantification steps.
  • the library is prepared for sequencing by fragmentation, e.g., with a transposase-based fragmentation system.
  • This step also introduces a second bar code to the cDNAs, this second bar code being specific for the capture plate from which the cDNAs were pooled.
  • each cDNA will have a bar code for both the plate and the well from which it was derived, allowing simultaneous processing of a large number of samples, in which each individual sequence can be traced back to a single mRNA of a specific cell (or, in the case of another type of sample, to be traced back to a well containing a cell or tissue lysate sample, a purified RNA sample, or the like).
  • the library then can be purified, selected for appropriate size fragments, assessed for quantity and quality, and sequenced (e.g., by RNA-seq, such as on the Illumina HiSeq™ (Catalog # SY-401-2501) or MiSeq™ (Catalog # SY-410-1003) systems).
  • the sequencer can handle various read lengths and either single-end or paired-end sequencing.
  • the libraries can be run in a way that matches the read length required to read each barcode and obtains enough information from the sequence of the cDNA to identify which gene it came from. For example, 17 cycles can be run for read 1 (see above) to read first the 6 bp well/cell barcode and then the 10 bp UMI. This is followed by 9 cycles to read the 8 bp i7 plate index. Finally, 46 cycles are, in certain embodiments, run on the other strand to read the cDNA/gene sequence.
  • the machine allows the operator to set up a custom run for which they decide the read length for each portion for which sequence is to be obtained.
  • This sequencing design allows an individual to decipher all the information while using the smallest/cheapest kit that meets their needs (e.g., a 50 cycle kit that actually contains enough reagents for 74 cycles). Alternatively, an individual could run more cycles to get longer stretches of cDNA.
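The cycle budget described above can be checked with a few lines of Python. This is only a sketch under the stated example (17, 9, and 46 cycles against a kit with reagents for 74 cycles); the names `READ_STRUCTURE`, `KIT_CYCLES`, and `split_read1` are hypothetical, and the example read is made up.

```python
READ_STRUCTURE = [
    ("read1", 17),     # 6 bp well/cell barcode followed by 10 bp UMI
    ("index_i7", 9),   # 8 bp plate index
    ("read2", 46),     # cDNA/gene sequence read from the other strand
]
KIT_CYCLES = 74  # reagents available in the "50 cycle" kit, per the text


def total_cycles(structure):
    """Sum the cycles requested across all segments of the run."""
    return sum(cycles for _, cycles in structure)


def split_read1(seq, barcode_len=6, umi_len=10):
    """Split read 1 into (well barcode, UMI); trailing bases are ignored."""
    if len(seq) < barcode_len + umi_len:
        raise ValueError("read 1 is shorter than barcode + UMI")
    return seq[:barcode_len], seq[barcode_len:barcode_len + umi_len]


used = total_cycles(READ_STRUCTURE)
print(used, KIT_CYCLES - used)            # 72 2
print(split_read1("ACGTACGGTTCAGCATT"))   # ('ACGTAC', 'GGTTCAGCAT')
```

In this example the run uses 72 of the 74 available cycles, which leaves two cycles of headroom.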
  • samples from multiple capture plates can be combined without losing the identity of each cDNA in the mixture because of the two barcode sequences.
  • the data can be deconvoluted after sequencing to determine the UMI of each particular cDNA and the well and plate it came from via the barcodes. This is advantageous because it allows a researcher to run many more samples together than would otherwise be possible, and to do so with less cost and labor.
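One way to picture this deconvolution is as a grouping of reads by plate index and well barcode, keeping the distinct UMIs observed for each. The Python below is a minimal sketch, assuming reads have already been parsed into hypothetical `(plate_index, well_barcode, umi)` tuples; the sequences are invented for illustration.

```python
from collections import defaultdict


def deconvolute(reads):
    """Group reads by (plate index, well barcode) and collect distinct UMIs.

    `reads` is an iterable of (plate_index, well_barcode, umi) tuples,
    a hypothetical pre-parsed form of the sequencer output.
    """
    umis_by_sample = defaultdict(set)
    for plate_index, well_barcode, umi in reads:
        umis_by_sample[(plate_index, well_barcode)].add(umi)
    return dict(umis_by_sample)


reads = [
    ("AAGGCCTT", "ACGTAC", "GGTTCAGCAT"),
    ("AAGGCCTT", "ACGTAC", "GGTTCAGCAT"),  # amplification duplicate of the read above
    ("AAGGCCTT", "TTGCAA", "CATGCATGCA"),  # different well, same plate
    ("TTCCGGAA", "ACGTAC", "AAACCCGGGT"),  # same well barcode, different plate
]
samples = deconvolute(reads)
for sample, umis in sorted(samples.items()):
    print(sample, len(umis))
```

The same well barcode on two plates stays distinguishable because the plate index is part of the grouping key.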
  • diagnostics refers to methods by which the skilled artisan can estimate and/or determine whether or not a patient is afflicted with a given disease or condition. The skilled worker often makes a diagnosis based on one or more diagnostic indicators. Exemplary diagnostic indicators may include the manifestation of symptoms or the presence, absence, or change in one or more markers for the disease or condition. A diagnosis may indicate the presence or absence, or severity, of the disease or condition.
  • prognosis is used herein to refer to the likelihood of the progression or regression of a disease or condition, including likelihood of the recurrence of a disease or condition.
  • treating refers to taking steps to obtain beneficial or desired results, including clinical results. Beneficial or desired clinical results include, but are not limited to, reduction, alleviation or amelioration of one or more symptoms associated with the disease or condition.
  • administering or administration of a compound or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art.
  • a compound or an agent can be administered orally, intravenously, arterially, intradermally, intramuscularly, intraperitoneally, subcutaneously, ocularly, sublingually, intranasally, intraspinally, intracerebrally, or transdermally.
  • a compound or agent can appropriately be introduced by rechargeable or biodegradable polymeric devices or other devices, e.g., patches and pumps, or formulations, which provide for the extended, slow, or controlled release of the compound or agent.
  • Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods.
  • Administration of a compound may include both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, a physician who instructs a patient to self-administer a therapeutic agent, or to have the agent administered by another, and/or who provides a patient with a prescription for a drug has administered the drug to the patient.
  • nucleic acid refers to DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), DNA-RNA hybrids, and analogs of the DNA or RNA generated using nucleotide analogs.
  • the nucleic acid molecule can be a nucleotide, oligonucleotide, double-stranded DNA, single-stranded DNA, multi-stranded DNA, complementary DNA, genomic DNA, non-coding DNA, messenger RNA (mRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), small interfering RNA (siRNA), heterogeneous nuclear RNAs (hnRNA), or small hairpin RNA (shRNA).
  • transcriptome can refer to any sequencing or gene expression information concerning the transcriptome or portion thereof. This information can be either qualitative (e.g., presence or absence) or quantitative (e.g., levels or mRNA copy numbers). In some embodiments, a profile can indicate a lack of expression of one or more genes.
  • cDNA library refers to a collection of complementary DNA (cDNA) fragments.
  • a cDNA library may be generated from the transcriptome of a single cell or from a plurality of single cells. cDNA is produced from mRNA found in a cell and therefore reflects those genes that have been transcribed for subsequent protein expression.
  • a "plurality" of cells refers to a population of cells and can include any number of cells to be used in the methods described herein.
  • a plurality of cells includes at least 10 cells, at least 25 cells, at least 50 cells, at least 100 cells, at least 200 cells, at least 500 cells, at least 1,000 cells, at least 5,000 cells, or at least 10,000 cells.
  • a plurality of cells includes from 10 to 100 cells, from 50 to 200 cells, from 100 to 500 cells, from 100 to 1,000 cells, or from 1,000 to 5,000 cells.
  • a “single cell” refers to one cell.
  • Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Cells can be cultured cells or cells from a dissociated tissue, and can be fresh or preserved in a preservative buffer such as RNAprotect.
  • the method of preparing the cDNA library can include the step of obtaining single cells.
  • a single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample.
  • Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually. For example, a 96-well plate can be used, such that each single cell is placed in a single well.
  • an "oligonucleotide" or "polynucleotide" refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and can perform any function.
  • Exemplary polynucleotides include a gene or gene fragment (e.g., a probe or primer), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA or RNA of any sequence, and nucleic acid probes and primers.
  • polynucleotide can comprise modified nucleotides, such as isonucleotides, methylated nucleotides, and other nucleotide analogs. The term also refers to both double- and single-stranded molecules.
  • a polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Uracil (U) substitutes for thymine when the polynucleotide is RNA.
  • the sequence can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
  • a "primer" is a polynucleotide that hybridizes to a target or template that may be present in a sample of interest. After hybridization, the primer promotes the polymerization of a polynucleotide complementary to the target, for example in a reverse transcription or amplification reaction.
  • Methods for selecting or sorting cells are well established, and in some embodiments include, but are not limited to, fluorescence-activated cell sorting (FACS), micromanipulation, manual sorting, and the use of semi-automated cell pickers.
  • Individual cells can be individually selected based on features detectable by observation (e.g., by microscopic observation). Exemplary features can include location, morphology, and reporter gene expression.
  • a population of cells can be sorted to provide a subpopulation or a predetermined subset of cells. In some embodiments, the population, subpopulation, or predetermined subset can be sorted to provide single cells. In some embodiments, the cells are sorted into a capture plate.
  • Capture plates can comprise a number of wells into which the cells are sorted, for example, 24 wells, 96 wells, 384 wells, or 1536 wells.
  • a population of cells is lysed without sorting.
  • the population of cells can be, for example, a tissue sample.
  • the population of cells is an isolated population of cells.
  • the starting material for further analysis may be, for example, a cell or tissue lysate or bulk purified or extracted RNA.
  • cells can be divided into the wells of a plate without sorting.
  • the amount of material in each well is normalized with respect to the other wells so as to provide similar sequencing coverage across a plate.
  • the cells may be lysed.
  • Cells may be lysed by any number of known techniques. Exemplary cell lysis techniques include freeze-thawing, heating the cells, using a detergent or other chemical method, or a combination thereof. Techniques minimizing degradation of the released mRNA are preferred. Likewise, techniques preventing the release of nuclear chromatin are preferred. For example, heating the cells in the presence of Tween-20 is sufficient to lyse cells while minimizing genomic contamination from nuclear chromatin.
  • cells are lysed using freeze-thawing.
  • a proteinase or protease, such as proteinase K, is added to the lysis reaction to increase the efficiency of lysis.
  • cells are lysed using freeze-thawing optionally supplemented with addition of proteinase K.
  • cell lysis may be of single cells already sorted into individual wells of a plate.
  • lysis of populations of cells may be performed and the starting material for further sequence analysis may be a cell or tissue lysate made from a plurality of cells and then aliquoted to wells of a plate.
  • the material may be stored at a suitable temperature, such as -80 °C, prior to further use.
  • cDNA is synthesized from mRNA through the process of reverse transcription.
  • Reverse transcription can be performed directly on cell lysates (for example, a cell lysate prepared as described above), by adding a reaction mix for reverse transcription directly to the cell lysate.
  • the total RNA or mRNA can be purified after cell lysis, for example through the use of column based (e.g., Qiagen RNeasy Mini kit Cat. No. 74104, ZymoResearch Direct-zol RNA Cat. No. R2050) or magnetic bead purification (e.g., Agencourt RNAClean XP, Cat. No. A63987).
  • the reverse transcription is combined with a template switching step to improve the yield of longer (e.g., full length) cDNA molecules.
  • the reverse transcriptase used has tailing or terminal transferase activity, and synthesizes and anchors first-strand cDNA in one step.
  • the reverse transcriptase is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ (Clontech, Cat. No. 639536) reverse transcriptase, Superscript II™ reverse transcriptase (Life Technologies, Cat. No. 18064-014), or Maxima H Minus™ reverse transcriptase (Thermo Scientific, Cat. No. EP0753).
  • Template switching introduces an arbitrary sequence at the 3' end of the cDNA that is designed to be the reverse complement to the 3' end of a cDNA synthesis primer.
  • the synthesis of the first strand of the cDNA can be directed by a cDNA synthesis primer (CDS) that includes an RNA complementary sequence (RCS).
  • the RCS is at least partially complementary to one or more mRNA species in an individual mRNA sample, allowing the primer to hybridize to at least some mRNA species in a sample to direct cDNA synthesis using the mRNA as a template.
  • the RCS can comprise oligo (dT) sequence that binds to many mRNA species, or it can be specific for a particular mRNA species, for example, by binding to an mRNA sequence of a gene of interest.
  • the RCS can comprise a random sequence, such as random hexamers.
  • a non-self-complementary sequence can be used.
  • a template-switching oligonucleotide that includes a portion which is at least partially complementary to a portion of the 3' end of the first strand of cDNA generated by the reverse transcription can also be used in the methods of the invention. Because the terminal transferase activity of reverse transcriptase typically causes the incorporation of two to five cytosines at the 3' end of the first strand of cDNA synthesized, the first strand of cDNA can include a plurality of cytosines, or cytosine analogues that base pair with guanosine, at its 3' end to which the template-switching oligonucleotide with a 3' guanosine tract can anneal.
  • a template-switching oligonucleotide is extended to form a double stranded cDNA.
  • a template-switching oligonucleotide can include a 3 ' portion comprising a plurality of guanosines or guanosine analogues that base pair with cytosine.
  • Exemplary guanosines or guanosine analogues include, but are not limited to,
  • the guanosines can be ribonucleosides or locked nucleic acid monomers.
  • a locked nucleic acid is an RNA nucleotide wherein the ribose moiety has been modified with an extra bridge connecting the 2' oxygen and the 4' carbon.
  • a peptide nucleic acid is an artificially synthesized polymer similar to DNA or RNA, wherein the backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
  • the reverse transcription and template switching comprise contacting an mRNA sample with two nucleic acid primers.
  • the first nucleic acid primer (e.g., a template-switching oligonucleotide) comprises a 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence, an internal adapter sequence, and a 3' guanosine tract.
  • the 5' poly-isonucleotide sequence comprises an isocytosine, or an isoguanosine, or both.
  • the 5' poly-isonucleotide sequence comprises an isocytosine-isoguanosine-isocytosine sequence.
  • the 3' guanosine tract comprises two, three, four, five, six, seven, eight, nine, ten, or more guanosines. In certain embodiments, the 3' guanosine tract comprises three guanosines. In some embodiments, the adapter sequence is 12 to 32 nucleotides in length, for example, 22 nucleotides in length.
  • the internal adapter sequence is 5'- ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1).
  • sequence of the first primer is 5'- iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17)(e.g., 1 ⁇ ,) wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine.
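To make the layout of this oligonucleotide concrete, the sketch below assembles it from its three stated parts: the 5' isonucleotide block, the 22-nucleotide internal adapter, and the 3' RNA guanosine tract. It treats each residue as a text token (`iC`, `iG`, `rG`, or a plain DNA base), following the notation in the sequence above; the function name `build_tso` is an assumption for illustration.

```python
ADAPTER = "ACACTCTTTCCCTACACGACGC"  # internal adapter sequence (SEQ ID NO: 1 in the text)


def build_tso(adapter=ADAPTER, n_ribo_g=3):
    """Assemble the oligo as a list of residue tokens, 5' to 3'."""
    iso_block = ["iC", "iG", "iC"]   # 5' isocytosine-isoguanosine-isocytosine block
    dna_block = list(adapter)        # standard DNA bases of the adapter
    g_tract = ["rG"] * n_ribo_g      # 3' RNA guanosine tract
    return iso_block + dna_block + g_tract


tso = build_tso()
print("".join(tso))            # iCiGiCACACTCTTTCCCTACACGACGCrGrGrG
print(len(ADAPTER), len(tso))  # 22 28
```

The adapter length of 22 falls inside the 12 to 32 nucleotide range given for the adapter sequence, and the default of three guanosines matches the embodiment in which the 3' guanosine tract comprises three guanosines.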
  • the second nucleic acid primer (e.g., a cDNA synthesis primer) comprises a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a
  • the bar code can be omitted from the cDNA synthesis primer and an extra 6 base pairs can be added to the UMI sequence.
  • the 5' blocking group is selected from biotin, an inverted nucleotide (e.g., inverted dideoxy-T), a fluorophore, an amino group, and iso-dG or iso-dC.
  • the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length.
  • the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1).
  • the barcode sequence is 4 to 20 nucleotides in length, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  • the UMI sequence is 6 to 20 nucleotides in length, for example, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  • the complementarity sequence is a poly(T) sequence.
  • the complementarity sequence is 20 to 40 nucleotides in length, for example, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length.
  • the second nucleic acid primer is 5 '-
  • the barcodes may be designed so that each barcode sequence differs from the barcodes of all other primers by at least two nucleotides, so that a single sequencing error cannot lead to the misidentification of the barcode.
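The "at least two nucleotides" rule is a minimum pairwise Hamming distance of 2. A minimal Python check, with made-up 6-mer barcodes and illustrative function names, might look like this:

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


good = ["AAAAAA", "AACCAA", "CCAAGG"]  # every pair differs at 2 or more positions
bad = ["AAAAAA", "AAAAAC", "CCAAGG"]   # first two differ at only 1 position

print(min_pairwise_distance(good) >= 2)  # True
print(min_pairwise_distance(bad) >= 2)   # False
```

With a minimum distance of 2, a single substitution turns a valid barcode into a sequence that is not in the set, so the error can be detected and the read discarded rather than assigned to the wrong well.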
  • the UMI sequences provide a robust guard against amplification biases. More particularly, each UMI is present only once in a population of second nucleic acid primers. Thus, each UMI is incorporated into a unique cDNA sequence generated from a cellular mRNA, and any subsequent amplification steps will not alter the one UMI to one mRNA ratio.
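The invariance to amplification can be illustrated with a small simulation. This is a sketch only: it assumes no sequencing errors in the UMIs, and the uneven copy numbers are drawn at random to stand in for amplification bias. The UMI sequences are invented.

```python
import random


def count_molecules(umis):
    """Number of distinct UMIs, taken as the number of original mRNA molecules."""
    return len(set(umis))


original = ["GGTTCAGCAT", "CATGCATGCA", "AAACCCGGGT"]  # one UMI per captured mRNA

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Biased amplification: each molecule is copied a different number of times.
amplified = [umi for umi in original for _ in range(rng.randint(1, 1000))]

print(len(amplified) >= len(original))                           # True
print(count_molecules(amplified) == count_molecules(original))   # True
```

However uneven the copy numbers are, collapsing reads to distinct UMIs recovers the original molecule count.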
  • the UMI sequence, rather than being 10 nucleotides in length, is 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
  • the length should be selected to provide sufficient unique sequences for the population of cells to be tested (preferably with at least two nucleotide differences between any pair of UMIs), preferably without adding unnecessary length that increases sequencing cost.
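As a rough guide to this trade-off, a UMI of length n over four bases has at most 4^n distinct sequences. The sketch below computes that upper bound and the shortest length that covers a given number of molecules; the function names and the figure of 500,000 molecules are assumptions for illustration.

```python
def umi_capacity(length):
    """Upper bound on distinct UMI sequences of a given length (4 bases per position)."""
    return 4 ** length


def min_umi_length(n_molecules):
    """Shortest length whose capacity is at least n_molecules."""
    length = 1
    while umi_capacity(length) < n_molecules:
        length += 1
    return length


print(umi_capacity(10))          # 1048576
print(umi_capacity(16))          # 4294967296
print(min_umi_length(500_000))   # 10
```

Requiring at least two nucleotide differences between any pair of UMIs leaves fewer usable sequences than 4^n, so in practice a length somewhat above this minimum would be chosen.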
  • Barcode sequences enable each cDNA sample generated by the above method to have a distinct tag, or a distinct combination of tags, such that once the tagged cDNA samples have been pooled, the tag can be used to identify the single cell from which each cDNA sample originated.
  • each cDNA sample can be linked to a single cell, even after the tagged cDNA samples have been pooled and amplified.
  • the use of the foregoing nucleic acids permits deconvolution of pooled data to single cell/well resolution. This is particularly advantageous for facilitating the application of this technology to screening assays.
  • a nucleic acid useful in the invention can contain a non-natural sugar moiety in the backbone, for example, sugar moieties with 2' modifications such as addition of a halogen, alkyl-substituted alkyl, SH, or SCH3.
  • Similar modifications also can be made at other positions on the sugar.
  • Nucleic acids, nucleoside analogs or nucleotide analogs having sugar modifications can be further modified to include a reversible blocking group, a peptide linked label, or both. In those embodiments comprising a 2' modification, the base can have a peptide- linked label.
  • a nucleic acid useful in the invention also can include native or non-native bases.
  • a native deoxyribonucleic acid can have one or more bases selected from adenine, thymine, cytosine, and guanine.
  • a ribonucleic acid can have one or more bases selected from uracil, adenine, cytosine, and guanine.
  • Exemplary non-native bases include, but are not limited to, inosine, xanthine, hypoxanthine, isocytosine, isoguanosine, 5-methylcytosine, 5-hydroxymethylcytosine, 2-aminoadenine, 6-methyl adenine, and 6-methyl guanine.
  • isocytosine and isoguanosine may reduce non-specific hybridization.
  • a non-native base can have universal base pairing activity, wherein it is capable of base-pairing with any other naturally occurring base, e.g., 3-nitropyrrole and 5-nitroindole.
  • the cDNA is pooled together. For example, a population of cells can be individually sorted into the wells of a tray, lysed, and undergo reverse transcription and template switching. These cDNAs then can be pooled and purified. In certain embodiments, the cDNA is purified through a column-based purification method, e.g., with a DNA Clean & Concentrator-5 column (Zymo Research, #D4013).
  • pooled cDNAs are treated with an exonuclease (e.g., Exonuclease I) to degrade any primers remaining from the reverse transcription and template switching steps. This prevents possible interference by these primers in subsequent amplification.
  • amplification refers to a process by which multiple copies of a particular polynucleotide are formed, and includes methods such as the polymerase chain reaction (PCR), ligation amplification (also known as ligase chain reaction, or LCR), and other
  • amplification refers specifically to PCR.
  • Amplification methods are widely known in the art.
  • PCR refers to a method of amplification comprising hybridization of primers to specific sequences within a DNA sample and amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase. The resulting DNA products are then often screened for a band of the correct size.
  • the primers used are oligonucleotides of appropriate length and sequence to provide initiation of polymerization.
  • Reagents and hardware for conducting amplification reactions are widely known and commercially available. Primers useful to amplify sequences from a particular gene region are sufficiently complementary to hybridize to target sequences.
  • Nucleic acids generated by amplification can be sequenced directly.
  • When hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides, the reaction is called "annealing" and those polynucleotides are described as "complementary". A double-stranded polynucleotide can be complementary or homologous to another polynucleotide, if hybridization can occur between one of the strands of the first polynucleotide and the second.
  • Complementarity or homology is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonding with each other, according to generally accepted base-pairing rules.
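The proportion described here can be computed directly for two DNA strands aligned in antiparallel orientation. The Python below is a simplified sketch that assumes equal-length strands, standard Watson-Crick pairs only, and no gaps; the names are illustrative.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}  # standard DNA base pairs


def reverse_complement(seq):
    """Return the strand that pairs perfectly with `seq` in antiparallel orientation."""
    return "".join(PAIRS[base] for base in reversed(seq))


def complementarity(strand_a, strand_b):
    """Fraction of positions that pair when the two strands align antiparallel."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    paired = sum(PAIRS[a] == b for a, b in zip(strand_a, reversed(strand_b)))
    return paired / len(strand_a)


print(complementarity("ACCGT", reverse_complement("ACCGT")))  # 1.0
print(complementarity("AAGG", "CCTA"))                        # 0.75
```

In the second example three of the four opposing positions form a pair, giving a complementarity of 0.75.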
  • hybridization is influenced by hybridization conditions, such as temperature and salt. In the context of amplification, these parameters can be suitably selected.
  • cDNA created by reverse transcription and template switching, and optionally treated with an exonuclease, is amplified to provide more starting material for sequencing.
  • cDNA can be amplified by a single primer with a region that is complementary to all cDNAs, e.g., an adapter sequence.
  • the primer has a 5' blocking group such as biotin.
  • An exemplary primer is as follows: 5'-
  • One exemplary amplification reaction uses cDNA; PCR buffer, such as 10X Advantage 2 PCR buffer; dNTPs; the DNA primer 5'-
  • amplification reaction may be modified to use fewer than 18 cycles, e.g., 10 cycles.
  • One exemplary amplification reaction uses 20 ⁇ ⁇ of cDNA; 5 ⁇ ⁇ of 10X Advantage 2 PCR buffer; ⁇ ⁇ , of dNTPs; ⁇ ⁇ , of the DNA primer 5 '- /5Biosg/ACACTCTTTCCCTACACGACGC-3 ' (SEQ ID NO: 19) (10 ⁇ ,
  • Nucleic acid purification is well known in the art.
  • a nucleic acid e.g., cDNA
  • a spin-based column such as those commercially available from Zymo Research™ (DNA Clean & Concentrator™-5, Cat. No. D4013) or Qiagen™ (MinElute PCR purification kit, Cat. No. 28004).
  • the spin column is a column lacking a physical ring, for example the ring found in Qiagen™ columns, allowing elution of the purified nucleic acid in a lower volume than would be possible in a spin column with a ring.
  • a nucleic acid e.g., cDNA, such as in a cDNA library
  • magnetic beads include, for example, the Agencourt AMPure XP™ system (Beckman Coulter, Cat. No. A63881).
  • a nucleic acid is purified after being run on a gel.
  • Gel extraction purification kits are well known, and include, for example, the MinElute Gel Extraction Kit™ (Qiagen, Cat. No. 28604).
  • a cDNA library for sequencing is fragmented prior to the sequencing.
  • a cDNA library can be fragmented by any known method, for example, mechanical fragmentation or a transposase-based fragmentation such as that used in the NexteraTM system (e.g., the Illumina Nextera XT DNA Sample
  • a barcode sequence introduced during preparation of a cDNA library for sequencing is specific for a predetermined set of cells.
  • This predetermined set of cells can be a subset of a larger set of cells.
  • a tissue biopsy can be sorted into a set of cells to be further sorted into single cells in a capture plate for gene profiling. If a bulk lysate or population of cells is being used as a starting material rather than single cells that have been sorted, a barcode sequence may, in certain
  • a cDNA library for sequencing is quantified and evaluated for quality prior to the sequencing to ensure that the library is of sufficient quantity and quality to yield positive results from sequencing.
  • a cDNA library can be quantified using a fluorometer and analyzed for quantity and average size through the use of a number of commercially available kits. The 2 main metrics for quality are the concentration of the library (which needs to be sufficient for loading on the sequencer) and the length of the cDNA fragments to be sequenced. Size selection is performed on a gel to enrich for fragments of the correct size. The gel itself gives an idea of the quality of the library.
  • the final extracted library can be run on an Agilent Bioanalyzer (Cat. No. G2940CA) to obtain the size distribution for the cDNA fragments.
  • sequencing refers to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid.
  • exemplary sequencing techniques include RNA-seq (also known as whole transcriptome sequencing), Illumina™ sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, massively parallel signature sequencing (MPSS), sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOL
  • sequencing is performed on Illumina HiSeq or MiSeq paired-end flow cells.
  • nucleic acids, methods, and kits of the invention are capable of sequencing data analysis.
  • Sequencing products can be traced not only to the single plate of cells from which they came, but also to a single cell (e.g., a well) and, indeed, a single cellular transcript.
  • This deconvolution of sequencing data can be achieved through the use of barcode and UMI sequences.
  • sequencing is combined with 3' digital gene expression to provide a number of counts for a particular sequence or sequences (e.g., cDNAs containing a particular combination of bar codes and a UMI).
  • each fragment of each transcript is sequenced and then counted for how many fragments of each transcript have been sequenced.
  • the computed gene expression should be normalized based on the length of a given transcript because a longer transcript will have a greater chance of having one of its fragments sequenced.
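The source does not give a formula for this normalization. One simple possibility, shown here purely as an assumed illustration, is to divide each transcript's fragment count by its length in kilobases; the transcript names, counts, and lengths are hypothetical.

```python
def per_kilobase(fragment_counts, transcript_lengths):
    """Divide each transcript's fragment count by its length in kilobases."""
    return {
        name: count / (transcript_lengths[name] / 1000)
        for name, count in fragment_counts.items()
    }


counts = {"short_tx": 100, "long_tx": 400}      # hypothetical fragment counts
lengths = {"short_tx": 1000, "long_tx": 4000}   # hypothetical lengths in nucleotides

print(per_kilobase(counts, lengths))  # {'short_tx': 100.0, 'long_tx': 100.0}
```

The longer transcript yields four times as many fragments only because it is four times as long, and after normalization both transcripts show the same expression level.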
  • full transcript sequencing typically requires more sequencing coverage than DGE, for which only the 3' end needs to be sequenced.
Kits
  • the invention provides a kit comprising a plurality of one or both of the reverse transcription/template switching nucleic acid primers described above.
  • the UMI sequence of each second nucleic acid primer described above in the plurality of nucleic acids of the kit is unique among the nucleic acids of the kit.
  • the plurality of nucleic acids comprises different populations of nucleic acid species.
  • each population of nucleic acid species comprises a different barcode sequence that uniquely identifies a single population of nucleic acid species.
  • the kit further comprises a third nucleic acid primer comprising 12 to 32 nucleotides and a 5' blocking group as described above.
  • the third nucleic acid is 22 nucleotides in length.
  • An exemplary sequence of the third nucleic acid primer is 5'- ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2).
  • the kit further comprises a nucleic acid comprising a barcode sequence.
  • the kit further comprises a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
  • the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length.
  • An exemplary sequence of the phosphorothioate bond-containing nucleic acid is 5'-
  • the kit further comprises a capture plate and/or a reverse transcriptase enzyme and/or a DNA purification column (e.g., a DNA purification spin column) and/or proteinase K.
  • the kit can comprise a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ reverse transcriptase,
  • kits include any one or any combination of the reagents described herein and, optionally, directions for use.
  • the reagents may be provided in separate containers, such as separate tubes or vials.
  • the kit contains sterile water for use.
  • the nucleic acids, kits, and/or methods of the invention are used for research applications requiring sequencing or gene expression profiling.
  • the research applications include studying cellular differentiation, characterizing tissue heterogeneity, high-throughput screening of agents (e.g., potential therapeutics, potential
  • the nucleic acids (e.g., compositions), kits, and/or methods of the disclosure are applied to gene expression analysis of single cells, optionally in response to contacting the single cell with an agent in the high-throughput screening context.
  • the ability to analyze gene expression accurately and across large numbers of cells, and to be able to accurately correlate the expression level to a particular cell/well is an exemplary advantage and application of the instant technology.
  • the technology is, in certain embodiments, similarly applied to other samples, such as cell or tissue lysates.
  • the invention is useful in generating a gene expression profile for a plurality of cells.
  • gene expression profiles can be used in a number of applications related to the diagnosis, prognosis, and treatment of a subject.
  • cells from a tissue sample collected from a patient can be used in the methods of the invention to generate an expression profile that can be compared against a known profile that is indicative of the disease or condition, thus informing a physician of whether the subject has the disease or condition.
  • the profile can be compared to a known profile useful in the prognosis of the disease or condition. For example, if the known profile is predictive of a cancer prognosis, the comparison may inform the physician of the stage of cancer or the cancer's likelihood of metastasis.
  • the invention can be used in a method of treating a disease or condition in a subject in need thereof.
  • a method of the invention can be used to obtain gene expression profiles in a subject before and after treatment with a therapeutic agent, thereby providing a means of determining the efficacy of the therapeutic agent. These data can be used to determine the efficacy of a treatment, or to help a physician determine an effective treatment regimen.
  • the invention is applicable to various diseases or conditions.
  • diseases or conditions are a cancer, a cardiovascular disease or condition, a neurological or neuropsychiatric disease or condition, an infectious disease or condition, a respiratory or gastrointestinal tract disease or condition, a reproductive disease or condition, a renal disease or condition, a prenatal or pregnancy-related disease or condition, an autoimmune or immune-related disease or condition, a pediatric disease, disorder, or condition, a mitochondrial disorder, an ophthalmic disease or condition, a musculo-skeletal disease or condition, or a dermal disease or condition.
  • All publications, patents and published patent applications referred to in this application are specifically incorporated by reference herein. In case of conflict, the present specification, including its specific definitions, will control.
  • Example 1 Protocol for transcriptome-wide single-cell RNA sequencing [0091] To test the methods of the invention, the protocol described below was developed.
  • RNAprotect Cell Reagent (Qiagen, #76526) and 1 µL of RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies, #10777-019). Cells were stored up to two weeks at 4 °C. Prior to sorting, cells in the RNAprotect Cell Reagent were diluted in 1.5 mL PBS, pH 7.4 (no calcium, no magnesium, no phenol red, Life Technologies, #10010-049). The cells then were stained for viability (DNA staining by Hoechst 33342) with NucBlue Live ReadyProbes Reagent (Life Technologies, #R37605).
  • ⁇ ⁇ of a universal adapter DNA primer (template-switching oligonucleotide) 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' ( ⁇ ⁇ ,) (SEQ ID NO: 17) was added to each well, wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine.
  • the barcode sequences were designed such that each barcode differed from the others by at least two nucleotides, so that a single sequencing error could not lead to the misidentification of the barcode (Table 1).
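The design constraint described above amounts to requiring a minimum pairwise Hamming distance of two across the barcode set. The sketch below checks that property; the helper names are hypothetical, the first two barcodes are taken from the barcode listing in this document, and the third is made up for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# CAGGCC and CAGGGG appear in the document; ACGTAC is a made-up example.
barcodes = ["CAGGCC", "CAGGGG", "ACGTAC"]
design_ok = min_pairwise_distance(barcodes) >= 2
```

With a minimum distance of two, a single substitution error can never turn one valid barcode into another, so such a read fails the exact-match check instead of being assigned to the wrong well.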
  • the plate was subsequently incubated at 72 °C for 3 minutes then immediately placed on ice to cool down (although this step is optional).
  • the Template Switching step was carried out in each well using the following reagents: 2 µL of 5X 1st strand buffer (250 mM UltraPure Tris-HCl, pH 8.0, Life Technologies, #15568-025; 375 mM KCl, Life Technologies, #AM9640G; 30 mM MgCl2, Life Technologies,
  • CAGGCC 255 CAGGGG 256
  • Exonuclease I 2 µL of 10X reaction buffer, of Exonuclease I (New England Biolabs, #M0293L), and the reaction was incubated at 37 °C for 30 minutes, then at 80 °C for 20 minutes.
  • 5Biosg represents 5' biotin) (10 ⁇ , Integrated DNA Technologies); ⁇ , of the Advantage 2 Polymerase Mix; and 22 µL of Nuclease-Free Water, and performed using the following program: 95 °C for 1 minute; 18 cycles of 95 °C for 15 seconds, 65 °C for 30 seconds, and 68 °C for 6 minutes; and 72 °C for 10 minutes (followed by an optional hold period at 4 °C).
  • Full length cDNAs were purified with 30 µL of beads (here, Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880)). The full length cDNAs were eluted in 12 µL of Nuclease-Free Water and quantified on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies).
  • the resulting sequencing library was purified with 30 µL of Agencourt AMPure XP magnetic beads and eluted in 20 µL of nuclease-free water.
  • the entire library was run on an E-Gel EX Gel, 2% (Life Technologies, #G4010-02), and the band corresponding to a size range of 300 to 800bp was excised and purified using the QIAquick Gel Extraction Kit (Qiagen, #28704).
  • Sequencing library quality assessment [0103] The library was quantified on the Qubit 2.0 Fluorometer using the dsDNA HS Assay. The quality and average size of the library were assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
  • Sequencing is performed on any Illumina® HiSeq™ or MiSeq™ using a standard Illumina® sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first strand, then 8 cycles to decode the Nextera™ barcode and finally 34 cycles (although 46 cycles also can be used to increase the amount of sequencing data). Up to twelve Nextera libraries/384-well capture plates, each comprising 384 cells, are multiplexed together (twelve libraries can be used with a set of twelve plate-identifying barcode sequences, although this number can be expanded with additional barcode sequences), allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
  • the methods and reagents (e.g., polynucleotides, kits, etc.) described herein have numerous applications.
  • the following provides an example demonstrating the application of the instant technology to a particular context.
  • the method described above was used to sequence the transcriptomes of a population of differentiating human adipose tissue-derived stromal/stem cells (hASCs) at eight different time points (day 0, day 1, day 2, day 3, day 5, day 7, day 9, and day 14).
  • Visual inspection of these cells indicates that differentiation over time is incomplete, thus leading to a heterogeneous cell population (Figure 1).
  • Figure 3 depicts the design of the sequencing library incorporating the two levels of barcoding (well/cell and plate), the UMI, and the primer sequences indicated as P5 and P7 for Illumina sequencing.
  • P5 and P7 are the regions that anneal to their complementary oligos on the flow cell.
  • the index (i7) represents the plate index that is added during the Nextera tagmentation process after all wells have been pooled and pre-amplified. It is incorporated by PCR during the last step of the library preparation.
  • One i7 index is used per pool/plate of 96 or 384 samples/cells, allowing for a higher level of multiplexing by pooling several plates together for sequencing.
  • the sequencing primers P5 and P7 initiate the sequencing reaction. The sequencing will result in 3 distinct reads.
  • the first one is 16 bp long and includes 6 bp of the well/cell barcode followed by 10 bp of the UMI. Then the i7 index sequencing primer allows us to read the plate/pool index (i7, 8 bp) on the same strand. Finally, the other strand is generated (paired-end sequencing) and the read 2 sequencing primer allows us to read the actual cDNA fragment, which is typically 45 bp with a 50 cycle kit.
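The layout of the first read described above (a 6-base well/cell barcode followed by a 10-base UMI) can be decoded with simple slicing. The following is only a sketch: the function name and the example read are hypothetical, and the two length constants come from the lengths stated in the text.

```python
BARCODE_LEN = 6   # well/cell barcode length stated in the text
UMI_LEN = 10      # UMI length stated in the text


def split_read1(seq):
    """Split a read-1 sequence into (well barcode, UMI)."""
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than barcode + UMI")
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


# Hypothetical 16-base first read.
barcode, umi = split_read1("CAGGCCACGTACGTAC")
```

The plate index (i7) and the cDNA read would be handled separately, since they come from the index read and the second read rather than from read 1.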
  • the disclosure provides a polynucleotide as set forth on Figure 3 (e.g., a polynucleotide comprising various polynucleotide portions, such as contiguous portions, as set forth in Figure 3).
  • the various portions are described herein and the figure contemplates polynucleotides comprising any combination of these various portions. Expression values were correlated by comparing raw read counts to UMI counts (Figure 4). Incorporating and counting UMIs helped to reduce the PCR bias.
  • Although GAPDH usually is present at a constant level of expression in a population of cells, when observed at the single cell level, a significant portion of cells were seen that did not express GAPDH because GAPDH is a cell cycle-regulated gene.
  • GAPDH is not necessarily a good reference gene especially at the single cell level. This underscores the power of the single cell sequencing methods of the invention.
  • Projections of three of the highest components of a principal component analysis based on gene expression are shown in Figures 8 to 13. Each point represents a profiled cell. The cells profiled at day 0 are represented in black, while the cells profiled at the subsequent time points (day 1, day 2, day 3, day 7, and day 14) are shown in gray (or in red if depicted in color). A clear distinction can be seen between the day 0 cells and the cells from subsequent time points. To explore these differences, a Gene Ontology analysis then was performed on the differentially expressed genes between two subpopulations distinguishable at day 14 with the principal component analysis: a subpopulation of genes that clusters with day 0 genes and a subpopulation that is separate from those genes.
  • the invention provides a useful method for single cell sequencing and single transcript tracking that uses the aggregation of samples and subsequent deconvolution of data. Through this process of aggregation and deconvolution, the sequencing can be performed at lower cost and with greater efficiency than by traditional sequencing techniques. Moreover, the results obtained here reflect the ability to detect changes and differences across heterogeneous populations when those populations are evaluated at the single cell level. Such changes and differences may be lost (e.g., averaged out) if gene expression across the heterogeneous population is instead evaluated as a whole.
  • Example 3 Simultaneous single cell sequencing of 12,832 cells [0110]
  • single cell sequencing methods and compositions (e.g., reagents, nucleic acids, kits)
  • a primary human adipose-derived stem/stromal cell (hASC) differentiation system was used as a test system, akin to that described above.
  • the resulting data reveal the major axes of variation on gene expression, suggest a biological basis for the morphological heterogeneity observed in these cultures, and provide a rich resource for dissection of the regulatory networks involved in adipocyte formation and function beyond what investigations using other techniques have shown.
  • identification of rare expression programs can be enabled by deeper and more sensitive profiling of every cell, and direct comparison of in vitro and in vivo heterogeneity can be observed through direct profiling of single cells from tissue samples.
  • hASCs Human adipose-derived stem/stromal cells
  • the cultures were then induced to differentiate towards an adipogenic fate after reaching 80% confluency (differentiations D1 and D2) or two days after reaching 100% confluency (differentiation D3) by switching from growth medium to the StemPro adipogenesis differentiation medium (Life Technologies), and were subsequently prepared for further analysis, such as by qPCR or smFISH.
  • the differentiation medium was changed every three days for up to 14 days.
  • the variation in initial conditions was introduced to assess the robustness of the subsequent time course data.
  • 384-well SBS capture plates were filled with 5 µl of a 1:500 dilution of Phusion HF buffer (New England Biolabs) in water and cells were then sorted into each well using a FACSAria II flow cytometer (BD Biosciences) based on Hoechst DNA staining. After sorting, the plates were immediately sealed, spun down, cooled on dry ice, and stored at -80 °C. For lipid content-based FACS, cells were also stained with HSC LipidTOX Neutral Lipid Stain (Life Technologies) and sorted according to their relatively "high" or "low" lipid content, either by taking the top and bottom 20% of stained cells (D2) or the top and bottom 50% (D3).
  • cDNA from 384 wells was pooled together and purified and concentrated using a single DNA Clean & Concentrator-5 column (Zymo Research). Pooled cDNAs were treated with an exonuclease, in this example Exonuclease I (New England Biolabs), and subsequently amplified by single primer PCR using the Advantage 2 Polymerase Mix (Clontech) and the SINGV6 primer (10 pmol, Integrated DNA Technologies) (5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19)).
  • the resulting sequencing library was purified with Agencourt AMPure XP magnetic beads (0.6x, Beckman Coulter), size selected (300-800 bp) on an E-Gel EX Gel, 2% (Life Technologies), purified using a QIAquick Gel Extraction Kit (Qiagen) and quantified on a Qubit 2.0 Fluorometer using a dsDNA HS Assay (Life Technologies).
  • Digital gene expression (DGE) libraries for sequencing were prepared from 10 ng of extracted total RNA, using the protocol described above for single cells, with the exception of using more concentrated template-switching and barcoded nucleic acids (10 pmol) and a version of the cDNA synthesis primer that did not contain the well-specific 6bp barcodes but instead a 16bp UMI (Integrated DNA Technologies) (5'-
  • Probes targeting LPL, G0S2 and TCF25 transcripts were synthesized as amine-conjugated oligonucleotides and then labelled with Cy5 (GE Healthcare), Alexa Fluor 594 (Molecular Probes) or 6-TAMRA (Molecular Probes).
  • Hybridizations and washes were performed using modifications to previously described procedures (see, e.g., Bienko et al., Nat. Methods 10:122-124 (2013) and Raj et al., Nat. Methods 5:877-879 (2008)).
  • Prior to hybridizations, lipids were extracted by incubation of fixed cells in 2:1 chloroform:methanol for 30 min at room temperature. Cells were washed quickly with 70% ethanol and then resuspended in 200 µl RNA Hybridization buffer containing 2x SSC buffer, 25% Formamide, 10% Dextran Sulphate (Sigma), E. coli tRNA (Sigma), Bovine Serum Albumin (Ambion), Ribonucleoside Vanadyl Complex and 150 ng of each desired probe set (the mass refers only to pooled oligonucleotides, excluding fluorophores, and is based on absorbance measurements at 260 nm).
  • Hybridizations were performed for 16-18 h at 30 °C, after which cells were washed twice for 30 min at 30 °C in RNA Wash buffer (containing 2x SSC buffer, Formamide 25% (Ambion) and 100 ng/ml DAPI).
  • All second sequence reads were aligned to a reference database containing all human RefSeq mRNA sequences (obtained from the UCSC Genome Browser hg19 reference set), the human hg19 mitochondrial reference sequences and the ERCC RNA spike-in reference sequences, using bwa version 0.7.4 with non-default parameter "-l 24".
  • Read pairs for which the second read aligned to a human RefSeq gene were kept for further analysis if 1) the initial six bases of the first read all had quality scores of at least 10 and corresponded exactly to a designed well-barcode and 2) the next ten bases of the first read (the UMI) all had quality scores of at least 30.
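The two read-1 criteria described above can be expressed as a small filter. This is a sketch under stated assumptions: base qualities are taken to be already decoded into integer Phred scores, the function and constant names are hypothetical, and the alignment check on the second read is assumed to have been done elsewhere.

```python
MIN_BARCODE_QUAL = 10  # minimum quality for each of the six barcode bases
MIN_UMI_QUAL = 30      # minimum quality for each of the ten UMI bases


def passes_filter(seq, quals, known_barcodes):
    """Return True if a first read satisfies both criteria.

    seq: read-1 bases; quals: list of integer Phred scores, one per base;
    known_barcodes: set of designed well barcodes.
    """
    if len(seq) < 16 or len(quals) < 16:
        return False
    barcode_ok = (
        seq[:6] in known_barcodes
        and all(q >= MIN_BARCODE_QUAL for q in quals[:6])
    )
    umi_ok = all(q >= MIN_UMI_QUAL for q in quals[6:16])
    return barcode_ok and umi_ok


# Hypothetical inputs.
known = {"CAGGCC", "CAGGGG"}
good = passes_filter("CAGGCCACGTACGTAC", [35] * 16, known)
low_umi = passes_filter("CAGGCCACGTACGTAC", [35] * 6 + [20] + [35] * 9, known)
```

Requiring an exact barcode match pairs naturally with a barcode set whose members differ by at least two nucleotides, because a single miscalled base then produces a sequence that is not in the designed set.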
  • DGE Digital gene expression
  • the UMI counts for each gene in the remaining wells were then normalized by dividing by the sum of UMI counts across all genes in the same well. This normalization removes variation from differences in RNA content per cell and can be revisited for analyses that are sensitive to this phenomenon.
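A minimal sketch of this per-well normalization is shown below. The nested-dictionary layout, the function name, and the counts are assumptions made for illustration; the gene names are ones mentioned elsewhere in this document.

```python
def normalize_wells(umi_counts):
    """Convert per-well UMI counts into fractions of the well's total.

    umi_counts: dict mapping well -> dict mapping gene -> UMI count.
    Wells with no counts are skipped because their profile is undefined.
    """
    normalized = {}
    for well, genes in umi_counts.items():
        total = sum(genes.values())
        if total == 0:
            continue
        normalized[well] = {gene: n / total for gene, n in genes.items()}
    return normalized


# Hypothetical wells with the same composition but tenfold different depth.
counts = {
    "well_A1": {"LPL": 30, "G0S2": 10},
    "well_A2": {"LPL": 3, "G0S2": 1},
    "well_A3": {},
}
profiles = normalize_wells(counts)
```

The two populated wells end up with identical profiles even though one has ten times as many UMIs, which is the variation in RNA content per cell that the normalization removes.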
  • Pairwise Pearson correlations between genes across single cells and their associated p-values were computed using the scikit-learn metrics.pairwise_distances function.
  • the 5% false discovery rate (FDR) thresholds were estimated from the p-value distribution using the Benjamini-Hochberg-Yekutieli procedure.
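The thresholding step can be sketched as below, on the assumption that the procedure named above is the Benjamini-Yekutieli step-up rule, i.e. the Benjamini-Hochberg rule with a harmonic-sum correction that remains valid under dependence between tests. The function name and the example p-values are hypothetical.

```python
def by_threshold(p_values, alpha=0.05):
    """Largest p-value rejected by the Benjamini-Yekutieli step-up rule.

    Returns None when no hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return None
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction term
    threshold = None
    for k, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: the largest rank k whose p-value meets its bound
        # determines the cutoff, even if smaller ranks failed their bounds.
        if p <= alpha * k / (m * c_m):
            threshold = p
    return threshold


cutoff = by_threshold([0.001, 0.2, 0.5, 0.9])
```

Every test with a p-value at or below the returned cutoff would be called significant at the chosen FDR level.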
  • the expected null distributions of pairwise correlation coefficients were estimated by permuting expression values across cells from the same time point and re-computing the pairwise correlations 100 times.
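The permutation scheme described above can be illustrated for a single gene pair. This sketch assumes the two input vectors hold expression values for cells from one time point only; the function names, the fixed seed, and the example values are hypothetical, and a constant vector is reported as zero correlation by convention.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def null_correlations(gene_a, gene_b, n_permutations=100, seed=0):
    """Correlations obtained after shuffling one gene's values across cells."""
    rng = random.Random(seed)
    shuffled = list(gene_b)
    null = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null.append(pearson(gene_a, shuffled))
    return null


# Hypothetical normalized expression of two genes across eight cells.
gene_a = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.8, 0.7]
gene_b = [0.2, 0.5, 0.1, 0.8, 0.6, 0.2, 0.9, 0.6]
null = null_correlations(gene_a, gene_b)
```

Shuffling breaks any cell-by-cell association while preserving each gene's marginal distribution, so the resulting coefficients approximate what would be seen if the two genes were unrelated.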
  • PCA Principal component analyses
  • GSEA Gene set enrichment analyses
  • hASC cultures were collected just prior to induction of differentiation (day 0), as well as at seven time points after induction (days 1, 2, 3, 5, 7, 9 and 14). At day 14, approximately two thirds of the cells contained clearly visible lipid droplets while the remainder retained a more fibroblast-like morphology.
  • a nucleic acid stain was used to identify and sort intact single cells into 384-well plates with a fluorescence-activated cell sorter.
  • a neutral lipid stain also was used to separately sort single cells based on their lipid contents. This method allowed us to combine the advantages of FACS sorting, such as staining cells using, for example, a DNA stain or a lipid stain, and selecting specific cells to profile.
  • DGE survey-depth digital gene expression
  • PC1 PC metagene
  • PC2 was high only in cells collected from day 0, effectively separating these from the differentiating cells. It showed a strong positive association with expression of genes required for progression through the mitotic cell cycle and, to a lesser extent, with genes associated with non-adipogenic differentiation. A decrease in PC2 may therefore reflect an exit from the cell cycle and lineage commitment.
  • PC3 was high during the first two days post-induction, but steadily decreased as the cells approached day 14. This decrease was associated with up-regulation of lipid homeostasis pathways and markers of adipocyte maturation.
  • PC4 showed a transient drop at day 1, which was associated with increased expression of genes known to be rapidly induced by adipogenic cocktails, including early adipogenic regulators CEBPB and CEBPD. PC4 may therefore reflect an early response to induction of differentiation.
  • a population of cells or tissues (e.g., cell or tissue lysates)
  • RNA sequencing using a 3' digital gene expression method allows the profiling of a high number of samples in a cost-efficient manner.
  • the protocol is robust for a broad range of input from single cells to pooled cells or extracted RNA. It allows the profiling of a large number of samples of extracted RNA (patient samples, for example), profiling of populations of small numbers of cells (e.g., cell or tissue lysates), as well as analysis of sorted, single cells.
  • the use of the barcodes and UMIs described herein permits the tracking of individual transcripts to a specific multi-well plate and to a specific well of that plate, thus permitting correlation of data to the original starting material.
  • the above examples are indicative of the powerful applications of the technology.
  • the ability to correlate expression analysis to a particular well of a multi-well plate is critical in the screening assay context, regardless of whether the material in the screen is a single cell or lysate. Because the bar codes and UMI allow tracking of individual transcripts, sequencing reactions can be run as massive multiplex reactions rather than a series of individual reactions without losing transcript-level data. This results in a significant increase in efficiency and decrease in cost.
  • the sequencing data then can be deconvoluted using, for example, 3' digital gene expression to count the number of occurrences of bar code and UMI sequences and obtain an expression level for a particular transcript.
  • the methods and reagents described herein also are adaptable to other platforms, e.g., microfluidic systems such as Fluidigm's C1 microfluidic device. For example, the capture of 96 cells was performed on the C1 chip, and the reagents and adapters to prepare the cDNA were incorporated directly on the C1 chip. cDNAs were retrieved as an output of the C1 chip, pooled, and prepared as a Nextera library.
  • the nucleic acids, methods, and kits of the invention also provide the ability to profile single cells for which it is not possible to do an individual RNA extraction and purification, or, by working directly with lysates, profiling a high number of conditions under which cells are cultivated without necessarily performing a separate RNA extraction and purification step (e.g., if sequencing cells from a high throughput compound screen, it is unnecessary to extract and purify the RNA from each well individually).
  • one or more of the following modifications to the protocol or reagents used were and can optionally be employed.
  • another reverse transcriptase can be used, such as the MMLV Maxima H Minus Reverse Transcriptase (Thermo Scientific).
  • numerous different MMLV reverse transcriptases have been successfully used and can be selected based on user preference, cost, availability and the like.
  • a proteinase or protease such as proteinase K, may be added during lysis.
  • proteinase K is included as part of lysis for sorted single cells and isolated cells/lysates.
  • RNA sequencing of lysates: inputs ranged from single cells to 10,000 cells (including tens or hundreds of cells). For pooled cells, more concentrated proteinase K (2 mg/ml instead of 1 mg/ml for single cells) was used, and the cells were incubated longer (one hour at 50 °C instead of 15 minutes) to increase lysis efficiency.
  • Full length cDNA amplification Amplify full length cDNA by single primer PCR using the Advantage 2 PCR Enzyme System (Clontech, #639206).
  • the PCR reaction is as follows: 20 µL of cDNA from previous step, 5 µL of 10X Advantage 2 PCR buffer, ⁇ of dNTPs, ⁇ of the SINGV6 primer ( ⁇ , Integrated DNA Technologies), ⁇ of Advantage 2 Polymerase Mix, and 22 µL of Nuclease-Free Water.
  • Full length cDNA purification and quantification [0145] Purify the full length cDNAs with 30 µL of Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880). Elute the full length cDNAs in 12 µL of Nuclease-Free Water and quantify on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies, #Q32851).
  • Sequencing Library Quality Assessment [0148] Quantify the library on the Qubit 2.0 Fluorometer using the dsDNA HS Assay.
  • the quality and average size of the library can be assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
  • Sequencing can be performed on any Illumina HiSeq or MiSeq, using the standard Illumina sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first end, then 8 cycles to decode the Nextera barcode and finally 46 cycles. Up to twelve Nextera libraries/384-well capture plate, each comprising 384 cells, can be multiplexed together (twelve i7 barcodes currently available) allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
  • sequences are provided below and herein. Such sequences are merely illustrative of various polynucleotides and components useful in the methods of the present invention. These polynucleotides are suitable across any of the various sample types described herein (e.g., single cells, lysates, bulk RNA, etc.).
  • V: (A, G, or C); N: (A, G, C, or T)

Abstract

The present invention relates generally to methods for single-cell nucleic acid profiling, and nucleic acids useful in those methods. For example, it concerns using barcode sequences to track individual nucleic acids at single-cell resolution, utilizing template switching and sequencing reactions to generate the nucleic acid profiles. These methods and compositions are also applicable to other starting materials, such as cell and tissue lysates or extracted/purified RNA.

Description

High-throughput RNA-seq
Related Application
[0001] This application claims priority and benefit from U.S. Provisional Patent Application No. 61/834,163, filed June 12, 2013, the contents and disclosures of which are hereby incorporated by reference in their entirety.
Field of the Invention
[0002] The present invention relates generally to methods for single-cell nucleic acid profiling, and nucleic acids useful in those methods. In some embodiments, it concerns using barcode sequences to track individual nucleic acids at single-cell resolution, utilizing template switching and sequencing reactions to generate the nucleic acid profiles. In addition to the substantial utility in single cell profiling, the methods and compositions provided herein are also applicable to other starting materials, such as cell and tissue lysates or extracted/purified RNA.
Background of the Invention
[0003] Although transcriptome profiling is an important method for functional characterization of cells and tissues, current technical limitations for whole transcriptome analysis limit the technique to either population averages or to a limited number of single cells. These shortcomings limit the ability of transcriptome profiling to accurately assess stochastic variation in gene expression between individual cells and to analyze distinct subpopulations of cells, both of which have been proposed to be important factors driving cellular differentiation and tissue homeostasis. In addition to being limited to a relatively low number of cells, current single-cell transcriptome profiling methods are expensive and labor-intensive. Improved methods are therefore required to fully characterize a cell population at single-cell resolution. Such improved methods also have utility in improving analysis of other starting materials, such as cell and tissue lysates or extracted/purified RNA.
Summary of the Invention
[0004] In some embodiments, the invention provides a nucleic acid comprising a 5' poly-isonucleotide sequence (for example, comprising an isocytosine, an isoguanosine, or both, such as an isocytosine-isoguanosine-isocytosine sequence), an internal adapter sequence, and a 3' guanosine tract. The 3' guanosine tract can comprise two guanosines, three guanosines, four guanosines, five guanosines, six guanosines, seven guanosines, or eight guanosines. In certain embodiments, the 3' guanosine tract comprises three guanosines. The adapter sequence can be 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., an adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)).
[0005] In some embodiments, the invention provides a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine. In certain embodiments, the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length (e.g., an internal adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)). In certain embodiments, the barcode sequence is 4 to 20 nucleotides in length, for example, 6 nucleotides in length. In certain embodiments, the UMI sequence is six to 20 nucleotides in length, for example, ten nucleotides in length. In some embodiments, the complementarity sequence is a poly(T) sequence, and may be 20 to 40 nucleotides in length, for example, 30 nucleotides in length.
[0006] In some embodiments, the invention provides a kit comprising one or more nucleic acids as described above, for example a) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, b) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine, or c) both. In certain embodiments, the kit comprises a plurality of the nucleic acids of b). In further embodiments, the UMI sequence of each nucleic acid in the plurality of nucleic acids is unique among the nucleic acids in the kit, and in still further embodiments, the plurality of nucleic acids comprises different populations of nucleic acid species. In such embodiments, each population of nucleic acid species may comprise a different barcode sequence that uniquely identifies a single population of nucleic acid species. In certain embodiments, each population of nucleic acid species is in a separate container, and the bar code of each population of nucleic acid species differs by at least two nucleotides from the bar code of each other population of nucleic acid species.
[0007] A kit of the invention may further comprise a third nucleic acid primer comprising 12 to 32 nucleotides (e.g., 22 nucleotides in length) and a 5' blocking group (e.g., biotin or an inverted nucleotide).
An exemplary sequence of such a primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2). A kit may further comprise a nucleic acid comprising a barcode sequence, and optionally also comprise a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond. In certain embodiments, the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length. An exemplary sequence of a phosphorothioate bond-containing nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3).
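As an illustration of the requirement that the bar code of each population differ by at least two nucleotides from the bar code of each other population, the following minimal Python sketch checks the minimum pairwise Hamming distance of a candidate set. The function names and the four 6-nucleotide bar codes are hypothetical and are not taken from this disclosure.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bar codes differ."""
    if len(a) != len(b):
        raise ValueError("bar codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes) -> int:
    """Smallest Hamming distance between any two bar codes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical 6-nucleotide well bar codes, for illustration only.
example_barcodes = ["AACCGG", "AAGGCC", "CCAATT", "GGTTAA"]

# The set is acceptable under the stated criterion if every pair
# differs by at least two nucleotides.
set_is_acceptable = min_pairwise_distance(example_barcodes) >= 2
```

A minimum distance of two means that a single substitution error cannot convert one valid bar code into another, so such an error can be detected, although not necessarily corrected.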
[0008] In some embodiments, the kit further comprises a capture plate and/or a reverse transcriptase enzyme, such as a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase (e.g., SMARTscribe™ reverse transcriptase or Superscript II™ reverse transcriptase or Maxima H Minus™ reverse transcriptase) and/or a DNA purification column, such as a DNA purification spin column, and/or a protease or proteinase (e.g., proteinase K).
[0009] In some embodiments, the invention provides a method for gene profiling, comprising a) providing a plurality of single cells; b) releasing mRNA from each single cell to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell; c) reverse transcribing the individual mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA produced from the separate cells; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments. In some alternative embodiments, the invention provides a method for gene profiling, comprising a) providing an isolated population of cells; b) releasing mRNA from the population of cells to provide one or more mRNA samples; c) reverse transcribing the one or more mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments.
[0010] In certain embodiments, the method further comprises separating a population of cells (e.g., by flow cytometry) to provide the plurality of single cells, for example, by separating them into a capture plate. In alternative embodiments, a population of cells can be sorted into a capture plate such that each well of the capture plate contains a smaller population of cells. Alternatively, cell lysate or RNA samples can be divided into a capture plate. In certain embodiments, the mRNA is released by cell lysis, for example, by freeze-thawing and/or contacting the cells with proteinase K. In certain embodiments, c) comprises contacting each individual mRNA sample with one or more nucleic acids as described above, for example i) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, ii) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine, or iii) both. In certain embodiments, c) is carried out with a reverse transcriptase enzyme, for example, a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase such as SMARTscribe™ reverse transcriptase or Superscript II™ reverse transcriptase or Maxima H Minus™ reverse transcriptase. In certain embodiments, the cDNA purification of d) is carried out with a Zymo-Spin™ column.
[0011] In certain embodiments, the method further comprises treating the barcoded cDNA with an exonuclease, such as with Exonuclease I. In certain embodiments, the amplification of e) utilizes an amplification primer comprising a 5' blocking group, such as biotin or an inverted nucleotide. Exemplary amplification primers are 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., as in the amplification primer having the sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2)). In certain embodiments, the purification of f) may be carried out with magnetic beads, e.g., Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880), and/or may further comprise quantifying the purified cDNA. In certain embodiments, the single cells are provided in a capture plate of individual wells (e.g., a 384 well plate), each well comprising a single cell. In alternative embodiments, a population of cells is provided in a capture plate, each well comprising a population of cells. Alternatively, cell lysate or RNA samples can be provided in a capture plate. It should be understood throughout that, when referring to identification of a particular sample, such as a sample in a well of a plate, that sample may be a single cell or some other sample, such as a lysate or bulk RNA. Thus, reference to a "well" or "sample" should be understood to refer to any of those types of samples. In certain embodiments, reference to "cell/well" or "well/cell" is similarly used to reflect that a sample may be a single cell or some other sample. When a sample is a single cell, identification of a well is equivalent to identification of a single cell. When the sample is something other than a single cell, identification of a well identifies the well in which that sample is provided but does not necessarily identify a single cell.
[0012] In certain embodiments, the fragmentation of g) utilizes a transposase, and may further utilize a first fragmentation nucleic acid and a second fragmentation nucleic acid, wherein the first fragmentation nucleic acid comprises a barcode sequence. An exemplary first fragmentation nucleic acid is 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3' (SEQ ID NO: 4), wherein [i7] represents a barcode sequence. In some embodiments, the [i7] sequence is four to 16 nucleotides in length, for example, eight nucleotides in length. In some embodiments, the [i7] sequence uniquely identifies a single population of nucleic acid species, for example, a population of nucleic acid species derived from a population of single cells from a capture plate. In some embodiments, the [i7] sequence is selected from: TCGCCTTA (SEQ ID NO: 5),
CTAGTACG (SEQ ID NO: 6), TTCTGCCT (SEQ ID NO: 7), GCTCAGGA (SEQ ID NO: 8), AGGAGTCC (SEQ ID NO: 9), CATGCCTA (SEQ ID NO: 10), GTAGAGAG (SEQ ID NO: 11), CCTCTCTG (SEQ ID NO: 12), AGCGTAGC (SEQ ID NO: 13), CAGCCTCG (SEQ ID NO: 14), TGCCTCTT (SEQ ID NO: 15), and TCCTCTAC (SEQ ID NO: 16). In certain embodiments, the barcode sequence of the first fragmentation nucleic acid is different than the barcode sequence of the nucleic acid described in ii) above. In certain embodiments, the barcode sequence of the first fragmentation nucleic acid uniquely identifies a predetermined subset of cells, for example, a subset of cells contained in individual wells of a single capture plate. In further embodiments, the barcode sequence that uniquely identifies the predetermined subset of cells uniquely identifies the capture plate. In certain embodiments, the barcode sequence of the nucleic acid as described in ii) above uniquely identifies the cell within the predetermined subset of cells, which cell comprised the mRNA from which the barcoded cDNA of c) was produced. In further embodiments, the barcode sequence that uniquely identifies the cell within the predetermined subset of cells uniquely identifies an individual well in a capture plate, and in still further embodiments, the combination of the barcode sequence that uniquely identifies the predetermined subset of cells and the barcode sequence that uniquely identifies the cell within a predetermined subset of cells uniquely identifies the capture plate and the individual well which comprised the cell, which cell comprised the mRNA from which the barcoded cDNA of c) was produced. In certain embodiments, the barcode sequence of the first fragmentation nucleic acid is 4 to 20 nucleotides in length, for example, 6 nucleotides in length. In certain embodiments, the second fragmentation nucleic acid is a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond. An exemplary second fragmentation nucleic acid is 48 to 68 nucleotides in length, e.g., 58 nucleotides in length, such as a second fragmentation nucleic acid with a sequence of 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3).
[0013] In certain embodiments, the purification of h) is carried out with magnetic beads, and may optionally further comprise separating the magnetic-bead purified cDNA on an agarose gel, excising cDNA corresponding to 300 to 800 nucleotides in length, and purifying the excised cDNA. In certain embodiments, h) further comprises quantifying the purified cDNA. In certain embodiments, the sequencing of i) is carried out using RNA-seq. In certain embodiments, the method further comprises assembling a database of the sequences of the sequenced cDNA fragments of j), and may additionally comprise identifying the UMI sequences of the sequences of the database. In further embodiments, j) further comprises discounting duplicate sequences that share a UMI sequence, thereby assembling a set of sequences in which each sequence is associated with a unique UMI.
[0014] In certain embodiments, a) through h) are repeated before i) to produce a plurality of populations of cDNA fragments, and in particular embodiments, the populations of cDNA fragments are combined prior to i). In certain embodiments, the barcode sequence of the first fragmentation nucleic acid and the barcode sequence of the nucleic acid as described in ii) above are used to correlate the sequencing data with the predetermined subset of cells and the individual cell.
Brief Description of the Drawings
[0015] Figure 1 depicts incomplete differentiation of human adipose tissue-derived stromal/stem cells (hASCs) in vitro. Figure 1A: cells at day 0. Figure 1B: cells at day 7 (i.e., on the seventh day after the cells were induced to differentiate). Figure 1C: cells at day 14 (i.e., on the fourteenth day after the cells were induced to differentiate).
[0016] Figure 2 depicts a flow chart of an exemplary method for single cell RNA sequencing.
[0017] Figure 3 depicts how a single cell digital gene expression library was constructed, including barcode sequences incorporating sequencing primer sequences, indicated by arrows, and regions that anneal to their complementary oligonucleotides on a flow cell during sequencing (P5 and P7). N6: cell/well barcode index; N10: Unique Molecular Identifier (UMI). The sequencing primer with an i7 plate index is indicated by an arrow, and the two sequencing primers (read 1 and read 2) also are indicated by arrows.
[0018] Figure 4 depicts a reduction in PCR bias through the use of Unique Molecular Identifier (UMI) sequences.
[0019] Figure 5 depicts distributions of expression levels of the key marker genes FABP4 (Figure 5A), SCD (Figure 5B), LPL (Figure 5C), and POSTN (Figure 5D) during adipocyte differentiation. Particularly, Figure 5 depicts the expression levels of each gene across the cells/wells over time, such that the position on the y axis shows the level of expression and the thickness of the bar shows the number of cells expressing at that level.
[0020] Figure 6 depicts gene detection in single cells. Approximately 3,000 to 4,000 unique genes were detected per cell and approximately 15,000 unique genes were detected across all cells. Gene expression was reliably detected at approximately 25 to 50 transcripts per cell, although bursty transcription (transcription occurring in pulses rather than at a constant rate) introduced additional variation.
[0021] Figure 7 depicts GAPDH detection at day 0. Figure 7A depicts a histogram showing the distribution of GAPDH expression among cells profiled at day 0 as an exemplification of a transcriptional burst. Figure 7B depicts genes associated with GAPDH. Figure 7C provides a pictorial representation of the cell cycle. GAPDH is considered to be a housekeeping gene and often is used as a reference gene for normalization.
[0022] Figure 8 depicts principal component analysis of an hASC population at day 0.
[0023] Figure 9 depicts principal component analysis of an hASC population at day 0 (black) and day 1 (gray).
[0024] Figure 10 depicts principal component analysis of an hASC population at day 0 (black) and day 2 (gray).
[0025] Figure 11 depicts principal component analysis of an hASC population at day 0 (black) and day 3 (gray).
[0026] Figure 12 depicts principal component analysis of an hASC population at day 0 (black) and day 7 (gray).
[0027] Figure 13 depicts principal component analysis of an hASC population at day 0 (black) and day 14 (gray).
[0028] Figure 14 depicts differentially expressed genes between day 0 (black) and day 14 (gray) hASC populations and between day 14 sub-populations.
[0029] Figure 15 depicts the expression of adipocyte genes correlating with G1 arrest. Genes that had similar expression levels at Day 14 and Day 0 (Figure 15A, label A) correspond to categories of genes involved in G1 arrest (Figure 15B, label A), indicating that those cells that did not fully differentiate may be stuck in the G0 phase. This reveals a correlation between differentiation state and cell cycle progression when gene expression is analyzed at the single cell level.
[0030] Figure 16 depicts the process of adipocyte differentiation in mouse (3T3-L1) and human (hASC) stem cells, and that an absence of clonal expansion of hASCs may limit adipogenesis.
[0031] Figure 17 depicts cell culture heterogeneity using single-cell sequencing. Figure 17A depicts gene expression estimates from bulk cells compared to their corresponding means across single cell profiles. UPM: unique molecular identifier (UMI) counts for one gene per million UMI counts for all genes. Figure 17B depicts the distribution of observed pairwise correlations (Pearson's r) between all pairs of genes that were detected in at least 10% of day 7 cells (n = 4,038 genes), as compared to an estimated null distribution obtained by permuting the expression values of each gene across the same cells. Figures 17C and 17D depict single cell qRT-PCR validation and single molecule FISH validation, respectively, of the observed positive correlation between the LPL and G0S2 markers from separate cells also collected at day 7.
[0032] Figure 18 depicts a comparison of RefSeq gene expression levels as estimated from the total number of raw aligned sequencing reads or the total number of unique UMIs. Each dot compares the mean raw counts across all profiled cells in the first time course (Dl) to the mean UMI counts for the same gene. The raw and UMI counts are strongly correlated, but the UMI counts correct for a systematic bias in the raw expression levels of a subset of genes, which is likely caused by preferential PCR amplification or sequencing.
[0033] Figure 19 depicts the relationship between the proportion of cells where a gene was detected (UMI count > 1) and its estimated expression level from bulk RNA profiling. Data is shown for day 0 of the D3 differentiation time course. Solid line: medians; top and bottom dotted lines: 90th and 10th percentiles, respectively. UPM = UMI counts for a gene per million UMI counts from all genes.
[0034] Figure 20 depicts a comparison of single-molecule RNA sequencing (Figure 20A) and single molecule FISH (smFISH, Figure 20B) data for LPL and G0S2 during the D3 time course. Single-molecule RNA sequencing values are in UPM, while smFISH measurements are in mRNAs detected per cell. The smFISH data confirm the positive correlation between LPL and G0S2 after 7 days of differentiation. R: Pearson's correlation coefficient.
[0035] Figure 21 depicts gene expression dynamics at single cell resolution. Each scatter plot depicts the first three principal components (PCs) of the initial hASC time course at the indicated time point (Figure 21A: day 0; Figure 21B: day 1; Figure 21C: day 2; Figure 21D: day 3; Figure 21E: day 5; Figure 21F: day 7; Figure 21G: day 9; Figure 21H: day 14). Black dots show cells collected at the indicated time point, while gray dots show cells collected at all previous time points. Figure 21I depicts separately sorted cells with high and low lipid content from day 14 projected into the same PC space.
[0036] Figure 22 depicts distributions of weights for the top four PCs in an initial hASC time course and a lipid-based sorting. To the right of the gene expression data, selected genes and gene sets associated with positive and negative weights are provided. Percentages indicate the ratio of the total variance in the data set captured by each PC. Horizontal lines within each set of boxes indicate medians, boxes indicate the 1st and 3rd quartiles, and whiskers indicate the ranges.
Detailed Description of the Invention
[0037] The present invention provides nucleic acids, kits, and methods for transcriptome-wide profiling at single cell resolution. In some embodiments, the invention provides Unique Molecular Identifiers (UMIs) (e.g., polynucleotides comprising UMIs) that specifically tag individual cDNA species as they are created from mRNA, thereby acting as a robust guard against amplification biases. Each UMI enables a sequenced cDNA to be traced back to a single particular mRNA molecule that was present in a cell. In some embodiments, the invention provides two levels of barcode-based multiplexing, allowing a sequenced cDNA to be traced to a particular cell from among a subset of cells. In some embodiments, the invention provides efficient transposon-based fragmentation, resulting in high yield cDNA libraries. In some embodiments, the invention provides sequencing of the 3' end of mRNAs, limiting the sequencing coverage required to assess the gene expression level of each single cell transcriptome. The methods allow the preparation of RNA-seq libraries in a manner that is not labor-intensive or time-consuming. Indeed, RNA-seq libraries of a thousand single cells can be easily prepared in two days. Any of the foregoing (or any of the nucleic acids, reagents, kits, and methods described herein) may be provided and/or used alone or in any combination.
[0038] The foregoing is also applicable to populations of cells, cell lysates, tissue lysates, and/or extracted/purified RNA. For example, the invention also provides nucleic acids, kits, and methods for sequencing of extracted/purified RNA (bulk RNA sequencing) or for analysis of an isolated population of cells (e.g., from an isolated population of cells or a tissue; analysis of a cell or tissue lysate).
In certain embodiments, any of the compositions, reagents, and methods described herein as applicable to single cells also are applicable to other sources of starting materials, such as extracted RNA, purified RNA, cell lysates, or tissue lysates, and such application is contemplated. In certain embodiments, any of the compositions, reagents, and methods described herein as applicable to extracted RNA, purified RNA, cell lysates or tissue lysates, also are applicable to single cells, and such application is contemplated.
[0039] The present invention provides improved nucleic acids, kits, and methods capable of transcriptome-wide profiling at single cell resolution of tens of thousands of cells simultaneously and cost-effectively (approximately $2 per sample, as compared to approximately $80 per sample with a current method). In certain embodiments, the methods and kits may include customized nucleic acids and/or method steps that are themselves the subject of this application, as well as one or more commercially available reagents, kits, apparatuses, or method steps. The methods of the invention provide a number of distinct advantages over existing methods. Some current methods require a polyA addition step prior to sequencing, but this step can be eliminated through the use of a Moloney Murine Leukemia Virus reverse transcriptase. Moreover, full-length cDNA amplification can be carried out using the suppression PCR principle, thereby enriching full length cDNAs, and the method can be applied directly to cells rather than requiring RNA extraction first.
[0040] The methods of the invention also provide an advantage in that they utilize at least two barcode sequences rather than one, allowing for the simultaneous sequencing of at least 4,608 single-cell transcriptomes in a single lane, as compared to only 96 transcriptomes in current methods. Still further, optimization of reaction volumes can conserve expensive reagents, such as the reverse transcriptase enzyme, reducing costs. Additionally, by utilizing 3' end digital sequencing, less sequencing coverage is needed to determine gene expression levels, further reducing costs.
[0041] The methods of the invention provide an advantage over current methods targeting the 3' end of mRNA that use linear mRNA amplification. Linear mRNA amplification is time-consuming compared to template switching/suppression PCR amplification. Linear mRNA amplification also is labor-intensive and limits the number of cells that can be processed to approximately 50 cells per day by a single person. By contrast, the methods of the invention can accommodate 384 cells in a single plate, allowing a single person to easily process up to 1152 cells per day.
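The throughput figures above are consistent with simple plate arithmetic, sketched below in Python. The text does not state how the figures were derived; the sketch assumes, hypothetically, one 384-well plate per i7 plate index with twelve indices (matching the twelve exemplary [i7] sequences, SEQ ID NOs: 5 to 16) and three plates processed per person per day.

```python
WELLS_PER_PLATE = 384   # from the text: 384 cells in a single plate

# Assumption: one plate per exemplary i7 index (SEQ ID NOs: 5 to 16).
PLATE_INDICES = 12

# Assumption: inferred from 1152 / 384; not stated in the text.
PLATES_PER_DAY = 3

cells_per_lane = WELLS_PER_PLATE * PLATE_INDICES
cells_per_day = WELLS_PER_PLATE * PLATES_PER_DAY

# A 6-nucleotide well barcode offers 4**6 possible sequences, which
# exceeds the 384 wells of one plate.
possible_well_barcodes = 4 ** 6

print(cells_per_lane, cells_per_day, possible_well_barcodes)
```

Under these assumptions the products reproduce the 4,608 transcriptomes per lane and 1152 cells per day cited above.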
[0042] The use of UMIs also provides a distinct advantage over typical single-cell RNA-seq methods. Because of the very low starting amount of RNA in a single cell, several amplification steps are required during the process of the RNA-seq library preparation, and the UMIs protect against amplification biases.
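The way UMIs guard against amplification bias can be illustrated with a minimal Python sketch that counts each UMI once per cell and gene, then expresses the counts as UMI counts per million (UPM, as defined for the figures). The function names and the example reads are hypothetical; keying duplicates on the combination of cell barcode, gene, and UMI is an assumption, and the sketch makes no attempt to tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Collapse PCR duplicates.

    reads: iterable of (cell_barcode, umi, gene) tuples.
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)      # a repeated UMI is stored once
    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


def upm(gene_counts):
    """UMI counts for each gene per million UMI counts from all genes."""
    total = sum(gene_counts.values())
    return {gene: n * 1_000_000 / total for gene, n in gene_counts.items()}


# Hypothetical reads: 6-nt cell barcode, 10-nt UMI, assigned gene.
reads = [
    ("AACCGG", "ACGTACGTAC", "FABP4"),
    ("AACCGG", "ACGTACGTAC", "FABP4"),   # PCR duplicate of the read above
    ("AACCGG", "TTGCATGCAA", "FABP4"),
    ("AACCGG", "GGATCCGGAT", "LPL"),
]

counts = count_unique_umis(reads)
normalized = upm(counts["AACCGG"])
```

In this example four reads collapse to three molecules: two for FABP4 and one for LPL, so the duplicated read does not inflate the FABP4 estimate.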
[0043] The methods of the invention utilizing a transposase-based sequencing library preparation have the added advantage of eliminating a number of labor-intensive and costly steps in library preparation, including magnetic bead immobilization, separate fragmentation, end repair, dA-tailing, and adaptor ligation. By eliminating the separate steps of chemical fragmentation and its purification, end repair, dA-tailing and adapter ligation, labor and cost are reduced, and the yield is much higher than with other techniques because there are fewer purification steps (during which material can be lost) and because this method of tagging the fragments is much more efficient than ligation with a regular ligase. Because less material is lost in the process, the methods of the invention can start with a much lower amount of starting cDNA. This is beneficial because even when combining and amplifying cDNA from 384 cells, there is often a low starting amount of cDNA to begin the library preparation.
[0044] The invention provides methods that are advantageous based on a number of improvements to existing methods. A typical method provided by the invention is depicted in Figure 2, and starts with preparing a capture plate for cell sorting. Cells are then sorted into the plate (e.g., by fluorescence activated cell sorting), after which the plate may be frozen down for storage. For single cell analysis, one cell is sorted into each well of the plate. One advantage of the nucleic acids provided herein is that the use of various barcodes permits the end user to correlate transcript expression back to a particular well and plate, and thus to a specific cell evaluated. To lyse the cells, the plate can, in certain embodiments, be thawed from its frozen state. Optionally, a proteinase or protease, such as proteinase K, is added to the cells to increase the efficiency of the lysis.
If performing bulk RNA-seq, the cell sorting and individual cell lysis steps can be skipped, as the starting material is already RNA. If the starting material is a population of cells, the population can be divided into a multi-well plate in preparation for lysis. Or, if the starting material is a lysate prepared from a population of cells or tissues, cell or tissue lysis may optionally occur in a prior step before introduction into the well, and then the lysate itself may be added to each well of a multi-well plate. For example, a population of cells can be sorted into lysis buffer and lysed (e.g., by freeze-thawing, proteinase K treatment, or a combination thereof) before the lysate is added to the plate. The next steps are to reverse transcribe the mRNA that has been released from the cells and to perform a template switching step. The reverse transcription and template switching can be performed using the nucleic acids of the invention, which efficiently perform these steps. For example, a cDNA synthesis primer comprising a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine, can be used for reverse transcription. Here, the 5' blocking group is used to ensure the correct directionality of cDNA synthesis and the adapter sequence provides a sequence annealing to a sequencing primer, so the first sequencing read will contain the barcode and UMI sequences. Part of the adapter sequence also is used during the suppression PCR. The barcode sequence is used to track which well (and, thus, which cell) a particular cDNA was generated from. In bulk RNA-seq and lysate sequencing embodiments, a barcode can provide a reference for (and, thus, a way to identify) the sample or the pool (e.g., the well) rather than a single cell. Alternatively, a UMI can be used in bulk RNA-seq and lysate sequencing to identify the transcript and the i7 primer (which, in other embodiments, typically contains the barcode for the plate, e.g., for plate indexing - sometimes referred to as the plate barcode or the index) identifies the sample or pool (e.g., the well) rather than the single cell. In these embodiments, the UMI can be, for example, a 16mer UMI. Thus, in certain embodiments, a combination of one or more barcodes and a UMI is used. In other embodiments, a UMI is used either alone or with a single barcode. Either way, the methods and compositions provide a mechanism for identifying where a particular transcript came from. In certain embodiments, i7 is used for plate indexing (e.g., it is a barcode to identify a particular plate). In other embodiments, i7 serves as a sample barcode. The UMI provides a way to trace each cDNA produced to a particular mRNA derived from a cell/sample.
The complementarity sequence anneals to the mRNA, for example, to the poly(A) tail of an mRNA, although it also could anneal to a specific target sequence, such as the sequence of a particular mRNA, instead. The 3' dinucleotide sequence targets the extremity of the polyA tail, i.e., the last two bases of the mRNA before the polyA tail. These two final nucleotides prevent the nucleic acid from annealing elsewhere within the polyA tail, which can be as long as 250 bp. If the nucleic acid were to bind elsewhere, one would not be able to directly access the useful sequence information of the transcript. A template-switching oligonucleotide comprising a 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence, an internal adapter sequence, and a 3' guanosine tract can be used in the template switching step. The 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence provides non-standard base pairs in the template switching oligo to prevent background cDNA synthesis. These nucleotide isomers inhibit reverse transcriptase, such as MMLV reverse transcriptase, from extending the cDNA beyond the template switching adapter, thus increasing cDNA yield by reducing formation of concatemers of the template switching adapter. The adapter sequence provides the subsequence required for the suppression PCR, and the 3' guanosine tract is used to anneal to a polycytosine tract generated at the 3' end of the first strand of cDNA synthesized. These steps are useful in incorporating a barcode and a UMI into the resulting cDNAs. The barcode introduced here helps track the individual well (and, therefore, cell/sample) that a cDNA population came from, while the UMI is unique for each mRNA that produces a cDNA.
Thus, the population of UMIs incorporated into the cDNAs provide a molecular "snapshot" of the mRNA population of the cell or sample at the time of lysis, because subsequent amplification steps do not alter the number of UMIs, making it possible to trace back each cDNA sequenced later to a particular mRNA released from the cell/sample. The template switching step is selective for the creation of full-length cDNAs.
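The layout of the cDNA synthesis primer described in this paragraph can be sketched in Python as a degenerate oligonucleotide design string. This is an illustration only: the function name is hypothetical, the adapter passed in the example is a stand-in borrowed from SEQ ID NO: 2 because the adapter sequence of this primer is not given here, and the well barcode is invented. The IUPAC codes N (any base) and V (A, C, or G) stand for the random UMI positions and the first nucleotide of the 3' dinucleotide, respectively; the 5' blocking group is a chemical modification and is not represented.

```python
def rt_primer_design(adapter: str, barcode: str,
                     umi_len: int = 10, poly_t_len: int = 30) -> str:
    """Return adapter + barcode + UMI (N) + poly(T) + VN, written 5' to 3'.

    umi_len and poly_t_len default to the exemplary lengths in the text
    (ten-nucleotide UMI, 30-nucleotide poly(T) complementarity sequence).
    """
    umi = "N" * umi_len           # random bases, one UMI per molecule
    poly_t = "T" * poly_t_len     # anneals to the poly(A) tail
    anchor = "VN"                 # V = A/C/G (not T), N = any base
    return adapter + barcode + umi + poly_t + anchor


# Stand-in adapter (assumption) and hypothetical 6-nt well barcode.
example_primer = rt_primer_design("ACACTCTTTCCCTACACGACGC", "AACCGG")
```

Excluding T from the first anchor position is what keeps the primer from annealing in the interior of the poly(A) tail, as the paragraph above explains.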
[0045] After reverse transcription and template switching, the wells can be pooled together and purified, followed by treatment with an exonuclease such as Exonuclease I. Without the exonuclease treatment, such as Exonuclease I treatment, the primer used for the suppression PCR can bind to the remaining adapters that are in excess from the template switching reaction, so the addition of an exonuclease, such as Exonuclease I, improves results. The cDNAs then are amplified (e.g., via PCR), followed by subsequent purification and quantification steps. Next, the library is prepared for sequencing by fragmentation, e.g., with a transposase-based fragmentation system. This step also introduces a second bar code to the cDNAs, this second bar code being specific for the capture plate from which the cDNAs were pooled. Thus, each cDNA will have a bar code for both the plate and the well from which it was derived, allowing simultaneous processing of a large number of samples, in which each individual sequence can be traced back to a single mRNA of a specific cell (or, in the case of another type of sample, traced back to a well containing a cell or tissue lysate sample, a purified RNA sample, or the like). The library then can be purified, selected for appropriate size fragments, assessed for quantity and quality, and sequenced (e.g., by RNA-seq such as the Illumina HiSeq™ (Catalog # SY-401-2501) or MiSeq™ (Catalog # SY-410-1003) systems). The sequencer can handle various read lengths and either single-end or paired-end sequencing. The libraries can be run with read lengths sufficient to read each barcode and to obtain enough cDNA sequence to identify the gene from which each cDNA came. For example, 17 cycles can be run for read 1 (see above) to read first the 6 bp well/cell barcode and then the 10 bp UMI. This is then followed by 9 cycles to read the 8 bp i7 plate index. Finally, 46 cycles are, in certain embodiments, run on the other strand to read the cDNA/gene sequence. The machine allows the operator to set up a custom run in which the operator decides the read length for each portion for which sequence is to be obtained. This sequencing design allows an individual to decipher all the information while using the smallest/cheapest kit that meets their needs (e.g., a 50 cycle kit that actually contains enough reagents for 74 cycles). Alternatively, an individual could run more cycles to get longer stretches of cDNA.
Before sequencing, samples from multiple capture plates can be combined without losing the identity of each cDNA in the mixture because of the two barcode sequences. Thus, the data can be deconvo luted after sequencing to determine the UMI of each particular cDNA and the well and plate it came from via the barcodes. This is advantageous because it allows a researcher to run many more samples together than would otherwise be possible, and to do so with less cost and labor.
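The read layout described above (a 6 bp well/cell barcode followed by a 10 bp UMI in read 1, an 8 bp i7 plate index, and a cDNA read) can be illustrated with a minimal Python sketch. The function names, the tuple layout of the input records, and the example sequences are hypothetical and are used only for illustration; they are not part of the described method.

```python
from collections import defaultdict

WELL_BC_LEN = 6   # well/cell barcode length given in the text
UMI_LEN = 10      # UMI length given in the text

# Cycle budget from the text: read 1, i7 plate index read, cDNA read.
TOTAL_CYCLES = 17 + 9 + 46


def split_read1(read1):
    """Split read 1 into (well barcode, UMI); any trailing bases are ignored."""
    if len(read1) < WELL_BC_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than barcode + UMI")
    return read1[:WELL_BC_LEN], read1[WELL_BC_LEN:WELL_BC_LEN + UMI_LEN]


def deconvolute(records):
    """Group reads by (plate index, well barcode), keeping (UMI, cDNA) pairs.

    `records` is an iterable of (read1, plate_index, cdna_read) tuples
    (a hypothetical input layout chosen for this sketch).
    """
    by_sample = defaultdict(list)
    for read1, plate_index, cdna_read in records:
        well_bc, umi = split_read1(read1)
        by_sample[(plate_index, well_bc)].append((umi, cdna_read))
    return dict(by_sample)


# Hypothetical reads: 6 nt barcode + 10 nt UMI + 1 extra base from cycle 17.
records = [
    ("ACGTAC" + "AAAAACCCCC" + "T", "GGCTAACC", "TTGACC"),
    ("ACGTAC" + "GGGGGTTTTT" + "T", "GGCTAACC", "TTGACC"),
    ("TTGCAA" + "AAAAACCCCC" + "T", "CCATTGGA", "GATTAC"),
]
samples = deconvolute(records)
```

In this sketch the three read segments total 72 cycles, which fits within the 74 cycles of reagent that the text attributes to a 50 cycle kit.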
Definitions
[0046] Throughout this specification, the word "comprise" or variations such as "comprises" or "comprising" will be understood to imply the inclusion of a stated integer (or components) or group of integers (or components), but not the exclusion of any other integer (or components) or group of integers (or components).
[0047] The singular forms "a," "an," and "the" include the plurals unless the context clearly dictates otherwise.
[0048] The term "including" is used to mean "including but not limited to." "Including" and "including but not limited to" are used interchangeably.
[0049] The terms "patient," "subject," and "individual" may be used interchangeably and refer to either a human or a non-human animal. These terms include mammals such as humans, primates, livestock animals (e.g., bovines, porcines), companion animals (e.g., canines, felines) and rodents (e.g., mice and rats).
[0050] The term "diagnosis" as used herein refers to methods by which the skilled artisan can estimate and/or determine whether or not a patient is afflicted with a given disease or condition. The skilled worker often makes a diagnosis based on one or more diagnostic indicators. Exemplary diagnostic indicators may include the manifestation of symptoms or the presence, absence, or change in one or more markers for the disease or condition. A diagnosis may indicate the presence or absence, or severity, of the disease or condition.
[0051] The term "prognosis" is used herein to refer to the likelihood of the progression or regression of a disease or condition, including likelihood of the recurrence of a disease or condition.
[0052] As used herein, "treating" a disease or condition refers to taking steps to obtain beneficial or desired results, including clinical results. Beneficial or desired clinical results include, but are not limited to, reduction, alleviation or amelioration of one or more symptoms associated with the disease or condition.
[0053] As used herein, "administering" or "administration of" a compound or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art. For example, a compound or an agent can be administered orally, intravenously, arterially, intradermally, intramuscularly, intraperitoneally, subcutaneously, ocularly, sublingually, intranasally, intraspinally, intracerebrally, or transdermally. A compound or agent can appropriately be introduced by rechargeable or biodegradable polymeric devices or other devices, e.g., patches and pumps, or formulations, which provide for the extended, slow, or controlled release of the compound or agent. Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods. Administration of a compound may include both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, a physician who instructs a patient to self-administer a therapeutic agent, or to have the agent administered by another, and/or who provides a patient with a prescription for a drug has administered the drug to the patient.
[0054] The term "nucleic acid" refers to DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), DNA-RNA hybrids, and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be a nucleotide, oligonucleotide, double-stranded DNA, single-stranded DNA, multi-stranded DNA, complementary DNA, genomic DNA, non-coding DNA, messenger RNA (mRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), small interfering RNA (siRNA), heterogeneous nuclear RNAs (hnRNA), or small hairpin RNA (shRNA).
[0055] As used herein, a "profile" of a transcriptome or portion of a transcriptome can refer to any sequencing or gene expression information concerning the transcriptome or portion thereof. This information can be either qualitative (e.g., presence or absence) or quantitative (e.g., levels or mRNA copy numbers). In some embodiments, a profile can indicate a lack of expression of one or more genes.
[0056] The term "cDNA library" refers to a collection of complementary DNA (cDNA) fragments. A cDNA library may be generated from the transcriptome of a single cell or from a plurality of single cells. cDNA is produced from mRNA found in a cell and therefore reflects those genes that have been transcribed for subsequent protein expression.
[0057] As used herein, a "plurality" of cells refers to a population of cells and can include any number of cells to be used in the methods described herein. For example, a plurality of cells includes at least 10 cells, at least 25 cells, at least 50 cells, at least 100 cells, at least 200 cells, at least 500 cells, at least 1,000 cells, at least 5,000 cells, or at least 10,000 cells. In some embodiments, a plurality of cells includes from 10 to 100 cells, from 50 to 200 cells, from 100 to 500 cells, from 100 to 1,000 cells, or from 1,000 to 5,000 cells.
[0058] As used herein, a "single cell" refers to one cell. Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Cells can be cultured cells or cells from a dissociated tissue, and can be fresh or preserved in a preservative buffer such as RNAprotect. Furthermore, in general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic single-celled organisms including bacteria or yeast. In some aspects of the invention, the method of preparing the cDNA library can include the step of obtaining single cells. A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually, for example, a 96-well plate in which each single cell is placed in a single well.
[0059] As used herein, an "oligonucleotide" or "polynucleotide" refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and can perform any function. Exemplary polynucleotides include a gene or gene fragment (e.g., a probe or primer), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA or RNA of any sequence, and nucleic acid probes and primers. A polynucleotide can comprise modified nucleotides, such as isonucleotides, methylated nucleotides, and other nucleotide analogs. The term also refers to both double- and single-stranded molecules. A polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Uracil (U) substitutes for thymine when the polynucleotide is RNA. The sequence can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
[0060] As used herein, a "primer" is a polynucleotide that hybridizes to a target or template that may be present in a sample of interest. After hybridization, the primer promotes the polymerization of a polynucleotide complementary to the target, for example in a reverse transcription or amplification reaction.
Cell sorting and lysis
[0061] Methods for selecting or sorting cells are well established, and in some embodiments include, but are not limited to, fluorescence-activated cell sorting (FACS), micromanipulation, manual sorting, and the use of semi-automated cell pickers. Individual cells can be individually selected based on features detectable by observation (e.g., by microscopic observation). Exemplary features can include location, morphology, and reporter gene expression. A population of cells can be sorted to provide a subpopulation or a predetermined subset of cells. In some embodiments, the population, subpopulation, or predetermined subset can be sorted to provide single cells. In some embodiments, the cells are sorted into a capture plate. Capture plates can comprise a number of wells into which the cells are sorted, for example, 24 wells, 96 wells, 384 wells, or 1536 wells. In some embodiments, a population of cells is lysed without sorting. The population of cells can be, for example, a tissue sample. In certain embodiments, the population of cells is an isolated population of cells. In such embodiments, the starting material for further analysis may be, for example, a cell or tissue lysate or bulk purified or extracted RNA. In such embodiments, cells can be divided into the wells of a plate without sorting. In particular embodiments, the amount of material in each well is normalized with respect to the other wells so as to provide similar sequencing coverage across a plate.
[0062] To release mRNA from cells, the cells may be lysed. Cells may be lysed by any number of known techniques. Exemplary cell lysis techniques include freeze-thawing, heating the cells, using a detergent or other chemical method, or a combination thereof. Techniques minimizing degradation of the released mRNA are preferred. Likewise, techniques preventing the release of nuclear chromatin are preferred. For example, heating the cells in the presence of Tween-20 is sufficient to lyse cells while minimizing genomic contamination from nuclear chromatin. In certain embodiments, cells are lysed using freeze-thawing. In some embodiments, a proteinase or protease, such as proteinase K, is added to the lysis reaction to increase the efficiency of lysis. In certain embodiments, cells are lysed using freeze-thawing optionally supplemented with addition of proteinase K.
[0063] As noted above, cell lysis may be of single cells already sorted into individual wells of a plate. Alternatively, lysis of populations of cells may be performed and the starting material for further sequence analysis may be a cell or tissue lysate made from a plurality of cells and then aliquoted to wells of a plate. Regardless of starting material, in certain embodiments, following lysis the material may be stored at a suitable temperature, such as -80 °C, prior to further use.
Reverse transcription and template switching
[0064] In some embodiments, cDNA is synthesized from mRNA through the process of reverse transcription. Reverse transcription can be performed directly on cell lysates (for example, a cell lysate prepared as described above), by adding a reaction mix for reverse transcription directly to the cell lysate. In alternative embodiments, the total RNA or mRNA can be purified after cell lysis, for example through the use of column-based purification (e.g., Qiagen RNeasy Mini kit Cat. No. 74104, ZymoResearch Direct-zol RNA Cat. No. R2050) or magnetic bead purification (e.g., Agencourt RNAClean XP, Cat. No. A63987). Methods for reverse transcription of mRNA to cDNA are well established in the art. In some embodiments, the reverse transcription is combined with a template switching step to improve the yield of longer (e.g., full length) cDNA molecules. In certain embodiments, the reverse transcriptase used has tailing or terminal transferase activity, and synthesizes and anchors first-strand cDNA in one step. In certain embodiments, the reverse transcriptase is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ reverse transcriptase (Clontech, Cat. No. 639536), Superscript II™ reverse transcriptase (Life Technologies, Cat. No. 18064-014), or Maxima H Minus™ reverse transcriptase (Thermo Scientific, Cat. No. EP0753).
[0065] Template switching introduces an arbitrary sequence at the 3' end of the cDNA that is designed to be the reverse complement to the 3' end of a cDNA synthesis primer. In some embodiments, the synthesis of the first strand of the cDNA can be directed by a cDNA synthesis primer (CDS) that includes an RNA complementary sequence (RCS). In some embodiments, the RCS is at least partially complementary to one or more mRNA species in an individual mRNA sample, allowing the primer to hybridize to at least some mRNA species in a sample to direct cDNA synthesis using the mRNA as a template. The RCS can comprise an oligo(dT) sequence that binds to many mRNA species, or it can be specific for a particular mRNA species, for example, by binding to an mRNA sequence of a gene of interest. Alternatively, the RCS can comprise a random sequence, such as random hexamers. To avoid the CDS self-priming, a non-self-complementary sequence can be used.
[0066] A template-switching oligonucleotide that includes a portion which is at least partially complementary to a portion of the 3' end of the first strand of cDNA generated by the reverse transcription can also be used in the methods of the invention. Because the terminal transferase activity of reverse transcriptase typically causes the incorporation of two to five cytosines at the 3' end of the first strand of cDNA synthesized, the first strand of cDNA can include a plurality of cytosines, or cytosine analogues that base pair with guanosine, at its 3' end to which the template-switching oligonucleotide with a 3' guanosine tract can anneal. During the template switching step, the template-switching oligonucleotide is extended to form a double stranded cDNA. Thus, in some embodiments, a template-switching oligonucleotide can include a 3' portion comprising a plurality of guanosines or guanosine analogues that base pair with cytosine. Exemplary guanosines or guanosine analogues include, but are not limited to, deoxyriboguanosine, riboguanosine, locked nucleic acid-guanosine, and peptide nucleic acid-guanosine. The guanosines can be ribonucleosides or locked nucleic acid monomers. A locked nucleic acid is an RNA nucleotide wherein the ribose moiety has been modified with an extra bridge connecting the 2' oxygen and the 4' carbon. A peptide nucleic acid is an artificially synthesized polymer similar to DNA or RNA, wherein the backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
[0067] In some embodiments, the reverse transcription and template switching comprise contacting an mRNA sample with two nucleic acid primers. In certain embodiments, the first nucleic acid primer (e.g., a template-switching oligonucleotide) comprises a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract. In certain embodiments, the 5' poly-isonucleotide sequence comprises an isocytosine, or an isoguanosine, or both. In certain embodiments, the 5' poly-isonucleotide sequence comprises an isocytosine-isoguanosine-isocytosine sequence. Incorporating non-natural nucleotides, such as an isocytosine or an isoguanosine, into template-switching primers can reduce background and improve cDNA synthesis (Kapteyn et al., BMC Genomics. 11:413 (2010)). In some embodiments, the 3' guanosine tract comprises two, three, four, five, six, seven, eight, nine, ten, or more guanosines. In certain embodiments, the 3' guanosine tract comprises three guanosines. In some embodiments, the adapter sequence is 12 to 32 nucleotides in length, for example, 22 nucleotides in length. In particular embodiments, the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1). In particular embodiments, the sequence of the first primer is 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17) (e.g., 1 μM), wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine.
[0068] In certain embodiments, the second nucleic acid primer (e.g., a cDNA synthesis primer) comprises a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine. Optionally, to sequence bulk RNA or lysates, the bar code can be omitted from the cDNA synthesis primer and an extra 6 base pairs can be added to the UMI sequence. In particular embodiments, the 5' blocking group is selected from biotin, an inverted nucleotide (e.g., inverted dideoxy-T), a fluorophore, an amino group, and iso-dG or iso-dC. In particular embodiments, the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length. In particular embodiments, the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1). In particular embodiments, the barcode sequence is 4 to 20 nucleotides in length, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In particular embodiments, the UMI sequence is 6 to 20 nucleotides in length, for example, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In particular embodiments, the complementarity sequence is a poly(T) sequence. In particular embodiments, the complementarity sequence is 20 to 40 nucleotides in length, for example, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length. In specific embodiments, the second nucleic acid primer is 5'-
/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNN[poly(T)]VN-3' (SEQ ID NO: 18), wherein 5Biosg represents 5' biotin; [poly(T)] represents the poly(T) complementarity sequence; V represents a nucleotide selected from A, G, and C; the 3' N represents a nucleotide selected from A, G, C, and T; [BC6] represents a 6 base pair barcode sequence; and the (N)10 after the barcode sequence represents a Unique Molecular Identifier (UMI) sequence. In these primers, the barcodes may be designed so that each barcode sequence differs from the barcodes of all other primers by at least two nucleotides, so that a single sequencing error cannot lead to the misidentification of the barcode.
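The barcode design rule above (every pair of barcodes differing by at least two nucleotides) can be sketched as follows. This is only one possible way to build such a set: the greedy selection strategy, the function names, and the choice of 96 barcodes (one per well of a 96-well plate) are assumptions made for illustration and are not taken from the specification.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(length, count, min_dist=2):
    """Greedily pick `count` barcodes with pairwise Hamming distance >= min_dist."""
    chosen = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(hamming(candidate, bc) >= min_dist for bc in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough barcodes satisfy the distance constraint")


# Hypothetical use: one 6-nucleotide barcode per well of a 96-well plate.
barcodes = pick_barcodes(length=6, count=96)

# Smallest pairwise distance in the chosen set.
min_pairwise = min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

With a minimum distance of two, a single substitution error turns a valid barcode into a sequence that matches no barcode in the set, so the read can be flagged or discarded rather than assigned to the wrong well. Correcting such errors, rather than only detecting them, would require a larger minimum distance.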
[0069] The UMI sequences provide a robust guard against amplification biases. More particularly, each UMI is present only once in a population of second nucleic acid primers. Thus, each UMI is incorporated into a unique cDNA sequence generated from a cellular mRNA, and any subsequent amplification steps will not alter the one UMI to one mRNA ratio. In certain embodiments, the UMI sequence, rather than being 10 nucleotides in length, is 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. The length should be selected to provide sufficient unique sequences for the population of cells to be tested (preferably with at least two nucleotide differences between any pair of UMIs), preferably without adding unnecessary length that increases sequencing cost.
[0070] Barcode sequences enable each cDNA sample generated by the above method to have a distinct tag, or a distinct combination of tags, such that once the tagged cDNA samples have been pooled, the tag can be used to identify the single cell from which each cDNA sample originated. Thus, each cDNA sample can be linked to a single cell, even after the tagged cDNA samples have been pooled and amplified. In other words, the use of the foregoing nucleic acids permits deconvolution of pooled data to single cell/well resolution. This is particularly advantageous for facilitating the application of this technology to screening assays.
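The way UMIs guard against amplification bias can be illustrated by counting distinct UMIs rather than reads. The sketch below assumes that each read has already been assigned a cell/well barcode, a UMI, and a gene; the function name, input layout, and gene labels are hypothetical. For scale, a fully random 10-nucleotide UMI has 4^10 = 1,048,576 possible sequences.

```python
from collections import defaultdict


def count_transcripts(reads):
    """Count distinct UMIs per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per read.
    Reads sharing cell barcode, gene, and UMI are treated as amplification
    copies of one original mRNA molecule and are counted once.
    """
    umis = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("ACGTAC", "AAAAACCCCC", "GeneA"),
    ("ACGTAC", "AAAAACCCCC", "GeneA"),
    ("ACGTAC", "GGGGGTTTTT", "GeneA"),
    ("TTGCAA", "AAAAACCCCC", "GeneB"),
]
counts = count_transcripts(reads)
```

Here four reads collapse to three molecules: two for GeneA in the first cell and one for GeneB in the second, regardless of how many times each molecule was amplified.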
[0071] In some embodiments, a nucleic acid useful in the invention can contain a non-natural sugar moiety in the backbone, for example, sugar moieties with 2' modifications such as addition of a halogen, alkyl-substituted alkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SO2CH3, OSO2, NO2, N3, or NH2. Similar modifications also can be made at other positions on the sugar. Nucleic acids, nucleoside analogs or nucleotide analogs having sugar modifications can be further modified to include a reversible blocking group, a peptide-linked label, or both. In those embodiments comprising a 2' modification, the base can have a peptide-linked label.
[0072] A nucleic acid useful in the invention also can include native or non-native bases. In some embodiments, a native deoxyribonucleic acid can have one or more bases selected from adenine, thymine, cytosine, and guanine, and a ribonucleic acid can have one or more bases selected from uracil, adenine, cytosine, and guanine. Exemplary non-native bases include, but are not limited to, inosine, xanthine, hypoxanthine, isocytosine, isoguanosine, 5-methylcytosine, 5-hydroxymethyl cytosine, 2-aminoadenine, 6-methyl adenine, 6-methyl guanine, 2-propyl guanine, 2-propyl adenine, 2-thiothymine, 2-thiocytosine, 5-propynyl uracil, 5-propynyl cytosine, 6-azo uracil, 6-azo cytosine, 6-azo thymine, 4-thiouracil, 8-halo adenine, 8-halo guanine, 8-amino adenine, 8-amino guanine, 8-thiol adenine, 8-thiol guanine, 8-thioalkyl adenine, 8-thioalkyl guanine, 8-hydroxyl adenine, 8-hydroxyl guanine, 5-halo substituted uracil, 5-halo substituted cytosine, 7-methylguanine, 7-methyladenine, 8-azaguanine, 8-azaadenine, 7-deazaguanine, 7-deazaadenine, 3-deazaguanine, and 3-deazaadenine. In certain embodiments, isocytosine and isoguanosine may reduce non-specific hybridization. In some embodiments, a non-native base can have universal base pairing activity, wherein it is capable of base-pairing with any other naturally occurring base, e.g., 3-nitropyrrole and 5-nitroindole.
cDNA pooling and purification
[0073] In some embodiments, after reverse transcription and template switching have been used to generate cDNA, the cDNA is pooled together. For example, a population of cells can be individually sorted into the wells of a tray, lysed, and undergo reverse transcription and template switching. These cDNAs then can be pooled and purified. In certain embodiments, the cDNA is purified through a column-based purification method, e.g., with a DNA Clean & Concentrator-5 column (Zymo Research, #D4013).
Exonuclease treatment
[0074] In some embodiments, pooled cDNAs are treated with an exonuclease (e.g., Exonuclease I) to degrade any primers remaining from the reverse transcription and template switching steps. This prevents possible interference by these primers in subsequent amplification.
Amplification
[0075] As used herein, the term "amplification" or "amplifying" refers to a process by which multiple copies of a particular polynucleotide are formed, and includes methods such as the polymerase chain reaction (PCR), ligation amplification (also known as ligase chain reaction, or LCR), and other amplification methods. In some embodiments, amplification refers specifically to PCR. Amplification methods are widely known in the art. In general, PCR refers to a method of amplification comprising hybridization of primers to specific sequences within a DNA sample and amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase. The resulting DNA products are then often screened for a band of the correct size. The primers used are oligonucleotides of appropriate length and sequence to provide initiation of polymerization. Reagents and hardware for conducting amplification reactions are widely known and commercially available. Primers useful to amplify sequences from a particular gene region are sufficiently complementary to hybridize to target sequences. Nucleic acids generated by amplification can be sequenced directly.
[0076] When hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides, the reaction is called "annealing" and those polynucleotides are described as "complementary". A double-stranded polynucleotide can be complementary or homologous to another polynucleotide, if hybridization can occur between one of the strands of the first polynucleotide and the second. Complementarity or homology (the degree that one polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonding with each other, according to generally accepted base-pairing rules. The stringency of hybridization is influenced by hybridization conditions, such as temperature and salt. In the context of amplification, these parameters can be suitably selected.
[0077] In some embodiments, cDNA created by reverse transcription and template switching, and optionally treated with an exonuclease, is amplified to provide more starting material for sequencing. cDNA can be amplified by a single primer with a region that is complementary to all cDNAs, e.g., an adapter sequence. In certain embodiments, the primer has a 5' blocking group such as biotin. An exemplary primer is as follows: 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (wherein 5Biosg represents 5' biotin) (SEQ ID NO: 19). One exemplary amplification reaction uses cDNA; PCR buffer, such as 10X Advantage 2 PCR buffer; dNTPs; the DNA primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19); polymerase mix, such as Advantage 2 Polymerase Mix; and water, such as nuclease-free water, and is (in certain embodiments) performed using the following program: 95 °C for 1 minute; 18 cycles of 95 °C for 15 seconds, 65 °C for 30 seconds, and 68 °C for 6 minutes; and 72 °C for 10 minutes (followed by an optional hold period at 4 °C). In certain bulk RNA-seq and lysate sequencing embodiments, this amplification reaction may be modified to use fewer than 18 cycles, e.g., 10 cycles. One exemplary amplification reaction uses 20 μL of cDNA; 5 μL of 10X Advantage 2 PCR buffer; 1 μL of dNTPs; 1 μL of the DNA primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19) (10 μM, Integrated DNA Technologies); 1 μL of the Advantage 2 Polymerase Mix; and 22 μL of Nuclease-Free Water, and is optionally performed using the following program: 95 °C for 1 min; 18 cycles of 95 °C for 15 sec, 65 °C for 30 sec, and 68 °C for 6 min; and 72 °C for 10 min (followed by an optional hold period at 4 °C).
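The reaction volumes and cycling program of the exemplary amplification can be checked with a little arithmetic. The sketch below reads the listed volumes as 20, 5, 1, 1, 1, and 22 μL and assumes that the 72 °C step is a single final extension performed once after the cycles rather than within each cycle; both readings are interpretations for illustration. The variable and function names are hypothetical, and the computed times exclude thermal ramping.

```python
# Reaction components in microliters, as read from the exemplary reaction.
components_ul = {
    "cDNA": 20,
    "10X Advantage 2 PCR buffer": 5,
    "dNTPs": 1,
    "primer (10 uM)": 1,
    "Advantage 2 Polymerase Mix": 1,
    "nuclease-free water": 22,
}
total_volume_ul = sum(components_ul.values())

# Final buffer strength: 10X stock diluted into the total reaction volume.
buffer_strength_x = 10 * components_ul["10X Advantage 2 PCR buffer"] / total_volume_ul

# Final primer concentration in micromolar (10 uM stock diluted into the reaction).
primer_final_um = 10 * components_ul["primer (10 uM)"] / total_volume_ul

# Cycling program in seconds. Assumption: 72 C is one final extension step.
INITIAL_DENATURATION_S = 60
CYCLE_STEPS_S = (15, 30, 360)   # 95 C, 65 C, 68 C
FINAL_EXTENSION_S = 600


def program_seconds(n_cycles):
    """Total programmed hold time in seconds, excluding ramping and the 4 C hold."""
    return INITIAL_DENATURATION_S + n_cycles * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


single_cell_run_s = program_seconds(18)   # 18-cycle program
bulk_run_s = program_seconds(10)          # reduced-cycle bulk/lysate variant
```

Under these assumptions the components sum to a 50 μL reaction with 1X buffer and 0.2 μM primer, and the 18-cycle program holds for 7,950 seconds (about 2.2 hours) versus 4,710 seconds for 10 cycles.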
However, the skilled worker will appreciate that amplification conditions may be adjusted depending on the exact primer and template being used.
Nucleic acid purification and quantification
[0078] Nucleic acid purification (e.g., cDNA purification) is well known in the art. In some embodiments, a nucleic acid (e.g., cDNA) is purified with a spin-based column, such as those commercially available from Zymo Research™ (DNA Clean & Concentrator™-5, Cat. No. D4013) or Qiagen™ (MinElute PCR purification kit, Cat. No. 28004). In particular embodiments, the spin column is a column lacking a physical ring, for example the ring found in Qiagen™ columns, allowing elution of the purified nucleic acid in a lower volume than would be possible in a spin column with a ring. In some embodiments, a nucleic acid (e.g., cDNA, such as in a cDNA library) is purified using magnetic beads. Magnetic bead purification systems are well known and include, for example, the Agencourt AMPure XP™ system (Beckman Coulter, Cat. No. A63881). In some embodiments, a nucleic acid (e.g., cDNA, such as in a cDNA library) is purified after being run on a gel. Gel extraction purification kits are well known, and include, for example, the MinElute Gel Extraction Kit™ (Qiagen, Cat. No. 28604).
Sequencing library preparation
[0079] In some embodiments, a cDNA library for sequencing is fragmented prior to the sequencing. A cDNA library can be fragmented by any known method, for example, mechanical fragmentation or a transposase-based fragmentation such as that used in the Nextera™ system (e.g., the Illumina Nextera XT DNA Sample Preparation Kit Cat. No. FC-131-1096 or the Nextera DNA Sample Preparation Kit Cat. No. FC-121-1031). Fragmentation via a transposase-based system has the benefit of being able to incorporate into the fragments barcode sequences that facilitate identification of the fragments. In some embodiments, a barcode sequence introduced during preparation of a cDNA library for sequencing is specific for a predetermined set of cells. This predetermined set of cells can be a subset of a larger set of cells. For example, a tissue biopsy can be sorted into a set of cells to be further sorted into single cells in a capture plate for gene profiling. If a bulk lysate or population of cells is being used as a starting material rather than single cells that have been sorted, a barcode sequence may, in certain embodiments, not be necessary in this step if a barcode already has been incorporated into the cDNA library in previous steps. However, a plate barcode still could be used to multiplex a high number of samples even for purified RNA/lysates.
Sequencing library quality assessment
[0080] In some embodiments, a cDNA library for sequencing is quantified and evaluated for quality prior to the sequencing to ensure that the library is of sufficient quantity and quality to yield positive results from sequencing. For example, a cDNA library can be quantified using a fluorometer and analyzed for quantity and average size through the use of a number of commercially available kits. The 2 main metrics for quality are the concentration of the library (which needs to be sufficient for loading on the sequencer) and the length of the cDNA fragments to be sequenced. Size selection is performed on a gel to enrich for fragments of the correct size. The gel itself gives an idea of the quality of the library. The final extracted library can be run on an Agilent Bioanalyzer (Cat. No. G2940CA) to obtain the size distribution for the cDNA fragments.
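The two quality metrics named above, library concentration and fragment length, are often combined into a molar concentration before loading a sequencer. The sketch below uses the commonly cited approximation of 660 g/mol per base pair of double-stranded DNA; that constant, the function name, and the example values are assumptions introduced for illustration and do not come from the specification.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration (ng/uL) and mean fragment length (bp) to nM.

    nM = (ng/uL) * 1e6 / (g/mol per bp * mean length in bp)
    """
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must be non-negative")
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


# Hypothetical library: 2.0 ng/uL from a fluorometer, 500 bp mean size from a
# fragment analyzer trace.
molarity_nm = library_molarity_nm(2.0, 500)
```

For the hypothetical values shown, the library is roughly 6 nM. The same mass concentration at a longer mean fragment length gives a proportionally lower molarity, which is why both metrics are needed.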
Sequencing
[0081] As used herein, "sequencing" refers to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid. Exemplary sequencing techniques include RNA-seq (also known as whole transcriptome sequencing), Illumina™ sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, massively parallel signature sequencing (MPSS), sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, mass spectrometry, and a combination thereof. In some embodiments, sequencing comprises detecting a sequencing product using an instrument, for example but not limited to an ABI PRISM™ 377 DNA Sequencer, an ABI PRISM™ 310, 3100, 3100-Avant, 3730, or 3730xl Genetic Analyzer, an ABI PRISM™ 3700 DNA Analyzer, or an Applied Biosystems SOLiD™ System (all from Applied Biosystems), a Genome Sequencer 20 System (Roche Applied Science), or a mass spectrometer. In certain embodiments, sequencing is performed on Illumina HiSeq or MiSeq paired-end flow cells.
Data analysis
[0082] As described herein, one major advantage of the nucleic acids, methods, and kits of the invention is that samples can be pooled and sequenced rather than needing to be sequenced individually. Sequencing products can be traced not only to the single plate of cells from which they came, but also to a single cell (e.g., a well) and, indeed, a single cellular transcript. This deconvolution of sequencing data can be achieved through the use of barcode and UMI sequences. In some embodiments, sequencing is combined with 3' digital gene expression (DGE) to provide a number of counts for a particular sequence or sequences (e.g., cDNAs containing a particular combination of barcodes and a UMI). In some embodiments, fragments spanning each full transcript are sequenced, and the number of sequenced fragments of each transcript is counted. In these embodiments, the computed gene expression should be normalized based on the length of a given transcript because a longer transcript will have a greater chance of having one of its fragments sequenced. However, full transcript sequencing typically requires more sequencing coverage than DGE, for which only the 3' end needs to be sequenced.
Kits
[0083] In some embodiments, the invention provides a kit comprising a plurality of one or both of the reverse transcription/template switching nucleic acid primers described above. In some embodiments, the UMI sequence of each second nucleic acid primer described above in the plurality of nucleic acids of the kit is unique among the nucleic acids of the kit. In some embodiments, the plurality of nucleic acids comprises different populations of nucleic acid species. In certain embodiments, each population of nucleic acid species comprises a different barcode sequence that uniquely identifies a single population of nucleic acid species. In some embodiments, the kit further comprises a third nucleic acid primer comprising 12 to 32 nucleotides and a 5' blocking group as described above. In some embodiments, the third nucleic acid is 22 nucleotides in length. An exemplary sequence of the third nucleic acid primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2). In some embodiments, the kit further comprises a nucleic acid comprising a barcode sequence. In some embodiments, the kit further comprises a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond. In certain embodiments, the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length. An exemplary sequence of the phosphorothioate bond-containing nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3). In further embodiments, the kit further comprises a capture plate and/or a reverse transcriptase enzyme and/or a DNA purification column (e.g., a DNA purification spin column) and/or proteinase K. For example, the kit can comprise a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase. Exemplary kits include any one or any combination of the reagents described herein and, optionally, directions for use. When multiple reagents and/or nucleic acids are provided in a single kit, the reagents may be provided in separate containers, such as separate tubes or vials. Optionally, the kit contains sterile water for use.
Research applications
[0084] In some embodiments, the nucleic acids, kits, and/or methods of the invention are used for research applications requiring sequencing or gene expression profiling. In certain embodiments, the research applications include studying cellular differentiation, characterizing tissue heterogeneity, high-throughput screening of agents (e.g., potential therapeutics, potential differentiation inducers, potential toxins, or any other agents whose effects on cells are of interest), stem cell reprogramming, cell lineage tracing, and virus detection in blood samples. Exemplary applications of the technology in the research context, and proof of principle, are provided in the Examples and are merely illustrative of uses of the technology.
[0085] In certain embodiments, the nucleic acids (e.g., compositions), kits, and/or methods of the disclosure are applied to gene expression analysis of single cells, optionally in response to contacting the single cell with an agent in the high-throughput screening context. The ability to analyze gene expression accurately across large numbers of cells, and to correlate the expression level accurately to a particular cell/well, is an exemplary advantage and application of the instant technology. The technology is, in certain embodiments, similarly applied to other samples, such as cell or tissue lysates.
Diagnosis, prognosis, and treatment
[0086] As described above, the invention is useful in generating a gene expression profile for a plurality of cells. These gene expression profiles can be used in a number of applications related to the diagnosis, prognosis, and treatment of a subject. For example, cells from a tissue sample collected from a patient can be used in the methods of the invention to generate an expression profile that can be compared against a known profile that is indicative of the disease or condition, thus informing a physician of whether the subject has the disease or condition. Similarly, the profile can be compared to a known profile useful in the prognosis of the disease or condition. For example, if the known profile is predictive of a cancer prognosis, the comparison may inform the physician of the stage of cancer or the cancer's likelihood of metastasis. In some embodiments, the invention can be used in a method of treating a disease or condition in a subject in need thereof. For example, a method of the invention can be used to obtain gene expression profiles in a subject before and after treatment with a therapeutic agent, thereby providing a means of determining the efficacy of the therapeutic agent. These data can be used to determine the efficacy of a treatment, or to help a physician determine an effective treatment regimen.
[0087] The invention is applicable to various diseases or conditions. Exemplary diseases or conditions are a cancer, a cardiovascular disease or condition, a neurological or neuropsychiatric disease or condition, an infectious disease or condition, a respiratory or gastrointestinal tract disease or condition, a reproductive disease or condition, a renal disease or condition, a prenatal or pregnancy-related disease or condition, an autoimmune or immune-related disease or condition, a pediatric disease, disorder, or condition, a mitochondrial disorder, an ophthalmic disease or condition, a musculo-skeletal disease or condition, or a dermal disease or condition.
[0088] All publications, patents and published patent applications referred to in this application are specifically incorporated by reference herein. In case of conflict, the present specification, including its specific definitions, will control.
[0089] Each embodiment described herein may be combined with any other embodiment described herein.
[0090] The following examples are provided to illustrate certain embodiments of the invention and are not intended to limit the scope of the invention.
Examples
Example 1: Protocol for transcriptome-wide single-cell RNA sequencing
[0091] To test the methods of the invention, the protocol described below was developed.
Capture plate preparation
[0092] 5 μL of lysis buffer, composed of a 1/500 dilution of Phusion HF buffer (New England Biolabs, #B0518S), was distributed into each well of a Twin.tec PCR 384-well collection plate (Eppendorf, #951020729).
Cell preparation
[0093] Media was removed by pelleting the cells for 5 min at 1000 rpm, and the RNA was immediately stabilized by resuspending the cells in 500 μL of RNAprotect Cell Reagent (Qiagen, #76526) and 1 μL of RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies, #10777-019). Cells were stored up to two weeks at 4 °C. Prior to sorting, cells in the RNAprotect Cell Reagent were diluted in 1.5 mL PBS, pH 7.4 (no calcium, no magnesium, no phenol red, Life Technologies, #10010-049). The cells then were stained for viability (DNA staining by Hoechst 33342) with NucBlue Live ReadyProbes Reagent (Life Technologies, #R37605).
Cell collection
[0094] Cells were sorted individually into each well of a 384-well capture plate using the FACSAria II flow cytometer (BD Biosciences). "Live" cells were selected and doublets avoided using the Hoechst DNA staining. In other words, following Hoechst staining, dead cells could be excluded from further processing, and the presence of a single cell per well could be confirmed. After sorting, the plates were immediately sealed, spun down, and frozen on dry ice. The sorted cells were stored at -80 °C.
Cell lysis
[0095] Cells were thawed for 5 minutes at room temperature, then placed on ice.
Reverse Transcription/Template Switching
[0096] 1 μL of a 1 × 10⁻⁷ dilution of ERCC RNA Spike-In Mix (Life Technologies, #4456740) was added to each well. 1 μL of a universal adapter DNA primer (template-switching oligonucleotide) 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (1 μM) (SEQ ID NO: 17) was added to each well, wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine. 1 μL of a cDNA synthesis primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 18) (1 μM) was added to each well, wherein 5Biosg represents 5' biotin, V represents a nucleotide selected from A, G, and C, N represents a nucleotide selected from A, G, C, and T, [BC6] represents a 6 base pair barcode sequence, different for each well of a 384 well plate, and (N)10 represents a Unique Molecular Identifier (UMI) sequence. The barcode sequences were designed such that each barcode differed from the others by at least two nucleotides, so that a single sequencing error could not lead to the misidentification of the barcode (Table 1). The plate was subsequently incubated at 72 °C for 3 minutes, then immediately placed on ice to cool down (although this step is optional). The Template Switching step was carried out in each well using the following reagents: 2 μL of 5X 1st strand buffer (250 mM UltraPure Tris-HCl, pH 8.0, Life Technologies, #15568-025; 375 mM KCl, Life Technologies, #AM9640G; 30 mM MgCl2, Life Technologies, #AM9530G); 1 μL of DL-Dithiothreitol solution BioUltra, 20 mM (Sigma-Aldrich, #43816); 1 μL of dNTPs (New England Biolabs, #N0447L); 0.25 μL of a MMLV Reverse Transcriptase, in this particular example SmartScribe Reverse Transcriptase (Clontech, #639538); and 0.75 μL of Nuclease-Free Water (not DEPC-Treated) (Life Technologies, #AM9937). The plate was incubated at 42 °C for 1 hour 30 minutes.
Table 1: Exemplary barcode sequences
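[Table 1, first part: rendered as an image (imgf000040_0001) in the source. The remaining entries follow, each listed as a barcode sequence and its sequence identifier number.]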
AGATTA 56
AGTAAT 57
AGTATA 58
AGTTAA 59
ATAAAC 60
ATAACA 61
ATAAGT 62
ATAATG 63
ATACAA 64
ATACTT 65
ATAGAT 66
ATAGTA 67
ATATAG 68
ATATCT 69
ATATGA 70
ATATTC 71
ATCAAA 72
ATCATT 73
ATCTAT 74
ATCTTA 75
ATGAAT 76
ATGATA 77
ATGTAA 78
ATTAAG 79
ATTACT 80
ATTAGA 81
ATTATC 82
ATTCAT 83
ATTCTA 84
ATTGAA 85
ATTGTT 86
ATTTAC 87
ATTTCA 88
ATTTGT 89
ATTTTG 90
CAAAAT 91
CAAATA 92
CAATAA 93
CATAAA 94
CATATT 95
CATTAT 96
CATTTA 97
CTAAAA 98
CTAATT 99
CTATAT 100
CTATTA 101
CTTAAT 102
CTTATA 103
CTTTAA 104
GAAATT 105
GAATAT 106
GAATTA 107
GATAAT 108
GATATA 109
GATTAA 110
GTAAAT 111
GTAATA 112
GTATAA 113
GTTAAA 114
GTTATT 115
GTTTAT 116
GTTTTA 117
TAAAAC 118
TAAACA 119
TAAAGT 120
TAAATG 121
TAACAA 122
TAACTT 123
TAAGAT 124
TAAGTA 125
TAATAG 126
TAATCT 127
TAATGA 128
TAATTC 129
TACAAA 130
TACATT 131
TACTAT 132
TACTTA 133
TAGAAT 134
TAGATA 135
TAGTAA 136
TAGTTT 137
TATAAG 138
TATACT 139
TATAGA 140
TATATC 141
TATCAT 142
TATCTA 143
TATGAA 144
TATGTT 145
TATTAC 146
TATTCA 147
TATTGT 148
TATTTG 149
TCAAAA 150
TCAATT 151
TCATAT 152
TCATTA 153
TCTAAT 154
TCTATA 155
TCTTAA 156
TGAAAT 157
TGAATA 158
TGATAA 159
TGATTT 160
TGTAAA 161
TGTATT 162
TGTTAT 163
TGTTTA 164
TTAAAG 165
TTAACT 166
TTAAGA 167
TTAATC 168
TTACAT 169
TTACTA 170
TTAGAA 171
TTAGTT 172
TTATAC 173
TTATCA 174
TTATGT 175
TTATTG 176
TTCAAT 177
TTCATA 178
TTCTAA 179
TTGAAA 180
TTGATT 181
TTGTTA 182
TTTAAC 183
TTTACA 184
TTTAGT 185
TTTATG 186
TTTCAA 187
TTTCTT 188
TTTGTA 189
TTTTAG 190
TTTTCT 191
TTTTGA 192
TCTTTC 193
TTGGAT 194
ACCGTA 195
AGACCT 196
AGGGAT 197
ATCGAG 198
CAAGCT 199
CACCAA 200
CAGTCA 201
CATCAG 202
CATGGT 203
CCACAT 204
CCGATT 205
CGACTT 206
CGATTG 207
CTAGTG 208
CTTCTG 209
GAAGAC 210
GATCGT 211
GCTAGA 212
GCTTAC 213
GGACAT 214
GGCAAT 215
GGGATT 216
GTACAC 217
GTCAAG 218
GTGACT 219
GTTCGA 220
TAGTGG 221
TCCAAC 222
TCGAAG 223
TCTGCA 224
TTCCTC 225
TTGTCC 226
TTTGGC 227
CCAACC 228
CCTTCC 229
CTCTCC 230
GGACCA 231
GTACCG 232
ACCCCC 233
ACCCGG 234
ACCGCG 235
ACCGGC 236
ACGCCG 237
ACGCGC 238
ACGGCC 239
ACGGGG 240
AGCCCG 241
AGCCGC 242
AGCGCC 243
AGCGGG 244
AGGCCC 245
AGGCGG 246
AGGGCG 247
AGGGGC 248
CACCCC 249
CACCGG 250
CACGCG 251
CACGGC 252
CAGCCG 253
CAGCGC 254
CAGGCC 255
CAGGGG 256
CCACCG 257
CCACGC 258
CCAGGG 259
CCCACG 260
CCCAGC 261
CCCCAC 262
CCCCCA 263
CCCCGT 264
CCCCTG 265
CCCGAG 266
CCCGGA 267
CCCTGG 268
CCGAGG 269
CCGCAG 270
CCGCGA 271
CCGGAC 272
CCGGCA 273
CCGGGT 274
CCGGTG 275
CCGTCG 276
CCGTGC 277
CCTCGG 278
CCTGCG 279
CCTGGC 280
CGACCC 281
CGACGG 282
CGAGCG 283
CGAGGC 284
CGCACC 285
CGCAGG 286
CGCCAG 287
CGCCCT 288
CGCCGA 289
CGCCTC 290
CGCGAC 291
CGCGCA 292
CGCGGT 293
CGCGTG 294
CGCTCG 295
CGCTGC 296
CGGACG 297
CGGAGC 298
CGGCAC 299
CGGCCA 300
CGGCGT 301
CGGCTG 302
CGGGAG 303
CGGGCT 304
CGGGGA 305
CGGGTC 306
CGGTCC 307
CGGTGG 308
CGTCCG 309
CGTCGC 310
CGTGCC 311
CGTGGG 312
CTCCCG 313
CTCCGC 314
CTCGGG 315
CTGCGG 316
CTGGCG 317
CTGGGC 318
GACCCG 319
GACCGC 320
GACGCC 321
GACGGG 322
GAGCCC 323
GAGCGG 324
GAGGCG 325
GAGGGC 326
GCACCC 327
GCACGG 328
GCAGCG 329
GCAGGC 330
GCCACC 331
GCCAGG 332
GCCCAG 333
GCCCCT 334
GCCCGA 335
GCCCTC 336
GCCGAC 337
GCCGCA 338
GCCGGT 339
GCCGTG 340
GCCTCG 341
GCCTGC 342
GCGACG 343
GCGAGC 344
GCGCAC 345
GCGCCA 346
GCGCGT 347
GCGCTG 348
GCGGAG 349
GCGGCT 350
GCGGGA 351
GCGGTC 352
GCGTCC 353
GCGTGG 354
GCTCCG 355
GCTCGC 356
GCTGCC 357
GCTGGG 358
GGACGC 359
GGAGCC 360
GGAGGG 361
GGCACG 362
GGCAGC 363
GGCCAC 364
GGCGAG 365
GGCGCT 366
GGCGGA 367
GGCGTC 368
GGCTCC 369
GGGACC 370
GGGAGG 371
GGGCAG 372
GGGCCT 373
GGGCGA 374
GGGCTC 375
GGGGAC 376
GGGGCA 377
GGGGGT 378
GGGGTG 379
GGGTCG 380
GGGTGC 381
GGTCCC 382
GGTGCG 383
GGTGGC 384
GTCCCC 385
GTCGCG 386
GTCGGC 387
GTGCGC 388
GTGGCC 389
GTGGGG 390
TCCCCG 391
TCCCGC 392
TCCGGG 393
TCGCGG 394
TCGGCG 395
TCGGGC 396
TGCCCC 397
TGCGCG 398
TGCGGC 399
TGGCCG 400
TGGCGC 401
TGGGCC 402
TGGGGG 403
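The design rule stated for Table 1 (every barcode differs from every other barcode at two or more positions) can be checked mechanically. The following is a minimal sketch, not part of the original protocol; the function names are illustrative, and only a handful of barcodes copied from Table 1 are used rather than the full 384-member set.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def is_single_error_safe(barcodes) -> bool:
    """True if no single substitution can turn one barcode into another."""
    return min_pairwise_distance(barcodes) >= 2


# A few barcodes copied from Table 1 (illustrative subset only).
subset = ["AGATTA", "AGTAAT", "AGTATA", "AGTTAA", "ATAAAC", "ATAACA"]
print(min_pairwise_distance(subset), is_single_error_safe(subset))
```

With a minimum distance of two, a read carrying one substitution in its barcode matches no other valid barcode exactly, so the error is detectable; correcting it unambiguously would in general require a larger minimum distance.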
cDNA pooling and purification
[0097] All 384 wells were pooled together, and 35 mL of DNA Binding Buffer (Zymo Research, #D4004-1-L) was added to the pooled cDNAs. All cDNAs pooled from one 384-well plate were purified through a DNA purification spin column, in this case one single DNA Clean & Concentrator-5 column (Zymo Research, #D4013), and the cDNAs were eluted in 17 μL of Nuclease-Free Water.
Exonuclease I treatment
[0098] Pooled cDNAs were treated with an exonuclease, in this case Exonuclease I (New England Biolabs, #M0293L) with 2 μL of 10X reaction buffer, and the reaction was incubated at 37 °C for 30 minutes, then at 80 °C for 20 minutes.
Full length cDNA amplification
[0099] Full length cDNA was amplified by single primer PCR using the Advantage 2 PCR Enzyme System (Clontech, #639206). The PCR reaction was set up as follows: 20 μL of cDNA from the previous step; 5 μL of 10X Advantage 2 PCR buffer; 1 μL of dNTPs; 1 μL of the DNA primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19) (wherein 5Biosg represents 5' biotin) (10 μM, Integrated DNA Technologies); 1 μL of the Advantage 2 Polymerase Mix; and 22 μL of Nuclease-Free Water. The reaction was performed using the following program: 95 °C for 1 minute; 18 cycles of 95 °C for 15 seconds, 65 °C for 30 seconds, and 68 °C for 6 minutes; and 72 °C for 10 minutes (followed by an optional hold period at 4 °C).
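As a sanity check on the reaction setup, the component volumes can be summed and the final concentrations derived from the stock concentrations. The sketch below is illustrative only: it takes the listed volumes as 20, 5, 1, 1, 1, and 22 μL, and the dictionary keys and the helper name `final_concentration` are assumptions introduced here.

```python
# Component volumes in microliters, as read from the reaction setup above.
pcr_mix_ul = {
    "cDNA from previous step": 20.0,
    "10X Advantage 2 PCR buffer": 5.0,
    "dNTPs": 1.0,
    "primer (10 uM stock)": 1.0,
    "Advantage 2 Polymerase Mix": 1.0,
    "nuclease-free water": 22.0,
}

total_ul = sum(pcr_mix_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume_ul / total_volume_ul


# Buffer strength in "X" units and primer concentration in micromolar.
buffer_x = final_concentration(10.0, pcr_mix_ul["10X Advantage 2 PCR buffer"], total_ul)
primer_um = final_concentration(10.0, pcr_mix_ul["primer (10 uM stock)"], total_ul)

print(total_ul, buffer_x, primer_um)
```

Under these readings the mix totals 50 μL, which brings the 10X buffer to 1X and the primer to 0.2 μM.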
Full length cDNA purification and quantification
[0100] Full length cDNAs were purified with 30 μL of beads (here, Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880)). The full length cDNAs were eluted in 12 μL of Nuclease-Free Water and quantified on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies, #Q32851).
Sequencing library preparation
[0101] From the purified full length cDNA, 1 ng of cDNA was used for Nextera library preparation according to the Illumina protocol, with the exception that only the i7 primer (a primer that is standard to the Illumina system) was used to barcode cDNA originating from the same 384-well plate, and 5 μM of a second primer (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3), wherein * represents a phosphorothioate bond) was also used during the library amplification step.
Sequencing library purification and size selection
[0102] The resulting sequencing library was purified with 30 μL of Agencourt AMPure XP magnetic beads and eluted in 20 μL of nuclease-free water. The entire library was run on an E-Gel EX Gel, 2% (Life Technologies, #G4010-02), and the band corresponding to a size range of 300 to 800 bp was excised and purified using the QIAquick Gel Extraction Kit (Qiagen, #28704).
Sequencing library quality assessment
[0103] The library was quantified on the Qubit 2.0 Fluorometer using the dsDNA HS Assay. The quality and average size of the library were assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
Sequencing
[0104] Sequencing is performed on any Illumina® HiSeq™ or MiSeq™ instrument using a standard Illumina® sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first strand, then 8 cycles to decode the Nextera™ barcode, and finally 34 cycles on the second strand (although 46 cycles also can be used to increase the amount of sequencing data). Up to twelve Nextera libraries/384-well capture plates, each comprising 384 cells, are multiplexed together (twelve libraries can be used with a set of twelve plate-identifying barcode sequences, although this number can be expanded with additional barcode sequences), allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
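The multiplexing capacity and the cycle budget stated above follow from simple arithmetic, sketched below. The function and parameter names are hypothetical and introduced only for illustration.

```python
WELLS_PER_PLATE = 384  # one cell per well of a 384-well capture plate


def cells_per_lane(n_plates: int, wells_per_plate: int = WELLS_PER_PLATE) -> int:
    """Number of single cell transcriptomes multiplexed on one lane."""
    return n_plates * wells_per_plate


def total_cycles(read1: int = 17, index: int = 8, read2: int = 34) -> int:
    """Total sequencing cycles across the first read, index read, and second read."""
    return read1 + index + read2


print(cells_per_lane(12))      # twelve plate-identifying barcodes
print(total_cycles())          # default 17 + 8 + 34 layout
print(total_cycles(read2=46))  # longer second read
```

Adding plate-identifying barcodes raises `n_plates` and therefore the number of cells per lane, at the cost of fewer reads per cell for a fixed lane output.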
Example 2: Single cell sequencing of differentiating stem cells
[0105] The methods and reagents (e.g., polynucleotides, kits, etc.) described herein have numerous applications. The following provides an example demonstrating the application of the instant technology to a particular context. The method described above was used to sequence the transcriptomes of a population of differentiating human adipose tissue-derived stromal/stem cells (hASCs) at eight different time points (day 0, day 1, day 2, day 3, day 5, day 7, day 9, and day 14). Visual inspection of these cells indicates that differentiation over time is incomplete, thus leading to a heterogeneous cell population (Figure 1). Given the heterogeneous appearance of the cells, we would expect that, if cells in the culture could be rigorously analyzed at the single cell level and gene expression accurately correlated with each specific single cell, expression of genes relevant to differentiation and other activities would differ across individual cells at a given time point. We thus undertook such analysis as proof of principle of the robustness of the methods and compositions of the present invention.
[0106] As proof of principle, single-cell RNA-seq data were generated for ~9,216 cells in total, representing ~1,152 cells collected for each of the eight time points profiled (day 0, day 1, day 2, day 3, day 5, day 7, day 9, and day 14). To generate these data, FACS was used to sort the cells into 24 384-well plates. Figure 3 depicts the design of the sequencing library incorporating the two levels of barcoding (well/cell and plate), the UMI, and the primer sequences indicated as P5 and P7 for Illumina sequencing. P5 and P7 are the regions that anneal to their complementary oligos on the flow cell. The index (i7) represents the plate index that is added during the Nextera tagmentation process after all wells have been pooled and pre-amplified. It is incorporated by PCR during the last step of the library preparation. One i7 index is used per pool/plate of 96 or 384 samples/cells, allowing for a higher level of multiplexing by pooling several plates together for sequencing. The sequencing primers P5 and P7 initiate the sequencing reaction. The sequencing will result in 3 distinct reads. The first one is 16 bp long and includes 6 bp of the well/cell barcode followed by 10 bp of the UMI. Then the i7 index sequencing primer allows us to read the plate/pool index (i7, 8 bp) on the same strand. Finally, the other strand is generated (paired-end sequencing) and the read 2 sequencing primer allows us to read the actual cDNA fragment, which is typically 45 bp with a 50 cycle kit. By using the 3 reads and deciphering the barcodes, we can trace each cDNA to a specific well, plate, and transcript. In certain embodiments, the disclosure provides a polynucleotide as set forth in Figure 3 (e.g., a polynucleotide comprising various polynucleotide portions, such as contiguous portions, as set forth in Figure 3). The various portions are described herein, and the figure contemplates polynucleotides comprising any combination of these various portions. Expression values were correlated by comparing raw read counts to UMI counts (Figure 4). Incorporating and counting UMIs helped to reduce the PCR bias.
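The deconvolution described above can be sketched in a few lines: the first read is split into a 6 bp well barcode and a 10 bp UMI, and reads sharing the same plate index, well barcode, gene, and UMI are collapsed into a single molecule. This is a minimal illustration and not the analysis pipeline actually used; the record layout, the plate label, and the assumption that a gene name has already been assigned by aligning the cDNA read are all hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 6  # well/cell barcode at the start of the first read
UMI_LEN = 10     # unique molecular identifier following the barcode


def split_read1(read1: str):
    """Split the first read into (well barcode, UMI); extra bases are ignored."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("first read is shorter than barcode + UMI")
    return read1[:BARCODE_LEN], read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def count_umis(records):
    """Collapse reads into molecules.

    records: iterable of (plate_index, read1, gene) tuples.
    Returns {(plate_index, well_barcode, gene): (n_reads, n_unique_umis)}.
    """
    reads = defaultdict(int)
    umis = defaultdict(set)
    for plate_index, read1, gene in records:
        barcode, umi = split_read1(read1)
        key = (plate_index, barcode, gene)
        reads[key] += 1
        umis[key].add(umi)
    return {key: (reads[key], len(umis[key])) for key in reads}


# Toy input (invented for illustration); each first read is 17 bases long.
example = [
    ("PLATE01", "AGATTA" + "ACGTACGTAC" + "T", "GAPDH"),
    ("PLATE01", "AGATTA" + "ACGTACGTAC" + "T", "GAPDH"),  # PCR duplicate
    ("PLATE01", "AGATTA" + "TTTTGGGGCC" + "T", "GAPDH"),  # second molecule
    ("PLATE01", "AGTAAT" + "ACGTACGTAC" + "T", "GAPDH"),  # different well
]
counts = count_umis(example)
print(counts)
```

In the toy input, three reads from well AGATTA collapse to two molecules, which illustrates how raw read counts can overstate expression when PCR duplicates are present.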
[0107] Key marker genes among the cells for each time point were measured, and the distribution of expression levels was plotted over time (days 0 to 14) as shown in Figure 5. With the single cell RNA-seq data, the proportions of cells expressing a gene at a given level are observable. Gene detection in single cells was plotted as a histogram showing how many expressed genes were detected per cell (Figure 6). To exemplify the data for a single gene, GAPDH was selected as an example of a "housekeeping" gene that shows a burst of transcription and that is a cell cycle-regulated gene. The histogram of Figure 7 represents the distribution of GAPDH expression among the cells profiled at day 0. While GAPDH usually is present at a constant level of expression in a population of cells, when observed at the single cell level, a significant proportion of cells did not express GAPDH because GAPDH is a cell cycle-regulated gene. Thus, by using the single cell sequencing method, we revealed that, despite its widespread use as a "housekeeping" reference gene, GAPDH is not necessarily a good reference gene, especially at the single cell level. This underscores the power of the single cell sequencing methods of the invention.
[0108] A projection of three of the highest components of a principal component analysis based on gene expression is shown in Figures 8 to 13. Each point represents a profiled cell. The cells profiled at day 0 are represented in black, while the cells profiled at the subsequent time points (day 1, day 2, day 3, day 7, and day 14) are shown in gray (or in red if depicted in color). A clear distinction can be seen between the day 0 cells and the cells from subsequent time points. To explore these differences, a Gene Ontology analysis then was performed on the differentially expressed genes between two subpopulations distinguishable at day 14 with the principal component analysis: a subpopulation that clusters with day 0 and a subpopulation that is separate from it. Key genes that characterize these two day 14 subpopulations were identified and categorized using the Gene Ontology database (Figure 14). The ability to distinguish these subpopulations illustrates the robustness of the methodology. A partial conclusion of these analyses shows the link between the expression of adipocyte genes and G1 arrest (Figure 15). Based on this analysis, it appears that one subpopulation fully differentiates, while the other seems to be stuck in the G0 phase and cannot fully differentiate. These data were then further used in a comparison of adipogenesis efficiency between a mouse system (3T3-L1), where the differentiation process is much more efficient and for which there is a clonal expansion, and human cells (hASCs), where this clonal expansion is absent (Figure 16). This clonal expansion may be essential to avoid a subpopulation becoming stuck in the G0 phase and resulting in incomplete differentiation.
[0109] In conclusion, the data show that the invention provides a useful method for single cell sequencing and single transcript tracking that uses the aggregation of samples and subsequent deconvolution of data. Through this process of aggregation and deconvolution, the sequencing can be performed with less cost and greater efficiency than by traditional sequencing techniques. Moreover, the results obtained here reflect the ability to detect changes and differences across heterogeneous populations when those populations are evaluated at the single cell level. Such changes and differences may be lost (e.g., averaged out) if gene expression across the heterogeneous population is instead evaluated.
Example 3: Simultaneous single cell sequencing of 12,832 cells
[0110] To further demonstrate the applicability of single cell sequencing methods and compositions (e.g., reagents, nucleic acids, kits) of the disclosure for addressing a range of questions, including questions related to understanding cell and developmental biology, a primary human adipose-derived stem/stromal cell (hASC) differentiation system was used as a test system, akin to that described above. Once again, single cell RNA sequencing methods and compositions of the invention were successfully used to survey gene expression in differentiating hASC cultures at single cell resolution. The resulting data reveal the major axes of variation in gene expression, suggest a biological basis for the morphological heterogeneity observed in these cultures, and provide a rich resource for dissection of the regulatory networks involved in adipocyte formation and function beyond what investigations using other techniques have shown. Through advances in sequencing and cell isolation technologies, identification of rare expression programs can be enabled by deeper and more sensitive profiling of every cell, and direct comparison of in vitro and in vivo heterogeneity can be observed through direct profiling of single cells from tissue samples.
[0111] The protocol used in this particular example was as follows.
Cell culture
[0112] Human adipose-derived stem/stromal cells (hASCs) were isolated from lipoaspirates and purified by flow cytometry (CD29, CD44, CD73, CD90, CD105 and CD166 positive; CD14, CD31, CD45 and Lin1 negative) (cells were obtained from Life Technologies). The hASCs were cultured in a 2% reduced serum medium (MesenPro RS, Life Technologies) and expanded for no more than 3 passages. The cultures were then induced to differentiate towards an adipogenic fate after reaching 80% confluency (differentiations D1 and D2) or two days after reaching 100% confluency (differentiation D3) by switching from growth medium to the StemPro adipogenesis differentiation medium (Life Technologies), and were subsequently prepared for further analysis, such as by qPCR or smFISH. Following induction, the differentiation medium was changed every three days for up to 14 days. The variation in initial conditions (confluency upon differentiation) was introduced to assess the robustness of the subsequent time course data.
Single cell isolation
[0113] Cells were harvested using TrypLE Express (Life Technologies) and medium removed by pelleting the cells in a centrifuge (5 minutes at 1000 rpm). RNA was stabilized by immediately resuspending the pelleted cells in RNAprotect Cell Reagent (Qiagen) and RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies) at a 1:1000 dilution. Just prior to fluorescence-activated cell sorting (FACS), the cells were diluted in PBS (pH 7.4, no calcium, magnesium or phenol red; Life Technologies) and stained for viability using Hoechst 33342 (Life Technologies). 384-well SBS capture plates were filled with 5 μL of a 1:500 dilution of Phusion HF buffer (New England Biolabs) in water, and cells were then sorted into each well using a FACSAria II flow cytometer (BD Biosciences) based on Hoechst DNA staining. After sorting, the plates were immediately sealed, spun down, cooled on dry ice, and stored at -80 °C. For lipid content-based FACS, cells were also stained with HSC LipidTOX Neutral Lipid Stain (Life Technologies) and sorted according to their relatively "high" or "low" lipid content, either by taking the top and bottom 20% of stained cells (D2) or the top and bottom 50% (D3).
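The "high" and "low" lipid gates described above amount to taking the top and bottom fraction of cells ranked by stain intensity. In practice such gates are set on the flow cytometer itself; the sketch below only illustrates the selection rule, and the function name and the intensity values are invented for the example.

```python
def split_by_fraction(intensities, fraction):
    """Return (low, high): the bottom and top `fraction` of the ranked values."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ordered = sorted(intensities)
    k = int(len(ordered) * fraction)  # number of cells in each gate
    return ordered[:k], ordered[len(ordered) - k:]


# Hypothetical stain intensities for ten cells (arbitrary units).
stain = [7, 1, 10, 4, 3, 9, 2, 8, 5, 6]

low20, high20 = split_by_fraction(stain, 0.20)  # top and bottom 20%
low50, high50 = split_by_fraction(stain, 0.50)  # top and bottom 50%
print(low20, high20)
print(low50, high50)
```

A 20% gate leaves the middle 60% of cells unsorted and gives cleaner separation between the two groups, whereas a 50% gate assigns every cell to one group or the other.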
Sequencing of sorted single cells
[0114] Frozen cells were thawed for 5 minutes at room temperature. For the second time course (D3) only, lysis conditions further included treating the cells with proteinase K (200 μg/mL; Ambion), followed by RNA desiccation to inactivate the proteinase K and simultaneously reduce the reaction volume. The cells were kept at 50 °C for 15 minutes in a sealed plate, then 95 °C for 10 minutes with the seal removed.
Primers
[0115] The primers used, and the resulting products, are as follows.
1st strand cDNA
5'-RNA:NB(A)30-3'
3'-CCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
2nd strand cDNA
5'-ACACTCTTTCCCTACACGACGCGGG:cDNA:NB(A)30-3'
3'-CCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
Resulting full length cDNA
5'-ACACTCTTTCCCTACACGACGCGGG:cDNA:NB(A)30(N)10[BC6]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3'
3'-TGTGAGAAAGGGATGTGCTGCGCCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
Full length cDNA amplification: Single primer PCR
3'-CGCAGCACATCCCTTTCTCACA-5'
5'-ACACTCTTTCCCTACACGACGCGGG:cDNA:NB(A)30(N)10[BC6]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3'
3'-TGTGAGAAAGGGATGTGCTGCGCCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
5'-ACACTCTTTCCCTACACGACGC-3'
Transposon based library (Nextera)
Tagmentation
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-3'
3'-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG-5'
Library amplification (modified)
3'-GGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC-5'
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC-3'
3'-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA[BC6](N)10(A)30BN-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG-5'
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Resulting library
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC[i7]ATCTCGTATGCCGTCTTCTGCTTG-3'
3'-TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA[BC6](N)10(A)30BN-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC-5'
Sequencing
Read 1: [BC6] + UMI (N)10 →
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC[i7]ATCTCGTATGCCGTCTTCTGCTTG-3'
3'-TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA[BC6](N)10(A)30BN-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC-5'
Read 2: Nextera index [i7]
← Read 3: 3' end cDNA fragment
[0116] To start, diluted ERCC RNA Spike-In Mix (1 μl of 1:10^7 for D1/D2 or 1 μl of 1:10^6 for D3; Life Technologies) was added to each well, and the template switching reverse transcription reaction described above was carried out using an MMLV reverse transcriptase (here, either SmartScribe Reverse Transcriptase (D1/D2; Clontech) or Maxima H Minus Reverse Transcriptase (D3; Thermo Scientific)) with the template-switching oligonucleotide (2 pmol, Eurogentec) (5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17), where iC is iso-dC, iG is iso-dG, and rG is RNA G) and a cDNA synthesis primer (2 pmol, Integrated DNA Technologies) (5'-/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 18)), wherein 5Biosg represents 5' biotin; V represents a nucleotide selected from A, G, and C; the 3' N represents a nucleotide selected from A, G, C, and T; [BC6] represents a 6 base pair barcode sequence; and the (N)10 after the barcode sequence represents a Unique Molecular Identifier (UMI) sequence (10 base pair barcode). After the template switching reaction, cDNA from 384 wells was pooled together and purified and concentrated using a single DNA Clean & Concentrator-5 column (Zymo Research). Pooled cDNAs were treated with an exonuclease, in this example Exonuclease I (New England Biolabs), and subsequently amplified by single primer PCR using the Advantage 2 Polymerase Mix (Clontech) and the SINGV6 primer (10 pmol, Integrated DNA Technologies) (5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19)). Full length cDNAs were purified with Agencourt AMPure XP magnetic beads (0.6x, Beckman Coulter) and quantified on the Qubit 2.0 Fluorometer using a dsDNA HS Assay (Life Technologies). The full-length cDNA was then used in the Nextera XT library preparation kit (Illumina) according to the manufacturer's protocol, with the exception that the i5 primer was replaced by a phosphorothioate bond-containing nucleic acid (5 μM, Integrated DNA Technologies) (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3', where * = phosphorothioate bonds (SEQ ID NO: 3)). The resulting sequencing library was purified with Agencourt AMPure XP magnetic beads (0.6x, Beckman Coulter), size selected (300-800 bp) on an E-Gel EX Gel, 2% (Life Technologies), purified using a QIAquick Gel Extraction Kit (Qiagen) and quantified on a Qubit 2.0 Fluorometer using a dsDNA HS Assay (Life Technologies). Libraries were sequenced on Illumina HiSeq paired-end flow cells with 17 cycles on the first read to decode the well barcode and UMI, an 8 cycle index read to decode the i7 Nextera barcode, and finally a 34 cycle second read to sequence the cDNA.
Sequencing on bulk samples
[0117] Populations of both unsorted and sorted cells were lysed in QIAzol (Qiagen) and RNA was extracted and purified using Direct-zol RNA MiniPrep (Zymo Research). Digital gene expression (DGE) libraries for sequencing were prepared from 10 ng of extracted total RNA, using the protocol described above for single cells, with the exception of using more concentrated template-switching and barcoded nucleic acids (10 pmol) and a version of the cDNA synthesis primer that did not contain the well-specific 6bp barcodes but instead a 16bp UMI (Integrated DNA Technologies) (5'-/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT NNNNNN NNN NNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 404)).
Single cell RT-qPCR
[0118] Single cells were sorted into 384-well plates, frozen at -80 °C, thawed for 5 min at room temperature, treated with proteinase K (200 μg/mL, Ambion), and desiccated as described above. cDNA synthesis was carried out in each well using Superscript VILO (2 μl final volume; Life Technologies). qPCR was then performed on the total cDNA output using FAM and VIC Taqman probes (Life Technologies) and processed on an Applied Biosystems ViiA 7 Real-Time PCR system (Life Technologies).
Single-molecule FISH
[0119] Probes targeting LPL, G0S2 and TCF25 transcripts were synthesized as amine-conjugated oligonucleotides and then labelled with Cy5 (GE Healthcare), Alexa Fluor 594 (Molecular Probes) or 6-TAMRA (Molecular Probes). Hybridizations and washes were performed using modifications to previously described procedures (see, e.g., Bienko et al., Nat. Methods 10:122-124 (2013) and Raj et al., Nat. Methods 5:877-879 (2008)). Prior to hybridizations, lipids were extracted by incubation of fixed cells in 2:1 chloroform:methanol for 30 min at room temperature. Cells were washed quickly with 70% ethanol and then resuspended in 200 μl RNA Hybridization buffer containing 2x SSC buffer, 25% formamide, 10% dextran sulphate (Sigma), E. coli tRNA (Sigma), bovine serum albumin (Ambion), ribonucleoside vanadyl complex and 150 ng of each desired probe set (the mass refers only to pooled oligonucleotides, excluding fluorophores, and is based on absorbance measurements at 260 nm). Hybridizations were performed for 16-18 h at 30 °C, after which cells were washed twice for 30 min at 30 °C in RNA Wash buffer (containing 2x SSC buffer, 25% formamide (Ambion) and 100 ng/ml DAPI). For microscopy, cells were resuspended in a mounting solution containing 1x PBS, 0.4% glucose, 100 μg/ml catalase, 37 μg/ml glucose oxidase and 2 mM Trolox and immobilized on poly-lysine coated chambered cover glasses. Imaging was performed as described above, using an inverted epifluorescence microscope (Nikon) equipped with a high-resolution CCD camera (Pixis, Princeton Instruments) and a 100x magnification oil immersion, high numerical aperture Nikon objective. An image stack consisting of 50 image planes spaced 0.3 μm apart was acquired per region of interest. Individual images were filtered with a high-pass Fast Fourier Transform filter, where the filter cutoff was chosen to preserve diffraction-limited signals. Filtering was repeated on the resulting image of the maximum projection. Signal positions, widths, and intensities were quantified by fitting 2D Gaussians approximating the point-spread function (PSF) of the microscope. To separate sporadic signals caused by autofluorescence or non-specifically bound probes from real mRNA signals, signals were filtered based on width and signal-to-noise ratio. Cells were segmented manually and signals were assigned to individual cells.
Computational analysis of sequence data
[0120] All second sequence reads were aligned to a reference database containing all human RefSeq mRNA sequences (obtained from the UCSC Genome Browser hg19 reference set), the human hg19 mitochondrial reference sequences and the ERCC RNA spike-in reference sequences, using bwa version 0.7.4 with non-default parameter "-l 24". Read pairs for which the second read aligned to a human RefSeq gene were kept for further analysis if 1) the initial six bases of the first read all had quality scores of at least 10 and corresponded exactly to a designed well-barcode and 2) the next ten bases of the first read (the UMI) all had quality scores of at least 30. Digital gene expression (DGE) profiles were then generated by counting, for each microplate well and RefSeq gene, the number of unique UMIs associated with that gene in that well. Python scripts were used to implement the alignment and DGE derivation from the samples.
Computational analysis of DGE profiles
[0121] All computational and statistical analyses were performed using Python 2.7 with the Enthought Canopy Distribution, Numpy 1.8.0 and Scipy 0.13.0, scikit-learn 0.14, and Matplotlib 1.3.1. For each plate, wells with fewer than 1,000 or more than 10,000 total UMI counts were discarded (24% of all wells, largely low-value wells). The UMI counts for each gene in the remaining wells were then normalized by dividing by the sum of UMI counts across all genes in the same well. This normalization removes variation from differences in RNA content per cell and can be revisited for analyses that are sensitive to this phenomenon. Pairwise Pearson correlations between genes across single cells and their associated p-values were computed using the scikit-learn metrics.pairwise_distances function. The 5% false discovery rate (FDR) thresholds were estimated from the p-value distribution using the Benjamini-Hochberg-Yekutieli procedure.
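The read filtering, UMI counting, well filtering, and normalization rules stated above can be sketched as follows. This is a minimal illustration and not the scripts used in the Example: the function names, the record layout (read 1 sequence, per-base quality scores, aligned gene or None), and the in-memory dictionaries are all assumptions made for the sketch.

```python
from collections import defaultdict

BARCODE_LEN = 6   # well barcode: first six bases of read 1
UMI_LEN = 10      # UMI: next ten bases of read 1


def parse_read1(seq, quals, well_barcodes):
    """Return (barcode, umi) if read 1 passes the stated filters, else None."""
    needed = BARCODE_LEN + UMI_LEN
    if len(seq) < needed or len(quals) < needed:
        return None
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:needed]
    # Barcode must match a designed well barcode exactly, all bases Q >= 10.
    if barcode not in well_barcodes or min(quals[:BARCODE_LEN]) < 10:
        return None
    # Every UMI base must have Q >= 30.
    if min(quals[BARCODE_LEN:needed]) < 30:
        return None
    return barcode, umi


def count_umis(records, well_barcodes):
    """records: iterable of (read1_seq, read1_quals, gene); gene is None if
    read 2 did not align to a RefSeq gene. Returns {well: {gene: n_unique_umis}}."""
    seen = defaultdict(set)
    for seq, quals, gene in records:
        if gene is None:
            continue
        parsed = parse_read1(seq, quals, well_barcodes)
        if parsed is None:
            continue
        barcode, umi = parsed
        seen[(barcode, gene)].add(umi)
    dge = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        dge[barcode][gene] = len(umis)
    return dict(dge)


def normalize_wells(dge, min_total=1000, max_total=10000):
    """Drop wells outside [min_total, max_total] total UMIs, then divide each
    gene count by the well total."""
    normalized = {}
    for well, counts in dge.items():
        total = sum(counts.values())
        if total < min_total or total > max_total:
            continue
        normalized[well] = {gene: n / total for gene, n in counts.items()}
    return normalized
```

Duplicate reads carrying the same barcode, gene, and UMI collapse to a single count, which is what makes the profile a count of molecules rather than of reads.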
The expected null distributions of pairwise correlation coefficients were estimated by permuting expression values across cells from the same time point and re-computing the pairwise correlations 100 times. Principal component analyses (PCA) were performed by first scaling the normalized UMI-derived expression levels of each gene to zero mean and unit variance using the scikit-learn preprocessing.scale function and then applying the RandomizedPCA transformation. Each time course dataset was processed separately. To project lipid-sorted cell data into the corresponding time course principal component space (i.e., the three-dimensional space represented by the 3 major principal components), the time course and lipid-sorted expression values were concatenated and re-scaled prior to applying the time course PCA transformation. Gene set enrichment analyses (GSEA) were performed using the GSEAPreRanked module of the GSEA 2.0 software (http://www.broadinstitute.org/gsea/) with the MSigDB 4.0 gene sets. Genes were ranked by the PC weights for interpretation of PC metagenes or by the signal-to-noise metric ((μA − μB)/(σA + σB)) for comparisons of low and high lipid cells.
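As an illustration of the ranking step for the low versus high lipid comparison, the sketch below computes a signal-to-noise score per gene, assuming the standard GSEA definition (difference of class means divided by the sum of class standard deviations). The function names and the dictionary inputs are hypothetical, the sample standard deviation is an assumption, and any additional adjustments made by the GSEA software itself are not reproduced here.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """(mean_A - mean_B) / (sd_A + sd_B) for one gene across two cell classes."""
    sigma_sum = stdev(values_a) + stdev(values_b)
    if sigma_sum == 0:
        raise ValueError("both classes have zero variance for this gene")
    return (mean(values_a) - mean(values_b)) / sigma_sum


def rank_genes(expr_a, expr_b):
    """expr_a, expr_b: {gene: [normalized expression per cell]} for classes A and B.
    Returns (gene, score) pairs sorted from most A-enriched to most B-enriched."""
    scores = {
        gene: signal_to_noise(expr_a[gene], expr_b[gene])
        for gene in expr_a
        if gene in expr_b
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A ranked list of this form is the kind of input a pre-ranked enrichment analysis expects: one score per gene, ordered.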
Significant gene sets were called at the threshold recommended by the GSEA developers (25% FDR).
Results
[0122] A variety of cell populations can be induced to differentiate into adipocytes by treating the cells with cocktails of adipogenic hormones and growth factors. However, the yields of lipid-filled, adipocyte-like cells obtained from these methods are highly variable. Moreover, it is unclear whether this variability reflects heterogeneity in the starting populations, stochastic responses to imperfect differentiation stimuli, or other factors. Thus, adipocyte differentiation was selected as a good model system to test single-cell sequencing. The most commonly used cell line in adipogenesis research is the immortalized murine 3T3-L1 cell line, which supports near-complete conversion to adipocyte-like cells. Numerous molecular differences have, however, been found between this cell line and human adipocyte stem cells (hASCs). Single-cell profiling should help clarify the nature of these differences.
[0123] hASC cultures were collected just prior to induction of differentiation (day 0), as well as at seven time points after induction (days 1, 2, 3, 5, 7, 9 and 14). At day 14, approximately two thirds of the cells contained clearly visible lipid droplets while the remainder retained a more fibroblast-like morphology. A nucleic acid stain was used to identify and sort intact single cells into 384-well plates with a fluorescence-activated cell sorter. A neutral lipid stain also was used to separately sort single cells based on their lipid contents. This method allowed us to combine the advantages of FACS, such as staining cells using, for example, a DNA stain or a lipid stain, and selecting specific cells to profile. Additional cells then were collected and sorted from independent cultures at days 0, 3 and 7. In total, single-cell sequencing libraries were prepared from 44 microplates. The plates were sequenced to a mean depth of ~165,000 reads per well and the reads aligned to RefSeq transcripts. After stringent filtering on sequence and alignment quality, and then estimating the expression levels in each cell from UMI counts (Figure 18), survey-depth digital gene expression (DGE) profiles were obtained from a total of 12,832 cells (76% of the total wells). As judged by the UMI counts, each DGE profile captured between 1,000 and ~10,000 unique mRNAs (mean = 2,602 and 3,336 for the protocols from Example 1 and this Example, respectively), which constitutes a ~4-fold increase in mean library complexity relative to a previous high-throughput protocol (Jaitin et al., Science 343:776-779 (2014)).
[0124] Initial analysis of the resulting data showed that the mean gene expression levels across the single cell profiles were significantly correlated with their corresponding levels from bulk unsorted cells collected at the same time point (r = 0.8, p < 10^-100; Figure 17A). Of 15,099 distinct RefSeq genes that were detected at day 0 in bulk unsorted cells, 14,612 (97%) also were detected in at least one single cell from the same day. As expected from the relatively low sequencing coverage, only the most actively transcribed genes were captured from every cell (Figure 19). However, significant positive and negative correlations still could be detected between the expression levels of individual genes across cells collected on the same day (Figure 17B). For example, LPL and G0S2, two traditional markers that are both up-regulated after induction of adipogenesis, had positively correlated expression levels after differentiation (r = 0.23, p < 10^-12 on day 7; FDR < 5%). A positive correlation could be validated between these genes both by qRT-PCR analysis of independently sorted single cells (Figure 17C) and in situ by multiplexed single molecule FISH (smFISH; Figure 17D and Figure 20). Thus, the single cell RNA sequencing method tested can capture gene expression variation at single-cell resolution.
[0125] To understand the observed cell-to-cell variation in gene expression in more detail, a principal component analysis (PCA) of the initial time course (days 0 to 14; 6,197 cells; Figure 21A-H) was performed. Plotting the position of each cell in the space defined by the first three principal components revealed that there was little overlap between cells from day 0 and cells from later time points. This suggested that addition of the adipogenic differentiation cocktail induced a rapid response in virtually all of the cultured cells.
Plotting the positions also revealed that gene expression levels continued to evolve from day 1 to day 14, but that there was substantial overlap between the cells collected at close time points. This is consistent with a population-wide, but asynchronous, response to induction of differentiation.
[0126] To explore the biological basis for the observed gene expression variation, the relationships between each of the top principal components (PCs), gene expression and time were then examined (Figure 22). The PCs can be interpreted as metagenes that capture coordinated expression of multiple genes in the original data set. For each PC, we therefore ranked the genes according to their corresponding PC weights and then looked for evidence of coordinately regulated pathways using gene set enrichment analysis (GSEA). This analysis suggested qualitative biological interpretations for at least the top four PCs.
[0127] The first PC metagene (PC1) was positively associated with genes involved in general cellular metabolism, including the majority of genes involved in ribosome assembly, mitochondrial biogenesis, and oxidative phosphorylation, while it was negatively associated with inflammatory pathways, cytokine production and caspase expression. Variations along PC1 reflect differences between metabolically active "healthy" and inactive "unhealthy" cells. Interestingly, while there was a shift towards the latter state towards day 14, there was substantial overlap between the PC1 distributions from all time points, which indicates that this axis of variation was a major contributor to culture heterogeneity prior to induction of differentiation. Because significant cell detachment or death was not observed during the two weeks of differentiation, the inflammation signature likely represents a chronic cell state rather than ongoing apoptosis. By contrast, PC2 was high only in cells collected from day 0, effectively separating these from the differentiating cells. It showed a strong positive association with expression of genes required for progression through the mitotic cell cycle and, to a lesser extent, with genes associated with non-adipogenic differentiation. A decrease in PC2 may therefore reflect an exit from the cell cycle and lineage commitment. Expression of PC3 was high during the first two days post-induction, but steadily decreased as the cells approached day 14. This decrease was associated with up-regulation of lipid homeostasis pathways and markers of adipocyte maturation. PC4 showed a transient drop at day 1, which was associated with increased expression of genes known to be rapidly induced by adipogenic cocktails, including early adipogenic regulators CEBPB and CEBPD. PC4 may therefore reflect an early response to induction of differentiation.
[0128] To explore the relationship between variations in gene expression and in lipid droplet accumulation, an additional 933 cells with high lipid content and an additional 666 cells with low lipid content were collected and analyzed at day 14. When the DGE profiles of these cells were projected into the space defined by the initial time course PCs, the high and low lipid cells were largely separated by their distribution along PC1 (Figure 21I and Figure 22). In particular, cells with higher lipid content showed higher expression of genes related to basic cellular metabolism, while cells with lower lipid content showed higher expression of inflammatory genes. Interestingly, there was substantial overlap along PC3, and while some classic adipocyte markers like FABP4 (aP2) were enriched in the high lipid fraction, key regulatory factors such as PPARG were not. This implies that pathways related to lipid homeostasis and adipocyte maturation had been activated in both fractions.
[0129] Separate PCAs of the second collected time course (2,968 cells from days 0, 3 and 7, and 2,068 additional cells with high or low lipids from day 7) yielded qualitatively similar patterns, which suggests that the observations are robust to technical variation across cell cultures. Thus, while morphological analysis suggested that only a fraction of hASCs respond to the differentiation cocktail, the single-cell data surprisingly show that virtually all of the cells exited the mitotic cell cycle and proceeded to up-regulate an adipogenic gene expression program. The observed variability in lipid droplet accumulation and conversion to mature adipocyte-like morphologies is instead most strongly linked to an inverse correlation in expression of basic cellular metabolism and inflammatory expression programs, which was also present prior to the induction of differentiation.
Notably, cells with low lipid contents showed elevated expression of several proinflammatory regulatory factors, including IRF1, IRF3 and IRF4. These factors have previously been shown to negatively influence total lipid accumulation in murine bulk cultures and in vivo models, which supports a causal link between cell-to-cell variation in expression of these factors and lipid accumulation.
Specific activation in the fraction of low lipid cells may explain the paradoxical increases in expression of these factors that have previously been observed in bulk cultures.
Example 4: Protocol for high throughput sequencing
[0130] Although the protocols described above were originally designed to perform RNA sequencing on sorted single cells, they are also suitable for use with other starting samples, such as extracted or purified RNA (bulk RNA sequencing) or a population of cells or tissues (e.g., cell or tissue lysates). As with single cell RNA sequencing, using a 3' digital gene expression method allows the profiling of a high number of samples in a cost-efficient manner. The protocol is robust for a broad range of input from single cells to pooled cells or extracted RNA. It allows the profiling of a large number of samples of extracted RNA (patient samples, for example), profiling of populations of small numbers of cells (e.g., cell or tissue lysates), as well as analysis of sorted, single cells. Regardless of starting materials, the use of the barcodes and UMIs described herein permits the tracking of individual transcripts to a specific multi-well plate and to a specific well of that plate, thus permitting correlation of data to the original starting material. The above examples are indicative of the powerful applications of the technology.
[0131] By way of further example, the ability to correlate expression analysis to a particular well of a multi-well plate (e.g., to the starting sample) is critical in the screening assay context, regardless of whether the material in the screen is a single cell or lysate. Because the bar codes and UMI allow tracking of individual transcripts, sequencing reactions can be run as massive multiplex reactions rather than a series of individual reactions without losing transcript-level data. This results in a significant increase in efficiency and decrease in cost. The sequencing data then can be deconvoluted using, for example, 3' digital gene expression to count the number of occurrences of bar code and UMI sequences and obtain an expression level for a particular transcript.
[0132] The methods and reagents described herein also are adaptable to other platforms, e.g., microfluidic systems such as Fluidigm's C1 microfluidic device. For example, the capture of 96 cells was performed on the C1 chip, and the reagents and adapters to prepare the cDNA were incorporated directly on the C1 chip. cDNAs were retrieved as an output of the C1 chip, pooled, and prepared as a Nextera library.
[0133] The nucleic acids, methods, and kits of the invention also provide the ability to profile single cells for which it is not possible to do an individual RNA extraction and purification, or, by working directly with lysates, profiling a high number of conditions under which cells are cultivated without necessarily performing a separate RNA extraction and purification step (e.g., if sequencing cells from a high throughput compound screen, it is unnecessary to extract and purify the RNA from each well individually).
[0134] In certain embodiments, one or more of the following modifications to the protocol or reagents used were and can optionally be employed. Specifically, another reverse transcriptase can be used, such as the MMLV Maxima H Minus Reverse Transcriptase (Thermo Scientific). At this point, numerous different MMLV reverse transcriptases have been successfully used and can be selected based on user preference, cost, availability and the like. In certain embodiments, a proteinase or protease, such as proteinase K, may be added during lysis. In certain embodiments, proteinase K is included as part of lysis for sorted single cells and isolated cells/lysates. Higher concentrations of proteinase K and increased incubation times are used, in certain embodiments, for a pool of cells as compared to single cells. Other modifications include a reduction in the volume of the RT reaction to 2 μl by drying out the RNA during the proteinase K inactivation to increase reaction efficiency and use of 6-nucleotide barcodes to refer to a sample or pool instead of a single cell when performing sequencing on extracted RNA or a pool of cells.
[0135] For bulk RNA sequencing, 10 ng of total RNA was used as input, although this amount is flexible. Additionally, reactions were performed in 10 μl, and the reactions used more concentrated (10 μM) template-switching and barcode-containing oligonucleotides. For RNA sequencing of lysates, inputs ranged from single cells to 10,000 cells (including tens or hundreds of cells). For pooled cells, more concentrated proteinase K (2 mg/ml instead of 1 mg/ml for single cells) was used, and the cells were incubated longer (one hour at 50 °C instead of 15 minutes) to increase lysis efficiency.
[0136] An exemplary protocol is as follows.
Capture plate preparation
[0137] Add 5 μL of lysis buffer, composed of a 1/500 dilution of Phusion HF buffer (New England Biolabs, #B0518S), to each well of a collection Twin.tec PCR 384-well plate (Eppendorf, #951020729).
Cell preparation
[0138] Remove media by pelleting the cells (5 min at 1000 rpm), and resuspend the cells in RNAprotect Cell Reagent (~100 μL per 100,000 cells, Qiagen, #76526) and 1 μL of RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies, #10777-019). Cells can be stored up to 2 weeks at 4 °C. Next, dilute the cells in ~1.5 mL PBS, pH 7.4 (no calcium, no magnesium, no phenol red, Life Technologies, #10010-049). Stain the cells for viability (DNA staining by Hoechst 33342) with NucBlue Live ReadyProbes Reagent (Life Technologies, #R37605).
Cell collection
[0139] Sort individual cells into each well of the 384-well capture plate using the FACSAria II flow cytometer (BD Biosciences). "Live" cells are selected and doublets avoided using the Hoechst DNA staining. After sorting, immediately seal the plates, spin them down, and freeze them on dry ice. Sorted cells are stored at -80 °C. If performing bulk lysate sequencing, which starts with extracted/purified RNA and proceeds directly to reverse transcription/template switching, this step should be skipped.
Cell Lysis
[0140] Thaw the cells for 5 minutes at room temperature, then place the plate on ice. Add 1 μL of Proteinase K Solution (diluted to 1 mg/mL; 1/20; Life Technologies, #AM2548) to each well. Incubate the plate at 50 °C for 15 minutes, then remove the seal and incubate the plate at 95 °C for 10 minutes. Place the plate back on ice.
Reverse Transcription/Template Switching
[0141] Denature 42 μL of a 1 x 10^-6 dilution of ERCC RNA Spike-In Mix (Life Technologies, #4456740) for 2 min at 70 °C, then place directly on ice. Prepare the following RT/template switching mix (for 384 wells): 160 μL of 5x RT buffer, 80 μL of dNTPs (New England Biolabs, #N0447L), 72 μL of Nuclease-Free Water (not DEPC-Treated) (Life Technologies, #AM9937), 40 μL of the denatured 1 x 10^-6 dilution of ERCC RNA Spike-In Mix (Life Technologies, #4456740), 8 μL of the universal E5V6NEXT adapter (100 μM, Eurogentec), and 50 μL of Maxima H Minus Reverse Transcriptase (Thermo Scientific, #EP0753). Add 1 μL of the mix and 1 μL of the barcoded oligonucleotide adapter (2 μM, Integrated DNA Technologies) to each well. Incubate the plate at 42 °C for 1 hour 30 minutes.
cDNA pooling and purification
[0142] Pool all 384 wells together, and add 5.5 mL of DNA Binding Buffer (Zymo Research, #D4004-1-L) to the pooled cDNAs. Purify all cDNAs pooled from one 384-well plate through one single DNA Clean & Concentrator-5 column (Zymo Research, #D4013). Elute cDNAs in 18 μL of Nuclease-Free Water.
Exonuclease I treatment
[0143] Add 2 μL of 10X reaction buffer and 1 μL of Exonuclease I (New England Biolabs, #M0293L) to the cDNAs. Incubate the reaction at 37 °C for 30 minutes, then at 80 °C for 20 minutes.
Full length cDNA amplification
[0144] Amplify full length cDNA by single primer PCR using the Advantage 2 PCR Enzyme System (Clontech, #639206). The PCR reaction is as follows: 20 μL of cDNA from previous step, 5 μL of 10X Advantage 2 PCR buffer, 1 μL of dNTPs, 1 μL of the SINGV6 primer (10 μM, Integrated DNA Technologies), 1 μL of Advantage 2 Polymerase Mix, and 22 μL of Nuclease-Free Water. Perform the PCR according to the following program: 95 °C for 1 minute; 18 cycles of a) 95 °C for 15 seconds, b) 65 °C for 30 seconds, and c) 68 °C for 6 minutes; 72 °C for 10 minutes; and, optionally, 4 °C to store the reaction.
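As a quick consistency check on the amplification step, the reaction and program can be written down as data and summed. This sketch reads the component volumes as 20, 5, 1, 1, 1, and 22 μL; the variable and function names are illustrative only, and the time estimate counts programmed hold times and ignores thermocycler ramping.

```python
# Component volumes in microliters for one full length cDNA amplification reaction.
REACTION_UL = {
    "cDNA from exonuclease step": 20,
    "10X Advantage 2 PCR buffer": 5,
    "dNTPs": 1,
    "SINGV6 primer (10 uM)": 1,
    "Advantage 2 Polymerase Mix": 1,
    "nuclease-free water": 22,
}

INITIAL_DENATURATION_S = 60          # 95 C for 1 minute
CYCLES = 18
CYCLE_STEPS_S = [                    # (temperature label, hold in seconds)
    ("95 C", 15),
    ("65 C", 30),
    ("68 C", 6 * 60),
]
FINAL_EXTENSION_S = 10 * 60          # 72 C for 10 minutes


def total_volume_ul(reaction):
    """Sum of all component volumes."""
    return sum(reaction.values())


def programmed_hold_seconds():
    """Total programmed hold time, excluding ramping and the optional 4 C hold."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS_S)
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S
```

Under these readings the components sum to a 50 μL reaction and the program holds for 7,950 seconds, a little over two hours, most of it in the 6-minute extension step.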
Full length cDNA purification and quantification
[0145] Purify the full length cDNAs with 30 μL of Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880). Elute the full length cDNAs in 12 μL of Nuclease-Free Water and quantify on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies, #Q32851).
Sequencing Library Preparation
[0146] To increase complexity, all of the purified full length cDNA is used in the Nextera library preparation. If the total amount of cDNA is greater than 1 ng and less than 10 ng, proceed to tagmentation reactions of ~1 ng each according to the Illumina Nextera XT (FC-131-1024) protocol. After the neutralization step, add 180 μL DNA Binding Buffer (Zymo Research, #D4004-1-L) to each tagmentation reaction, and pool and purify the tagmentation reactions on one single DNA Clean & Concentrator-5 column (Zymo Research, #D4013). Then, amplify the tagmented purified cDNA following the Illumina protocol with the exception of running only 10 cycles of PCR, using only the i7 primer to barcode cDNA originating from the same 384-well plate and replacing the i5 primer with P5NEXTPT5, 5 μM (Integrated DNA Technologies) as the second primer. If the total amount of cDNA is greater than 10 ng and less than 50 ng, proceed to the tagmentation using the Nextera DNA kit (FC-121-1030), suitable for 50 ng of input. Scale down all reagents and reaction volume according to the input concentration. Purify the tagmented cDNA on a single DNA Clean & Concentrator-5 column (Zymo Research, #D4013) according to the Illumina protocol. Use the 25 μL eluted cDNA for the library amplification, and use only the i7 primer to barcode cDNA originating from the same 384-well plate, replacing the i5 primer with P5NEXTPT5, 5 μM (Integrated DNA Technologies) as the second primer. Do not add the PCR primer cocktail. Perform either 10 cycles (for an input of less than 20 ng) or 5 cycles (for an input of 20 ng and above) of PCR according to the Illumina protocol.
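The kit and cycle-number choices in paragraph [0146] amount to a small decision rule on the cDNA yield. The sketch below encodes that rule as stated; the function name and return structure are hypothetical, and because the text does not say what to do at exactly 10 ng, at 1 ng or below, or at 50 ng or above, those inputs are rejected rather than guessed.

```python
def choose_library_prep(cdna_ng):
    """Map total full length cDNA yield (ng) to the kit and PCR cycle number
    given in the protocol text. Raises ValueError for yields the text does not cover."""
    if 1 < cdna_ng < 10:
        # Split into tagmentation reactions of roughly 1 ng each, then pool.
        return {"kit": "Nextera XT (FC-131-1024)", "pcr_cycles": 10}
    if 10 < cdna_ng < 50:
        # Single scaled-down reaction; fewer cycles for larger inputs.
        cycles = 10 if cdna_ng < 20 else 5
        return {"kit": "Nextera DNA (FC-121-1030)", "pcr_cycles": cycles}
    raise ValueError("yield of %s ng is outside the ranges given in the protocol" % cdna_ng)
```

In both branches the i5 primer is replaced with P5NEXTPT5 and a single i7 primer barcodes the whole plate, so the rule only needs to return the kit and the cycle count.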
Sequencing Library Purification and Size Selection
[0147] Purify the sequencing library with 30 μL of Agencourt AMPure XP magnetic beads and elute it in 20 μL of water. Run the entire library on an E-Gel EX Gel, 2% (Life Technologies, #G4010-02), excise the band corresponding to a size range of 300 to 800 bp, purify it using the QIAquick Gel Extraction Kit (Qiagen, #28704), and elute in 15 μL.
Sequencing Library Quality Assessment
[0148] Quantify the library on the Qubit 2.0 Fluorometer using the dsDNA HS Assay. Optionally, the quality and average size of the library can be assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
Sequencing
[0149] Sequencing can be performed on any Illumina HiSeq or MiSeq, using the standard Illumina sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first end, then 8 cycles to decode the Nextera barcode and finally 46 cycles on the second end. Up to twelve Nextera libraries (one per 384-well capture plate, each comprising 384 cells) can be multiplexed together (twelve i7 barcodes currently available), allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
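The two-level indexing described above (one i7 barcode per plate, one 6-nucleotide barcode per well) can be illustrated with a small lookup table. This is a sketch only: the function names are hypothetical and the index and barcode sequences shown are placeholders, not the sequences used in the protocol.

```python
from itertools import product

READ1_CYCLES = 17              # first read: well barcode followed by UMI
INDEX_CYCLES = 8               # index read: Nextera i7 barcode
BARCODE_LEN, UMI_LEN = 6, 10   # lengths of [BC6] and (N)10


def lane_capacity(n_i7_barcodes=12, wells_per_plate=384):
    """Number of wells that can be distinguished on one lane."""
    return n_i7_barcodes * wells_per_plate


def build_sample_index(i7_to_plate, bc6_to_well):
    """Map every (i7 barcode, well barcode) pair to its (plate, well) origin."""
    index = {}
    for (i7, plate), (bc6, well) in product(i7_to_plate.items(), bc6_to_well.items()):
        index[(i7, bc6)] = (plate, well)
    return index


# Placeholder sequences for illustration only.
i7_to_plate = {"AAAAAAAA": "plate_01", "CCCCCCCC": "plate_02"}
bc6_to_well = {"ACGTAC": "A1", "TGCATG": "A2"}
sample_index = build_sample_index(i7_to_plate, bc6_to_well)
```

With twelve i7 barcodes and 384 well barcodes the table has 4,608 entries, matching the per-lane figure in the text, and the 17 cycles of the first read cover the 16 bases of barcode plus UMI with one base to spare.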
Exemplary sequences are provided below and herein. Such sequences are merely illustrative of various polynucleotides and components useful in the methods of the present invention. These polynucleotides are suitable across any of the various sample types described herein (e.g., single cells, lysates, bulk RNA, etc.).
Adapter/Primer Sequences
Template-switching oligonucleotide
5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17)
iC: iso-dC
iG: iso-dG
rG: RNA G
Bar code-containing oligonucleotide adapter
5'-/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 18)
5Biosg: 5' biotin
V: (A, G, or C)
N: (A, G, C, or T)
[BC6]: 6bp barcode, different in each well. The barcodes were designed such that each barcode differs from the others by at least two nucleotides, so that a single sequencing error cannot lead to the misidentification of the barcode.
(N)10: Unique Molecular Identifier (UMI).
Amplification primer
5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19)
5Biosg: 5' biotin
Phosphorothioate bond-containing nucleic acid
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3)
*: phosphorothioate bond
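The barcode design rule stated above (every pair of well barcodes differs at two or more positions) can be checked with a pairwise Hamming distance test. The sketch below is one way to do that; the function names are illustrative and the example barcodes in the usage note are made up, since the actual barcode set is not listed here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def single_error_is_detectable(barcodes):
    """True if no single substitution can turn one barcode into another."""
    return min_pairwise_distance(barcodes) >= 2
```

For example, the made-up set AAAAAA, AACCAA, GGAAAA passes the check, while AAAAAA and AAAAAC do not. A minimum distance of two means a single error produces a sequence that matches no designed barcode, so the read is discarded by an exact-match filter rather than assigned to the wrong well.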

Claims

What is Claimed is:
1. A nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract.
2. The nucleic acid of claim 1, wherein the 5' poly-isonucleotide sequence comprises an isocytosine.
3. The nucleic acid of claims 1 or 2, wherein the 5' poly-isonucleotide sequence comprises an isoguanosine.
4. The nucleic acid of any one of claims 1-3, wherein the 5' poly-isonucleotide sequence comprises an isocytosine-isoguanosine-isocytosine sequence.
5. The nucleic acid of any one of claims 1-4, wherein the 3' guanosine tract comprises two guanosines, three guanosines, four guanosines, five guanosines, six guanosines, seven guanosines, or eight guanosines.
6. The nucleic acid of claim 5, wherein the 3' guanosine tract comprises three guanosines.
7. The nucleic acid of any one of claims 1-6, wherein the adapter sequence is 12 to 32 nucleotides in length.
8. The nucleic acid of claim 7, wherein the adapter sequence is 22 nucleotides in length.
9. The nucleic acid of claim 8, wherein the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3'.
10. A nucleic acid comprising a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine.
11. The nucleic acid of claim 10, wherein the 5' blocking group is selected from biotin and an inverted nucleotide.
12. The nucleic acid of claim 11, wherein the 5' blocking group is biotin.
13. The nucleic acid of any one of claims 10-12, wherein the internal adapter sequence is 23 to 43 nucleotides in length.
14. The nucleic acid of claim 13, wherein the internal adapter sequence is 33 nucleotides in length.
15. The nucleic acid sequence of claim 14, wherein the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3'.
16. The nucleic acid of any one of claims 10-15, wherein the barcode sequence is 4 to 20 nucleotides in length.
17. The nucleic acid of claim 16, wherein the barcode sequence is 6 nucleotides in length.
18. The nucleic acid of any one of claims 10-17, wherein the UMI sequence is six to 20 nucleotides in length.
19. The nucleic acid of claim 18, wherein the UMI sequence is ten nucleotides in length.
20. The nucleic acid of any one of claims 10-19, wherein the complementarity sequence is a poly(T) sequence.
21. The nucleic acid of any one of claims 10-20, wherein the complementarity sequence is 20 to 40 nucleotides in length.
22. The nucleic acid of claim 21, wherein the complementarity sequence is 30 nucleotides in length.
23. A kit comprising a nucleic acid of any one of claims 1-9.
24. The kit of claim 23, further comprising a nucleic acid of any one of claims 10-23.
25. The kit of claim 24, wherein the kit comprises a plurality of nucleic acids of any one of claims 10-23.
26. The kit of claim 25, wherein the UMI sequence of each nucleic acid in the plurality of nucleic acids is unique among the nucleic acids in the kit.
27. The kit of claim 25 or 26, wherein the plurality of nucleic acids comprises different populations of nucleic acid species.
28. The kit of claim 27, wherein each population of nucleic acid species comprises a different barcode sequence that uniquely identifies a single population of nucleic acid species.
29. The kit of claim 25, wherein each population of nucleic acid species is in a separate container, and the bar code of each population of nucleic acid species differs by at least two nucleotides from the bar code of each other population of nucleic acid species.
30. The kit of any one of claims 23-29, further comprising a third nucleic acid primer comprising 12 to 32 nucleotides and a 5' blocking group.
31. The kit of claim 30, wherein the 5' blocking group is selected from biotin and an inverted nucleotide.
32. The kit of claim 31, wherein the 5' blocking group is biotin.
33. The kit of any one of claims 30-32, wherein the third nucleic acid is 22 nucleotides in length.
34. The kit of claim 33, wherein the sequence of the nucleic acid primer is 5'-ACACTCTTTCCCTACACGACGC-3'.
35. The kit of any one of claims 23-34, further comprising a nucleic acid comprising a barcode sequence.
36. The kit of any one of claims 23-35, further comprising a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
37. The kit of claim 36, wherein the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length.
38. The kit of claim 37, wherein the phosphorothioate bond-containing nucleic acid is 58 nucleotides in length.
39. The kit of claim 38, wherein the sequence of the phosphorothioate bond-containing nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3'.
40. The kit of any one of claims 23-39, further comprising a capture plate.
41. The kit of any one of claims 23-40, further comprising a reverse transcriptase enzyme.
42. The kit of claim 41, wherein the reverse transcriptase enzyme is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase.
43. The kit of claim 42, wherein the MMLV reverse transcriptase is SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase.
44. The kit of any one of claims 23-43, further comprising a DNA purification column.
45. The kit of claim 44, wherein the DNA purification column is a DNA purification spin column.
46. The kit of any one of claims 23-45, further comprising proteinase K.
47. A method for gene profiling, comprising: a) providing a plurality of single cells;
b) releasing mRNA from each single cell to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell;
c) reverse transcribing the individual mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence;
d) pooling and purifying the barcoded cDNA produced from the separate cells;
e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA;
f) purifying the double-stranded cDNA;
g) fragmenting the purified cDNA;
h) purifying the cDNA fragments; and
i) sequencing the cDNA fragments.
48. A method for gene profiling, comprising: a) providing an isolated population of cells;
b) releasing mRNA from the population of cells to provide one or more mRNA samples;
c) reverse transcribing the one or more mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence;
d) pooling and purifying the barcoded cDNA;
e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA;
f) purifying the double-stranded cDNA;
g) fragmenting the purified cDNA;
h) purifying the cDNA fragments; and
i) sequencing the cDNA fragments.
49. The method of claim 47 or 48, further comprising separating a population of cells to provide the plurality of single cells.
50. The method of claim 49, wherein the cells are separated into a capture plate.
51. The method of any one of claims 48-50, wherein the cells are separated by flow cytometry.
52. The method of any one of claims 48-50, wherein the mRNA is released by cell lysis.
53. The method of claim 52, wherein the cells are lysed by freeze-thawing.
54. The method of claim 52 or 53, further comprising contacting the cells with proteinase K.
55. The method of any one of claims 47-54, wherein c) comprises contacting each individual mRNA sample with a nucleic acid of any one of claims 1-9 and a nucleic acid of any one of claims 10-22.
56. The method of any one of claims 47-54, wherein c) is carried out with a reverse transcriptase enzyme.
57. The method of claim 56, wherein the reverse transcriptase enzyme is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase.
58. The method of claim 57, wherein the MMLV reverse transcriptase is SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase.
59. The method of any one of claims 47-58, wherein the cDNA purification of d) is carried out with a Zymo-Spin™ column.
60. The method of any one of claims 47-58, further comprising treating the barcoded cDNA with an exonuclease.
61. The method of claim 60, wherein the exonuclease is Exonuclease I.
62. The method of any one of claims 47-61, wherein the amplification of e) utilizes an amplification primer comprising a 5' blocking group.
63. The method of claim 62, wherein the blocking group is selected from biotin and an inverted nucleotide.
64. The method of claim 63, wherein the blocking group is biotin.
65. The method of any one of claims 62-64, wherein the amplification primer is 12 to 32 nucleotides in length.
66. The method of claim 65, wherein the nucleotide is 22 nucleotides in length.
67. The method of claim 66, wherein the sequence of the amplification primer is 5'-ACACTCTTTCCCTACACGACGC-3'.
68. The method of any one of claims 47-67, wherein the purification of f) is carried out with magnetic beads.
69. The method of any one of claims 47-68, wherein f) further comprises quantifying the purified cDNA.
70. The method of any one of claims 47-69, wherein the single cells are provided in a capture plate of individual wells, each well comprising a single cell.
71. The method of any one of claims 47-70, wherein the fragmentation of g) utilizes a transposase.
72. The method of any one of claims 47-71, wherein the fragmentation of g) utilizes a first fragmentation nucleic acid and a second fragmentation nucleic acid, wherein the first fragmentation nucleic acid comprises a barcode sequence.
73. The method of claim 72, wherein the sequence of the first fragmentation nucleic acid is 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3', wherein [i7] is a nucleic acid sequence.
74. The method of claim 73, wherein [i7] is a nucleic acid sequence between four and 16 nucleotides in length.
75. The method of claim 74, wherein [i7] is eight nucleotides in length.
76. The method of claim 75, wherein the sequence of [i7] is selected from: TCGCCTTA, CTAGTACG, TTCTGCCT, GCTCAGGA, AGGAGTCC, CATGCCTA, GTAGAGAG, CCTCTCTG, AGCGTAGC, CAGCCTCG, TGCCTCTT, and TCCTCTAC.
77. The method of any one of claims 72-76, wherein the barcode sequence of the first fragmentation nucleic acid is different than the barcode sequence of the nucleic acid of any one of claims 10-22.
78. The method of claim 77, wherein the barcode sequence of the first fragmentation nucleic acid uniquely identifies a predetermined subset of cells.
79. The method of claim 78, wherein the predetermined subset of cells is a subset of cells contained in individual wells of a single capture plate.
80. The method of claim 79, wherein the barcode sequence that uniquely identifies the predetermined subset of cells uniquely identifies the capture plate.
81. The method of any one of claims 77-79, wherein the barcode sequence of the nucleic acid of any one of claims 10-22 uniquely identifies the cell within the predetermined subset of cells, which cell comprised the mRNA from which the barcoded cDNA of c) was produced.
82. The method of claim 81, wherein the barcode sequence that uniquely identifies the cell within the predetermined subset of cells uniquely identifies an individual well in a capture plate.
83. The method of claim 82, wherein the combination of the barcode sequence that uniquely identifies the predetermined subset of cells and the barcode sequence that uniquely identifies the cell within a predetermined subset of cells uniquely identifies the capture plate and the individual well which comprised the cell, which cell comprised the mRNA from which the barcoded cDNA of c) was produced.
84. The method of any one of claims 72-83, wherein the barcode sequence of the first fragmentation nucleic acid is 4 to 20 nucleotides in length.
85. The method of claim 84, wherein the barcode sequence is 6 nucleotides in length.
86. The method of claim 85, wherein the second fragmentation nucleic acid is a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
87. The method of claim 86, wherein the second fragmentation nucleic acid is 48 to 68 nucleotides in length.
88. The method of claim 87, wherein the second fragmentation nucleic acid is 58 nucleotides in length.
89. The method of claim 88, wherein the sequence of the second fragmentation nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3'.
90. The method of any one of claims 47-89, wherein the purification of h) is carried out with magnetic beads.
91. The method of claim 90, further comprising separating the magnetic-bead purified cDNA on an agarose gel, excising cDNA corresponding to 300 to 800 nucleotides in length, and purifying the excised cDNA.
92. The method of any one of claims 47-91, wherein h) further comprises quantifying the purified cDNA.
93. The method of any one of claims 47-92, wherein the sequencing of i) is carried out using RNA-seq.
94. The method of any one of claims 47-93, further comprising assembling a database of the sequences of the sequenced cDNA fragments of j).
95. The method of claim 94, further comprising identifying the UMI sequences of the sequences of the database.
96. The method of claim 95, further comprising discounting duplicate sequences that share a UMI sequence, thereby assembling a set of sequences in which each sequence is associated with a unique UMI.
97. The method of any one of claims 47-96, further comprising repeating a) through h) before i) to produce a plurality of populations of cDNA fragments.
98. The method of claim 97, wherein the populations of cDNA fragments are combined prior to i).
99. The method of any one of claims 72-98, wherein the barcode sequence of the first fragmentation nucleic acid and the barcode sequence of the nucleic acid of any one of claims 10-22 are used to correlate the sequencing data with the predetermined subset of cells and the individual cell.
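The data-handling steps recited in claims 95, 96, and 99 (identifying UMI sequences, discounting duplicates that share a UMI, and using the two barcodes to trace each read to a capture plate and well) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the examples herein. It assumes that each sequenced fragment has already been reduced to a record of (plate index, tag read, gene), that the tag read begins with the 6-nt well barcode followed by the 10-nt UMI, that the plate index is reported in the orientation listed in claim 76, and that two reads are duplicates when they share plate, well, gene, and UMI. The plate names, well names, gene name, and the well barcode ACGTAC are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 6   # well barcode length (claim 17)
UMI_LEN = 10      # UMI length (claim 19)


def split_tag_read(tag_read):
    """Split a tag read into (well barcode, UMI).

    Assumes the read starts at the first barcode base, so the barcode
    occupies the first 6 bases and the UMI the next 10.
    """
    if len(tag_read) < BARCODE_LEN + UMI_LEN:
        raise ValueError("tag read is shorter than barcode + UMI")
    barcode = tag_read[:BARCODE_LEN]
    umi = tag_read[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_unique_molecules(records, plate_by_index, well_by_barcode):
    """Count distinct UMIs per (plate, well, gene).

    records: iterable of (plate_index, tag_read, gene) tuples.
    Reads with an unrecognized plate index or well barcode are skipped.
    Reads repeating an already seen (plate, well, gene, UMI) are treated
    as amplification duplicates and are not counted again.
    """
    seen = set()
    counts = defaultdict(int)
    for plate_index, tag_read, gene in records:
        barcode, umi = split_tag_read(tag_read)
        plate = plate_by_index.get(plate_index)
        well = well_by_barcode.get(barcode)
        if plate is None or well is None:
            continue
        molecule = (plate, well, gene, umi)
        if molecule in seen:
            continue
        seen.add(molecule)
        counts[(plate, well, gene)] += 1
    return dict(counts)


# Two [i7] sequences from claim 76 mapped to hypothetical plate names.
PLATES = {"TCGCCTTA": "plate1", "CTAGTACG": "plate2"}
# Hypothetical well barcode mapped to a hypothetical well name.
WELLS = {"ACGTAC": "A1"}

EXAMPLE_RECORDS = [
    ("TCGCCTTA", "ACGTAC" + "AAAAACCCCC" + "TTTT", "GENE_A"),
    ("TCGCCTTA", "ACGTAC" + "AAAAACCCCC" + "TTTT", "GENE_A"),  # duplicate UMI
    ("TCGCCTTA", "ACGTAC" + "GGGGGTTTTT" + "TTTT", "GENE_A"),  # new UMI
    ("CTAGTACG", "ACGTAC" + "AAAAACCCCC" + "TTTT", "GENE_A"),  # other plate
    ("CTAGTACG", "NNNNNN" + "AAAAACCCCC" + "TTTT", "GENE_A"),  # unknown well
]

if __name__ == "__main__":
    print(count_unique_molecules(EXAMPLE_RECORDS, PLATES, WELLS))
```

Keying the duplicate check on the full (plate, well, gene, UMI) combination is a design choice of this sketch. It prevents the same UMI observed in different wells or on different genes from being collapsed into a single molecule.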
PCT/US2014/042159 2013-06-12 2014-06-12 High-throughput rna-seq WO2014201273A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/898,030 US20160122753A1 (en) 2013-06-12 2014-06-12 High-throughput rna-seq

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361834163P 2013-06-12 2013-06-12
US61/834,163 2013-06-12

Publications (1)

Publication Number Publication Date
WO2014201273A1 (en) 2014-12-18

Family

ID=52022775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/042159 WO2014201273A1 (en) 2013-06-12 2014-06-12 High-throughput rna-seq

Country Status (2)

Country Link
US (1) US20160122753A1 (en)
WO (1) WO2014201273A1 (en)

Cited By (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016125106A1 (en) 2015-02-05 2016-08-11 Technion Research & Development Foundation Limited System and method for single cell genetic analysis
WO2016134078A1 (en) * 2015-02-19 2016-08-25 Becton, Dickinson And Company High-throughput single-cell analysis combining proteomic and genomic information
WO2016172373A1 (en) * 2015-04-23 2016-10-27 Cellular Research, Inc. Methods and compositions for whole transcriptome amplification
WO2016191533A1 (en) * 2015-05-26 2016-12-01 The Trustees Of Columbia University In The City Of New York Rna printing and sequencing devices, methods, and systems
US9567646B2 (en) 2013-08-28 2017-02-14 Cellular Research, Inc. Massively parallel single cell analysis
WO2017079593A1 (en) * 2015-11-04 2017-05-11 Atreca, Inc. Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells
US9708659B2 (en) 2009-12-15 2017-07-18 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
US9727810B2 (en) 2015-02-27 2017-08-08 Cellular Research, Inc. Spatially addressable molecular barcoding
US20180002749A1 (en) * 2016-06-30 2018-01-04 Grail, Inc. Differential tagging of rna for preparation of a cell-free dna/rna sequencing library
WO2018023068A1 (en) 2016-07-29 2018-02-01 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template- switching
US9905005B2 (en) 2013-10-07 2018-02-27 Cellular Research, Inc. Methods and systems for digitally counting features on arrays
EP3262214A4 (en) * 2015-02-27 2018-07-25 Fluidigm Corporation Single-cell nucleic acids for high-throughput studies
JP2018526026A (en) * 2015-08-28 2018-09-13 イルミナ インコーポレイテッド Single-cell nucleic acid sequence analysis
WO2018204423A1 (en) * 2017-05-01 2018-11-08 Illumina, Inc. Optimal index sequences for multiplex massively parallel sequencing
WO2018208699A1 (en) * 2017-05-08 2018-11-15 Illumina, Inc. Universal short adapters for indexing of polynucleotide samples
WO2018226293A1 (en) * 2017-06-05 2018-12-13 Becton, Dickinson And Company Sample indexing for single cells
EP3194593B1 (en) * 2014-09-15 2019-02-06 AbVitro LLC High-throughput nucleotide library sequencing
US10202641B2 (en) 2016-05-31 2019-02-12 Cellular Research, Inc. Error correction in amplification of samples
US10240148B2 (en) 2016-08-03 2019-03-26 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template-switching
US10301677B2 (en) 2016-05-25 2019-05-28 Cellular Research, Inc. Normalization of nucleic acid libraries
US10338066B2 (en) 2016-09-26 2019-07-02 Cellular Research, Inc. Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US10370630B2 (en) 2014-02-10 2019-08-06 Technion Research & Development Foundation Limited Method and apparatus for cell isolation, growth, replication, manipulation, and analysis
WO2019165181A1 (en) * 2018-02-23 2019-08-29 Yale University Single-cell freeze-thaw lysis
WO2019191122A1 (en) * 2018-03-26 2019-10-03 Qiagen Sciences, Llc Integrative dna and rna library preparations and uses thereof
EP3494214A4 (en) * 2016-08-05 2020-03-04 Bio-Rad Laboratories, Inc. Second strand direct
WO2020046833A1 (en) * 2018-08-28 2020-03-05 Cellular Research, Inc. Sample multiplexing using carbohydrate-binding and membrane-permeable reagents
US10619186B2 (en) 2015-09-11 2020-04-14 Cellular Research, Inc. Methods and compositions for library normalization
US10640763B2 (en) 2016-05-31 2020-05-05 Cellular Research, Inc. Molecular indexing of internal sequences
US10641772B2 (en) 2015-02-20 2020-05-05 Takara Bio Usa, Inc. Method for rapid accurate dispensing, visualization and analysis of single cells
CN111406114A (en) * 2017-05-29 2020-07-10 哈佛学院董事及会员团体 Method for amplifying single cell transcriptome
US10718014B2 (en) 2004-05-28 2020-07-21 Takara Bio Usa, Inc. Thermo-controllable high-density chips for multiplex analyses
US10722880B2 (en) 2017-01-13 2020-07-28 Cellular Research, Inc. Hydrophilic coating of fluidic channels
US10822643B2 (en) 2016-05-02 2020-11-03 Cellular Research, Inc. Accurate molecular barcoding
US10941396B2 (en) 2012-02-27 2021-03-09 Becton, Dickinson And Company Compositions and kits for molecular counting
EP3733865A4 (en) * 2017-12-28 2021-09-08 MGI Tech Co., Ltd. Method for obtaining single-cell mrna sequence
US11124823B2 (en) 2015-06-01 2021-09-21 Becton, Dickinson And Company Methods for RNA quantification
US11164659B2 (en) 2016-11-08 2021-11-02 Becton, Dickinson And Company Methods for expression profile classification
EP3940074A1 (en) 2016-07-29 2022-01-19 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template- switching
WO2022084742A1 (en) * 2020-10-19 2022-04-28 The Hong Kong University Of Science And Technology Simultaneous amplification of dna and rna from single cells
US11319583B2 (en) 2017-02-01 2022-05-03 Becton, Dickinson And Company Selective amplification using blocking oligonucleotides
US11365409B2 (en) 2018-05-03 2022-06-21 Becton, Dickinson And Company Molecular barcoding on opposite transcript ends
US11371076B2 (en) 2019-01-16 2022-06-28 Becton, Dickinson And Company Polymerase chain reaction normalization through primer titration
WO2022133734A1 (en) * 2020-12-22 2022-06-30 Singleron (Nanjing) Biotechnologies, Ltd. Methods and reagents for high-throughput transcriptome sequencing for drug screening
US11397882B2 (en) 2016-05-26 2022-07-26 Becton, Dickinson And Company Molecular label counting adjustment methods
US11460405B2 (en) 2016-07-21 2022-10-04 Takara Bio Usa, Inc. Multi-Z imaging and dispensing with multi-well devices
US11492660B2 (en) 2018-12-13 2022-11-08 Becton, Dickinson And Company Selective extension in single cell whole transcriptome analysis
US11535882B2 (en) 2015-03-30 2022-12-27 Becton, Dickinson And Company Methods and compositions for combinatorial barcoding
US11608497B2 (en) 2016-11-08 2023-03-21 Becton, Dickinson And Company Methods for cell label classification
US11639517B2 (en) 2018-10-01 2023-05-02 Becton, Dickinson And Company Determining 5′ transcript sequences
US11649497B2 (en) 2020-01-13 2023-05-16 Becton, Dickinson And Company Methods and compositions for quantitation of proteins and RNA
US11661625B2 (en) 2020-05-14 2023-05-30 Becton, Dickinson And Company Primers for immune repertoire profiling
US11661631B2 (en) 2019-01-23 2023-05-30 Becton, Dickinson And Company Oligonucleotides associated with antibodies
US11739443B2 (en) 2020-11-20 2023-08-29 Becton, Dickinson And Company Profiling of highly expressed and lowly expressed proteins
GB2581599B (en) * 2017-08-10 2023-08-30 Element Biosciences Inc Tagging nucleic acid molecules from single cells for phased sequencing
US11773441B2 (en) 2018-05-03 2023-10-03 Becton, Dickinson And Company High throughput multiomics sample analysis
US11773436B2 (en) 2019-11-08 2023-10-03 Becton, Dickinson And Company Using random priming to obtain full-length V(D)J information for immune repertoire sequencing
US11788120B2 (en) 2017-11-27 2023-10-17 The Trustees Of Columbia University In The City Of New York RNA printing and sequencing devices, methods, and systems
US11859171B2 (en) 2013-04-17 2024-01-02 Agency For Science, Technology And Research Method for generating extended sequence reads
US11932849B2 (en) 2018-11-08 2024-03-19 Becton, Dickinson And Company Whole transcriptome analysis of single cells using random priming
US11932901B2 (en) 2020-07-13 2024-03-19 Becton, Dickinson And Company Target enrichment using nucleic acid probes for scRNAseq
US11939622B2 (en) 2019-07-22 2024-03-26 Becton, Dickinson And Company Single cell chromatin immunoprecipitation sequencing assay
US11946095B2 (en) 2017-12-19 2024-04-02 Becton, Dickinson And Company Particles associated with oligonucleotides
WO2024081622A1 (en) * 2022-10-11 2024-04-18 The Board Of Trustees Of The Leland Stanford Junior University Improvement to cdna library priming
US11965208B2 (en) 2019-04-19 2024-04-23 Becton, Dickinson And Company Methods of associating phenotypical data and single cell sequencing data
US11970737B2 (en) 2019-08-26 2024-04-30 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels

Families Citing this family (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10400280B2 (en) 2012-08-14 2019-09-03 10X Genomics, Inc. Methods and systems for processing polynucleotides
US20150376609A1 (en) 2014-06-26 2015-12-31 10X Genomics, Inc. Methods of Analyzing Nucleic Acids from Individual Cells or Cell Populations
US9951386B2 (en) 2014-06-26 2018-04-24 10X Genomics, Inc. Methods and systems for processing polynucleotides
MX364957B (en) 2012-08-14 2019-05-15 10X Genomics Inc Microcapsule compositions and methods.
US10323279B2 (en) 2012-08-14 2019-06-18 10X Genomics, Inc. Methods and systems for processing polynucleotides
US10273541B2 (en) 2012-08-14 2019-04-30 10X Genomics, Inc. Methods and systems for processing polynucleotides
US10752949B2 (en) 2012-08-14 2020-08-25 10X Genomics, Inc. Methods and systems for processing polynucleotides
US10221442B2 (en) 2012-08-14 2019-03-05 10X Genomics, Inc. Compositions and methods for sample processing
US9701998B2 (en) 2012-12-14 2017-07-11 10X Genomics, Inc. Methods and systems for processing polynucleotides
US11591637B2 (en) 2012-08-14 2023-02-28 10X Genomics, Inc. Compositions and methods for sample processing
CA2894694C (en) 2012-12-14 2023-04-25 10X Genomics, Inc. Methods and systems for processing polynucleotides
US10533221B2 (en) 2012-12-14 2020-01-14 10X Genomics, Inc. Methods and systems for processing polynucleotides
CA2900481A1 (en) 2013-02-08 2014-08-14 10X Genomics, Inc. Polynucleotide barcode generation
CN105121664B (en) * 2013-02-20 2018-11-02 埃默里大学 Mixture and its it is compositions related in nucleic acid sequencing approach
AU2014268710B2 (en) 2013-05-23 2018-10-18 The Board Of Trustees Of The Leland Stanford Junior University Transposition into native chromatin for personal epigenomics
US9824068B2 (en) 2013-12-16 2017-11-21 10X Genomics, Inc. Methods and apparatus for sorting data
AU2015243445B2 (en) 2014-04-10 2020-05-28 10X Genomics, Inc. Fluidic devices, systems, and methods for encapsulating and partitioning reagents, and applications of same
US10975371B2 (en) * 2014-04-29 2021-04-13 Illumina, Inc. Nucleic acid sequence analysis from single cells
CA3060708A1 (en) * 2014-04-29 2015-11-05 Illumina, Inc Multiplexed single cell gene expression analysis using template switch and tagmentation
US20160122817A1 (en) 2014-10-29 2016-05-05 10X Genomics, Inc. Methods and compositions for targeted nucleic acid sequencing
US9975122B2 (en) 2014-11-05 2018-05-22 10X Genomics, Inc. Instrument systems for integrated sample processing
SG11201705615UA (en) 2015-01-12 2017-08-30 10X Genomics Inc Processes and systems for preparing nucleic acid sequencing libraries and libraries prepared using same
EP3262188B1 (en) 2015-02-24 2021-05-05 10X Genomics, Inc. Methods for targeted nucleic acid sequence coverage
EP3262407B1 (en) 2015-02-24 2023-08-30 10X Genomics, Inc. Partition processing methods and systems
EP3285926B1 (en) * 2015-04-21 2022-03-02 General Automation Lab Technologies Inc. Kit and method for high throughput microbiology applications
US11371094B2 (en) 2015-11-19 2022-06-28 10X Genomics, Inc. Systems and methods for nucleic acid processing using degenerate nucleotides
SG11201804086VA (en) 2015-12-04 2018-06-28 10X Genomics Inc Methods and compositions for nucleic acid analysis
SG11201806757XA (en) 2016-02-11 2018-09-27 10X Genomics Inc Systems, methods, and media for de novo assembly of whole genome sequence data
WO2017197338A1 (en) 2016-05-13 2017-11-16 10X Genomics, Inc. Microfluidic systems and methods of use
US10465242B2 (en) 2016-07-14 2019-11-05 University Of Utah Research Foundation Multi-sequence capture system
US10550429B2 (en) 2016-12-22 2020-02-04 10X Genomics, Inc. Methods and systems for processing polynucleotides
US10011872B1 (en) 2016-12-22 2018-07-03 10X Genomics, Inc. Methods and systems for processing polynucleotides
US10815525B2 (en) 2016-12-22 2020-10-27 10X Genomics, Inc. Methods and systems for processing polynucleotides
EP4029939B1 (en) 2017-01-30 2023-06-28 10X Genomics, Inc. Methods and systems for droplet-based single cell barcoding
US10995333B2 (en) 2017-02-06 2021-05-04 10X Genomics, Inc. Systems and methods for nucleic acid preparation
US10400235B2 (en) 2017-05-26 2019-09-03 10X Genomics, Inc. Single cell analysis of transposase accessible chromatin
CN116064732A (en) 2017-05-26 2023-05-05 10X基因组学有限公司 Single cell analysis of transposase accessibility chromatin
US10837047B2 (en) 2017-10-04 2020-11-17 10X Genomics, Inc. Compositions, methods, and systems for bead formation using improved polymers
WO2019084043A1 (en) 2017-10-26 2019-05-02 10X Genomics, Inc. Methods and systems for nuclecic acid preparation and chromatin analysis
EP3700672B1 (en) 2017-10-27 2022-12-28 10X Genomics, Inc. Methods for sample preparation and analysis
DK3707723T3 (en) * 2017-11-06 2023-12-18 Illumina Inc TECHNIQUES FOR INDEXING NUCLEIC ACIDS
SG11201913654QA (en) 2017-11-15 2020-01-30 10X Genomics Inc Functionalized gel beads
US10829815B2 (en) 2017-11-17 2020-11-10 10X Genomics, Inc. Methods and systems for associating physical and genetic properties of biological particles
WO2019108851A1 (en) 2017-11-30 2019-06-06 10X Genomics, Inc. Systems and methods for nucleic acid preparation and analysis
WO2019157529A1 (en) 2018-02-12 2019-08-15 10X Genomics, Inc. Methods characterizing multiple analytes from individual cells or cell populations
US11639928B2 (en) 2018-02-22 2023-05-02 10X Genomics, Inc. Methods and systems for characterizing analytes from individual cells or cell populations
SG11202009889VA (en) 2018-04-06 2020-11-27 10X Genomics Inc Systems and methods for quality control in single cell processing
US11932899B2 (en) 2018-06-07 2024-03-19 10X Genomics, Inc. Methods and systems for characterizing nucleic acid molecules
US11703427B2 (en) 2018-06-25 2023-07-18 10X Genomics, Inc. Methods and systems for cell and bead processing
US20200032335A1 (en) 2018-07-27 2020-01-30 10X Genomics, Inc. Systems and methods for metabolome analysis
CN109295050A (en) * 2018-09-26 2019-02-01 刘强 Both-end label specific linkers, kit and the banking process in the library Blood Trace cfDNA
CN109295049A (en) * 2018-09-26 2019-02-01 刘强 Label specific linkers, primer sets and the banking process in the library Blood Trace cfDNA
US11459607B1 (en) 2018-12-10 2022-10-04 10X Genomics, Inc. Systems and methods for processing-nucleic acid molecules from a single cell using sequential co-partitioning and composite barcodes
EP3894593A2 (en) 2018-12-13 2021-10-20 DNA Script Direct oligonucleotide synthesis on cells and biomolecules
US11845983B1 (en) 2019-01-09 2023-12-19 10X Genomics, Inc. Methods and systems for multiplexing of droplet based assays
US11851683B1 (en) 2019-02-12 2023-12-26 10X Genomics, Inc. Methods and systems for selective analysis of cellular samples
US11467153B2 (en) 2019-02-12 2022-10-11 10X Genomics, Inc. Methods for processing nucleic acid molecules
EP3924505A1 (en) 2019-02-12 2021-12-22 10X Genomics, Inc. Methods for processing nucleic acid molecules
US11655499B1 (en) 2019-02-25 2023-05-23 10X Genomics, Inc. Detection of sequence elements in nucleic acid molecules
SG11202111242PA (en) 2019-03-11 2021-11-29 10X Genomics Inc Systems and methods for processing optically tagged beads
US20220228168A1 (en) * 2019-04-29 2022-07-21 The Broad Institute, Inc. Affinity-based multiplexing for live-cell monitoring of complex cell populations
CN110643692A (en) * 2019-07-08 2020-01-03 中山大学中山眼科中心 Analysis method and kit for sequencing single cell transcript isomer
CN111187812A (en) * 2020-01-19 2020-05-22 青岛普泽麦迪生物技术有限公司 Direct sequencing method using low total RNA
US11851700B1 (en) 2020-05-13 2023-12-26 10X Genomics, Inc. Methods, kits, and compositions for processing extracellular molecules
RU2752663C1 (en) * 2020-05-18 2021-07-29 ОБЩЕСТВО С ОГРАНИЧЕННОЙ ОТВЕТСТВЕННОСТЬЮ "СберМедИИ" Method for quantifying the statistical analysis of alternative splicing in rna-sec data
US10941453B1 (en) * 2020-05-20 2021-03-09 Paragon Genomics, Inc. High throughput detection of pathogen RNA in clinical specimens
WO2022182682A1 (en) 2021-02-23 2022-09-01 10X Genomics, Inc. Probe-based analysis of nucleic acids and proteins
CN113322314B (en) * 2021-06-04 2022-11-29 上海交通大学 Novel tissue unicell space transcriptome technology
CN113604540B (en) * 2021-07-23 2022-08-16 杭州圣庭医疗科技有限公司 Method for rapidly constructing RRBS sequencing library by using blood circulation tumor DNA
US11680293B1 (en) 2022-04-21 2023-06-20 Paragon Genomics, Inc. Methods and compositions for amplifying DNA and generating DNA sequencing results from target-enriched DNA molecules

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120010091A1 (en) * 2009-03-30 2012-01-12 Illumina, Inc. Gene expression analysis in single cells

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ISLAM, S. ET AL.: "Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq", GENOME RESEARCH, vol. 21, 2011, pages 1160 - 1167, XP002682367, doi:10.1101/GR.110882.110 *
KAPTEYN, J. ET AL.: "Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples", BMC GENOMICS, vol. 11, 2010, XP021072710, doi:10.1186/1471-2164-11-413 *
SOUMILLON, M. ET AL.: "Characterization of directed differentiation by high-throughput single-cell RNA-Seq", BIORXIV, GENOMICS, 2014, pages 3236/1 - 3236/14, Retrieved from the Internet <URL:http://bioixiv.org/content/early/2014/03/05/003236> [retrieved on 20140827] *

Cited By (130)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10718014B2 (en) 2004-05-28 2020-07-21 Takara Bio Usa, Inc. Thermo-controllable high-density chips for multiplex analyses
US9845502B2 (en) 2009-12-15 2017-12-19 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
US10202646B2 (en) 2009-12-15 2019-02-12 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels
US10392661B2 (en) 2009-12-15 2019-08-27 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels
US10059991B2 (en) 2009-12-15 2018-08-28 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
US10047394B2 (en) 2009-12-15 2018-08-14 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
US10619203B2 (en) 2009-12-15 2020-04-14 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels
US9708659B2 (en) 2009-12-15 2017-07-18 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
US9816137B2 (en) 2009-12-15 2017-11-14 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
US11634708B2 (en) 2012-02-27 2023-04-25 Becton, Dickinson And Company Compositions and kits for molecular counting
US10941396B2 (en) 2012-02-27 2021-03-09 Becton, Dickinson And Company Compositions and kits for molecular counting
US11859171B2 (en) 2013-04-17 2024-01-02 Agency For Science, Technology And Research Method for generating extended sequence reads
US10927419B2 (en) 2013-08-28 2021-02-23 Becton, Dickinson And Company Massively parallel single cell analysis
US10131958B1 (en) 2013-08-28 2018-11-20 Cellular Research, Inc. Massively parallel single cell analysis
US11702706B2 (en) 2013-08-28 2023-07-18 Becton, Dickinson And Company Massively parallel single cell analysis
US10208356B1 (en) 2013-08-28 2019-02-19 Becton, Dickinson And Company Massively parallel single cell analysis
US10151003B2 (en) 2013-08-28 2018-12-11 Cellular Research, Inc. Massively Parallel single cell analysis
US11618929B2 (en) 2013-08-28 2023-04-04 Becton, Dickinson And Company Massively parallel single cell analysis
US9637799B2 (en) 2013-08-28 2017-05-02 Cellular Research, Inc. Massively parallel single cell analysis
US9598736B2 (en) 2013-08-28 2017-03-21 Cellular Research, Inc. Massively parallel single cell analysis
US9567645B2 (en) 2013-08-28 2017-02-14 Cellular Research, Inc. Massively parallel single cell analysis
US10253375B1 (en) 2013-08-28 2019-04-09 Becton, Dickinson And Company Massively parallel single cell analysis
US10954570B2 (en) 2013-08-28 2021-03-23 Becton, Dickinson And Company Massively parallel single cell analysis
US9567646B2 (en) 2013-08-28 2017-02-14 Cellular Research, Inc. Massively parallel single cell analysis
US9905005B2 (en) 2013-10-07 2018-02-27 Cellular Research, Inc. Methods and systems for digitally counting features on arrays
US10370630B2 (en) 2014-02-10 2019-08-06 Technion Research & Development Foundation Limited Method and apparatus for cell isolation, growth, replication, manipulation, and analysis
US10590483B2 (en) 2014-09-15 2020-03-17 Abvitro Llc High-throughput nucleotide library sequencing
EP3194593B1 (en) * 2014-09-15 2019-02-06 AbVitro LLC High-throughput nucleotide library sequencing
US10400273B2 (en) 2015-02-05 2019-09-03 Technion Research & Development Foundation Limited System and method for single cell genetic analysis
WO2016125106A1 (en) 2015-02-05 2016-08-11 Technion Research & Development Foundation Limited System and method for single cell genetic analysis
EP3766988A1 (en) * 2015-02-19 2021-01-20 Becton, Dickinson and Company High-throughput single-cell analysis combining proteomic and genomic information
WO2016134078A1 (en) * 2015-02-19 2016-08-25 Becton, Dickinson And Company High-throughput single-cell analysis combining proteomic and genomic information
US11098358B2 (en) 2015-02-19 2021-08-24 Becton, Dickinson And Company High-throughput single-cell analysis combining proteomic and genomic information
US10697010B2 (en) 2015-02-19 2020-06-30 Becton, Dickinson And Company High-throughput single-cell analysis combining proteomic and genomic information
US10641772B2 (en) 2015-02-20 2020-05-05 Takara Bio Usa, Inc. Method for rapid accurate dispensing, visualization and analysis of single cells
EP3822361A1 (en) 2015-02-20 2021-05-19 Takara Bio USA, Inc. Method for rapid accurate dispensing, visualization and analysis of single cells
US10002316B2 (en) 2015-02-27 2018-06-19 Cellular Research, Inc. Spatially addressable molecular barcoding
US10190163B2 (en) 2015-02-27 2019-01-29 Fluidigm Corporation Single cell nucleic acids for high-throughput studies
US10954560B2 (en) 2015-02-27 2021-03-23 Fluidigm Corporation Single-cell nucleic acids for high-throughput studies
US9727810B2 (en) 2015-02-27 2017-08-08 Cellular Research, Inc. Spatially addressable molecular barcoding
USRE48913E1 (en) 2015-02-27 2022-02-01 Becton, Dickinson And Company Spatially addressable molecular barcoding
EP3262214A4 (en) * 2015-02-27 2018-07-25 Fluidigm Corporation Single-cell nucleic acids for high-throughput studies
US11535882B2 (en) 2015-03-30 2022-12-27 Becton, Dickinson And Company Methods and compositions for combinatorial barcoding
CN107580632B (en) * 2015-04-23 2021-12-28 贝克顿迪金森公司 Methods and compositions for whole transcriptome amplification
US11390914B2 (en) 2015-04-23 2022-07-19 Becton, Dickinson And Company Methods and compositions for whole transcriptome amplification
CN107580632A (en) * 2015-04-23 2018-01-12 赛卢拉研究公司 Method and composition for the amplification of full transcript profile
WO2016172373A1 (en) * 2015-04-23 2016-10-27 Cellular Research, Inc. Methods and compositions for whole transcriptome amplification
WO2016191533A1 (en) * 2015-05-26 2016-12-01 The Trustees Of Columbia University In The City Of New York Rna printing and sequencing devices, methods, and systems
US11124823B2 (en) 2015-06-01 2021-09-21 Becton, Dickinson And Company Methods for RNA quantification
JP2018526026A (en) * 2015-08-28 2018-09-13 イルミナ インコーポレイテッド Single-cell nucleic acid sequence analysis
JP2020189846A (en) * 2015-08-28 2020-11-26 イルミナ インコーポレイテッド Single cell nucleic acid sequence analysis
JP7351950B2 (en) 2015-08-28 2023-09-27 イルミナ インコーポレイテッド Single cell nucleic acid sequence analysis
JP7035128B2 (en) 2015-08-28 2022-03-14 イルミナ インコーポレイテッド Single cell nucleic acid sequence analysis
JP2022066349A (en) * 2015-08-28 2022-04-28 イルミナ インコーポレイテッド Single cell nucleic acid sequence analysis
US10619186B2 (en) 2015-09-11 2020-04-14 Cellular Research, Inc. Methods and compositions for library normalization
US11332776B2 (en) 2015-09-11 2022-05-17 Becton, Dickinson And Company Methods and compositions for library normalization
US11098304B2 (en) 2015-11-04 2021-08-24 Atreca, Inc. Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells
EP3371309B1 (en) * 2015-11-04 2023-07-05 Atreca, Inc. Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells
CN108473984A (en) * 2015-11-04 2018-08-31 阿特雷卡公司 The group of nucleic acid bar code for analyzing nucleic acid associated with individual cells is combined
WO2017079593A1 (en) * 2015-11-04 2017-05-11 Atreca, Inc. Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells
US10822643B2 (en) 2016-05-02 2020-11-03 Cellular Research, Inc. Accurate molecular barcoding
US10301677B2 (en) 2016-05-25 2019-05-28 Cellular Research, Inc. Normalization of nucleic acid libraries
US11845986B2 (en) 2016-05-25 2023-12-19 Becton, Dickinson And Company Normalization of nucleic acid libraries
US11397882B2 (en) 2016-05-26 2022-07-26 Becton, Dickinson And Company Molecular label counting adjustment methods
US10640763B2 (en) 2016-05-31 2020-05-05 Cellular Research, Inc. Molecular indexing of internal sequences
US10202641B2 (en) 2016-05-31 2019-02-12 Cellular Research, Inc. Error correction in amplification of samples
US11525157B2 (en) 2016-05-31 2022-12-13 Becton, Dickinson And Company Error correction in amplification of samples
US11220685B2 (en) 2016-05-31 2022-01-11 Becton, Dickinson And Company Molecular indexing of internal sequences
US20180002749A1 (en) * 2016-06-30 2018-01-04 Grail, Inc. Differential tagging of rna for preparation of a cell-free dna/rna sequencing library
US10144962B2 (en) * 2016-06-30 2018-12-04 Grail, Inc. Differential tagging of RNA for preparation of a cell-free DNA/RNA sequencing library
US11180801B2 (en) 2016-06-30 2021-11-23 Grail, Llc Differential tagging of RNA for preparation of a cell-free DNA/RNA sequencing library
US11460405B2 (en) 2016-07-21 2022-10-04 Takara Bio Usa, Inc. Multi-Z imaging and dispensing with multi-well devices
WO2018023068A1 (en) 2016-07-29 2018-02-01 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template- switching
EP3940074A1 (en) 2016-07-29 2022-01-19 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template- switching
US10246706B2 (en) 2016-08-03 2019-04-02 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template-switching
US10240148B2 (en) 2016-08-03 2019-03-26 New England Biolabs, Inc. Methods and compositions for preventing concatemerization during template-switching
US10676736B2 (en) 2016-08-05 2020-06-09 Bio-Rad Laboratories, Inc. Second strand direct
US10876112B2 (en) 2016-08-05 2020-12-29 Bio-Rad Laboratories, Inc. Second strand direct
CN113151423A (en) * 2016-08-05 2021-07-23 生物辐射实验室股份有限公司 Second chain guide
US11725206B2 (en) 2016-08-05 2023-08-15 Bio-Rad Laboratories, Inc. Second strand direct
EP3494214A4 (en) * 2016-08-05 2020-03-04 Bio-Rad Laboratories, Inc. Second strand direct
US11782059B2 (en) 2016-09-26 2023-10-10 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11467157B2 (en) 2016-09-26 2022-10-11 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11460468B2 (en) 2016-09-26 2022-10-04 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US10338066B2 (en) 2016-09-26 2019-07-02 Cellular Research, Inc. Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11608497B2 (en) 2016-11-08 2023-03-21 Becton, Dickinson And Company Methods for cell label classification
US11164659B2 (en) 2016-11-08 2021-11-02 Becton, Dickinson And Company Methods for expression profile classification
US10722880B2 (en) 2017-01-13 2020-07-28 Cellular Research, Inc. Hydrophilic coating of fluidic channels
US11319583B2 (en) 2017-02-01 2022-05-03 Becton, Dickinson And Company Selective amplification using blocking oligonucleotides
WO2018204423A1 (en) * 2017-05-01 2018-11-08 Illumina, Inc. Optimal index sequences for multiplex massively parallel sequencing
US11788139B2 (en) 2017-05-01 2023-10-17 Illumina, Inc. Optimal index sequences for multiplex massively parallel sequencing
CN110799653A (en) * 2017-05-01 2020-02-14 伊鲁米那股份有限公司 Optimal index sequences for multiple massively parallel sequencing
US11028435B2 (en) 2017-05-01 2021-06-08 Illumina, Inc. Optimal index sequences for multiplex massively parallel sequencing
US11814678B2 (en) 2017-05-08 2023-11-14 Illumina, Inc. Universal short adapters for indexing of polynucleotide samples
US11028436B2 (en) 2017-05-08 2021-06-08 Illumina, Inc. Universal short adapters for indexing of polynucleotide samples
WO2018208699A1 (en) * 2017-05-08 2018-11-15 Illumina, Inc. Universal short adapters for indexing of polynucleotide samples
CN111406114A (en) * 2017-05-29 2020-07-10 哈佛学院董事及会员团体 Method for amplifying single cell transcriptome
EP3631004A4 (en) * 2017-05-29 2021-03-03 President and Fellows of Harvard College A method of amplifying single cell transcriptome
JP2020521486A (en) * 2017-05-29 2020-07-27 プレジデント アンド フェローズ オブ ハーバード カレッジ Single cell transcriptome amplification method
US10676779B2 (en) 2017-06-05 2020-06-09 Becton, Dickinson And Company Sample indexing for single cells
WO2018226293A1 (en) * 2017-06-05 2018-12-13 Becton, Dickinson And Company Sample indexing for single cells
US10669570B2 (en) 2017-06-05 2020-06-02 Becton, Dickinson And Company Sample indexing for single cells
GB2581599B (en) * 2017-08-10 2023-08-30 Element Biosciences Inc Tagging nucleic acid molecules from single cells for phased sequencing
US11788120B2 (en) 2017-11-27 2023-10-17 The Trustees Of Columbia University In The City Of New York RNA printing and sequencing devices, methods, and systems
US11946095B2 (en) 2017-12-19 2024-04-02 Becton, Dickinson And Company Particles associated with oligonucleotides
US11718872B2 (en) 2017-12-28 2023-08-08 Mgi Tech Co., Ltd. Method for obtaining single-cell mRNA sequence
EP3733865A4 (en) * 2017-12-28 2021-09-08 MGI Tech Co., Ltd. Method for obtaining single-cell mrna sequence
US20210087549A1 (en) * 2018-02-23 2021-03-25 Yale University Single-cell freeze-thaw lysis
WO2019165181A1 (en) * 2018-02-23 2019-08-29 Yale University Single-cell freeze-thaw lysis
EP3755799A4 (en) * 2018-02-23 2021-12-08 Yale University Single-cell freeze-thaw lysis
WO2019191122A1 (en) * 2018-03-26 2019-10-03 Qiagen Sciences, Llc Integrative dna and rna library preparations and uses thereof
US11365409B2 (en) 2018-05-03 2022-06-21 Becton, Dickinson And Company Molecular barcoding on opposite transcript ends
US11773441B2 (en) 2018-05-03 2023-10-03 Becton, Dickinson And Company High throughput multiomics sample analysis
WO2020046833A1 (en) * 2018-08-28 2020-03-05 Cellular Research, Inc. Sample multiplexing using carbohydrate-binding and membrane-permeable reagents
US11639517B2 (en) 2018-10-01 2023-05-02 Becton, Dickinson And Company Determining 5′ transcript sequences
US11932849B2 (en) 2018-11-08 2024-03-19 Becton, Dickinson And Company Whole transcriptome analysis of single cells using random priming
US11492660B2 (en) 2018-12-13 2022-11-08 Becton, Dickinson And Company Selective extension in single cell whole transcriptome analysis
US11371076B2 (en) 2019-01-16 2022-06-28 Becton, Dickinson And Company Polymerase chain reaction normalization through primer titration
US11661631B2 (en) 2019-01-23 2023-05-30 Becton, Dickinson And Company Oligonucleotides associated with antibodies
US11965208B2 (en) 2019-04-19 2024-04-23 Becton, Dickinson And Company Methods of associating phenotypical data and single cell sequencing data
US11939622B2 (en) 2019-07-22 2024-03-26 Becton, Dickinson And Company Single cell chromatin immunoprecipitation sequencing assay
US11970737B2 (en) 2019-08-26 2024-04-30 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels
US11773436B2 (en) 2019-11-08 2023-10-03 Becton, Dickinson And Company Using random priming to obtain full-length V(D)J information for immune repertoire sequencing
US11649497B2 (en) 2020-01-13 2023-05-16 Becton, Dickinson And Company Methods and compositions for quantitation of proteins and RNA
US11661625B2 (en) 2020-05-14 2023-05-30 Becton, Dickinson And Company Primers for immune repertoire profiling
US11932901B2 (en) 2020-07-13 2024-03-19 Becton, Dickinson And Company Target enrichment using nucleic acid probes for scRNAseq
WO2022084742A1 (en) * 2020-10-19 2022-04-28 The Hong Kong University Of Science And Technology Simultaneous amplification of dna and rna from single cells
US11739443B2 (en) 2020-11-20 2023-08-29 Becton, Dickinson And Company Profiling of highly expressed and lowly expressed proteins
WO2022133734A1 (en) * 2020-12-22 2022-06-30 Singleron (Nanjing) Biotechnologies, Ltd. Methods and reagents for high-throughput transcriptome sequencing for drug screening
WO2024081622A1 (en) * 2022-10-11 2024-04-18 The Board Of Trustees Of The Leland Stanford Junior University Improvement to cdna library priming

Also Published As

Publication number Publication date
US20160122753A1 (en) 2016-05-05

Similar Documents

Publication Publication Date Title
US20160122753A1 (en) High-throughput rna-seq
Enderle et al. Characterization of RNA from exosomes and other extracellular vesicles isolated by a novel spin column-based method
EP3289097B2 (en) Error suppression in sequenced dna fragments using redundant reads with unique molecular indices (umis)
EP3529374B1 (en) Sequencing and analysis of exosome associated nucleic acids
US9133513B2 (en) High throughput methylation detection method
CN102329876B (en) Method for measuring nucleotide sequence of disease associated nucleic acid molecules in sample to be detected
Coudry et al. Successful application of microarray technology to microdissected formalin-fixed, paraffin-embedded tissue
EP3470530A1 (en) Transposition into native chromatin for personal epigenomics
US20100120097A1 (en) Methods and compositions for nucleic acid sequencing
US20100035249A1 (en) Rna sequencing and analysis using solid support
RU2753883C2 (en) Set of probes for analyzing dna samples and methods for their use
TW201321518A (en) Method of micro-scale nucleic acid library construction and application thereof
JP2009072062A (en) Method for isolating 5'-terminals of nucleic acid and its application
KR20120037992A (en) Nucleic acid analysis
US20140272993A1 (en) Method of sequencing a full microrna profile from cerebrospinal fluid
JP7248228B2 (en) Methods and kits for construction of RNA libraries
US20060063181A1 (en) Method for identification and quantification of short or small RNA molecules
CN115997032A (en) Method for detecting whole transcriptome in single cell
US20220002797A1 (en) Full-length rna sequencing
KR101767644B1 (en) Composition and method for prediction of pigs litter size using gene expression profile
WO2022067494A1 (en) Method for detection of whole transcriptome in single cells
Bhattacharya et al. Experimental toolkit to study RNA level regulation
US20230002755A1 (en) Method for producing non-ribosomal rna-containing sample
US20230193356A1 (en) Single cell combinatorial indexing from amplified nucleic acids
EP3725880A1 (en) Method and kit for the purification of functional risc-associated small rnas

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14810232; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 14810232; Country of ref document: EP; Kind code of ref document: A1)