WO2014201273A1 - High-throughput rna-seq - Google Patents
- Publication number
- WO2014201273A1 (PCT/US2014/042159)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- sequence
- cells
- cdna
- nucleotides
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1065—Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
- C12Q1/6874—Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
Definitions
- the present invention relates generally to methods for single-cell nucleic acid profiling, and nucleic acids useful in those methods.
- it concerns using barcode sequences to track individual nucleic acids at single-cell resolution, utilizing template switching and sequencing reactions to generate the nucleic acid profiles.
- the methods and compositions provided herein are also applicable to other starting materials, such as cell and tissue lysates or extracted/purified RNA.
Background of the Invention
- transcriptome profiling is an important method for functional characterization of cells and tissues
- current technical limitations for whole transcriptome analysis limit the technique to either population averages or to a limited number of single cells.
- These shortcomings limit the ability of transcriptome profiling to accurately assess stochastic variation in gene expression between individual cells and to analyze distinct subpopulations of cells, both of which have been proposed to be important factors driving cellular differentiation and tissue homeostasis.
- current single-cell transcriptome profiling methods, in addition to being limited to a relatively low number of cells, are also expensive and labor-intensive. Improved methods are therefore required to fully characterize a cell population at single-cell resolution. Such improved methods also have utility in improving analysis of other starting materials, such as cell and tissue lysates or extracted/purified RNA.
- the invention provides a nucleic acid comprising a 5' poly-isonucleotide sequence (for example, comprising an isocytosine, an isoguanosine, or both, such as an isocytosine-isoguanosine-isocytosine sequence), an internal adapter sequence, and a 3' guanosine tract.
- the 3' guanosine tract can comprise two guanosines, three guanosines, four guanosines, five guanosines, six guanosines, seven guanosines, or eight guanosines.
- the 3' guanosine tract comprises three guanosines.
- the adapter sequence can be 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., an adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)).
- the invention provides a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine.
- the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length (e.g., an internal adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)).
- the barcode sequence is 4 to 20 nucleotides in length, for example, 6 nucleotides in length. In certain embodiments, the UMI sequence is six to 20 nucleotides in length, for example, ten nucleotides in length. In some
- the complementarity sequence is a poly(T) sequence, and may be 20 to 40 nucleotides in length, for example, 30 nucleotides in length.
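The element order described above (adapter, barcode, UMI, poly(T) complementarity sequence, 3' dinucleotide) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_cdna_primer` and the example barcode are hypothetical, the adapter is the sequence printed in the text as SEQ ID NO: 1, the lengths are the example values given (6, 10, and 30 nucleotides), and the 5' blocking group is a chemical modification that a plain string cannot represent.

```python
import random

# Adapter sequence as printed in the text (SEQ ID NO: 1).
ADAPTER = "ACACTCTTTCCCTACACGACGC"


def build_cdna_primer(barcode, rng, adapter=ADAPTER, umi_len=10, poly_t_len=30):
    """Return adapter + barcode + random UMI + poly(T) + 3' dinucleotide.

    Hypothetical helper: the 5' blocking group (e.g., biotin) is not modeled.
    """
    if not 4 <= len(barcode) <= 20:
        raise ValueError("barcode should be 4 to 20 nucleotides long")
    # The UMI is a random stretch of bases; with 10 positions there are
    # 4**10 = 1,048,576 possible UMIs.
    umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
    first = rng.choice("AGC")    # first anchor base: adenine, guanine, or cytosine
    second = rng.choice("AGCT")  # second anchor base: any of the four bases
    return adapter + barcode + umi + "T" * poly_t_len + first + second


rng = random.Random(0)  # seeded only so the sketch is reproducible
primer = build_cdna_primer("ACGTAC", rng)  # "ACGTAC" is a made-up well barcode
```

Excluding thymine from the first anchor position keeps the primer from sliding along the poly(A) tail, which is why the two positions draw from different base sets.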
- the invention provides a kit comprising one or more nucleic acids as described above, for example a) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, b) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine,
- the kit comprises a plurality of the nucleic acids of b).
- the UMI sequence of each nucleic acid in the plurality of nucleic acids is unique among the nucleic acids in the kit, and in still further embodiments, the plurality of nucleic acids comprises different populations of nucleic acid species.
- each population of nucleic acid species may comprise a different barcode sequence that uniquely identifies a single population of nucleic acid species.
- each population of nucleic acid species is in a separate container, and the barcode of each population of nucleic acid species differs by at least two nucleotides from the barcode of each other population of nucleic acid species.
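The "differs by at least two nucleotides" requirement is a minimum Hamming distance constraint on the barcode set: with a distance of two or more, a single sequencing error cannot turn one valid barcode into another. A minimal sketch of a check follows; the function names and the example barcodes are illustrative assumptions, not part of the described kit.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def too_close(barcodes, min_distance=2):
    """Return every pair of barcodes closer than min_distance."""
    return [
        (a, b)
        for a, b in combinations(barcodes, 2)
        if hamming(a, b) < min_distance
    ]


# Made-up 6-nt barcodes: the first set passes, the second does not.
ok_set = ["AAAAAA", "AAAACC", "CCAAAA"]
bad_set = ["AAAAAA", "AAAAAC"]
```

An empty list from `too_close` means the set satisfies the constraint.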
- a kit of the invention may further comprise a third nucleic acid primer comprising 12 to 32 nucleotides (e.g., 22 nucleotides in length) and a 5' blocking group (e.g., biotin or an inverted nucleotide).
- An exemplary sequence of such a primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2).
- a kit may further comprise a nucleic acid comprising a barcode sequence, and optionally also comprise a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
- the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length.
- An exemplary sequence of a phosphorothioate bond-containing nucleic acid is
- the kit further comprises a capture plate and/or a reverse transcriptase enzyme, such as a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase (e.g., SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase)
- DNA purification column such as a DNA purification spin column
- protease or proteinase e.g., proteinase K
- the invention provides a method for gene profiling, comprising a) providing a plurality of single cells; b) releasing mRNA from each single cell to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell; c) reverse transcribing the individual mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA produced from the separate cells; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments.
- the invention provides a method for gene profiling, comprising a) providing an isolated population of cells; b) releasing mRNA from the population of cells to provide one or more mRNA samples; c) reverse transcribing the one or more mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments.
- the method further comprises separating a population of cells (e.g., by flow cytometry) to provide the plurality of single cells, for example, by separating them into a capture plate.
- a population of cells can be sorted into a capture plate such that each well of the capture plate contains a smaller population of cells.
- cell lysate or RNA samples can be divided into a capture plate.
- the mRNA is released by cell lysis, for example, by freeze-thawing and/or contacting the cells with proteinase K.
- c) comprises contacting each individual mRNA sample with one or more nucleic acids as described above, for example i) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, ii) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from aden
- c) is carried out with a reverse transcriptase enzyme, for example, a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase such as SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase.
- the cDNA purification of d) is carried out with a Zymo-Spin™ column.
- the method further comprises treating the barcoded cDNA with an exonuclease, such as with Exonuclease I.
- the amplification of e) utilizes an amplification primer comprising a 5' blocking group, such as biotin or an inverted nucleotide.
- amplification primers are 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., as in the amplification primer having the sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2)).
- the purification of f) may be carried out with magnetic beads, e.g., Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880), and/or may further comprise quantifying the purified cDNA.
- the single cells are provided in a capture plate of individual wells (e.g., a 384 well plate), each well comprising a single cell.
- a population of cells is provided in a capture plate, each well comprising a population of cells.
- cell lysate or RNA samples can be provided in a capture plate.
- a particular sample such as a sample in a well of a plate
- that sample may be a single cell or some other sample, such as a lysate or bulk RNA.
- reference to a "well" or "sample" should be understood to refer to any of those types of samples.
- reference to "cell/well" or "well/cell" is similarly used to reflect that a sample may be a single cell or some other sample.
- the fragmentation of g) utilizes a transposase, and may further utilize a first fragmentation nucleic acid and a second fragmentation nucleic acid, wherein the first fragmentation nucleic acid comprises a barcode sequence.
- An exemplary first fragmentation nucleic acid is 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3' (SEQ ID NO: 4), wherein [i7] represents a barcode sequence.
- [i7] sequence is four to 16 nucleotides in length, for example, eight nucleotides in length.
- the [i7] sequence uniquely identifies a single population of nucleic acid species, for example, a population of nucleic acid species derived from a population of single cells from a capture plate.
- the [i7] sequence is selected from: TCGCCTTA (SEQ ID NO: 5), CTAGTACG (SEQ ID NO: 6), TTCTGCCT (SEQ ID NO: 7), GCTCAGGA (SEQ ID NO: 8), AGGAGTCC (SEQ ID NO: 9), CATGCCTA (SEQ ID NO: 10), GTAGAGAG (SEQ ID NO: 11), CCTCTCTG (SEQ ID NO: 12), AGCGTAGC (SEQ ID NO: 13), CAGCCTCG (SEQ ID NO: 14), TGCCTCTT (SEQ ID NO: 15), and TCCTCTAC (SEQ ID NO: 16).
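To make the [i7] placeholder concrete, the sketch below substitutes each listed index into the exemplary first fragmentation nucleic acid. It assumes the flanking sequences are those of SEQ ID NO: 4 with the extraction spaces removed; the names `I7_INDEXES` and `fragmentation_oligo` are illustrative only.

```python
# Flanking sequences of the exemplary first fragmentation nucleic acid
# (SEQ ID NO: 4), read with the stray spaces removed.
PREFIX = "CAAGCAGAAGACGGCATACGAGAT"
SUFFIX = "GTCTCGTGGGCTCGG"

# The twelve [i7] sequences listed in the text, keyed by SEQ ID NO.
I7_INDEXES = {
    5: "TCGCCTTA", 6: "CTAGTACG", 7: "TTCTGCCT", 8: "GCTCAGGA",
    9: "AGGAGTCC", 10: "CATGCCTA", 11: "GTAGAGAG", 12: "CCTCTCTG",
    13: "AGCGTAGC", 14: "CAGCCTCG", 15: "TGCCTCTT", 16: "TCCTCTAC",
}


def fragmentation_oligo(i7):
    """Insert an [i7] barcode (4 to 16 nt) between the fixed flanks."""
    if not 4 <= len(i7) <= 16:
        raise ValueError("[i7] should be four to 16 nucleotides long")
    if set(i7) - set("ACGT"):
        raise ValueError("[i7] should contain only A, C, G, and T")
    return PREFIX + i7 + SUFFIX


# One full-length oligo per plate index.
oligos = {seq_id: fragmentation_oligo(i7) for seq_id, i7 in I7_INDEXES.items()}
```

Each plate receives a different entry of `oligos`, so the plate of origin can be read back from the index after sequencing.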
- the barcode sequence of the first fragmentation nucleic acid is different than the barcode sequence of the nucleic acid described in ii) above.
- the barcode sequence of the first fragmentation nucleic acid uniquely identifies a predetermined subset of cells, for example, a subset of cells contained in individual wells of a single capture plate. In further embodiments, the barcode sequence that uniquely identifies the predetermined subset of cells uniquely identifies the capture plate. In certain embodiments, the barcode sequence of the nucleic acid as described in ii) above uniquely identifies the cell within the predetermined subset of cells, which cell comprised the mRNA from which the barcoded cDNA of c) was produced. In further embodiments, the barcode sequence that uniquely identifies the cell within the predetermined subset of cells uniquely identifies an individual well in a capture plate, and in still further embodiments, the
- the barcode sequence of the first fragmentation nucleic acid is 4 to 20 nucleotides in length, for example, 6 nucleotides in length.
- the second fragmentation nucleic acid is a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
- An exemplary second fragmentation nucleic acid is 48 to 68 nucleotides in length, e.g., 58 nucleotides in length, such as a second fragmentation nucleic acid with a sequence of 5'-
- the purification of h) is carried out with magnetic beads, and may optionally further comprise separating the magnetic-bead purified cDNA on an agarose gel, excising cDNA corresponding to 300 to 800 nucleotides in length, and purifying the excised cDNA.
- h) further comprises quantifying the purified cDNA.
- the sequencing of i) is carried out using RNA-seq.
- the method further comprises assembling a database of the sequences of the sequenced cDNA fragments of j), and may additionally comprise identifying the UMI sequences of the sequences of the database.
- j) further comprises discounting duplicate sequences that share a UMI sequence, thereby assembling a set of sequences in which each sequence is associated with a unique UMI.
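The duplicate-discounting step can be sketched as a single pass that keeps one sequence per UMI. This is a minimal illustration, assuming reads have already been reduced to (UMI, sequence) pairs for one cell; the function name and the example reads are hypothetical.

```python
def collapse_by_umi(reads):
    """Keep the first sequence seen for each UMI and drop later duplicates.

    `reads` is an iterable of (umi, sequence) pairs from a single cell/well.
    """
    unique = {}
    for umi, sequence in reads:
        # setdefault stores the sequence only if this UMI has not been seen.
        unique.setdefault(umi, sequence)
    return unique


# Made-up reads: the UMI "AACCGGTTAA" was amplified three times.
example_reads = [
    ("AACCGGTTAA", "GATTACA"),
    ("AACCGGTTAA", "GATTACA"),
    ("TTGGCCAATT", "GATTACA"),
    ("AACCGGTTAA", "GATTACA"),
]
deduplicated = collapse_by_umi(example_reads)
```

After collapsing, the number of entries reflects distinct tagged molecules rather than the number of amplified copies that happened to be sequenced.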
- a) through h) are repeated before i) to produce a plurality of populations of cDNA fragments, and in particular embodiments, the populations of cDNA fragments are combined prior to i).
- the barcode sequence of the first fragmentation nucleic acid and the barcode sequence of the nucleic acid as described in ii) above are used to correlate the sequencing data with the predetermined subset of cells and the individual cell.
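The two barcode levels act as a composite key: the barcode of the first fragmentation nucleic acid selects the plate, and the barcode from ii) selects the well within it. The lookup below is a sketch under stated assumptions; the plate names, well labels, and 6-nt well barcodes are invented for illustration, and only the two [i7] sequences come from the text.

```python
# Hypothetical sample sheets: which [i7] index was used on which plate,
# and which 6-nt barcode was dispensed into which well.
PLATES = {"TCGCCTTA": "plate_1", "CTAGTACG": "plate_2"}
WELLS = {"ACGTAC": "A1", "TGCATG": "A2"}


def assign_cell(i7, well_barcode, plates=PLATES, wells=WELLS):
    """Map a (plate index, well barcode) pair to a (plate, well) identity.

    Returns None when either barcode is unknown, e.g. after a read error.
    """
    plate = plates.get(i7)
    well = wells.get(well_barcode)
    if plate is None or well is None:
        return None
    return (plate, well)
```

Because the well barcodes can be reused on every plate, the number of distinguishable cells is the number of plate indexes multiplied by the number of well barcodes.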
- Figure 1 depicts incomplete differentiation of human adipose tissue-derived stromal/stem cells (hASCs) in vitro.
- Figure 1A: cells at day 0.
- Figure 1B: cells at day 7 (i.e., on the seventh day after the cells were induced to differentiate).
- Figure 1C: cells at day 14 (i.e., on the fourteenth day after the cells were induced to differentiate).
- Figure 2 depicts a flow chart of an exemplary method for single cell RNA sequencing.
- Figure 3 depicts how a single cell digital gene expression library was constructed, including barcode sequences incorporating sequencing primer sequences, indicated by arrows, and regions that anneal to their complementary oligonucleotides on a flow cell during sequencing (P5 and P7).
- N6: cell/well barcode index
- N10: Unique Molecular Identifier (UMI).
- the sequencing primer with an i7 plate index is indicated by an arrow, and the two sequencing primers (read 1 and read 2) also are indicated by arrows.
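Given the N6 cell/well barcode and N10 UMI in Figure 3, splitting a first sequencing read into its tags is a fixed-offset slice. The sketch assumes the barcode precedes the UMI at the start of read 1, which follows the element order of the cDNA synthesis primer but is not stated explicitly for the read; `Read1Tag` and `parse_read1` are hypothetical names.

```python
from typing import NamedTuple


class Read1Tag(NamedTuple):
    well_barcode: str
    umi: str


def parse_read1(read, barcode_len=6, umi_len=10):
    """Split read 1 into the cell/well barcode and the UMI.

    Assumes the layout [barcode][UMI][poly(T)...]; bases after the UMI
    are ignored.
    """
    if len(read) < barcode_len + umi_len:
        raise ValueError("read is too short to contain barcode and UMI")
    return Read1Tag(
        well_barcode=read[:barcode_len],
        umi=read[barcode_len:barcode_len + umi_len],
    )


# Made-up read: 6-nt barcode, 10-nt UMI, then the start of the poly(T) stretch.
tag = parse_read1("ACGTAC" + "GGGGGCCCCC" + "TTTTTTTT")
```

The transcript identity would come from the second read, which is outside the scope of this sketch.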
- Figure 4 depicts a reduction in PCR bias through the use of Unique Molecular Identifier (UMI) sequences.
- Figure 5 depicts distributions of expression levels of the key marker genes FABP4 (Figure 5A), SCD (Figure 5B), LPL (Figure 5C), and POSTN (Figure 5D) during adipocyte differentiation. Particularly, Figure 5 depicts the expression levels of each gene across the cells/wells over time such that the position on the y axis shows the level of expression and the thickness of the bar shows the number of cells expressing at that level.
- Figure 6 depicts gene detection in single cells. Approximately 3,000 to 4,000 unique genes were detected per cell and approximately 15,000 unique genes were detected across all cells. Gene expression was reliably detected at approximately 25 to 50 transcripts per cell, although bursty transcription
- Figure 7 depicts GAPDH detection at day 0.
- Figure 7A depicts a histogram showing the distribution of GAPDH expression among cells profiled at day 0 as an exemplification of a transcriptional burst.
- Figure 7B depicts genes associated with GAPDH.
- Figure 7C provides a pictorial representation of the cell cycle. GAPDH is considered to be a housekeeping gene and often is used as a reference gene for normalization.
- Figure 8 depicts principal component analysis of an hASC population at day 0.
- Figure 9 depicts principal component analysis of an hASC population at day 0 (black) and day 1 (gray).
- Figure 10 depicts principal component analysis of an hASC population at day 0 (black) and day 2 (gray).
- Figure 11 depicts principal component analysis of an hASC population at day 0 (black) and day 3 (gray).
- Figure 12 depicts principal component analysis of an hASC population at day 0 (black) and day 7 (gray).
- Figure 13 depicts principal component analysis of an hASC population at day 0 (black) and day 14 (gray).
- Figure 14 depicts differentially expressed genes between day 0 (black) and day 14 (gray) hASC populations and between day 14 sub-populations.
- Figure 15 depicts the expression of adipocyte genes correlating with G1 arrest. Genes that had similar expression levels at day 14 and day 0 (Figure 15A, label A) correspond to categories of genes involved in G1 arrest (Figure 15B, label A), indicating that those cells that did not fully differentiate may be stuck in the G0 phase. This reveals a correlation between differentiation state and cell cycle progression when gene expression is analyzed at the single cell level.
- Figure 16 depicts the process of adipocyte differentiation in mouse (3T3-L1) and human (hASC) stem cells, and that an absence of clonal expansion of hASCs may limit adipogenesis.
- Figure 17 depicts cell culture heterogeneity using single-cell sequencing.
- Figure 17A depicts gene expression estimates from bulk cells compared to their corresponding means across single cell profiles.
- UPM: unique molecular identifier (UMI) counts for one gene per million UMI counts for all genes.
- Figures 17C and 17D depict single cell qRT-PCR validation and single molecule FISH validation, respectively, of the observed positive correlation between the LPL and G0S2 markers from separate cells also collected at day 7.
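The UPM unit used in Figure 17 is a per-million scaling of UMI counts: for a gene with c UMIs in a profile containing T UMIs in total, UPM = c × 1,000,000 / T. A minimal sketch follows; the function name and the toy counts are assumptions for illustration only.

```python
def upm(umi_counts):
    """Convert per-gene UMI counts to UMI counts per million (UPM).

    `umi_counts` maps gene name to the number of unique UMIs observed.
    """
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("no UMIs observed; UPM is undefined")
    return {gene: count * 1_000_000 / total for gene, count in umi_counts.items()}


# Toy profile with only two genes, so the arithmetic is easy to follow.
toy_profile = upm({"LPL": 1, "G0S2": 3})
```

Scaling by the total makes profiles comparable between cells or wells that yielded different numbers of UMIs.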
- Figure 18 depicts a comparison of RefSeq gene expression levels as estimated from the total number of raw aligned sequencing reads or the total number of unique UMIs. Each dot compares the mean raw counts across all profiled cells in the first time course (Dl) to the mean UMI counts for the same gene. The raw and UMI counts are strongly correlated, but the UMI counts correct for a systematic bias in the raw expression levels of a subset of genes, which is likely caused by preferential PCR amplification or sequencing.
- Figure 20 depicts a comparison of single-molecule RNA sequencing (Figure 20A) and single molecule FISH (smFISH, Figure 20B) data for LPL and G0S2 during the D3 time course.
- Single-molecule RNA sequencing values are in UPM, while smFISH measurements are in mRNAs detected per cell.
- the smFISH data confirm the positive correlation between LPL and G0S2 after 7 days of differentiation.
- R: Pearson's correlation coefficient.
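For readers who want to reproduce the R statistic on their own per-cell measurements, Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below uses only the standard library; the function name is illustrative and the input values in the tests are invented, not data from the figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

Here `xs` and `ys` would hold, for example, the per-cell values of two genes measured in the same cells.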
- Figure 21 depicts gene expression dynamics at single cell resolution. Each scatter plot depicts the first three principal components (PCs) of the initial hASC time course at the indicated time point (Figure 21A: day 0; Figure 21B: day 1; Figure 21C: day 2; Figure 21D: day 3; Figure 21E: day 5; Figure 21F: day 7; Figure 21G: day 9; Figure 21H: day 14). Black dots show cells collected at the indicated time point, while gray dots show cells collected at all previous time points.
- Figure 21I depicts separately sorted cells with high and low lipid content from day 14 projected into the same PC space.
- Figure 22 depicts distributions of weights for the top four PCs in an initial hASC time course and a lipid-based sorting.
- selected genes and gene sets associated with positive and negative weights are provided. Percentages indicate the ratio of the total variance in the data set captured by each PC. Horizontal lines within each set of boxes indicate medians, boxes indicate the 1st and 3rd quartiles, and whiskers indicate the ranges.
- the present invention provides nucleic acids, kits, and methods for transcriptome-wide profiling at single cell resolution.
- the invention provides Unique Molecular Identifiers (UMIs) (e.g., polynucleotides comprising UMIs) that specifically tag individual cDNA species as they are created from mRNA, thereby acting as a robust guard against amplification biases.
- Each UMI enables a sequenced cDNA to be traced back to a single particular mRNA molecule that was present in a cell.
- the invention provides two levels of barcode-based multiplexing, allowing a sequenced cDNA to be traced to a particular cell from among a subset of cells.
- the invention provides efficient transposon-based fragmentation, resulting in high yield cDNA libraries.
- the invention provides sequencing of the 3'-end of mRNAs, limiting the sequencing coverage required to assess gene expression level of each single cell transcriptome.
- the methods allow the preparation of RNA-seq libraries in a manner that is not labor-intensive or time-consuming. Indeed, RNA-seq libraries of a thousand single cells can be easily prepared in two days. Any of the foregoing (or any of the nucleic acids, reagents, kits, and methods described herein) may be provided and/or used alone or in any combination.
- the invention also provides nucleic acids, kits, and methods for sequencing of extracted/purified RNA (bulk RNA sequencing) or for analysis of an isolated population of cells (e.g., from an isolated population of cells or a tissue; analysis of a cell or tissue lysate).
- any of the compositions, reagents, and methods described herein as applicable to single cells also are applicable to other sources of starting materials, such as extracted RNA, purified RNA, cell lysates, or tissue lysates, and such application is contemplated.
- the present invention provides improved nucleic acids, kits, and methods capable of transcriptome-wide profiling at single cell resolution of tens of thousands of cells simultaneously and cost-effectively (approximately $2 per sample, as compared to approximately $80 per sample with a current method).
- the methods and kits may include both customized nucleic acids and/or method steps that are themselves the subject of this application, as well as one or more commercially available reagents, kits, apparatuses, or method steps.
- the methods of the invention provide a number of distinct advantages over existing methods. Some current methods require a polyA addition step prior to sequencing, but this step can be eliminated through the use of a Moloney Murine Leukemia Virus reverse transcriptase.
- full-length cDNA amplification can be carried out using the suppression PCR principle, thereby enriching full length cDNAs, and the method can be applied directly to cells rather than requiring
- the methods of the invention also provide an advantage in that they utilize at least two barcode sequences rather than one, allowing for the
- the methods of the invention provide an advantage over current methods targeting the 3' end of mRNA that use linear mRNA amplification.
- Linear mRNA amplification is time-consuming compared to template switching/suppression PCR amplification.
- Linear mRNA amplification also is labor-intensive and limits the number of cells that can be processed to approximately 50 cells per day by a single person.
- the methods of the invention can accommodate 384 cells in a single plate, allowing a single person to easily process up to 1152 cells per day.
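The throughput figures above reduce to simple arithmetic: 1152 cells per day corresponds to three 384-well plates, and is roughly 23 times the approximately 50 cells per day quoted for linear mRNA amplification. A small sketch of that calculation follows; the constant and function names are illustrative.

```python
WELLS_PER_PLATE = 384          # one cell per well in a 384-well capture plate
LINEAR_AMP_CELLS_PER_DAY = 50  # approximate figure quoted for linear amplification


def plates_needed(cells, wells_per_plate=WELLS_PER_PLATE):
    """Number of capture plates required for a given number of single cells."""
    if cells < 0:
        raise ValueError("cell count cannot be negative")
    return -(-cells // wells_per_plate)  # ceiling division


plates_per_day = plates_needed(1152)
fold_increase = 1152 / LINEAR_AMP_CELLS_PER_DAY
```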
- the use of UMIs also provides a distinct advantage over typical single-cell RNA-seq methods. Because of the very low starting amount of RNA in a single cell, several amplification steps are required during the process of the RNA-seq library preparation, and the UMIs protect against amplification biases.
- the methods of the invention utilizing a transposase-based sequencing library preparation have the added advantage of eliminating a number of labor-intensive and costly steps in library preparation, including magnetic bead immobilization, separate fragmentation, end repair, dA-tailing, and adaptor ligation.
- By eliminating the separate steps of chemical fragmentation and its purification, end repair, dA-tailing and adapter ligation, labor and cost are reduced, and the yield is much higher than with other techniques because there are fewer purification steps (during which material can be lost) and because this method to tag the fragment is much more efficient than by ligation with a regular ligase. Because less material is lost in the process, the methods of the invention can start with a much lower amount of starting cDNA.
- the invention provides methods that are advantageous based on a number of improvements to existing methods.
- a typical method provided by the invention is depicted in Figure 2, and starts with preparing a capture plate for cell sorting. Cells are then sorted into the plate (e.g., by fluorescence activated cell sorting), after which the plate may be frozen down for storage. For single cell analysis, one cell is sorted into each well of the plate.
- One advantage of the nucleic acids provided herein is that the use of various barcodes permits the end user to correlate transcript expression back to a particular well and plate, and thus to a specific cell evaluated.
- the plate can, in certain embodiments, be thawed from its frozen state.
- a proteinase or protease such as proteinase K
- the cell sorting and individual cell lysis steps can be skipped, as the starting material is already RNA.
- where the starting material is a population of cells, the population can be divided into a multi-well plate in preparation for lysis.
- where the starting material is a lysate prepared from a population of cells or tissues, cell or tissue lysis may optionally occur in a prior step before introduction into the well, and the lysate itself may then be added to each well of a multi-well plate.
- a population of cells can be sorted into lysis buffer and lysed (e.g., by freeze-thawing, proteinase K treatment, or a combination thereof) before the lysate is added to the plate.
- the next steps are to reverse transcribe the mRNA that has been released from the cells and to perform a template switching step.
- the reverse transcription and template switching can be performed using the nucleic acids of the invention, which efficiently perform these steps.
- a cDNA synthesis primer comprising a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a
- the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine
- the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine
- the 5' blocking group is used to ensure the correct directionality of cDNA synthesis and the adapter sequence provides a sequence annealing to a sequencing primer, so the first sequencing read will contain the barcode and UMI sequences.
- the barcode sequence is used to track which well (and, thus, which cell) a particular cDNA was generated from.
- a barcode can provide a reference for (and, thus, a way to identify) the sample or the pool (e.g., the well) rather than a single cell.
- a UMI can be used in bulk RNA-seq and lysate sequencing to identify the transcript, and the i7 primer (which, in other embodiments, typically contains the barcode for the plate, e.g., for plate indexing, sometimes referred to as the plate barcode or the index) identifies the sample or pool (e.g., the well) rather than the single cell.
- the UMI can be, for example, a 16mer UMI.
- a combination of one or more barcodes and a UMI is used.
- a UMI is used either alone or with a single barcode. In either way, the methods and compositions provide a mechanism for identifying where a particular transcript came from.
- i7 is used for plate indexing (e.g., it is a barcode to identify a particular plate).
- i7 serves as a sample barcode.
- the UMI provides a way to trace each cDNA produced to a particular mRNA derived from a cell/sample.
- the complementarity sequence anneals to the mRNA, for example, to the poly(A) tail of an mRNA, although it also could anneal to a specific target sequence, such as the sequence of a particular mRNA, instead.
- the 3' dinucleotide sequence targets the extremity of the poly(A) tail, i.e., the last two bases of the mRNA before the poly(A) tail.
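The anchoring role of the 3' dinucleotide can be illustrated by computing which dinucleotide would pair with a given transcript. Because the primer anneals antiparallel to the mRNA, its first anchor base pairs with the last base before the poly(A) tail and its second anchor base pairs with the base before that. The sketch assumes the transcript is written 5' to 3' in the DNA alphabet (T rather than U); the function name and example sequences are made up.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def matching_anchor(mrna):
    """Return the primer 3' dinucleotide that pairs with this transcript.

    `mrna` is written 5'->3' with T in place of U and ends in a poly(A) tail.
    """
    body = mrna.upper().rstrip("A")  # drop the poly(A) tail
    if len(body) < 2:
        raise ValueError("need at least two bases upstream of the poly(A) tail")
    penultimate, last = body[-2], body[-1]
    # First anchor base pairs with the last non-A base, so it is never T.
    return COMPLEMENT[last] + COMPLEMENT[penultimate]


anchor = matching_anchor("GGCATTG" + "A" * 20)
```

Since the last base before the tail is by construction not adenine, the first anchor base is always adenine, guanine, or cytosine, which matches the restriction placed on the first nucleotide of the dinucleotide.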
- a template-switching oligonucleotide comprising a 5' poly-isonucleotide (isocytosine-isoguanosine-isocytosine) sequence, an internal adapter sequence, and a 3' guanosine tract can be used in the template switching step.
- the 5' poly-isonucleotide (isocytosine-isoguanosine-isocytosine) sequence provides non-standard base pairs in the template switching oligo to prevent background cDNA synthesis.
- nucleotide isomers inhibit reverse transcriptase, such as MMLV reverse transcriptase, from extending the cDNA beyond the template switching adapter, thus increasing cDNA yield by reducing formation of concatemers of the template switching adapter.
- the adapter sequence provides the subsequence required for the suppression PCR, and the 3' guanosine tract is used to anneal to a polycytosine tract generated at the 3' end of the first strand of cDNA synthesized. These steps are useful in incorporating a barcode and a UMI into the resulting cDNAs.
- the barcode introduced here helps track the individual well (and, therefore, cell/sample) that a cDNA population came from, while the UMI is unique for each mRNA that produces a cDNA.
- the population of UMIs incorporated into the cDNAs provides a molecular "snapshot" of the mRNA population of the cell or sample at the time of lysis, because subsequent amplification steps do not alter the number of UMIs, making it possible to trace back each cDNA sequenced later to a particular mRNA released from the cell/sample.
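Because amplification copies a UMI without creating new ones, counting distinct UMIs rather than reads estimates the number of original mRNA molecules. The sketch below illustrates that idea under stated assumptions: the `(well_barcode, umi, gene)` record layout and the example sequences are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs for each (well_barcode, gene) pair.

    `reads` is an iterable of (well_barcode, umi, gene) tuples, a
    hypothetical record layout used only for this illustration.
    """
    umis_seen = defaultdict(set)
    for well_barcode, umi, gene in reads:
        umis_seen[(well_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


reads = [
    ("ACGTAC", "AAAACCCCGG", "GeneA"),
    ("ACGTAC", "AAAACCCCGG", "GeneA"),  # amplification duplicate: same well, same UMI
    ("ACGTAC", "TTTTGGGGCC", "GeneA"),  # second molecule in the same well
    ("TGCATG", "AAAACCCCGG", "GeneA"),  # same UMI but a different well
]
counts = count_molecules(reads)
```

Here four reads collapse to two molecules in the first well and one in the second, so the duplicate read does not inflate the count.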
- the template switching step is selective for the creation of full-length cDNAs.
- the wells can be pooled together and purified, followed by treatment with an exonuclease such as Exonuclease I.
- the primer used for the suppression PCR can bind to the remaining adapters that are in excess from the template switching reaction, so the addition of an exonuclease, such as Exonuclease I, improves results.
- the cDNAs then are amplified (e.g., via PCR), followed by subsequent purification and quantification steps.
- the library is prepared for sequencing by fragmentation, e.g., with a transposase-based fragmentation system.
- This step also introduces a second bar code to the cDNAs, this second bar code being specific for the capture plate from which the cDNAs were pooled.
- each cDNA will have a bar code for both the plate and the well from which it was derived, allowing simultaneous processing of a large number of samples, in which each individual sequence can be traced back to a single mRNA of a specific cell (or, in the case of another type of sample, to be traced back to a well containing a cell or tissue lysate sample, a purified RNA sample, or the like).
- the library then can be purified, selected for appropriate size fragments, assessed for quantity and quality, and sequenced (e.g., by RNA-seq on a system such as the Illumina HiSeq™ (Catalog # SY-401-2501) or MiSeq™ (Catalog # SY-410-1003)).
- the sequencer can handle various read lengths and either single-end or paired-end sequencing.
- the libraries can be run with read lengths matched to what is required to read each barcode and to obtain enough cDNA sequence to identify the gene from which it came. For example, 17 cycles can be run for read 1 (see above) to read first the 6 bp well/cell barcode and the 10 bp UMI. This is followed by 9 cycles to read the 8 bp i7 plate index. Finally, 46 cycles are, in certain embodiments, run on the other strand to read the cDNA/gene sequence.
- the machine allows the operator to set up a custom run for which they decide the read length for each portion for which sequence is to be obtained.
- This sequencing design allows an individual to decipher all the information while using the smallest, cheapest kit that meets their needs (e.g., a 50-cycle kit that actually contains enough reagents for 74 cycles). Alternatively, an individual could run more cycles to get longer stretches of cDNA.
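The cycle budget in the example above can be checked with simple arithmetic. This is an illustrative sketch only; the dictionary keys and function names are hypothetical, while the cycle counts (17, 9, 46) and the 74-cycle reagent capacity come from the text.

```python
# Cycle counts taken from the example in the text.
READ_PLAN = {
    "read1_barcode_and_umi": 17,  # covers the 6 bp well/cell barcode + 10 bp UMI
    "i7_plate_index": 9,          # covers the 8 bp plate index
    "read2_cdna": 46,             # cDNA/gene sequence on the other strand
}
KIT_CYCLES_AVAILABLE = 74  # reagents in the nominal 50-cycle kit, per the text


def cycles_used(plan):
    """Total number of sequencing cycles requested by a read plan."""
    return sum(plan.values())


def spare_cycles(plan, available):
    """Cycles left over; a negative value means the kit is too small."""
    return available - cycles_used(plan)


total = cycles_used(READ_PLAN)
spare = spare_cycles(READ_PLAN, KIT_CYCLES_AVAILABLE)
```

The plan uses 72 cycles in total, which fits within the 74 cycles of reagent with 2 to spare.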
- samples from multiple capture plates can be combined without losing the identity of each cDNA in the mixture because of the two barcode sequences.
- the data can be deconvoluted after sequencing to determine the UMI of each particular cDNA and the well and plate it came from via the barcodes. This is advantageous because it allows a researcher to run many more samples together than would otherwise be possible, and to do so with less cost and labor.
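A minimal sketch of this deconvolution step follows. It assumes, as in the example read design, that read 1 begins with a 6 bp well barcode followed by a 10 bp UMI and that the plate index is read separately. The lookup tables, sequences, and function names are hypothetical and serve only as an illustration.

```python
BARCODE_LEN = 6   # well/cell barcode length in the example read design
UMI_LEN = 10      # UMI length in the example read design


def split_read1(read1):
    """Split read 1 into (well_barcode, umi), assuming barcode-then-UMI order."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 is too short to contain a barcode and a UMI")
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def assign_read(read1, plate_index, well_by_barcode, plate_by_index):
    """Return (plate, well, umi), or None if either barcode is not recognised."""
    barcode, umi = split_read1(read1)
    well = well_by_barcode.get(barcode)
    plate = plate_by_index.get(plate_index)
    if well is None or plate is None:
        return None
    return plate, well, umi


# Hypothetical lookup tables for illustration.
well_by_barcode = {"ACGTAC": "A1", "TGCATG": "A2"}
plate_by_index = {"AAGGCCTT": "plate_1"}

assigned = assign_read("ACGTACAAAACCCCGGT", "AAGGCCTT", well_by_barcode, plate_by_index)
unassigned = assign_read("GGGGGGAAAACCCCGGT", "AAGGCCTT", well_by_barcode, plate_by_index)
```

Reads whose barcode or plate index is not in the lookup tables are returned as `None` rather than being guessed, which keeps misassigned reads out of the counts.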
- diagnostics refers to methods by which the skilled artisan can estimate and/or determine whether or not a patient is afflicted with a given disease or condition. The skilled worker often makes a diagnosis based on one or more diagnostic indicators. Exemplary diagnostic indicators may include the manifestation of symptoms or the presence, absence, or change in one or more markers for the disease or condition. A diagnosis may indicate the presence or absence, or severity, of the disease or condition.
- prognosis is used herein to refer to the likelihood of the progression or regression of a disease or condition, including likelihood of the recurrence of a disease or condition.
- treating refers to taking steps to obtain beneficial or desired results, including clinical results. Beneficial or desired clinical results include, but are not limited to, reduction, alleviation or amelioration of one or more symptoms associated with the disease or condition.
- administering, or administration of, a compound or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art.
- a compound or an agent can be administered orally, intravenously, arterially, intradermally, intramuscularly, intraperitoneally, subcutaneously, ocularly, sublingually, intranasally, intraspinally, intracerebrally, and transdermally.
- a compound or agent can appropriately be introduced by rechargeable or biodegradable polymeric devices or other devices, e.g., patches and pumps, or formulations, which provide for the extended, slow, or controlled release of the compound or agent.
- Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods.
- Administration of a compound may include both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, a physician who instructs a patient to self-administer a therapeutic agent, or to have the agent administered by another, and/or who provides a patient with a prescription for a drug has administered the drug to the patient.
- nucleic acid refers to DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), DNA-RNA hybrids, and analogs of the DNA or RNA generated using nucleotide analogs.
- the nucleic acid molecule can be a nucleotide, oligonucleotide, double-stranded DNA, single-stranded DNA, multi-stranded DNA, complementary DNA, genomic DNA, non-coding DNA, messenger RNA (mRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), small interfering RNA (siRNA), heterogeneous nuclear RNAs (hnRNA), or small hairpin RNA (shRNA).
- transcriptome can refer to any sequencing or gene expression information concerning the transcriptome or portion thereof. This information can be either qualitative (e.g., presence or absence) or quantitative (e.g., levels or mRNA copy numbers). In some embodiments, a profile can indicate a lack of expression of one or more genes.
- cDNA library refers to a collection of complementary DNA (cDNA) fragments.
- a cDNA library may be generated from the transcriptome of a single cell or from a plurality of single cells. cDNA is produced from mRNA found in a cell and therefore reflects those genes that have been transcribed for subsequent protein expression.
- a "plurality" of cells refers to a population of cells and can include any number of cells to be used in the methods described herein.
- a plurality of cells includes at least 10 cells, at least 25 cells, at least 50 cells, at least 100 cells, at least 200 cells, at least 500 cells, at least 1,000 cells, at least 5,000 cells, or at least 10,000 cells.
- a plurality of cells includes from 10 to 100 cells, from 50 to 200 cells, from 100 to 500 cells, from 100 to 1,000 cells, or from 1,000 to 5,000 cells.
- a “single cell” refers to one cell.
- Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Cells can be cultured cells or cells from a dissociated tissue, and can be fresh or preserved in a preservative buffer such as RNAprotect.
- the method of preparing the cDNA library can include the step of obtaining single cells.
- a single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample.
- Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually, for example, a 96-well plate, such that each single cell is placed in a single well.
- an "oligonucleotide" or "polynucleotide" refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and can perform any function.
- Exemplary polynucleotides include a gene or gene fragment (e.g., a probe or primer), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA or RNA of any sequence, and nucleic acid probes and primers.
- polynucleotide can comprise modified nucleotides, such as isonucleotides, methylated nucleotides, and other nucleotide analogs. The term also refers to both double- and single-stranded molecules.
- a polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Uracil (U) substitutes for thymine when the polynucleotide is RNA.
- the sequence can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
- a "primer" is a polynucleotide that hybridizes to a target or template that may be present in a sample of interest. After hybridization, the primer promotes the polymerization of a polynucleotide complementary to the target, for example in a reverse transcription or amplification reaction.
- Methods for selecting or sorting cells are well established, and in some embodiments include, but are not limited to, fluorescence-activated cell sorting (FACS), micromanipulation, manual sorting, and the use of semi-automated cell pickers.
- Individual cells can be individually selected based on features detectable by observation (e.g., by microscopic observation). Exemplary features can include location, morphology, and reporter gene expression.
- a population of cells can be sorted to provide a subpopulation or a predetermined subset of cells. In some embodiments, the population, subpopulation, or predetermined subset can be sorted to provide single cells. In some embodiments, the cells are sorted into a capture plate.
- Capture plates can comprise a number of wells into which the cells are sorted, for example, 24 wells, 96 wells, 384 wells, or 1536 wells.
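For bookkeeping, a well position can be converted to a conventional plate coordinate. The sketch below assumes the standard row-by-column layouts for the listed plate sizes (4×6, 8×12, 16×24, and 32×48) and a row-major well numbering starting at zero; the numbering scheme and function names are assumptions made for illustration and are not specified in the text.

```python
# Standard rows x columns layouts for the plate sizes listed in the text.
PLATE_LAYOUTS = {24: (4, 6), 96: (8, 12), 384: (16, 24), 1536: (32, 48)}


def row_label(row_index):
    """Convert a zero-based row index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    remaining = row_index + 1
    while remaining > 0:
        remaining, offset = divmod(remaining - 1, 26)
        label = chr(ord("A") + offset) + label
    return label


def well_name(well_index, plate_size):
    """Map a zero-based, row-major well index to a name such as 'A1' or 'H12'."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    if not 0 <= well_index < rows * cols:
        raise ValueError("well index is outside the plate")
    row, col = divmod(well_index, cols)
    return f"{row_label(row)}{col + 1}"


first_well_96 = well_name(0, 96)
last_well_96 = well_name(95, 96)
last_well_1536 = well_name(1535, 1536)
```

Plates with more than 26 rows, such as the 1536-well format, need two-letter row labels, which is why the row label is built in a loop rather than from a single letter.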
- a population of cells is lysed without sorting.
- the population of cells can be, for example, a tissue sample.
- the population of cells is an isolated population of cells.
- the starting material for further analysis may be, for example, a cell or tissue lysate or bulk purified or extracted RNA.
- cells can be divided into the wells of a plate without sorting.
- the amount of material in each well is normalized with respect to the other wells so as to provide similar sequencing coverage across a plate.
- the cells may be lysed.
- Cells may be lysed by any number of known techniques. Exemplary cell lysis techniques include freeze-thawing, heating the cells, using a detergent or other chemical method, or a combination thereof. Techniques minimizing degradation of the released mRNA are preferred. Likewise, techniques preventing the release of nuclear chromatin are preferred. For example, heating the cells in the presence of Tween-20 is sufficient to lyse cells while minimizing genomic contamination from nuclear chromatin.
- cells are lysed using freeze-thawing.
- a proteinase or protease, such as proteinase K, is added to the lysis reaction to increase the efficiency of lysis.
- cells are lysed using freeze-thawing optionally supplemented with addition of proteinase K.
- cell lysis may be of single cells already sorted into individual wells of a plate.
- lysis of populations of cells may be performed and the starting material for further sequence analysis may be a cell or tissue lysate made from a plurality of cells and then aliquoted to wells of a plate.
- the material may be stored at a suitable temperature, such as -80 °C, prior to further use.
- cDNA is synthesized from mRNA through the process of reverse transcription.
- Reverse transcription can be performed directly on cell lysates (for example, a cell lysate prepared as described above), by adding a reaction mix for reverse transcription directly to the cell lysate.
- the total RNA or mRNA can be purified after cell lysis, for example through the use of column based (e.g., Qiagen RNeasy Mini kit Cat. No. 74104, ZymoResearch Direct-zol RNA Cat. No. R2050) or magnetic bead purification (e.g., Agencourt RNAClean XP, Cat. No. A63987).
- the reverse transcription is combined with a template switching step to improve the yield of longer (e.g., full length) cDNA molecules.
- the reverse transcriptase used has tailing or terminal transferase activity, and synthesizes and anchors first-strand cDNA in one step.
- the reverse transcriptase is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ reverse transcriptase (Clontech, Cat. No. 639536), Superscript II™ reverse transcriptase (Life Technologies, Cat. No. 18064-014), or Maxima H Minus™ reverse transcriptase (Thermo Scientific, Cat. No. EP0753).
- Template switching introduces an arbitrary sequence at the 3' end of the cDNA that is designed to be the reverse complement to the 3' end of a cDNA synthesis primer.
- the synthesis of the first strand of the cDNA can be directed by a cDNA synthesis primer (CDS) that includes an RNA complementary sequence (RCS).
- the RCS is at least partially complementary to one or more mRNA species in an individual mRNA sample, allowing the primer to hybridize to at least some mRNA species in a sample to direct cDNA synthesis using the mRNA as a template.
- the RCS can comprise an oligo(dT) sequence that binds to many mRNA species, or it can be specific for a particular mRNA species, for example, by binding to an mRNA sequence of a gene of interest.
- the RCS can comprise a random sequence, such as random hexamers.
- a non-self-complementary sequence can be used.
- a template-switching oligonucleotide that includes a portion which is at least partially complementary to a portion of the 3' end of the first strand of cDNA generated by the reverse transcription can also be used in the methods of the invention. Because the terminal transferase activity of reverse transcriptase typically causes the incorporation of two to five cytosines at the 3' end of the first strand of cDNA synthesized, the first strand of cDNA can include a plurality of cytosines, or cytosine analogues that base pair with guanosine, at its 3' end to which the template-switching oligonucleotide with a 3' guanosine tract can anneal.
- a template-switching oligonucleotide is extended to form a double stranded cDNA.
- a template-switching oligonucleotide can include a 3' portion comprising a plurality of guanosines or guanosine analogues that base pair with cytosine.
- Exemplary guanosines or guanosine analogues include, but are not limited to,
- the guanosines can be ribonucleosides or locked nucleic acid monomers.
- a locked nucleic acid is an RNA nucleotide wherein the ribose moiety has been modified with an extra bridge connecting the 2' oxygen and the 4' carbon.
- a peptide nucleic acid is an artificially synthesized polymer similar to DNA or RNA, wherein the backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
- the reverse transcription and template switching comprise contacting an mRNA sample with two nucleic acid primers.
- the first nucleic acid primer (e.g., a template-switching oligonucleotide) comprises a 5' poly-isonucleotide (isocytosine-isoguanosine-isocytosine) sequence, an internal adapter sequence, and a 3' guanosine tract.
- the 5' poly-isonucleotide sequence comprises an isocytosine, or an isoguanosine, or both.
- the 5' poly-isonucleotide sequence comprises an isocytosine-isoguanosine-isocytosine sequence.
- the 3' guanosine tract comprises two, three, four, five, six, seven, eight, nine, ten, or more guanosines. In certain embodiments, the 3' guanosine tract comprises three guanosines. In some embodiments, the adapter sequence is 12 to 32 nucleotides in length, for example, 22 nucleotides in length.
- the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1).
- the sequence of the first primer is 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17)(e.g., 1 ⁇ ,) wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine.
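The three-part structure of the first primer can be made explicit by assembling it from its components. This sketch is illustrative only: the function name and the list-of-tokens representation are hypothetical, while the iC/iG/rG notation and the adapter sequence are those given in the text.

```python
# Components as described in the text.
ISO_PREFIX = ["iC", "iG", "iC"]              # 5' isocytosine-isoguanosine-isocytosine
INTERNAL_ADAPTER = "ACACTCTTTCCCTACACGACGC"  # SEQ ID NO: 1


def build_template_switching_oligo(adapter=INTERNAL_ADAPTER, n_guanosines=3):
    """Assemble 5' iso-nucleotides + internal adapter + 3' RNA guanosine tract."""
    if n_guanosines < 2:
        raise ValueError("the text describes a tract of two or more guanosines")
    return "".join(ISO_PREFIX) + adapter + "rG" * n_guanosines


def nucleotide_count(adapter=INTERNAL_ADAPTER, n_guanosines=3):
    """Number of nucleotides in the assembled oligonucleotide."""
    return len(ISO_PREFIX) + len(adapter) + n_guanosines


tso = build_template_switching_oligo()
```

With the default three guanosines, the assembled string matches the exemplary first primer, and the 22-nucleotide adapter gives a 28-nucleotide oligonucleotide overall.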
- the second nucleic acid primer (e.g., a cDNA synthesis primer) comprises a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a
- the bar code can be omitted from the cDNA synthesis primer and an extra 6 base pairs can be added to the UMI sequence.
- the 5' blocking group is selected from biotin, an inverted nucleotide (e.g., inverted dideoxy-T), a fluorophore, an amino group, and iso-dG or iso-dC.
- the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length.
- the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1).
- the barcode sequence is 4 to 20 nucleotides in length, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
- the UMI sequence is 6 to 20 nucleotides in length, for example, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
- the complementarity sequence is a poly(T) sequence.
- the complementarity sequence is 20 to 40 nucleotides in length, for example, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length.
- the second nucleic acid primer is 5 '-
- the barcodes may be designed so that each barcode sequence differs from the barcodes of all other primers by at least two nucleotides, so that a single sequencing error cannot lead to the misidentification of the barcode.
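The two-nucleotide spacing rule can be verified for a candidate barcode set by computing pairwise Hamming distances. The following sketch uses hypothetical example barcodes and function names and shows one possible check, not a design procedure from the disclosure.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def violating_pairs(barcodes, min_distance=2):
    """Return barcode pairs that are closer than the required distance."""
    return [
        (a, b)
        for a, b in combinations(barcodes, 2)
        if hamming(a, b) < min_distance
    ]


# Hypothetical barcode sets for illustration.
good_set = ["AAAAAA", "AACCAA", "GGAAAA"]
bad_set = good_set + ["AAAAAC"]  # one substitution away from AAAAAA

good_violations = violating_pairs(good_set)
bad_violations = violating_pairs(bad_set)
```

When every pair differs in at least two positions, a single substitution in a barcode yields a sequence that matches no barcode in the set, so the read can be discarded rather than assigned to the wrong well.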
- the UMI sequences provide a robust guard against amplification biases. More particularly, each UMI is present only once in a population of second nucleic acid primers. Thus, each UMI is incorporated into a unique cDNA sequence generated from a cellular mRNA, and any subsequent amplification steps will not alter the one UMI to one mRNA ratio.
- the UMI sequence, rather than being 10 nucleotides in length, is 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
- the length should be selected to provide sufficient unique sequences for the population of cells to be tested (preferably with at least two nucleotide differences between any pair of UMIs), preferably without adding unnecessary length that increases sequencing cost.
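The trade-off between UMI length and the number of available sequences can be estimated from the fact that there are 4^n sequences of length n. The sketch below is a rough sizing aid under stated assumptions: the function names and the example requirement of one million UMIs are hypothetical, and the `spaced` option applies the Singleton bound (at most 4^(n-1) sequences of length n can differ pairwise in at least two positions).

```python
def umi_capacity(length, spaced=False):
    """Number of UMIs of a given length.

    With spaced=True, return the upper limit on a set in which every pair
    of UMIs differs in at least two positions (Singleton bound, 4**(n-1)).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return 4 ** (length - 1) if spaced else 4 ** length


def min_umi_length(required, spaced=False):
    """Smallest UMI length whose capacity is at least `required`."""
    length = 1
    while umi_capacity(length, spaced) < required:
        length += 1
    return length


ten_mer_capacity = umi_capacity(10)
length_plain = min_umi_length(1_000_000)
length_spaced = min_umi_length(1_000_000, spaced=True)
```

Under these assumptions a 10-nucleotide UMI offers 1,048,576 sequences, and imposing the two-difference spacing costs one additional nucleotide for the same number of UMIs.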
- Barcode sequences enable each cDNA sample generated by the above method to have a distinct tag, or a distinct combination of tags, such that once the tagged cDNA samples have been pooled, the tag can be used to identify the single cell from which each cDNA sample originated.
- each cDNA sample can be linked to a single cell, even after the tagged cDNA samples have been pooled and amplified.
- the use of the foregoing nucleic acids permits deconvolution of pooled data to single cell/well resolution. This is particularly advantageous for facilitating the application of this technology to screening assays.
- a nucleic acid useful in the invention can contain a non-natural sugar moiety in the backbone, for example, sugar moieties with 2' modifications such as addition of a halogen, alkyl-substituted alkyl, SH, or SCH3.
- Similar modifications also can be made at other positions on the sugar.
- Nucleic acids, nucleoside analogs or nucleotide analogs having sugar modifications can be further modified to include a reversible blocking group, a peptide linked label, or both. In those embodiments comprising a 2' modification, the base can have a peptide- linked label.
- a nucleic acid useful in the invention also can include native or non-native bases.
- a native deoxyribonucleic acid can have one or more bases selected from adenine, thymine, cytosine, and guanine.
- a ribonucleic acid can have one or more bases selected from uracil, adenine, cytosine, and guanine.
- Exemplary non-native bases include, but are not limited to, inosine, xanthine, hypoxanthine, isocytosine, isoguanosine, 5-methylcytosine, 5-hydroxymethylcytosine, 2-aminoadenine, 6-methyl adenine, and 6-methyl guanine.
- isocytosine and isoguanosine may reduce non-specific hybridization.
- a non-native base can have universal base pairing activity, wherein it is capable of base-pairing with any other naturally occurring base, e.g., 3-nitropyrrole and 5-nitroindole.
- the cDNA is pooled together. For example, a population of cells can be individually sorted into the wells of a tray, lysed, and undergo reverse transcription and template switching. These cDNAs then can be pooled and purified. In certain embodiments, the cDNA is purified through a column-based purification method, e.g., with a DNA Clean & Concentrator-5 column (Zymo Research, #D4013).
- pooled cDNAs are treated with an exonuclease (e.g., Exonuclease I) to degrade any primers remaining from the reverse transcription and template switching steps. This prevents possible interference by these primers in subsequent amplification.
- amplification refers to a process by which multiple copies of a particular polynucleotide are formed, and includes methods such as the polymerase chain reaction (PCR), ligation amplification (also known as ligase chain reaction, or LCR), and other
- amplification refers specifically to PCR.
- Amplification methods are widely known in the art.
- PCR refers to a method of amplification comprising hybridization of primers to specific sequences within a DNA sample and amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase. The resulting DNA products are then often screened for a band of the correct size.
- the primers used are oligonucleotides of appropriate length and sequence to provide initiation of polymerization.
- Reagents and hardware for conducting amplification reactions are widely known and commercially available. Primers useful to amplify sequences from a particular gene region are sufficiently complementary to hybridize to target sequences.
- Nucleic acids generated by amplification can be sequenced directly. When hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides, the reaction is called "annealing" and those polynucleotides are described as "complementary". A double-stranded polynucleotide can be complementary or homologous to another polynucleotide if hybridization can occur between one of the strands of the first polynucleotide and the second.
- Complementarity or homology is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonding with each other, according to generally accepted base-pairing rules.
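As an illustration of quantifying complementarity, the fraction of positions expected to pair can be computed for two strands aligned in antiparallel orientation. This is a simplified sketch that assumes equal-length DNA sequences written 5' to 3' and only standard A-T and G-C pairing; the function names are hypothetical.

```python
# Standard Watson-Crick pairing for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq))


def fraction_complementary(strand_a, strand_b):
    """Fraction of positions that pair when the strands align antiparallel.

    Both strands are given 5' to 3', so strand_b is reversed before the
    position-by-position comparison.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles equal-length strands")
    if not strand_a:
        raise ValueError("strands must not be empty")
    paired = sum(PAIRS[a] == b for a, b in zip(strand_a, reversed(strand_b)))
    return paired / len(strand_a)


perfect = fraction_complementary("AAGC", reverse_complement("AAGC"))
one_mismatch = fraction_complementary("AAGC", "GCTA")
```

A strand and its reverse complement score 1.0, and each mismatched position lowers the fraction proportionally.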
- hybridization is influenced by hybridization conditions, such as temperature and salt. In the context of amplification, these parameters can be suitably selected.
- cDNA created by reverse transcription and template switching, and optionally treated with an exonuclease, is amplified to provide more starting material for sequencing.
- cDNA can be amplified by a single primer with a region that is complementary to all cDNAs, e.g., an adapter sequence.
- the primer has a 5' blocking group such as biotin.
- An exemplary primer is as follows: 5 '-
- One exemplary amplification reaction uses cDNA; PCR buffer, such as 10X Advantage 2 PCR buffer; dNTPs; the DNA primer 5'-
- the amplification reaction may be modified to use fewer than 18 cycles, e.g., 10 cycles.
- One exemplary amplification reaction uses 20 ⁇ ⁇ of cDNA; 5 ⁇ ⁇ of 10X Advantage 2 PCR buffer; ⁇ ⁇ , of dNTPs; ⁇ ⁇ , of the DNA primer 5 '- /5Biosg/ACACTCTTTCCCTACACGACGC-3 ' (SEQ ID NO: 19) (10 ⁇ ,
- Nucleic acid purification is well known in the art.
- a spin-based column such as those commercially available from Zymo Research™ (DNA Clean & Concentrator™-5, Cat. No. D4013) or Qiagen™ (MinElute PCR purification kit, Cat. No. 28004).
- the spin column is a column lacking a physical ring, for example the ring found in Qiagen™ columns, allowing elution of the purified nucleic acid in a lower volume than would be possible in a spin column with a ring.
- magnetic beads include, for example, the Agencourt AMPure XP™ system (Beckman Coulter, Cat. No. A63881).
- a nucleic acid is purified after being run on a gel.
- Gel extraction purification kits are well known, and include, for example, the MinElute Gel Extraction Kit™ (Qiagen, Cat. No. 28604).
- a cDNA library for sequencing is fragmented prior to the sequencing.
- a cDNA library can be fragmented by any known method, for example, mechanical fragmentation or a transposase-based fragmentation such as that used in the NexteraTM system (e.g., the Illumina Nextera XT DNA Sample
- a barcode sequence introduced during preparation of a cDNA library for sequencing is specific for a predetermined set of cells.
- This predetermined set of cells can be a subset of a larger set of cells.
- a tissue biopsy can be sorted into a set of cells to be further sorted into single cells in a capture plate for gene profiling. If a bulk lysate or population of cells is being used as a starting material rather than single cells that have been sorted, a barcode sequence may, in certain
- a cDNA library for sequencing is quantified and evaluated for quality prior to the sequencing to ensure that the library is of sufficient quantity and quality to yield positive results from sequencing.
- a cDNA library can be quantified using a fluorometer and analyzed for quantity and average fragment size through the use of a number of commercially available kits. The two main metrics for quality are the concentration of the library (which needs to be sufficient for loading on the sequencer) and the length of the cDNA fragments to be sequenced. Size selection is performed on a gel to enrich for fragments of the correct size. The gel itself gives an indication of the quality of the library.
- the final extracted library can be run on an Agilent Bioanalyzer (Cat. No. G2940CA) to obtain the size distribution for the cDNA fragments.
- sequencing refers to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid.
- exemplary sequencing techniques include RNA-seq (also known as whole transcriptome sequencing), Illumina™ sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, massively parallel signature sequencing (MPSS), sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOL
- sequencing is performed on Illumina HiSeq or MiSeq paired-end flow cells.
- nucleic acids, methods, and kits of the invention facilitate sequencing data analysis.
- Sequencing products can be traced not only to the single plate of cells from which they came, but also to a single cell (e.g., a well) and, indeed, a single cellular transcript.
- This deconvolution of sequencing data can be achieved through the use of barcode and UMI sequences.
- sequencing is combined with 3' digital gene expression to provide a number of counts for a particular sequence or sequences (e.g., cDNAs containing a particular combination of bar codes and a UMI).
- each fragment of each transcript is sequenced and then counted for how many fragments of each transcript have been sequenced.
- the computed gene expression should be normalized based on the length of a given transcript because a longer transcript will have a greater chance of having one of its fragments sequenced.
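The effect of length normalization in full-transcript sequencing can be shown with a small calculation. The sketch below divides fragment counts by transcript length in kilobases; the gene names, counts, lengths, and function name are hypothetical values chosen for illustration.

```python
def counts_per_kilobase(fragment_counts, transcript_lengths_bp):
    """Divide each transcript's fragment count by its length in kilobases."""
    normalized = {}
    for gene, count in fragment_counts.items():
        length_bp = transcript_lengths_bp[gene]
        if length_bp <= 0:
            raise ValueError(f"transcript length for {gene} must be positive")
        normalized[gene] = count / (length_bp / 1000)
    return normalized


# Hypothetical example: the long transcript yields four times as many
# fragments only because it is four times as long.
fragment_counts = {"short_gene": 100, "long_gene": 400}
transcript_lengths_bp = {"short_gene": 1000, "long_gene": 4000}

normalized = counts_per_kilobase(fragment_counts, transcript_lengths_bp)
```

After normalization both transcripts have the same value, which reflects equal expression. In 3' digital gene expression each transcript contributes a single 3' end, so this length correction is not needed.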
- full transcript sequencing typically requires more sequencing coverage than DGE, for which only the 3' end needs to be sequenced.
Kits
- the invention provides a kit comprising a plurality of one or both of the reverse transcription/template switching nucleic acid primers described above.
- the UMI sequence of each second nucleic acid primer described above in the plurality of nucleic acids of the kit is unique among the nucleic acids of the kit.
- the plurality of nucleic acids comprises different populations of nucleic acid species.
- each population of nucleic acid species comprises a different barcode sequence that uniquely identifies a single population of nucleic acid species.
- the kit further comprises a third nucleic acid primer comprising 12 to 32 nucleotides and a 5' blocking group as described above.
- the third nucleic acid is 22 nucleotides in length.
- An exemplary sequence of the third nucleic acid primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2).
- the kit further comprises a nucleic acid comprising a barcode sequence.
- the kit further comprises a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
- the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length.
- An exemplary sequence of the phosphorothioate bond-containing nucleic acid is 5'-
- the kit further comprises a capture plate and/or a reverse transcriptase enzyme and/or a DNA purification column (e.g., a DNA purification spin column) and/or proteinase K.
- the kit can comprise a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ reverse transcriptase.
- kits include any one or any combinations of the reagents described herein and, optionally, directions for use.
- the reagents may be provided in separate containers, such as separate tubes or vials.
- the kit contains sterile water for use.
- the nucleic acids, kits, and/or methods of the invention are used for research applications requiring sequencing or gene expression profiling.
- the research applications include studying cellular differentiation, characterizing tissue heterogeneity, high-throughput screening of agents (e.g., potential therapeutics, potential
- the nucleic acids (e.g., compositions), kits, and/or methods of the disclosure are applied to gene expression analysis of single cells, optionally in response to contacting the single cell with an agent in the high-throughput screening context.
- the ability to analyze gene expression accurately and across large numbers of cells, and to be able to accurately correlate the expression level to a particular cell/well is an exemplary advantage and application of the instant technology.
- the technology is, in certain embodiments, similarly applied to other samples, such as cell or tissue lysates.
- the invention is useful in generating a gene expression profile for a plurality of cells.
- gene expression profiles can be used in a number of applications related to the diagnosis, prognosis, and treatment of a subject.
- cells from a tissue sample collected from a patient can be used in the methods of the invention to generate an expression profile that can be compared against a known profile that is indicative of the disease or condition, thus informing a physician of whether the subject has the disease or condition.
- the profile can be compared to a known profile useful in the prognosis of the disease or condition. For example, if the known profile is predictive of a cancer prognosis, the comparison may inform the physician of the stage of cancer or the cancer's likelihood of metastasis.
- the invention can be used in a method of treating a disease or condition in a subject in need thereof.
- a method of the invention can be used to obtain gene expression profiles in a subject before and after treatment with a therapeutic agent, thereby providing a means of determining the efficacy of the therapeutic agent. These data can be used to determine the efficacy of a treatment, or to help a physician determine an effective treatment regimen.
- the invention is applicable to various diseases or conditions.
- diseases or conditions are a cancer, a cardiovascular disease or condition, a neurological or neuropsychiatric disease or condition, an infectious disease or condition, a respiratory or gastrointestinal tract disease or condition, a reproductive disease or condition, a renal disease or condition, a prenatal or pregnancy-related disease or condition, an autoimmune or immune-related disease or condition, a pediatric disease, disorder, or condition, a mitochondrial disorder, an ophthalmic disease or condition, a musculo-skeletal disease or condition, or a dermal disease or condition.
- All publications, patents and published patent applications referred to in this application are specifically incorporated by reference herein. In case of conflict, the present specification, including its specific definitions, will control.
Example 1 Protocol for transcriptome-wide single-cell RNA sequencing
- [0091] To test the methods of the invention, the protocol described below was developed.
- RNAprotect Cell Reagent (Qiagen, #76526) and 1 µL of RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies, #10777-019). Cells were stored up to two weeks at 4 °C. Prior to sorting, cells in the RNAprotect Cell Reagent were diluted in 1.5 mL PBS, pH 7.4 (no calcium, no magnesium, no phenol red, Life Technologies, #10010-049). The cells then were stained for viability (DNA staining by Hoechst 33342) with NucBlue Live ReadyProbes Reagent (Life Technologies, #R37605).
- ⁇ ⁇ of a universal adapter DNA primer (template-switching oligonucleotide) 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' ( ⁇ ⁇ ,) (SEQ ID NO: 17) was added to each well, wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine.
- the barcode sequences were designed such that each barcode differed from the others by at least two nucleotides, so that a single sequencing error could not lead to the misidentification of the barcode (Table 1).
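The design rule above (every pair of barcodes differing by at least two nucleotides) can be checked with a short script. This is a sketch under stated assumptions: the helper names are illustrative, and the barcode set combines two six-nucleotide sequences that appear in the Table 1 fragment (CAGGCC, CAGGGG) with a third, hypothetical barcode.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def single_error_neighbors(barcode, alphabet="ACGT"):
    """All sequences reachable from a barcode by one substitution."""
    neighbors = set()
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                neighbors.add(barcode[:i] + alt + barcode[i + 1:])
    return neighbors


# CAGGCC and CAGGGG appear in the Table 1 fragment; ACGTAC is hypothetical.
barcodes = ["CAGGCC", "CAGGGG", "ACGTAC"]
design_ok = min_pairwise_distance(barcodes) >= 2

# With a minimum distance of two, no single substitution turns one
# designed barcode into another designed barcode.
collisions = [
    (bc, other)
    for bc in barcodes
    for other in barcodes
    if other != bc and other in single_error_neighbors(bc)
]
```

A minimum distance of two lets a single substitution be detected (the read no longer matches any designed barcode) but not corrected, which fits an exact-match filter on the well barcode.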
- the plate was subsequently incubated at 72 °C for 3 minutes then immediately placed on ice to cool down (although this step is optional).
- the Template Switching step was carried out in each well using the following reagents: 2 µL of 5X 1st strand buffer (250mM UltraPure Tris-HCl, pH 8.0, Life Technologies, #15568-025; 375mM KCl, Life Technologies, #AM9640G; 30mM MgCl2, Life Technologies,
- CAGGCC 255 CAGGGG 256
- Exonuclease I 2 µL of 10X reaction buffer, of Exonuclease I (New England Biolabs, #M0293L), and the reaction was incubated at 37 °C for 30 minutes, then at 80 °C for 20 minutes.
- 5Biosg represents 5' biotin) (10 ⁇ , Integrated DNA Technologies); ⁇ , of the Advantage 2 Polymerase Mix; and 22 µL of Nuclease-Free Water, and performed using the following program: 95 °C for 1 minute; 18 cycles of a) 95 °C for 15 seconds, 65 °C for 30 seconds, 68 °C for 6 minutes, and 72 °C for 10 minutes (followed by an optional hold period at 4 °C).
- Full length cDNAs were purified with 30 µL of beads (here, Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880)). The full length cDNAs were eluted in 12 µL of Nuclease-Free Water and quantified on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies).
- the resulting sequencing library was purified with 30 µL of Agencourt AMPure XP magnetic beads and eluted in 20 µL of nuclease-free water.
- the entire library was run on an E-Gel EX Gel, 2% (Life Technologies, #G4010-02), and the band corresponding to a size range of 300 to 800bp was excised and purified using the QIAquick Gel Extraction Kit (Qiagen, #28704).
Sequencing library quality assessment
- [0103] The library was quantified on the Qubit 2.0 Fluorometer using the dsDNA HS Assay. The quality and average size of the library were assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
- Sequencing is performed on any Illumina® HiSeq™ or MiSeq™ using a standard Illumina® sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first strand, then 8 cycles to decode the Nextera™ barcode and finally 34 cycles (although 46 cycles also can be used to increase the amount of sequencing data). Up to twelve Nextera libraries/384-well capture plates, each comprising 384 cells, are multiplexed together (twelve libraries can be used with a set of twelve plate-identifying barcode sequences, although this number can be expanded with additional barcode sequences), allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
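The multiplexing arithmetic and cycle layout in the preceding paragraph can be written out as a small calculation. This is only a sketch: the function name and the dictionary keys are hypothetical labels, while the numbers (twelve plates, 384 wells, 17/8/34 cycles) come from the text.

```python
def cells_per_lane(n_plate_barcodes, wells_per_plate):
    """Single-cell transcriptomes that can share one sequencing lane."""
    return n_plate_barcodes * wells_per_plate


# Cycle layout described in the text; key names are illustrative labels.
READ_CYCLES = {
    "read1_well_barcode_and_umi": 17,
    "index_plate_barcode": 8,
    "read2_cdna": 34,  # the text notes 46 cycles can be used instead
}

capacity = cells_per_lane(n_plate_barcodes=12, wells_per_plate=384)
total_cycles = sum(READ_CYCLES.values())
```

Adding plate-identifying barcodes scales the capacity linearly, which is why the text notes that the twelve-plate limit can be expanded with additional barcode sequences.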
- the methods and reagents (e.g., polynucleotides, kits, etc.) described herein have numerous applications.
- the following provides an example demonstrating the application of the instant technology to a particular context.
- the method described above was used to sequence the transcriptomes of a population of differentiating human adipose tissue-derived stromal/stem cells (hASCs) at eight different time points (day 0, day 1, day 2, day 3, day 5, day 7, day 9, and day 14).
- Visual inspection of these cells indicates that differentiation over time is incomplete, thus leading to a heterogeneous cell population (Figure 1).
- Figure 3 depicts the design of the sequencing library incorporating the two levels of barcoding (well/cell and plate), the UMI, and the primer sequences indicated as P5 and P7 for Illumina sequencing.
- P5 and P7 are the regions that anneal to their complementary oligos on the flow cell.
- the index (i7) represents the plate index that is added during the Nextera tagmentation process after all wells have been pooled and pre-amplified. It is incorporated by PCR during the last step of the library preparation.
- One i7 index is used per pool/plate of 96 or 384 samples/cells, allowing for a higher level of multiplexing by pooling several plates together for sequencing.
- the sequencing primers P5 and P7 initiate the sequencing reaction. The sequencing will result in 3 distinct reads.
- the first one is 16bp long and includes 6bp of the well/cell barcode followed by 10bp of the UMI. Then the i7 index sequencing primer allows us to read the plate/pool index (i7, 8bp) on the same strand. Finally, the other strand is generated (paired-end sequencing) and the read 2 sequencing primer allows us to read the actual cDNA fragment, which is typically 45bp with a 50 cycle kit.
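The layout of the first read (a 6bp well/cell barcode followed by a 10bp UMI) suggests a simple way to split it in software. The following is a minimal sketch; the type and function names are hypothetical, and the example UMI sequence is made up.

```python
from typing import NamedTuple

WELL_BARCODE_LEN = 6   # from the text: 6bp well/cell barcode
UMI_LEN = 10           # from the text: 10bp UMI


class Read1(NamedTuple):
    well_barcode: str
    umi: str


def split_read1(seq):
    """Split the first read into its well barcode and UMI portions."""
    needed = WELL_BARCODE_LEN + UMI_LEN
    if len(seq) < needed:
        raise ValueError(f"read 1 must be at least {needed} bases long")
    return Read1(
        well_barcode=seq[:WELL_BARCODE_LEN],
        umi=seq[WELL_BARCODE_LEN:needed],
    )


# Hypothetical read: a designed barcode followed by an arbitrary UMI.
parsed = split_read1("CAGGCC" + "ACGTACGTAC")
```

Any bases beyond position 16 (for example, the seventeenth cycle mentioned for some runs) are ignored by this sketch.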
- the disclosure provides a polynucleotide as set forth on Figure 3 (e.g., a polynucleotide comprising various polynucleotide portions, such as contiguous portions, as set forth in Figure 3).
- the various portions are described herein and the figure contemplates polynucleotides comprising any combination of these various portions. Expression values were correlated by comparing raw read counts to UMI counts (Figure 4). Incorporating and counting UMIs helped to reduce the PCR bias.
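The difference between raw read counts and UMI counts can be illustrated with a short sketch. It assumes each sequenced read has already been reduced to a (well barcode, UMI, gene) record; that record format, the function name, and the example data are assumptions for illustration.

```python
from collections import defaultdict


def count_reads_and_umis(records):
    """Return (read_counts, umi_counts), each keyed by (well, gene).

    records: iterable of (well_barcode, umi, gene) tuples.
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for well, umi, gene in records:
        read_counts[(well, gene)] += 1
        umis_seen[(well, gene)].add(umi)
    umi_counts = {key: len(umis) for key, umis in umis_seen.items()}
    return dict(read_counts), umi_counts


# Hypothetical records: three PCR duplicates of one GAPDH molecule,
# one read from a second GAPDH molecule, and one LPL read in another well.
records = [
    ("CAGGCC", "AAAAAAAAAA", "GAPDH"),
    ("CAGGCC", "AAAAAAAAAA", "GAPDH"),
    ("CAGGCC", "AAAAAAAAAA", "GAPDH"),
    ("CAGGCC", "CCCCCCCCCC", "GAPDH"),
    ("CAGGGG", "GGGGGGGGGG", "LPL"),
]
read_counts, umi_counts = count_reads_and_umis(records)
```

In this example the four GAPDH reads collapse to two distinct molecules, which is how counting UMIs dampens amplification bias.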
- Although GAPDH usually is present at a constant level of expression in a population of cells, when observed at the single cell level, a significant portion of cells were seen that did not express GAPDH, because GAPDH is a cell cycle-regulated gene.
- GAPDH is not necessarily a good reference gene, especially at the single cell level. This underscores the power of the single cell sequencing methods of the invention.
- Projections of three of the highest components of a principal component analysis based on gene expression are shown in Figures 8 to 13. Each point represents a profiled cell. The cells profiled at day 0 are represented in black, while the cells profiled at the subsequent time points (day 1, day 2, day 3, day 7, and day 14) are shown in gray (or in red if depicted in color). A clear distinction can be seen between the day 0 cells and the cells from subsequent time points. To explore these differences, a Gene Ontology analysis then was performed on the differentially expressed genes between two subpopulations distinguishable at day 14 with the principal component analysis: a subpopulation of genes that clusters with day 0 genes and a subpopulation that is separate from those genes.
- the invention provides a useful method for single cell sequencing and single transcript tracking that uses the aggregation of samples and subsequent deconvolution of data. Through this process of aggregation and deconvolution, the sequencing can be performed with less cost and greater efficiency than by traditional sequencing techniques. Moreover, the results obtained here reflect the ability to detect changes and differences across heterogeneous populations when those populations are evaluated at the single cell level. Such changes and differences may be lost (e.g., averaged out) if gene expression across the heterogeneous population is instead evaluated.
- Example 3 Simultaneous single cell sequencing of 12,832 cells [0110]
- single cell sequencing methods and compositions e.g., reagents, nucleic acids, kits
- a primary human adipose-derived stem/stromal cell (hASC) differentiation system was used as a test system, akin to that described above.
- the resulting data reveal the major axes of variation on gene expression, suggest a biological basis for the morphological heterogeneity observed in these cultures, and provide a rich resource for dissection of the regulatory networks involved in adipocyte formation and function beyond what investigations using other techniques have shown.
- identification of rare expression programs can be enabled by deeper and more sensitive profiling of every cell, and direct comparison of in vitro and in vivo heterogeneity can be observed through direct profiling of single cells from tissue samples.
- the cultures were then induced to differentiate towards an adipogenic fate after reaching 80% confluency (differentiations D1 and D2) or two days after reaching 100% confluency (differentiation D3) by switching from growth medium to the StemPro adipogenesis differentiation medium (Life Technologies), and were subsequently prepared for further analysis, such as by qPCR or smFISH.
- the differentiation medium was changed every three days for up to 14 days.
- the variation in initial conditions was introduced to assess the robustness of the subsequent time course data.
- 384-well SBS capture plates were filled with 5 µl of a 1:500 dilution of Phusion HF buffer (New England Biolabs) in water and cells were then sorted into each well using a FACSAria II flow cytometer (BD Biosciences) based on Hoechst DNA staining. After sorting, the plates were immediately sealed, spun down, cooled on dry ice, and stored at -80 °C. For lipid content-based FACS, cells were also stained with HSC LipidTOX Neutral Lipid Stain (Life Technologies) and sorted according to their relatively "high" or "low" lipid content, either by taking the top and bottom 20% of stained cells (D2) or the top and bottom 50% (D3).
- cDNA from 384 wells was pooled together and purified and concentrated using a single DNA Clean & Concentrator-5 column (Zymo Research). Pooled cDNAs were treated with an exonuclease, in this example Exonuclease I (New England Biolabs), and subsequently amplified by single primer PCR using the Advantage 2 Polymerase Mix (Clontech) and the SINGV6 primer (10 pmol, Integrated DNA Technologies) (5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19)).
- the resulting sequencing library was purified with Agencourt AMPure XP magnetic beads (0.6x, Beckman Coulter), size selected (300-800bp) on an E-Gel EX Gel, 2% (Life Technologies), purified using a QIAquick Gel Extraction Kit (Qiagen) and quantified on a Qubit 2.0 Fluorometer using a dsDNA HS Assay (Life Technologies).
- Digital gene expression (DGE) libraries for sequencing were prepared from 10 ng of extracted total RNA, using the protocol described above for single cells, with the exception of using more concentrated template-switching and barcoded nucleic acids (10 pmol) and a version of the cDNA synthesis primer that did not contain the well-specific 6bp barcodes but instead a 16bp UMI (Integrated DNA Technologies) (5'-
- Probes targeting LPL, G0S2 and TCF25 transcripts were synthesized as amine-conjugated oligonucleotides and then labelled with Cy5 (GE Healthcare), Alexa Fluor 594 (Molecular Probes) or 6-TAMRA (Molecular Probes).
- Hybridizations and washes were performed using modifications to previously described procedures (see, e.g., Bienko et al., Nat. Methods 10:122-124 (2013) and Raj et al., Nat. Methods 5:877-879 (2008)).
- Prior to hybridizations, lipids were extracted by incubation of fixed cells in 2:1 chloroform:methanol for 30 min at room temperature. Cells were washed quickly with 70% ethanol and then resuspended in 200 µl RNA Hybridization buffer containing 2x SSC buffer, 25% Formamide, 10% Dextran Sulphate (Sigma), E. coli tRNA (Sigma), Bovine Serum Albumin (Ambion), Ribonucleoside Vanadyl Complex and 150 ng of each desired probe set (the mass refers only to pooled oligonucleotides, excluding fluorophores, and is based on absorbance measurements at 260 nm).
- Hybridizations were performed for 16-18 h at 30 °C, after which cells were washed twice for 30 min at 30 °C in RNA Wash buffer (containing 2x SSC buffer, Formamide 25% (Ambion) and 100 ng/ml DAPI).
- All second sequence reads were aligned to a reference database containing all human RefSeq mRNA sequences (obtained from the UCSC Genome Browser hg19 reference set), the human hg19 mitochondrial reference sequences and the ERCC RNA spike-in reference sequences, using bwa version 0.7.4 4 with non-default parameter "-l 24".
- Read pairs for which the second read aligned to a human RefSeq gene were kept for further analysis if 1) the initial six bases of the first read all had quality scores of at least 10 and corresponded exactly to a designed well-barcode and 2) the next ten bases of the first read (the UMI) all had quality scores of at least 30.
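The two first-read criteria above (an exact barcode match with all six base qualities at least 10, and all ten UMI base qualities at least 30) can be sketched as a filter function. This is an illustration under assumptions: quality scores are taken to be already decoded to integer Phred values, and the function name, argument names, and example reads are hypothetical.

```python
def passes_read1_filter(seq, quals, designed_barcodes,
                        barcode_len=6, umi_len=10,
                        min_barcode_q=10, min_umi_q=30):
    """Apply the barcode and UMI quality criteria to one first read.

    seq: base string; quals: list of integer Phred scores, one per base.
    """
    needed = barcode_len + umi_len
    if len(seq) < needed or len(quals) < needed:
        return False
    barcode = seq[:barcode_len]
    barcode_quals = quals[:barcode_len]
    umi_quals = quals[barcode_len:needed]
    return (
        barcode in designed_barcodes
        and all(q >= min_barcode_q for q in barcode_quals)
        and all(q >= min_umi_q for q in umi_quals)
    )


designed = {"CAGGCC", "CAGGGG"}  # two barcodes from the Table 1 fragment
read = "CAGGCC" + "ACGTACGTAC"   # hypothetical read 1

kept = passes_read1_filter(read, [20] * 6 + [35] * 10, designed)
low_umi = passes_read1_filter(read, [20] * 6 + [35] * 9 + [29], designed)
bad_barcode = passes_read1_filter("CAGGCA" + "ACGTACGTAC",
                                  [20] * 6 + [35] * 10, designed)
```

The requirement that the second read align to a RefSeq gene is a separate check and is not modeled here.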
- DGE Digital gene expression
- the UMI counts for each gene in the remaining wells were then normalized by dividing by the sum of UMI counts across all genes in the same well. This normalization removes variation from differences in RNA content per cell and can be revisited for analyses that are sensitive to this phenomenon.
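The per-well normalization described above amounts to dividing each gene's UMI count by the well total. A minimal sketch follows; the function name and the example counts are hypothetical.

```python
def normalize_well(umi_counts):
    """Convert one well's per-gene UMI counts into fractions of the well total."""
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("well has no UMI counts")
    return {gene: count / total for gene, count in umi_counts.items()}


# Hypothetical UMI counts for a single well.
well = {"GAPDH": 2, "LPL": 6, "G0S2": 2}
fractions = normalize_well(well)
```

Because every well sums to one after this step, differences in total RNA content per cell are removed, which is the trade-off the text flags for analyses that depend on that quantity.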
- Pairwise Pearson correlations between genes across single cells and their associated p-values were computed using the scikit-learn metrics.pairwise_distances function.
- the 5% false discovery rate (FDR) thresholds were estimated from the p-value distribution using the Benjamini-Hochberg-Yekutieli procedure.
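A step-up FDR threshold of the kind named above can be sketched in a few lines. The source does not say which variant was used, so this sketch offers both as an assumption: the plain Benjamini-Hochberg rule, and the Yekutieli correction for dependent tests (which divides the threshold by the harmonic sum over the number of tests). The function name and the example p-values are hypothetical.

```python
def fdr_threshold(p_values, alpha=0.05, dependent=False):
    """Largest p-value rejected by the step-up procedure, or None if none is.

    dependent=False: Benjamini-Hochberg.
    dependent=True:  Benjamini-Yekutieli (adds the harmonic-sum correction).
    """
    m = len(p_values)
    if m == 0:
        return None
    c_m = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    threshold = None
    # Walk the sorted p-values; the last rank k whose p-value satisfies
    # p_(k) <= alpha * k / (m * c_m) defines the rejection threshold.
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * k / (m * c_m):
            threshold = p
    return threshold


# Hypothetical p-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
bh = fdr_threshold(p_values, alpha=0.05)
by = fdr_threshold(p_values, alpha=0.05, dependent=True)
```

On these example values the corrected variant is stricter, as expected, because gene-gene correlation tests are not independent of one another.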
- the expected null distributions of pairwise correlation coefficients were estimated by permuting expression values across cells from the same time point and re-computing the pairwise correlations 100 times.
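The permutation approach above can be sketched for a single gene pair. This is a simplified illustration, not the source's implementation: it assumes the two input lists hold expression values for cells from one time point, shuffles one gene's values across those cells 100 times, and recomputes the Pearson correlation each time. Function names, the seed, and the example data are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def permuted_null(x, y, n_permutations=100, seed=0):
    """Null correlations obtained by shuffling y across cells."""
    rng = random.Random(seed)
    shuffled = list(y)
    null = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null.append(pearson(x, shuffled))
    return null


# Hypothetical normalized expression of two genes in eight cells
# collected at the same time point.
gene_a = [0.1, 0.4, 0.0, 0.3, 0.9, 0.2, 0.5, 0.7]
gene_b = [0.2, 0.5, 0.1, 0.2, 0.8, 0.3, 0.4, 0.9]
observed = pearson(gene_a, gene_b)
null = permuted_null(gene_a, gene_b, n_permutations=100)
```

Comparing `observed` against the spread of `null` gives a sense of how large a correlation would arise by chance within one time point.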
- PCA Principal component analyses
- GSEA Gene set enrichment analyses
- hASC cultures were collected just prior to induction of differentiation (day 0), as well as at seven time points after induction (days 1, 2, 3, 5, 7, 9 and 14). At day 14, approximately two thirds of the cells contained clearly visible lipid droplets while the remainder retained a more fibroblast-like morphology.
- a nucleic acid stain was used to identify and sort intact single cells into 384-well plates with a fluorescence-activated cell sorter.
- a neutral lipid stain also was used to separately sort single cells based on their lipid contents. This method allowed us to combine the advantages of FACS sorting, such as staining cells using, for example, a DNA stain or a lipid stain, and selecting specific cells to profile.
- DGE survey-depth digital gene expression
- PCI PC metagene
- PC2 was high only in cells collected from day 0, effectively separating these from the differentiating cells. It showed a strong positive association with expression of genes required for progression through the mitotic cell cycle and, to a lesser extent, with genes associated with non-adipogenic differentiation. A decrease in PC2 may therefore reflect an exit from the cell cycle and lineage commitment.
- PC3 was high during the first two days post-induction, but steadily decreased as the cells approached day 14. This decrease was associated with up-regulation of lipid homeostasis pathways and markers of adipocyte maturation.
- PC4 showed a transient drop at day 1, which was associated with increased expression of genes known to be rapidly induced by adipogenic cocktails, including early adipogenic regulators CEBPB and CEBPD 11. PC4 may therefore reflect an early response to induction of differentiation.
- a population cells or tissues e.g., cell or tissue lysates
- RNA sequencing using a 3' digital gene expression method allows the profiling of a high number of samples in a cost-efficient manner.
- the protocol is robust for a broad range of input from single cells to pooled cells or extracted RNA. It allows the profiling of a large number of samples of extracted RNA (patient samples for example), profiling of a population of small number of cells (e.g., cell or tissue lysates), as well as analysis of sorted, single cells.
- the use of the barcodes and UMIs described herein permits the tracking of individual transcripts to a specific multi-well plate and to a specific well of that plate, thus permitting correlation of data to the original starting material.
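Tracing a read back to its plate and well is a two-level lookup: the plate index identifies the plate and the well barcode identifies the well. The sketch below illustrates this; the index sequences, plate names, well labels, and function name are all hypothetical placeholders rather than values from the source (only the two six-nucleotide barcodes appear in the Table 1 fragment).

```python
def sample_origin(plate_index, well_barcode, plate_names, well_names):
    """Map a (plate index, well barcode) pair to its (plate, well) labels."""
    if plate_index not in plate_names:
        raise KeyError(f"unknown plate index: {plate_index}")
    if well_barcode not in well_names:
        raise KeyError(f"unknown well barcode: {well_barcode}")
    return plate_names[plate_index], well_names[well_barcode]


# Hypothetical 8bp plate indices and well labels.
plate_names = {"AAAACCCC": "plate_1", "GGGGTTTT": "plate_2"}
well_names = {"CAGGCC": "well_255", "CAGGGG": "well_256"}

origin = sample_origin("GGGGTTTT", "CAGGCC", plate_names, well_names)
```

Because the two levels are independent, the same set of well barcodes can be reused on every plate.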
- the above examples are indicative of the powerful applications of the technology.
- the ability to correlate expression analysis to a particular well of a multi-well plate is critical in the screening assay context, regardless of whether the material in the screen is a single cell or lysate. Because the bar codes and UMI allow tracking of individual transcripts, sequencing reactions can be run as massive multiplex reactions rather than a series of individual reactions without losing transcript-level data. This results in a significant increase in efficiency and decrease in cost.
- the sequencing data then can be deconvoluted using, for example, 3' digital gene expression to count the number of occurrences of bar code and UMI sequences and obtain an expression level for a particular transcript.
- the methods and reagents described herein also are adaptable to other platforms, e.g., microfluidic systems such as Fluidigm's C1 microfluidic device. For example, the capture of 96 cells was performed on the C1 chip, and the reagents and adapters to prepare the cDNA were incorporated directly on the C1 chip. cDNAs were retrieved as an output of the C1 chip, pooled, and prepared as a Nextera library.
- the nucleic acids, methods, and kits of the invention also provide the ability to profile single cells for which it is not possible to do an individual RNA extraction and purification, or, by working directly with lysates, profiling a high number of conditions under which cells are cultivated without necessarily performing a separate RNA extraction and purification step (e.g., if sequencing cells from a high throughput compound screen, it is unnecessary to extract and purify the RNA from each well individually).
- one or more of the following modifications to the protocol or reagents used were and can optionally be employed.
- another reverse transcriptase can be used, such as the MMLV Maxima H Minus Reverse Transcriptase (Thermo Scientific).
- numerous different MMLV reverse transcriptases have been successfully used and can be selected based on user preference, cost, availability and the like.
- a proteinase or protease such as proteinase K, may be added during lysis.
- proteinase K is included as part of lysis for sorted single cells and isolated cells/lysates.
- RNA sequencing of lysates inputs ranged from single cells to 10,000 cells (including tens or hundreds of cells). For pooled cells, more concentrated proteinase K (2mg/ml instead of 1mg/ml for single cells) was used, and the cells were incubated longer (one hour at 50 °C instead of 15 minutes) to increase lysis efficiency.
Full length cDNA amplification
- Amplify full length cDNA by single primer PCR using the Advantage 2 PCR Enzyme System (Clontech, #639206).
- the PCR reaction is as follows: 20 µL of cDNA from previous step, 5 µL of 10X Advantage 2 PCR buffer, ⁇ of dNTPs, ⁇ of the SINGV6 primer ( ⁇ , Integrated DNA Technologies), ⁇ of Advantage 2 Polymerase Mix, and 22 µL of Nuclease-Free Water.
Full length cDNA purification and quantification
- [0145] Purify the full length cDNAs with 30 µL of Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880). Elute the full length cDNAs in 12 µL of Nuclease-Free Water and quantify on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies, #Q32851).
Sequencing Library Quality Assessment
- [0148] Quantify the library on the Qubit 2.0 Fluorometer using the dsDNA HS Assay.
- the quality and average size of the library can be assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
- Sequencing can be performed on any Illumina HiSeq or MiSeq, using the standard Illumina sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first end, then 8 cycles to decode the Nextera barcode and finally 46 cycles. Up to twelve Nextera libraries/384-well capture plate, each comprising 384 cells, can be multiplexed together (twelve i7 barcodes currently available) allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
- sequences are provided below and herein. Such sequences are merely illustrative of various polynucleotides and components useful in the methods of the present invention. These polynucleotides are suitable across any of the various sample types described herein (e.g., single cells, lysates, bulk RNA, etc.).
- V: (A, G, or C); N: (A, G, C, or T)
Abstract
The present invention relates generally to methods for single-cell nucleic acid profiling, and nucleic acids useful in those methods. For example, it concerns using barcode sequences to track individual nucleic acids at single-cell resolution, utilizing template switching and sequencing reactions to generate the nucleic acid profiles. These methods and compositions are also applicable to other starting materials, such as cell and tissue lysates or extracted/purified RNA.
Description
High-throughput RNA-seq
Related Application
[0001] This application claims priority and benefit from U.S. Provisional Patent Application No. 61/834,163, filed June 12, 2013, the contents and disclosures of which are hereby incorporated by reference in their entirety.
Field of the Invention
[0002] The present invention relates generally to methods for single-cell nucleic acid profiling, and nucleic acids useful in those methods. In some embodiments, it concerns using barcode sequences to track individual nucleic acids at single-cell resolution, utilizing template switching and sequencing reactions to generate the nucleic acid profiles. In addition to the substantial utility in single cell profiling, the methods and compositions provided herein are also applicable to other starting materials, such as cell and tissue lysates or extracted/purified RNA.
Background of the Invention
[0003] Although transcriptome profiling is an important method for functional characterization of cells and tissues, current technical limitations for whole transcriptome analysis limit the technique to either population averages or to a limited number of single cells. These shortcomings limit transcriptome profiling's ability to accurately assess stochastic variation in gene expression between individual cells and the analysis of distinct subpopulations of cells, both of which have been proposed to be important factors driving cellular differentiation and tissue homeostasis. Moreover, current single-cell transcriptome profiling methods, in addition to being limited to a relatively low number of cells, are also expensive and labor-intensive. Improved methods are therefore required to fully characterize a cell population at single-cell resolution. Such improved methods also have utility in improving analysis of other starting materials, such as cell and tissue lysates or extracted/purified RNA.
Summary of the Invention
[0004] In some embodiments, the invention provides a nucleic acid comprising a 5' poly-isonucleotide sequence (for example, comprising an isocytosine, an isoguanosine, or both, such as an isocytosine-isoguanosine-isocytosine sequence), an internal adapter sequence, and a 3' guanosine tract. The 3' guanosine tract can comprise two guanosines, three guanosines, four guanosines, five guanosines, six guanosines, seven guanosines, or eight guanosines. In certain embodiments, the 3' guanosine tract comprises three guanosines. The adapter sequence can be 12 to 32 nucleotides in length, for example, 22 nucleotides in length (e.g., an adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)).
[0005] In some embodiments, the invention provides a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine. In certain embodiments, the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length (e.g., an internal adapter sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1)). In certain embodiments, the barcode sequence is 4 to 20 nucleotides in length, for example, 6 nucleotides in length. In certain embodiments, the UMI sequence is six to 20 nucleotides in length, for example, ten nucleotides in length. In some embodiments, the complementarity sequence is a poly(T) sequence, and may be 20 to 40 nucleotides in length, for example, 30 nucleotides in length.
[0006] In some embodiments, the invention provides a kit comprising one or more nucleic acids as described above, for example a) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, b) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine, or c) both. In certain embodiments, the kit comprises a plurality of the nucleic acids of b). In further embodiments, the UMI sequence of each nucleic acid in the plurality of nucleic acids is unique among the nucleic acids in the kit, and in still further embodiments, the plurality of nucleic acids comprises different populations of nucleic acid species. In such embodiments, each population of nucleic acid species may comprise a different barcode sequence that uniquely identifies a single population of nucleic acid species. In certain embodiments, each population of nucleic acid species is in a separate container, and the bar code of each population of nucleic acid species differs by at least two nucleotides from the bar code of each other population of nucleic acid species.
[0007] A kit of the invention may further comprise a third nucleic acid primer comprising 12 to 32 nucleotides (e.g., 22 nucleotides in length) and a 5' blocking group (e.g., biotin or an inverted nucleotide). An exemplary sequence of such a primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2). A kit may further comprise a nucleic acid comprising a barcode sequence, and optionally also comprise a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond. In certain embodiments, the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length. An exemplary sequence of a phosphorothioate bond-containing nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*3' (SEQ ID NO: 3).
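The component order of the barcoded primer in paragraph [0005] (adapter, barcode, UMI, poly(T) complementarity sequence, and a 3' dinucleotide) can be made concrete with a small sketch that builds a degenerate primer pattern. Several things here are assumptions: the function name is hypothetical, the UMI is written as a run of N and the 3' dinucleotide as VN (V being A, G, or C and N being any base), the 5' blocking group is omitted, and the 22-nucleotide sequence quoted in the text is used only as a stand-in adapter even though the stated example length for this primer's adapter is 33 nucleotides.

```python
def barcoded_primer_pattern(adapter, barcode, umi_len=10, poly_t_len=30):
    """Concatenate the primer components 5' to 3' as a degenerate pattern."""
    if any(base not in "ACGT" for base in adapter + barcode):
        raise ValueError("adapter and barcode must contain only A, C, G, T")
    return adapter + barcode + "N" * umi_len + "T" * poly_t_len + "VN"


# Stand-in adapter: the 22-nucleotide sequence quoted in the text.
ADAPTER = "ACACTCTTTCCCTACACGACGC"

# CAGGCC is one of the six-nucleotide barcodes in the Table 1 fragment.
pattern = barcoded_primer_pattern(ADAPTER, "CAGGCC")
expected_length = len(ADAPTER) + 6 + 10 + 30 + 2
```

Writing the primer this way makes it easy to see which positions are fixed per well (the barcode) and which vary per molecule (the UMI).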
[0008] In some embodiments, the kit further comprises a capture plate and/or a reverse transcriptase enzyme, such as a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase (e.g., SMARTscribe™ reverse transcriptase or Superscript II™ reverse transcriptase or Maxima H Minus™ reverse transcriptase) and/or a DNA purification column, such as a DNA purification spin column, and/or a protease or proteinase (e.g., proteinase K).
[0009] In some embodiments, the invention provides a method for gene profiling, comprising a) providing a plurality of single cells; b) releasing mRNA from each single cell to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell; c) reverse transcribing the individual mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA produced from the separate cells; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments. In some alternative embodiments, the invention provides a method for gene profiling, comprising a) providing an isolated population of cells; b) releasing mRNA from the population of cells to provide one or more mRNA samples; c) reverse transcribing the one or more mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence; d) pooling and purifying the barcoded cDNA; e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; f) purifying the double-stranded cDNA; g) fragmenting
the purified cDNA; h) purifying the cDNA fragments; and i) sequencing the cDNA fragments.
[0010] In certain embodiments, the method further comprises separating a population of cells (e.g., by flow cytometry) to provide the plurality of single cells, for example, by separating them into a capture plate. In alternative embodiments, a population of cells can be sorted into a capture plate such that each well of the capture plate contains a smaller population of cells. Alternatively, cell lysate or RNA samples can be divided into a capture plate. In certain embodiments, the mRNA is released by cell lysis, for example, by freeze-thawing and/or contacting the cells with proteinase K. In certain embodiments, c) comprises contacting each individual mRNA sample with one or more nucleic acids as described above, for example i) a nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract, ii) a nucleic acid comprising a 5' blocking group (e.g., biotin or an inverted nucleotide), an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine, or iii) both. In certain embodiments, c) is carried out with a reverse transcriptase enzyme, for example, a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase such as SMARTscribe™ reverse transcriptase or Superscript II™ reverse transcriptase or Maxima H Minus™ reverse transcriptase. In certain embodiments, the cDNA purification of d) is carried out with a Zymo-Spin™ column.
[0011] In certain embodiments, the method further comprises treating the barcoded cDNA with an exonuclease, such as with Exonuclease I. In certain embodiments, the amplification of e) utilizes an amplification primer comprising a 5' blocking group, such as biotin or an inverted nucleotide. Exemplary
amplification primers are 12 to 32 nucleotides in length, for example, 22
nucleotides in length (e.g., as in the amplification primer having the sequence of 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2)). In certain embodiments, the purification of f) may be carried out with magnetic beads, e.g., Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880), and/or may further comprise quantifying the purified cDNA. In certain embodiments, the single cells are provided in a capture plate of individual wells (e.g., a 384 well plate), each well comprising a single cell. In alternative embodiments, a population of cells is provided in a capture plate, each well comprising a population of cells. Alternatively, cell lysate or RNA samples can be provided in a capture plate. It should be understood throughout that, when referring to identification of a particular sample, such as a sample in a well of a plate, that sample may be a single cell or some other sample, such as a lysate or bulk RNA. Thus, reference to a "well" or "sample" should be understood to refer to any of those types of samples. In certain embodiments, reference to "cell/well" or "well/cell" is similarly used to reflect that a sample may be a single cell or some other sample. When a sample is a single cell, identification of a well is equivalent to identification of a single cell. When the sample is something other than a single cell, identification of a well identifies the well in which that sample is provided but does not necessarily identify a single cell.
[0012] In certain embodiments, the fragmentation of g) utilizes a transposase, and may further utilize a first fragmentation nucleic acid and a second
fragmentation nucleic acid, wherein the first fragmentation nucleic acid comprises a barcode sequence. An exemplary first fragmentation nucleic acid is 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3' (SEQ ID NO: 4), wherein [i7] represents a barcode sequence. In some embodiments, the
[i7] sequence is four to 16 nucleotides in length, for example, eight nucleotides in length. In some embodiments, the [i7] sequence uniquely identifies a single population of nucleic acid species, for example, a population of nucleic acid species derived from a population of single cells from a capture plate. In some embodiments, the [i7] sequence is selected from: TCGCCTTA (SEQ ID NO: 5),
CTAGTACG (SEQ ID NO: 6), TTCTGCCT (SEQ ID NO: 7), GCTCAGGA (SEQ
ID NO: 8), AGGAGTCC (SEQ ID NO: 9), CATGCCTA (SEQ ID NO: 10), GTAGAGAG (SEQ ID NO: 11), CCTCTCTG (SEQ ID NO: 12), AGCGTAGC (SEQ ID NO: 13), CAGCCTCG (SEQ ID NO: 14), TGCCTCTT (SEQ ID NO: 15), and TCCTCTAC (SEQ ID NO: 16). In certain embodiments, the barcode sequence of the first fragmentation nucleic acid is different than the barcode sequence of the nucleic acid described in ii) above. In certain embodiments, the barcode sequence of the first fragmentation nucleic acid uniquely identifies a predetermined subset of cells, for example, a subset of cells contained in individual wells of a single capture plate. In further embodiments, the barcode sequence that uniquely identifies the predetermined subset of cells uniquely identifies the capture plate. In certain embodiments, the barcode sequence of the nucleic acid as described in ii) above uniquely identifies the cell within the predetermined subset of cells, which cell comprised the mRNA from which the barcoded cDNA of c) was produced. In further embodiments, the barcode sequence that uniquely identifies the cell within the predetermined subset of cells uniquely identifies an individual well in a capture plate, and in still further embodiments, the
combination of the barcode sequence that uniquely identifies the predetermined subset of cells and the barcode sequence that uniquely identifies the cell within a predetermined subset of cells uniquely identifies the capture plate and the individual well which comprised the cell, which cell comprised the mRNA from which the barcoded cDNA of c) was produced. In certain embodiments, the barcode sequence of the first fragmentation nucleic acid is 4 to 20 nucleotides in length, for example, 6 nucleotides in length. In certain embodiments, the second fragmentation nucleic acid is a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond. An exemplary second fragmentation nucleic acid is 48 to 68 nucleotides in length, e.g., 58 nucleotides in length, such as a second fragmentation nucleic acid with a sequence of 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3).
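Because each barcode is meant to identify its plate or well unambiguously, it can be useful to check how far apart the barcodes in a set are. The following is a minimal Python sketch, assuming Hamming distance is the measure of interest and using the exemplary [i7] sequences listed above; the function names are illustrative and are not part of the described method.

```python
from itertools import combinations

# Exemplary [i7] sequences listed in the text (SEQ ID NOs: 5 to 16).
I7_BARCODES = [
    "TCGCCTTA", "CTAGTACG", "TTCTGCCT", "GCTCAGGA",
    "AGGAGTCC", "CATGCCTA", "GTAGAGAG", "CCTCTCTG",
    "AGCGTAGC", "CAGCCTCG", "TGCCTCTT", "TCCTCTAC",
]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def close_pairs(barcodes, min_distance=2):
    """Pairs closer than the chosen minimum distance (hypothetical helper)."""
    return [
        (a, b)
        for a, b in combinations(barcodes, 2)
        if hamming(a, b) < min_distance
    ]


if __name__ == "__main__":
    print("minimum pairwise distance:", min_pairwise_distance(I7_BARCODES))
    print("pairs closer than 2:", close_pairs(I7_BARCODES))
```

A set in which every pair differs at two or more positions tolerates a single sequencing error without one barcode being read as another, which is presumably why a two-nucleotide minimum is specified for the well barcodes.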
[0013] In certain embodiments, the purification of h) is carried out with magnetic beads, and may optionally further comprise separating the magnetic-bead purified cDNA on an agarose gel, excising cDNA corresponding to 300 to 800 nucleotides in length, and purifying the excised cDNA. In certain embodiments, h) further comprises quantifying the purified cDNA. In certain embodiments, the sequencing of i) is carried out using RNA-seq. In certain embodiments, the method further comprises assembling a database of the sequences of the sequenced cDNA fragments of j), and may additionally comprise identifying the UMI sequences of the sequences of the database. In further embodiments, j) further comprises discounting duplicate sequences that share a UMI sequence, thereby assembling a set of sequences in which each sequence is associated with a unique UMI.
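The discounting of duplicate sequences that share a UMI can be sketched as follows. This is an illustrative Python example only: the record layout (well barcode, UMI, gene) and the choice to group by well and gene before collapsing UMIs are assumptions made for the sketch, and the barcode and UMI strings are made up.

```python
from collections import defaultdict


def count_unique_umis(records):
    """Collapse reads that share a UMI within the same well and gene.

    `records` is an iterable of (well_barcode, umi, gene) tuples; this
    layout is assumed for illustration.
    Returns {(well_barcode, gene): number of distinct UMIs}.
    """
    umis = defaultdict(set)
    for well, umi, gene in records:
        umis[(well, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads; sequences are placeholders.
reads = [
    ("AACGTG", "ACGTACGTAC", "LPL"),
    ("AACGTG", "ACGTACGTAC", "LPL"),  # duplicate: same well, gene, and UMI
    ("AACGTG", "TTGCATGCAA", "LPL"),
    ("GGTACA", "ACGTACGTAC", "LPL"),  # same UMI in another well counts separately
]
counts = count_unique_umis(reads)

if __name__ == "__main__":
    for key, n in sorted(counts.items()):
        print(key, n)
```

Counting distinct UMIs rather than reads means that a molecule amplified many times still contributes one count.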
[0014] In certain embodiments, a) through h) are repeated before i) to produce a plurality of populations of cDNA fragments, and in particular embodiments, the populations of cDNA fragments are combined prior to i). In certain embodiments, the barcode sequence of the first fragmentation nucleic acid and the barcode sequence of the nucleic acid as described in ii) above are used to correlate the sequencing data with the predetermined subset of cells and the individual cell.
Brief Description of the Drawings
[0015] Figure 1 depicts incomplete differentiation of human adipose tissue-derived stromal/stem cells (hASCs) in vitro. Figure 1A: cells at day 0. Figure 1B: cells at day 7 (i.e., on the seventh day after the cells were induced to differentiate). Figure 1C: cells at day 14 (i.e., on the fourteenth day after the cells were induced to differentiate).
[0016] Figure 2 depicts a flow chart of an exemplary method for single cell RNA sequencing.
[0017] Figure 3 depicts how a single cell digital gene expression library was constructed, including barcode sequences incorporating sequencing primer sequences, indicated by arrows, and regions that anneal to their complementary oligonucleotides on a flow cell during sequencing (P5 and P7). N6: cell/well
barcode index; N10: Unique Molecular Identifier (UMI). The sequencing primer with an i7 plate index is indicated by an arrow, and the two sequencing primers (read 1 and read 2) also are indicated by arrows.
[0018] Figure 4 depicts a reduction in PCR bias through the use of Unique Molecular Identifier (UMI) sequences.
[0019] Figure 5 depicts distributions of expression levels of the key marker genes FABP4 (Figure 5A), SCD (Figure 5B), LPL (Figure 5C), and POSTN (Figure 5D) during adipocyte differentiation. In particular, Figure 5 depicts the expression level of each gene across the cells/wells over time, such that the position on the y axis shows the level of expression and the thickness of the bar shows the number of cells expressing at that level.
[0020] Figure 6 depicts gene detection in single cells. Approximately 3,000 to 4,000 unique genes were detected per cell and approximately 15,000 unique genes were detected across all cells. Gene expression was reliably detected at approximately 25 to 50 transcripts per cell, although bursty transcription
(transcription occurring in pulses rather than at a constant rate) introduced additional variation.
[0021] Figure 7 depicts GAPDH detection at day 0. Figure 7A depicts a histogram showing the distribution of GAPDH expression among cells profiled at day 0 as an exemplification of a transcriptional burst. Figure 7B depicts genes associated with GAPDH. Figure 7C provides a pictorial representation of the cell cycle. GAPDH is considered to be a housekeeping gene and often is used as a reference gene for normalization.
[0022] Figure 8 depicts principal component analysis of an hASC population at day 0.
[0023] Figure 9 depicts principal component analysis of an hASC population at day 0 (black) and day 1 (gray).
[0024] Figure 10 depicts principal component analysis of an hASC population at day 0 (black) and day 2 (gray).
[0025] Figure 11 depicts principal component analysis of an hASC population at day 0 (black) and day 3 (gray).
[0026] Figure 12 depicts principal component analysis of an hASC population at day 0 (black) and day 7 (gray).
[0027] Figure 13 depicts principal component analysis of an hASC population at day 0 (black) and day 14 (gray).
[0028] Figure 14 depicts differentially expressed genes between day 0 (black) and day 14 (gray) hASC populations and between day 14 sub-populations.
[0029] Figure 15 depicts the expression of adipocyte genes correlating with G1 arrest. Genes that had similar expression levels at Day 14 and Day 0 (Figure 15A, label A) correspond to categories of genes involved in G1 arrest (Figure 15B, label A), indicating that those cells that did not fully differentiate may be stuck in the G0 phase. This reveals a correlation between differentiation state and cell cycle progression when gene expression is analyzed at the single cell level.
[0030] Figure 16 depicts the process of adipocyte differentiation in mouse (3T3-L1) and human (hASC) stem cells, and that an absence of clonal expansion of hASCs may limit adipogenesis.
[0031] Figure 17 depicts cell culture heterogeneity using single-cell sequencing. Figure 17A depicts gene expression estimates from bulk cells compared to their corresponding means across single cell profiles. UPM: unique molecular identifier (UMI) counts for one gene per million UMI counts for all genes. Figure 17B depicts the distribution of observed pairwise correlations (Pearson's r) between all pairs of genes that were detected in at least 10% of day 7 cells (n = 4,038 genes), as compared to an estimated null distribution obtained by permuting the expression values of each gene across the same cells. Figures 17C and 17D depict single cell qRT-PCR validation and single molecule FISH validation, respectively, of the observed positive correlation between the LPL and G0S2 markers from separate cells also collected at day 7.
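The permutation-based null distribution described for Figure 17B can be illustrated with a short Python sketch. It is an assumption-laden example, not the analysis actually used: it handles a single gene pair, shuffles one gene's values across cells, and uses hypothetical function names and a fixed random seed.

```python
import math
import random


def pearson(x, y):
    """Pearson's r for two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permuted_null(x, y, n_perm=1000, seed=0):
    """Correlations obtained after shuffling one gene's values across cells."""
    rng = random.Random(seed)
    shuffled = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(pearson(x, shuffled))
    return null


if __name__ == "__main__":
    # Made-up expression values for two genes across eight cells.
    gene_a = [0, 2, 5, 1, 9, 4, 7, 3]
    gene_b = [1, 2, 6, 0, 8, 5, 6, 2]
    observed = pearson(gene_a, gene_b)
    null = permuted_null(gene_a, gene_b)
    exceed = sum(abs(r) >= abs(observed) for r in null)
    print("observed r:", observed, "permutations at least as extreme:", exceed)
```

Shuffling breaks any cell-by-cell association between the two genes while keeping each gene's own distribution, so the permuted correlations indicate what would be expected by chance.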
[0032] Figure 18 depicts a comparison of RefSeq gene expression levels as estimated from the total number of raw aligned sequencing reads or the total number of unique UMIs. Each dot compares the mean raw counts across all profiled cells in the first time course (D1) to the mean UMI counts for the same gene. The raw and UMI counts are strongly correlated, but the UMI counts correct for a systematic bias in the raw expression levels of a subset of genes, which is likely caused by preferential PCR amplification or sequencing.
[0033] Figure 19 depicts the relationship between the proportion of cells where a gene was detected (UMI count > 1) and its estimated expression level from bulk RNA profiling. Data is shown for day 0 of the D3 differentiation time course. Solid line: medians; top and bottom dotted lines: 90th and 10th percentiles, respectively. UPM = UMI counts for a gene per million UMI counts from all genes.
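The two quantities used in this figure, UPM and the proportion of cells in which a gene was detected, can be computed as in the following Python sketch. The data layout (one dictionary of gene-to-UMI-count per cell) and the function names are assumptions for illustration; the detection threshold is passed explicitly so that it can follow whichever cutoff is intended.

```python
def upm(umi_counts):
    """Scale per-gene UMI counts to counts per million total UMI counts."""
    total = sum(umi_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in umi_counts}
    return {gene: count * 1_000_000 / total for gene, count in umi_counts.items()}


def detection_fraction(cells, gene, threshold):
    """Fraction of cells whose UMI count for `gene` exceeds `threshold`.

    `cells` is a list of {gene: UMI count} dictionaries, one per cell
    (an assumed layout).
    """
    if not cells:
        return 0.0
    detected = sum(1 for cell in cells if cell.get(gene, 0) > threshold)
    return detected / len(cells)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    bulk = {"LPL": 1, "G0S2": 3}
    print(upm(bulk))
    cells = [{"LPL": 2}, {"LPL": 1}, {"LPL": 0}, {}]
    print(detection_fraction(cells, "LPL", threshold=1))
```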
[0034] Figure 20 depicts a comparison of single-molecule RNA sequencing (Figure 20A) and single molecule FISH (smFISH, Figure 20B) data for LPL and G0S2 during the D3 time course. Single-molecule RNA sequencing values are in UPM, while smFISH measurements are in mRNAs detected per cell. The smFISH data confirm the positive correlation between LPL and G0S2 after 7 days of differentiation. R: Pearson's correlation coefficient.
[0035] Figure 21 depicts gene expression dynamics at single cell resolution. Each scatter plot depicts the first three principal components (PCs) of the initial hASC time course at the indicated time point (Figure 21A: day 0; Figure 21B: day 1; Figure 21C: day 2; Figure 21D: day 3; Figure 21E: day 5; Figure 21F: day 7; Figure 21G: day 9; Figure 21H: day 14). Black dots show cells collected at the indicated time point, while gray dots show cells collected at all previous time
points. Figure 21I depicts separately sorted cells with high and low lipid content from day 14 projected into the same PC space.
[0036] Figure 22 depicts distributions of weights for the top four PCs in an initial hASC time course and a lipid-based sorting. To the right of the gene expression data, selected genes and gene sets associated with positive and negative weights are provided. Percentages indicate the ratio of the total variance in the data set captured by each PC. Horizontal lines within each set of boxes indicate medians, boxes indicate the 1st and 3rd quartiles, and whiskers indicate the ranges.
Detailed Description of the Invention
[0037] The present invention provides nucleic acids, kits, and methods for transcriptome-wide profiling at single cell resolution. In some embodiments, the invention provides Unique Molecular Identifiers (UMIs) (e.g., polynucleotides comprising UMIs) that specifically tag individual cDNA species as they are created from mRNA, thereby acting as a robust guard against amplification biases. Each UMI enables a sequenced cDNA to be traced back to a single particular mRNA molecule that was present in a cell. In some embodiments, the invention provides two levels of barcode-based multiplexing, allowing a sequenced cDNA to be traced to a particular cell from among a subset of cells. In some embodiments, the invention provides efficient transposon-based fragmentation, resulting in high yield cDNA libraries. In some embodiments, the invention provides sequencing of the 3'-end of mRNAs, limiting the sequencing coverage required to assess the gene expression level of each single cell transcriptome. The methods allow the preparation of RNA-seq libraries in a manner that is not labor-intensive or time-consuming. Indeed, RNA-seq libraries of a thousand single cells can be easily prepared in two days. Any of the foregoing (or any of the nucleic acids, reagents, kits, and methods described herein) may be provided and/or used alone or in any combination.
[0038] The foregoing is also applicable to populations of cells, cell lysates, tissue lysates, and/or extracted/purified RNA. For example, the invention also provides nucleic acids, kits, and methods for sequencing of extracted/purified RNA (bulk RNA sequencing) or for analysis of an isolated population of cells (e.g., from an isolated population of cells or a tissue; analysis of a cell or tissue lysate). In certain embodiments, any of the compositions, reagents, and methods described herein as applicable to single cells also are applicable to other sources of starting materials, such as extracted RNA, purified RNA, cell lysates, or tissue lysates, and such application is contemplated. In certain embodiments, any of the
compositions, reagents, and methods described herein as applicable to extracted RNA, purified RNA, cell lysates or tissue lysates, also are applicable to single cells, and such application is contemplated.
[0039] The present invention provides improved nucleic acids, kits, and methods capable of transcriptome-wide profiling at single cell resolution of tens of thousands of cells simultaneously and cost-effectively (approximately $2 per sample, as compared to approximately $80 per sample with a current method). In certain embodiments, the methods and kits may include both customized nucleic acids and/or method steps that are themselves the subject of this application, as well as one or more commercially available reagents, kits, apparatuses, or method steps. The methods of the invention provide a number of distinct advantages over existing methods. Some current methods require a polyA addition step prior to sequencing, but this step can be eliminated through the use of a Moloney Murine Leukemia Virus reverse transcriptase. Moreover, full-length cDNA amplification can be carried out using the suppression PCR principle, thereby enriching full length cDNAs, and the method can be applied directly to cells rather than requiring
RNA extraction first.
[0040] The methods of the invention also provide an advantage in that they utilize at least two barcode sequences rather than one, allowing for the
simultaneous sequencing of at least 4,608 single-cell transcriptomes in a single lane, as compared to only 96 transcriptomes in current methods. Still further,
optimization of reaction volumes can conserve expensive reagents, such as the reverse transcriptase enzyme, reducing costs. Additionally, by utilizing 3' end digital sequencing, less sequencing coverage is needed to determine gene expression levels, further reducing costs.
[0041] The methods of the invention provide an advantage over current methods targeting the 3' end of mRNA that use linear mRNA amplification. Linear mRNA amplification is time-consuming compared to template switching/suppression PCR amplification. Linear mRNA amplification also is labor-intensive and limits the number of cells that can be processed to approximately 50 cells per day by a single person. By contrast, the methods of the invention can accommodate 384 cells in a single plate, allowing a single person to easily process up to 1152 cells per day.
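The throughput figures quoted above follow from simple arithmetic on the plate size and the number of barcodes. The Python sketch below shows one way to arrive at them; treating 4,608 as 384 wells multiplied by the 12 exemplary [i7] plate indexes, and 1152 as three 384-well plates, is an inference from numbers given elsewhere in the text rather than an explicit statement, and the function names are illustrative.

```python
WELLS_PER_PLATE = 384  # capture plate size given in the text
PLATE_INDEXES = 12     # number of exemplary [i7] sequences (SEQ ID NOs: 5 to 16)


def transcriptomes_per_lane(wells_per_plate, plate_indexes):
    """Distinct (plate index, well barcode) combinations available per lane."""
    return wells_per_plate * plate_indexes


def possible_barcodes(length, alphabet_size=4):
    """Number of distinct sequences of a given length over A, C, G, T."""
    return alphabet_size ** length


def plates_needed(cells, wells_per_plate):
    """Plates required to hold `cells` single cells (integer ceiling)."""
    return -(-cells // wells_per_plate)


if __name__ == "__main__":
    print(transcriptomes_per_lane(WELLS_PER_PLATE, PLATE_INDEXES))  # 4608
    print(possible_barcodes(6))                                     # 4096
    print(plates_needed(1152, WELLS_PER_PLATE))                     # 3
```

A 6-nucleotide well barcode allows 4,096 possible sequences, which leaves ample room to choose 384 that are well separated from one another.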
[0042] The use of UMIs also provides a distinct advantage over typical single-cell RNA-seq methods. Because of the very low starting amount of RNA in a single cell, several amplification steps are required during the process of the RNA-seq library preparation, and the UMIs protect against amplification biases.
[0043] The methods of the invention utilizing a transposase-based sequencing library preparation have the added advantage of eliminating a number of labor-intensive and costly steps in library preparation, including magnetic bead immobilization, separate fragmentation, end repair, dA-tailing, and adaptor ligation. By eliminating the separate steps of chemical fragmentation and its purification, end repair, dA-tailing and adapter ligation, labor and cost are reduced, and the yield is much higher than with other techniques because there are fewer purification steps (during which material can be lost) and because this method to tag the fragment is much more efficient than by ligation with a regular ligase. Because less material is lost in the process, the methods of the invention can start with a much lower amount of starting cDNA. This is beneficial because even when combining and amplifying cDNA from 384 cells, there is often a low starting amount of cDNA to begin the library preparation.
[0044] The invention provides methods that are advantageous based on a number of improvements to existing methods. A typical method provided by the invention is depicted in Figure 2, and starts with preparing a capture plate for cell sorting. Cells are then sorted into the plate (e.g., by fluorescence activated cell sorting), after which the plate may be frozen down for storage. For single cell analysis, one cell is sorted into each well of the plate. One advantage of the nucleic acids provided herein is that the use of various barcodes permits the end user to correlate transcript expression back to a particular well and plate, and thus to a specific cell evaluated. To lyse the cells, the plate can, in certain embodiments, be thawed from its frozen state. Optionally, a proteinase or protease, such as proteinase K, is added to the cells to increase the efficiency of the lysis. If performing bulk RNA-seq, the cell sorting and individual cell lysis steps can be skipped, as the starting material is already RNA. If the starting material is a population of cells, the population can be divided into a multi-well plate in preparation for lysis. Or, if the starting material is a lysate prepared from a population of cells or tissues, cell or tissue lysis may optionally occur in a prior step before introduction into the well and then the lysate itself may be added to each well of a multi-well plate. For example, a population of cells can be sorted into lysis buffer and lysed (e.g., by freeze-thawing, proteinase K treatment, or a combination thereof) before the lysate is added to the plate. The next steps are to reverse transcribe the mRNA that has been released from the cells and to perform a template switching step. The reverse transcription and template switching can be performed using the nucleic acids of the invention, which efficiently perform these steps. 
For example, a cDNA synthesis primer comprising a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a
3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine, can be used for reverse transcription. Here, the 5' blocking group is used to ensure the correct directionality of cDNA synthesis and the adapter sequence provides a
sequence annealing to a sequencing primer, so the first sequencing read will contain the barcode and UMI sequences. Part of the adapter sequence also is used during the suppression PCR. The barcode sequence is used to track which well (and, thus, which cell) a particular cDNA was generated from. In bulk RNA-seq and lysate sequencing embodiments, a barcode can provide a reference for (and, thus, a way to identify) the sample or the pool (e.g., the well) rather than a single cell. Alternatively, a UMI can be used in bulk RNA-seq and lysate sequencing to identify the transcript and the i7 primer (which, in other embodiments, typically contains the barcode for the plate, e.g., for plate indexing - sometimes referred to as the plate barcode or the index) identifies the sample or pool (e.g., the well) rather than the single cell. In these embodiments, the UMI can be, for example, a 16mer UMI. Thus, in certain embodiments, a combination of one or more barcodes and a UMI is used. In other embodiments, a UMI is used either alone or with a single barcode. Either way, the methods and compositions provide a mechanism for identifying where a particular transcript came from. In certain embodiments, i7 is used for plate indexing (e.g., it is a barcode to identify a particular plate). In other embodiments, i7 serves as a sample barcode. The UMI provides a way to trace each cDNA produced to a particular mRNA derived from a cell/sample. The complementarity sequence anneals to the mRNA, for example, to the poly(A) tail of an mRNA, although it also could anneal to a specific target sequence, such as the sequence of a particular mRNA, instead. The 3' dinucleotide sequence targets the extremity of the polyA tail, that is, the last two bases of the mRNA before the polyA tail. These two final nucleotides prevent the nucleic acid from annealing elsewhere within the polyA tail, which can be as long as 250 bp in length. 
If the nucleic acid were to bind elsewhere, one would not be able to directly access the useful sequence information of the transcript. A template-switching oligonucleotide comprising a 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence, an internal adapter sequence, and a 3' guanosine tract can be used in the template switching step. The 5' poly-isonucleotidecytosine-isoguanosine-isocytosine sequence provides non-standard base pairs in the template switching oligo to prevent background cDNA synthesis.
These nucleotide isomers inhibit reverse transcriptase, such as MMLV reverse transcriptase, from extending the cDNA beyond the template switching adapter, thus increasing cDNA yield by reducing formation of concatemers of the template switching adapter. The adapter sequence provides the subsequence required for the suppression PCR, and the 3' guanosine tract is used to anneal to a polycytosine tract generated at the 3' end of the first strand of cDNA synthesized. These steps are useful in incorporating a barcode and a UMI into the resulting cDNAs. The barcode introduced here helps track the individual well (and, therefore, cell/sample) that a cDNA population came from, while the UMI is unique for each mRNA that produces a cDNA. Thus, the population of UMIs incorporated into the cDNAs provides a molecular "snapshot" of the mRNA population of the cell or sample at the time of lysis, because subsequent amplification steps do not alter the number of UMIs, making it possible to trace back each cDNA sequenced later to a particular mRNA released from the cell/sample. The template switching step is selective for the creation of full-length cDNAs.
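The arrangement of the cDNA synthesis primer described above can be sketched as a string layout. The Python example below is illustrative only: the adapter is a placeholder rather than a real sequence, the well barcode is made up, and the lengths (10-nucleotide UMI, 30-nucleotide poly(T)) are taken from the exemplary values given elsewhere in the text. It also enumerates the 3' dinucleotides permitted by the stated rule (first nucleotide A, G, or C; second nucleotide A, G, C, or T).

```python
from itertools import product

FIRST = "AGC"    # first nucleotide of the 3' dinucleotide: A, G, or C
SECOND = "AGCT"  # second nucleotide: A, G, C, or T


def dinucleotide_anchors():
    """All 3' dinucleotides allowed by the rule stated in the text."""
    return [a + b for a, b in product(FIRST, SECOND)]


def primer_layout(adapter, barcode, umi_length=10, poly_t_length=30):
    """5'->3' layout of the primer as a string.

    'N' marks a random position (UMI and second anchor nucleotide) and
    'V' marks A/G/C, following IUPAC codes. `adapter` is a placeholder.
    """
    return adapter + barcode + "N" * umi_length + "T" * poly_t_length + "VN"


if __name__ == "__main__":
    print(dinucleotide_anchors())
    # "[adapter]" and "AACGTG" are hypothetical stand-ins.
    print(primer_layout("[adapter]", "AACGTG"))
```

Excluding thymine at the first anchor position is what keeps the primer from sliding along the poly(A) tail: a thymine there would simply extend the poly(T) stretch.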
[0045] After reverse transcription and template switching, the wells can be pooled together and purified, followed by treatment with an exonuclease such as Exonuclease I. Without the exonuclease treatment, the primer used for the suppression PCR can bind to the remaining adapters that are in excess from the template switching reaction, so the addition of an exonuclease, such as Exonuclease I, improves results. The cDNAs then are amplified (e.g., via PCR), followed by subsequent purification and quantification steps. Next, the library is prepared for sequencing by fragmentation, e.g., with a transposase-based fragmentation system. This step also introduces a second bar code to the cDNAs, this second bar code being specific for the capture plate from which the cDNAs were pooled. Thus, each cDNA will have a bar code for both the plate and the well from which it was derived, allowing simultaneous processing of a large number of samples, in which each individual sequence can be traced back to a single mRNA of a specific cell (or, in the case of another type of sample, to a well containing a cell or tissue lysate sample, a purified RNA sample, or the like). The library then can be purified, selected for appropriate size
fragments, assessed for quantity and quality, and sequenced (e.g., by RNA-seq such as the Illumina HiSeq™ (Catalog # SY-401-2501) or MiSeq™ (Catalog # SY-410-1003) systems). The sequencer can handle various read lengths and either single-end or paired-end sequencing. The libraries can be run in a way that matches the read length required to read each barcode and obtains enough information from the sequence of the cDNA to identify the gene from which it came. For example, 17 cycles can be run for read 1 (see above) to read first the 6 bp well/cell barcode and the 10 bp UMI. This is then followed by 9 cycles to read the 8 bp i7 plate index. Finally, 46 cycles are, in certain embodiments, run on the other strand to read the cDNA/gene sequence. The machine allows the operator to set up a custom run in which they decide the read length for each portion for which sequence is to be obtained. This sequencing design allows an individual to decipher all the information while using the smallest/cheapest kit that meets their needs (e.g., a 50 cycle kit that actually contains enough reagents for 74 cycles). Alternatively, an individual could run more cycles to get longer stretches of cDNA.
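The cycle allocation in this example can be checked against the kit capacity with a few lines of Python. The cycle counts and the 74-cycle capacity are the figures quoted in the text; the dictionary labels and function name are illustrative.

```python
# Cycle counts quoted in the text for each sequenced segment.
CYCLES = {
    "read 1: 6 bp well/cell barcode + 10 bp UMI": 17,
    "index read: 8 bp i7 plate index": 9,
    "read 2: cDNA/gene sequence": 46,
}
KIT_CAPACITY = 74  # cycles available in the "50 cycle kit" per the text


def cycle_budget(cycles, capacity):
    """Return (cycles used, cycles left over) for a planned run."""
    used = sum(cycles.values())
    return used, capacity - used


used, spare = cycle_budget(CYCLES, KIT_CAPACITY)

if __name__ == "__main__":
    print(f"cycles used: {used}, spare: {spare}")
```

In this allocation, read 1 and the index read are each one cycle longer than the sequence they cover (17 cycles for 16 bases, 9 cycles for 8 bases).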
Before sequencing, samples from multiple capture plates can be combined without losing the identity of each cDNA in the mixture because of the two barcode sequences. Thus, the data can be deconvoluted after sequencing to determine the UMI of each particular cDNA and the well and plate it came from via the barcodes. This is advantageous because it allows a researcher to run many more samples together than would otherwise be possible, and to do so with less cost and labor.
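The deconvolution step can be sketched in Python as splitting read 1 into its well barcode and UMI and then looking up the plate and well. This is a minimal illustration under several assumptions: the 6 bp barcode precedes the 10 bp UMI in read 1 as described above, the lookup tables and well barcode are hypothetical, exact matching is used, and the index keys are in whatever orientation the instrument reports (which may be the listed sequence or its reverse complement, depending on the workflow).

```python
WELL_BARCODE_LEN = 6  # bp, per the text
UMI_LEN = 10          # bp, per the text


def split_read1(read1):
    """Split read 1 into (well barcode, UMI); extra trailing bases are ignored."""
    if len(read1) < WELL_BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than barcode + UMI")
    well_barcode = read1[:WELL_BARCODE_LEN]
    umi = read1[WELL_BARCODE_LEN:WELL_BARCODE_LEN + UMI_LEN]
    return well_barcode, umi


def assign_read(read1, i7_index, plate_by_i7, well_by_barcode):
    """Return (plate, well, UMI), or None if either barcode is unrecognized."""
    well_barcode, umi = split_read1(read1)
    plate = plate_by_i7.get(i7_index)
    well = well_by_barcode.get(well_barcode)
    if plate is None or well is None:
        return None
    return plate, well, umi


# Hypothetical lookup tables for illustration.
plate_by_i7 = {"TCGCCTTA": "plate 1", "CTAGTACG": "plate 2"}
well_by_barcode = {"AACGTG": "A1"}

if __name__ == "__main__":
    print(assign_read("AACGTGACGTACGTACT", "CTAGTACG", plate_by_i7, well_by_barcode))
```

A production pipeline would typically also tolerate sequencing errors in the barcodes; that is outside the scope of this sketch.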
Definitions
[0046] Throughout this specification, the word "comprise" or variations such as "comprises" or "comprising" will be understood to imply the inclusion of a stated integer (or components) or group of integers (or components), but not the exclusion of any other integer (or components) or group of integers (or components).
[0047] The singular forms "a," "an," and "the" include the plurals unless the context clearly dictates otherwise.
[0048] The term "including" is used to mean "including but not limited to." "Including" and "including but not limited to" are used interchangeably.
[0049] The terms "patient," "subject," and "individual" may be used interchangeably and refer to either a human or a non-human animal. These terms include mammals such as humans, primates, livestock animals (e.g., bovines, porcines), companion animals (e.g., canines, felines) and rodents (e.g., mice and rats).
[0050] The term "diagnosis" as used herein refers to methods by which the skilled artisan can estimate and/or determine whether or not a patient is afflicted with a given disease or condition. The skilled worker often makes a diagnosis based on one or more diagnostic indicators. Exemplary diagnostic indicators may include the manifestation of symptoms or the presence, absence, or change in one or more markers for the disease or condition. A diagnosis may indicate the presence or absence, or severity, of the disease or condition.
[0051] The term "prognosis" is used herein to refer to the likelihood of the progression or regression of a disease or condition, including likelihood of the recurrence of a disease or condition.
[0052] As used herein, "treating" a disease or condition refers to taking steps to obtain beneficial or desired results, including clinical results. Beneficial or desired clinical results include, but are not limited to, reduction, alleviation or amelioration of one or more symptoms associated with the disease or condition.
[0053] As used herein, "administering" or "administration of" a compound or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art. For example, a compound or an agent can be administered orally, intravenously, arterially, intradermally, intramuscularly, intraperitoneally, subcutaneously, ocularly, sublingually, intranasally, intraspinally, intracerebrally,
and transdermally. A compound or agent can appropriately be introduced by rechargeable or biodegradable polymeric devices or other devices, e.g., patches and pumps, or formulations, which provide for the extended, slow, or controlled release of the compound or agent. Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods. Administration of a compound may include both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, a physician who instructs a patient to self-administer a therapeutic agent, or to have the agent administered by another, and/or who provides a patient with a prescription for a drug has administered the drug to the patient.
[0054] The term "nucleic acid" refers to DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), DNA-RNA hybrids, and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be a nucleotide, oligonucleotide, double-stranded DNA, single-stranded DNA, multi-stranded DNA, complementary DNA, genomic DNA, non-coding DNA, messenger RNA (mRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), small interfering RNA (siRNA), heterogeneous nuclear RNAs (hnRNA), or small hairpin RNA (shRNA).
[0055] As used herein, a "profile" of a transcriptome or portion of a
transcriptome can refer to any sequencing or gene expression information concerning the transcriptome or portion thereof. This information can be either qualitative (e.g., presence or absence) or quantitative (e.g., levels or mRNA copy numbers). In some embodiments, a profile can indicate a lack of expression of one or more genes.
[0056] The term "cDNA library" refers to a collection of complementary DNA (cDNA) fragments. A cDNA library may be generated from the transcriptome of a single cell or from a plurality of single cells. cDNA is produced from mRNA
found in a cell and therefore reflects those genes that have been transcribed for subsequent protein expression.
[0057] As used herein, a "plurality" of cells refers to a population of cells and can include any number of cells to be used in the methods described herein. For example, a plurality of cells includes at least 10 cells, at least 25 cells, at least 50 cells, at least 100 cells, at least 200 cells, at least 500 cells, at least 1,000 cells, at least 5,000 cells, or at least 10,000 cells. In some embodiments, a plurality of cells includes from 10 to 100 cells, from 50 to 200 cells, from 100 to 500 cells, from 100 to 1,000 cells, or from 1,000 to 5,000 cells.
[0058] As used herein, a "single cell" refers to one cell. Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Cells can be cultured cells or cells from a dissociated tissue, and can be fresh or preserved in a preservative buffer such as RNAprotect. Furthermore, in general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic single-celled organisms including bacteria or yeast. In some aspects of the invention, the method of preparing the cDNA library can include the step of obtaining single cells. A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically, using trypsin or papain to digest proteins connecting cells in tissue samples or to release adherent cells in culture, or mechanically, by separating cells in a sample. Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually, for example a 96-well plate, such that each single cell is placed in a single well.
[0059] As used herein, an "oligonucleotide" or "polynucleotide" refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and can perform any function. Exemplary polynucleotides include a gene or gene fragment (e.g., a probe or primer), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA or RNA of any sequence, and nucleic acid probes and primers. A polynucleotide can comprise modified nucleotides, such as isonucleotides, methylated nucleotides, and other nucleotide analogs. The term also refers to both double- and single-stranded molecules. A polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Uracil (U) substitutes for thymine when the polynucleotide is RNA. The sequence can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
[0060] As used herein, a "primer" is a polynucleotide that hybridizes to a target or template that may be present in a sample of interest. After hybridization, the primer promotes the polymerization of a polynucleotide complementary to the target, for example in a reverse transcription or amplification reaction.
Cell sorting and lysis
[0061] Methods for selecting or sorting cells are well established, and in some embodiments include, but are not limited to, fluorescence-activated cell sorting (FACS), micromanipulation, manual sorting, and the use of semi-automated cell pickers. Individual cells can be individually selected based on features detectable by observation (e.g., by microscopic observation). Exemplary features can include location, morphology, and reporter gene expression. A population of cells can be sorted to provide a subpopulation or a predetermined subset of cells. In some embodiments, the population, subpopulation, or predetermined subset can be sorted to provide single cells. In some embodiments, the cells are sorted into a capture plate. Capture plates can comprise a number of wells into which the cells are sorted, for example, 24 wells, 96 wells, 384 wells, or 1536 wells. In some embodiments, a population of cells is lysed without sorting. The population of cells can be, for example, a tissue sample. In certain embodiments, the population of cells is an isolated population of cells. In such embodiments, the starting
material for further analysis may be, for example, a cell or tissue lysate or bulk purified or extracted RNA. In such embodiments, cells can be divided into the wells of a plate without sorting. In particular embodiments, the amount of material in each well is normalized with respect to the other wells so as to provide similar sequencing coverage across a plate.
[0062] To release mRNA from cells, the cells may be lysed. Cells may be lysed by any number of known techniques. Exemplary cell lysis techniques include freeze-thawing, heating the cells, using a detergent or other chemical method, or a combination thereof. Techniques minimizing degradation of the released mRNA are preferred. Likewise, techniques preventing the release of nuclear chromatin are preferred. For example, heating the cells in the presence of Tween-20 is sufficient to lyse cells while minimizing genomic contamination from nuclear chromatin. In certain embodiments, cells are lysed using freeze-thawing. In some embodiments, a proteinase or protease, such as proteinase K, is added to the lysis reaction to increase the efficiency of lysis. In certain embodiments, cells are lysed using freeze-thawing optionally supplemented with addition of proteinase K.
[0063] As noted above, cell lysis may be of single cells already sorted into individual wells of a plate. Alternatively, lysis of populations of cells may be performed and the starting material for further sequence analysis may be a cell or tissue lysate made from a plurality of cells and then aliquoted to wells of a plate. Regardless of starting material, in certain embodiments, following lysis the material may be stored at a suitable temperature, such as -80 °C, prior to further use.
Reverse transcription and template switching
[0064] In some embodiments, cDNA is synthesized from mRNA through the process of reverse transcription. Reverse transcription can be performed directly on cell lysates (for example, a cell lysate prepared as described above), by adding a reaction mix for reverse transcription directly to the cell lysate. In alternative embodiments, the total RNA or mRNA can be purified after cell lysis, for example through the use of column-based (e.g., Qiagen RNeasy Mini kit Cat. No. 74104, ZymoResearch Direct-zol RNA Cat. No. R2050) or magnetic bead purification (e.g., Agencourt RNAClean XP, Cat. No. A63987). Methods for reverse transcription of mRNA to cDNA are well established in the art. In some embodiments, the reverse transcription is combined with a template switching step to improve the yield of longer (e.g., full length) cDNA molecules. In certain embodiments, the reverse transcriptase used has tailing or terminal transferase activity, and synthesizes and anchors first-strand cDNA in one step. In certain embodiments, the reverse transcriptase is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ (Clontech, Cat. No. 639536) reverse transcriptase, Superscript II™ reverse transcriptase (Life Technologies, Cat. No. 18064-014), or Maxima H Minus™ reverse transcriptase (Thermo Scientific, Cat. No. EP0753).
[0065] Template switching introduces an arbitrary sequence at the 3' end of the cDNA that is designed to be the reverse complement to the 3' end of a cDNA synthesis primer. In some embodiments, the synthesis of the first strand of the cDNA can be directed by a cDNA synthesis primer (CDS) that includes an RNA complementary sequence (RCS). In some embodiments, the RCS is at least partially complementary to one or more mRNA species in an individual mRNA sample, allowing the primer to hybridize to at least some mRNA species in a sample to direct cDNA synthesis using the mRNA as a template. The RCS can comprise oligo (dT) sequence that binds to many mRNA species, or it can be specific for a particular mRNA species, for example, by binding to an mRNA sequence of a gene of interest. Alternatively, the RCS can comprise a random sequence, such as random hexamers. To avoid the CDS self-priming, a non-self-complementary sequence can be used.
[0066] A template-switching oligonucleotide that includes a portion which is at least partially complementary to a portion of the 3' end of the first strand of cDNA generated by the reverse transcription can also be used in the methods of the invention. Because the terminal transferase activity of reverse transcriptase typically causes the incorporation of two to five cytosines at the 3' end of the first strand of cDNA synthesized, the first strand of cDNA can include a plurality of cytosines, or cytosine analogues that base pair with guanosine, at its 3' end to which the template-switching oligonucleotide with a 3' guanosine tract can anneal. During the template switching step, the template-switching oligonucleotide is extended to form a double stranded cDNA. Thus, in some embodiments, a template-switching oligonucleotide can include a 3' portion comprising a plurality of guanosines or guanosine analogues that base pair with cytosine. Exemplary guanosines or guanosine analogues include, but are not limited to, deoxyriboguanosine, riboguanosine, locked nucleic acid-guanosine, and peptide nucleic acid-guanosine. The guanosines can be ribonucleosides or locked nucleic acid monomers. A locked nucleic acid is an RNA nucleotide wherein the ribose moiety has been modified with an extra bridge connecting the 2' oxygen and the 4' carbon. A peptide nucleic acid is an artificially synthesized polymer similar to DNA or RNA, wherein the backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
[0067] In some embodiments, the reverse transcription and template switching comprise contacting an mRNA sample with two nucleic acid primers. In certain embodiments, the first nucleic acid primer (e.g., a template-switching oligonucleotide) comprises a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract. In certain embodiments, the 5' poly-isonucleotide sequence comprises an isocytosine, or an isoguanosine, or both. In certain embodiments, the 5' poly-isonucleotide sequence comprises an isocytosine-isoguanosine-isocytosine sequence. Incorporating non-natural nucleotides, such as an isocytosine or an isoguanosine, into template-switching primers can reduce background and improve cDNA synthesis (Kapteyn et al., BMC Genomics 11:413 (2010)). In some embodiments, the 3' guanosine tract comprises two, three, four, five, six, seven, eight, nine, ten, or more guanosines. In certain embodiments, the 3' guanosine tract comprises three guanosines. In some embodiments, the adapter sequence is 12 to 32 nucleotides in length, for example, 22 nucleotides in length. In particular embodiments, the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1). In particular embodiments, the sequence of the first primer is 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17) (e.g., 1 µM), wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine.
[0068] In certain embodiments, the second nucleic acid primer (e.g., a cDNA synthesis primer) comprises a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine. Optionally, to sequence bulk RNA or lysates, the barcode can be omitted from the cDNA synthesis primer and an extra 6 base pairs can be added to the UMI sequence. In particular embodiments, the 5' blocking group is selected from biotin, an inverted nucleotide (e.g., inverted dideoxy-T), a fluorophore, an amino group, and iso-dG or iso-dC. In particular embodiments, the internal adapter sequence is 23 to 43 nucleotides in length, for example, 33 nucleotides in length. In particular embodiments, the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 1). In particular embodiments, the barcode sequence is 4 to 20 nucleotides in length, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In particular embodiments, the UMI sequence is 6 to 20 nucleotides in length, for example, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In particular embodiments, the complementarity sequence is a poly(T) sequence. In particular embodiments, the complementarity sequence is 20 to 40 nucleotides in length, for example, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length. In specific embodiments, the second nucleic acid primer is 5'-/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNN[poly(T)]VN-3' (SEQ ID NO: 18), wherein 5Biosg represents 5' biotin; [poly(T)] represents the poly(T) complementarity sequence; V represents a nucleotide selected from A, G, and C; the 3' N represents a nucleotide selected from A, G, C, and T; [BC6] represents a 6 base pair barcode sequence; and the (N)10 after the barcode sequence represents a Unique Molecular Identifier (UMI) sequence. In these primers, the barcodes may be designed so that each barcode sequence differs from the barcodes of all other primers by at least two nucleotides, so that a single sequencing error cannot lead to the misidentification of the barcode.
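The barcode design rule stated above (every pair of barcodes differs in at least two positions) can be illustrated with a simple greedy selection. This is only a sketch of one way to satisfy the rule; the specification does not say how its barcodes were chosen, and the function names and the choice of 96 barcodes are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(length, count, min_distance=2):
    """Greedily pick `count` barcodes with pairwise Hamming distance >= min_distance.

    Candidates are visited in lexicographic order; a candidate is kept only
    if it is far enough from every barcode already chosen.
    """
    chosen = []
    for candidate in product("ACGT", repeat=length):
        seq = "".join(candidate)
        if all(hamming(seq, kept) >= min_distance for kept in chosen):
            chosen.append(seq)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough barcodes satisfy the distance constraint")


# Example: one 6-nt barcode per well of a 96-well capture plate.
plate_barcodes = pick_barcodes(6, 96)
```

With a minimum distance of two, a single substitution error turns a barcode into a sequence that is not in the set, so the read can be discarded rather than assigned to the wrong well.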
[0069] The UMI sequences provide a robust guard against amplification biases. More particularly, each UMI is present only once in a population of second nucleic acid primers. Thus, each UMI is incorporated into a unique cDNA sequence generated from a cellular mRNA, and any subsequent amplification steps will not alter the one UMI to one mRNA ratio. In certain embodiments, the UMI sequence, rather than being 10 nucleotides in length, is 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. The length should be selected to provide sufficient unique sequences for the population of cells to be tested (preferably with at least two nucleotide differences between any pair of UMIs), preferably without adding unnecessary length that increases sequencing cost.
[0070] Barcode sequences enable each cDNA sample generated by the above method to have a distinct tag, or a distinct combination of tags, such that once the tagged cDNA samples have been pooled, the tag can be used to identify the single cell from which each cDNA sample originated. Thus, each cDNA sample can be linked to a single cell, even after the tagged cDNA samples have been pooled and amplified. In other words, the use of the foregoing nucleic acids permits deconvolution of pooled data to single cell/well resolution. This is particularly advantageous for facilitating the application of this technology to screening assays.
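The way UMIs guard against amplification bias can be sketched as follows: PCR duplicates of one cDNA carry the same UMI, so counting distinct UMIs per cell and gene recovers the number of original molecules. The record layout and the names `umi_counts` and `umi_space` are illustrative assumptions; gene assignment of each read is taken as already done.

```python
from collections import defaultdict


def umi_space(length):
    """Number of distinct UMI sequences of a given length (four bases per position)."""
    return 4 ** length


def umi_counts(reads):
    """Count distinct UMIs for each (cell_barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples. Reads that
    share cell, gene, and UMI are treated as amplification duplicates of a
    single original mRNA molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell_barcode, gene, umi in reads:
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


print(umi_space(10))  # distinct 10-nt UMIs available
```

The size of the UMI space grows fourfold with each added nucleotide, which is the trade-off between uniqueness and read length noted in the text.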
[0071] In some embodiments, a nucleic acid useful in the invention can contain a non-natural sugar moiety in the backbone, for example, sugar moieties with 2' modifications such as addition of a halogen, alkyl-substituted alkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SO2CH3, OSO2, NO2, N3, or NH2. Similar modifications also can be made at other positions on the sugar. Nucleic acids, nucleoside analogs or nucleotide analogs having sugar modifications can be further modified to include a reversible blocking group, a peptide linked label, or both. In those embodiments comprising a 2' modification, the base can have a peptide-linked label.
[0072] A nucleic acid useful in the invention also can include native or non-native bases. In some embodiments, a native deoxyribonucleic acid can have one or more bases selected from adenine, thymine, cytosine, and guanine, and a ribonucleic acid can have one or more bases selected from uracil, adenine, cytosine, and guanine. Exemplary non-native bases include, but are not limited to, inosine, xanthine, hypoxanthine, isocytosine, isoguanosine, 5-methylcytosine, 5-hydroxymethyl cytosine, 2-aminoadenine, 6-methyl adenine, 6-methyl guanine, 2-propyl guanine, 2-propyl adenine, 2-thiothymine, 2-thiocytosine, 5-propynyl uracil, 5-propynyl cytosine, 6-azo uracil, 6-azo cytosine, 6-azo thymine, 4-thiouracil, 8-halo adenine, 8-halo guanine, 8-amino adenine, 8-amino guanine, 8-thiol adenine, 8-thiol guanine, 8-thioalkyl adenine, 8-thioalkyl guanine, 8-hydroxyl adenine, 8-hydroxyl guanine, 5-halo substituted uracil, 5-halo substituted cytosine, 7-methylguanine, 7-methyladenine, 8-azaguanine, 8-azaadenine, 7-deazaguanine, 7-deazaadenine, 3-deazaguanine, and 3-deazaadenine. In certain embodiments, isocytosine and isoguanosine may reduce non-specific hybridization. In some embodiments, a non-native base can have universal base pairing activity, wherein it is capable of base-pairing with any other naturally occurring base, e.g., 3-nitropyrrole and 5-nitroindole.
cDNA pooling and purification
[0073] In some embodiments, after reverse transcription and template switching have been used to generate cDNA, the cDNA is pooled together. For example, a population of cells can be individually sorted into the wells of a tray, lysed, and undergo reverse transcription and template switching. These cDNAs then can be pooled and purified. In certain embodiments, the cDNA is purified through a
column-based purification method, e.g., with a DNA Clean & Concentrator-5 column (Zymo Research, #D4013).
Exonuclease treatment
[0074] In some embodiments, pooled cDNAs are treated with an exonuclease (e.g., Exonuclease I) to degrade any primers remaining from the reverse transcription and template switching steps. This prevents possible interference by these primers in subsequent amplification.
Amplification
[0075] As used herein, the term "amplification" or "amplifying" refers to a process by which multiple copies of a particular polynucleotide are formed, and includes methods such as the polymerase chain reaction (PCR), ligation amplification (also known as ligase chain reaction, or LCR), and other amplification methods. In some embodiments, amplification refers specifically to PCR. Amplification methods are widely known in the art. In general, PCR refers to a method of amplification comprising hybridization of primers to specific sequences within a DNA sample and amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase. The resulting DNA products are then often screened for a band of the correct size. The primers used are oligonucleotides of appropriate length and sequence to provide initiation of polymerization. Reagents and hardware for conducting amplification reactions are widely known and commercially available. Primers useful to amplify sequences from a particular gene region are sufficiently complementary to hybridize to target sequences. Nucleic acids generated by amplification can be sequenced directly.
[0076] When hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides, the reaction is called "annealing" and those polynucleotides are described as "complementary". A double-stranded polynucleotide can be complementary or homologous to another polynucleotide, if hybridization can occur between one of the strands of the first polynucleotide and the second. Complementarity or homology (the degree that one polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonds with each other, according to generally accepted base-pairing rules. The stringency of hybridization is influenced by hybridization conditions, such as temperature and salt concentration. In the context of amplification, these parameters can be suitably selected.
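The notion of complementarity as a proportion of paired bases can be made concrete with a small sketch. It assumes two DNA strands of equal length, both written 5' to 3', aligned end to end in antiparallel orientation with Watson-Crick pairing only; the function names are illustrative and no gaps or mismatch-tolerant alignment are considered.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(strand_a, strand_b):
    """Fraction of positions that base-pair when two strands anneal antiparallel.

    Both strands are given 5' to 3'. Position i of strand_a faces position
    (n - 1 - i) of strand_b, which is the same as comparing strand_a with
    the reverse complement of strand_b position by position.
    """
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("strands must be non-empty and of equal length")
    partner = reverse_complement(strand_b)
    paired = sum(x == y for x, y in zip(strand_a.upper(), partner))
    return paired / len(strand_a)
```

A value of 1.0 means every opposing pair of bases can hydrogen bond; lower values indicate mismatches that reduce stability under stringent conditions.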
[0077] In some embodiments, cDNA created by reverse transcription and template switching, and optionally treated with an exonuclease, is amplified to provide more starting material for sequencing. cDNA can be amplified by a single primer with a region that is complementary to all cDNAs, e.g., an adapter sequence. In certain embodiments, the primer has a 5' blocking group such as biotin. An exemplary primer is as follows: 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (wherein 5Biosg represents 5' biotin) (SEQ ID NO: 19). One exemplary amplification reaction uses cDNA; PCR buffer, such as 10X Advantage 2 PCR buffer; dNTPs; the DNA primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19); polymerase mix, such as Advantage 2 Polymerase Mix; and water, such as nuclease-free water, and is (in certain embodiments) performed using the following program: 95 °C for 1 minute; 18 cycles of 95 °C for 15 seconds, 65 °C for 30 seconds, and 68 °C for 6 minutes; and 72 °C for 10 minutes (followed by an optional hold period at 4 °C). In certain bulk RNA-seq and lysate sequencing embodiments, this amplification reaction may be modified to use fewer than 18 cycles, e.g., 10 cycles. One exemplary amplification reaction uses 20 µL of cDNA; 5 µL of 10X Advantage 2 PCR buffer; 1 µL of dNTPs; 1 µL of the DNA primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19) (10 µM, Integrated DNA Technologies); 1 µL of the Advantage 2 Polymerase Mix; and 22 µL of Nuclease-Free Water, and is optionally performed using the following program: 95 °C for 1 min; 18 cycles of 95 °C for 15 sec, 65 °C for 30 sec, and 68 °C for 6 min; and 72 °C for 10 min (followed by an optional hold period at 4 °C). However, the skilled worker will appreciate that amplification conditions may be adjusted depending on the exact primer and template being used.
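The volumes in the exemplary reaction can be tabulated and scaled with a short sketch. The per-reaction volumes come from the text; the idea of preparing a master mix without template, the optional overage factor, and all function names are assumptions added for illustration.

```python
# Per-reaction volumes in microliters, taken from the exemplary reaction.
REACTION_UL = {
    "cDNA": 20.0,
    "10X Advantage 2 PCR buffer": 5.0,
    "dNTPs": 1.0,
    "primer (10 uM stock)": 1.0,
    "Advantage 2 Polymerase Mix": 1.0,
    "nuclease-free water": 22.0,
}


def reaction_volume(recipe=REACTION_UL):
    """Total volume of one reaction in microliters."""
    return sum(recipe.values())


def master_mix(n_reactions, overage=0.0, recipe=REACTION_UL, template="cDNA"):
    """Volumes for a master mix covering n reactions, excluding the template.

    `overage` is a fractional excess (e.g., 0.1 for 10 percent) to allow
    for pipetting loss; it is an assumption, not a value from the text.
    """
    scale = n_reactions * (1.0 + overage)
    return {name: vol * scale for name, vol in recipe.items() if name != template}


def final_concentration(stock_conc, stock_volume, total_volume):
    """Dilution formula C1 * V1 / V2, in the units of stock_conc."""
    return stock_conc * stock_volume / total_volume


print(reaction_volume())                                  # total microliters
print(final_concentration(10.0, 1.0, reaction_volume()))  # primer, micromolar
```

Under these volumes the reaction totals 50 µL and the primer is diluted fifty-fold from its 10 µM stock.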
Nucleic acid purification and quantification
[0078] Nucleic acid purification (e.g., cDNA purification) is well known in the art. In some embodiments, a nucleic acid (e.g., cDNA) is purified with a spin-based column, such as those commercially available from Zymo Research™ (DNA Clean & Concentrator™-5, Cat. No. D4013) or Qiagen™ (MinElute PCR purification kit, Cat. No. 28004). In particular embodiments, the spin column is a column lacking a physical ring, for example the ring found in Qiagen™ columns, allowing elution of the purified nucleic acid in a lower volume than would be possible in a spin column with a ring. In some embodiments, a nucleic acid (e.g., cDNA, such as in a cDNA library), is purified using magnetic beads. Magnetic bead purification systems are well known and include, for example, the Agencourt AMPure XP™ system (Beckman Coulter, Cat. No. A63881). In some embodiments, a nucleic acid (e.g., cDNA, such as in a cDNA library) is purified after being run on a gel. Gel extraction purification kits are well known, and include, for example, the MinElute Gel Extraction Kit™ (Qiagen, Cat. No. 28604).
Sequencing library preparation
[0079] In some embodiments, a cDNA library for sequencing is fragmented prior to the sequencing. A cDNA library can be fragmented by any known method, for example, mechanical fragmentation or a transposase-based fragmentation such as that used in the Nextera™ system (e.g., the Illumina Nextera XT DNA Sample Preparation Kit Cat. No. FC-131-1096 or the Nextera DNA Sample Preparation Kit Cat. No. FC-121-1031). Fragmentation via a transposase-based system has the benefit of being able to incorporate into the fragments barcode sequences that facilitate identification of the fragments. In some embodiments, a barcode sequence introduced during preparation of a cDNA library for sequencing is specific for a predetermined set of cells. This predetermined set of cells can be a subset of a larger set of cells. For example, a tissue biopsy can be sorted into a set of cells to be further sorted into single cells in a capture plate for gene profiling. If a bulk lysate or population of cells is being used as a starting material rather than single cells that have been sorted, a barcode sequence may, in certain embodiments, not be necessary in this step if a barcode already has been incorporated into the cDNA library in previous steps. However, a plate barcode still could be used to multiplex a high number of samples even for purified RNA/lysates.
Sequencing library quality assessment
[0080] In some embodiments, a cDNA library for sequencing is quantified and evaluated for quality prior to the sequencing to ensure that the library is of sufficient quantity and quality to yield positive results from sequencing. For example, a cDNA library can be quantified using a fluorometer and analyzed for quantity and average size through the use of a number of commercially available kits. The 2 main metrics for quality are the concentration of the library (which needs to be sufficient for loading on the sequencer) and the length of the cDNA fragments to be sequenced. Size selection is performed on a gel to enrich for fragments of the correct size. The gel itself gives an idea of the quality of the library. The final extracted library can be run on an Agilent Bioanalyzer (Cat. No. G2940CA) to obtain the size distribution for the cDNA fragments.
Sequencing
[0081] As used herein, "sequencing" refers to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid. Exemplary sequencing techniques include RNA-seq (also known as whole transcriptome sequencing), Illumina™ sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, massively parallel signature sequencing (MPSS), sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, mass spectrometry, and a combination thereof. In some embodiments, sequencing comprises detecting a sequencing product using an instrument, for example but not limited to an ABI PRISM™ 377 DNA Sequencer, an ABI PRISM™ 310, 3100, 3100-Avant, 3730, or 3730xl Genetic Analyzer, an ABI PRISM™ 3700 DNA Analyzer, or an Applied Biosystems SOLiD™ System (all from Applied Biosystems), a Genome Sequencer 20 System (Roche Applied Science), or a mass spectrometer. In certain embodiments, sequencing is performed on Illumina HiSeq or MiSeq paired-end flow cells.
Data analysis
[0082] As described herein, one major advantage of the nucleic acids, methods, and kits of the invention is that samples can be pooled and sequenced rather than needing to be sequenced individually. Sequencing products can be traced not only to a single plate of cells from which it came, but also to a single cell (e.g., a well) and, indeed, a single cellular transcript. This deconvolution of sequencing data can be achieved through the use of barcode and UMI sequences. In some embodiments, sequencing is combined with 3' digital gene expression to provide a number of counts for a particular sequence or sequences (e.g., cDNAs containing a particular combination of bar codes and a UMI). In some embodiments, each fragment of each transcript is sequenced and then counted for how many fragments of each transcript have been sequenced. In these embodiments, the computed gene expression should be normalized based on the length of a given transcript because a longer transcript will have a greater chance of having one of its fragments sequenced. However, full transcript sequencing typically requires more sequencing coverage than DGE, for which only the 3' end needs to be sequenced.
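The length normalization mentioned for full-transcript sequencing can be sketched as a fragments-per-kilobase calculation. The specification does not name a particular normalization; reads per kilobase is used here only as one common choice, and the function name and the example numbers are assumptions.

```python
def reads_per_kilobase(fragment_count, transcript_length_nt):
    """Fragment count divided by transcript length in kilobases."""
    if transcript_length_nt <= 0:
        raise ValueError("transcript length must be positive")
    return fragment_count / (transcript_length_nt / 1000.0)


# Two hypothetical transcripts present at the same molar abundance:
# the 2,000 nt transcript yields twice the fragments of the 1,000 nt one,
# but both give the same length-normalized value.
long_tx = reads_per_kilobase(200, 2000)
short_tx = reads_per_kilobase(100, 1000)
print(long_tx, short_tx)
```

In 3' digital gene expression each molecule contributes a single 3' tag, so counts of distinct UMIs can be compared across genes without this length correction.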
Kits
[0083] In some embodiments, the invention provides a kit comprising a plurality of one or both of the reverse transcription/template switching nucleic acid primers described above. In some embodiments, the UMI sequence of each second nucleic acid primer described above in the plurality of nucleic acids of the kit is unique among the nucleic acids of the kit. In some embodiments, the plurality of nucleic acids comprises different populations of nucleic acid species. In certain embodiments, each population of nucleic acid species comprises a different barcode sequence that uniquely identifies a single population of nucleic acid species. In some embodiments, the kit further comprises a third nucleic acid primer comprising 12 to 32 nucleotides and a 5' blocking group as described above. In some embodiments, the third nucleic acid is 22 nucleotides in length. An exemplary sequence of the third nucleic acid primer is 5'-ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 2). In some embodiments, the kit further comprises a nucleic acid comprising a barcode sequence. In some embodiments, the kit further comprises a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond. In certain embodiments, the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length, for example, 58 nucleotides in length. An exemplary sequence of the phosphorothioate bond-containing nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3). In further embodiments, the kit further comprises a capture plate and/or a reverse transcriptase enzyme and/or a DNA purification column (e.g., a DNA purification spin column) and/or proteinase K. For example, the kit can comprise a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase, for example, SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase. Exemplary kits include any one or any combination of the reagents described herein and, optionally, directions for use. When multiple reagents and/or nucleic acids are provided in a single kit, the reagents may be provided in separate containers, such as separate tubes or vials. Optionally, the kit contains sterile water for use.
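The stated lengths of the exemplary kit oligonucleotides can be checked with a short Python sketch. The helper name `describe_oligo` and the convention of writing phosphorothioate bonds as `*` inside the string are assumptions made for illustration; the two sequences are copied from SEQ ID NO: 2 and SEQ ID NO: 3 as given above.

```python
# Exemplary sequences from the specification; '*' marks a phosphorothioate bond.
THIRD_PRIMER = "ACACTCTTTCCCTACACGACGC"  # SEQ ID NO: 2
PS_OLIGO = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*"  # SEQ ID NO: 3

def describe_oligo(annotated):
    """Return the base count, bond count, and bare sequence of an oligo.

    The '*' characters are bond annotations, not nucleotides, so they are
    removed before the length is measured.
    """
    bases = annotated.replace("*", "")
    return {
        "length": len(bases),
        "ps_bonds": annotated.count("*"),
        "bases": bases,
    }

info = describe_oligo(PS_OLIGO)
# True when the 22-nucleotide primer occurs inside the longer oligo.
contains_primer = THIRD_PRIMER in info["bases"]
```

Run on these strings, the sketch reports 58 nucleotides and five annotated bonds for SEQ ID NO: 3, and finds the 22-nucleotide sequence of SEQ ID NO: 2 within it.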
Research applications
[0084] In some embodiments, the nucleic acids, kits, and/or methods of the invention are used for research applications requiring sequencing or gene expression profiling. In certain embodiments, the research applications include studying cellular differentiation, characterizing tissue heterogeneity, high-throughput screening of agents (e.g., potential therapeutics, potential differentiation inducers, potential toxins, or any other agents whose effects on cells are of interest), stem cell reprogramming, cell lineage tracing, and virus detection in blood samples. Exemplary applications of the technology to the research context and proof are provided in the Examples and are merely illustrative of uses of the technology.
[0085] In certain embodiments, the nucleic acids (e.g., compositions), kits, and/or methods of the disclosure are applied to gene expression analysis of single cells, optionally in response to contacting the single cell with an agent in the high-throughput screening context. The ability to analyze gene expression accurately and across large numbers of cells, and to accurately correlate the expression level to a particular cell/well, is an exemplary advantage and application of the instant technology. The technology is, in certain embodiments, similarly applied to other samples, such as cell or tissue lysates.
Diagnosis, prognosis, and treatment
[0086] As described above, the invention is useful in generating a gene expression profile for a plurality of cells. These gene expression profiles can be used in a number of applications related to the diagnosis, prognosis, and treatment of a subject. For example, cells from a tissue sample collected from a patient can be used in the methods of the invention to generate an expression profile that can be compared against a known profile that is indicative of the disease or condition, thus informing a physician of whether the subject has the disease or condition.
Similarly, the profile can be compared to a known profile useful in the prognosis of the disease or condition. For example, if the known profile is predictive of a cancer prognosis, the comparison may inform the physician of the stage of cancer or the cancer's likelihood of metastasis. In some embodiments, the invention can be used in a method of treating a disease or condition in a subject in need thereof. For example, a method of the invention can be used to obtain gene expression profiles in a subject before and after treatment with a therapeutic agent, thereby providing a means of determining the efficacy of the therapeutic agent. These data can also help a physician determine an effective treatment regimen.
[0087] The invention is applicable to various diseases or conditions. Exemplary diseases or conditions are a cancer, a cardiovascular disease or condition, a neurological or neuropsychiatric disease or condition, an infectious disease or condition, a respiratory or gastrointestinal tract disease or condition, a reproductive disease or condition, a renal disease or condition, a prenatal or pregnancy-related disease or condition, an autoimmune or immune-related disease or condition, a pediatric disease, disorder, or condition, a mitochondrial disorder, an ophthalmic disease or condition, a musculo-skeletal disease or condition, or a dermal disease or condition.
[0088] All publications, patents and published patent applications referred to in this application are specifically incorporated by reference herein. In case of conflict, the present specification, including its specific definitions, will control.
[0089] Each embodiment described herein may be combined with any other embodiment described herein.
[0090] The following examples are provided to illustrate certain embodiments of the invention and are not intended to limit the scope of the invention.
Examples
Example 1: Protocol for transcriptome-wide single-cell RNA sequencing
[0091] To test the methods of the invention, the protocol described below was developed.
Capture plate preparation
[0092] 5 μL of lysis buffer, composed of a 1/500 dilution of Phusion HF buffer (New England Biolabs, #B0518S), was distributed into each well of a Twin.tec PCR 384-well collection plate (Eppendorf, #951020729).
Cell preparation
[0093] Media was removed by pelleting the cells for 5 min at 1000 rpm, and the RNA was immediately stabilized by resuspending the cells in 500 μL of RNAprotect Cell Reagent (Qiagen, #76526) and 1 μL of RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies, #10777-019). Cells were stored up to two weeks at 4 °C. Prior to sorting, cells in the RNAprotect Cell Reagent were diluted in 1.5 mL PBS, pH 7.4 (no calcium, no magnesium, no phenol red, Life Technologies, #10010-049). The cells then were stained for viability (DNA staining by Hoechst 33342) with NucBlue Live ReadyProbes Reagent (Life Technologies, #R37605).
Cell collection
[0094] Cells were sorted individually into each well of a 384-well capture plate using the FACSAria II flow cytometer (BD Biosciences). "Live" cells were selected and doublets avoided using the Hoechst DNA staining. In other words, following Hoechst staining, dead cells could be removed and not processed further, and the presence of a single cell per well could be confirmed. After sorting, the plates were immediately sealed, spun down, and frozen on dry ice. The sorted cells were stored at -80 °C.
Cell lysis
[0095] Cells were thawed for 5 minutes at room temperature, then placed on ice.
Reverse Transcription/Template Switching
[0096] 1 μL of a 1 × 10^-7 dilution of ERCC RNA Spike-In Mix (Life Technologies, #4456740) was added to each well. 1 μL of a universal adapter DNA primer (template-switching oligonucleotide), 5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (1 μM) (SEQ ID NO: 17), was added to each well, wherein iC represents isocytosine (iso-dC), iG represents isoguanosine, and rG represents RNA guanosine. 1 μL of a cDNA synthesis primer, 5'-/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 18) (1 μM), was added to each well, wherein 5Biosg represents 5' biotin, V represents a nucleotide selected from A, G, and C, N represents a nucleotide selected from A, G, C, and T, [BC6] represents a 6 base pair barcode sequence, different for each well of a 384-well plate, and (N)10 represents a Unique Molecular Identifier (UMI) sequence. The barcode sequences were designed such that each barcode differed from the others by at least two nucleotides, so that a single sequencing error could not lead to the misidentification of the barcode (Table 1). The plate was subsequently incubated at 72 °C for 3 minutes, then immediately placed on ice to cool down (although this step is optional). The Template Switching step was carried out in each well using the following reagents: 2 μL of 5X 1st strand buffer (250 mM UltraPure Tris-HCl, pH 8.0, Life Technologies, #15568-025; 375 mM KCl, Life Technologies, #AM9640G; 30 mM MgCl2, Life Technologies, #AM9530G); 1 μL of DL-Dithiothreitol solution BioUltra, 20 mM (Sigma-Aldrich, #43816); 1 μL of dNTPs (New England Biolabs, #N0447L); 0.25 μL of an MMLV Reverse Transcriptase, in this particular example SmartScribe Reverse Transcriptase (Clontech, #639538); and 0.75 μL of Nuclease-Free Water (not DEPC-Treated) (Life Technologies, #AM9937). The plate was incubated at 42 °C for 1 hour 30 minutes.
Table 1: Exemplary barcode sequences
Barcode sequence    SEQ ID NO
AGTAAT 57
AGTATA 58
AGTTAA 59
ATAAAC 60
ATAACA 61
ATAAGT 62
ATAATG 63
ATACAA 64
ATACTT 65
ATAGAT 66
ATAGTA 67
ATATAG 68
ATATCT 69
ATATGA 70
ATATTC 71
ATCAAA 72
ATCATT 73
ATCTAT 74
ATCTTA 75
ATGAAT 76
ATGATA 77
ATGTAA 78
ATTAAG 79
ATTACT 80
ATTAGA 81
ATTATC 82
ATTCAT 83
ATTCTA 84
ATTGAA 85
ATTGTT 86
ATTTAC 87
ATTTCA 88
ATTTGT 89
ATTTTG 90
CAAAAT 91
CAAATA 92
CAATAA 93
CATAAA 94
CATATT 95
CATTAT 96
CATTTA 97
CTAAAA 98
CTAATT 99
CTATAT 100
CTATTA 101
CTTAAT 102
CTTATA 103
CTTTAA 104
GAAATT 105
GAATAT 106
GAATTA 107
GATAAT 108
GATATA 109
GATTAA 110
GTAAAT 111
GTAATA 112
GTATAA 113
GTTAAA 114
GTTATT 115
GTTTAT 116
GTTTTA 117
TAAAAC 118
TAAACA 119
TAAAGT 120
TAAATG 121
TAACAA 122
TAACTT 123
TAAGAT 124
TAAGTA 125
TAATAG 126
TAATCT 127
TAATGA 128
TAATTC 129
TACAAA 130
TACATT 131
TACTAT 132
TACTTA 133
TAGAAT 134
TAGATA 135
TAGTAA 136
TAGTTT 137
TATAAG 138
TATACT 139
TATAGA 140
TATATC 141
TATCAT 142
TATCTA 143
TATGAA 144
TATGTT 145
TATTAC 146
TATTCA 147
TATTGT 148
TATTTG 149
TCAAAA 150
TCAATT 151
TCATAT 152
TCATTA 153
TCTAAT 154
TCTATA 155
TCTTAA 156
TGAAAT 157
TGAATA 158
TGATAA 159
TGATTT 160
TGTAAA 161
TGTATT 162
TGTTAT 163
TGTTTA 164
TTAAAG 165
TTAACT 166
TTAAGA 167
TTAATC 168
TTACAT 169
TTACTA 170
TTAGAA 171
TTAGTT 172
TTATAC 173
TTATCA 174
TTATGT 175
TTATTG 176
TTCAAT 177
TTCATA 178
TTCTAA 179
TTGAAA 180
TTGATT 181
TTGTTA 182
TTTAAC 183
TTTACA 184
TTTAGT 185
TTTATG 186
TTTCAA 187
TTTCTT 188
TTTGTA 189
TTTTAG 190
TTTTCT 191
TTTTGA 192
TCTTTC 193
TTGGAT 194
ACCGTA 195
AGACCT 196
AGGGAT 197
ATCGAG 198
CAAGCT 199
CACCAA 200
CAGTCA 201
CATCAG 202
CATGGT 203
CCACAT 204
CCGATT 205
CGACTT 206
CGATTG 207
CTAGTG 208
CTTCTG 209
GAAGAC 210
GATCGT 211
GCTAGA 212
GCTTAC 213
GGACAT 214
GGCAAT 215
GGGATT 216
GTACAC 217
GTCAAG 218
GTGACT 219
GTTCGA 220
TAGTGG 221
TCCAAC 222
TCGAAG 223
TCTGCA 224
TTCCTC 225
TTGTCC 226
TTTGGC 227
CCAACC 228
CCTTCC 229
CTCTCC 230
GGACCA 231
GTACCG 232
ACCCCC 233
ACCCGG 234
ACCGCG 235
ACCGGC 236
ACGCCG 237
ACGCGC 238
ACGGCC 239
ACGGGG 240
AGCCCG 241
AGCCGC 242
AGCGCC 243
AGCGGG 244
AGGCCC 245
AGGCGG 246
AGGGCG 247
AGGGGC 248
CACCCC 249
CACCGG 250
CACGCG 251
CACGGC 252
CAGCCG 253
CAGCGC 254
CAGGCC 255
CAGGGG 256
CCACCG 257
CCACGC 258
CCAGGG 259
CCCACG 260
CCCAGC 261
CCCCAC 262
CCCCCA 263
CCCCGT 264
CCCCTG 265
CCCGAG 266
CCCGGA 267
CCCTGG 268
CCGAGG 269
CCGCAG 270
CCGCGA 271
CCGGAC 272
CCGGCA 273
CCGGGT 274
CCGGTG 275
CCGTCG 276
CCGTGC 277
CCTCGG 278
CCTGCG 279
CCTGGC 280
CGACCC 281
CGACGG 282
CGAGCG 283
CGAGGC 284
CGCACC 285
CGCAGG 286
CGCCAG 287
CGCCCT 288
CGCCGA 289
CGCCTC 290
CGCGAC 291
CGCGCA 292
CGCGGT 293
CGCGTG 294
CGCTCG 295
CGCTGC 296
CGGACG 297
CGGAGC 298
CGGCAC 299
CGGCCA 300
CGGCGT 301
CGGCTG 302
CGGGAG 303
CGGGCT 304
CGGGGA 305
CGGGTC 306
CGGTCC 307
CGGTGG 308
CGTCCG 309
CGTCGC 310
CGTGCC 311
CGTGGG 312
CTCCCG 313
CTCCGC 314
CTCGGG 315
CTGCGG 316
CTGGCG 317
CTGGGC 318
GACCCG 319
GACCGC 320
GACGCC 321
GACGGG 322
GAGCCC 323
GAGCGG 324
GAGGCG 325
GAGGGC 326
GCACCC 327
GCACGG 328
GCAGCG 329
GCAGGC 330
GCCACC 331
GCCAGG 332
GCCCAG 333
GCCCCT 334
GCCCGA 335
GCCCTC 336
GCCGAC 337
GCCGCA 338
GCCGGT 339
GCCGTG 340
GCCTCG 341
GCCTGC 342
GCGACG 343
GCGAGC 344
GCGCAC 345
GCGCCA 346
GCGCGT 347
GCGCTG 348
GCGGAG 349
GCGGCT 350
GCGGGA 351
GCGGTC 352
GCGTCC 353
GCGTGG 354
GCTCCG 355
GCTCGC 356
GCTGCC 357
GCTGGG 358
GGACGC 359
GGAGCC 360
GGAGGG 361
GGCACG 362
GGCAGC 363
GGCCAC 364
GGCGAG 365
GGCGCT 366
GGCGGA 367
GGCGTC 368
GGCTCC 369
GGGACC 370
GGGAGG 371
GGGCAG 372
GGGCCT 373
GGGCGA 374
GGGCTC 375
GGGGAC 376
GGGGCA 377
GGGGGT 378
GGGGTG 379
GGGTCG 380
GGGTGC 381
GGTCCC 382
GGTGCG 383
GGTGGC 384
GTCCCC 385
GTCGCG 386
GTCGGC 387
GTGCGC 388
GTGGCC 389
GTGGGG 390
TCCCCG 391
TCCCGC 392
TCCGGG 393
TCGCGG 394
TCGGCG 395
TCGGGC 396
TGCCCC 397
TGCGCG 398
TGCGGC 399
TGGCCG 400
TGGCGC 401
TGGGCC 402
TGGGGG 403
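The design rule stated for these barcodes (every pair differs by at least two nucleotides) can be checked with a short Python sketch. The function names are hypothetical, and only the first four barcodes listed above are used as a sample; the check is not run here over the full table.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def assign_barcode(observed, barcodes):
    """Return the observed barcode only if it is an exact member of the set.

    With a minimum pairwise distance of two, a single substitution can
    never turn one valid barcode into another, so an unmatched read is
    rejected (None) rather than assigned to the wrong well.
    """
    return observed if observed in barcodes else None

# Sample: the first four barcodes shown in Table 1.
sample = ["AGTAAT", "AGTATA", "AGTTAA", "ATAAAC"]
min_dist = min_pairwise_distance(sample)
rejected = assign_barcode("AGTAAA", set(sample))  # one error away from AGTAAT
```

A minimum distance of two supports detection of a single substitution but not its correction, since a read with one error can be equally close to more than one valid barcode.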
cDNA pooling and purification
[0097] All 384 wells were pooled together, and 35 mL of DNA Binding Buffer (Zymo Research, #D4004-1-L) was added to the pooled cDNAs. All cDNAs pooled from one 384-well plate were purified through a DNA purification spin column, in this case one single DNA Clean & Concentrator-5 column (Zymo Research, #D4013), and the cDNAs were eluted in 17 μL of Nuclease-Free Water.
Exonuclease I treatment
[0098] Pooled cDNAs were treated with an exonuclease, in this case Exonuclease I (New England Biolabs, #M0293L) in 2 μL of 10X reaction buffer, and the reaction was incubated at 37 °C for 30 minutes, then at 80 °C for 20 minutes.
Full length cDNA amplification
[0099] Full length cDNA was amplified by single primer PCR using the Advantage 2 PCR Enzyme System (Clontech, #639206). The PCR reaction was set up as follows: 20 μL of cDNA from the previous step; 5 μL of 10X Advantage 2 PCR buffer; 1 μL of dNTPs; 1 μL of the DNA primer 5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19) (wherein 5Biosg represents 5' biotin) (10 μM, Integrated DNA Technologies); 1 μL of the Advantage 2 Polymerase Mix; and 22 μL of Nuclease-Free Water. The reaction was performed using the following program: 95 °C for 1 minute; 18 cycles of 95 °C for 15 seconds, 65 °C for 30 seconds, and 68 °C for 6 minutes; and 72 °C for 10 minutes (followed by an optional hold period at 4 °C).
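The reaction setup above can be tallied with a short Python sketch to confirm the total volume and the resulting dilutions. The dictionary keys are descriptive labels chosen for illustration, and the sketch assumes that the stated 10 μM is the concentration of the primer stock rather than the final concentration in the reaction.

```python
# Component volumes in microliters, as listed in the PCR setup.
reaction_ul = {
    "pooled cDNA": 20,
    "10X Advantage 2 PCR buffer": 5,
    "dNTPs": 1,
    "primer": 1,
    "Advantage 2 Polymerase Mix": 1,
    "nuclease-free water": 22,
}

total_ul = sum(reaction_ul.values())

# Fold dilution of the 10X buffer in the assembled reaction.
buffer_dilution = total_ul / reaction_ul["10X Advantage 2 PCR buffer"]

# Final primer concentration, assuming a 10 uM stock (an assumption).
PRIMER_STOCK_UM = 10
primer_final_um = PRIMER_STOCK_UM * reaction_ul["primer"] / total_ul
```

Under these assumptions the components sum to a 50 μL reaction in which the 10X buffer is diluted tenfold to 1X.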
Full length cDNA purification and quantification
[0100] Full length cDNAs were purified with 30 μL of beads (here, Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880)). The full length cDNAs were eluted in 12 μL of Nuclease-Free Water and quantified on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies, #Q32851).
Sequencing library preparation
[0101] From the purified full length cDNA, 1 ng of cDNA was used for Nextera library preparation according to the Illumina protocol, with the exception that only the i7 primer (e.g., a primer which is standard to the Illumina system) was used to barcode cDNA originating from the same 384-well plate, and 5 μM of a second primer (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3' (SEQ ID NO: 3), wherein * represents a phosphorothioate bond) was also used during the library amplification step.
Sequencing library purification and size selection
[0102] The resulting sequencing library was purified with 30 μL of Agencourt AMPure XP magnetic beads and eluted in 20 μL of nuclease-free water. The entire library was run on an E-Gel EX Gel, 2% (Life Technologies, #G4010-02), and the band corresponding to a size range of 300 to 800 bp was excised and purified using the QIAquick Gel Extraction Kit (Qiagen, #28704).
Sequencing library quality assessment
[0103] The library was quantified on the Qubit 2.0 Fluorometer using the dsDNA HS Assay. The quality and average size of the library were assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
Sequencing
[0104] Sequencing is performed on any Illumina® HiSeq™ or MiSeq™ instrument using a standard Illumina® sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first strand, then 8 cycles to decode the Nextera™ barcode, and finally 34 cycles to sequence the cDNA (although 46 cycles also can be used to increase the amount of sequencing data). Up to twelve Nextera libraries/384-well capture plates, each comprising 384 cells, are multiplexed together (twelve libraries can be used with a set of twelve plate-identifying barcode sequences, although this number can be expanded with additional barcode sequences), allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
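The multiplexing arithmetic in this paragraph can be reproduced with a short Python sketch. The constant and function names are hypothetical, and the lane estimate assumes the twelve-plate limit stated above rather than an expanded index set.

```python
import math

WELLS_PER_PLATE = 384   # one cell per well of a 384-well capture plate
PLATES_PER_LANE = 12    # one plate-identifying index per plate, twelve in the set
READ_CYCLES = {
    "first read (well barcode + UMI)": 17,
    "index read (Nextera barcode)": 8,
    "second read (cDNA)": 34,
}

cells_per_lane = WELLS_PER_PLATE * PLATES_PER_LANE
total_cycles = sum(READ_CYCLES.values())

def plates_needed(n_cells):
    """Number of 384-well plates required to hold n_cells single cells."""
    return math.ceil(n_cells / WELLS_PER_PLATE)

def lanes_needed(n_cells):
    """Number of lanes required at twelve plates per lane."""
    return math.ceil(plates_needed(n_cells) / PLATES_PER_LANE)
```

For example, `plates_needed(9216)` gives 24 plates and `lanes_needed(9216)` gives 2 lanes under the twelve-plate assumption.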
Example 2: Single cell sequencing of differentiating stem cells
[0105] The methods and reagents (e.g., polynucleotides, kits, etc.) described herein have numerous applications. The following provides an example demonstrating the application of the instant technology to a particular context. The method described above was used to sequence the transcriptomes of a population of differentiating human adipose tissue-derived stromal/stem cells (hASCs) at eight different time points (day 0, day 1, day 2, day 3, day 5, day 7, day 9, and day 14). Visual inspection of these cells indicates that differentiation over time is incomplete, thus leading to a heterogeneous cell population (Figure 1). Given the heterogeneous appearance of the cells, we would expect that, if cells in the culture could be rigorously analyzed at the single cell level and gene expression accurately correlated with each specific single cell, expression of genes relevant to differentiation and other activities would differ across individual cells at a given time point. We thus undertook such analysis as proof of principle of the robustness of the methods and compositions of the present invention.
[0106] As proof of principle, single-cell RNA-seq data were generated for ~9,216 cells in total, representing ~1,152 cells collected for each of the eight time points profiled (day 0, day 1, day 2, day 3, day 5, day 7, day 9, and day 14). To generate these data, FACS was used to sort the cells into 24 384-well plates. Figure 3 depicts the design of the sequencing library incorporating the two levels of barcoding (well/cell and plate), the UMI, and the primer sequences indicated as P5 and P7 for Illumina sequencing. P5 and P7 are the regions that anneal to their complementary oligos on the flow cell. The index (i7) represents the plate index that is added during the Nextera tagmentation process after all wells have been pooled and pre-amplified. It is incorporated by PCR during the last step of the library preparation. One i7 index is used per pool/plate of 96 or 384 samples/cells, allowing for a higher level of multiplexing by pooling several plates together for sequencing. The sequencing primers P5 and P7 initiate the sequencing reaction. The sequencing will result in 3 distinct reads. The first one is 16 bp long and includes 6 bp of the well/cell barcode followed by 10 bp of the UMI. Then the i7 index sequencing primer allows us to read the plate/pool index (i7, 8 bp) on the same strand. Finally, the other strand is generated (paired-end sequencing) and the read 2 sequencing primer allows us to read the actual cDNA fragment, which is typically 45 bp with a 50 cycle kit. By using the 3 reads and deciphering the barcodes, we can trace each cDNA to a specific well, plate, and transcript. In certain embodiments, the disclosure provides a polynucleotide as set forth in Figure 3 (e.g., a polynucleotide comprising various polynucleotide portions, such as contiguous portions, as set forth in Figure 3). The various portions are described herein and the figure contemplates polynucleotides comprising any combinations of these various portions. Expression values were correlated by comparing raw read counts to UMI counts (Figure 4). Incorporating and counting UMIs helped to reduce the PCR bias.
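The tracing of a cDNA to a well and plate from the first read and the index read can be sketched in Python as follows. This is an illustration under stated assumptions: the function names, the well labels, and the 8-base plate index `ACGTACGT` are hypothetical, and only the 6 bp barcode and 10 bp UMI lengths come from the description above.

```python
def split_read1(read1, barcode_len=6, umi_len=10):
    """Split the first read into (well barcode, UMI).

    The barcode occupies the first 6 bases and the UMI the next 10;
    any extra trailing cycle is ignored.
    """
    if len(read1) < barcode_len + umi_len:
        raise ValueError("read 1 is shorter than barcode + UMI")
    return read1[:barcode_len], read1[barcode_len:barcode_len + umi_len]

def trace_fragment(read1, index_read, well_by_barcode, plate_by_index):
    """Return (plate, well, umi) for one read set, or None if unassignable."""
    barcode, umi = split_read1(read1)
    well = well_by_barcode.get(barcode)
    plate = plate_by_index.get(index_read)
    if well is None or plate is None:
        return None
    return plate, well, umi

# Hypothetical lookup tables; the well labels and plate index are made up.
well_by_barcode = {"AGTAAT": "A1", "AGTATA": "A2"}
plate_by_index = {"ACGTACGT": "plate 1"}

hit = trace_fragment("AGTATACCGGTTAACCT", "ACGTACGT", well_by_barcode, plate_by_index)
miss = trace_fragment("GGGGGGCCGGTTAACCT", "ACGTACGT", well_by_barcode, plate_by_index)
```

The transcript itself would then come from aligning the cDNA read, a step this sketch leaves out.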
[0107] Key marker genes among the cells for each time point were measured, and the distribution of expression levels was plotted over time (days 0 to 14) as shown in Figure 5. With the single cell RNA-seq data, the proportions of cells expressing a gene at a given level are observable. Gene detection in single cells was plotted as a histogram showing how many expressed genes were detected per cell (Figure 6). By way of exemplifying the data for a gene, GAPDH was selected as an example of a "housekeeping" gene that shows a burst of transcription and that is a cell cycle-regulated gene. The histogram of Figure 7 represents the distribution of GAPDH expression among the cells profiled at day 0. While
GAPDH usually is present at a constant level of expression in a population of cells, when observed at the single cell level, a significant portion of cells were seen that did not express GAPDH because GAPDH is a cell cycle-regulated gene. Thus, by using the single cell sequencing method, we revealed that, despite its widespread use as a "housekeeping" reference gene, GAPDH is not necessarily a good reference gene especially at the single cell level. This underscores the power of the single cell sequencing methods of the invention.
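The two summaries discussed here, genes detected per cell and the share of cells with no detected expression of a given gene, can be sketched in Python. The function names and the toy count table are hypothetical and are not data from the Examples.

```python
def genes_detected_per_cell(counts):
    """Map each cell to the number of genes with a nonzero UMI count.

    `counts` has the form {cell: {gene: umi_count}}.
    """
    return {
        cell: sum(1 for c in genes.values() if c > 0)
        for cell, genes in counts.items()
    }

def fraction_not_expressing(counts, gene):
    """Fraction of cells in which the gene has no detected UMIs."""
    if not counts:
        raise ValueError("no cells supplied")
    silent = sum(1 for genes in counts.values() if genes.get(gene, 0) == 0)
    return silent / len(counts)

# Toy UMI counts for four cells (illustrative values only).
toy_counts = {
    "cell1": {"GAPDH": 12, "LPL": 0},
    "cell2": {"GAPDH": 0, "LPL": 3},
    "cell3": {"GAPDH": 7, "LPL": 5},
    "cell4": {"GAPDH": 4},
}
detected = genes_detected_per_cell(toy_counts)
gapdh_silent = fraction_not_expressing(toy_counts, "GAPDH")
```

A bulk measurement of the same toy cells would report a single nonzero GAPDH level and would hide the one cell with no detected GAPDH.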
[0108] A projection of three of the highest components of a principal component analysis based on gene expression is shown in Figures 8 to 13. Each point represents a profiled cell. The cells profiled at day 0 are represented in black, while the cells profiled at the subsequent time points (day 1, day 2, day 3, day 7, and day 14) are shown in gray (or in red if depicted in color). A clear distinction can be seen between the day 0 cells and the cells from subsequent time points. To explore these differences, a Gene Ontology analysis then was performed on the differentially expressed genes between two subpopulations distinguishable at day 14 with the principal component analysis: a subpopulation of genes that clusters with day 0 genes and a subpopulation that is separate from those genes. Key genes that characterize these two day 14 subpopulations were identified and categorized using the Gene Ontology database (Figure 14). The ability to distinguish these subpopulations illustrates the robustness of the methodology. A partial conclusion of these analyses shows the link between the expression of adipocyte genes and G1 arrest (Figure 15). Based on this analysis, it appears that one subpopulation fully differentiates, while the other seems to be stuck in the G0 phase and cannot fully differentiate. These data were then further used in a comparison of adipogenesis efficiency between a mouse system (3T3-L1), where the differentiation process is much more efficient and for which there is a clonal expansion, and human cells (hASCs), where this clonal expansion is absent (Figure 16). This clonal expansion may be essential to avoid a subpopulation becoming stuck in the G0 phase and resulting in incomplete differentiation.
[0109] In conclusion, the data show that the invention provides a useful method for single cell sequencing and single transcript tracking that uses the aggregation of samples and subsequent deconvolution of data. Through this process of aggregation and deconvolution, the sequencing can be performed with less cost and greater efficiency than by traditional sequencing techniques. Moreover, the results obtained here reflect the ability to detect changes and differences across heterogeneous populations when those populations are evaluated at the single cell level. Such changes and differences may be lost (e.g., averaged out) if gene expression across the heterogeneous population is instead evaluated.
Example 3: Simultaneous single cell sequencing of 12,832 cells
[0110] To further demonstrate the applicability of single cell sequencing methods and compositions (e.g., reagents, nucleic acids, kits) of the disclosure for addressing a range of questions, including questions related to understanding cell and developmental biology, a primary human adipose-derived stem/stromal cell (hASC) differentiation system was used as a test system, akin to that described above. Once again, single cell RNA sequencing methods and compositions of the invention were successfully used to survey gene expression in differentiating hASC cultures at single cell resolution. The resulting data reveal the major axes of variation in gene expression, suggest a biological basis for the morphological heterogeneity observed in these cultures, and provide a rich resource for dissection of the regulatory networks involved in adipocyte formation and function beyond what investigations using other techniques have shown. Through advances in sequencing and cell isolation technologies, identification of rare expression programs can be enabled by deeper and more sensitive profiling of every cell, and direct comparison of in vitro and in vivo heterogeneity can be observed through direct profiling of single cells from tissue samples.
[0111] The protocol used in this particular example was as follows.
Cell culture
[0112] Human adipose-derived stem/stromal cells (hASCs) were isolated from lipoaspirates and purified by flow cytometry (CD29, CD44, CD73, CD90, CD105 and CD166 positive; CD14, CD31, CD45 and Lin1 negative) (cells were obtained from Life Technologies). The hASCs were cultured in a 2% reduced serum medium (MesenPro RS, Life Technologies) and expanded for no more than 3 passages. The cultures were then induced to differentiate towards an adipogenic fate after reaching 80% confluency (differentiations D1 and D2) or two days after reaching 100% confluency (differentiation D3) by switching from growth medium to the StemPro adipogenesis differentiation medium (Life Technologies), and were subsequently prepared for further analysis, such as by qPCR or smFISH. Following induction, the differentiation medium was changed every three days for up to 14 days. The variation in initial conditions (confluency upon differentiation) was introduced to assess the robustness of the subsequent time course data.
Single cell isolation
[0113] Cells were harvested using TrypLE Express (Life Technologies) and medium removed by pelleting the cells in a centrifuge (5 minutes at 1000 rpm). RNA was stabilized by immediately resuspending the pelleted cells in RNAprotect Cell Reagent (Qiagen) and RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies) at a 1:1000 dilution. Just prior to fluorescence-activated cell sorting (FACS), the cells were diluted in PBS (pH 7.4, no calcium, magnesium or phenol red; Life Technologies) and stained for viability using Hoechst 33342 (Life Technologies). 384-well SBS capture plates were filled with 5 μL of a 1:500 dilution of Phusion HF buffer (New England Biolabs) in water, and cells were then sorted into each well using a FACSAria II flow cytometer (BD Biosciences) based on Hoechst DNA staining. After sorting, the plates were immediately sealed, spun down, cooled on dry ice, and stored at -80 °C. For lipid content-based FACS, cells were also stained with HCS LipidTOX Neutral Lipid Stain (Life Technologies) and sorted according to their relatively "high" or "low" lipid content, either by taking the top and bottom 20% of stained cells (D2) or the top and bottom 50% (D3).
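The "high" and "low" lipid gates described above amount to taking the top and bottom fraction of cells ranked by stain intensity. The Python sketch below illustrates that selection on already-measured intensities; the function name, the input format, and the example values are hypothetical, and real gating is done on the cytometer during sorting rather than in post-processing.

```python
def gate_high_low(intensity_by_cell, fraction):
    """Return (low_cells, high_cells) holding the bottom and top fraction.

    `fraction` is 0.2 for the top and bottom 20% and 0.5 for a split into
    halves. Cells are ranked by ascending stain intensity.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(intensity_by_cell, key=intensity_by_cell.get)
    n = int(len(ranked) * fraction)
    return ranked[:n], ranked[len(ranked) - n:]

# Ten hypothetical cells with increasing stain intensity.
intensities = {f"c{i}": 100 * (i + 1) for i in range(10)}
low20, high20 = gate_high_low(intensities, 0.2)
low50, high50 = gate_high_low(intensities, 0.5)
```

With the 50% setting every cell falls into one of the two gates, whereas the 20% setting discards the middle 60% and so yields more clearly separated populations.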
Sequencing of sorted single cells
[0114] Frozen cells were thawed for 5 minutes at room temperature. For the second time course (D3) only, lysis conditions further included treating the cells with proteinase K (200 μg/mL; Ambion), followed by RNA desiccation to inactivate the proteinase K and simultaneously reduce the reaction volume. The cells were kept at 50 °C for 15 minutes in a sealed plate, then 95 °C for 10 minutes with the seal removed.
Primers
[0115] The primers used, and the resulting products, are as follows.
1st strand cDNA
5'-RNA:NB(A)30-3'
3'-CCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
2nd strand cDNA
5'-ACACTCTTTCCCTACACGACGCGGG:cDNA:NB(A)30-3'
3'-CCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
Resulting full length cDNA
5'-ACACTCTTTCCCTACACGACGCGGG:cDNA:NB(A)30(N)10[BC6]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3'
3'-TGTGAGAAAGGGATGTGCTGCGCCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
Full length cDNA amplification: Single primer PCR
3'-CGCAGCACATCCCTTTCTCACA-5'
5'-ACACTCTTTCCCTACACGACGCGGG:cDNA:NB(A)30(N)10[BC6]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3'
3'-TGTGAGAAAGGGATGTGCTGCGCCC:cDNA:NV(T)30(N)10[BC6]TCTAGCCTTCTCGCAGCACATCCCTTTCTCACA-5'
5'-ACACTCTTTCCCTACACGACGC-3'
Transposon based library (Nextera)
Tagmentation
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-3'
3'-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG-5'
Library amplification (modified)
3'-GGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC-5'
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC-3'
3'-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA[BC6](N)10(A)30BN-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG-5'
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Resulting library
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC[i7]ATCTCGTATGCCGTCTTCTGCTTG-3'
3'-TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA[BC6](N)10(A)30BN-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC-5'
Sequencing
Read 1: [BC6] + UMI (N)10 ->
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6](N)10(T)30VN-Frag-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC[i7]ATCTCGTATGCCGTCTTCTGCTTG-3'
3'-TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA[BC6](N)10(A)30BN-Frag-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC-5'
Read 2: Nextera Index [i7]
<- Read 3: 3' end cDNA fragment
[0116] To start, diluted ERCC RNA Spike-In Mix (1 μL of 1:10^7 for D1/D2 or 1 μL of 1:10^6 for D3; Life Technologies) was added to each well, and the template switching reverse transcription reaction described above was carried out using an MMLV Reverse Transcriptase (here, either SmartScribe Reverse Transcriptase (D1/D2; Clontech) or Maxima H Minus Reverse Transcriptase (D3; Thermo Scientific)) with the template-switching oligonucleotide (2 pmol, Eurogentec) (5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17), where iC is iso-dC, iG is iso-dG, and rG is RNA G) and a cDNA synthesis primer (2 pmol, Integrated DNA Technologies), 5'-
/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 18), wherein 5Biosg represents 5' biotin; V represents a nucleotide selected from A, G, and C; the 3' N represents a nucleotide selected from A, G, C, and T; [BC6] represents a 6 base pair barcode sequence; and the (N)10 after the barcode sequence represents a Unique Molecular Identifier (UMI) sequence (10 base pair barcode). After the template switching reaction, cDNA from 384 wells was pooled together and purified and concentrated using a single DNA Clean & Concentrator-5 column (Zymo Research). Pooled cDNAs were treated with an exonuclease, in this example Exonuclease I (New England Biolabs), and subsequently amplified by single primer PCR using the Advantage 2 Polymerase Mix (Clontech) and the SINGV6 primer (10 pmol, Integrated DNA Technologies) (5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19)). Full length cDNAs were purified with Agencourt AMPure XP magnetic beads (0.6x, Beckman Coulter) and quantified on the Qubit 2.0 Fluorometer using a dsDNA HS Assay (Life Technologies). The full-length cDNA was then used in the Nextera XT library preparation kit (Illumina) according to the manufacturer's protocol, with the exception that the i5 primer was replaced by a phosphorothioate bond-containing nucleic acid (5 μM, Integrated DNA Technologies) (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3', where * = phosphorothioate bonds (SEQ ID NO: 3)). The resulting sequencing library was purified with Agencourt AMPure XP magnetic beads (0.6x, Beckman Coulter), size selected (300-800 bp) on an E-Gel EX Gel, 2% (Life Technologies), purified using a QIAquick Gel Extraction Kit (Qiagen), and quantified on a Qubit 2.0 Fluorometer using a dsDNA HS Assay (Life Technologies). Libraries were sequenced on Illumina HiSeq paired-end flow cells with 17 cycles on the first read to decode the well barcode and UMI, an 8 cycle index read to decode the i7 Nextera barcode, and finally a 34 cycle second read to sequence the cDNA.
Sequencing on bulk samples
[0117] Populations of both unsorted and sorted cells were lysed in QIAzol (Qiagen) and RNA was extracted and purified using Direct-zol RNA MiniPrep (Zymo Research). Digital gene expression (DGE) libraries for sequencing were prepared from 10 ng of extracted total RNA, using the protocol described above for single cells, with the exception of using more concentrated template-switching
and barcoded nucleic acids (10 pmol) and a version of the cDNA synthesis primer that did not contain the well-specific 6bp barcodes but instead a 16bp UMI (Integrated DNA Technologies) (5'-
/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT NNNNNN NNN NNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 404)).
Single cell RT-qPCR
[0118] Single cells were sorted into 384-well plates, frozen at -80 °C, thawed for 5 min at room temperature, treated with proteinase K (200 μg/mL, Ambion), and desiccated as described above. cDNA synthesis was carried out in each well using Superscript VILO (2 μl final volume; Life Technologies). qPCR was then performed on the total cDNA output using FAM and VIC Taqman probes (Life Technologies) and processed on an Applied Biosystems ViiA 7 Real-Time PCR system (Life Technologies).
Single-molecule FISH
[0119] Probes targeting LPL, G0S2 and TCF25 transcripts were synthesized as amine-conjugated oligonucleotides and then labelled with Cy5 (GE Healthcare), Alexa Fluor 594 (Molecular Probes) or 6-TAMRA (Molecular Probes). Hybridizations and washes were performed using modifications to previously described procedures (see, e.g., Bienko et al., Nat. Methods 10:122-124 (2013) and Raj et al., Nat. Methods 5:877-879 (2008)). Prior to hybridizations, lipids were extracted by incubation of fixed cells in 2:1 chloroform:methanol for 30 min at room temperature. Cells were washed quickly with 70% ethanol and then resuspended in 200 μl RNA Hybridization buffer containing 2x SSC buffer, 25% formamide, 10% dextran sulphate (Sigma), E. coli tRNA (Sigma), bovine serum albumin (Ambion), ribonucleoside vanadyl complex and 150 ng of each desired probe set (the mass refers only to pooled oligonucleotides, excluding fluorophores, and is based on absorbance measurements at 260 nm). Hybridizations were performed for 16-18 h at 30 °C, after which cells were washed twice for 30 min at 30 °C in RNA Wash buffer (containing 2x SSC buffer, 25% formamide (Ambion) and 100 ng/ml DAPI). For microscopy, cells were resuspended in a mounting solution containing 1x PBS, 0.4% glucose, 100 μg/ml catalase, 37 μg/ml glucose oxidase and 2 mM Trolox and immobilized on poly-lysine coated chambered cover glasses. Imaging was performed as described above, using an inverted epifluorescence microscope (Nikon) equipped with a high-resolution CCD camera (Pixis, Princeton Instruments) and a 100x magnification oil immersion, high numerical aperture Nikon objective. An image stack consisting of 50 image planes spaced 0.3 μm apart was acquired per region of interest. Individual images were filtered with a high-pass Fast Fourier Transform filter, where the filter cutoff was chosen to preserve diffraction-limited signals. Filtering was repeated on the resulting image of the maximum projection. Signal positions, widths, and intensities were quantified by fitting 2D Gaussians approximating the point-spread function (PSF) of the microscope. To separate sporadic signals caused by autofluorescence or non-specifically bound probes from real mRNA signals, signals were filtered based on width and signal-to-noise ratio. Cells were segmented manually and signals were assigned to individual cells.
Computational analysis of sequence data
[0120] All second sequence reads were aligned to a reference database containing all human RefSeq mRNA sequences (obtained from the UCSC Genome Browser hg19 reference set), the human hg19 mitochondrial reference sequences and the ERCC RNA spike-in reference sequences, using bwa version 0.7.4 with non-default parameter "-l 24". Read pairs for which the second read aligned to a human RefSeq gene were kept for further analysis if 1) the initial six bases of the first read all had quality scores of at least 10 and corresponded exactly to a designed well-barcode and 2) the next ten bases of the first read (the UMI) all had quality scores of at least 30. Digital gene expression (DGE) profiles were then generated by counting, for each microplate well and RefSeq gene, the number of unique UMIs associated with that gene in that well. Python scripts were used to implement the alignment and DGE derivation from the samples.
Computational analysis of DGE profiles
[0121] All computational and statistical analyses were performed using Python 2.7 with the Enthought Canopy Distribution, Numpy 1.8.0 and Scipy 0.13.0, scikit-learn 0.14, and Matplotlib 1.3.1. For each plate, wells with fewer than 1,000 or more than 10,000 total UMI counts were discarded (24% of all wells, largely low-value wells). The UMI counts for each gene in the remaining wells were then normalized by dividing by the sum of UMI counts across all genes in the same well. This normalization removes variation from differences in RNA content per cell and can be revisited for analyses that are sensitive to this phenomenon. Pairwise Pearson correlations between genes across single cells and their associated p-values were computed using the scikit-learn metrics.pairwise_distances function. The 5% false discovery rate (FDR) thresholds were estimated from the p-value distribution using the Benjamini-Hochberg-Yekutieli procedure. The expected null distributions of pairwise correlation coefficients were estimated by permuting expression values across cells from the same time point and re-computing the pairwise correlations 100 times. Principal component analyses (PCA) were performed by first scaling the normalized UMI-derived expression levels of each gene to zero mean and unit variance using the scikit-learn preprocessing.scale function and then applying the RandomizedPCA transformation. Each time course dataset was processed separately. To project lipid-sorted cell data into the corresponding time course principal component space (i.e., the three-dimensional space represented by the 3 major principal components), the time course and lipid-sorted expression values were concatenated and re-scaled prior to applying the time course PCA transformation. Gene set enrichment analyses (GSEA) were performed using the GSEAPreRanked module of the GSEA 2.0 software (http://www.broadinstitute.org/gsea/) with the MSigDB 4.0 gene sets. Genes were ranked by the PC weights for interpretation of PC metagenes or by the signal-to-noise metric ((μA - μB)/(σA + σB)) for comparisons of low and high lipid cells. Significant gene sets were called at the threshold recommended by the GSEA developers (25% FDR).
Results
[0122] A variety of cell populations can be induced to differentiate into adipocytes by treating the cells with cocktails of adipogenic hormones and growth factors. However, the yields of lipid-filled, adipocyte-like cells obtained from these methods are highly variable. Moreover, it is unclear whether this variability reflects heterogeneity in the starting populations, stochastic responses to imperfect differentiation stimuli, or other factors. Thus, adipocyte differentiation was selected as a good model system to test single-cell sequencing. The most commonly used cell line in adipogenesis research is the immortalized murine 3T3-L1 cell line, which supports near complete conversion to adipocyte-like cells. Numerous molecular differences have, however, been found between this cell line and human adipocyte stem cells (hASCs). Single-cell profiling should help clarify the nature of these differences.
[0123] hASC cultures were collected just prior to induction of differentiation (day 0), as well as at seven time points after induction (days 1, 2, 3, 5, 7, 9 and 14). At day 14, approximately two thirds of the cells contained clearly visible lipid droplets while the remainder retained a more fibroblast-like morphology. A nucleic acid stain was used to identify and sort intact single cells into 384-well plates with a fluorescence-activated cell sorter. A neutral lipid stain also was used to separately sort single cells based on their lipid contents. This method allowed us to combine the advantages of FACS sorting, such as staining cells using, for example, a DNA stain or a lipid stain, and selecting specific cells to profile. Additional cells then were collected and sorted from independent cultures at days 0, 3 and 7. In total, single-cell sequencing libraries were prepared from 44 microplates. The plates were sequenced to a mean depth of ~165,000 reads per well and the reads aligned to RefSeq transcripts. After stringent filtering on sequence and alignment quality, and then estimating the expression levels in each cell from UMI counts (Figure 18), survey-depth digital gene expression (DGE) profiles were obtained from a total of 12,832 cells (76% of the total wells). As judged by the UMI counts, each DGE profile captured between 1,000 and ~10,000 unique mRNAs (mean = 2,602 and 3,336 for the protocols from Example 1 and this Example, respectively), which constitutes a ~4-fold increase in mean library complexity relative to a previous high-throughput protocol (Jaitin et al., Science 343:776-779 (2014)).
[0124] Initial analysis of the resulting data showed that the mean gene expression levels across the single cell profiles were significantly correlated with their corresponding levels from bulk unsorted cells collected at the same time point (r = 0.8, p < 10^-100; Figure 17A). Of 15,099 distinct RefSeq genes that were detected at day 0 in bulk unsorted cells, 14,612 (97%) also were detected in at least one single cell from the same day. As expected from the relatively low sequencing coverage, only the most actively transcribed genes were captured from every cell (Figure 19). However, significant positive and negative correlations still could be detected between the expression levels of individual genes across cells collected on the same day (Figure 17B). For example, LPL and G0S2, two traditional markers that are both up-regulated after induction of adipogenesis, had positively correlated expression levels after differentiation (r = 0.23, p < 10^-12 on day 7; FDR < 5%). A positive correlation could be validated between these genes both by qRT-PCR analysis of independently sorted single cells (Figure 17C) and in situ by multiplexed single molecule FISH (smFISH; Figure 17D and Figure 20). Thus, the single cell RNA sequencing method tested can capture gene expression variation at single-cell resolution.
[0125] To understand the observed cell-to-cell variation in gene expression in more detail, a principal component analysis (PCA) of the initial time course (days 0 to 14; 6,197 cells; Figure 21A-H) was performed. Plotting the position of each cell in the space defined by the first three principal components revealed that there was little overlap between cells from day 0 and cells from later time points. This suggested that addition of the adipogenic differentiation cocktail induced a rapid response in virtually all of the cultured cells. Plotting the positions also revealed that gene expression levels continued to evolve from day 1 to day 14, but that there was substantial overlap between the cells collected at close time points. This is consistent with a population-wide, but asynchronous, response to induction of differentiation.
[0126] To explore the biological basis for the observed gene expression variation, the relationships between each of the top principal components (PCs), gene expression and time, were then examined (Figure 22). The PCs can be interpreted as metagenes that capture coordinated expression of multiple genes in the original data set. For each PC, we therefore ranked the genes according to their corresponding PC weights and then looked for evidence of coordinately regulated pathways using gene set enrichment analysis (GSEA). This analysis suggested qualitative biological interpretations for at least the top four PCs.
[0127] The first PC metagene (PC1) was positively associated with genes involved in general cellular metabolism, including the majority of genes involved in ribosome assembly, mitochondrial biogenesis, and oxidative phosphorylation, while it was negatively associated with inflammatory pathways, cytokine production and caspase expression. Variations along PC1 reflect differences between metabolically active "healthy" and inactive "unhealthy" cells. Interestingly, while there was a shift towards the latter state towards day 14, there was substantial overlap between the PC1 distributions from all time points, which indicates that this axis of variation was a major contributor to culture heterogeneity prior to induction of differentiation. Because significant cell detachment or death was not observed during the two weeks of differentiation, the inflammation signature likely represents a chronic cell state rather than ongoing apoptosis. By contrast, PC2 was high only in cells collected from day 0, effectively separating these from the differentiating cells. It showed a strong positive association with expression of genes required for progression through the mitotic cell cycle and, to a lesser extent, with genes associated with non-adipogenic differentiation. A decrease in PC2 may therefore reflect an exit from the cell cycle and lineage commitment. Expression of PC3 was high during the first two days post-induction, but steadily decreased as the cells approached day 14. This decrease was associated with up-regulation of lipid homeostasis pathways and markers of adipocyte maturation. PC4 showed a transient drop at day 1, which was associated with increased expression of genes known to be rapidly induced by adipogenic cocktails, including early adipogenic regulators CEBPB and CEBPD. PC4 may therefore reflect an early response to induction of differentiation.
[0128] To explore the relationship between variations in gene expression and in lipid droplet accumulation, an additional 933 cells with high lipid content and an additional 666 cells with low lipid content were collected and analyzed at day 14. When the DGE profiles of these cells were projected into the space defined by the initial time course PCs, the high and low lipid cells were largely separated by their distribution along PC1 (Figure 21I and Figure 22). Particularly, cells with higher lipid content showed higher expression of genes related to basic cellular metabolism, while cells with lower lipid content showed higher expression of inflammatory genes. Interestingly, there was substantial overlap along PC3, and while some classic adipocyte markers like FABP4 (aP2) were enriched in the high lipid fraction, key regulatory factors such as PPARG were not. This implies that pathways related to lipid homeostasis and adipocyte maturation had been activated in both fractions.
[0129] Separate PCAs of the second collected time course (2,968 cells from days 0, 3 and 7, and 2,068 additional cells with high or low lipids from day 7) yielded qualitatively similar patterns, which suggests that the observations are robust to technical variation across cell cultures. Thus, while morphological analysis suggested that only a fraction of hASCs respond to the differentiation cocktail, the single-cell data surprisingly show that virtually all of the cells exited the mitotic cell cycle and proceeded to up-regulate an adipogenic gene expression program. The observed variability in lipid droplet accumulation and conversion to mature adipocyte-like morphologies is instead most strongly linked to an inverse correlation in expression of basic cellular metabolism and inflammatory expression programs, which was also present prior to the induction of differentiation.
Notably, cells with low lipid contents showed elevated expression of several proinflammatory regulatory factors, including IRF1, IRF3 and IRF4. These factors have previously been shown to negatively influence total lipid accumulation in murine bulk cultures and in vivo models, which supports a causal link between
cell-to-cell variation in expression of these factors and lipid accumulation.
Specific activation in the fraction of low lipid cells may explain the paradoxical increases in expression of these factors that have previously been observed in bulk cultures.
Example 4: Protocol for high throughput sequencing
[0130] Although the protocols described above were originally designed to perform RNA sequencing on sorted single cells, they are also suitable for use with other starting samples, such as extracted or purified RNA (bulk RNA sequencing) or a population of cells or tissues (e.g., cell or tissue lysates). As with single cell RNA sequencing, using a 3' digital gene expression method allows the profiling of a high number of samples in a cost-efficient manner. The protocol is robust for a broad range of input from single cells to pooled cells or extracted RNA. It allows the profiling of a large number of samples of extracted RNA (patient samples, for example), profiling of populations of small numbers of cells (e.g., cell or tissue lysates), as well as analysis of sorted, single cells. Regardless of starting materials, the use of the barcodes and UMIs described herein permits the tracking of individual transcripts to a specific multi-well plate and to a specific well of that plate, thus permitting correlation of data to the original starting material. The above examples are indicative of the powerful applications of the technology.
[0131] By way of further example, the ability to correlate expression analysis to a particular well of a multi-well plate (e.g., to the starting sample) is critical in the screening assay context, regardless of whether the material in the screen is a single cell or lysate. Because the barcodes and UMIs allow tracking of individual transcripts, sequencing reactions can be run as massive multiplex reactions rather than a series of individual reactions without losing transcript-level data. This results in a significant increase in efficiency and decrease in cost. The sequencing data then can be deconvoluted using, for example, 3' digital gene expression to count the number of occurrences of barcode and UMI sequences and obtain an expression level for a particular transcript.
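As a rough illustration of this deconvolution step, the following Python sketch counts distinct UMIs per well and gene. It assumes the read layout and quality thresholds given in paragraph [0120] (a 6-base well barcode followed by a 10-base UMI on the first read, with minimum quality scores of 10 and 30, respectively). The function name `count_umis`, the shape of the input tuples, and the example barcodes are hypothetical and are not taken from the Python scripts referred to above.

```python
from collections import defaultdict

BARCODE_LEN = 6        # well barcode length ([BC6])
UMI_LEN = 10           # UMI length ((N)10)
MIN_BARCODE_QUAL = 10  # threshold from paragraph [0120]
MIN_UMI_QUAL = 30      # threshold from paragraph [0120]


def count_umis(read_pairs, well_barcodes):
    """Return {(well, gene): number of distinct UMIs}.

    read_pairs: iterable of (read1_seq, read1_quals, gene), where read1_quals
        is a list of integer quality scores and gene is the gene the second
        read aligned to, or None if it did not align (hypothetical format).
    well_barcodes: dict mapping a designed barcode sequence to a well name.
    """
    prefix_len = BARCODE_LEN + UMI_LEN
    umis = defaultdict(set)
    for seq, quals, gene in read_pairs:
        if gene is None or len(seq) < prefix_len or len(quals) < prefix_len:
            continue
        barcode = seq[:BARCODE_LEN]
        umi = seq[BARCODE_LEN:prefix_len]
        if barcode not in well_barcodes:          # exact match required
            continue
        if min(quals[:BARCODE_LEN]) < MIN_BARCODE_QUAL:
            continue
        if min(quals[BARCODE_LEN:prefix_len]) < MIN_UMI_QUAL:
            continue
        umis[(well_barcodes[barcode], gene)].add(umi)
    return {key: len(found) for key, found in umis.items()}


# Toy input; barcodes, wells and reads are made up for illustration.
wells = {"ACGTAC": "A1", "TGCATG": "A2"}
good = [35] * 17
low_umi = [35] * 6 + [20] * 11
pairs = [
    ("ACGTAC" + "A" * 10 + "T", good, "LPL"),
    ("ACGTAC" + "A" * 10 + "T", good, "LPL"),     # duplicate UMI, counted once
    ("ACGTAC" + "C" * 10 + "T", good, "LPL"),
    ("TGCATG" + "G" * 10 + "T", good, "G0S2"),
    ("TGCATG" + "T" * 10 + "T", low_umi, "G0S2"), # UMI quality too low
    ("AAAAAA" + "G" * 10 + "T", good, "LPL"),     # barcode not in the design
    ("ACGTAC" + "G" * 10 + "T", good, None),      # second read unaligned
]
dge = count_umis(pairs, wells)
```

Because duplicates of the same UMI collapse into one set entry, the resulting counts reflect distinct captured molecules rather than raw read numbers.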
[0132] The methods and reagents described herein also are adaptable to other platforms, e.g., microfluidic systems such as Fluidigm's C1 microfluidic device. For example, the capture of 96 cells was performed on the C1 chip, and the reagents and adapters to prepare the cDNA were incorporated directly on the C1 chip. cDNAs were retrieved as an output of the C1 chip, pooled, and prepared as a Nextera library.
[0133] The nucleic acids, methods, and kits of the invention also provide the ability to profile single cells for which it is not possible to do an individual RNA extraction and purification, or, by working directly with lysates, profiling a high number of conditions under which cells are cultivated without necessarily performing a separate RNA extraction and purification step (e.g., if sequencing cells from a high throughput compound screen, it is unnecessary to extract and purify the RNA from each well individually).
[0134] In certain embodiments, one or more of the following modifications to the protocol or reagents were, and can optionally be, employed. Specifically, another reverse transcriptase can be used, such as the MMLV Maxima H Minus Reverse Transcriptase (Thermo Scientific). At this point, numerous different MMLV reverse transcriptases have been successfully used and can be selected based on user preference, cost, availability and the like. In certain embodiments, a proteinase or protease, such as proteinase K, may be added during lysis. In certain embodiments, proteinase K is included as part of lysis for sorted single cells and isolated cells/lysates. Higher concentrations of proteinase K and increased incubation times are used, in certain embodiments, for a pool of cells as compared to single cells. Other modifications include a reduction in the volume of the RT reaction to 2 μl by drying out the RNA during the proteinase K inactivation to increase reaction efficiency, and the use of 6-nucleotide barcodes to refer to a sample or pool instead of a single cell when performing sequencing on extracted RNA or a pool of cells.
[0135] For bulk RNA sequencing, 10 ng of total RNA were used as input, although this amount is flexible. Additionally, reactions were performed in 10 μL, and the reactions used more concentrated (10 μM) template-switching and barcode-containing oligonucleotides. For RNA sequencing of lysates, inputs ranged from single cells to 10,000 cells (including tens or hundreds of cells). For pooled cells, more concentrated proteinase K (2 mg/ml instead of 1 mg/ml for single cells) was used, and the cells were incubated longer (one hour at 50 °C instead of 15 minutes) to increase lysis efficiency.
[0136] An exemplary protocol is as follows.
Capture plate preparation
[0137] Add 5 μL of lysis buffer, composed of a 1/500 dilution of Phusion HF buffer (New England Biolabs, #B0518S), to each well of a collection Twin.tec PCR 384-well plate (Eppendorf, #951020729).
Cell preparation
[0138] Remove media by pelleting the cells (5 min at 1000 rpm), and resuspend the cells in RNAprotect Cell Reagent (~100 μL per 100,000 cells; Qiagen, #76526) and 1 μL of RNaseOUT Recombinant Ribonuclease Inhibitor (Life Technologies, #10777-019). Cells can be stored up to 2 weeks at 4 °C. Next, dilute the cells in ~1.5 mL PBS, pH 7.4 (no calcium, no magnesium, no phenol red; Life Technologies, #10010-049). Stain the cells for viability (DNA staining by Hoechst 33342) with NucBlue Live ReadyProbes Reagent (Life Technologies, #R37605).
Cell collection
[0139] Sort individual cells into each well of the 384-well capture plate using the FACSAria II flow cytometer (BD Biosciences). "Live" cells are selected and doublets avoided using the Hoechst DNA staining. After sorting, immediately seal the plates, spin them down, and freeze them on dry ice. Sorted cells are stored at -80 °C. If performing bulk lysate sequencing, which starts with extracted/purified RNA and proceeds directly to reverse transcription/template switching, this step should be skipped.
Cell Lysis
[0140] Thaw the cells for 5 minutes at room temperature, then place the plate on ice. Add 1 μL of Proteinase K Solution (diluted to 1 mg/mL; 1/20; Life Technologies, #AM2548) to each well. Incubate the plate at 50 °C for 15 minutes, then remove the seal and incubate the plate at 95 °C for 10 minutes. Place the plate back on ice.
Reverse Transcription/Template Switching
[0141] Denature 42 μL of a 1 x 10^-6 dilution of ERCC RNA Spike-In Mix (Life Technologies, #4456740) for 2 min at 70 °C, then place directly on ice. Prepare the following RT/template switching mix (for 384 wells): 160 μL of 5x RT buffer, 80 μL of dNTPs (New England Biolabs, #N0447L), 72 μL of Nuclease-Free Water (not DEPC-treated) (Life Technologies, #AM9937), 40 μL of the denatured 1 x 10^-6 dilution of ERCC RNA Spike-In Mix (Life Technologies, #4456740), 8 μL of the universal E5V6NEXT adapter (100 μM, Eurogentec), and 50 μL of Maxima H Minus Reverse Transcriptase (Thermo Scientific, #EP0753). Add 1 μL of the mix and 1 μL of the barcoded oligonucleotide adapter (2 μM, Integrated DNA Technologies) to each well. Incubate the plate at 42 °C for 1 hour 30 minutes.
cDNA pooling and purification
[0142] Pool all 384 wells together, and add 5.5 mL of DNA Binding Buffer (Zymo Research, #D4004-1-L) to the pooled cDNAs. Purify all cDNAs pooled from one 384-well plate through one single DNA Clean & Concentrator-5 column (Zymo Research, #D4013). Elute cDNAs in 18 μL of Nuclease-Free Water.
Exonuclease I treatment
[0143] Add 2 μL of 10X reaction buffer and 1 μL of Exonuclease I (New England Biolabs, #M0293L) to the cDNAs. Incubate the reaction at 37 °C for 30 minutes, then at 80 °C for 20 minutes.
Full length cDNA amplification
[0144] Amplify full length cDNA by single primer PCR using the Advantage 2 PCR Enzyme System (Clontech, #639206). The PCR reaction is as follows: 20 μL of cDNA from the previous step, 5 μL of 10X Advantage 2 PCR buffer, 1 μL of dNTPs, 1 μL of the SINGV6 primer (10 μM, Integrated DNA Technologies), 1 μL of Advantage 2 Polymerase Mix, and 22 μL of Nuclease-Free Water. Perform the PCR according to the following program: 95 °C for 1 minute; 18 cycles of a) 95 °C for 15 seconds, b) 65 °C for 30 seconds, and c) 68 °C for 6 minutes; 72 °C for 10 minutes; and, optionally, 4 °C to store the reaction.
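The reaction set-up and thermal program in paragraph [0144] can be sanity-checked with a few lines of arithmetic. The Python sketch below uses the volumes and temperatures listed there; the variable and function names are hypothetical, and the run time is only the sum of programmed hold times, so instrument ramp times (which the text does not give) are ignored.

```python
# Reaction components in microlitres, as listed in paragraph [0144].
PCR_MIX_UL = {
    "cDNA from Exonuclease I step": 20,
    "10X Advantage 2 PCR buffer": 5,
    "dNTPs": 1,
    "SINGV6 primer (10 uM stock)": 1,
    "Advantage 2 Polymerase Mix": 1,
    "Nuclease-Free Water": 22,
}

# Thermal program as (temperature in deg C, hold time in seconds).
INITIAL_DENATURATION = (95, 60)
CYCLE_STEPS = [(95, 15), (65, 30), (68, 360)]
N_CYCLES = 18
FINAL_EXTENSION = (72, 600)


def total_volume_ul(mix):
    """Sum of all component volumes in the reaction."""
    return sum(mix.values())


def final_concentration(stock, volume_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


def program_hold_seconds():
    """Total programmed hold time, excluding ramping and the 4 deg C hold."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


total_ul = total_volume_ul(PCR_MIX_UL)                # 50 uL reaction
buffer_x = final_concentration(10, 5, total_ul)       # 10X buffer ends at 1X
primer_um = final_concentration(10, 1, total_ul)      # primer ends at 0.2 uM
hold_minutes = program_hold_seconds() / 60            # 132.5 minutes
```

The components add up to a 50 μL reaction in which the 10X buffer is at 1X, which is consistent with the listed water volume.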
Full length cDNA purification and quantification
[0145] Purify the full length cDNAs with 30 μL of Agencourt AMPure XP magnetic beads (Beckman Coulter, #A63880). Elute the full length cDNAs in 12 μL of Nuclease-Free Water and quantify on the Qubit 2.0 Fluorometer (Life Technologies) using the dsDNA HS Assay (Life Technologies, #Q32851).
Sequencing Library Preparation
[0146] To increase complexity, all of the purified full length cDNA is used in the Nextera library preparation. If the total amount of cDNA is greater than 1 ng and less than 10 ng, proceed to tagmentation reactions of ~1 ng each according to the Illumina Nextera XT (FC-131-1024) protocol. After the neutralization step, add 180 μL DNA Binding Buffer (Zymo Research, #D4004-1-L) to each tagmentation reaction, and pool and purify the tagmentation reactions on one single DNA Clean & Concentrator-5 column (Zymo Research, #D4013). Then, amplify the tagmented purified cDNA following the Illumina protocol with the exception of running only 10 cycles of PCR, using only the i7 primer to barcode cDNA originating from the same 384-well plate and replacing the i5 primer with P5NEXTPT5, 5 μM (Integrated DNA Technologies) as the second primer. If the total amount of cDNA is greater than 10 ng and less than 50 ng, proceed to the tagmentation using the Nextera DNA kit (FC-121-1030), suitable for 50 ng of input. Scale down all reagents and reaction volume according to the input concentration. Purify the tagmented cDNA on a single DNA Clean & Concentrator-5 column (Zymo Research, #D4013) according to the Illumina protocol. Use the 25 μL of eluted cDNA for the library amplification, and use only the i7 primer to barcode cDNA originating from the same 384-well plate, replacing the i5 primer with P5NEXTPT5, 5 μM (Integrated DNA Technologies) as the second primer. Do not add the PCR primer cocktail. Perform either 10 cycles (for an input of less than 20 ng) or 5 cycles (for an input of 20 ng and above) of PCR according to the Illumina protocol.
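The kit and cycle-number choices in paragraph [0146] amount to a small decision rule on the measured cDNA yield. A minimal Python sketch of that rule follows; the function name `choose_library_prep` and the returned tuple are hypothetical. The paragraph does not say what to do at exactly 1 ng, 10 ng or 50 ng, or outside the 1 to 50 ng range, so the sketch returns `None` in those cases rather than guessing.

```python
def choose_library_prep(cdna_ng):
    """Return (kit, PCR cycles) for a total cDNA yield in ng, or None.

    Thresholds follow paragraph [0146]; yields the paragraph does not
    cover (including the exact boundary values) give None.
    """
    if 1 < cdna_ng < 10:
        # Nextera XT route: tagmentation reactions of ~1 ng, 10 PCR cycles.
        return ("Nextera XT (FC-131-1024)", 10)
    if 10 < cdna_ng < 50:
        # Nextera DNA route: 10 cycles below 20 ng, 5 cycles at 20 ng or more.
        cycles = 10 if cdna_ng < 20 else 5
        return ("Nextera DNA (FC-121-1030)", cycles)
    return None


# Example yields in ng (made-up values for illustration).
examples = {ng: choose_library_prep(ng) for ng in (0.5, 5, 15, 20, 30, 60)}
```

In both routes the amplification uses only the i7 primer together with P5NEXTPT5 in place of the i5 primer, so the rule above changes only the kit and the cycle number.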
Sequencing Library Purification and Size Selection
[0147] Purify the sequencing library with 30 μL of Agencourt AMPure XP magnetic beads and elute it in 20 μL of water. Run the entire library on an E-Gel EX Gel, 2% (Life Technologies, #G4010-02), excise the band corresponding to a size range of 300 to 800bp, purify it using the QIAquick Gel Extraction Kit (Qiagen, #28704), and elute in 15 μL.
Sequencing Library Quality Assessment
[0148] Quantify the library on the Qubit 2.0 Fluorometer using the dsDNA HS Assay. Optionally, the quality and average size of the library can be assessed by BioAnalyzer (Agilent) with the High Sensitivity DNA kit (Agilent, #5067-4626).
Sequencing
[0149] Sequencing can be performed on any Illumina HiSeq or MiSeq, using the standard Illumina sequencing kit. Libraries are run on paired-end flow cells by running 17 cycles on the first end, then 8 cycles to decode the Nextera barcode and finally 46 cycles on the second end. Up to twelve Nextera libraries (one per 384-well capture plate, each comprising 384 cells) can be multiplexed together (twelve i7 barcodes are currently available), allowing the simultaneous sequencing of up to 4,608 single cell transcriptomes on a single lane.
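The cycle budget and the per-lane capacity stated in paragraph [0149] follow from simple bookkeeping, sketched below in Python. The constant and function names are hypothetical. The 6-base barcode and 10-base UMI account for 16 of the 17 first-read cycles; the text does not say what the remaining cycle reads, so it is left unassigned here.

```python
# Cycles per read, as given in paragraph [0149].
READ_CYCLES = {"first read": 17, "index read": 8, "final read": 46}

# Ordered segments at the start of the first read (from SEQ ID NO: 18).
READ1_SEGMENTS = [("well barcode", 6), ("UMI", 10)]

WELLS_PER_PLATE = 384
I7_BARCODES = 12   # one i7 barcode per 384-well plate


def segment_slices(segments):
    """Map each segment name to the slice it occupies in the read."""
    slices, start = {}, 0
    for name, length in segments:
        slices[name] = slice(start, start + length)
        start += length
    return slices


slices = segment_slices(READ1_SEGMENTS)
assigned = sum(length for _, length in READ1_SEGMENTS)   # 16 cycles
unassigned = READ_CYCLES["first read"] - assigned         # 1 cycle
cells_per_lane = WELLS_PER_PLATE * I7_BARCODES            # 4608 cells
total_cycles = sum(READ_CYCLES.values())                  # 71 cycles

# Slicing a made-up 17-base first read into its two segments.
read1 = "ACGTAC" + "GATTACAGAT" + "T"
barcode, umi = read1[slices["well barcode"]], read1[slices["UMI"]]
```

The product of 384 wells and twelve i7 barcodes reproduces the 4,608 single cell transcriptomes per lane quoted in the paragraph.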
Exemplary sequences are provided below and herein. Such sequences are merely illustrative of various polynucleotides and components useful in the methods of the
present invention. These polynucleotides are suitable across any of the various sample types described herein (e.g., single cells, lysates, bulk RNA, etc.).
Adapter/Primer Sequences
Template-switching oligonucleotide
5'-iCiGiCACACTCTTTCCCTACACGACGCrGrGrG-3' (SEQ ID NO: 17)
iC: iso-dC
iG: iso-dG
rG: RNA G
Bar code-containing oligonucleotide adapter 5'-
/5Biosg/ACACTCTTTCCCTACACGACGCTCTTCCGATCT[BC6]NNNNNNN NNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3' (SEQ ID NO: 18)
5Biosg: 5' biotin
V: (A, G, or C)
N: (A, G, C, or T)
[BC6]: 6bp barcode, different in each well. The barcodes were designed such that each barcode differs from the others by at least two nucleotides, so that a single sequencing error cannot lead to the misidentification of the barcode.
(N)10: Unique Molecular Identifier (UMI).
Amplification primer
5'-/5Biosg/ACACTCTTTCCCTACACGACGC-3' (SEQ ID NO: 19)
5Biosg: 5' biotin
Phosphorothioate bond-containing nucleic acid
5'-
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTC TTCCG*A*T*C*T*-3' (SEQ ID NO: 3) * : phosphorothioate bond
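The design rule stated above for [BC6], that every barcode differs from every other by at least two nucleotides, can be checked with a pairwise Hamming-distance test. The Python sketch below also generates one set of 384 six-base barcodes that satisfies the rule, using a checksum base. That construction is an assumption made purely for illustration: the text does not say how the actual barcodes were chosen, and the function names are hypothetical.

```python
from itertools import combinations, product

BASES = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def checksum_barcodes(length=6, count=384):
    """Illustrative construction, not the documented design.

    The first length-1 bases are free and the last base is chosen so that
    the base indices sum to 0 modulo 4. Changing any single base breaks
    the checksum, so two valid barcodes always differ in >= 2 positions.
    """
    barcodes = []
    for prefix in product(range(4), repeat=length - 1):
        check = (-sum(prefix)) % 4
        barcodes.append("".join(BASES[i] for i in prefix + (check,)))
        if len(barcodes) == count:
            break
    return barcodes


barcodes = checksum_barcodes()
distance = min_pairwise_distance(barcodes)   # at least 2 by construction
```

With a minimum distance of two, a single substitution turns a valid barcode into a sequence that matches no designed barcode, so the read can be discarded rather than assigned to the wrong well.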
Claims
1. A nucleic acid comprising a 5' poly-isonucleotide sequence, an internal adapter sequence, and a 3' guanosine tract.
2. The nucleic acid of claim 1, wherein the 5' poly-isonucleotide sequence comprises an isocytosine.
3. The nucleic acid of claims 1 or 2, wherein the 5' poly-isonucleotide sequence comprises an isoguanosine.
4. The nucleic acid of any one of claims 1-3, wherein the 5' poly-isonucleotide sequence comprises an isocytosine-isoguanosine-isocytosine sequence.
5. The nucleic acid of any one of claims 1-4, wherein the 3' guanosine tract comprises two guanosines, three guanosines, four guanosines, five guanosines, six guanosines, seven guanosines, or eight guanosines.
6. The nucleic acid of claim 5, wherein the 3' guanosine tract comprises three guanosines.
7. The nucleic acid of any one of claims 1-6, wherein the adapter sequence is 12 to 32 nucleotides in length.
8. The nucleic acid of claim 7, wherein the adapter sequence is 22 nucleotides in length.
9. The nucleic acid of claim 8, wherein the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3'.
10. A nucleic acid comprising a 5' blocking group, an internal adapter sequence, a barcode sequence, a unique molecular identifier (UMI) sequence, a complementarity sequence, and a 3' dinucleotide sequence comprising a first nucleotide and a second nucleotide, wherein the first nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, and cytosine, and the
second nucleotide of the dinucleotide sequence is a nucleotide selected from adenine, guanine, cytosine, and thymine.
11. The nucleic acid of claim 10, wherein the 5' blocking group is selected from biotin and an inverted nucleotide.
12. The nucleic acid of claim 11, wherein the 5' blocking group is biotin.
13. The nucleic acid of any one of claims 10-12, wherein the internal adapter sequence is 23 to 43 nucleotides in length.
14. The nucleic acid of claim 13, wherein the internal adapter sequence is 33 nucleotides in length.
15. The nucleic acid sequence of claim 14, wherein the internal adapter sequence is 5'-ACACTCTTTCCCTACACGACGC-3'.
16. The nucleic acid of any one of claims 10-15, wherein the barcode sequence is 4 to 20 nucleotides in length.
17. The nucleic acid of claim 16, wherein the barcode sequence is 6 nucleotides in length.
18. The nucleic acid of any one of claims 10-17, wherein the UMI sequence is six to 20 nucleotides in length.
19. The nucleic acid of claim 18, wherein the UMI sequence is ten nucleotides in length.
20. The nucleic acid of any one of claims 10-19, wherein the complementarity sequence is a poly(T) sequence.
21. The nucleic acid of any one of claims 10-20, wherein the complementarity sequence is 20 to 40 nucleotides in length.
22. The nucleic acid of claim 21, wherein the complementarity sequence is 30 nucleotides in length.
23. A kit comprising a nucleic acid of any one of claims 1-9.
24. The kit of claim 23, further comprising a nucleic acid of any one of claims 10-23.
25. The kit of claim 24, wherein the kit comprises a plurality of nucleic acids of any one of claims 10-23.
26. The kit of claim 25, wherein the UMI sequence of each nucleic acid in the plurality of nucleic acids is unique among the nucleic acids in the kit.
27. The kit of claim 25 or 26, wherein the plurality of nucleic acids comprises different populations of nucleic acid species.
28. The kit of claim 27, wherein each population of nucleic acid species comprises a different barcode sequence that uniquely identifies a single population of nucleic acid species.
29. The kit of claim 25, wherein each population of nucleic acid species is in a separate container, and the bar code of each population of nucleic acid species differs by at least two nucleotides from the bar code of each other population of nucleic acid species.
30. The kit of any one of claims 23-29, further comprising a third nucleic acid primer comprising 12 to 32 nucleotides and a 5' blocking group.
31. The kit of claim 30, wherein the 5' blocking group is selected from biotin and an inverted nucleotide.
32. The kit of claim 31, wherein the 5' blocking group is biotin.
33. The kit of any one of claims 30-32, wherein the third nucleic acid is 22 nucleotides in length.
34. The kit of claim 33, wherein the sequence of the nucleic acid primer is 5'-ACACTCTTTCCCTACACGACGC-3'.
35. The kit of any one of claims 23-34, further comprising a nucleic acid comprising a barcode sequence.
36. The kit of any one of claims 23-35, further comprising a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
37. The kit of claim 36, wherein the phosphorothioate bond-containing nucleic acid is 48 to 68 nucleotides in length.
38. The kit of claim 37, wherein the phosphorothioate bond-containing nucleic acid is 58 nucleotides in length.
39. The kit of claim 38, wherein the sequence of the phosphorothioate bond-containing nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3'.
40. The kit of any one of claims 23-39, further comprising a capture plate.
41. The kit of any one of claims 23-40, further comprising a reverse transcriptase enzyme.
42. The kit of claim 41, wherein the reverse transcriptase enzyme is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase.
43. The kit of claim 42, wherein the MMLV reverse transcriptase is SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase.
44. The kit of any one of claims 23-43, further comprising a DNA purification column.
45. The kit of claim 44, wherein the DNA purification column is a DNA purification spin column.
46. The kit of any one of claims 23-45, further comprising proteinase K.
47. A method for gene profiling, comprising: a) providing a plurality of single cells;
b) releasing mRNA from each single cell to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell;
c) reverse transcribing the individual mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence;
d) pooling and purifying the barcoded cDNA produced from the separate cells;
e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA;
f) purifying the double-stranded cDNA;
g) fragmenting the purified cDNA;
h) purifying the cDNA fragments; and
i) sequencing the cDNA fragments.
48. A method for gene profiling, comprising: a) providing an isolated population of cells;
b) releasing mRNA from the population of cells to provide one or more mRNA samples;
c) reverse transcribing the one or more mRNA samples and performing a template switching reaction to produce cDNA incorporating a barcode sequence;
d) pooling and purifying the barcoded cDNA;
e) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA;
f) purifying the double-stranded cDNA;
g) fragmenting the purified cDNA;
h) purifying the cDNA fragments; and
i) sequencing the cDNA fragments.
49. The method of claim 47 or 48, further comprising separating a population of cells to provide the plurality of single cells.
50. The method of claim 49, wherein the cells are separated into a capture plate.
51. The method of any one of claims 48-50, wherein the cells are separated by flow cytometry.
52. The method of any one of claims 48-50, wherein the mRNA is released by cell lysis.
53. The method of claim 52, wherein the cells are lysed by freeze-thawing.
54. The method of claim 52 or 53, further comprising contacting the cells with proteinase K.
55. The method of any one of claims 47-54, wherein c) comprises contacting each individual mRNA sample with a nucleic acid of any one of claims 1-9 and a nucleic acid of any one of claims 10-22.
56. The method of any one of claims 47-54, wherein c) is carried out with a reverse transcriptase enzyme.
57. The method of claim 56, wherein the reverse transcriptase enzyme is a Moloney Murine Leukemia Virus (MMLV) reverse transcriptase.
58. The method of claim 57, wherein the MMLV reverse transcriptase is SMARTscribe™ reverse transcriptase, Superscript II™ reverse transcriptase, or Maxima H Minus™ reverse transcriptase.
59. The method of any one of claims 47-58, wherein the cDNA purification of d) is carried out with a Zymo-Spin™ column.
60. The method of any one of claims 47-58, further comprising treating the barcoded cDNA with an exonuclease.
61. The method of claim 60, wherein the exonuclease is Exonuclease I.
62. The method of any one of claims 47-61, wherein the amplification of e) utilizes an amplification primer comprising a 5' blocking group.
63. The method of claim 62, wherein the blocking group is selected from biotin and an inverted nucleotide.
64. The method of claim 63, wherein the blocking group is biotin.
65. The method of any one of claims 62-64, wherein the amplification primer is 12 to 32 nucleotides in length.
66. The method of claim 65, wherein the amplification primer is 22 nucleotides in length.
67. The method of claim 66, wherein the sequence of the amplification primer is 5'-ACACTCTTTCCCTACACGACGC-3'.
68. The method of any one of claims 47-67, wherein the purification of f) is carried out with magnetic beads.
69. The method of any one of claims 47-68, wherein f) further comprises quantifying the purified cDNA.
70. The method of any one of claims 47-69, wherein the single cells are provided in a capture plate of individual wells, each well comprising a single cell.
71. The method of any one of claims 47-70, wherein the fragmentation of g) utilizes a transposase.
72. The method of any one of claims 47-71, wherein the fragmentation of g) utilizes a first fragmentation nucleic acid and a second fragmentation nucleic acid, wherein the first fragmentation nucleic acid comprises a barcode sequence.
73. The method of claim 72, wherein the sequence of the first fragmentation nucleic acid is 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3', wherein [i7] is a nucleic acid sequence.
74. The method of claim 73, wherein [i7] is a nucleic acid sequence between four and 16 nucleotides in length.
75. The method of claim 74, wherein [i7] is eight nucleotides in length.
76. The method of claim 75, wherein the sequence of [i7] is selected from: TCGCCTTA, CTAGTACG, TTCTGCCT, GCTCAGGA, AGGAGTCC, CATGCCTA, GTAGAGAG, CCTCTCTG, AGCGTAGC, CAGCCTCG, TGCCTCTT, and TCCTCTAC.
77. The method of any one of claims 72-76, wherein the barcode sequence of the first fragmentation nucleic acid is different than the barcode sequence of the nucleic acid of any one of claims 10-22.
78. The method of claim 77, wherein the barcode sequence of the first fragmentation nucleic acid uniquely identifies a predetermined subset of cells.
79. The method of claim 78, wherein the predetermined subset of cells is a subset of cells contained in individual wells of a single capture plate.
80. The method of claim 79, wherein the barcode sequence that uniquely identifies the predetermined subset of cells uniquely identifies the capture plate.
81. The method of any one of claims 77-79, wherein the barcode sequence of the nucleic acid of any one of claims 10-22 uniquely identifies the cell within the predetermined subset of cells, which cell comprised the mRNA from which the barcoded cDNA of c) was produced.
82. The method of claim 81, wherein the barcode sequence that uniquely identifies the cell within the predetermined subset of cells uniquely identifies an individual well in a capture plate.
83. The method of claim 82, wherein the combination of the barcode sequence that uniquely identifies the predetermined subset of cells and the barcode sequence that uniquely identifies the cell within a predetermined subset of cells uniquely identifies the capture plate and the individual well which comprised the cell, which cell comprised the mRNA from which the barcoded cDNA of c) was produced.
84. The method of any one of claims 72-83, wherein the barcode sequence of the first fragmentation nucleic acid is 4 to 20 nucleotides in length.
85. The method of claim 84, wherein the barcode sequence is 6 nucleotides in length.
86. The method of claim 85, wherein the second fragmentation nucleic acid is a phosphorothioate bond-containing nucleic acid comprising an X1*X2*X3*X4*X5*3' sequence, wherein * is a phosphorothioate bond.
87. The method of claim 86, wherein the second fragmentation nucleic acid is 48 to 68 nucleotides in length.
88. The method of claim 87, wherein the second fragmentation nucleic acid is 58 nucleotides in length.
89. The method of claim 88, wherein the sequence of the second fragmentation nucleic acid is 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG*A*T*C*T*-3'.
90. The method of any one of claims 47-89, wherein the purification of h) is carried out with magnetic beads.
91. The method of claim 90, further comprising separating the magnetic-bead purified cDNA on an agarose gel, excising cDNA corresponding to 300 to 800 nucleotides in length, and purifying the excised cDNA.
92. The method of any one of claims 47-91, wherein h) further comprises quantifying the purified cDNA.
93. The method of any one of claims 47-92, wherein the sequencing of i) is carried out using RNA-seq.
94. The method of any one of claims 47-93, further comprising assembling a database of the sequences of the sequenced cDNA fragments of i).
95. The method of claim 94, further comprising identifying the UMI sequences of the sequences of the database.
96. The method of claim 95, further comprising discounting duplicate sequences that share a UMI sequence, thereby assembling a set of sequences in which each sequence is associated with a unique UMI.
97. The method of any one of claims 47-96, further comprising repeating a) through h) before i) to produce a plurality of populations of cDNA fragments.
98. The method of claim 97, wherein the populations of cDNA fragments are combined prior to i).
99. The method of any one of claims 72-98, wherein the barcode sequence of the first fragmentation nucleic acid and the barcode sequence of the nucleic acid of any one of claims 10-22 are used to correlate the sequencing data with the predetermined subset of cells and the individual cell.
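By way of illustration only, and not as part of the claimed subject matter, the barcode and UMI handling recited in claims 29, 78-83, 94-96, and 99 might be sketched in software as follows. The sketch assumes, hypothetically, that each read carries a tag laid out as the 6-nucleotide well barcode (claim 17) followed by the 10-nucleotide UMI (claim 19), that the [i7] plate index has already been read out by the sequencer, and that each read has been assigned to a gene by an upstream aligner; the claims do not fix any of these details. The names `Read`, `min_hamming_distance`, and `count_unique_molecules` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple, Sequence

BARCODE_LEN = 6   # well barcode length (claim 17)
UMI_LEN = 10      # UMI length (claim 19)


class Read(NamedTuple):
    plate_index: str  # [i7] sequence identifying the capture plate (claims 76, 80)
    tag: str          # assumed layout: well barcode followed by UMI
    gene: str         # gene assignment from an upstream aligner (assumption)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_hamming_distance(barcodes: Sequence[str]) -> int:
    """Smallest pairwise distance in a barcode set (claim 29 asks for >= 2)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def count_unique_molecules(reads):
    """Count distinct UMIs per (plate, well, gene), discounting duplicates."""
    umis = defaultdict(set)
    for read in reads:
        well = read.tag[:BARCODE_LEN]
        umi = read.tag[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        # Plate index plus well barcode together identify the cell (claim 83).
        umis[(read.plate_index, well, read.gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Toy input; the two plate indices are taken from the list in claim 76,
# while the well barcodes, UMIs, and gene name are made up for illustration.
reads = [
    Read("TCGCCTTA", "AAACCC" + "ACGTACGTAC", "GAPDH"),
    Read("TCGCCTTA", "AAACCC" + "ACGTACGTAC", "GAPDH"),  # duplicate UMI
    Read("TCGCCTTA", "AAACCC" + "TTTTGGGGCC", "GAPDH"),
    Read("CTAGTACG", "AAACCC" + "ACGTACGTAC", "GAPDH"),  # same well code, other plate
]
counts = count_unique_molecules(reads)
```

In this sketch, reads that share a plate index, well barcode, gene, and UMI are counted once, which corresponds to discounting duplicate sequences that share a UMI (claim 96). Exact-match comparison is used throughout; tolerance for sequencing errors in barcodes or UMIs is a design choice the claims leave open.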
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/898,030 US20160122753A1 (en) | 2013-06-12 | 2014-06-12 | High-throughput rna-seq |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361834163P | 2013-06-12 | 2013-06-12 | |
US61/834,163 | 2013-06-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014201273A1 (en) | 2014-12-18 |
Family
ID=52022775
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2014/042159 WO2014201273A1 (en) | 2013-06-12 | 2014-06-12 | High-throughput rna-seq |
Country Status (2)
Country | Link |
---|---|
US (1) | US20160122753A1 (en) |
WO (1) | WO2014201273A1 (en) |
Cited By (65)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016125106A1 (en) | 2015-02-05 | 2016-08-11 | Technion Research & Development Foundation Limited | System and method for single cell genetic analysis |
WO2016134078A1 (en) * | 2015-02-19 | 2016-08-25 | Becton, Dickinson And Company | High-throughput single-cell analysis combining proteomic and genomic information |
WO2016172373A1 (en) * | 2015-04-23 | 2016-10-27 | Cellular Research, Inc. | Methods and compositions for whole transcriptome amplification |
WO2016191533A1 (en) * | 2015-05-26 | 2016-12-01 | The Trustees Of Columbia University In The City Of New York | Rna printing and sequencing devices, methods, and systems |
US9567646B2 (en) | 2013-08-28 | 2017-02-14 | Cellular Research, Inc. | Massively parallel single cell analysis |
WO2017079593A1 (en) * | 2015-11-04 | 2017-05-11 | Atreca, Inc. | Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells |
US9708659B2 (en) | 2009-12-15 | 2017-07-18 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
US9727810B2 (en) | 2015-02-27 | 2017-08-08 | Cellular Research, Inc. | Spatially addressable molecular barcoding |
US20180002749A1 (en) * | 2016-06-30 | 2018-01-04 | Grail, Inc. | Differential tagging of rna for preparation of a cell-free dna/rna sequencing library |
WO2018023068A1 (en) | 2016-07-29 | 2018-02-01 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template-switching |
US9905005B2 (en) | 2013-10-07 | 2018-02-27 | Cellular Research, Inc. | Methods and systems for digitally counting features on arrays |
EP3262214A4 (en) * | 2015-02-27 | 2018-07-25 | Fluidigm Corporation | Single-cell nucleic acids for high-throughput studies |
JP2018526026A (en) * | 2015-08-28 | 2018-09-13 | イルミナ インコーポレイテッド | Single-cell nucleic acid sequence analysis |
WO2018204423A1 (en) * | 2017-05-01 | 2018-11-08 | Illumina, Inc. | Optimal index sequences for multiplex massively parallel sequencing |
WO2018208699A1 (en) * | 2017-05-08 | 2018-11-15 | Illumina, Inc. | Universal short adapters for indexing of polynucleotide samples |
WO2018226293A1 (en) * | 2017-06-05 | 2018-12-13 | Becton, Dickinson And Company | Sample indexing for single cells |
EP3194593B1 (en) * | 2014-09-15 | 2019-02-06 | AbVitro LLC | High-throughput nucleotide library sequencing |
US10202641B2 (en) | 2016-05-31 | 2019-02-12 | Cellular Research, Inc. | Error correction in amplification of samples |
US10240148B2 (en) | 2016-08-03 | 2019-03-26 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template-switching |
US10301677B2 (en) | 2016-05-25 | 2019-05-28 | Cellular Research, Inc. | Normalization of nucleic acid libraries |
US10338066B2 (en) | 2016-09-26 | 2019-07-02 | Cellular Research, Inc. | Measurement of protein expression using reagents with barcoded oligonucleotide sequences |
US10370630B2 (en) | 2014-02-10 | 2019-08-06 | Technion Research & Development Foundation Limited | Method and apparatus for cell isolation, growth, replication, manipulation, and analysis |
WO2019165181A1 (en) * | 2018-02-23 | 2019-08-29 | Yale University | Single-cell freeze-thaw lysis |
WO2019191122A1 (en) * | 2018-03-26 | 2019-10-03 | Qiagen Sciences, Llc | Integrative dna and rna library preparations and uses thereof |
EP3494214A4 (en) * | 2016-08-05 | 2020-03-04 | Bio-Rad Laboratories, Inc. | Second strand direct |
WO2020046833A1 (en) * | 2018-08-28 | 2020-03-05 | Cellular Research, Inc. | Sample multiplexing using carbohydrate-binding and membrane-permeable reagents |
US10619186B2 (en) | 2015-09-11 | 2020-04-14 | Cellular Research, Inc. | Methods and compositions for library normalization |
US10640763B2 (en) | 2016-05-31 | 2020-05-05 | Cellular Research, Inc. | Molecular indexing of internal sequences |
US10641772B2 (en) | 2015-02-20 | 2020-05-05 | Takara Bio Usa, Inc. | Method for rapid accurate dispensing, visualization and analysis of single cells |
CN111406114A (en) * | 2017-05-29 | 2020-07-10 | 哈佛学院董事及会员团体 | Method for amplifying single cell transcriptome |
US10718014B2 (en) | 2004-05-28 | 2020-07-21 | Takara Bio Usa, Inc. | Thermo-controllable high-density chips for multiplex analyses |
US10722880B2 (en) | 2017-01-13 | 2020-07-28 | Cellular Research, Inc. | Hydrophilic coating of fluidic channels |
US10822643B2 (en) | 2016-05-02 | 2020-11-03 | Cellular Research, Inc. | Accurate molecular barcoding |
US10941396B2 (en) | 2012-02-27 | 2021-03-09 | Becton, Dickinson And Company | Compositions and kits for molecular counting |
EP3733865A4 (en) * | 2017-12-28 | 2021-09-08 | MGI Tech Co., Ltd. | Method for obtaining single-cell mrna sequence |
US11124823B2 (en) | 2015-06-01 | 2021-09-21 | Becton, Dickinson And Company | Methods for RNA quantification |
US11164659B2 (en) | 2016-11-08 | 2021-11-02 | Becton, Dickinson And Company | Methods for expression profile classification |
EP3940074A1 (en) | 2016-07-29 | 2022-01-19 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template-switching |
WO2022084742A1 (en) * | 2020-10-19 | 2022-04-28 | The Hong Kong University Of Science And Technology | Simultaneous amplification of dna and rna from single cells |
US11319583B2 (en) | 2017-02-01 | 2022-05-03 | Becton, Dickinson And Company | Selective amplification using blocking oligonucleotides |
US11365409B2 (en) | 2018-05-03 | 2022-06-21 | Becton, Dickinson And Company | Molecular barcoding on opposite transcript ends |
US11371076B2 (en) | 2019-01-16 | 2022-06-28 | Becton, Dickinson And Company | Polymerase chain reaction normalization through primer titration |
WO2022133734A1 (en) * | 2020-12-22 | 2022-06-30 | Singleron (Nanjing) Biotechnologies, Ltd. | Methods and reagents for high-throughput transcriptome sequencing for drug screening |
US11397882B2 (en) | 2016-05-26 | 2022-07-26 | Becton, Dickinson And Company | Molecular label counting adjustment methods |
US11460405B2 (en) | 2016-07-21 | 2022-10-04 | Takara Bio Usa, Inc. | Multi-Z imaging and dispensing with multi-well devices |
US11492660B2 (en) | 2018-12-13 | 2022-11-08 | Becton, Dickinson And Company | Selective extension in single cell whole transcriptome analysis |
US11535882B2 (en) | 2015-03-30 | 2022-12-27 | Becton, Dickinson And Company | Methods and compositions for combinatorial barcoding |
US11608497B2 (en) | 2016-11-08 | 2023-03-21 | Becton, Dickinson And Company | Methods for cell label classification |
US11639517B2 (en) | 2018-10-01 | 2023-05-02 | Becton, Dickinson And Company | Determining 5′ transcript sequences |
US11649497B2 (en) | 2020-01-13 | 2023-05-16 | Becton, Dickinson And Company | Methods and compositions for quantitation of proteins and RNA |
US11661625B2 (en) | 2020-05-14 | 2023-05-30 | Becton, Dickinson And Company | Primers for immune repertoire profiling |
US11661631B2 (en) | 2019-01-23 | 2023-05-30 | Becton, Dickinson And Company | Oligonucleotides associated with antibodies |
US11739443B2 (en) | 2020-11-20 | 2023-08-29 | Becton, Dickinson And Company | Profiling of highly expressed and lowly expressed proteins |
GB2581599B (en) * | 2017-08-10 | 2023-08-30 | Element Biosciences Inc | Tagging nucleic acid molecules from single cells for phased sequencing |
US11773441B2 (en) | 2018-05-03 | 2023-10-03 | Becton, Dickinson And Company | High throughput multiomics sample analysis |
US11773436B2 (en) | 2019-11-08 | 2023-10-03 | Becton, Dickinson And Company | Using random priming to obtain full-length V(D)J information for immune repertoire sequencing |
US11788120B2 (en) | 2017-11-27 | 2023-10-17 | The Trustees Of Columbia University In The City Of New York | RNA printing and sequencing devices, methods, and systems |
US11859171B2 (en) | 2013-04-17 | 2024-01-02 | Agency For Science, Technology And Research | Method for generating extended sequence reads |
US11932849B2 (en) | 2018-11-08 | 2024-03-19 | Becton, Dickinson And Company | Whole transcriptome analysis of single cells using random priming |
US11932901B2 (en) | 2020-07-13 | 2024-03-19 | Becton, Dickinson And Company | Target enrichment using nucleic acid probes for scRNAseq |
US11939622B2 (en) | 2019-07-22 | 2024-03-26 | Becton, Dickinson And Company | Single cell chromatin immunoprecipitation sequencing assay |
US11946095B2 (en) | 2017-12-19 | 2024-04-02 | Becton, Dickinson And Company | Particles associated with oligonucleotides |
WO2024081622A1 (en) * | 2022-10-11 | 2024-04-18 | The Board Of Trustees Of The Leland Stanford Junior University | Improvement to cdna library priming |
US11965208B2 (en) | 2019-04-19 | 2024-04-23 | Becton, Dickinson And Company | Methods of associating phenotypical data and single cell sequencing data |
US11970737B2 (en) | 2019-08-26 | 2024-04-30 | Becton, Dickinson And Company | Digital counting of individual molecules by stochastic attachment of diverse labels |
Families Citing this family (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10400280B2 (en) | 2012-08-14 | 2019-09-03 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US20150376609A1 (en) | 2014-06-26 | 2015-12-31 | 10X Genomics, Inc. | Methods of Analyzing Nucleic Acids from Individual Cells or Cell Populations |
US9951386B2 (en) | 2014-06-26 | 2018-04-24 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
MX364957B (en) | 2012-08-14 | 2019-05-15 | 10X Genomics Inc | Microcapsule compositions and methods. |
US10323279B2 (en) | 2012-08-14 | 2019-06-18 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US10273541B2 (en) | 2012-08-14 | 2019-04-30 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US10752949B2 (en) | 2012-08-14 | 2020-08-25 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US10221442B2 (en) | 2012-08-14 | 2019-03-05 | 10X Genomics, Inc. | Compositions and methods for sample processing |
US9701998B2 (en) | 2012-12-14 | 2017-07-11 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US11591637B2 (en) | 2012-08-14 | 2023-02-28 | 10X Genomics, Inc. | Compositions and methods for sample processing |
CA2894694C (en) | 2012-12-14 | 2023-04-25 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US10533221B2 (en) | 2012-12-14 | 2020-01-14 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
CA2900481A1 (en) | 2013-02-08 | 2014-08-14 | 10X Genomics, Inc. | Polynucleotide barcode generation |
CN105121664B (en) * | 2013-02-20 | 2018-11-02 | 埃默里大学 | Mixture and its it is compositions related in nucleic acid sequencing approach |
AU2014268710B2 (en) | 2013-05-23 | 2018-10-18 | The Board Of Trustees Of The Leland Stanford Junior University | Transposition into native chromatin for personal epigenomics |
US9824068B2 (en) | 2013-12-16 | 2017-11-21 | 10X Genomics, Inc. | Methods and apparatus for sorting data |
AU2015243445B2 (en) | 2014-04-10 | 2020-05-28 | 10X Genomics, Inc. | Fluidic devices, systems, and methods for encapsulating and partitioning reagents, and applications of same |
US10975371B2 (en) * | 2014-04-29 | 2021-04-13 | Illumina, Inc. | Nucleic acid sequence analysis from single cells |
CA3060708A1 (en) * | 2014-04-29 | 2015-11-05 | Illumina, Inc | Multiplexed single cell gene expression analysis using template switch and tagmentation |
US20160122817A1 (en) | 2014-10-29 | 2016-05-05 | 10X Genomics, Inc. | Methods and compositions for targeted nucleic acid sequencing |
US9975122B2 (en) | 2014-11-05 | 2018-05-22 | 10X Genomics, Inc. | Instrument systems for integrated sample processing |
SG11201705615UA (en) | 2015-01-12 | 2017-08-30 | 10X Genomics Inc | Processes and systems for preparing nucleic acid sequencing libraries and libraries prepared using same |
EP3262188B1 (en) | 2015-02-24 | 2021-05-05 | 10X Genomics, Inc. | Methods for targeted nucleic acid sequence coverage |
EP3262407B1 (en) | 2015-02-24 | 2023-08-30 | 10X Genomics, Inc. | Partition processing methods and systems |
EP3285926B1 (en) * | 2015-04-21 | 2022-03-02 | General Automation Lab Technologies Inc. | Kit and method for high throughput microbiology applications |
US11371094B2 (en) | 2015-11-19 | 2022-06-28 | 10X Genomics, Inc. | Systems and methods for nucleic acid processing using degenerate nucleotides |
SG11201804086VA (en) | 2015-12-04 | 2018-06-28 | 10X Genomics Inc | Methods and compositions for nucleic acid analysis |
SG11201806757XA (en) | 2016-02-11 | 2018-09-27 | 10X Genomics Inc | Systems, methods, and media for de novo assembly of whole genome sequence data |
WO2017197338A1 (en) | 2016-05-13 | 2017-11-16 | 10X Genomics, Inc. | Microfluidic systems and methods of use |
US10465242B2 (en) | 2016-07-14 | 2019-11-05 | University Of Utah Research Foundation | Multi-sequence capture system |
US10550429B2 (en) | 2016-12-22 | 2020-02-04 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US10011872B1 (en) | 2016-12-22 | 2018-07-03 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
US10815525B2 (en) | 2016-12-22 | 2020-10-27 | 10X Genomics, Inc. | Methods and systems for processing polynucleotides |
EP4029939B1 (en) | 2017-01-30 | 2023-06-28 | 10X Genomics, Inc. | Methods and systems for droplet-based single cell barcoding |
US10995333B2 (en) | 2017-02-06 | 2021-05-04 | 10X Genomics, Inc. | Systems and methods for nucleic acid preparation |
US10400235B2 (en) | 2017-05-26 | 2019-09-03 | 10X Genomics, Inc. | Single cell analysis of transposase accessible chromatin |
CN116064732A (en) | 2017-05-26 | 2023-05-05 | 10X基因组学有限公司 | Single cell analysis of transposase accessibility chromatin |
US10837047B2 (en) | 2017-10-04 | 2020-11-17 | 10X Genomics, Inc. | Compositions, methods, and systems for bead formation using improved polymers |
WO2019084043A1 (en) | 2017-10-26 | 2019-05-02 | 10X Genomics, Inc. | Methods and systems for nuclecic acid preparation and chromatin analysis |
EP3700672B1 (en) | 2017-10-27 | 2022-12-28 | 10X Genomics, Inc. | Methods for sample preparation and analysis |
DK3707723T3 (en) * | 2017-11-06 | 2023-12-18 | Illumina Inc | TECHNIQUES FOR INDEXING NUCLEIC ACIDS |
SG11201913654QA (en) | 2017-11-15 | 2020-01-30 | 10X Genomics Inc | Functionalized gel beads |
US10829815B2 (en) | 2017-11-17 | 2020-11-10 | 10X Genomics, Inc. | Methods and systems for associating physical and genetic properties of biological particles |
WO2019108851A1 (en) | 2017-11-30 | 2019-06-06 | 10X Genomics, Inc. | Systems and methods for nucleic acid preparation and analysis |
WO2019157529A1 (en) | 2018-02-12 | 2019-08-15 | 10X Genomics, Inc. | Methods characterizing multiple analytes from individual cells or cell populations |
US11639928B2 (en) | 2018-02-22 | 2023-05-02 | 10X Genomics, Inc. | Methods and systems for characterizing analytes from individual cells or cell populations |
SG11202009889VA (en) | 2018-04-06 | 2020-11-27 | 10X Genomics Inc | Systems and methods for quality control in single cell processing |
US11932899B2 (en) | 2018-06-07 | 2024-03-19 | 10X Genomics, Inc. | Methods and systems for characterizing nucleic acid molecules |
US11703427B2 (en) | 2018-06-25 | 2023-07-18 | 10X Genomics, Inc. | Methods and systems for cell and bead processing |
US20200032335A1 (en) | 2018-07-27 | 2020-01-30 | 10X Genomics, Inc. | Systems and methods for metabolome analysis |
CN109295050A (en) * | 2018-09-26 | 2019-02-01 | 刘强 | Both-end label specific linkers, kit and the banking process in the library Blood Trace cfDNA |
CN109295049A (en) * | 2018-09-26 | 2019-02-01 | 刘强 | Label specific linkers, primer sets and the banking process in the library Blood Trace cfDNA |
US11459607B1 (en) | 2018-12-10 | 2022-10-04 | 10X Genomics, Inc. | Systems and methods for processing-nucleic acid molecules from a single cell using sequential co-partitioning and composite barcodes |
EP3894593A2 (en) | 2018-12-13 | 2021-10-20 | DNA Script | Direct oligonucleotide synthesis on cells and biomolecules |
US11845983B1 (en) | 2019-01-09 | 2023-12-19 | 10X Genomics, Inc. | Methods and systems for multiplexing of droplet based assays |
US11851683B1 (en) | 2019-02-12 | 2023-12-26 | 10X Genomics, Inc. | Methods and systems for selective analysis of cellular samples |
US11467153B2 (en) | 2019-02-12 | 2022-10-11 | 10X Genomics, Inc. | Methods for processing nucleic acid molecules |
EP3924505A1 (en) | 2019-02-12 | 2021-12-22 | 10X Genomics, Inc. | Methods for processing nucleic acid molecules |
US11655499B1 (en) | 2019-02-25 | 2023-05-23 | 10X Genomics, Inc. | Detection of sequence elements in nucleic acid molecules |
SG11202111242PA (en) | 2019-03-11 | 2021-11-29 | 10X Genomics Inc | Systems and methods for processing optically tagged beads |
US20220228168A1 (en) * | 2019-04-29 | 2022-07-21 | The Broad Institute, Inc. | Affinity-based multiplexing for live-cell monitoring of complex cell populations |
CN110643692A (en) * | 2019-07-08 | 2020-01-03 | 中山大学中山眼科中心 | Analysis method and kit for sequencing single cell transcript isomer |
CN111187812A (en) * | 2020-01-19 | 2020-05-22 | 青岛普泽麦迪生物技术有限公司 | Direct sequencing method using low total RNA |
US11851700B1 (en) | 2020-05-13 | 2023-12-26 | 10X Genomics, Inc. | Methods, kits, and compositions for processing extracellular molecules |
RU2752663C1 (en) * | 2020-05-18 | 2021-07-29 | ОБЩЕСТВО С ОГРАНИЧЕННОЙ ОТВЕТСТВЕННОСТЬЮ "СберМедИИ" | Method for quantifying the statistical analysis of alternative splicing in rna-sec data |
US10941453B1 (en) * | 2020-05-20 | 2021-03-09 | Paragon Genomics, Inc. | High throughput detection of pathogen RNA in clinical specimens |
WO2022182682A1 (en) | 2021-02-23 | 2022-09-01 | 10X Genomics, Inc. | Probe-based analysis of nucleic acids and proteins |
CN113322314B (en) * | 2021-06-04 | 2022-11-29 | 上海交通大学 | Novel tissue unicell space transcriptome technology |
CN113604540B (en) * | 2021-07-23 | 2022-08-16 | 杭州圣庭医疗科技有限公司 | Method for rapidly constructing RRBS sequencing library by using blood circulation tumor DNA |
US11680293B1 (en) | 2022-04-21 | 2023-06-20 | Paragon Genomics, Inc. | Methods and compositions for amplifying DNA and generating DNA sequencing results from target-enriched DNA molecules |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120010091A1 (en) * | 2009-03-30 | 2012-01-12 | Illumina, Inc. | Gene expression analysis in single cells |
2014
- 2014-06-12 WO PCT/US2014/042159 patent/WO2014201273A1/en active Application Filing
- 2014-06-12 US US14/898,030 patent/US20160122753A1/en not_active Abandoned
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120010091A1 (en) * | 2009-03-30 | 2012-01-12 | Illumina, Inc. | Gene expression analysis in single cells |
Non-Patent Citations (3)
Title |
---|
ISLAM, S. ET AL.: "Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq", GENOME RESEARCH, vol. 21, 2011, pages 1160 - 1167, XP002682367, doi:10.1101/GR.110882.110 * |
KAPTEYN, J. ET AL.: "Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples", BMC GENOMICS, vol. 11, 2010, XP021072710, doi:10.1186/1471-2164-11-413 * |
SOUMILLON, M. ET AL.: "Characterization of directed differentiation by high-throughput single-cell RNA-Seq", BIORXIV, GENOMICS, 2014, pages 3236/1 - 3236/14, Retrieved from the Internet <URL:http://bioixiv.org/content/early/2014/03/05/003236> [retrieved on 20140827] * |
Cited By (130)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10718014B2 (en) | 2004-05-28 | 2020-07-21 | Takara Bio Usa, Inc. | Thermo-controllable high-density chips for multiplex analyses |
US9845502B2 (en) | 2009-12-15 | 2017-12-19 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
US10202646B2 (en) | 2009-12-15 | 2019-02-12 | Becton, Dickinson And Company | Digital counting of individual molecules by stochastic attachment of diverse labels |
US10392661B2 (en) | 2009-12-15 | 2019-08-27 | Becton, Dickinson And Company | Digital counting of individual molecules by stochastic attachment of diverse labels |
US10059991B2 (en) | 2009-12-15 | 2018-08-28 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
US10047394B2 (en) | 2009-12-15 | 2018-08-14 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
US10619203B2 (en) | 2009-12-15 | 2020-04-14 | Becton, Dickinson And Company | Digital counting of individual molecules by stochastic attachment of diverse labels |
US9708659B2 (en) | 2009-12-15 | 2017-07-18 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
US9816137B2 (en) | 2009-12-15 | 2017-11-14 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
US11634708B2 (en) | 2012-02-27 | 2023-04-25 | Becton, Dickinson And Company | Compositions and kits for molecular counting |
US10941396B2 (en) | 2012-02-27 | 2021-03-09 | Becton, Dickinson And Company | Compositions and kits for molecular counting |
US11859171B2 (en) | 2013-04-17 | 2024-01-02 | Agency For Science, Technology And Research | Method for generating extended sequence reads |
US10927419B2 (en) | 2013-08-28 | 2021-02-23 | Becton, Dickinson And Company | Massively parallel single cell analysis |
US10131958B1 (en) | 2013-08-28 | 2018-11-20 | Cellular Research, Inc. | Massively parallel single cell analysis |
US11702706B2 (en) | 2013-08-28 | 2023-07-18 | Becton, Dickinson And Company | Massively parallel single cell analysis |
US10208356B1 (en) | 2013-08-28 | 2019-02-19 | Becton, Dickinson And Company | Massively parallel single cell analysis |
US10151003B2 (en) | 2013-08-28 | 2018-12-11 | Cellular Research, Inc. | Massively Parallel single cell analysis |
US11618929B2 (en) | 2013-08-28 | 2023-04-04 | Becton, Dickinson And Company | Massively parallel single cell analysis |
US9637799B2 (en) | 2013-08-28 | 2017-05-02 | Cellular Research, Inc. | Massively parallel single cell analysis |
US9598736B2 (en) | 2013-08-28 | 2017-03-21 | Cellular Research, Inc. | Massively parallel single cell analysis |
US9567645B2 (en) | 2013-08-28 | 2017-02-14 | Cellular Research, Inc. | Massively parallel single cell analysis |
US10253375B1 (en) | 2013-08-28 | 2019-04-09 | Becton, Dickinson And Company | Massively parallel single cell analysis |
US10954570B2 (en) | 2013-08-28 | 2021-03-23 | Becton, Dickinson And Company | Massively parallel single cell analysis |
US9567646B2 (en) | 2013-08-28 | 2017-02-14 | Cellular Research, Inc. | Massively parallel single cell analysis |
US9905005B2 (en) | 2013-10-07 | 2018-02-27 | Cellular Research, Inc. | Methods and systems for digitally counting features on arrays |
US10370630B2 (en) | 2014-02-10 | 2019-08-06 | Technion Research & Development Foundation Limited | Method and apparatus for cell isolation, growth, replication, manipulation, and analysis |
US10590483B2 (en) | 2014-09-15 | 2020-03-17 | Abvitro Llc | High-throughput nucleotide library sequencing |
EP3194593B1 (en) * | 2014-09-15 | 2019-02-06 | AbVitro LLC | High-throughput nucleotide library sequencing |
US10400273B2 (en) | 2015-02-05 | 2019-09-03 | Technion Research & Development Foundation Limited | System and method for single cell genetic analysis |
WO2016125106A1 (en) | 2015-02-05 | 2016-08-11 | Technion Research & Development Foundation Limited | System and method for single cell genetic analysis |
EP3766988A1 (en) * | 2015-02-19 | 2021-01-20 | Becton, Dickinson and Company | High-throughput single-cell analysis combining proteomic and genomic information |
WO2016134078A1 (en) * | 2015-02-19 | 2016-08-25 | Becton, Dickinson And Company | High-throughput single-cell analysis combining proteomic and genomic information |
US11098358B2 (en) | 2015-02-19 | 2021-08-24 | Becton, Dickinson And Company | High-throughput single-cell analysis combining proteomic and genomic information |
US10697010B2 (en) | 2015-02-19 | 2020-06-30 | Becton, Dickinson And Company | High-throughput single-cell analysis combining proteomic and genomic information |
US10641772B2 (en) | 2015-02-20 | 2020-05-05 | Takara Bio Usa, Inc. | Method for rapid accurate dispensing, visualization and analysis of single cells |
EP3822361A1 (en) | 2015-02-20 | 2021-05-19 | Takara Bio USA, Inc. | Method for rapid accurate dispensing, visualization and analysis of single cells |
US10002316B2 (en) | 2015-02-27 | 2018-06-19 | Cellular Research, Inc. | Spatially addressable molecular barcoding |
US10190163B2 (en) | 2015-02-27 | 2019-01-29 | Fluidigm Corporation | Single cell nucleic acids for high-throughput studies |
US10954560B2 (en) | 2015-02-27 | 2021-03-23 | Fluidigm Corporation | Single-cell nucleic acids for high-throughput studies |
US9727810B2 (en) | 2015-02-27 | 2017-08-08 | Cellular Research, Inc. | Spatially addressable molecular barcoding |
USRE48913E1 (en) | 2015-02-27 | 2022-02-01 | Becton, Dickinson And Company | Spatially addressable molecular barcoding |
EP3262214A4 (en) * | 2015-02-27 | 2018-07-25 | Fluidigm Corporation | Single-cell nucleic acids for high-throughput studies |
US11535882B2 (en) | 2015-03-30 | 2022-12-27 | Becton, Dickinson And Company | Methods and compositions for combinatorial barcoding |
CN107580632B (en) * | 2015-04-23 | 2021-12-28 | Becton, Dickinson and Company | Methods and compositions for whole transcriptome amplification |
US11390914B2 (en) | 2015-04-23 | 2022-07-19 | Becton, Dickinson And Company | Methods and compositions for whole transcriptome amplification |
CN107580632A (en) * | 2015-04-23 | 2018-01-12 | Cellular Research, Inc. | Methods and compositions for whole transcriptome amplification |
WO2016172373A1 (en) * | 2015-04-23 | 2016-10-27 | Cellular Research, Inc. | Methods and compositions for whole transcriptome amplification |
WO2016191533A1 (en) * | 2015-05-26 | 2016-12-01 | The Trustees Of Columbia University In The City Of New York | Rna printing and sequencing devices, methods, and systems |
US11124823B2 (en) | 2015-06-01 | 2021-09-21 | Becton, Dickinson And Company | Methods for RNA quantification |
JP2018526026A (en) * | 2015-08-28 | 2018-09-13 | Illumina, Inc. | Single-cell nucleic acid sequence analysis |
JP2020189846A (en) * | 2015-08-28 | 2020-11-26 | Illumina, Inc. | Single cell nucleic acid sequence analysis |
JP7351950B2 (en) | 2015-08-28 | 2023-09-27 | Illumina, Inc. | Single cell nucleic acid sequence analysis |
JP7035128B2 (en) | 2015-08-28 | 2022-03-14 | Illumina, Inc. | Single cell nucleic acid sequence analysis |
JP2022066349A (en) * | 2015-08-28 | 2022-04-28 | Illumina, Inc. | Single cell nucleic acid sequence analysis |
US10619186B2 (en) | 2015-09-11 | 2020-04-14 | Cellular Research, Inc. | Methods and compositions for library normalization |
US11332776B2 (en) | 2015-09-11 | 2022-05-17 | Becton, Dickinson And Company | Methods and compositions for library normalization |
US11098304B2 (en) | 2015-11-04 | 2021-08-24 | Atreca, Inc. | Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells |
EP3371309B1 (en) * | 2015-11-04 | 2023-07-05 | Atreca, Inc. | Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells |
CN108473984A (en) * | 2015-11-04 | 2018-08-31 | Atreca, Inc. | Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells |
WO2017079593A1 (en) * | 2015-11-04 | 2017-05-11 | Atreca, Inc. | Combinatorial sets of nucleic acid barcodes for analysis of nucleic acids associated with single cells |
US10822643B2 (en) | 2016-05-02 | 2020-11-03 | Cellular Research, Inc. | Accurate molecular barcoding |
US10301677B2 (en) | 2016-05-25 | 2019-05-28 | Cellular Research, Inc. | Normalization of nucleic acid libraries |
US11845986B2 (en) | 2016-05-25 | 2023-12-19 | Becton, Dickinson And Company | Normalization of nucleic acid libraries |
US11397882B2 (en) | 2016-05-26 | 2022-07-26 | Becton, Dickinson And Company | Molecular label counting adjustment methods |
US10640763B2 (en) | 2016-05-31 | 2020-05-05 | Cellular Research, Inc. | Molecular indexing of internal sequences |
US10202641B2 (en) | 2016-05-31 | 2019-02-12 | Cellular Research, Inc. | Error correction in amplification of samples |
US11525157B2 (en) | 2016-05-31 | 2022-12-13 | Becton, Dickinson And Company | Error correction in amplification of samples |
US11220685B2 (en) | 2016-05-31 | 2022-01-11 | Becton, Dickinson And Company | Molecular indexing of internal sequences |
US20180002749A1 (en) * | 2016-06-30 | 2018-01-04 | Grail, Inc. | Differential tagging of rna for preparation of a cell-free dna/rna sequencing library |
US10144962B2 (en) * | 2016-06-30 | 2018-12-04 | Grail, Inc. | Differential tagging of RNA for preparation of a cell-free DNA/RNA sequencing library |
US11180801B2 (en) | 2016-06-30 | 2021-11-23 | Grail, Llc | Differential tagging of RNA for preparation of a cell-free DNA/RNA sequencing library |
US11460405B2 (en) | 2016-07-21 | 2022-10-04 | Takara Bio Usa, Inc. | Multi-Z imaging and dispensing with multi-well devices |
WO2018023068A1 (en) | 2016-07-29 | 2018-02-01 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template- switching |
EP3940074A1 (en) | 2016-07-29 | 2022-01-19 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template- switching |
US10246706B2 (en) | 2016-08-03 | 2019-04-02 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template-switching |
US10240148B2 (en) | 2016-08-03 | 2019-03-26 | New England Biolabs, Inc. | Methods and compositions for preventing concatemerization during template-switching |
US10676736B2 (en) | 2016-08-05 | 2020-06-09 | Bio-Rad Laboratories, Inc. | Second strand direct |
US10876112B2 (en) | 2016-08-05 | 2020-12-29 | Bio-Rad Laboratories, Inc. | Second strand direct |
CN113151423A (en) * | 2016-08-05 | 2021-07-23 | Bio-Rad Laboratories, Inc. | Second strand direct |
US11725206B2 (en) | 2016-08-05 | 2023-08-15 | Bio-Rad Laboratories, Inc. | Second strand direct |
EP3494214A4 (en) * | 2016-08-05 | 2020-03-04 | Bio-Rad Laboratories, Inc. | Second strand direct |
US11782059B2 (en) | 2016-09-26 | 2023-10-10 | Becton, Dickinson And Company | Measurement of protein expression using reagents with barcoded oligonucleotide sequences |
US11467157B2 (en) | 2016-09-26 | 2022-10-11 | Becton, Dickinson And Company | Measurement of protein expression using reagents with barcoded oligonucleotide sequences |
US11460468B2 (en) | 2016-09-26 | 2022-10-04 | Becton, Dickinson And Company | Measurement of protein expression using reagents with barcoded oligonucleotide sequences |
US10338066B2 (en) | 2016-09-26 | 2019-07-02 | Cellular Research, Inc. | Measurement of protein expression using reagents with barcoded oligonucleotide sequences |
US11608497B2 (en) | 2016-11-08 | 2023-03-21 | Becton, Dickinson And Company | Methods for cell label classification |
US11164659B2 (en) | 2016-11-08 | 2021-11-02 | Becton, Dickinson And Company | Methods for expression profile classification |
US10722880B2 (en) | 2017-01-13 | 2020-07-28 | Cellular Research, Inc. | Hydrophilic coating of fluidic channels |
US11319583B2 (en) | 2017-02-01 | 2022-05-03 | Becton, Dickinson And Company | Selective amplification using blocking oligonucleotides |
WO2018204423A1 (en) * | 2017-05-01 | 2018-11-08 | Illumina, Inc. | Optimal index sequences for multiplex massively parallel sequencing |
US11788139B2 (en) | 2017-05-01 | 2023-10-17 | Illumina, Inc. | Optimal index sequences for multiplex massively parallel sequencing |
CN110799653A (en) * | 2017-05-01 | 2020-02-14 | Illumina, Inc. | Optimal index sequences for multiplex massively parallel sequencing |
US11028435B2 (en) | 2017-05-01 | 2021-06-08 | Illumina, Inc. | Optimal index sequences for multiplex massively parallel sequencing |
US11814678B2 (en) | 2017-05-08 | 2023-11-14 | Illumina, Inc. | Universal short adapters for indexing of polynucleotide samples |
US11028436B2 (en) | 2017-05-08 | 2021-06-08 | Illumina, Inc. | Universal short adapters for indexing of polynucleotide samples |
WO2018208699A1 (en) * | 2017-05-08 | 2018-11-15 | Illumina, Inc. | Universal short adapters for indexing of polynucleotide samples |
CN111406114A (en) * | 2017-05-29 | 2020-07-10 | President and Fellows of Harvard College | Method for amplifying single cell transcriptome |
EP3631004A4 (en) * | 2017-05-29 | 2021-03-03 | President and Fellows of Harvard College | A method of amplifying single cell transcriptome |
JP2020521486A (en) * | 2017-05-29 | 2020-07-27 | President and Fellows of Harvard College | Single cell transcriptome amplification method |
US10676779B2 (en) | 2017-06-05 | 2020-06-09 | Becton, Dickinson And Company | Sample indexing for single cells |
WO2018226293A1 (en) * | 2017-06-05 | 2018-12-13 | Becton, Dickinson And Company | Sample indexing for single cells |
US10669570B2 (en) | 2017-06-05 | 2020-06-02 | Becton, Dickinson And Company | Sample indexing for single cells |
GB2581599B (en) * | 2017-08-10 | 2023-08-30 | Element Biosciences Inc | Tagging nucleic acid molecules from single cells for phased sequencing |
US11788120B2 (en) | 2017-11-27 | 2023-10-17 | The Trustees Of Columbia University In The City Of New York | RNA printing and sequencing devices, methods, and systems |
US11946095B2 (en) | 2017-12-19 | 2024-04-02 | Becton, Dickinson And Company | Particles associated with oligonucleotides |
US11718872B2 (en) | 2017-12-28 | 2023-08-08 | Mgi Tech Co., Ltd. | Method for obtaining single-cell mRNA sequence |
EP3733865A4 (en) * | 2017-12-28 | 2021-09-08 | MGI Tech Co., Ltd. | Method for obtaining single-cell mrna sequence |
US20210087549A1 (en) * | 2018-02-23 | 2021-03-25 | Yale University | Single-cell freeze-thaw lysis |
WO2019165181A1 (en) * | 2018-02-23 | 2019-08-29 | Yale University | Single-cell freeze-thaw lysis |
EP3755799A4 (en) * | 2018-02-23 | 2021-12-08 | Yale University | Single-cell freeze-thaw lysis |
WO2019191122A1 (en) * | 2018-03-26 | 2019-10-03 | Qiagen Sciences, Llc | Integrative dna and rna library preparations and uses thereof |
US11365409B2 (en) | 2018-05-03 | 2022-06-21 | Becton, Dickinson And Company | Molecular barcoding on opposite transcript ends |
US11773441B2 (en) | 2018-05-03 | 2023-10-03 | Becton, Dickinson And Company | High throughput multiomics sample analysis |
WO2020046833A1 (en) * | 2018-08-28 | 2020-03-05 | Cellular Research, Inc. | Sample multiplexing using carbohydrate-binding and membrane-permeable reagents |
US11639517B2 (en) | 2018-10-01 | 2023-05-02 | Becton, Dickinson And Company | Determining 5′ transcript sequences |
US11932849B2 (en) | 2018-11-08 | 2024-03-19 | Becton, Dickinson And Company | Whole transcriptome analysis of single cells using random priming |
US11492660B2 (en) | 2018-12-13 | 2022-11-08 | Becton, Dickinson And Company | Selective extension in single cell whole transcriptome analysis |
US11371076B2 (en) | 2019-01-16 | 2022-06-28 | Becton, Dickinson And Company | Polymerase chain reaction normalization through primer titration |
US11661631B2 (en) | 2019-01-23 | 2023-05-30 | Becton, Dickinson And Company | Oligonucleotides associated with antibodies |
US11965208B2 (en) | 2019-04-19 | 2024-04-23 | Becton, Dickinson And Company | Methods of associating phenotypical data and single cell sequencing data |
US11939622B2 (en) | 2019-07-22 | 2024-03-26 | Becton, Dickinson And Company | Single cell chromatin immunoprecipitation sequencing assay |
US11970737B2 (en) | 2019-08-26 | 2024-04-30 | Becton, Dickinson And Company | Digital counting of individual molecules by stochastic attachment of diverse labels |
US11773436B2 (en) | 2019-11-08 | 2023-10-03 | Becton, Dickinson And Company | Using random priming to obtain full-length V(D)J information for immune repertoire sequencing |
US11649497B2 (en) | 2020-01-13 | 2023-05-16 | Becton, Dickinson And Company | Methods and compositions for quantitation of proteins and RNA |
US11661625B2 (en) | 2020-05-14 | 2023-05-30 | Becton, Dickinson And Company | Primers for immune repertoire profiling |
US11932901B2 (en) | 2020-07-13 | 2024-03-19 | Becton, Dickinson And Company | Target enrichment using nucleic acid probes for scRNAseq |
WO2022084742A1 (en) * | 2020-10-19 | 2022-04-28 | The Hong Kong University Of Science And Technology | Simultaneous amplification of dna and rna from single cells |
US11739443B2 (en) | 2020-11-20 | 2023-08-29 | Becton, Dickinson And Company | Profiling of highly expressed and lowly expressed proteins |
WO2022133734A1 (en) * | 2020-12-22 | 2022-06-30 | Singleron (Nanjing) Biotechnologies, Ltd. | Methods and reagents for high-throughput transcriptome sequencing for drug screening |
WO2024081622A1 (en) * | 2022-10-11 | 2024-04-18 | The Board Of Trustees Of The Leland Stanford Junior University | Improvement to cdna library priming |
Also Published As
Publication number | Publication date |
---|---|
US20160122753A1 (en) | 2016-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20160122753A1 (en) | High-throughput rna-seq | |
Enderle et al. | Characterization of RNA from exosomes and other extracellular vesicles isolated by a novel spin column-based method | |
EP3289097B2 (en) | Error suppression in sequenced dna fragments using redundant reads with unique molecular indices (umis) | |
EP3529374B1 (en) | Sequencing and analysis of exosome associated nucleic acids | |
US9133513B2 (en) | High throughput methylation detection method | |
CN102329876B (en) | Method for measuring nucleotide sequence of disease associated nucleic acid molecules in sample to be detected | |
Coudry et al. | Successful application of microarray technology to microdissected formalin-fixed, paraffin-embedded tissue | |
EP3470530A1 (en) | Transposition into native chromatin for personal epigenomics | |
US20100120097A1 (en) | Methods and compositions for nucleic acid sequencing | |
US20100035249A1 (en) | Rna sequencing and analysis using solid support | |
RU2753883C2 (en) | Set of probes for analyzing dna samples and methods for their use | |
TW201321518A (en) | Method of micro-scale nucleic acid library construction and application thereof | |
JP2009072062A (en) | Method for isolating 5'-terminals of nucleic acid and its application | |
KR20120037992A (en) | Nucleic acid analysis | |
US20140272993A1 (en) | Method of sequencing a full microrna profile from cerebrospinal fluid | |
JP7248228B2 (en) | Methods and kits for construction of RNA libraries | |
US20060063181A1 (en) | Method for identification and quantification of short or small RNA molecules | |
CN115997032A (en) | Method for detecting whole transcriptome in single cell | |
US20220002797A1 (en) | Full-length rna sequencing | |
KR101767644B1 (en) | Composition and method for prediction of pigs litter size using gene expression profile | |
WO2022067494A1 (en) | Method for detection of whole transcriptome in single cells | |
Bhattacharya et al. | Experimental toolkit to study RNA level regulation | |
US20230002755A1 (en) | Method for producing non-ribosomal rna-containing sample | |
US20230193356A1 (en) | Single cell combinatorial indexing from amplified nucleic acids | |
EP3725880A1 (en) | Method and kit for the purification of functional risc-associated small rnas |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14810232; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14810232; Country of ref document: EP; Kind code of ref document: A1 |