WO2023141604A2 - Methods of molecular tagging for single-cell analysis - Google Patents

Methods of molecular tagging for single-cell analysis

Info

Publication number
WO2023141604A2
Authority
WO
WIPO (PCT)
Prior art keywords
cell
rna
sequence
cdna
amplicons
Prior art date
Application number
PCT/US2023/061040
Other languages
French (fr)
Other versions
WO2023141604A3 (en)
Inventor
Dalia Dhingra
Chieh-Yuan Li
Daniel Y. LI
Adam SCIAMBI
Aik OOI
Original Assignee
Mission Bio, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mission Bio, Inc.
Publication of WO2023141604A2
Publication of WO2023141604A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • Single cell analysis methodologies involve analyzing analytes of single cells to characterize single cells (e.g., for disease).
  • barcodes such as unique molecular identifiers (UMIs)
  • these prior methodologies are often inefficient or sub-optimal. This can be attributed to various factors, examples of which include reaction inefficiencies arising from undesired interactions between reagents for reverse transcription and reagents for nucleic acid amplification, narrow experimental conditions that are sub-optimal for performing reactions in-droplet, and poor amplification efficiency due to secondary structure effects of long amplicons.
  • these prior methodologies can often be resource-intensive, e.g., time-intensive and expensive (e.g., due to needing to sequence long amplicons including gene sequences of interest).
  • improved methodologies for analyzing single cell analytes are needed.
  • analytes of the cell refer to nucleic acids of the cell.
  • analytes of the cell refer to genomic DNA of the cell.
  • analytes of the cell refer to RNA of the cell.
  • analytes of the cell refer to both DNA and RNA of the cell.
  • analytes of the cell refer to protein analytes of the cell.
  • the incorporation of molecular tags enables the distinguishing of the analytes of the cell.
  • the incorporation of molecular tags enables distinguishing amplicons derived from a first analyte, such as a first RNA analyte, and amplicons derived from a second analyte, such as a second RNA analyte.
  • the incorporation of molecular tags may be useful for analyzing analytes of the cell, such as for purposes of quantifying the number of different RNA analytes.
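One way to picture how molecular tags support quantification is to collapse sequence reads that share the same cell barcode, gene, and molecular tag, and then count what remains. The following is a minimal sketch under stated assumptions: the function name `count_molecules`, the tuple layout of a read, and the example barcodes, gene names, and tag sequences are all hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular tags per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, gene, molecular_tag) tuples,
    one tuple per sequence read. This layout is an illustrative assumption.
    """
    tags_seen = defaultdict(set)
    for cell_barcode, gene, molecular_tag in reads:
        # Reads with the same tag are treated as copies of one original molecule.
        tags_seen[(cell_barcode, gene)].add(molecular_tag)
    return {key: len(tags) for key, tags in tags_seen.items()}


# Hypothetical reads; the second read duplicates the first (an amplification copy).
reads = [
    ("CELL1", "GENE_A", "ACGTACG"),
    ("CELL1", "GENE_A", "ACGTACG"),
    ("CELL1", "GENE_A", "TTGCAAC"),
    ("CELL1", "GENE_B", "ACGTACG"),
    ("CELL2", "GENE_A", "ACGTACG"),
]
counts = count_molecules(reads)
print(counts)
```

Counting distinct tags rather than raw reads keeps amplification copies of one molecule from inflating the estimate, which is the purpose the text attributes to molecular tags.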
  • RNA of a cell comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate using the reagents, the cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate; and sequencing the generated amplicons to analyze RNA of the cell, wherein oligonucleotides comprising molecular tags are included in either the reagents in the droplet or in the reactants in the second droplet.
  • the reagents comprise oligonucleotides comprising molecular tags.
  • the cDNA comprises the oligonucleotides comprising molecular tags.
  • the oligonucleotides comprising molecular tags further comprise reverse transcription (RT) primers or template switching oligonucleotides (TSOs).
  • a molecular tag is located at an end of a RT primer or TSO.
  • the oligonucleotides comprising molecular tags further comprise RT primers, and wherein generating the cell lysate comprising cDNA derived from RNA of the cell comprises: providing a RT primer to the RNA of the cell; and performing reverse transcription to generate the cDNA comprising the RT primer and the molecular tag.
  • the oligonucleotides comprising molecular tags further comprise RT primers comprising one or more ribonucleotides.
  • the oligonucleotides comprising molecular tags further comprise RT primers comprising one or more uracil.
  • generating amplicons comprising sequences derived from the oligonucleotides comprising molecular tags comprises: digesting the oligonucleotides comprising molecular tags; and performing nucleic acid amplification to generate amplicons comprising sequences derived from the oligonucleotides comprising molecular tags.
  • digesting the oligonucleotides comprises exposing the oligonucleotides to a ribonuclease or uracil-DNA glycosylase (UDG).
  • methods disclosed herein further comprise: prior to digesting the oligonucleotides comprising molecular tags: providing a primer to the cDNA; and extending the primer to generate a sequence complementary to the oligonucleotide comprising a molecular tag.
  • generating amplicons comprising sequences derived from the oligonucleotides comprising molecular tags further comprises incorporating cellular barcodes into the amplicons, wherein the cellular barcodes identify the cell from which the amplicons originate.
  • the oligonucleotides comprising molecular tags further comprise TSOs, and wherein generating the cell lysate comprising cDNA derived from RNA of the cell comprises: performing reverse transcription to generate a first cDNA molecule; performing template switching by providing a TSO to the first cDNA molecule; and generating the cDNA from the hybridized TSO and the first cDNA molecule.
  • the TSO comprises a sequence that hybridizes with untemplated cytosine nucleotides of the first cDNA molecule.
  • the TSO comprises a rGrGrG sequence that hybridizes with three untemplated cytosine nucleotides of the first cDNA molecule.
  • the reactants comprise oligonucleotides comprising molecular tags.
  • the oligonucleotides comprising molecular tags further comprise forward primers.
  • the oligonucleotides comprising molecular tags further comprise reverse primers.
  • generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate further comprises: providing the forward primers of the oligonucleotides to the cDNA of the cell lysate; and extending the forward primers to generate sequences that incorporate the molecular tags of the oligonucleotides.
  • generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate further comprises: performing nucleic acid amplification to generate amplicons comprising sequences derived from the oligonucleotides comprising molecular tags.
  • the forward primers are gene specific primers.
  • performing nucleic acid amplification to generate amplicons further comprises incorporating cellular barcodes into the amplicons, wherein the cellular barcodes identify the cell from which the amplicons originate.
  • RNA of a cell comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons comprising sequences derived from the cDNA of the cell lysate; breaking the second droplet to obtain the generated amplicons in bulk; providing oligonucleotides comprising molecular tags to the generated amplicons in bulk; and sequencing at least the oligonucleotides comprising molecular tags to analyze RNA of the cell.
  • generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises: providing oligonucleotides comprising primers and cellular barcodes to the cDNA of the cell lysate, wherein the cellular barcodes identify the cell; and extending the primers to generate sequences that incorporate the cellular barcodes.
  • generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises performing nucleic acid amplification.
  • the primers are gene specific primers.
  • generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises: providing oligonucleotides comprising primers and gene tags; extending the primers to generate amplicons that further incorporate the gene tags.
  • methods disclosed herein further comprise: hybridizing the provided oligonucleotides comprising molecular tags with the amplicons that incorporate the gene tags; generating nucleic acid sequences by extending the hybridized oligonucleotides comprising molecular tags, wherein the generated nucleic sequences comprise molecular tags and gene tags.
  • methods disclosed herein further comprise: sequencing gene tags of the nucleic acid sequences to analyze RNA of the cell.
  • the nucleic acid sequences do not include the cDNA of the cell lysate or sequences derived from the cDNA of the cell lysate.
  • sequencing at least the oligonucleotides comprising molecular tags and sequencing gene tags of the nucleic acid sequences do not include sequencing cDNA of the cell lysate or sequences derived from the cDNA of the cell lysate.
  • RNA of a cell comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants, wherein the reactants comprise oligonucleotides comprising one or more universal bases; within the second droplet, generating amplicons comprising molecular tags derived from oligonucleotides comprising one or more universal bases using the reactants and the cDNA of the cell lysate or complements of the cDNA, wherein amplicons from different cDNA are distinguishable according to the molecular tags derived from oligonucleotides comprising one or more universal bases; and sequencing the generated amplicons to analyze RNA of the cell, wherein oligonucleotides comprising one or
  • generating amplicons comprising sequences derived from oligonucleotides comprising one or more universal bases using the reactants and the cDNA of the cell lysate comprises: performing a first cycle of nucleic acid amplification to incorporate the oligonucleotides comprising one or more universal bases; and performing a second cycle of nucleic acid amplification to generate the amplicons, wherein the molecular tags are generated within the amplicons during the second cycle of nucleic acid amplification.
  • the molecular tags are generated within the amplicons during the second cycle of nucleic acid amplification by polymerases that generate strands complementary to the one or more universal bases of the oligonucleotides.
  • the oligonucleotides comprising one or more universal bases comprise two or more consecutive universal bases.
  • the oligonucleotides comprising one or more universal bases comprise three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive universal bases.
  • each of the universal bases is independently any one of an inosine base, 2'-deoxynebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-nitroindole, or 3-nitropyrrole.
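The idea that a molecular tag is generated when a polymerase synthesizes a strand complementary to a run of universal bases can be sketched as a small simulation. This is only an illustration under strong assumptions: the character `I` stands in for any universal base, the polymerase is assumed to insert each of the four nucleotides with equal probability opposite a universal base (real universal bases can show pairing preferences), and the function name and template sequence are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
UNIVERSAL = "I"  # placeholder symbol for a universal base (assumption)


def copy_across_universal(template, rng):
    """Return the complementary strand (5'->3') of `template`.

    Opposite each universal base a random nucleotide is inserted, so the
    copied strand carries a tag that was not encoded in the template.
    """
    copied = []
    for base in reversed(template):
        if base == UNIVERSAL:
            copied.append(rng.choice("ACGT"))
        else:
            copied.append(COMPLEMENT[base])
    return "".join(copied)


rng = random.Random(0)  # seeded only so repeated runs behave the same
template = "ACGT" + UNIVERSAL * 8 + "GGCA"  # eight consecutive universal bases
first_copy = copy_across_universal(template, rng)
second_copy = copy_across_universal(template, rng)
print(first_copy)
print(second_copy)
```

Under these assumptions, two copies made from identical oligonucleotides usually carry different tag sequences, while the flanking sequence is copied faithfully.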
  • each molecular tag differs from other molecular tags.
  • at least one molecular tag has a same sequence as another molecular tag.
  • at least 0.1% of molecular tags have a same sequence as another molecular tag.
  • at least 0.5% of molecular tags have a same sequence as another molecular tag.
  • each of the molecular tags independently comprise any one of three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty nucleotide bases. In various embodiments, each of the molecular tags independently comprise either seven or eight nucleotide bases.
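Whether two molecular tags end up with the same sequence depends on tag length and on how many molecules are tagged. A rough estimate is sketched here, assuming each tag is drawn independently and uniformly from all sequences of four nucleotides; that uniformity is an assumption, and the function names are hypothetical.

```python
def tag_space(tag_length, alphabet_size=4):
    """Number of distinct tag sequences of the given length."""
    return alphabet_size ** tag_length


def expected_shared_fraction(n_molecules, tag_length):
    """Expected fraction of tagged molecules whose tag equals the tag of at
    least one other molecule, assuming uniformly random tags."""
    if n_molecules < 2:
        return 0.0
    p_match = 1.0 / tag_space(tag_length)
    # Probability that none of the other molecules drew the same tag.
    p_unique = (1.0 - p_match) ** (n_molecules - 1)
    return 1.0 - p_unique


for length in (7, 8):
    for n in (10, 100, 1000):
        frac = expected_shared_fraction(n, length)
        print(f"tag length={length} molecules={n} shared fraction={frac:.4%}")
```

In this model, shorter tags or more tagged molecules raise the shared fraction, so a small proportion of identical tags can occur even when tags are assigned at random.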
  • RNA of a cell comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA with different start and stop sites derived from RNA of the cell that have been differentially cleaved or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons using the reactants and the cDNA of the cell lysate, wherein the amplicons are derived from the cDNA with different start and stop sites; and sequencing the generated amplicons to analyze RNA of the cell.
  • the RNA of the cell has been differentially cleaved by an RNase included in the reagents.
  • the reagents further comprise a plurality of truncation oligonucleotides, wherein the plurality of truncation oligonucleotides comprise DNA nucleobases.
  • generating the cell lysate comprising cDNA with different start and stop sites derived from RNA of the cell that have been differentially cleaved further comprises: hybridizing the plurality of truncation oligonucleotides to RNA of the cell; differentially cleaving the RNA of the cell at locations where the plurality of truncation oligonucleotides are hybridized to RNA of the cell.
  • the different start and stop sites include at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, or at least 10^9 different start and stop sites.
  • sequencing the generated amplicons to analyze RNA of the cell comprises: identifying amplicons with sequences corresponding to the same start and stop site; and correlating the identified amplicons with sequences corresponding to the same start and stop site to a common RNA of the cell.
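The correlation step described above, in which amplicons sharing a start and stop site are attributed to a common RNA, can be sketched as a grouping operation. The record layout (read identifier, gene, start, stop), the function name, and the example coordinates are hypothetical; alignment of reads to obtain the coordinates is assumed to have happened upstream.

```python
from collections import defaultdict


def group_by_start_stop(alignments):
    """Group reads by (gene, start, stop).

    `alignments` is an iterable of (read_id, gene, start, stop) tuples.
    Each resulting group is treated as derived from one original RNA.
    """
    groups = defaultdict(list)
    for read_id, gene, start, stop in alignments:
        groups[(gene, start, stop)].append(read_id)
    return dict(groups)


# Hypothetical aligned reads.
alignments = [
    ("read1", "GENE_A", 120, 310),
    ("read2", "GENE_A", 120, 310),  # same ends as read1: same original RNA
    ("read3", "GENE_A", 145, 298),
    ("read4", "GENE_B", 120, 310),
]
groups = group_by_start_stop(alignments)
rna_counts = defaultdict(int)
for (gene, _start, _stop) in groups:
    rna_counts[gene] += 1
print(dict(rna_counts))
```

Here the pair of cleavage-defined ends plays the role that a sequence tag plays in the other embodiments, so no separate tag oligonucleotide is needed for counting.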
  • RNA of a cell comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, performing nucleic acid amplification to generate amplicons comprising nucleotide bases derived from alternate bases introduced in one or both of the reagents or reactants, wherein the alternate bases are propagated through one or more cycles of the nucleic acid amplification; and sequencing the generated amplicons comprising nucleotide bases derived from alternate bases to analyze RNA of the cell.
  • alternate bases are introduced in the reagents, and wherein generating the cell lysate comprises incorporating alternate bases into the cDNA or complements of the cDNA. In various embodiments, generating the cell lysate comprises incorporating alternate bases into the cDNA during reverse transcription. In various embodiments, the alternate bases are incorporated into the cDNA or complements of the cDNA in a random manner. In various embodiments, performing nucleic acid amplification to generate amplicons comprising nucleotide bases derived from alternate bases comprises amplifying the cDNA comprising the incorporated alternate bases.
  • alternate bases are introduced in the reactants, and wherein performing nucleic acid amplification comprises incorporating alternate bases during a first cycle of the nucleic acid amplification.
  • the alternate bases are incorporated during the first cycle of the nucleic acid amplification in a random manner.
  • additional alternate bases are randomly incorporated in the one or more cycles of the nucleic acid amplification.
  • the alternate bases comprise any one of an inosine base, 2'-deoxynebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-nitroindole, or 3-nitropyrrole.
  • sequencing the generated amplicons comprising common alternate bases to analyze RNA of the cell comprises: identifying one or more sequence reads of amplicons comprising a plurality of common alternate bases; and assigning the identified one or more sequence reads of amplicons to a RNA of the cell.
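Assigning reads that share a plurality of common alternate bases to one RNA can be illustrated by comparing each read to a reference and grouping reads with the same pattern of differences. This sketch assumes that incorporated alternate bases show up in sequence reads as substitutions relative to a known reference, that reads are already aligned and equal in length to the reference, and that sequencing errors are ignored. The function names, the `min_positions` parameter, and the sequences are hypothetical.

```python
from collections import defaultdict


def mismatch_signature(read, reference):
    """Positions and bases at which `read` differs from `reference`."""
    return tuple(
        (i, base) for i, (base, ref) in enumerate(zip(read, reference)) if base != ref
    )


def group_by_alternate_bases(reads, reference, min_positions=2):
    """Group reads that share the same set of substituted positions.

    Reads with fewer than `min_positions` substitutions are left unassigned,
    reflecting the requirement for a plurality of common alternate bases.
    """
    groups = defaultdict(list)
    for read_id, sequence in reads:
        signature = mismatch_signature(sequence, reference)
        if len(signature) >= min_positions:
            groups[signature].append(read_id)
    return dict(groups)


reference = "ACGTACGTAC"
reads = [
    ("read1", "AGGTACCTAC"),  # differs at positions 1 and 6
    ("read2", "AGGTACCTAC"),  # same pattern as read1
    ("read3", "ACGAACGTGC"),  # differs at positions 3 and 8
    ("read4", "ACGTACGTAC"),  # no substitutions, left unassigned
]
groups = group_by_alternate_bases(reads, reference)
print(groups)
```

A practical pipeline would need to tolerate sequencing errors and additional alternate bases introduced in later cycles; exact signature matching is used here only to keep the example short.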
  • generating the cell lysate using the reagents comprises releasing genomic DNA (gDNA) of the cell such that the cell lysate comprises the gDNA.
  • releasing gDNA of the cell comprises exposing the cell to proteinase K.
  • methods disclosed herein further comprise: within the second droplet, generating amplicons derived from the released gDNA of the cell lysate using the reactants; and sequencing the generated amplicons derived from the released gDNA.
  • generating amplicons derived from the released gDNA comprises incorporating cellular barcodes into amplicons derived from the released gDNA using the reactants, wherein the cellular barcodes identify the cell from which the amplicons originate.
  • RNA of a cell comprising: performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell; encapsulating, within a first droplet, the cell comprising the nucleic acid derived from RNA of the cell and reagents; further encapsulating the nucleic acid derived from RNA in a second droplet with reactants; within the second droplet, performing nucleic acid amplification to incorporate a cell barcode into amplicons using the reactants, wherein the amplicons comprise the one or more molecular tags, the cell barcode, and a gene specific sequence of the RNA or a complement thereof; and sequencing the generated amplicons to analyze RNA of the cell,
  • performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a primer comprising a sequence complementary to a sequence of the
  • the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA. In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence. In various embodiments, the sequence of the RNA and the second sequence of the RNA are adjacent to one another.
  • performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a first primer comprising a sequence complementary to a sequence of the RNA of the cell; providing a second primer comprising a sequence complementary to a second sequence of the RNA of the cell; and ligating the first primer and the second primer to generate the nucleic acid molecule comprising the molecular tag, wherein one or both of the first primer or the second primer comprises a molecular tag.
  • the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA.
  • the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence.
  • the sequence of the RNA and the second sequence of the RNA are adjacent to one another.
  • the first primer or the second primer further comprises a constant region.
  • performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a primer comprising a sequence complementary to a sequence of the RNA of the cell and a molecular tag; using the primer, reverse transcribing the nucleic acid derived from RNA of the cell.
  • the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA.
  • the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence.
  • methods disclosed herein further comprise: subsequent to encapsulating the cell comprising the nucleic acid derived from RNA of the cell and reagents in the first droplet, releasing genomic DNA from the cell using the reagents.
  • the nucleic acid derived from RNA of the cell comprises a constant region with a primer annealing temperature that differs from a primer annealing temperature of the released genomic DNA.
  • the primer annealing temperature of the constant region is lower than the primer annealing temperature of the released genomic DNA.
  • performing nucleic acid amplification to incorporate a cell barcode comprises performing nucleic acid amplification cycles at an annealing temperature of the genomic DNA to preferentially amplify the genomic DNA in comparison to the nucleic acid derived from RNA of the cell.
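The use of different primer annealing temperatures to favor genomic DNA over RNA-derived nucleic acid can be illustrated with a simple melting temperature estimate. The sketch uses the Wallace rule (2 °C per A or T, 4 °C per G or C), which is a rough approximation intended for short oligonucleotides and is not a method stated in the source. Treating a primer site as amplifiable only when its estimated melting temperature is at or above the cycling annealing temperature is a further simplification, and both sequences are hypothetical.

```python
def wallace_tm(sequence):
    """Rough melting temperature in degrees C for a short oligonucleotide."""
    sequence = sequence.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    return 2 * at_count + 4 * gc_count


def anneals(primer_site, annealing_temp_c):
    """Simplified rule: the primer binds if its estimated Tm is high enough."""
    return wallace_tm(primer_site) >= annealing_temp_c


# Hypothetical primer sites: an AT-rich constant region on the RNA-derived
# nucleic acid and a GC-rich site on the genomic DNA target.
constant_region_site = "ATATACGTATATAT"
gdna_site = "GCGGCCGCATGCGC"

for temp in (30, 50):
    print(
        f"annealing at {temp} C:",
        "constant region" if anneals(constant_region_site, temp) else "-",
        "gDNA" if anneals(gdna_site, temp) else "-",
    )
```

Under this simplified rule, cycling at the higher temperature amplifies only the genomic DNA target, while the lower temperature permits both.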
  • FIG. 1A shows an overall system environment for analyzing cell(s) through a single cell workflow analysis, in accordance with an embodiment.
  • FIG. 1B depicts a single cell workflow analysis to generate amplicons for sequencing, in accordance with an embodiment.
  • FIG. 2 is a flow process for analyzing nucleic acid sequences derived from analytes of the single cell, in accordance with an embodiment.
  • FIGs. 3A-3C depict the processing and releasing of analytes of a single cell in a droplet, in accordance with an embodiment.
  • FIG. 4A depicts the processing of RNA and gDNA in a droplet, in accordance with a first embodiment using digestible ribonucleotides.
  • FIG. 4B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4A.
  • FIG. 4C depicts the processing of RNA and gDNA in a droplet, in accordance with a second embodiment using digestible uracils.
  • FIG. 4D depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4C.
  • FIG. 4E depicts the processing of RNA and gDNA in a droplet, in accordance with a third embodiment using digestible primers.
  • FIGs. 4F and 4G depict the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the third embodiment shown in FIG. 4E.
  • FIG. 5A depicts the processing of RNA in a droplet, in accordance with an embodiment using template switching oligonucleotides.
  • FIG. 5B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 5A.
  • FIG. 6A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using gene specific primers comprising molecular tags.
  • FIG. 6B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 6A.
  • FIG. 7A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment in which molecular tags are introduced in bulk.
  • FIG. 7B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 7A.
  • FIG. 7C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 7A and 7B.
  • FIG. 8A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating molecular tags and gene tags.
  • FIG. 8B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 8A.
  • FIG. 8C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 8A and 8B.
  • FIG. 9A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment incorporating universal bases.
  • FIG. 9B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 9A.
  • FIG. 10 depicts the processing of RNA in a droplet, in accordance with an embodiment involving differentially cleaving RNA.
  • FIG. 11A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating alternate bases.
  • FIG. 11B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 11A.
  • FIG. 12A depicts the in situ processing of RNA, in accordance with a first embodiment.
  • FIG. 12B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 12A.
  • FIG. 13A depicts the in situ processing of RNA, in accordance with a second embodiment.
  • FIG. 13B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 13A.
  • FIG. 14A depicts the in situ processing of RNA, in accordance with a third embodiment.
  • FIGs. 14B and 14C depict the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 14A.
  • FIG. 14D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 14A-14C.
  • FIG. 15A depicts the in situ processing of RNA involving reverse transcription, in accordance with an embodiment.
  • FIG. 15B depicts the in situ processing of RNA involving reverse transcription, in accordance with a second embodiment.
  • FIG. 15C depicts the in droplet processing (e.g., encapsulation and barcoding), in accordance with an embodiment.
  • FIG. 15D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 15A-15C.
  • FIG. 16 depicts an example computing device for implementing systems and methods described in reference to the above figures.
  • the terms “subject” and “patient” are used interchangeably and encompass an organism, human or non-human, mammal or non-mammal, male or female.
  • the term “sample” or “test sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, such as a blood sample, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art.
  • analyte refers to a component of a cell.
  • Cell analytes can be informative for characterizing a cell. Therefore, performing single-cell analysis of one or more analytes of a cell using the systems and methods described herein is informative for determining a state or behavior of a cell.
  • examples of an analyte include a nucleic acid (e.g., RNA, DNA, cDNA), a protein, a peptide, an antibody, an antibody fragment, a polysaccharide, a sugar, a lipid, a small molecule, or combinations thereof.
  • a single-cell analysis involves analyzing RNA.
  • a single-cell analysis involves analyzing two different analytes such as RNA and DNA. In particular embodiments, a single-cell analysis involves analyzing three or more different analytes of a cell, such as RNA, DNA, and protein.
  • the discrete entities as described herein are droplets.
  • the terms “emulsion,” “drop,” “droplet,” and “microdroplet” are used interchangeably herein to refer to small, generally spherical structures containing at least a first fluid phase, e.g., an aqueous phase (e.g., water), bounded by a second fluid phase (e.g., oil) which is immiscible with the first fluid phase.
  • droplets according to the present disclosure may contain a first fluid phase, e.g., oil, bounded by a second immiscible fluid phase, e.g. an aqueous phase fluid (e.g., water).
  • the second fluid phase will be an immiscible phase carrier fluid.
  • droplets according to the present disclosure may be provided as aqueous-in-oil emulsions or oil-in-aqueous emulsions. Droplets may be sized and/or shaped as described herein for discrete entities. For example, droplets according to the present disclosure generally range from 1 μm to 1000 μm, inclusive, in diameter. Droplets according to the present disclosure may be used to encapsulate cells, nucleic acids (e.g., DNA), enzymes, reagents, reaction mixture, and a variety of other components.
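To give a sense of scale for the droplet sizes mentioned above, the volume of a spherical droplet follows from its diameter as V = πd³/6. The sketch below assumes diameters expressed in micrometers and a perfectly spherical droplet; the function name is hypothetical.

```python
import math


def droplet_volume_pl(diameter_um):
    """Volume in picoliters of a sphere with the given diameter in micrometers."""
    volume_um3 = math.pi * diameter_um ** 3 / 6.0
    # One cubic micrometer equals one femtoliter, i.e. 1e-3 picoliters.
    return volume_um3 * 1e-3


for diameter in (1, 10, 100, 1000):
    print(f"diameter={diameter} um volume={droplet_volume_pl(diameter):.6g} pL")
```

Because volume scales with the cube of the diameter, a tenfold change in diameter changes the reaction volume a thousandfold.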
  • the term emulsion may be used to refer to an emulsion produced in, on, or by a microfluidic device and/or flowed from or applied by a microfluidic device.
  • “Complementarity” or “complementary” refers to the ability of a nucleic acid to form hydrogen bond(s) or hybridize with another nucleic acid sequence by either traditional Watson-Crick or other non-traditional types.
  • hybridization refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under low, medium, or highly stringent conditions, including when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA. See, e.g., Ausubel, et al., Current Protocols In Molecular Biology, John Wiley & Sons, New York, N.Y., 1993.
  • a nucleotide at a certain position of a polynucleotide is capable of forming a Watson-Crick pairing with a nucleotide at the same position in an antiparallel DNA or RNA strand
  • the polynucleotide and the DNA or RNA molecule are complementary to each other at that position.
  • the polynucleotide and the DNA or RNA molecule are "substantially complementary" to each other when a sufficient number of corresponding positions in each molecule are occupied by nucleotides that can hybridize or anneal with each other in order to effect the desired process.
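The position-by-position notion of complementarity described above can be sketched as a check of Watson-Crick pairing between two antiparallel strands. The sketch considers only canonical A-T, A-U, and G-C pairs, requires equal-length strands, and uses a hypothetical `min_fraction` threshold as a stand-in for "a sufficient number of corresponding positions"; the source does not give a numeric threshold.

```python
WATSON_CRICK = {
    ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"),
}


def complementary_fraction(strand, other):
    """Fraction of positions that pair when `other` is aligned antiparallel.

    Both strands are written 5'->3', so `other` is reversed before pairing.
    """
    if len(strand) != len(other):
        raise ValueError("strands must have equal length")
    if not strand:
        return 0.0
    paired = sum((a, b) in WATSON_CRICK for a, b in zip(strand, reversed(other)))
    return paired / len(strand)


def is_substantially_complementary(strand, other, min_fraction=0.8):
    """`min_fraction` is an illustrative threshold, not a value from the text."""
    return complementary_fraction(strand, other) >= min_fraction


print(complementary_fraction("AAAA", "TTTT"))
print(complementary_fraction("AAAA", "TTTA"))
print(is_substantially_complementary("AAAAAAAAAA", "TTTTTTTTTA"))
```

In practice the number of matching positions needed depends on length, sequence composition, and reaction conditions, which is why the text frames it relative to the desired process.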
  • a complementary sequence is a sequence capable of annealing under stringent conditions to provide a 3'-terminal serving as the origin of synthesis of complementary chain.
  • the terms “amplify,” “amplifying,” “amplification reaction” and their variants refer generally to any action or process whereby at least a portion of a nucleic acid molecule (referred to as a template nucleic acid molecule) is replicated or copied into at least one additional nucleic acid molecule.
  • the additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule.
  • the template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded.
  • amplification includes a template-dependent in vitro enzyme-catalyzed reaction for the production of at least one copy of at least some portion of the nucleic acid molecule or the production of at least one copy of a nucleic acid sequence that is complementary to at least some portion of the nucleic acid molecule.
  • Amplification optionally includes linear or exponential replication of a nucleic acid molecule.
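The difference between linear and exponential replication can be made concrete with idealized copy-number formulas. In this sketch, exponential amplification multiplies the pool by (1 + efficiency) each cycle because new copies also serve as templates, while linear amplification adds copies of the original templates only. The per-cycle `efficiency` parameter and the function names are illustrative assumptions.

```python
def exponential_copies(initial, cycles, efficiency=1.0):
    """Copies after `cycles` when every molecule can be copied each cycle."""
    return initial * (1.0 + efficiency) ** cycles


def linear_copies(initial, cycles, efficiency=1.0):
    """Copies after `cycles` when only the original templates are copied."""
    return initial * (1.0 + efficiency * cycles)


for cycles in (1, 10, 20):
    print(
        f"cycles={cycles} "
        f"exponential={exponential_copies(1, cycles):.0f} "
        f"linear={linear_copies(1, cycles):.0f}"
    )
```

With perfect efficiency, ten cycles give 1024 copies per template exponentially but only 11 linearly, which illustrates why the two modes differ so much in yield.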
  • such amplification is performed using isothermal conditions; in other embodiments, such amplification can include thermocycling.
  • the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction.
  • amplification includes amplification of at least some portion of DNA- and RNA-based nucleic acids alone, or in combination.
  • the amplification reaction can include single or double-stranded nucleic acid substrates and can further include any of the amplification processes known to one of ordinary skill in the art.
  • the amplification reaction includes polymerase chain reaction (PCR).
  • the amplification reaction includes an isothermal amplification reaction such as LAMP.
  • synthesis and "amplification" of nucleic acid are used.
  • nucleic acid in the present invention means the elongation or extension of nucleic acid from an oligonucleotide serving as the origin of synthesis. If not only this synthesis but also the formation of other nucleic acid and the elongation or extension reaction of this formed nucleic acid occur continuously, a series of these reactions is comprehensively called amplification.
  • the polynucleic acid produced by the amplification technology employed is generically referred to as an "amplicon" or "amplification product.”
  • Any nucleic acid amplification method may be utilized, such as a PCR-based assay, e.g., quantitative PCR (qPCR), or an isothermal amplification may be used to detect the presence of certain nucleic acids, e.g., genes of interest, present in discrete entities or one or more components thereof, e.g., cells encapsulated therein.
  • Such assays can be applied to discrete entities within a microfluidic device or a portion thereof or any other suitable location.
  • the conditions of such amplification or PCR-based assays may include detecting nucleic acid amplification over time and may vary in one or more
  • nucleic acid polymerases can be used in the amplification reactions utilized in certain embodiments provided herein, including any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Such nucleotide polymerization can occur in a template-dependent fashion.
  • Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization.
  • the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases.
  • the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur.
  • Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases.
  • polymerase and its variants, as used herein, also includes fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide.
  • the second polypeptide can include a reporter enzyme or a processivity-enhancing domain.
  • the polymerase can possess 5' exonuclease activity or terminal transferase activity.
  • the polymerase can be optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture.
  • the polymerase can include a hot-start polymerase or an aptamer-based polymerase that optionally can be reactivated.
  • "Forward primer binding site" and "reverse primer binding site" refer to the regions on the template nucleic acid and/or the amplicon to which the forward and reverse primers bind.
  • the primers act to delimit the region of the original template polynucleotide which is exponentially amplified during amplification.
  • additional primers may bind to the region 5' of the forward primer and/or reverse primers. Where such additional primers are used, the forward primer binding site and/or the reverse primer binding site may encompass the binding regions of these additional primers as well as the binding regions of the primers themselves.
  • the method may use one or more additional primers which bind to a region that lies 5' of the forward and/or reverse primer binding region. Such a method was disclosed, for example, in WO0028082, which discloses the use of "displacement primers" or "outer primers."
  • nucleic acid refers to biopolymers of nucleotides and, unless the context indicates otherwise, includes modified and unmodified nucleotides, and both DNA and RNA, and modified nucleic acid backbones.
  • the nucleic acid is a peptide nucleic acid (PNA) or a locked nucleic acid (LNA).
  • the methods as described herein are performed using DNA as the nucleic acid template for amplification.
  • nucleic acid whose nucleotide is replaced by an artificial derivative or modified nucleic acid from natural DNA or RNA is also included in the nucleic acid of the present invention insofar as it functions as a template for synthesis of complementary chain.
  • the nucleic acid of the present invention is generally contained in a biological sample.
  • the biological sample includes animal, plant or microbial tissues, cells, cultures and excretions, or extracts therefrom.
  • the biological sample includes intracellular parasitic genomic DNA or RNA such as virus or mycoplasma.
  • the nucleic acid may be derived from nucleic acid contained in said biological sample.
  • genomic DNA or cDNA synthesized from mRNA, or nucleic acid amplified on the basis of nucleic acid derived from the biological sample, are preferably used in the described methods.
  • nucleotides are in 5' to 3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, "T" denotes deoxythymidine, and "U" denotes uridine.
  • Oligonucleotides are said to have "5' ends” and "3' ends” because mononucleotides are typically reacted to form oligonucleotides via attachment of the 5' phosphate or equivalent group of one nucleotide to the 3' hydroxyl or equivalent group of its neighboring nucleotide, optionally via a phosphodiester or other suitable linkage.
  • molecular tag refers to an entity that allows for distinguishing between analytes of a cell, such as RNA analytes of a cell.
  • molecular tags are present in amplicons, which, upon sequencing, enables the identification of the particular analytes (e.g., RNA analytes) from which the amplicons were generated.
  • a molecular tag is a short nucleotide sequence comprising between 3 and 20 nucleobases, between 5 and 18 nucleobases, between 6 and 15 nucleobases, or between 7 and 12 nucleobases.
  • Molecular tags can be incorporated via a variety of methods, including via primers (e.g., digestible primers such as ribonucleotide or uracil primers, gene specific primers, within a droplet, or within bulk solution).
  • molecular tags need not be contiguous nucleotide sequences.
  • a molecular tag may be nucleobases located at variable positions within an amplicon.
  • the nucleobases may be alternate bases (e.g., bases different from reference bases, such as wild-type nucleotide bases) that are located at variable positions within the amplicon.
  • molecular tags are represented by sequences corresponding to different start and stop sites in RNA analytes which were differentially cleaved to create the start and stop sites.
  • molecular tags are represented by varying lengths of amplicons, which correspond to degenerate breaking sites on RNA analytes, such as manipulated or artificially induced breaking sites on RNA analytes.
  • the differing sequences of amplicons corresponding to different start and stop sites enable identification of the particular analytes (e.g., RNA analytes) from which the amplicons were generated.
  • each molecular tag differs from all other molecular tags.
  • At least one molecular tag has a same sequence as another molecular tag. In various embodiments, at least 0.1% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 0.5% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 1% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 2% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 3% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 4% of molecular tags have a same sequence as another molecular tag.
  • At least 5% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 6% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 7% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 8% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 9% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 10% of molecular tags have a same sequence as another molecular tag.
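The tag lengths and tag-sharing percentages above can be related by a simple calculation. The following is a minimal sketch, assuming that tags are drawn uniformly at random from the four standard bases (an assumption made for illustration only; the disclosure does not state how tag sequences are distributed). The function names are hypothetical.

```python
def tag_space_size(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct tag sequences of the given length."""
    return alphabet_size ** length


def expected_fraction_sharing(num_molecules: int, length: int) -> float:
    """Expected fraction of tagged molecules whose tag sequence also
    appears on at least one other molecule.

    Assumes each molecule receives an independent, uniformly random tag
    (illustrative assumption, not taken from the disclosure).
    """
    if num_molecules < 2:
        return 0.0
    space = tag_space_size(length)
    # A molecule's tag is unshared only if every other molecule drew a different tag.
    return 1.0 - (1.0 - 1.0 / space) ** (num_molecules - 1)
```

Under these assumptions, shorter tags or larger numbers of tagged molecules increase the fraction of tags that share a sequence with another tag, which is one way the percentages listed above could arise.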
  • gene tag refers to an entity that allows for distinguishing between analytes of different gene targets.
  • gene tags can be sequences that are unique for a particular gene target and therefore, they enable differentiation between amplicons that derive from analytes of different genes.
  • gene tags can be sequences that enable differentiation between amplicons derived from the same gene.
  • gene tags can enable differentiation between amplicons derived from different RNA transcripts, such as different RNA transcripts from the same gene or different RNA transcripts from different genes.
  • gene tags can enable differentiation between amplicons derived from different genomic DNA sequences, such as different genomic DNA sequences of the same gene or different genomic DNA sequences from different genes.
  • a sequence of each gene tag may differ from a sequence of other gene tags.
  • a gene tag is a short nucleotide sequence comprising between 3 and 20 nucleobases, between 5 and 18 nucleobases, between 6 and 15 nucleobases, or between 7 and 12 nucleobases.
  • gene tags can be incorporated with reverse primers that hybridize with RNA analytes for performing reverse transcription. Therefore, gene tags are incorporated into cDNA following reverse transcription.
  • FIG. 1A shows an overall system environment for analyzing cell(s) through a single cell workflow analysis, in accordance with an embodiment.
  • the single cell workflow device 100 is configured to process the cell(s) 110 and generate sequence reads derived from individual cell(s) 110. Further details as to the processes of the single cell workflow device 100 are described below in reference to FIG. 1B.
  • the computing device 180 can analyze the sequence reads e.g., for purposes of building RNA/DNA libraries and/or characterizing individual cells.
  • the single cell workflow device 100 includes at least a microfluidic device that is configured to encapsulate cells with reagents to generate cell lysates comprising RNA and/or genomic DNA (gDNA), encapsulate cell lysates with reaction mixtures, and perform nucleic acid amplification reactions.
  • the microfluidic device can include one or more fluidic channels that are fluidically connected. Therefore, the combining of an aqueous fluid through a first channel and a carrier fluid through a second channel results in the generation of emulsion droplets.
  • the fluidic channels of the microfluidic device may have at least one cross-sectional dimension on the order of a millimeter or smaller (e.g., less than or equal to about 1 millimeter). Additional details of microchannel design and dimensions are described in International Patent Application No. PCT/US2016/016444 and US Patent Application No. 14/420,646, each of which is hereby incorporated by reference in its entirety.
  • An example of a microfluidic device is the Tapestri™ Platform. While the instant disclosure provides a specific example, it is understood by one of ordinary skill in the art that the disclosed principles are not limited thereto and may be implemented independently of the Tapestri™, MiSeq™ and NovaSeq™ devices.
  • the single cell workflow device 100 may also include one or more of (a) a temperature control module for controlling the temperature of one or more portions of the subject devices and/or droplets therein and which is operably connected to the microfluidic device(s), (b) a detection means, i.e., a detector, e.g., an optical imager, operably connected to the microfluidic device(s), (c) an incubator, e.g., a cell incubator, operably connected to the microfluidic device(s), and (d) a sequencer operably connected to the microfluidic device(s).
  • the one or more temperature and/or pressure control modules provide control over the temperature and/or pressure of a carrier fluid in one or more flow channels of a device.
  • a temperature control module may be one or more thermal cyclers that regulate the temperature for performing nucleic acid amplification.
  • the one or more detection means i.e., a detector, e.g., an optical imager, are configured for detecting the presence of one or more droplets, or one or more characteristics thereof, including their composition.
  • detection means are configured to recognize one or more components of one or more droplets, in one or more flow channels.
  • the sequencer is a hardware device configured to perform sequencing, such as next generation sequencing.
  • Examples of sequencers include Illumina sequencers (e.g., MiniSeq™, MiSeq™, NextSeq™ 550 Series, or NextSeq™ 2000), Roche sequencing system 454, and Thermo Fisher Scientific sequencers (e.g., Ion GeneStudio S5 system, Ion Torrent Genexus System).
  • FIG. 1B depicts an embodiment of processing single cells to generate amplified nucleic acid molecules for sequencing.
  • the amplified nucleic acid molecules include molecular tags.
  • the processing of single cells can be performed by a single cell workflow device (e.g., the single cell workflow device 100 disclosed in FIG. 1A).
  • FIG. 1B depicts a workflow process including the steps of cell encapsulation 160, analyte release 165, cell barcoding 170, and target amplification 175 of target nucleic acid molecules.
  • the cell encapsulation step 160 involves encapsulating a single cell 110 with reagents 120 into a droplet.
  • the droplet is formed by partitioning aqueous fluid containing the cell 110 and reagents 120 into a carrier fluid (e.g., oil 115), thereby resulting in an aqueous fluid-in-oil emulsion.
  • the droplet includes encapsulated cell 125 and the reagents 120.
  • the encapsulated cell undergoes an analyte release at step 165.
  • the reagents cause the cell to lyse, thereby generating a cell lysate 130 within the droplet.
  • the cell lysate 130 includes the contents of the cell, which can include one or more different types of analytes (e.g., RNA transcripts, DNA, protein, lipids, or carbohydrates).
  • the different analytes of the cell lysate 130 can interact with reagents 120 within the droplet.
  • reverse transcriptase in the reagents 120 can reverse transcribe cDNA molecules from RNA transcripts that are present in the cell lysate 130.
  • the reagents 120 include primers.
  • the primers are useful for conducting a reaction, such as for conducting reverse transcription to generate cDNA.
  • the primers include molecular tags and therefore, the molecular tags are incorporated into the cDNA following reverse transcription.
  • the primers are gene specific primers.
  • the primers are reverse primers that are capable of hybridizing to a portion of a nucleic acid, such as an RNA transcript. In such embodiments, the primers enable the reverse transcription of RNA transcripts to generate cDNA. Therefore, the reverse primers participate in the reverse transcription reaction, thereby generating cDNA molecules that incorporate molecular tags.
  • the primers are digestible primers.
  • digestible primers can participate in the reverse transcription of RNA transcripts to generate cDNA, but are later digested such that the digestible primers do not participate in subsequent reactions involving the cDNA (e.g., amplification of cDNA). Further details on digestible primers are described below.
  • the reagents 120 include primers that further include gene tags. Gene tags can be sequences that are unique for a particular gene target and therefore, they enable differentiation between amplicons that derive from analytes of different genes.
  • the reagents 120 include template switching oligonucleotides (TSOs).
  • TSOs are DNA oligonucleotide sequences. A portion of the TSOs hybridizes with cDNA molecules, thereby enabling the subsequent template switching.
  • the TSOs include guanosines at their 3’ ends.
  • the TSOs include riboguanosines at their 3’ ends.
  • the TSOs include three riboguanosines (rGrGrG) at their 3’ ends, which hybridize with the 3’ dC extension of cDNA molecules.
  • TSOs include molecular tags. Therefore, the molecular tags of the TSOs are incorporated through the template switching.
  • the reagents 120 include truncation oligonucleotides and cleaving enzymes, such as RNase.
  • the truncation oligonucleotides are DNA oligonucleotides that are complementary to specific portions of RNA sequences. Therefore, different truncation oligonucleotides can be designed and included in the reagents 120.
  • the RNase is RNase H.
  • RNase, such as RNase H, cleaves the RNA strand of RNA/DNA duplexes. Therefore, RNase H can cleave RNA at regions where truncation oligonucleotides hybridize with the RNA.
  • truncation oligonucleotides can be designed to hybridize with different regions of different RNA analytes of the cell.
  • a first truncation oligonucleotide can be designed to hybridize at the 3’ end of a first RNA analyte of the cell whereas a second truncation oligonucleotide can be designed to hybridize with a portion of a second RNA analyte that is a number of positions away from the 3’ end of the second RNA analyte.
  • RNase H cleavage differentially cleaves the different RNA analytes based on the location of the hybridized truncation oligonucleotides (e.g., within the hybridized region of the RNA/DNA duplex), thereby generating RNA analytes with different start and stop sites.
  • these RNA analytes with different sequences can lead to amplicons with different sequences, which serve as the molecular tags that enable distinguishing between different RNA analytes.
  • the reagents 120 include universal bases, examples of which include an inosine base, 2'-DeoxyNebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-Nitropyrrole.
  • universal bases exhibit base pairing with any of adenosine, cytosine, guanine, and thymine.
  • the universal bases are introduced in conjunction with primers, such as reverse primers for reverse transcription. Therefore, following reverse transcription, random sequences complementary to universal base sequences are incorporated into cDNA molecules.
  • sequences of at least 3 universal bases are included in the reagents 120.
  • sequences of at least one or more universal bases are included in the reagents 120. In various embodiments, sequences of at least two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive universal bases are included in the reagents 120. Thus, random sequences complementary to the sequence of universal bases can be incorporated into the cDNA, and further propagated to the amplicons that are later sequenced.
  • the reagents 120 include alternate bases.
  • alternate bases include bases other than deoxynucleotides (dNTPs) (e.g., other than deoxyadenosine 5'-triphosphate, deoxyguanosine 5'-triphosphate, deoxycytidine 5'-triphosphate, and deoxythymidine 5'-triphosphate).
  • Example alternate bases include, but are not limited to: inosine base, 2'-DeoxyNebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-Nitropyrrole.
  • alternate bases include universal bases. Further examples of universal bases are described in Liang, F., et al “Universal Base Analogues and their Applications in DNA Sequencing Technology.” RSC Advances, 35, 2013, which is hereby incorporated by reference in its entirety.
  • alternate bases are included in the reagents 120 at a particular ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:2 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:5 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:10 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:20 ratio relative to dNTPs.
  • alternate bases are included in the reagents 120 at a ratio less than a 1:30 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:40 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:50 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:75 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:100 ratio relative to dNTPs.
  • alternate bases are included in the reagents 120 at a ratio less than a 1:125 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:150 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:175 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:200 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:225 ratio relative to dNTPs.
  • alternate bases are included in the reagents 120 at a ratio less than a 1:250 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:275 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:300 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:400 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:500 ratio relative to dNTPs.
  • alternate bases are included in the reagents 120 between a ratio of 1:10 relative to dNTPs and 1:500 relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:50 relative to dNTPs and 1:300 relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:100 relative to dNTPs and 1:250 relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:150 relative to dNTPs and 1:200 relative to dNTPs.
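To give a rough sense of what these ratios could mean for a single amplicon, the following sketch estimates how many alternate bases might be incorporated. It assumes that an alternate base and a dNTP are incorporated with equal efficiency and independently at each position, which is a simplification made here for illustration and is not stated in the disclosure. The function names and the example amplicon length are hypothetical.

```python
def alternate_base_probability(dntp_parts: float, alt_parts: float = 1.0) -> float:
    """Per-position probability of incorporating an alternate base for an
    alt:dNTP ratio of alt_parts:dntp_parts (assumes equal incorporation
    efficiency, an illustrative simplification)."""
    return alt_parts / (alt_parts + dntp_parts)


def expected_alternate_bases(length: int, dntp_parts: float) -> float:
    """Expected number of alternate bases in a strand of the given length."""
    return length * alternate_base_probability(dntp_parts)


def prob_at_least_one(length: int, dntp_parts: float) -> float:
    """Probability that a strand of the given length carries at least one
    alternate base, assuming independent incorporation at each position."""
    p = alternate_base_probability(dntp_parts)
    return 1.0 - (1.0 - p) ** length
```

For example, under this simplified model, calling `expected_alternate_bases(200, 99)` for a hypothetical 200-base strand and a 1:99 mix gives an expectation of about two alternate bases per strand.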
  • the cell barcoding step 170 involves encapsulating the cell lysate 130 into a second droplet along with a barcode 145 and/or reaction mixture 140.
  • the second emulsion is formed by partitioning aqueous fluid containing the cell lysate 130 into immiscible oil 135.
  • the reaction mixture 140 and barcode 145 can be introduced through a separate stream of aqueous fluid, thereby partitioning the reaction mixture 140 and barcode 145 into the second droplet along with the cell lysate 130.
  • the reaction mixture 140 enables the performance of a reaction, such as a nucleic acid amplification reaction.
  • the reaction mixture 140 includes primers for introducing molecular tags.
  • the primers are gene specific primers.
  • the molecular tags are incorporated into amplicons.
  • the reaction mixture 140 includes one or more enzymes capable of digesting primers and/or molecular tags. In such embodiments where the reaction mixture 140 includes one or more enzymes capable of digesting the digestible primers, the enzymes digest the digestible primers here in this droplet at step 170.
  • the digestible primers are previously digested in the droplet at step 165 and therefore, need not be digested here at step 170.
  • the enzymes digest the digestible primers prior to a first cycle of nucleic acid amplification.
  • the enzymes digest the digestible primers subsequent to a first cycle of nucleic acid amplification.
  • the enzymes digest the digestible primers subsequent to a first cycle of nucleic acid amplification, but prior to a second cycle of nucleic acid amplification.
  • the target amplification step 175 involves amplifying target nucleic acids.
  • target nucleic acids of the cell lysate undergo amplification using the reaction mixture 140 in the second emulsion, thereby generating amplicons derived from the target nucleic acids.
  • any digestible primers that were previously introduced e.g., previously introduced as part of the reagents 120
  • digestible primers do not play a role in the target amplification 175 step.
  • a barcode 145 can label a target nucleic acid or amplicon to be analyzed (e.g., an analyte of the cell lysate such as genomic DNA or cDNA that has been reverse transcribed from RNA), which enables subsequent identification of the origin of a sequence read (e.g., a cellular origin) that is derived from the target nucleic acid.
  • multiple barcodes 145 can label multiple target nucleic acids of the cell lysate, thereby enabling the subsequent identification of the origin of large quantities of sequence reads.
  • the workflow process shown in FIG. 1B is a two-step workflow process in which analyte release 165 from the cell occurs separate from the steps of cell barcoding 170 and target amplification 175. Specifically, analyte release 165 from a cell occurs within a first droplet followed by cell barcoding 170 and target amplification 175 in a second emulsion.
  • alternative workflow processes e.g., workflow processes other than the two-step workflow process shown in FIG. 1A
  • the cell 110, reagents 120, reaction mixture 140, and barcode 145 can be encapsulated in a single emulsion.
  • analyte release 165 can occur within the droplet, followed by cell barcoding 170 and target amplification 175 within the same droplet.
  • Although FIG. 1B depicts cell barcoding 170 and target amplification 175 as two separate steps, in various embodiments, the target nucleic acid is labeled with a barcode 145 through the nucleic acid amplification step.
  • FIG. 2 is a flow process for analyzing nucleic acid sequences derived from analytes of the single cell, in accordance with an embodiment. Specifically, FIG. 2 depicts the steps of pooling amplified nucleic acids at step 205, sequencing the amplified nucleic acids at step 210, read alignment at step 215, and characterization at step 220. Generally, the flow process shown in FIG. 2 is a continuation of the workflow process shown in FIG. 1B. For example, after target amplification at step 175 of FIG. 1B, the amplified nucleic acids 250A, 250B, and 250C are pooled at step 205 shown in FIG. 2.
  • FIG. 2 depicts three amplified nucleic acids 250A, 250B, and 250C.
  • pooled nucleic acids can include hundreds, thousands, or millions of nucleic acids derived from analytes of multiple cells.
  • molecular tags are introduced in bulk to the pooled amplified nucleic acids 205, otherwise referred to as amplicons. Thus, the molecular tags can be incorporated into the amplicons in bulk prior to sequencing.
  • each amplified nucleic acid 250 includes at least a sequence of a target nucleic acid 240 and a barcode 230.
  • an amplified nucleic acid 250 can include additional sequences, such as any of a universal primer sequence, a random primer sequence, a gene specific primer forward sequence, a gene specific primer reverse sequence, a constant region, or sequencing adapters.
  • each amplified nucleic acid 250 need not include the sequence of the target nucleic acid 240.
  • an amplified nucleic acid can include a gene tag.
  • the amplified nucleic acids 250A, 250B, and 250C are derived from the same single cell and therefore, the barcodes 230A, 230B, and 230C are the same. Therefore, sequencing of the barcodes 230 enables the determination that the amplified nucleic acids 250 are derived from the same cell.
  • the amplified nucleic acids 250A, 250B, and 250C are pooled and derived from different cells. Therefore, the barcodes 230A, 230B, and 230C are different from one another and sequencing of the barcodes 230 enables the determination that the amplified nucleic acids 250 are derived from different cells.
  • amplified nucleic acids 205 may further include a molecular tag that enables distinguishing between analytes (e.g., RNA analytes) of a cell.
  • the molecular tag is located on the 3’ end of the amplified nucleic acid 205.
  • the molecular tag is located on the 5’ end of the amplified nucleic acid 205.
  • the molecular tag is not located on an end of the amplified nucleic acid 205, but rather within the amplified nucleic acid 205.
  • the pooled amplified nucleic acids 250 undergo sequencing to generate sequence reads. Sequence reads originating from individual cells are clustered according to the barcode sequences included in the amplicons.
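The clustering of sequence reads by barcode can be sketched as follows. This is a minimal illustration only: it assumes the barcode has already been extracted from each read and that each read is represented as a hypothetical `(barcode, sequence)` pair, with exact barcode matching and no error correction. None of these details are specified in the disclosure.

```python
from collections import defaultdict


def cluster_reads_by_barcode(reads):
    """Group read sequences by their cell barcode.

    `reads` is an iterable of (barcode, sequence) pairs; this representation
    is a hypothetical one chosen for illustration.
    """
    clusters = defaultdict(list)
    for barcode, sequence in reads:
        # Reads sharing a barcode are treated as originating from the same cell.
        clusters[barcode].append(sequence)
    return dict(clusters)
```

Each key of the returned dictionary then stands for one cell, and its value holds the reads attributed to that cell for the alignment step.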
  • the sequence reads for each single cell are aligned (e.g., to a reference genome). Aligning the sequence reads to the reference genome enables the determination of where in the genome the sequence read is derived from. For example, multiple sequence reads generated from amplicons derived from an RNA transcript molecule, when aligned to a position of the genome, can reveal that a gene at the position of the genome was transcribed. As another example, multiple sequence reads generated from amplicons derived from a genomic DNA molecule, when aligned to a position of the genome, can reveal the sequence of the gene at the position of the genome.
  • characterization of the libraries and/or the single cells can be performed.
  • characterization of a library e.g., DNA library or RNA library
  • sequencing metrics e.g., sequencing metrics
  • characterization of single cells can involve identifying one or more mutations (e.g., allelic variants, point mutations, single nucleotide variations/polymorphisms, translocations, DNA/RNA fusions, loss of heterozygosity) that are present in one or more of the single cells. Further description regarding characterization of single cells is described in PCT/US2020/026480 and PCT/US2020/026482, each of which is hereby incorporated by reference in its entirety.
  • characterization at step 220 involves quantifying the number of analytes present in a cell based on the molecular tags included on the amplicons. For example, different molecular tags enable distinguishing between amplicons derived from different RNA analytes of a cell. Thus, by quantifying the number of different molecular tags, the number of RNA analytes for particular genes can be quantified as a measure of gene expression in the cell. For example, a larger quantified number of molecular tags can indicate the presence of a larger number of RNA analytes for a particular gene, thereby representing higher gene expression. In contrast, a lower quantified number of molecular tags can indicate the presence of fewer RNA analytes for a particular gene, thereby representing lower gene expression.
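Quantifying analytes by counting distinct molecular tags can be sketched as follows. The sketch assumes each aligned read has already been reduced to a (cell barcode, gene, molecular tag) triple; the record layout, function name, and example values are hypothetical and serve only to illustrate the counting step.

```python
from collections import defaultdict


def count_molecular_tags(records):
    """Count distinct molecular tags per (cell barcode, gene).

    `records` is an iterable of (cell_barcode, gene, molecular_tag) triples.
    Reads sharing all three values are treated as copies of one original
    RNA analyte and are counted once.
    """
    tags = defaultdict(set)
    for cell, gene, tag in records:
        tags[(cell, gene)].add(tag)
    return {key: len(tag_set) for key, tag_set in tags.items()}


records = [
    ("cell1", "GENE_A", "ACGT"),
    ("cell1", "GENE_A", "ACGT"),  # amplification duplicate of the first record
    ("cell1", "GENE_A", "TTGA"),
    ("cell1", "GENE_B", "ACGT"),
    ("cell2", "GENE_A", "GGCC"),
]
counts = count_molecular_tags(records)
```

Here `counts[("cell1", "GENE_A")]` is 2 even though three reads were observed, because two of them carry the same molecular tag.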
  • Embodiments described herein involve encapsulating one or more cells (e.g., at step 160 in FIG. 1B) to perform single-cell analysis on the one or more cells.
  • the one or more cells can be isolated from a test sample obtained from a subject or a patient.
  • the one or more cells are healthy cells taken from a healthy subject.
  • the one or more cells include cancer cells taken from a subject previously diagnosed with cancer.
  • cancer cells can be tumor cells available in the bloodstream of the subject diagnosed with cancer.
  • the test sample is obtained from a subject following treatment of the subject (e.g., following a therapy such as cancer therapy).
  • single-cell analysis of the cells enables cellular and sub-cellular prediction of the subject’s response to a therapy.
  • encapsulating a cell with reagents is accomplished by combining an aqueous phase including the cell and reagents with an immiscible oil phase.
  • an aqueous phase including the cell and reagents is flowed together with a flowing immiscible oil phase such that water-in-oil emulsions are formed, where at least one emulsion includes a single cell and the reagents.
  • the immiscible oil phase includes a fluorous oil, a fluorous non-ionic surfactant, or both.
  • emulsions can have an internal volume of about 0.001 to 1000 picoliters or more and can range from 0.1 to 1000 μm in diameter.
  • the aqueous phase including the cell and reagents need not be simultaneously flowing with the immiscible oil phase.
  • the aqueous phase can be flowed to contact a stationary reservoir of the immiscible oil phase, thereby enabling the budding of water in oil emulsions within the stationary oil reservoir.
  • combining the aqueous phase and the immiscible oil phase can be performed in a microfluidic device.
  • the aqueous phase can flow through a microchannel of the microfluidic device to contact the immiscible oil phase, which is simultaneously flowing through a separate microchannel or is held in a stationary reservoir of the microfluidic device.
  • the encapsulated cell and reagents within an emulsion can then be flowed through the microfluidic device to undergo cell lysis.
  • Further example embodiments of adding reagents and cells to emulsions can include merging emulsions that separately contain the cells and reagents or picoinjecting reagents into an emulsion. Further description of example embodiments is described in US Application No. 14/420,646, which is hereby incorporated by reference in its entirety.
  • the encapsulated cell in an emulsion is lysed to generate cell lysate.
  • the cell is lysed due to the reagents which include one or more lysing agents that cause the cell to lyse.
  • lysing agents include detergents such as Triton X-100, NP-40 (e.g., Tergitol-type NP-40 or nonyl phenoxypolyethoxylethanol), as well as cytotoxins.
  • Examples of NP-40 include Thermo Scientific NP-40 Surfact-Amps Detergent solution, IGEPAL® CA-630, and Sigma Aldrich NP-40 (TERGITOL Type NP-40).
  • cell lysis may also, or instead, rely on techniques that do not involve a lysing agent in the reagent.
  • lysis may be achieved by mechanical techniques that may employ various geometric features to effect piercing, shearing, abrading, etc. of cells. Other types of mechanical breakage such as acoustic techniques may also be used. Further, thermal energy can also be used to lyse cells. Any convenient means of effecting cell lysis may be employed in the methods described herein.
  • the reagents include reverse transcriptase which reverse transcribes mRNA transcripts released from the cell to generate corresponding cDNA and further include primers that hybridize with mRNA transcripts, thereby enabling the reverse transcription reaction to occur.
  • molecular tags are introduced in the reagents and therefore, are incorporated into cDNA after reverse transcription.
  • FIGs. 3A-3C depict the processing and releasing of analytes of a single cell in a droplet, in accordance with an embodiment. In FIG. 3A, the cell is lysed, as indicated by the dotted line of the cell membrane.
  • the reagents include a detergent, such as NP40 (e.g., 0.01% or 1.0% NP40) or Triton X-100, which causes the cell to lyse.
  • the lysed cell includes analytes such as RNA transcripts within the cytoplasm of the cell as well as packaged DNA 302, which refers to the organization of DNA with histones, thereby forming nucleosomes that are packaged as chromatin.
  • the reagents included in the emulsion 300A further include reverse transcriptase (abbreviated as “RT” 310).
  • the reagents included in the emulsion 300A further include an enzyme 312 that digests the packaged DNA 302.
  • the enzyme 312 is proteinase K.
  • FIG. 3B depicts the emulsion 300B in a second state as reverse transcriptase performs reverse transcription on the RNA transcripts and the enzymes 312 digest the packaged DNA 302.
  • cDNA is generated as a result of reverse transcription.
  • the generated cDNA include molecular tags.
  • the genomic DNA is released from the packaged DNA 302 form.
  • FIG. 3C depicts the emulsion 300C in a third state that includes synthesized cDNA 306.
  • FIG. 3C also depicts freed gDNA 340 that is released from the packaged DNA 302.
  • the cDNA 306 include molecular tags.
  • the emulsion 300C can be exposed to conditions to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 50°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 60°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 70°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 80°C to inactivate the enzymes 312.
  • the emulsion 300C is exposed to an elevated temperature of at least 90°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 95°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 100°C to inactivate the enzymes 312.
  • the reaction mixture includes reactants sufficient for performing a reaction, such as nucleic acid amplification, on analytes of the cell lysate.
  • the reaction mixture 140 includes components, such as primers, for performing the nucleic acid reaction on the analytes.
  • primers are capable of acting as a point of initiation of synthesis along a complementary strand when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is catalyzed.
  • a cell lysate is encapsulated with a reaction mixture and a barcode by combining an aqueous phase including the reaction mixture and the barcode with the cell lysate and an immiscible oil phase.
  • an aqueous phase including the reaction mixture and the barcode is flowed together with a flowing cell lysate and a flowing immiscible oil phase such that water-in-oil emulsions are formed, where at least one emulsion includes a cell lysate, the reaction mixture, and the barcode.
  • the immiscible oil phase includes a fluorous oil, a fluorous non-ionic surfactant, or both.
  • emulsions can have an internal volume of about 0.001 to 1000 picoliters or more and can range from 0.1 to 1000 μm in diameter.
  • combining the aqueous phase and the immiscible oil phase can be performed in a microfluidic device.
  • the aqueous phase can flow through a microchannel of the microfluidic device to contact the immiscible oil phase, which is simultaneously flowing through a separate microchannel or is held in a stationary reservoir of the microfluidic device.
  • the encapsulated cell lysate, reaction mixture, and barcode within an emulsion can then be flowed through the microfluidic device to perform amplification of target nucleic acids.
  • Further example embodiments of adding reaction mixture and barcodes to emulsions can include merging emulsions that separately contain the cell lysate and reaction mixture and barcodes or picoinjecting the reaction mixture and/or barcode into an emulsion. Further description of example embodiments of merging emulsions or picoinjecting substances into an emulsion is found in US Application No. 14/420,646, which is hereby incorporated by reference in its entirety.
  • the digestible primers are digested.
  • digested primers encompass primers that are broken down such that the primer can no longer hybridize with a target sequence.
  • digested primers further encompass completely digested primers that are reduced to individual nucleotides.
  • Digestible primers are digested to remove their subsequent participation in reactions such as nucleic acid amplification.
  • the digestion of digestible primers reduces or eliminates presence of the digestible primers. This can include digestible primers that have formed primer byproducts and misprimed digestible primers (e.g., digestible primers that have primed a different nucleic acid such as genomic DNA).
  • the emulsion may be incubated under conditions that facilitate the nucleic acid amplification reaction.
  • the emulsion may be incubated on the same microfluidic device as was used to add the reaction mixture and/or barcode, or may be incubated on a separate device.
  • incubating the emulsion under conditions that facilitate nucleic acid amplification is performed on the same microfluidic device used to encapsulate the cells and lyse the cells. Incubating the emulsions may take a variety of forms.
  • the emulsions containing the reaction mix, barcode, and cell lysate may be flowed through a channel that incubates the emulsions under conditions effective for nucleic acid amplification.
  • Flowing the microdroplets through a channel may involve a channel that snakes over various temperature zones maintained at temperatures effective for PCR.
  • Such channels may, for example, cycle over two or more temperature zones, wherein at least one zone is maintained at about 65°C and at least one zone is maintained at about 95°C.
  • the number of zones, and the respective temperature of each zone may be readily determined by those of skill in the art to achieve the desired nucleic acid amplification.
  • the extent of nucleic acid amplification can be controlled by modulating the concentration of the reactants in the reaction mixture. In some instances, this is useful for fine tuning of the reactions in which the amplified products are used.
  • emulsions containing the amplified nucleic acids are collected.
  • the emulsions are collected in a well, such as a well of a microfluidic device.
  • the emulsions are collected in a reservoir or a tube, such as an Eppendorf tube.
  • the amplified nucleic acids across the different emulsions are pooled.
  • the emulsions are broken by providing an external stimulus to pool the amplified nucleic acids.
  • the emulsions naturally aggregate over time given the density differences between the aqueous phase and immiscible oil phase.
  • the amplified nucleic acids pool in the aqueous phase.
  • the amplified nucleic acids can undergo further preparation for sequencing.
  • sequencing adapters can be added to the pooled nucleic acids.
  • Example sequencing adapters are P5 and P7 sequencing adapters. The sequencing adapters enable the subsequent sequencing of the nucleic acids.
  • Amplified nucleic acids are sequenced to obtain sequence reads for generating a sequencing library.
  • the amplified nucleic acids include molecular tags and are sequenced to generate sequence reads with molecular tags.
  • Sequence reads can be achieved with commercially available next generation sequencing (NGS) platforms, including platforms that perform any of sequencing by synthesis, sequencing by ligation, pyrosequencing, using reversible terminator chemistry, using phospholinked fluorescent nucleotides, or real-time sequencing.
  • amplified nucleic acids may be sequenced on an Illumina MiSeq platform.
  • libraries of NGS fragments are clonally amplified in situ by capture of single template molecules using beads coated with oligonucleotides complementary to adapters.
  • Each bead containing a template of the same type is placed in a water-in-oil microdroplet, and the template is clonally amplified using a method called emulsion PCR.
  • After amplification, the emulsion is broken and the beads are deposited in separate wells of a picotiter plate that acts as a flow cell during sequencing reactions.
  • Injection of each of the four dNTP reagents into the flow cell occurs in the presence of sequencing enzymes and a luminescent reporter, such as luciferase.
  • the resulting ATP produces a flash of luminescence within the well, which is recorded using a CCD camera. It is possible to achieve a read length of 400 bases or more, and it is possible to obtain 10^6 sequence reads, resulting in up to 500 million base pairs (Mb) of sequence.
  • sequencing data is produced in the form of short readings.
  • fragments of a library of NGS fragments are captured on the surface of a flow cell that is coated with oligonucleotide anchor molecules.
  • An anchor molecule is used as a PCR primer, but due to the length of the template and its proximity to other nearby anchor oligonucleotides, extension by PCR causes the molecule to arch over and hybridize with a neighboring anchor oligonucleotide, forming a bridge structure on the surface of the flow cell.
  • These DNA loops are denatured and cleaved. The resulting single strands are then sequenced using reversible dye terminators.
  • the nucleotides included in the sequence are determined by detecting fluorescence after incorporation, where each fluorescent label and blocking agent is removed prior to the next dNTP addition cycle. Additional details for sequencing using the Illumina platform are found in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US patent No. 6,833,246; US patent No. 7,115,400; US patent No. 6,969,488; each of which is hereby incorporated by reference in its entirety.
  • Sequencing of nucleic acid molecules using SOLiD technology includes clonal amplification of the library of NGS fragments using emulsion PCR. After that, the beads containing the template are immobilized on the derivatized surface of the glass flow cell and annealed with a primer complementary to the adapter oligonucleotide. However, instead of using the indicated primer for 3' extension, it is used to provide a 5' phosphate group for ligation of test probes containing two probe-specific bases followed by 6 degenerate bases and one of four fluorescent labels.
  • test probes have 16 possible combinations of two bases at the 3' end of each probe and one of four fluorescent dyes at the 5' end.
  • the color of the fluorescent dye and, thus, the identity of each probe corresponds to a certain color space coding scheme.
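A two-base color space encoding of the kind referred to above can be sketched as follows. The text does not specify the coding scheme; the sketch assumes the commonly described scheme in which each of the 16 dinucleotides maps to one of four colors, computed here as the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). Function names are hypothetical.

```python
# Assumed 2-bit codes for the bases; the color of a dinucleotide is the
# XOR of the codes of its two bases, giving 4 colors for 16 dinucleotides.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_color_space(sequence):
    """Return the list of colors (0-3) for each adjacent pair of bases."""
    codes = [BASE_TO_CODE[base] for base in sequence]
    return [first ^ second for first, second in zip(codes, codes[1:])]


def decode_color_space(first_base, colors):
    """Recover the base sequence from a known first base and the colors."""
    current = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        current ^= color  # the next base is determined by the previous base and the color
        bases.append(CODE_TO_BASE[current])
    return "".join(bases)


colors = encode_color_space("ATGGC")
decoded = decode_color_space("A", colors)
```

Because each color depends on two adjacent bases, decoding requires a known first base, and a single miscalled color changes every downstream base in the decoded sequence.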
  • HeliScope from Helicos BioSciences is used. Sequencing is achieved by the addition of polymerase and serial additions of fluorescently-labeled dNTP reagents. Incorporation leads to the appearance of a fluorescent signal corresponding to the dNTP, and the specified signal is captured by the CCD camera before each dNTP addition cycle. The read length varies from 25-50 nucleotides with a total yield exceeding 1 billion nucleotide pairs per analytical run. Additional details for performing sequencing using HeliScope are found in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev.
  • a Roche 454 sequencing system is used. 454 sequencing involves two steps. In the first step, DNA is cut into fragments of approximately 300-800 base pairs, and these fragments have blunt ends. Oligonucleotide adapters are then ligated to the ends of the fragments. The adapters serve as primers for amplification and sequencing of fragments. Fragments can be attached to DNA-capture beads, for example, streptavidin-coated beads, using, for example, an adapter that contains a 5'-biotin tag. Fragments attached to the beads are amplified by PCR within the droplets of an oil-water emulsion.
  • the result is multiple copies of clonally amplified DNA fragments on each bead.
  • the beads are captured in wells (several picoliters in volume).
  • Pyrosequencing is carried out on each DNA fragment in parallel. Adding one or more nucleotides leads to the generation of a light signal, which is recorded on the CCD camera of the sequencing instrument. The signal intensity is proportional to the number of nucleotides included.
  • Pyrosequencing uses pyrophosphate (PPi), which is released upon the addition of a nucleotide. PPi is converted to ATP using ATP sulfurylase in the presence of adenosine 5' phosphosulfate.
  • Luciferase uses ATP to convert luciferin to oxyluciferin, and as a result of this reaction, light is generated that is detected and analyzed. Additional details for performing 454 sequencing are found in Margulies et al. (2005) Nature 437: 376-380, which is hereby incorporated by reference in its entirety.
  • Ion Torrent technology is a DNA sequencing method based on the detection of hydrogen ions that are released during DNA polymerization.
  • the microwell contains a fragment of a library of NGS fragments to be sequenced.
  • Under the microwell layer is the hypersensitive ion sensor ISFET. All layers are contained within a semiconductor CMOS chip, similar to the chip used in the electronics industry.
  • When a dNTP is incorporated into a growing complementary chain, a hydrogen ion is released that excites the hypersensitive ion sensor. If homopolymer repeats are present in the sequence of the template, multiple dNTP molecules will be incorporated in one cycle. This results in a corresponding number of hydrogen ions being released and a proportionally higher electrical signal.
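The proportionality between signal and homopolymer length can be illustrated with a simplified flow-decoding sketch. It assumes, hypothetically, that signals have already been normalized so that one incorporated nucleotide gives a value near 1.0, and that nucleotides are flowed in a repeating order; the flow order "TACG", the function name, and the signal values are illustrative only.

```python
def decode_flow_signals(flow_order, signals):
    """Convert normalized per-flow signals into a base sequence.

    Assumption (hypothetical): a signal near n means n identical bases
    were incorporated during that flow, so each signal is rounded to the
    nearest non-negative integer.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        count = max(0, int(signal + 0.5))  # nearest whole number of incorporations
        bases.append(nucleotide * count)
    return "".join(bases)


# Eight flows in the repeating order T, A, C, G.
signals = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.10, 2.95]
sequence = decode_flow_signals("TACG", signals)
```

In this toy example the third flow (C) has a signal near 2 and the last flow (G) a signal near 3, so they are decoded as the homopolymers CC and GGG.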
  • sequencing is performed using Oxford Nanopore technologies. Additional details for the Oxford Nanopore technology are described in Jain, M., et al. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol 17, 239 (2016), which is incorporated by reference in its entirety.
  • sequencing is performed using PacBio technologies. Additional details for PacBio sequencing are described in Rhoads, A. et al, PacBio Sequencing and its Applications. Genomics, Proteomics, & Bioinformatics, 13(5), (2015), 278-289, which is incorporated by reference in its entirety.
  • sequencing reads obtained from the NGS methods can be filtered by quality and grouped by barcode sequence using any algorithms known in the art, e.g., the Python script barcodeCleanup.py.
  • a given sequencing read may be discarded if more than about 20% of its bases have a quality score (Q-score) less than Q20, indicating a base call accuracy of about 99%.
  • a given sequencing read may be discarded if more than about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of its bases have a Q-score less than Q10, Q20, Q30, Q40, Q50, Q60, or more, indicating a base call accuracy of about 90%, about 99%, about 99.9%, about 99.99%, about 99.999%, about 99.9999%, or more, respectively.
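The quality filter described above can be sketched as follows. The relationship between Q-score and accuracy is the standard Phred definition, accuracy = 1 - 10^(-Q/10). The function names and default thresholds are illustrative choices matching the Q20 / 20% example; this is not the barcodeCleanup.py script itself.

```python
def base_call_accuracy(q_score):
    """Base call accuracy implied by a Phred quality score."""
    return 1 - 10 ** (-q_score / 10)


def keep_read(quality_scores, min_q=20, max_low_fraction=0.20):
    """Return False if more than `max_low_fraction` of bases fall below `min_q`."""
    if not quality_scores:
        return False  # an empty read carries no usable information
    low_quality = sum(1 for q in quality_scores if q < min_q)
    return low_quality / len(quality_scores) <= max_low_fraction


passing = keep_read([30] * 8 + [10] * 2)  # exactly 20% of bases below Q20
failing = keep_read([30] * 7 + [10] * 3)  # 30% of bases below Q20
```

A read with exactly 20% low-quality bases is kept, since the criterion discards a read only when more than 20% of its bases fall below the threshold.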
  • all sequencing reads associated with a barcode containing fewer than 50 reads may be discarded to ensure that all barcode groups, representing single cells, contain a sufficient number of high-quality reads.
  • all sequencing reads associated with a barcode containing fewer than 30, fewer than 40, fewer than 50, fewer than 60, fewer than 70, fewer than 80, fewer than 90, or fewer than 100 or more reads may be discarded to ensure the quality of the barcode groups representing single cells.
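Discarding barcode groups with too few reads can be sketched as a simple filter over reads already grouped by barcode. The dictionary layout (barcode mapped to a list of reads) and the function name are assumptions made for illustration.

```python
def filter_barcode_groups(groups, min_reads=50):
    """Keep only barcode groups that contain at least `min_reads` reads.

    `groups` maps a cell barcode to the list of reads carrying it.
    """
    return {
        barcode: reads
        for barcode, reads in groups.items()
        if len(reads) >= min_reads
    }


groups = {
    "AAAA": ["read"] * 120,  # well-covered cell
    "CCCC": ["read"] * 50,   # exactly at the threshold, kept
    "GGGG": ["read"] * 12,   # too few reads, discarded
}
kept = filter_barcode_groups(groups, min_reads=50)
```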
  • Sequence reads with common barcode sequences may be aligned to a reference genome using known methods in the art to determine alignment position information.
  • the alignment position information may indicate a beginning position and an end position of a region in the reference genome that corresponds to a beginning nucleotide base and end nucleotide base of a given sequence read.
  • a region in the reference genome may be associated with a target gene or a segment of a gene.
  • Example aligner algorithms include BWA, Bowtie, Spliced Transcripts Alignment to a Reference (STAR), Tophat, or HISAT2. Further details for aligning sequence reads to reference sequences are described in US Application No.
  • an output file having SAM (sequence alignment map) format or BAM (binary alignment map) format may be generated and output for subsequent analysis.
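The beginning and end positions of an aligned read can be derived from a SAM record as sketched below. The sketch relies on the SAM conventions that POS (the fourth field) is 1-based and that only the CIGAR operations M, D, N, = and X consume reference bases. The function name and the example record are hypothetical, and unmapped reads (CIGAR "*") are not handled.

```python
import re

CIGAR_OPERATION = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")  # operations that advance along the reference


def alignment_span(sam_line):
    """Return (reference name, 1-based start, 1-based inclusive end) of a mapped read."""
    fields = sam_line.rstrip("\n").split("\t")
    reference_name = fields[2]
    start = int(fields[3])
    cigar = fields[5]
    reference_length = sum(
        int(length)
        for length, operation in CIGAR_OPERATION.findall(cigar)
        if operation in REFERENCE_CONSUMING
    )
    return reference_name, start, start + reference_length - 1


# Hypothetical record: 41 read bases, of which 37 reference positions are covered.
sam_line = "\t".join(
    ["read1", "0", "chr1", "100", "60", "5S20M2D10M1I5M", "*", "0", "0", "A" * 41, "I" * 41]
)
span = alignment_span(sam_line)
```

The soft-clipped bases (5S) and the insertion (1I) are present in the read but do not extend the aligned region, whereas the deletion (2D) does.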
  • Embodiments disclosed herein refer to a process for analyzing analytes of a cell through a single-cell workflow using molecular tags.
  • the embodiments described herein, including the embodiments described below in reference to FIGs. 4A-4G, 5A-5B, 6A-6B, 7A-7C, 8A-8C, 9A-9B, 10, and 11A-11B, refer to the incorporation of molecular tags in any one of the 1) encapsulation stage, 2) barcoding stage, or 3) bulk stage.
  • the steps that occur during the encapsulation stage refer to at least cell encapsulation 160 and analyte release 165 shown in FIG. 1B.
  • the steps that occur during the barcoding stage occur during cell barcoding 170 and target amplification 175 shown in FIG. 1B.
  • the steps that occur during the bulk stage occur after target amplification 175 shown in FIG. 1B after the amplicons from different cells have been pooled.
  • Methods disclosed herein involve incorporating molecular tags using digestible primers, such as digestible ribonucleotides or digestible uracils.
  • removal of digestible primers ensures that such primers are not present during downstream reactions (e.g., during a cell barcoding step 170 as shown in FIG. 1B).
  • the presence of such primers during downstream reactions (e.g., during nucleic acid amplification) would result in their participation in the downstream reactions, thereby resulting in multiple molecular tags per unique molecule.
  • by using digestible primers, a single molecular tag is incorporated per unique molecule.
  • FIG. 4A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using digestible ribonucleotides.
  • FIG. 4A shows the processing of RNA and DNA during the encapsulation stage.
  • a digestible oligonucleotide comprising a reverse primer (rev primer), a molecular tag, and a seq8F is provided to the RNA.
  • the reverse primer, molecular tag, and seq8F each include one or more ribonucleotides, which ensures that the oligonucleotide can be subsequently digested.
  • the reverse primer hybridizes with a portion of the RNA and reverse transcriptase performs reverse transcription.
  • this generates a cDNA that includes the oligonucleotide (e.g., reverse primer, molecular tag, and seq8F).
  • genomic DNA (gDNA) is released using a protease, such as proteinase K.
  • FIG. 4B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4A.
  • FIG. 4B shows the steps performed during the barcoding stage.
  • a forward primer hybridizes with a portion of the cDNA.
  • the forward primer may be a gene specific primer.
  • the strand is extended along the forward primer.
  • a forward and reverse primer pair hybridize with portions of the gDNA.
  • both the forward primer and reverse primer may be gene specific primers.
  • Nucleic acid extension (e.g., using DNA polymerase) occurs.
  • a corresponding cDNA strand is now generated.
  • the corresponding cDNA strand includes a complementary sequence of the reverse primer, molecular tag, and seq8F.
  • the complementary sequence of the reverse primer, molecular tag, and seq8F do not include ribonucleotides.
  • an RNase, such as RNase H, is provided to digest the digestible oligonucleotide including the reverse primer, molecular tag, and seq8F that include ribonucleotides.
  • this prevents the digestible oligonucleotide from participating in subsequent nucleic acid amplification reactions.
  • it removes the original molecular tag including ribonucleotides that was first introduced.
  • cell barcodes are incorporated into the amplicons.
  • cell barcodes are incorporated by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region is complementary to another constant region on the cDNA amplicon.
  • extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode.
  • the subsequent amplicon additionally includes the molecular tag.
  • cell barcodes are incorporated into amplicons deriving from genomic DNA.
  • An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle.
  • FIG. 4C depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using digestible uracils. Specifically, FIG. 4C shows the processing of RNA and DNA during the encapsulation stage.
  • a digestible oligonucleotide comprising a reverse primer (rev primer), a molecular tag, and a seq8F is provided to the RNA.
  • the reverse primer, molecular tag, and seq8F each include one or more uracil bases, which ensures that the digestible oligonucleotide can be subsequently digested.
  • the reverse primer hybridizes with a portion of the RNA and reverse transcriptase performs reverse transcription.
  • a forward primer is provided to generate a further cDNA strand.
  • the forward primer hybridizes with a portion of the cDNA, and extension occurs along the cDNA.
  • the further cDNA strand includes a sequence complementary to the reverse primer, molecular tag, and seq8F.
  • the sequence complementary to the reverse primer, molecular tag, and seq8F do not include uracils.
  • genomic DNA (gDNA) is released using a protease, such as proteinase K.
  • FIG. 4D depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4C.
  • FIG. 4D shows the steps performed during the barcoding stage.
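  • uracil-DNA glycosylase (UDG) is provided to digest the digestible oligonucleotide including the reverse primer, molecular tag, and seq8F that include uracils.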
  • a forward and reverse primer pair hybridize with portions of the gDNA.
  • both the forward primer and reverse primer may be gene specific primers.
  • Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
  • cell barcodes are incorporated into the amplicons.
  • Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region is complementary to another constant region on the cDNA amplicon.
  • extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode.
  • the subsequent amplicon additionally includes the molecular tag.
  • cell barcodes are incorporated into amplicons deriving from genomic DNA.
  • An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle.
  • the primers included in the encapsulation step (e.g., shown in FIG. 4A and FIG. 4C) and the barcoding step (shown in FIG. 4B and FIG. 4D) can be differently designed such that the barcode can be differently incorporated into the amplicons derived from RNA.
  • the reverse primers added in the encapsulation step include a seq8F constant region.
  • the seq8F constant region enables the incorporation of the cell barcode such that both the molecular tag and the cell barcode are on the same end of the amplicon derived from RNA.
  • the seq8F constant region can be introduced in the barcoding step via a forward primer that hybridizes with the cDNA.
  • the seq8F region enables the incorporation of the cell barcode such that the molecular tag and the cell barcode are on opposite ends of the amplicon derived from RNA.
  • FIGs. 4E-4G depict the processing of RNA and gDNA in a droplet and the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with a third embodiment using digestible primers.
  • FIG. 4E depicts the processing of RNA and gDNA in an encapsulation droplet (e.g., at steps 160 and 165 shown in FIG. 1B).
  • the top panel shows the processing of RNA
  • the bottom panel shows the processing of DNA.
  • a digestible primer (e.g., with digestible ribonucleotides or uracils) is provided to the RNA.
  • the digestible primer includes a reverse primer, a molecular tag (Ml) and a 32902 sequence.
  • the reverse primer and 32902 sequence may include digestible ribonucleotides or uracils.
  • the RNA-cDNA hybrid is exposed to RNaseH, which randomly nicks the RNA.
  • the still-hybridized RNA strand serves as a primer for generating a second strand of cDNA.
  • a second strand of cDNA is generated by DNA polymerase using the first cDNA strand as a template.
  • Referring to the DNA, it is released from chromatin packaging by being exposed to a protease, such as proteinase K. Thus, as shown in FIG. 4E, double stranded genomic DNA is released into the droplet.
  • FIG. 4F shows the amplification and barcoding of nucleic acids derived from RNA.
  • the steps shown in FIG. 4F occur in a second droplet after the steps in FIG. 4E occur in a first droplet.
  • the digestible primer sequences of the first cDNA strand are digested.
  • where the digestible primer sequences include uracils, uracil-DNA glycosylase (UDG) is provided to digest the sequences.
  • where the digestible primer sequences include ribonucleotides, RNaseH is provided to digest the sequences.
  • a cell barcode is incorporated into the amplicon derived from the RNA molecule.
  • a forward primer and reverse primer (e.g., 32902 reverse primer) are provided.
  • the forward primer can include a constant sequence (Seq8F sequence) which is useful in subsequent amplification rounds for incorporating the cell barcode.
  • the amplicon now includes a Seq8F sequence.
  • a primer sequence including a complementary constant region (Seq8F), cell barcode (CBC), and a read sequence is provided.
  • the complementary constant region hybridizes with the Seq8F constant region, and therefore, through a subsequent amplification cycle, the cell barcode is incorporated into an amplicon.
  • FIG. 4F shows the optional step of a streptavidin bead pull down to obtain the amplicon sequence of the following format: Read 1 - cell barcode - seq8F - forward primer - cDNA sequence - reverse primer - molecular tag - 32902 sequence. Subsequent PCR cycles can take place for building the library.
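Given an amplicon in the format listed above, the cell barcode and molecular tag can be located relative to the constant regions, as in the sketch below. All sequences, the tag length, and the function name are hypothetical placeholders, since the actual Read 1, seq8F, and 32902 sequences are not given here. The sketch also assumes an error-free read in a single orientation.

```python
def parse_amplicon(read, read1, seq8f, seq32902, tag_length):
    """Split a read laid out as:
    Read 1 - cell barcode - seq8F - (forward primer, cDNA, reverse primer)
    - molecular tag - 32902 sequence.
    Returns None if the constant regions are not found where expected.
    """
    if not read.startswith(read1):
        return None
    seq8f_start = read.find(seq8f, len(read1))
    tail_start = read.rfind(seq32902)
    if seq8f_start == -1 or tail_start == -1:
        return None
    tag_start = tail_start - tag_length
    insert_start = seq8f_start + len(seq8f)
    if tag_start < insert_start:
        return None
    return {
        "cell_barcode": read[len(read1):seq8f_start],
        "insert": read[insert_start:tag_start],  # primers plus cDNA sequence
        "molecular_tag": read[tag_start:tail_start],
    }


# Placeholder constant regions (hypothetical, for illustration only).
READ1, SEQ8F, SEQ32902 = "AAGG", "CTCTCT", "GAGTGA"
read = READ1 + "TTAC" + SEQ8F + "ACGTACGTAC" + "CCAT" + SEQ32902
parsed = parse_amplicon(read, READ1, SEQ8F, SEQ32902, tag_length=4)
```

Anchoring on the constant regions rather than on fixed offsets lets the cell barcode length vary, while the molecular tag is taken as the fixed-length stretch immediately before the 32902 sequence.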
  • FIG. 4G shows the amplification and barcoding of nucleic acids derived from DNA.
  • the forward and reverse primers may be gene specific primers that target specific sequences of the genomic DNA.
  • Such forward and reverse primers may anneal with the DNA strand at a particular annealing temperature (e.g., between 50°C and 70°C, and preferably about 61°C).
  • the forward primer may include a constant region (Seq8F) which is subsequently incorporated into the amplicon following the amplification cycle.
  • a primer sequence including a complementary constant region (Seq8F), cell barcode (CBC), and a read sequence (Read 1) is provided.
  • the primer sequence may anneal with the constant region (Seq8F) at a second annealing temperature that is lower than the annealing temperature described above in relation to the forward and reverse primers.
  • the second annealing temperature is between 40°C and 60°C, and preferably about 51°C. This enables incorporation of the cell barcode into amplicons following a subsequent cycle of amplification.
  • Methods disclosed herein involve incorporating molecular tags using template switching oligonucleotides.
  • such methods can be advantageous to incorporate molecular barcodes into full length transcripts.
  • subsequent downstream analysis (e.g., sequencing) can capture information of the full length transcripts as opposed to only a portion of the transcripts. This can be particularly valuable for applications that focus on quantifying full length molecules based on presence of molecular barcodes on an end of the molecules (e.g., on the 5’ end).
  • FIG. 5A depicts the processing of RNA in a droplet, in accordance with an embodiment using template switching oligonucleotides. Specifically, FIG. 5A shows the processing of RNA during the encapsulation stage. Although not shown in FIG. 5A, the processing of DNA can occur in parallel using methods disclosed herein.
  • the top panel of FIG. 5A shows the introduction of an oligonucleotide including a reverse primer and seq8F.
  • the reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer and seq8F.
  • the reverse transcription process generates a 3’ dC extension on the cDNA.
  • a template switching oligonucleotide (TSO) is introduced.
  • the TSO includes a template switching (TS) sequence, a molecular tag, and a sequence that enables hybridization with the cDNA molecule.
  • the sequence that enables hybridization with the cDNA molecule is a repeating guanine unit that hybridizes with the 3’ dC extension of the cDNA.
  • the repeating guanine unit is a rGrGrG sequence.
  • the TSO causes template switching and therefore, extension further occurs along the cDNA molecule beginning at the 3’ dC extension.
  • a complementary sequence of the molecular tag is incorporated into the cDNA.
  • FIG. 5B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 5A.
  • FIG. 5B shows the steps performed during the barcoding stage.
  • a forward primer is hybridized to the cDNA.
  • the forward primer hybridizes with the TS sequence and nucleic acid extension proceeds to generate a complementary cDNA that incorporates the molecular tag.
  • a cell barcode (CBC) is incorporated.
  • Cell barcodes are incorporated by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region is complementary to another constant region on the cDNA amplicon.
  • extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode.
  • the subsequent amplicon additionally includes the molecular tag.
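  • Once amplicons carry both a cell barcode and a molecular tag, original molecules can be counted by collapsing reads that share both. The following is a minimal Python sketch of that counting step, not the source's analysis pipeline. The record layout, the function name, and the example values are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_molecules(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], int]:
    """Count distinct molecular tags per (cell barcode, target).

    Each record is (cell_barcode, target, molecular_tag). Reads sharing all
    three values are treated as copies of one original molecule.
    """
    tags: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell_barcode, target, molecular_tag in records:
        tags[(cell_barcode, target)].add(molecular_tag)
    return {key: len(tag_set) for key, tag_set in tags.items()}


# Hypothetical reads for illustration only.
reads = [
    ("CELL_A", "GENE1", "AACG"),
    ("CELL_A", "GENE1", "AACG"),  # amplification duplicate of the read above
    ("CELL_A", "GENE1", "TTGC"),
    ("CELL_B", "GENE1", "AACG"),  # same tag in another cell: separate molecule
]
counts = count_molecules(reads)
```

  • This sketch ignores sequencing errors in the tags; a practical pipeline would likely need some form of error tolerance, which is outside what the source describes here.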
  • Methods disclosed herein involve incorporating molecular tags using gene specific primers.
  • such methods include providing gene specific primers with molecular tags independent of reverse transcription. This avoids reaction inefficiencies arising from undesired interactions between reagents for reverse transcription (e.g., reverse transcription primers and enzymes) and reagents for nucleic acid amplification (e.g., gene specific primers with molecular tags and enzymes). Additionally, higher temperatures can be applied during the extension step of nucleic acid amplification (in which the molecular tags are incorporated) than during reverse transcription. Therefore, fewer byproducts arising from interactions between the molecular tags and other primers are generated. Furthermore, by separating reverse transcription and nucleic acid amplification, the number of nucleic acid amplification cycles can be limited, which minimizes amplification bias.
  • FIG. 6A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using gene specific primers comprising molecular tags.
  • FIG. 6A shows the processing of RNA and gDNA during the encapsulation stage.
  • the top panel of FIG. 6A shows the introduction of an oligonucleotide including a reverse primer.
  • the reverse primer may be a digestible primer that includes uracils.
  • the reverse primer may be a digestible primer that includes ribonucleotides.
  • the reverse primer is not a digestible primer.
  • the reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer.
  • genomic DNA (gDNA) is released using a protease, such as proteinase K.
  • FIG. 6B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 6A.
  • FIG. 6B shows the steps performed during the barcoding stage.
  • uracil-DNA glycosylase (UDG) can be provided to digest a digestible reverse primer that includes uracils.
  • an oligonucleotide including a forward primer, a molecular tag, and seq8F (constant region) is introduced.
  • the forward primer is a gene specific primer.
  • the forward primer hybridizes with a specific region of the cDNA.
  • Nucleic acid extension results in the generation of a complementary cDNA amplicon that incorporates the molecular tag.
  • a forward and reverse primer pair hybridize with portions of the gDNA.
  • both the forward primer and reverse primer may be gene specific primers.
  • Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
  • a second reverse primer (different from reverse primer used for initiating reverse transcription) is introduced, which hybridizes with the cDNA. This enables nucleic acid extension along the cDNA to further generate a cDNA amplicon that also includes the molecular tag.
  • cell barcodes are incorporated into the amplicons.
  • Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon.
  • extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode.
  • the subsequent amplicon additionally includes the molecular tag.
  • cell barcodes are incorporated into amplicons deriving from genomic DNA.
  • An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle.
  • Methods disclosed herein involve incorporating molecular tags in bulk following single cell encapsulation, lysis, and barcoding.
  • introducing molecular tags in bulk enables a broader range of experimental conditions (e.g., temperatures, buffer conditions, concentrations of molecular tags and reagents) than if the molecular tags are introduced within droplets.
  • performing bulk molecular tagging may be able to achieve higher efficiency in comparison to molecular tagging in droplets.
  • FIG. 7A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment in which molecular tags are introduced in bulk.
  • FIG. 7A shows the processing of RNA and gDNA during the encapsulation stage.
  • the top panel of FIG. 7A shows the introduction of an oligonucleotide including a reverse primer.
  • the reverse primer may be a digestible primer that includes uracils.
  • the reverse primer may be a digestible primer that includes ribonucleotides.
  • Although FIG. 7A shows the reverse primer as a digestible primer, in various embodiments, the reverse primer is not digestible. In such embodiments, the reverse primer remains present throughout the subsequent steps.
  • the reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer. Additionally, genomic DNA (gDNA) is released using a protease, such as proteinase K.
  • FIG. 7B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 7A.
  • FIG. 7B shows the steps performed during the barcoding stage.
  • uracil-DNA glycosylase (UDG) or RNaseH can be provided to digest a digestible reverse primer.
  • an oligonucleotide including a forward primer and seq8F (constant region) is introduced.
  • the forward primer is a gene specific primer.
  • the forward primer hybridizes with a specific region of the cDNA. Nucleic acid extension results in the generation of a complementary cDNA amplicon that incorporates the forward primer.
  • a forward and reverse primer pair hybridize with portions of the gDNA.
  • both the forward primer and reverse primer may be gene specific primers.
  • Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
  • a second reverse primer (different from reverse primer used for initiating reverse transcription) is introduced, which hybridizes with the cDNA. This enables nucleic acid extension along the cDNA to further generate a cDNA amplicon.
  • cell barcodes are incorporated into the amplicons.
  • Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon.
  • cell barcodes are incorporated into amplicons deriving from genomic DNA.
  • An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle.
  • FIG. 7C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 7A and 7B.
  • an oligonucleotide is introduced, the oligonucleotide including the molecular tag.
  • the oligonucleotide further includes a sequence (“32092”) that hybridizes with a corresponding sequence (“32092”) of the cDNA amplicon. Nucleic acid extension occurs in bulk, which generates a complementary cDNA amplicon that incorporates the molecular tag.
  • each original molecule can only make one strand to which the 32092-tag-const can hybridize.
  • a forward primer extends in a 1st cycle of PCR during barcoding.
  • the reverse primer - 32092 shown in FIG. 7B can extend in the 2nd cycle if the forward primer is extended. Therefore, only a single molecule from each RNA molecule would have a 32092-tag-const priming site.
  • Methods disclosed herein involve incorporating molecular tags and gene tags in sequencing libraries (e.g., DNA or RNA sequencing libraries).
  • the final nucleic acids of the sequencing libraries do not contain sequences corresponding to sequences of the original genomic DNA or RNA transcripts.
  • the information can be replaced and/or maintained by the presence of a short gene tag in the final amplicon.
  • the original DNA or RNA sequences need not be amplified and/or sequenced.
  • PCR bias is minimized.
  • shorter amplicons are more efficiently amplified and more cheaply sequenced in comparison to longer amplicons including the original DNA or RNA sequence.
  • FIG. 8A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating molecular tags and gene tags. Specifically, FIG. 8A shows the processing of RNA during the encapsulation stage. Although not shown in FIG. 8A, the processing of DNA can occur in parallel using methods disclosed herein.
  • the top panel of FIG. 8A shows the introduction of an oligonucleotide including a reverse primer.
  • the reverse primer may be a digestible primer that includes uracils.
  • the reverse primer may be a digestible primer that includes ribonucleotides.
  • Although FIG. 8A shows the reverse primer as a digestible primer, in various embodiments, the reverse primer is not digestible. In such embodiments, the reverse primer remains present throughout the subsequent steps.
  • the reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer.
  • the oligonucleotide further includes a handle.
  • the handle includes a gene tag (e.g., a gene specific amplicon tag), which is a nucleotide sequence that identifies the specific gene that is targeted by the reverse primer.
  • oligonucleotides may include the same reverse primer and same gene tag, given that the design of the reverse primer controls the targeting of the corresponding sequence of the RNA.
  • a gene tag may include about 6 to about 20 nucleotides. In some embodiments, a gene tag includes 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides. In some embodiments, a gene tag is at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 nucleotides in length. In various embodiments, every gene tag is different from every other gene tag (e.g., each gene tag is unique). In various embodiments, a gene tag is randomly generated. In such embodiments, gene tags may not be unique. For example, a first randomly generated gene tag may have the same sequence as a second randomly generated gene tag.
  • the forward primer may be a digestible primer (e.g., includes one or more ribonucleotides or uracils). In various embodiments, the forward primer is a gene specific primer.
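  • The distinction drawn above between unique gene tags and randomly generated gene tags (which may collide) can be sketched in Python. This is an illustration under stated assumptions, not the source's design procedure. The function names, the rejection-sampling approach, and the choice of 50 tags of length 6 are hypothetical.

```python
import random
from typing import List

BASES = "ACGT"


def random_gene_tags(n_tags: int, length: int, rng: random.Random) -> List[str]:
    """Draw tags independently at random; duplicate tags are possible."""
    return [
        "".join(rng.choice(BASES) for _ in range(length)) for _ in range(n_tags)
    ]


def unique_gene_tags(n_tags: int, length: int, rng: random.Random) -> List[str]:
    """Draw tags at random but reject repeats, so every tag is distinct."""
    if n_tags > len(BASES) ** length:
        raise ValueError("not enough distinct sequences of this length")
    seen = set()
    tags: List[str] = []
    while len(tags) < n_tags:
        tag = "".join(rng.choice(BASES) for _ in range(length))
        if tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tags = unique_gene_tags(n_tags=50, length=6, rng=rng)
maybe_repeated = random_gene_tags(n_tags=50, length=6, rng=rng)
```

  • A practical design would probably also screen tags for properties such as homopolymer runs or similarity to primer sequences; those checks are not described in the source and are omitted here.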
  • FIG. 8B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 8A.
  • an RNase (e.g., RNase A or RNase H) can be introduced.
  • RNase H is introduced in the barcoding step, where it digests the RNA portions including the original reverse primer, handle, and forward primer.
  • cell barcodes are incorporated into the amplicons.
  • the nucleic acid amplification can involve linear amplification.
  • Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon.
  • each cDNA amplicon further includes the handle, which also includes the gene tag.
  • In FIG. 8B, cell barcodes and gene tags are incorporated into the cDNA amplicons, but no molecular tags have yet been incorporated.
  • FIG. 8C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 8A and 8B.
  • a first step in bulk may involve performing a pulldown of RNA.
  • a pulldown can be achieved via a 32092 biotin pulldown, where the biotin is conjugated to a bead, such as a streptavidin bead.
  • a nucleic acid sequence attached to the biotin-streptavidin conjugate is complementary to the 32092 sequence of the cDNA amplicon, thereby enabling the pulldown.
  • an oligonucleotide including the molecular tag (“MolTag”) is introduced.
  • the oligonucleotide further includes a constant region that is complementary to a corresponding constant region of the cDNA amplicon.
  • the constant region of the cDNA amplicon may be included in the handle, which was previously incorporated during reverse transcription.
  • nucleic acid extension beginning at the constant region results in the generation of a complementary strand that excludes the original gene sequence.
  • the extension occurs on the second cDNA strand in a 5’ to 3’ direction.
  • the resulting amplicon for sequencing includes the molecular tag, gene tag, and cell barcode, but does not include the original gene sequence.
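  • Because the final amplicon carries a gene tag instead of the original gene sequence, assigning a read to a gene reduces to a lookup on the gene tag. The following is a minimal Python sketch of that lookup, assuming the three fields have already been extracted from each read. The record type, the function name, and the tag-to-gene table are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ShortAmplicon(NamedTuple):
    cell_barcode: str
    gene_tag: str
    molecular_tag: str


def assign_genes(
    amplicons: Iterable[ShortAmplicon], tag_to_gene: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (cell_barcode, gene, molecular_tag) for amplicons with a known tag.

    Amplicons whose gene tag is not in the table are skipped.
    """
    assigned = []
    for amp in amplicons:
        gene = tag_to_gene.get(amp.gene_tag)
        if gene is not None:
            assigned.append((amp.cell_barcode, gene, amp.molecular_tag))
    return assigned


# Hypothetical gene tag table and reads for illustration only.
tag_to_gene = {"ACGTAC": "GENE1", "GGGTTT": "GENE2"}
amplicons = [
    ShortAmplicon("CELL_A", "ACGTAC", "TTAG"),
    ShortAmplicon("CELL_A", "GGGTTT", "CCAT"),
    ShortAmplicon("CELL_B", "TTTTTT", "AAAA"),  # unknown gene tag, skipped
]
assigned = assign_genes(amplicons, tag_to_gene)
```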
  • Methods disclosed herein involve incorporating molecular tags through conversion of universal bases.
  • universal bases can be introduced via a primer.
  • a polymerase would create the molecular tag as a complementary sequence to the universal bases.
  • This process can achieve low variability across runs (e.g., arising from different ratios of randomers incorporated during synthesis).
  • different runs may include different ratios of nucleotide bases (e.g., different ratios of A, C, G, T), leading to lot-to-lot variability.
  • each run may have bias depending on the ratio of nucleotide bases added in the reagents or reaction mixture.
  • the ratio of universal bases is dependent on polymerase bias and mix of nucleotides (provided in excess). Therefore, the known bias can be accounted for, resulting in more accurate results.
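  • One way a known base bias could be accounted for is to estimate how often two independently generated molecular tags are expected to match by chance. The following Python sketch assumes, purely for illustration, that each tag position is filled independently with fixed base frequencies; this model, the function name, and the example frequencies are not taken from the source.

```python
import math
from typing import Mapping


def tag_match_probability(base_freqs: Mapping[str, float], tag_length: int) -> float:
    """Probability that two independently generated tags are identical.

    Under the independence assumption, two tags agree at one position with
    probability sum(p_b ** 2), and at all positions with that value raised
    to the tag length.
    """
    total = sum(base_freqs.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("base frequencies must sum to 1")
    per_position = sum(p * p for p in base_freqs.values())
    return per_position ** tag_length


# Hypothetical frequencies: an unbiased mix and a mix skewed toward A and T.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
biased = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}

p_uniform = tag_match_probability(uniform, tag_length=8)
p_biased = tag_match_probability(biased, tag_length=8)
```

  • Under this model a biased mix always gives a higher chance match rate than a uniform mix of the same tag length, which is one reason a known, stable bias is easier to correct for than lot-to-lot variation.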
  • FIG. 9A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment incorporating universal bases.
  • FIG. 9A shows the processing of RNA and gDNA during the encapsulation stage.
  • the top panel of FIG. 9A shows the introduction of an oligonucleotide including a reverse primer.
  • the oligonucleotide can further include one or more universal bases, examples of which include any of inosine, 2’-deoxyinosine, 5-nitroindole, 5' 5-nitroindole, or 3-nitropyrrole.
  • the reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer and the universal bases.
  • a forward primer is introduced that hybridizes to the cDNA.
  • Nucleic acid extension is performed (e.g., via DNA polymerase) beginning at the forward primer to generate a second cDNA that includes a sequence that is complementary to the reverse primer and the universal bases.
  • the universal bases are characterized in that they are complementary to any of the four natural DNA nucleotides (adenine, thymine, guanine, and cytosine).
  • the sequence complementary to the universal base represents a molecular tag that can be generally different from other molecular tags of other cDNA molecules.
  • the molecular tag of the cDNA is generated by a DNA polymerase as a result of the nucleic acid extension reaction.
  • genomic DNA (gDNA) is released using a protease, such as proteinase K.
  • FIG. 9B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 9A.
  • FIG. 9B shows the steps performed during the barcoding stage.
  • a forward and reverse primer pair hybridize with portions of the gDNA.
  • both the forward primer and reverse primer may be gene specific primers.
  • Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
  • a corresponding cDNA strand is now generated.
  • the corresponding cDNA strand includes a complementary sequence of the reverse primer, molecular tag, and seq8F.
  • cell barcodes are incorporated into the amplicons.
  • cell barcodes are incorporated by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC).
  • the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon.
  • extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode.
  • the subsequent amplicon additionally includes the molecular tag.
  • cell barcodes are incorporated into amplicons deriving from genomic DNA.
  • An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region (seq8F) on the genomic DNA that was introduced through a forward primer in a previous PCR cycle.
  • extension of the nucleic acid along the amplicon derived from gDNA results in a subsequent amplicon that incorporates the cell barcode.
  • Methods disclosed herein involve using differential cleaving of RNA transcripts as a means of distinguishing amplicons.
  • using cleavage sites as molecular barcodes eliminates the use of primer sequences for introducing molecular tags. Therefore, this methodology avoids the need to design particular primers, synthesize the primers, and incorporate such primers with nucleic acid barcode sequences.
  • this differential cleavage methodology increases amplification efficiency due to template length. For example, increased amplification efficiency can be due to reduced inhibitory effects of secondary structures associated with long templates. Altogether, this may be a more efficient (e.g., time-efficient and cost-efficient) process for distinguishing amplicons.
  • FIG. 10 depicts the processing of RNA in a droplet, in accordance with an embodiment involving differentially cleaving RNA. Specifically, FIG. 10 shows the processing of RNA during the encapsulation stage. Although not explicitly shown in FIG. 10, gDNA can also be processed in parallel.
  • Truncation oligonucleotides are introduced, which hybridize to regions of the single stranded RNA.
  • the truncation oligonucleotides are DNA oligonucleotides.
  • an RNase (e.g., RNase H) is introduced, which cleaves the RNA-DNA duplex.
  • the RNAse will differentially cleave the RNA at different start and/or stop sites.
  • RNA analytes can be differentially cleaved, thereby resulting in RNA analytes with differing start and/or stop sites.
  • differential cleavage of RNA by the RNase results in RNA with different start and/or stop sites.
  • a reverse primer can hybridize with the RNA and a complementary cDNA is generated. Given the different sequences of RNA analytes corresponding to different start and stop sites, the cDNA will similarly have different sequences corresponding to different start and stop sites.
  • the different sequences corresponding to different start and stop sites can be propagated during subsequent nucleic acid amplification, thereby generating amplicons of different sequences.
  • the different sequences of the amplicons corresponding to different start and stop sites represent molecular tags, which enable distinguishing amplicons that derive from different RNA analytes.
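  • Treating start and stop sites as molecular tags means that reads from the same cell and target with identical coordinates are collapsed into one fragment. The following is a minimal Python sketch of that grouping, assuming reads have already been aligned so that start and stop coordinates are available. The tuple layout, the function name, and the coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (cell_barcode, target, start, stop); coordinates are relative to the target.
Alignment = Tuple[str, str, int, int]


def fragments_per_target(
    alignments: Iterable[Alignment],
) -> Dict[Tuple[str, str], Set[Tuple[int, int]]]:
    """Collect the distinct (start, stop) pairs seen for each cell and target."""
    sites: Dict[Tuple[str, str], Set[Tuple[int, int]]] = defaultdict(set)
    for cell_barcode, target, start, stop in alignments:
        sites[(cell_barcode, target)].add((start, stop))
    return dict(sites)


# Hypothetical aligned reads for illustration only.
aligned_reads = [
    ("CELL_A", "GENE1", 100, 250),
    ("CELL_A", "GENE1", 100, 250),  # same cleavage sites: same original fragment
    ("CELL_A", "GENE1", 130, 290),  # different sites: different RNA analyte
    ("CELL_A", "GENE2", 15, 180),
]
fragments = fragments_per_target(aligned_reads)
```

  • The number of distinct coordinate pairs per cell and target then serves as the molecule count; two analytes cleaved at identical sites would be indistinguishable in this sketch.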
  • Methods disclosed herein involve incorporating alternate bases into amplicons as a means of distinguishing amplicons.
  • alternate bases can be incorporated during reverse transcription and/or nucleic acid amplification.
  • molecular tags can be created during real time amplification of template strands. The presence and location of alternate bases within amplicons can be informative for distinguishing amplicons.
  • This methodology eliminates the use of primer sequences for introducing molecular tags. Therefore, this methodology avoids the need to design particular primers, synthesize the primers, and incorporate such primers with nucleic acid barcode sequences. Additionally, as primers do not include an additional adaptor, this improves flexibility in targeting specific regions and further improves amplification efficiency as the resulting amplicon is shortened in length. Furthermore, the methodology remains compatible with nucleic acid amplification systems. Altogether, this may be a more efficient (e.g., time-efficient and cost-efficient) process for distinguishing amplicons.
  • FIG. 11A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating alternate bases. Specifically, FIG. 11A shows the processing of RNA during the encapsulation stage. Although not explicitly shown in FIG. 11A, gDNA can also be processed in parallel. The top panel shows the introduction of a reverse primer.
  • alternate bases can be introduced.
  • alternate bases are included in the reagents 120 (shown in FIG. 1) at a particular ratio relative to dNTPs.
  • alternate bases include universal bases.
  • alternate bases include bases other than deoxynucleotides (dNTPs) (e.g., other than deoxyadenosine 5’-triphosphate, deoxyguanosine 5’-triphosphate, deoxycytidine 5’-triphosphate, and deoxythymidine 5’-triphosphate).
  • a cDNA is generated which may include one or more alternate bases.
  • the cDNA includes alternate base 1 and alternate base 2.
  • a cDNA can include fewer or additional alternate bases.
  • the number of alternate bases introduced into the cDNA can be controlled based on the ratio of alternate bases that are included in the reagents 120 relative to dNTPs.
  • alternate bases are incorporated at random locations in the cDNA.
  • FIG. 11B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 11A.
  • FIG. 11B shows the steps performed during the barcoding stage.
  • one or more alternate bases can be introduced into amplicons based on ratio of alternate bases that are included in the reaction mixture 140 relative to dNTPs.
  • alternate base 3 can be introduced into a cDNA amplicon in an earlier cycle of PCR and alternate base 4 can be introduced into a cDNA amplicon in a later cycle of PCR.
  • alternate base 1 and alternate base 2 are propagated through the cycles of nucleic acid amplification. Therefore, the final cDNA amplicon can include alternate bases 1, 2, 3, and 4.
  • the cDNA amplicons including the alternate bases are subsequently sequenced.
  • the presence of the multiple alternate bases is detected in the sequences and can be correlated across the different amplicon sequences.
  • one or more sequence reads of amplicons with common alternate bases can be identified.
  • the common alternate bases indicate that the corresponding amplicons are derived from the same RNA analyte.
  • the sequence reads of amplicons including the common alternate bases can be assigned to a single RNA analyte of the cell.
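  • The assignment of reads with common alternate bases to a single RNA analyte can be sketched as a grouping problem. In the following Python sketch, each read is represented by the set of positions at which an alternate base was detected, and reads are grouped if they share at least one position, directly or through other reads. The representation, the function name, the union-find approach, and the restriction to reads from one cell and one target are assumptions for illustration, not the source's method.

```python
from typing import Dict, FrozenSet, List, Sequence


def group_by_shared_positions(reads: Sequence[FrozenSet[int]]) -> List[List[int]]:
    """Group read indices that share an alternate-base position (transitively)."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: Dict[int, int] = {}  # position -> first read index carrying it
    for idx, positions in enumerate(reads):
        for pos in positions:
            if pos in first_seen:
                parent[find(idx)] = find(first_seen[pos])
            else:
                first_seen[pos] = idx

    groups: Dict[int, List[int]] = {}
    for idx in range(len(reads)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical reads from one cell and one target. Positions 12 and 40 mimic
# bases introduced early (shared by all copies); 77 and 90 mimic later cycles.
example_reads = [
    frozenset({12, 40}),
    frozenset({12, 40, 77}),
    frozenset({5, 63}),
    frozenset({5, 63, 90}),
    frozenset(),  # no alternate base detected: stays in its own group
]
analyte_groups = group_by_shared_positions(example_reads)
```

  • Two different analytes that happen to receive an alternate base at the same position would be merged by this sketch, so the incorporation ratio would need to be low enough to make such coincidences rare.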
  • methods disclosed herein involve in situ processing of RNA of cells followed by single cell analysis.
  • the in situ processing can be performed while the cells are in a bulk (e.g., non-single cell) and therefore, a plurality of cells can be processed simultaneously.
  • wash steps to remove excess reagents can be performed in situ, whereas wash steps cannot be performed in a droplet.
  • excess molecular tags can be removed through a wash process. Removing excess reagents, including excess molecular tags, enables more accurate downstream quantification of molecules (e.g., RNA transcripts).
  • the in situ processing involves providing primer sequences that are capable of hybridizing with a corresponding sequence of RNA molecules in cells.
  • the primer sequences include reverse primers that can be used to initiate reverse transcription to generate complementary DNA (cDNA) molecules.
  • the primer sequences are not used to initiate reverse transcription.
  • primer sequences can undergo ligation to generate sequences for subsequent single cell analysis. Further examples of in situ reverse transcription and in situ ligation are described in further detail below.
  • in situ processing of cells can involve fixing and/or permeabilizing cells prior to providing the primer sequences.
  • fixing the cells involves providing a fixing agent to the cells.
  • the fixing agent can include any of methanol, acetone, a mixture of methanol and acetone (e.g., 1 : 1 mixture), paraformaldehyde (e.g., 1-10%), DSP (dithiobis(succinimidyl propionate)) (e.g., 0.1 mM - 10 mM), SPDP (succinimidyl 3-(2-pyridyldithio)propionate) (e.g., 0.1 mM - 10 mM), and a mixture of DSP and SPDP (mix of 0.1 mM - 10 mM each).
  • cells can be incubated with the fixative agent at -20°C, on ice, at 4°C, or at 10°C.
  • For the paraformaldehyde, DSP, SPDP, and DSP/SPDP methods, cells can be incubated with the fixative agent at 4°C, 10°C, 20°C, 25°C, room temperature, or 37°C. In various embodiments, the cells can be incubated with the fixative agent for at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 45 minutes, at least 1 hour, at least 2 hours, or at least 3 hours.
  • permeabilizing the cells involves exposing the cells to a permeabilizing agent.
  • Example permeabilizing agents include Tween-20 (e.g., 0.01-1%), Triton X-100 (e.g., 0.01-1%), or saponin (e.g., 0.01-1%).
  • the permeabilization step is optional and need not be performed.
  • Cell permeabilization may occur at any of the following incubation temperatures: on ice, 4°C, 10°C, 20°C, room temperature, 25°C, or 37°C. The incubation duration for cell permeabilization can be any of the following: 1 min, 3 min, 5 min, 10 min, 15 min, 20 min, or 30 min.
  • in situ processing of cells can involve washing the cells. Washing the cells involves exposing the cells to a cell wash buffer.
  • Example cell wash buffers include any of DPBS, FBS, or a combination of DPBS and FBS (e.g., DPBS + 0.5% FBS, DPBS + 1% FBS, DPBS + 2.5% FBS, or DPBS + 5% FBS).
  • the cells can undergo a centrifugation (e.g., at 300 x g or 400 x g) to remove the cell wash buffer.
  • cells are washed at any of the following temperatures: 4°C, 10°C, 15°C, 20°C, 25°C, or 37°C.
  • cells can be washed more than once. For example, cells can be washed 2 times, 3 times, 4 times, or 5 times.
  • cells are loaded into a single cell analysis system (e.g., Tapestri® platform). As described herein in further detail, cells can undergo cell encapsulation, analyte release, cell barcoding, and target amplification.
  • FIG. 12A depicts the in situ processing of RNA, in accordance with a first embodiment.
  • FIG. 12A depicts an in situ processing methodology that involves providing primer sequences that undergo ligation to generate sequences for subsequent single cell analysis. The primer sequences are not used to initiate reverse transcription.
  • FIG. 12A shows an RNA molecule from a cell, which includes an RNA sequence (labeled as “RNA” in FIG. 12A) and a poly-A tail.
  • the in situ processing involves providing a primer sequence, which includes a poly-T sequence, a molecular tag sequence (labeled as “MT” in FIG. 12A), and a PCR adaptor sequence.
  • the poly-T sequence of the primer sequence hybridizes with a sequence of the poly-A tail of the RNA molecule.
  • the in situ processing involves providing a second sequence, shown in FIG. 12A as a “gene specific” sequence.
  • the gene specific sequence is designed to hybridize with a corresponding sequence of the RNA sequence of the RNA molecule.
  • the gene specific sequence includes a phosphate group, such as a 5’ phosphate group.
  • a ligase (e.g., DNA ligase) is provided.
  • the ligase ligates the gene specific sequence by catalyzing the formation of a phosphodiester bond between the 5’ phosphate group of the gene specific sequence and the 3’-hydroxyl group of the poly-T sequence.
  • This now generates an intermediate nucleic acid that includes 1) the gene specific sequence, 2) the poly-T sequence, 3) the molecular tag (MT), and 4) the PCR adaptor.
  • the cell can be provided for single cell partitioning.
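The ligation step described for FIG. 12A can be sketched in Python as a minimal model. It rests on assumptions that are not stated in the source: the `Oligo` class, the `ligate` function, and all sequences are hypothetical, and the 5'-adaptor, tag, poly-T-3' layout of the primer is inferred from the statement that the poly-T sequence contributes the free 3'-hydroxyl group. The sketch ignores the requirement that both oligonucleotides hybridize at adjacent positions on the RNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oligo:
    """A single-stranded oligonucleotide written 5'->3' (hypothetical model)."""
    sequence: str
    five_prime_phosphate: bool = False


def ligate(upstream: Oligo, downstream: Oligo) -> Oligo:
    """Join the 3'-hydroxyl end of `upstream` to the 5'-phosphate end of `downstream`."""
    if not downstream.five_prime_phosphate:
        # Without a 5' phosphate there is no substrate for the phosphodiester bond.
        raise ValueError("downstream oligo lacks a 5' phosphate")
    return Oligo(upstream.sequence + downstream.sequence,
                 upstream.five_prime_phosphate)


# Placeholder sequences, chosen only for illustration.
pcr_adaptor = "GGCC"
molecular_tag = "ACGTAC"
poly_t = "T" * 8
primer = Oligo(pcr_adaptor + molecular_tag + poly_t)
gene_specific = Oligo("CATG", five_prime_phosphate=True)

intermediate = ligate(primer, gene_specific)
print(intermediate.sequence)
```

Read from its 3' end, the resulting string lists the gene specific sequence, the poly-T sequence, the molecular tag, and the PCR adaptor, which matches the order in which the intermediate nucleic acid is described above.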
  • FIG. 12B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 12A.
  • the in droplet processing shown in FIG. 12B refers to the steps that occur in a first droplet during analyte release 165 (see FIG. 1B) of a two step process.
  • FIG. 12B depicts the intermediate nucleic acid generated via the in situ processing, the intermediate nucleic acid including 1) the gene specific sequence, 2) the poly-T sequence, 3) the molecular tag (MT), and 4) the PCR adaptor.
  • an additional primer sequence is provided which enables nucleic acid amplification.
  • the additional primer sequence includes a gene specific sequence and a constant sequence (shown as “seq8” sequence in FIG. 12B).
  • the gene specific sequence of the additional primer sequence hybridizes with the gene specific sequence of the intermediate nucleic acid and serves to initiate nucleic acid amplification to generate amplicons.
  • the in-droplet processing can further include processing of genomic DNA (gDNA), in accordance with the methods described herein.
  • gDNA can be released by exposure to a protease (e.g., proteinase K), as is described in further detail in FIGs. 3A-3C.
  • the free gDNA can additionally undergo priming and nucleic acid amplification.
  • an additional cell barcoding step (e.g., a cell barcoding 170 step shown in FIG. 1B) is performed to incorporate a cell barcode.
  • a barcode bead including a plurality of cell barcodes can be provided in a reaction mixture in a droplet, such that the cell barcodes can be incorporated into the amplicons.
  • a cell barcode can be included in a primer sequence that includes a constant region sequence that is complementary to the constant sequence (shown as “seq8” sequence).
  • the cell barcode is incorporated into the amplicon.
  • the bottom panel of FIG. 12B shows an example final amplicon following single cell processing.
  • the amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) gene specific sequence, 4) poly-T sequence (or a complement such as a poly-A sequence), and 5) molecular tag.
  • the amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence.
  • the amplicon can undergo subsequent sequencing. Given the read sequences, the amplicon can be attributed to a particular cell via the cell barcode. Additionally and/or alternatively, the amplicon can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag.
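The attribution of a sequenced amplicon to a cell and to an RNA molecule can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the field order within the read, the field lengths, the `SEQ8` placeholder, the `GENE_PANEL` sequences, and the `parse_amplicon` function are hypothetical and are not taken from the source.

```python
# Hypothetical read layout, written 5'->3':
# [cell barcode][seq8][gene specific][poly-T or poly-A][molecular tag]
SEQ8 = "GTACTCGCAGTAGTC"   # placeholder for the constant region
CBC_LEN = 8                # assumed cell barcode length
MT_LEN = 6                 # assumed molecular tag length
GENE_PANEL = {             # assumed gene specific sequences
    "ACGTTGCAAGGCTG": "GENE_A",
    "TTGACCGTAGGCAC": "GENE_B",
}


def parse_amplicon(read):
    """Return (cell_barcode, gene, molecular_tag), or None if the read does not fit."""
    insert_start = CBC_LEN + len(SEQ8)
    # The constant region must sit directly after the cell barcode.
    if read[CBC_LEN:insert_start] != SEQ8:
        return None
    insert = read[insert_start:]
    for gene_seq, gene_name in GENE_PANEL.items():
        if insert.startswith(gene_seq) and len(insert) >= len(gene_seq) + MT_LEN:
            # The molecular tag is taken as the last MT_LEN bases of the read.
            return read[:CBC_LEN], gene_name, insert[-MT_LEN:]
    return None


read = "AAAACCCC" + SEQ8 + "ACGTTGCAAGGCTG" + "T" * 12 + "GATTAC"
print(parse_amplicon(read))  # ('AAAACCCC', 'GENE_A', 'GATTAC')
```

The cell barcode assigns the read to a cell, while the gene name together with the molecular tag identifies the originating RNA molecule within that cell.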
  • FIG. 13A depicts the in situ processing of RNA, in accordance with a second embodiment.
  • FIG. 13A depicts an in situ processing methodology that involves providing primer sequences that undergo ligation to generate sequences for subsequent single cell analysis. The primer sequences are not used to initiate reverse transcription.
  • FIG. 13A differs from FIG. 12A in that the provided primers can be gene specific primers that hybridize with a specific RNA sequence as opposed to a poly-A tail.
  • FIG. 13A shows an RNA molecule from a cell, which includes a first RNA sequence (labeled as “RNA sequence 1” in FIG. 13A), a second RNA sequence (labeled as “RNA sequence 2”) and a poly-A tail.
  • the first RNA sequence and second RNA sequence may be adjacent to one another.
  • the in situ processing involves providing a primer sequence, which includes a gene specific sequence (shown in FIG. 13A as “Gene specific 2”), a molecular tag sequence (labeled as “MT” in FIG. 13A), and a PCR adaptor sequence.
  • the gene specific sequence of the primer sequence hybridizes with an RNA specific sequence (e.g., RNA sequence 2) of the RNA molecule.
  • the in situ processing involves providing a second gene specific sequence, shown in FIG. 13A as a “gene specific 1” sequence.
  • the gene specific 1 sequence is designed to hybridize with a corresponding RNA sequence (e.g., RNA sequence 1) of the RNA molecule.
  • the gene specific 1 sequence includes a phosphate group, such as a 5’ phosphate group.
  • the two gene specific sequences undergo ligation.
  • a ligase (e.g., DNA ligase) ligates the gene specific sequences by catalyzing the formation of a phosphodiester bond between the 5’ phosphate group of the gene specific 1 sequence and the 3’-hydroxyl group of the gene specific 2 sequence.
  • This generates an intermediate nucleic acid that includes 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) the molecular tag (MT), and 4) the PCR adaptor.
  • the cell can be provided for single cell partitioning.
  • FIG. 13B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 13A.
  • the in droplet processing shown in FIG. 13B refers to the steps that occur in a first droplet during analyte release 165 (see FIG. 1B) of a two step process.
  • FIG. 13B depicts the intermediate nucleic acid generated via the in situ processing, the intermediate nucleic acid including 1) the gene specific sequence, 2) the poly-T sequence, 3) the molecular tag (MT), and 4) the PCR adaptor.
  • an additional primer sequence is provided which enables nucleic acid amplification.
  • the additional primer sequence includes a gene specific sequence and a constant sequence (shown as “seq8” sequence in FIG. 13B).
  • the gene specific sequence of the additional primer sequence hybridizes with the gene specific 1 sequence of the intermediate nucleic acid and serves to initiate nucleic acid amplification to generate amplicons.
  • the in-droplet processing can further include processing of genomic DNA (gDNA), in accordance with the methods described herein.
  • gDNA can be released by exposure to a protease (e.g., proteinase K), as is described in further detail in FIGs. 3A-3C.
  • the free gDNA can additionally undergo priming and nucleic acid amplification.
  • an additional cell barcoding step (e.g., a cell barcoding 170 step shown in FIG. 1B) is performed to incorporate a cell barcode.
  • a barcode bead including a plurality of cell barcodes can be provided in a reaction mixture in a droplet, such that the cell barcodes can be incorporated into the amplicons.
  • a cell barcode can be included in a primer sequence that includes a constant region sequence that is complementary to the constant sequence (shown as “seq8” sequence).
  • the cell barcode is incorporated into the amplicon.
  • the bottom panel of FIG. 13B shows an example final amplicon following single cell processing.
  • the amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) a first gene specific sequence (e.g., gene specific 1), 4) a second gene specific sequence (e.g., gene specific 2), and 5) molecular tag.
  • the amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence.
  • the amplicon can undergo subsequent sequencing. Given the read sequences, the amplicon can be attributed to a particular cell via the cell barcode. Additionally and/or alternatively, the amplicon can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag.
  • FIG. 14A depicts the in situ processing of RNA, in accordance with a third embodiment.
  • FIG. 14A differs from FIG. 13A in that a second primer sequence is provided which includes 1) a gene specific sequence (labeled as “gene specific 1”), 2) a molecular tag, and 3) a constant region (labeled as “seq8”).
  • FIG. 14A shows an RNA molecule from a cell, which includes a first RNA sequence (labeled as “RNA sequence 1” in FIG. 14A), a second RNA sequence (labeled as “RNA sequence 2”) and a poly-A tail.
  • the first RNA sequence and second RNA sequence may be adjacent to one another.
  • the in situ processing involves providing a primer sequence, which includes a gene specific sequence (shown in FIG. 14A as “Gene specific 2”), a molecular tag sequence (labeled as “MT” in FIG. 14A), and a PCR adaptor sequence.
  • the gene specific sequence of the primer sequence hybridizes with an RNA specific sequence (e.g., RNA sequence 2) of the RNA molecule.
  • the in situ processing involves providing a second primer sequence which includes 1) a gene specific sequence (labeled as “gene specific 1”), 2) a molecular tag, and 3) a constant region (labeled as “seq8”).
  • the gene specific 1 sequence is designed to hybridize with a corresponding RNA sequence (e.g., RNA sequence 1) of the RNA molecule.
  • the gene specific 1 sequence includes a phosphate group, such as a 5’ phosphate group.
  • two molecular tags are provided via two primer sequences. However, in some embodiments, only a single molecular tag is included.
  • a first primer sequence includes 1) gene specific sequence (labeled as “gene specific 2”), 2) a molecular tag, and 3) PCR adaptor, and the second primer sequence includes 1) gene specific sequence (labeled as “gene specific 1”) and 2) constant region (labeled as “seq8”).
  • a first primer sequence includes 1) gene specific sequence (labeled as “gene specific 2”) and 2) PCR adaptor, and the second primer sequence includes 1) gene specific sequence (labeled as “gene specific 1”), 2) molecular tag, and 3) constant region (labeled as “seq8”).
  • the two gene specific sequences undergo ligation.
  • a ligase (e.g., DNA ligase) ligates the gene specific sequences by catalyzing the formation of a phosphodiester bond between the 5’ phosphate group of the gene specific 1 sequence and the 3’-hydroxyl group of the gene specific 2 sequence. This generates an intermediate nucleic acid that includes 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) one or two molecular tags, and 4) the PCR adaptor. Following ligation, the cell can be provided for single cell partitioning.
  • FIGs. 14B and 14C depict the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 14A.
  • FIG. 14B shows in droplet processing of RNA as well as genomic DNA.
  • the in droplet processing shown in FIG. 14B refers to the steps that occur in a first droplet during analyte release 165 (see FIG. 1B) of a two step process.
  • the in droplet processing shown in FIG. 14C refers to the steps that occur in a second droplet during cell barcoding 170 (see FIG. 1B).
  • a forward and reverse primer pair can be provided.
  • the forward primer can include a gene specific sequence (labeled as “Gene specific FW”) and a constant region (labeled as “seq8”).
  • the reverse primer can include another gene specific sequence (labeled as “gene specific RV”) and a read sequence (“Read 2”).
  • the DNA template can undergo nucleic acid amplification to generate amplicons that include 1) constant region (“seq8”), 2) DNA template sequence, and 3) read sequence (“Read 2”).
  • FIG. 14B shows the intermediate nucleic acid generated via the in situ processing that includes 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) one or two molecular tags, and 4) the PCR adaptor.
  • FIG. 14B depicts the intermediate nucleic acid generated via the in situ processing, the intermediate nucleic acid including 1) the gene specific sequence, 2) the poly-T sequence, 3) the molecular tag (MT), and 4) the PCR adaptor.
  • an additional primer sequence is provided which enables nucleic acid amplification.
  • the additional primer sequence includes a gene specific sequence and a constant sequence (shown as “seq8” sequence in FIG. 13B).
  • the gene specific sequence of the additional primer sequence hybridizes with the gene specific 1 sequence of the intermediate nucleic acid and serves to initiate nucleic acid amplification to generate amplicons.
  • the inclusion of the constant region “seq8” in the intermediate nucleic acid derived from RNA solves an issue that arises when processing both genomic DNA and RNA. Specifically, the quantity of RNA molecules and gDNA molecules in a single cell is biased towards RNA (e.g., there are 100- to 1000-fold more RNA molecules than gDNA molecules). Therefore, a final sequencing library would typically be biased towards RNA, resulting in underrepresentation of DNA.
  • the inclusion of the constant region “seq8” in the intermediate nucleic acid derived from RNA reduces exponential amplification of the intermediate nucleic acid derived from RNA relative to the DNA template.
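The scale of the bias described above can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is not in the source: every RNA-derived and gDNA-derived template is amplified equally, so library composition mirrors input composition. The function name `dna_fraction` is hypothetical.

```python
def dna_fraction(rna_fold_excess):
    """Fraction of a library derived from gDNA when RNA templates outnumber
    gDNA templates by `rna_fold_excess` and all templates amplify equally."""
    return 1.0 / (1.0 + rna_fold_excess)


for fold in (100, 1000):
    print(f"{fold}-fold RNA excess: DNA fraction = {dna_fraction(fold):.4%}")
```

Under this assumption the DNA-derived share of the library is below 1% at a 100-fold excess and below 0.1% at a 1000-fold excess, which is the underrepresentation that limiting exponential amplification of the RNA-derived intermediate is meant to counter.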
  • the steps shown in FIG. 14B are performed at an elevated temperature between 55°C and 65°C. In particular embodiments, the steps shown in FIG. 14B are performed at an elevated temperature between 56°C and 64°C, between 57°C and 63°C, between 58°C and 62°C, or between 59°C and 61°C. In particular embodiments, the steps shown in FIG. 14B are performed at an elevated temperature of about 61°C.
  • FIG. 14C shows steps that occur in a second droplet during cell barcoding 170 (see FIG. 1B).
  • amplification includes providing primer sequences which include 1) common sequence (labeled as “seq8”), 2) cell barcode sequence, and 3) read sequence (“Read 1”). These primer sequences hybridize with constant sequences (labeled as “seq8”) that are present on both the DNA and RNA sequences.
  • nucleic acid amplification is initiated using the primer sequences to incorporate cell barcodes into the resulting DNA and RNA amplicons.
  • the steps shown in FIG. 14C can be performed at a temperature that facilitates the annealing of the primer sequences.
  • the steps shown in FIG. 14C are performed at an elevated temperature between 45°C and 55°C.
  • the steps shown in FIG. 14C are performed at an elevated temperature between 45°C and 52°C, between 45°C and 50°C, between 46°C and 49°C, or between 47°C and 48°C.
  • the steps shown in FIG. 14C are performed at an elevated temperature of about 48°C.
  • FIG. 14D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 14A-14C.
  • the final DNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) DNA sequence (e.g., DNA template specific 1).
  • the DNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence.
  • a molecular tag is not present in the DNA amplicon.
  • the final RNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) one or two gene specific sequences (e.g., gene specific 1 and/or gene specific 2), and 4) one or two molecular tags.
  • the RNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence.
  • the DNA and RNA amplicons can undergo subsequent sequencing. Given the read sequences, DNA and RNA amplicon sequences can be attributed to a particular cell via the cell barcode. Additionally, RNA amplicons can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag(s).
  • Embodiments disclosed herein further involve performing in situ reverse transcription (RT).
  • RT efficiency can be constrained by droplet characteristics (e.g., volume, concentration of reagents).
  • wash steps to remove excess reagents can be performed in situ whereas wash steps cannot be performed in a droplet.
  • when RT reagents are introduced in situ, excess RT reagents can be removed through a wash process. Removing excess RT reagents, including excess molecular tags, enables more accurate downstream quantification of molecules (e.g., RNA transcripts).
  • FIG. 15A depicts an in situ processing methodology that involves providing primer sequences that initiate reverse transcription to generate sequences for subsequent single cell analysis.
  • FIG. 15A shows an RNA molecule from a cell, which includes an RNA sequence (labeled as “RNA” in FIG. 15A) and a poly-A tail.
  • the in situ processing involves providing a primer sequence, which includes a poly-T sequence, a molecular tag sequence (labeled as “MT” in FIG. 15A), and a PCR adaptor sequence.
  • the poly-T sequence of the primer sequence hybridizes with a sequence of the poly-A tail of the RNA molecule.
  • the in situ processing involves providing a reverse transcriptase enzyme for performing reverse transcription.
  • Reverse transcriptase extends the poly-T sequence to generate a complementary strand with a sequence complementary to the RNA sequence.
  • the cell can be provided for single cell partitioning.
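The primer extension performed by reverse transcriptase can be sketched as a reverse-complement operation. This is an idealized model under stated assumptions: extension runs to the 5' end of the RNA, the poly-T portion of the primer is represented by the complement of the poly-A tail, and the function name, parameter name, and sequences are hypothetical.

```python
# Map each RNA base to the DNA base that pairs with it.
COMPLEMENT = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(rna, primer_five_prime_tail=""):
    """Return the cDNA strand (5'->3') templated by `rna` (written 5'->3').

    `primer_five_prime_tail` stands for the non-hybridizing part of the primer
    (for example a PCR adaptor followed by a molecular tag).
    """
    # Complement every base, then reverse so that the result reads 5'->3'.
    return primer_five_prime_tail + rna.translate(COMPLEMENT)[::-1]


rna = "AUGGCC" + "A" * 6          # toy transcript with a short poly-A tail
tail = "GGCC" + "ACGTAC"          # placeholder PCR adaptor + molecular tag
print(first_strand_cdna(rna, tail))  # GGCCACGTACTTTTTTGGCCAT
```

The molecular tag ends up at the 5' side of the cDNA, upstream of the poly-T stretch, so it travels with every copy made from that cDNA in later amplification.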
  • FIG. 15B depicts the in situ processing of RNA involving reverse transcription, in accordance with a second embodiment.
  • FIG. 15B differs from FIG. 15A in that the provided primers are gene specific primers that hybridize with a specific RNA sequence as opposed to a poly-A tail.
  • FIG. 15B shows an RNA molecule from a cell, which includes a first RNA sequence (labeled as “RNA sequence 1” in FIG. 15B), a second RNA sequence (labeled as “RNA sequence 2” in FIG. 15B), and a poly-A tail.
  • the first RNA sequence and second RNA sequence may be adjacent to one another.
  • the in situ processing involves providing a reverse primer sequence, which includes a gene specific sequence that is complementary to an RNA sequence (e.g., RNA sequence 2), a molecular tag sequence (labeled as “MT” in FIG. 15B), and a PCR adaptor sequence.
  • the gene specific sequence of the primer sequence hybridizes with an RNA specific sequence (e.g., RNA sequence 2) of the RNA molecule.
  • the in situ processing involves providing a reverse transcriptase enzyme for performing reverse transcription. Reverse transcriptase extends the gene specific sequence to generate a complementary strand with a sequence complementary to RNA sequence 1.
  • FIG. 15C depicts the in droplet processing (e.g., encapsulation and barcoding), in accordance with an embodiment.
  • FIG. 15C shows a top panel which includes steps that occur in a first droplet during analyte release 165 (see FIG. 1B), and a bottom panel which includes steps that occur in a second droplet during cell barcoding 170 (see FIG. 1B).
  • a gene specific primer is provided to hybridize with a complementary gene specific sequence of the intermediate nucleic acid derived from RNA.
  • the gene specific primer includes a gene specific sequence (labeled as “gene specific” in FIG. 15C) and a constant region (labeled as “seq8” in FIG. 15C).
  • a forward and reverse primer pair is provided for hybridizing with complementary sequences of the genomic DNA sequence.
  • the forward primer can include a forward primer sequence (“fwd primer”) that hybridizes with a complementary sequence of the genomic DNA sequence, and further includes a constant region (“seq8F”).
  • the reverse primer can include a reverse primer sequence (“rev primer”) that hybridizes with a complementary sequence of the genomic DNA sequence, and further includes a read 2 sequence.
  • amplification takes place to incorporate cell barcode sequences into both DNA and RNA amplicons.
  • amplification includes providing primer sequences which include 1) common sequence (labeled as “seq8”), 2) cell barcode sequence (labeled as “CBC”), and 3) read sequence (“Read 1”). These primer sequences hybridize with constant sequences (labeled as “seq8”) that are present on both the DNA and RNA amplicons.
  • FIG. 15D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 15A-15C.
  • the final DNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), and 3) gDNA sequence.
  • the DNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence.
  • a molecular tag is not present in the DNA amplicon.
  • the final RNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) a gene specific sequence, and 4) a molecular tag.
  • the RNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence.
  • the DNA and RNA amplicons can undergo subsequent sequencing. Given the read sequences, DNA and RNA amplicon sequences can be attributed to a particular cell via the cell barcode. Additionally, RNA amplicons can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag(s).
  • Embodiments of the invention involve providing one or more barcode sequences for labeling analytes of a single cell during step 170 shown in FIG. 1B.
  • the one or more barcode sequences are encapsulated in an emulsion with a cell lysate derived from a single cell.
  • the one or more barcodes label analytes of the cell, thereby enabling the subsequent determination that sequence reads derived from the analytes originated from the cell.
  • a plurality of barcodes are added to an emulsion with a cell lysate.
  • the plurality of barcodes added to an emulsion includes at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, or at least 10^8 barcodes.
  • the plurality of barcodes added to an emulsion have the same barcode sequence.
  • the plurality of barcodes added to an emulsion comprise molecular tag sequences.
  • molecular tag sequences can be provided in another stage of the single cell analysis, examples of which are described herein.
  • a molecular tag has a sequence which can be used to identify and/or distinguish one or more first molecules to which the molecular tag is conjugated from one or more second molecules.
  • both a barcode sequence and a molecular tag are incorporated into a barcode.
  • a molecular tag is used to distinguish between molecules of a similar type within a population or group, whereas a barcode sequence is used to distinguish between populations or groups of molecules that are derived from different cells.
  • a molecular tag can be used to count or quantify numbers of particular molecules (e.g., quantify number of RNA transcripts).
  • the molecular tag is shorter in sequence length than the barcode sequence.
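Counting molecules with molecular tags can be sketched as follows. The sketch assumes that reads have already been resolved into (cell barcode, gene, molecular tag) triples and that identical triples are amplification copies of one original molecule; the function name `count_molecules` and the example values are hypothetical.

```python
from collections import defaultdict


def count_molecules(records):
    """Count distinct molecular tags per (cell barcode, gene) pair.

    `records` is an iterable of (cell_barcode, gene, molecular_tag) triples.
    Reads sharing all three values are treated as copies of one molecule.
    """
    tags = defaultdict(set)
    for cell_barcode, gene, molecular_tag in records:
        tags[(cell_barcode, gene)].add(molecular_tag)
    return {key: len(tag_set) for key, tag_set in tags.items()}


records = [
    ("CELL1", "GENE_A", "AAGTCC"),
    ("CELL1", "GENE_A", "AAGTCC"),  # amplification duplicate, counted once
    ("CELL1", "GENE_A", "CGTTAG"),
    ("CELL2", "GENE_A", "AAGTCC"),  # same tag, different cell
]
print(count_molecules(records))
# {('CELL1', 'GENE_A'): 2, ('CELL2', 'GENE_A'): 1}
```

The cell barcode separates cells and the molecular tag separates molecules within a cell, which is why the same tag observed in two cells counts as two molecules.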
  • the barcodes are single-stranded barcodes.
  • Single-stranded barcodes can be generated using a number of techniques. For example, they can be generated by obtaining a plurality of DNA barcode molecules in which the sequences of the different molecules are at least partially different. These molecules can then be amplified so as to produce single stranded copies using, for instance, asymmetric PCR. Alternatively, the barcode molecules can be circularized and then subjected to rolling circle amplification. This will yield a product molecule in which the original DNA barcode is concatenated numerous times as a single long molecule.
  • circular barcode DNA containing a barcode sequence flanked by any number of constant sequences can be obtained by circularizing linear DNA. Primers that anneal to any constant sequence can initiate rolling circle amplification by the use of a strand displacing polymerase (such as Phi29 polymerase), generating long linear concatemers of barcode DNA.
  • barcodes can be linked to a primer sequence that enables the barcode to label a target nucleic acid.
  • the barcode is linked to a forward primer sequence.
  • the forward primer sequence is a gene specific primer that hybridizes with a forward target of a nucleic acid.
  • the forward primer sequence is a constant region, such as a PCR handle, that hybridizes with a complementary sequence attached to a gene specific primer.
  • the complementary sequence attached to a gene specific primer can be provided in the reaction mixture (e.g., reaction mixture 140 in FIG. 1B). Including a constant forward primer sequence on barcodes may be preferable as the barcodes can have the same forward primer and need not be individually designed to be linked to gene specific forward primers.
  • barcodes can be releasably attached to a support structure, such as a bead. Therefore, a single bead with multiple copies of barcodes can be partitioned into an emulsion with a cell lysate, thereby enabling labeling of analytes of the cell lysate with the barcodes of the bead.
  • Example beads include solid beads (e.g., silica beads), polymeric beads, or hydrogel beads (e.g., polyacrylamide, agarose, or alginate beads). Beads can be synthesized using a variety of techniques. For example, using a mix-split technique, beads with many copies of the same, random barcode sequence can be synthesized.
  • the beads can be divided into four collections and each mixed with a buffer that will add a base to it, such as an A, T, G, or C.
  • each subpopulation can have one of the bases added to its surface. This reaction can be accomplished in such a way that only a single base is added and no further bases are added.
  • the beads from all four subpopulations can be combined and mixed together, and divided into four populations a second time. In this division step, the beads from the previous four populations may be mixed together randomly. They can then be added to the four different solutions, adding another, random base on the surface of each bead.
  • This process can be repeated to generate sequences on the surface of the bead of a length approximately equal to the number of times that the population is split and mixed. If this was done 10 times, for example, the result would be a population of beads in which each bead has many copies of the same random 10-base sequence synthesized on its surface. The sequence on each bead would be determined by the particular sequence of reactors it ended up in through each mix-split cycle. Additional details of example beads and their synthesis are described in International Application No. PCT/US2016/016444, which is hereby incorporated by reference in its entirety.
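The mix-split procedure described above can be simulated in a few lines of Python. This is a sketch under stated assumptions: each string stands for the one sequence shared by all copies on a bead, every round adds exactly one base, and the pool is divided into four equal-sized reactors after shuffling. The function name `split_pool_synthesis` and its parameters are hypothetical.

```python
import random

BASES = "ACGT"


def split_pool_synthesis(n_beads, n_rounds, seed=0):
    """Simulate mix-split synthesis; returns one barcode string per bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_rounds):
        # Mix: pooled beads are shuffled so reactor assignment is random.
        rng.shuffle(beads)
        # Split: every fourth bead goes to the same reactor.
        reactors = {base: beads[i::4] for i, base in enumerate(BASES)}
        # Each reactor appends its single base, then all beads are pooled again.
        beads = [seq + base for base, group in reactors.items() for seq in group]
    return beads


beads = split_pool_synthesis(n_beads=1000, n_rounds=10, seed=1)
print(beads[:3])
print(f"distinct barcodes: {len(set(beads))} of {4 ** 10} possible")
```

With 10 rounds there are 4^10 = 1,048,576 possible sequences, so the sequence a bead ends up with is determined entirely by the series of reactors it passed through.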
  • a molecular tag is a nucleic acid sequence which can be used to identify and/or distinguish one or more first analytes to which the molecular tag is conjugated from one or more second analytes to which a different molecular tag is conjugated.
  • a molecular tag includes at least a contiguous string of nucleotides.
  • Molecular tags may be single or double stranded.
  • a nucleic acid for sequencing, such as a DNA amplicon or an RNA amplicon, includes one or more molecular tags.
  • Each amplicon (e.g., DNA amplicon or RNA amplicon) may include one or more (e.g., two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more) molecular tags.
  • a molecular tag may include about 6 to about 20 nucleotides. In some embodiments, a molecular tag includes 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides. In some embodiments, a molecular tag includes at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 nucleotides in length. In various embodiments, every molecular tag is different from every other molecular tag (e.g., each molecular tag is unique). In various embodiments, a molecular tag is randomly generated. In such embodiments, molecular tags may not be unique.
  • a first randomly generated molecular tag may have a same sequence as a sequence of a second randomly generated molecular tag.
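How often randomly generated molecular tags coincide can be estimated with a short calculation. The sketch assumes that each tag is drawn independently and uniformly from all 4^L sequences of length L, which the source does not state; the function name `expected_distinct_tags` is hypothetical.

```python
def expected_distinct_tags(n_molecules, tag_length):
    """Expected number of distinct tags among `n_molecules` random tags.

    With K = 4 ** tag_length equally likely tags, a given tag is missed by all
    n draws with probability (1 - 1/K) ** n, so the expected number of tags
    seen at least once is K * (1 - (1 - 1/K) ** n).
    """
    k = 4 ** tag_length
    return k * (1.0 - (1.0 - 1.0 / k) ** n_molecules)


for length in (6, 10, 20):
    distinct = expected_distinct_tags(100, length)
    print(f"tag length {length}: about {distinct:.2f} distinct tags for 100 molecules")
```

For 100 molecules of one transcript and 6-nucleotide tags, the expected number of distinct tags is roughly 98.8, so about one molecule would be lost to a tag collision; longer tags bring the expectation closer to 100.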
  • a molecular tag includes a nucleic acid sequence that is not complementary to a corresponding nucleic acid sequence of an analyte (or a sequence derived from an analyte).
  • a molecular tag includes a nucleic acid sequence that is less than 80% complementary, less than 70% complementary, less than 60% complementary, less than 50% complementary, less than 40% complementary, less than 30% complementary, less than 20% complementary, or less than 10% complementary to a corresponding nucleic acid sequence of an analyte.
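One way to compute percent complementarity between a molecular tag and an analyte segment is sketched below. The source does not define how the percentage is calculated, so this is an assumption: both sequences are DNA of equal length, written 5'->3', paired antiparallel and compared position by position without gaps. The function name `percent_complementary` is hypothetical.

```python
# Watson-Crick pairing partners for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementary(tag, target):
    """Percentage of positions at which `tag` base-pairs with `target`.

    Both sequences are written 5'->3'; the target is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if len(tag) != len(target):
        raise ValueError("sequences must have equal length")
    if not tag:
        raise ValueError("sequences must not be empty")
    matches = sum(PAIR[t] == b for t, b in zip(tag, reversed(target)))
    return 100.0 * matches / len(tag)


print(percent_complementary("AAGG", "CCTT"))  # 100.0 (exact reverse complement)
print(percent_complementary("AAGG", "CCTA"))  # 75.0
print(percent_complementary("AAGG", "AAGG"))  # 0.0
```

A tag scoring below a chosen threshold, for example below 50%, would satisfy the corresponding limit listed above under this particular definition.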
  • one or more molecular tags can be incorporated to distinguish between analytes via a variety of methods.
  • one or more molecular tags are incorporated during in situ processing of cells.
  • one or more molecular tags are incorporated during cell encapsulation 160 or analyte release 165 (shown in FIG. 1B).
  • one or more molecular tags are incorporated during cell barcoding 170 and/or target amplification 175 (shown in FIG. 1B).
  • Embodiments described herein include the encapsulation of a cell with reagents within an emulsion.
  • the reagents interact with the encapsulated cell under conditions in which the cell is lysed, thereby releasing target analytes of the cell.
  • the reagents can further interact with target analytes to prepare for subsequent barcoding and/or amplification.
  • the reagents include one or more lysing agents that cause the cell to lyse.
  • lysing agents include detergents such as Triton X-100, Nonidet P-40 (NP40) as well as cytotoxins.
  • the reagents further include agents that interact with target analytes that are released from a single cell.
  • One example of such an agent includes reverse transcriptase which reverse transcribes messenger RNA transcripts released from the cell to generate corresponding cDNA.
  • the reagents encapsulated with the cell include ddNTPs, inhibitors such as ribonuclease inhibitor, and stabilization agents such as dithiothreitol (DTT).
  • the reagents further include proteases that assist in the lysing of the cell and/or accessing of genomic DNA.
  • proteases in the reagents can include any of proteinase K, pepsin, protease (subtilisin Carlsberg), protease type X (Bacillus thermoproteolyticus), or protease type XIII (Aspergillus saitoi).
  • the reagents include deoxyribonucleotide triphosphate (dNTP) reagents including deoxyadenosine triphosphate, deoxycytidine triphosphate, deoxyguanosine triphosphate, and deoxythymidine triphosphate.
  • the reagents include agents that interact with target analytes that are released from a single cell.
  • the reagents include reverse transcriptase which reverse transcribes mRNA transcripts released from the cell to generate corresponding cDNA.
  • the reagents include primers that hybridize with mRNA transcripts, thereby enabling the reverse transcription reaction to occur.
  • such primers are digestible oligonucleotides that participate in the reverse transcription reaction, but are subsequently digested to prevent their participation in subsequent reactions.
  • the reagents include agents for digesting the digestible oligonucleotides.
  • the agents digest the digestible oligonucleotides while in a droplet, such as a first droplet generated during the cell encapsulation step (step 160 in FIG. 1B).
  • agents for digesting the digestible oligonucleotides are enzymes.
  • an agent for digesting the digestible oligonucleotides is an RNaseH enzyme.
  • a reaction mixture is provided into an emulsion with a cell lysate (e.g., see cell barcoding step 170 in FIG. 1B).
  • the reaction mixture includes reactants sufficient for performing a reaction, such as nucleic acid amplification, on analytes of the cell lysate.
  • the reaction mixture includes primers that are capable of acting as a point of initiation of synthesis along a complementary strand when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is catalyzed.
  • the reaction mixture includes the four different deoxyribonucleoside triphosphates (adenosine, guanine, cytosine, and thymine).
  • the reaction mixture includes enzymes for nucleic acid amplification. Examples of enzymes for nucleic acid amplification include DNA polymerase, thermostable polymerases for thermal cycled amplification, or polymerases for multiple-displacement amplification for isothermal amplification.
  • amplification may also be applied, such as amplification using DNA-dependent RNA polymerases to create multiple copies of RNA from the original DNA target which themselves can be converted back into DNA, resulting in, in essence, amplification of the target.
  • Living organisms can also be used to amplify the target by, for example, transforming the targets into the organism which can then be allowed or induced to copy the targets with or without replication of the organisms.
  • the reagents include deoxyribonucleotide triphosphate (dNTP) reagents including deoxyadenosine triphosphate, deoxycytidine triphosphate, deoxyguanosine triphosphate, and deoxythymidine triphosphate.
  • the extent of nucleic acid amplification can be controlled by modulating the concentration of the reactants in the reaction mixture. In some instances, this is useful for fine tuning of the reactions in which the amplified products are used.
  • the reaction mixture includes agents for digesting the digestible oligonucleotides.
  • agents for digesting the digestible oligonucleotides are enzymes.
  • the agents digest the digestible oligonucleotides while in a droplet, such as a second droplet generated during the barcoding step (step 170 in FIG. 1B).
  • the reaction mixture can include enzymes selected from any of UDG, RNaseH, or RNaseA.
  • the digestible oligonucleotides have already primed the RNA transcript and reverse transcription has occurred. Therefore, providing any of these enzymes in the second droplet enables the digestion of the digestible oligonucleotides after they participated in the reverse transcription reaction.
  • Embodiments of the invention described herein use primers to conduct the single-cell analysis.
  • primers are implemented during the workflow process shown in FIG. 1B.
  • Primers can be used to prime (e.g., hybridize) with specific sequences of nucleic acids of interest, such that the nucleic acids of interest can be processed (e.g., reverse transcribed, barcoded, and/or amplified). Additionally, primers enable the identification of target regions following sequencing.
  • primers described herein are between 5 and 50 nucleobases in length. In various embodiments, primers described herein are between 7 and 45 nucleobases in length. In various embodiments, primers described herein are between 10 and 40 nucleobases in length. In various embodiments, primers described herein are between 12 and 35 nucleobases in length. In various embodiments, primers described herein are between 15 and 32 nucleobases in length. In various embodiments, primers described herein are between 18 and 30 nucleobases in length. In various embodiments, primers described herein are between 18 and 25 nucleobases in length.
  • primers can be included in the reagents 120 that are encapsulated with the cell 110.
  • primers included in the reagents are useful for priming RNA transcripts and enabling reverse transcription of the RNA transcripts.
  • primers in the reagents 120 can include RNA primers for priming RNA and/or for priming genomic DNA.
  • the primers included in the reagents are digestible primers included within digestible oligonucleotides. Digestible primers can be digested at the appropriate time to ensure that subsequent reactions are not impacted by the presence of the digestible primers.
  • digestible primers participate in a first reaction, such as a reverse transcriptase reaction, and are digested to prevent their participation in a second reaction, such as a nucleic acid amplification reaction.
  • primers can be included in the reaction mixture 140 that is encapsulated with the cell lysate 130.
  • primers included in the reaction mixture are useful for priming nucleic acids (e.g., cDNA, gDNA, and/or amplicons of cDNA/gDNA) and enabling nucleic acid amplification of the nucleic acids.
  • Such primers in the reaction mixture 140 can include cDNA primers for priming cDNA that have been reverse transcribed from RNA and/or DNA primers for priming genomic DNA and/or for priming products that have been generated from the genomic DNA.
  • primers of the reagents and primers of the reaction mixture form primer sets (e.g., forward primer and reverse primer) for a region of interest on a nucleic acid.
  • primers can be included in or linked with a barcode 145 that is encapsulated with the cell lysate 130. Further description and examples of primers that are used in a single-cell analysis workflow process are described in US Application No. 16/749,731, which is hereby incorporated by reference in its entirety.
  • the number of primers in any of the reagents, the reaction mixture, or with barcodes may range from about 1 to about 500 or more, e.g., about 2 to 100 primers, about 2 to 10 primers, about 10 to 20 primers, about 20 to 30 primers, about 30 to 40 primers, about 40 to 50 primers, about 50 to 60 primers, about 60 to 70 primers, about 70 to 80 primers, about 80 to 90 primers, about 90 to 100 primers, about 100 to 150 primers, about 150 to 200 primers, about 200 to 250 primers, about 250 to 300 primers, about 300 to 350 primers, about 350 to 400 primers, about 400 to 450 primers, about 450 to 500 primers, or about 500 primers or more.
  • primers in the reagents may include primers that are complementary to a target on a nucleic acid of interest (e.g., DNA or RNA).
  • primers in the reagents are gene-specific primers.
  • primers in the reagents are universal primers.
  • Example universal primers include primers including at least 3 consecutive deoxythymidine nucleobases (e.g., oligo dT primer), at least 3 consecutive deoxyuridine sequences (e.g., oligo dU primer), or at least 3 consecutive ribouridine sequences (e.g., oligo rU primer).
  • primers in the reagents are reverse primers.
  • primers in the reagents are only reverse primers and do not include forward primers.
  • primers in the reaction mixture include forward primers that are complementary to a forward target on a nucleic acid of interest (e.g., RNA or gDNA).
  • the reaction mixture includes forward primers that are complementary to a forward target on a cDNA strand (generated from a RNA transcript) and further includes forward primers that are complementary to a forward target on gDNA.
  • primers in the reaction mixture are gene-specific primers that target a forward target of a gene of interest.
  • the number of forward or reverse primers for genes of interest that are added may be from about one to 500, e.g., about 1 to 10 primers, about 10 to 20 primers, about 20 to 30 primers, about 30 to 40 primers, about 40 to 50 primers, about 50 to 60 primers, about 60 to 70 primers, about 70 to 80 primers, about 80 to 90 primers, about 90 to 100 primers, about 100 to 150 primers, about 150 to 200 primers, about 200 to 250 primers, about 250 to 300 primers, about 300 to 350 primers, about 350 to 400 primers, about 400 to 450 primers, about 450 to 500 primers, or about 500 primers or more.
  • genes of interest for either DNA-sequencing or RNA-sequencing include, but are not limited to: CCND3, CD44, CCND1, CD33, CDK6, CDK4, CDKN1B, CREB3L4, CDKN1A, CREBBP, CREB3L1, CREB5, CREB1, ELK1, FOS, FHL1, FASLG, GNG12, GSK3B, BAD, FOXO4, FOXO1, HIF1A, HSPB1, IKBKG, IRF9, BCL2, BCL2L11, MAP2K1, MAPK1, BCL2L1, MYB, NF1, NFKB1, MYC, PIK3CB, PIM1, PIAS1, PRKCB, PTEN, HSPA1A, HSPA2, IL2RB, IL2RA, SIRT1, NCL, RHOA, MCM4, NASP, SOS1, TCL1B, SOCS3, SOCS2, STAT4, STAT6, SRF,
  • the primers of the reagents can include a random primer sequence.
  • the random primer hybridizes with a sequence of reverse transcribed cDNA, thereby enabling priming off of the cDNA.
  • the reagents 120 include various different random primers that enable priming off of all or a majority of cDNA generated from mRNA transcripts across the transcriptome. This enables the processing and analysis of mRNA transcripts across the whole transcriptome.
  • a random primer comprises a sequence of 5 nucleobases.
  • a random primer comprises a sequence of 6 nucleobases. In various embodiments, a random primer comprises a sequence of 9 nucleobases. In various embodiments, a random primer comprises a sequence of at least 5 nucleobases. In various embodiments, a random primer comprises a sequence of at least 6 nucleobases. In various embodiments, a random primer comprises a sequence of at least 9 nucleobases.
  • a random primer comprises a sequence of at least 6 nucleobases, at least 7 nucleobases, at least 8 nucleobases, at least 9 nucleobases, at least 10 nucleobases, at least 11 nucleobases, at least 12 nucleobases, at least 13 nucleobases, at least 14 nucleobases, at least 15 nucleobases, at least 16 nucleobases, at least 17 nucleobases, at least 18 nucleobases, at least 19 nucleobases, at least 20 nucleobases, at least 21 nucleobases, at least 22 nucleobases, at least 23 nucleobases, at least 24 nucleobases, at least 25 nucleobases, at least 26 nucleobases, at least 27 nucleobases, at least 28 nucleobases, at least 29 nucleobases, at least 30 nucleobases, at least 31 nucleobases, at least 32 nucleobase
  • a random primer includes one or more ribonucleotide nucleobases.
  • the random primer 624 includes one ribonucleotide nucleobase on the 3’ end.
  • the random primer 624 includes two ribonucleotide nucleobases on the 3’ end.
  • the random primer 624 includes three, four, five, six, seven, eight, nine, or ten ribonucleotide nucleobases on the 3’ end. The presence of ribonucleotide nucleobases on the 3’ end of the random primer ensures that the random primer enables extension only on cDNA and not on RNA.
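  • The random primer pool described above can be illustrated with a minimal Python sketch. It enumerates every possible random primer of a given length and tags the last few positions of one primer as ribonucleotides. The function names and the ("base", "sugar") tuple representation are assumptions made for illustration and do not appear in the disclosure.

```python
import itertools

BASES = "ACGT"


def all_random_primers(length):
    """Yield every possible sequence of the given length over A, C, G, T."""
    for combo in itertools.product(BASES, repeat=length):
        yield "".join(combo)


def with_3prime_ribonucleotides(primer, n_ribo):
    """Tag each position as 'deoxy' or 'ribo'; the last n_ribo positions are 'ribo'.

    The (base, sugar) tuples are a hypothetical representation for this sketch.
    """
    if not 0 <= n_ribo <= len(primer):
        raise ValueError("n_ribo must be between 0 and the primer length")
    first_ribo = len(primer) - n_ribo
    return [
        (base, "ribo" if i >= first_ribo else "deoxy")
        for i, base in enumerate(primer)
    ]


# A pool of random hexamers contains 4**6 distinct sequences.
hexamers = list(all_random_primers(6))
print(len(hexamers))  # 4096

# One hexamer with two ribonucleotide nucleobases on the 3' end.
print(with_3prime_ribonucleotides("ACGTGT", 2))
```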
  • the reagents include a reverse primer that is complementary to a portion of mRNA transcripts.
  • the reverse primer is a universal primer, such as any one of an oligo dT primer, oligo dU primer, or an oligo rU primer.
  • the universal primer region can be an oligo dT sequence that hybridizes with the poly A tail of messenger RNA transcripts. Therefore, the reverse primer hybridizes with a portion of mRNA transcripts and enables generation of cDNA strands through reverse transcription of the mRNA transcripts.
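  • As a rough illustration of oligo dT priming on the poly A tail, the following Python sketch counts the consecutive adenines at the 3’ end of a transcript and checks whether the tail is at least as long as the oligo dT region. The transcript sequence and function names are hypothetical, and actual hybridization also depends on temperature and buffer conditions that this sketch ignores.

```python
def poly_a_tail_length(transcript):
    """Count consecutive A bases at the 3' end of a transcript written 5'->3'."""
    upper = transcript.upper()
    return len(upper) - len(upper.rstrip("A"))


def oligo_dt_can_prime(transcript, oligo_dt_length):
    """Return True if the poly A tail is at least as long as the oligo dT region."""
    return poly_a_tail_length(transcript) >= oligo_dt_length


# Hypothetical transcript: a short body followed by a 20-base poly A tail.
mrna = "AUGGCCUUAGC" + "A" * 20

print(poly_a_tail_length(mrna))      # 20
print(oligo_dt_can_prime(mrna, 18))  # True
print(oligo_dt_can_prime(mrna, 25))  # False
```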
  • the primers of the reaction mixture include constant forward primers and constant reverse primers.
  • the constant forward primers hybridize with the random forward primer that enabled priming off the cDNA.
  • the constant reverse primers hybridize with a sequence of the reverse constant region, such as a PCR handle, that previously enabled reverse transcription of the mRNA transcript.
  • primers included in the reagents include additional sequences.
  • additional sequences may have functional purposes.
  • a primer may include a read sequence for sequencing purposes.
  • a primer may include a constant region.
  • the constant region of a primer can hybridize with a complementary constant region on another nucleic acid sequence for incorporation of the nucleic acid sequence during nucleic acid amplification.
  • the constant region of a primer can be complementary to a complementary constant region of a barcode sequence.
  • the barcode sequence is incorporated into generated amplicons.
  • instead of the primers being included in the reaction mixture (e.g., reaction mixture 140 in FIG. 1B), such primers can be included in or linked to a barcode (e.g., barcode 145 in FIG. 1B).
  • the primers are linked to an end of the barcode and therefore, are available to hybridize with target sequences of nucleic acids in the cell lysate.
  • primers of the reaction mixture, primers of the reagents, or primers of barcodes may be added to an emulsion in one step, or in more than one step.
  • the primers may be added in two or more steps, three or more steps, four or more steps, or five or more steps. Regardless of whether the primers are added in one step or in more than one step, they may be added after the addition of a lysing agent, prior to the addition of a lysing agent, or concomitantly with the addition of a lysing agent.
  • the primers of the reaction mixture may be added in a separate step from the addition of a lysing agent (e.g., as exemplified in the two-step workflow process shown in FIG. 1B).
  • a primer set for the amplification of a target nucleic acid typically includes a forward primer and a reverse primer that are complementary to a target nucleic acid or the complement thereof.
  • amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, where each includes at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair has a different corresponding target sequence. Accordingly, certain methods herein are used to detect or identify multiple target sequences from a single cell sample.
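  • The relationship between a primer pair and its target can be sketched in Python as a simple in silico amplicon prediction. This is a minimal sketch assuming exact matches only: the forward primer is identical to the start of the top strand of the amplicon, and the reverse primer is the reverse complement of its end. The class name, function names, target names ("geneA", "geneB"), and sequences are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class PrimerPair:
    target: str   # name of the target this pair is designed for
    forward: str  # identical to the start of the amplicon top strand
    reverse: str  # reverse complement of the end of the amplicon top strand


def predicted_amplicon(template, pair):
    """Return the top-strand region bounded by the pair, or None if no exact match."""
    start = template.find(pair.forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(pair.reverse)
    site = template.find(reverse_site, start + len(pair.forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


def multiplex_amplicons(templates, pairs):
    """Predict one amplicon per pair; every pair must have a different target."""
    targets = [pair.target for pair in pairs]
    if len(set(targets)) != len(targets):
        raise ValueError("each primer pair should have a different target")
    return {p.target: predicted_amplicon(templates[p.target], p) for p in pairs}


# Hypothetical templates and primer pairs.
templates = {
    "geneA": "TTGACCTGAAGCTTACGGATCCAGTCAAGG",
    "geneB": "AACCGGTTACGTACGTTTGGCCAA",
}
pairs = [
    PrimerPair("geneA", forward="GACCTGAAG", reverse="CTTGACTGG"),
    PrimerPair("geneB", forward="CCGGTT", reverse="GGCCAA"),
]

for target, amplicon in multiplex_amplicons(templates, pairs).items():
    print(target, amplicon)
```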
  • Embodiments disclosed herein involve the use of digestible oligonucleotides.
  • Digestible oligonucleotides include a digestible primer. In various embodiments, digestible oligonucleotides further include digestible molecular tags. Digestible primers can be primers that participate in the reverse transcription of RNA transcripts to generate cDNA, but are digested such that the digestible primers do not participate in subsequent reactions involving the cDNA (e.g., amplification of cDNA). In various embodiments, the step of digestion reduces or eliminates the presence of digestible primers (e.g., digestible primers that are primed on RNA transcripts, digestible primers that have formed undesired byproducts, and/or digestible primers that have misprimed genomic DNA). In some embodiments, digestible primers are reverse primers. In some embodiments, digestible primers are gene specific primers.
  • digestible oligonucleotides have one of the following characteristics: A) one or more ribonucleotide nucleobases, B) one or more uracil nucleobases, C) a repeating deoxyuridine sequence (e.g., oligo dUracil or oligo dU), or D) a repeating ribouridine sequence (e.g., oligo rUracil or oligo rU).
  • digestible oligonucleotides include one or more ribonucleotide nucleobases.
  • every nucleobase of a ribonucleotide primer is a ribonucleotide nucleobase.
  • a ribonucleotide oligonucleotide includes a combination of deoxyribonucleotide and ribonucleotide nucleobases.
  • ribonucleotide oligonucleotides have more ribonucleotide nucleobases than deoxyribonucleotide nucleobases.
  • nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, at least 70% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, at least 80% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases.
  • nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, between 55 and 90% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, between 60 and 85% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, between 70 and 80% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases.
  • ribonucleotide oligonucleotides have more deoxyribonucleotide nucleobases than ribonucleotide nucleobases.
  • at least 60% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases.
  • at least 70% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases.
  • nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 90% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 55 and 90% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases.
  • nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 70 and 80% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases.
  • every other base of a ribonucleotide oligonucleotide is a ribonucleotide nucleobase.
  • the ribonucleotide oligonucleotide comprises a ribonucleotide nucleobase every 3 nucleobases.
  • the ribonucleotide oligonucleotide comprises a ribonucleotide nucleobase every 4 nucleobases.
  • the ribonucleotide oligonucleotide comprises one ribonucleotide nucleobase every 5 nucleobases, every 6 nucleobases, every 7 nucleobases, every 8 nucleobases, every 9 nucleobases, or every 10 nucleobases.
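  • The composition patterns above can be made concrete with a short Python sketch that places a ribonucleotide at every k-th position of an oligonucleotide and reports the resulting ribonucleotide fraction. The 'r'/'d' flag representation and the choice to start counting so that position k is the first ribonucleotide are assumptions for illustration; the disclosure does not specify the phase of the pattern.

```python
def ribo_every_k(length, k):
    """Return 'r'/'d' flags with a ribonucleotide at positions k, 2k, 3k, ... (1-based)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return ["r" if (i + 1) % k == 0 else "d" for i in range(length)]


def ribo_fraction(flags):
    """Fraction of positions that are ribonucleotides."""
    return flags.count("r") / len(flags)


def meets_minimum(flags, min_fraction):
    """Check an 'at least X%' ribonucleotide composition threshold."""
    return ribo_fraction(flags) >= min_fraction


every_other = ribo_every_k(20, 2)   # "every other base"
every_fourth = ribo_every_k(20, 4)  # "every 4 nucleobases"

print("".join(every_other), ribo_fraction(every_other))
print("".join(every_fourth), ribo_fraction(every_fourth))
print(meets_minimum(every_other, 0.6))
```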
  • digestible oligonucleotides have one or more uracil nucleobases, hereafter referred to as “uracil oligonucleotides.”
  • uracil oligonucleotides have a combination of deoxyribonucleotide and ribonucleotide nucleobases.
  • one or more thymidine nucleobases of a deoxyribonucleotide oligonucleotide can be replaced with uracil to generate a uracil oligonucleotide.
  • all thymidine nucleobases of a deoxyribonucleotide oligonucleotide can be replaced with uracils to generate a uracil oligonucleotide.
  • a uracil oligonucleotide has more deoxyribonucleotide nucleobases than uracil nucleobases.
  • a uracil oligonucleotide has more uracil nucleobases than deoxyribonucleotide nucleobases.
  • every other base of a uracil oligonucleotide is a uracil nucleobase.
  • the uracil oligonucleotide comprises a uracil nucleobase every 3 nucleobases. In various embodiments, the uracil oligonucleotide comprises a uracil nucleobase every 4 nucleobases. In various embodiments, the uracil oligonucleotide comprises a uracil nucleobase every 5 nucleobases, every 6 nucleobases, every 7 nucleobases, every 8 nucleobases, every 9 nucleobases, or every 10 nucleobases.
  • nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 40% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 50% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 60% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases.
  • nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 80% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 95% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases.
  • nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 50 and 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 60 and 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 60 and 80% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases.
  • nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 70 and 80% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, the uracil oligonucleotide has a sequence comprising two or more consecutive uracil nucleobases. In various embodiments, the uracil oligonucleotide has a sequence comprising three or more consecutive uracil nucleobases.
  • the uracil oligonucleotide has a sequence comprising four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive uracil nucleobases.
  • the digestible oligonucleotide having one or more uracil nucleobases includes a gene specific primer.
  • the digestible oligonucleotide would be designed in accordance with the target sequence on the specific gene. For example, based on the presence of an adenosine in the target sequence on the specific gene, the complementary base in the digestible uracil primer would be designed as a uracil.
  • the locations of uracil nucleobases in the uracil primer would be based on the target sequence and not positioned in any pattern.
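  • The design rule for a gene-specific uracil primer can be sketched in Python: the primer is the reverse complement of the target sequence, with uracil placed opposite every adenosine of the target. The function names and the example target sequence are hypothetical, and the sketch does not model other design considerations such as melting temperature.

```python
# Complement table that pairs A with U instead of T.
_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}


def design_uracil_primer(target):
    """Return the primer (5'->3') complementary to the target (5'->3'), using U opposite A."""
    return "".join(_PAIR[base] for base in reversed(target.upper()))


def uracil_positions(primer):
    """Return the 0-based positions of uracil in the primer."""
    return [i for i, base in enumerate(primer) if base == "U"]


target = "GATTACA"  # hypothetical target sequence, 5'->3'
primer = design_uracil_primer(target)

print(primer)                    # UGUAAUC
print(uracil_positions(primer))  # [0, 2, 5]
```

The uracil positions follow the adenosines of the target rather than any fixed spacing, which matches the statement that the uracils are not positioned in any pattern.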
  • digestible primers have a repeating deoxyuridine sequence, hereafter referred to as “oligo dU primers.”
  • the repeating deoxyuridine sequence comprises three or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises four or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises five or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises six or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises seven or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises eight or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises nine or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises ten or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or thirty or more consecutive deoxyuridine nucleobases.
  • the repeating deoxyuridine sequence comprises between 5 and 30 consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises between 8 and 25 consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises between 12 and 18 consecutive deoxyuridine nucleobases.
  • an oligo dU primer comprises a V or VN sequence, where “V” is any of an adenine (A), guanine (G), or cytosine (C) nucleobase and “N” is any of adenine (A), guanine (G), cytosine (C), or thymine (T) nucleobase.
  • the oligo dU primer terminates in the V or VN sequence (e.g., 3’ end of oligo dU contains the V or VN sequence).
  • digestible primers have a repeating ribouridine sequence, hereafter referred to as “oligo rU primers.”
  • the repeating ribouridine sequence comprises three or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises four or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises five or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises six or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises seven or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises eight or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises nine or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises ten or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or thirty or more consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises between 5 and 30 consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises between 8 and 25 consecutive ribouridine nucleobases.
  • the repeating ribouridine sequence comprises between 12 and 18 consecutive ribouridine nucleobases.
  • an oligo rU primer comprises a V or VN sequence, where “V” is any of an adenine (A), guanine (G), or cytosine (C) nucleobase and “N” is any of adenine (A), guanine (G), cytosine (C), or thymine (T) nucleobase.
  • the oligo rU primer terminates in the V or VN sequence (e.g., 3’ end of oligo rU contains the V or VN sequence).
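  • The anchored oligo dU and oligo rU primers described above can be enumerated with a minimal Python sketch, where "V" is A, C, or G and "N" is A, C, G, or T as defined in the text. The compact "(dU)15AC" string notation and the function name are assumptions made for illustration.

```python
import itertools

V_BASES = "ACG"   # V: adenine, guanine, or cytosine (any base except T)
N_BASES = "ACGT"  # N: any of the four bases


def anchored_primers(repeat_length, anchor="VN", unit="dU"):
    """List every primer made of a uridine repeat followed by a 3' V or VN anchor."""
    if anchor not in ("V", "VN"):
        raise ValueError("anchor must be 'V' or 'VN'")
    if unit not in ("dU", "rU"):
        raise ValueError("unit must be 'dU' or 'rU'")
    if repeat_length < 3:
        raise ValueError("the repeat needs three or more consecutive uridines")
    pools = [V_BASES] if anchor == "V" else [V_BASES, N_BASES]
    return [
        f"({unit}){repeat_length}{''.join(combo)}"
        for combo in itertools.product(*pools)
    ]


print(anchored_primers(15, anchor="V"))        # 3 variants
print(len(anchored_primers(15, anchor="VN")))  # 12 variants
print(anchored_primers(12, anchor="V", unit="rU"))
```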
  • FIG. 16 depicts an example computing device (e.g., computing device 180 shown in FIG. 1A) for implementing system and methods described in reference to FIGs. 1-15.
  • the example computing device 180 is configured to perform the in silico steps of read alignment 215 and/or characterization 220.
  • Examples of a computing device can include a personal computer, desktop computer, laptop, server computer, a computing node within a cluster, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like.
  • FIG. 16 illustrates an example computing device 180 for implementing system and methods described in the figures above.
  • the computing device 180 includes at least one processor 1602 coupled to a chipset 1604.
  • the chipset 1604 includes a memory controller hub 1620 and an input/output (I/O) controller hub 1622.
  • a memory 1606 and a graphics adapter 1612 are coupled to the memory controller hub 1620, and a display 1618 is coupled to the graphics adapter 1612.
  • a storage device 1608, an input interface 1614, and network adapter 1616 are coupled to the I/O controller hub 1622.
  • Other embodiments of the computing device 180 have different architectures.
  • the storage device 1608 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device.
  • the memory 1606 holds instructions and data used by the processor 1602.
  • the input interface 1614 is a touch-screen interface, a mouse, track ball, or other type of input interface, a keyboard, or some combination thereof, and is used to input data into the computing device 180.
  • the computing device 180 may be configured to receive input (e.g., commands) from the input interface 1614 via gestures from the user.
  • the graphics adapter 1612 displays images and other information on the display 1618. For example, the display 1618 can show metrics pertaining to the generated libraries (e.g., DNA or RNA libraries) and/or any characterization of single cells.
  • the network adapter 1616 couples the computing device 180 to one or more computer networks.
  • the computing device 180 is adapted to execute computer program modules for providing functionality described herein.
  • module refers to computer program logic used to provide the specified functionality.
  • a module can be implemented in hardware, firmware, and/or software.
  • program modules are stored on the storage device 1608, loaded into the memory 1606, and executed by the processor 1602.
  • a computing device 180 can vary from the embodiments described herein.
  • the computing device 180 can lack some of the components described above, such as graphics adapters 1612, input interface 1614, and displays 1618.
  • a computing device 180 can include a processor 1602 for executing instructions stored on a memory 1606.
  • a non-transitory machine-readable storage medium such as one described above, is provided, the medium comprising a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and execution and results of this invention.
  • Such data can be used for a variety of purposes, such as patient monitoring, treatment considerations, and the like.
  • Embodiments of the methods described above can be implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), a graphics adapter, an input interface, a network adapter, at least one input device, and at least one output device.
  • a display is coupled to the graphics adapter.
  • Program code is applied to input data to perform the functions described above and generate output information.
  • the output information is applied to one or more output devices, in known fashion.
  • the computer can be, for example, a personal computer, microcomputer, or workstation of conventional design.
  • Each program can be implemented in a high level procedural or object oriented programming language to communicate with a computer system.
  • the programs can be implemented in assembly or machine language, if desired.
  • the language can be a compiled or interpreted language.
  • Each such computer program is preferably stored on a storage medium or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein.
  • the system can also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • the signature patterns and databases thereof can be provided in a variety of media to facilitate their use.
  • Media refers to a manufacture that contains the signature pattern information of the present invention.
  • the databases of the present invention can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer.
  • Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
  • Recorded refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.
  • kits for performing single cell analysis of RNA transcripts and genomic DNA of individual or populations of cells may include one or more of the following: fluids for forming emulsions (e.g., carrier phase, aqueous phase), barcoded beads, microfluidic devices for processing single cells, reagents for lysing cells and releasing cell analytes, reaction mixtures for performing nucleic acid amplification reactions, and instructions for using any of the kit components according to the methods described herein.
  • kits include digestible primers that can be used for performing reverse transcription of RNA transcripts as well as agents for digesting the digestible primers to prevent the involvement of the digestible primers in subsequent reactions, such as nucleic acid amplification reactions.
  • cells are treated in situ to perform reverse transcription, followed by single cell DNA and RNA sequencing.
  • cells are treated in situ to perform reverse transcription (RT).
  • cells are fixed and permeabilized, reagents for reverse transcription are provided, and cells are incubated within a suitable temperature range for RT to occur.
  • Cells are washed and loaded into a single cell analysis system (e.g., Tapestri® single cell analysis system) for single cell encapsulation. Barcoding PCR is performed in emulsified droplets for both cDNA and DNA.
  • RNA and DNA sequencing libraries are prepared from these amplified sequences.
  • fixation step the following exemplary fixatives and concentration ranges (in parentheses) are used for cell fixation:
  • SPDP succinimidyl 3-(2-pyridyldithio)propionate
  • Incubation temperature is at any of the following:
  • any of the following reagents are used: Tween-20 (0.01-1%), Triton X-100 (0.01-1%), or saponin (0.01-1%).
  • no extra permeabilizing reagent is used, relying on some cell permeabilization caused by the cell fixation step.
  • Cell permeabilization takes place at an incubation temperature of any of the following: on ice, 4C, 10C, 20C, room temperature, 25C, or 37C.
  • the incubation duration for cell permeabilization takes place for any of the following: 1 min, 3 min, 5 min, 10 min, 15 min, 20 min, or 30 min.
  • this step can be performed concurrently with the following step (reverse transcription). In this case, the reaction proceeds to the next step without extra incubation time.
  • RT primers include any of:
  • the target RNA can be a fusion/translocated RNA, a transcribed gene, non-coding RNA, nuclear RNA, or any other RNA products.
  • the number of RNA targets and therefore specific RT primers can be from 1 to 20,000,
  • the RT primer sequence includes any of:
  • a unique sequence for pull down or isolation purposes, e.g., for use with a biotin-streptavidin pull down method.
  • Reverse transcription is performed with or without the permeabilization reagent at any of the following temperatures: 25C, 30C, 37C, 40C, 50C, 55C, or 60C.
  • Incubation time for RT is any of the following: 5 min, 10 min, 15 min, 20 min, 30 min, 40 min, or 45 min.
  • cells are washed by adding cell wash buffer, centrifuging at 300 to 400 x g and removing the cell wash buffer without disturbing the cells.
  • Cells are washed in any of DPBS, DPBS + 0.5% FBS, DPBS + 1% FBS, DPBS + 2.5% FBS, or DPBS + 5% FBS.
  • Volume of wash buffer is any of 0.5 mL, 1 mL, 1.5 mL, 2 mL, 5 mL, 10 mL, or 15 mL.
  • Cells are washed at any of the following temperatures: 4C, 10C, 15C, 20C, 25C, or 37C. Cells are washed 1 to 5 times. Alternatively cells are pelleted to remove the RT reagent without extra washing.
  • Cells that have been fixed, permeabilized, reverse transcribed, and washed are resuspended and provided to a single cell workflow (e.g., Tapestri® platform).
  • Cells are encapsulated in droplets along with cell lysis reagent, protease (e.g., proteinase K), and reverse primers for DNA targets.
  • Reverse primers for cDNAs RNA targets
  • forward primers for DNA and cDNA targets are added. PCR is performed to amplify these targets. In some iterations, both forward and reverse primers for the cDNAs are added in this step. In some iterations, only the forward primers are used for the cDNA targets.
  • the amplicons are processed to generate sequenceable libraries by adding appropriate adaptors for intended sequencing platform.
  • the RNA target amplicons are processed together with the DNA targets.
  • the RNA target amplicons are separated from the DNA targets and processed separately. This is achieved by performing separation by size or by biotin-streptavidin pull down.
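The in situ reverse transcription and wash conditions listed above can be captured as a small configuration check. The following is a minimal sketch, not part of the disclosed methods: the class name `InSituRTSettings` and its fields are hypothetical, and only the numeric options (RT temperatures, RT incubation times, centrifugation speed, wash count) are taken from the conditions listed above.

```python
from dataclasses import dataclass

# Option lists copied from the conditions listed above.
RT_TEMPERATURES_C = (25, 30, 37, 40, 50, 55, 60)
RT_DURATIONS_MIN = (5, 10, 15, 20, 30, 40, 45)
WASH_SPEED_G = (300, 400)  # inclusive range, in x g
WASH_COUNT = (0, 5)        # 1 to 5 washes; 0 stands for pelleting without extra washing


@dataclass(frozen=True)
class InSituRTSettings:
    """Hypothetical container for one chosen set of in situ RT conditions."""
    rt_temperature_c: int
    rt_duration_min: int
    wash_speed_g: int
    wash_count: int

    def problems(self):
        """Return a list of deviations from the listed options (empty if none)."""
        issues = []
        if self.rt_temperature_c not in RT_TEMPERATURES_C:
            issues.append(f"RT temperature {self.rt_temperature_c}C is not a listed option")
        if self.rt_duration_min not in RT_DURATIONS_MIN:
            issues.append(f"RT duration {self.rt_duration_min} min is not a listed option")
        if not WASH_SPEED_G[0] <= self.wash_speed_g <= WASH_SPEED_G[1]:
            issues.append(f"wash speed {self.wash_speed_g} x g is outside 300 to 400 x g")
        if not WASH_COUNT[0] <= self.wash_count <= WASH_COUNT[1]:
            issues.append(f"wash count {self.wash_count} is outside the listed range")
        return issues


settings = InSituRTSettings(rt_temperature_c=37, rt_duration_min=30,
                            wash_speed_g=350, wash_count=2)
print(settings.problems())
```

A check of this kind only confirms that a chosen condition is among the listed options; it does not indicate which option performs best for a given sample.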

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed herein are methods and apparati for single-cell analysis of analytes of cells through the implementation of molecular tags. Generally, the incorporation of molecular tags enables the distinguishing of the analytes of the cell. For example, analytes of the cell refer to DNA, RNA, or protein of a single cell and therefore, different DNA, RNA, or protein molecules of a cell can be detected, distinguished, and/or quantified.

Description

METHODS OF MOLECULAR TAGGING FOR SINGLE-CELL ANALYSIS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/301,706 filed January 21, 2022, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND
[0002] Single cell analysis methodologies involve analyzing analytes of single cells to characterize single cells (e.g., for disease). Prior methodologies have focused on utilizing barcodes, such as unique molecular identifiers (UMIs), for purposes of uniquely identifying analytes that are present in single cells. However, these prior methodologies are often inefficient or sub-optimal. This can be attributed to various factors, examples of which include reaction inefficiencies arising from undesired interactions between reagents for reverse transcription and reagents for nucleic acid amplification, narrow experimental conditions that are sub-optimal for performing reactions in-droplet, and poor amplification efficiency due to secondary structure effects of long amplicons. Furthermore, these prior methodologies can often be resource-intensive, e.g., time-intensive and expensive (e.g., due to needing to sequence long amplicons including gene sequences of interest). Thus, improved methodologies for analyzing single cell analytes are needed.
SUMMARY
[0003] The disclosure generally relates to methods and apparati for single-cell analysis of analytes of cells through the implementation of molecular tags. In various embodiments, analytes of the cell refer to nucleic acids of the cell. In various embodiments, analytes of the cell refer to genomic DNA of the cell. In various embodiments, analytes of the cell refer to RNA of the cell. In various embodiments, analytes of the cell refer to both DNA and RNA of the cell. In various embodiments, analytes of the cell refer to protein analytes of the cell. Generally, the incorporation of molecular tags enables the distinguishing of the analytes of the cell. In particular embodiments, the incorporation of molecular tags enables distinguishing amplicons derived from a first analyte, such as a first RNA analyte, and amplicons derived from a second analyte, such as a second RNA analyte. Thus, the incorporation of molecular tags may be useful for analyzing analytes of the cell, such as for purposes of quantifying the number of different RNA analytes.
[0004] Disclosed herein is a method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate using the reagents, the cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate; and sequencing the generated amplicons to analyze RNA of the cell, wherein oligonucleotides comprising molecular tags are included in either the reagents in the droplet or in the reactants in the second droplet. In various embodiments, the reagents comprise oligonucleotides comprising molecular tags. In various embodiments, the cDNA comprises the oligonucleotides comprising molecular tags. In various embodiments, the oligonucleotides comprising molecular tags further comprise reverse transcription (RT) primers or template switching oligonucleotides (TSOs). In various embodiments, for each of the oligonucleotides comprising molecular tags, a molecular tag is located at an end of a RT primer or TSO.
[0005] In various embodiments, the oligonucleotides comprising molecular tags further comprise RT primers, and wherein generating the cell lysate comprising cDNA derived from RNA of the cell comprises: providing a RT primer to the RNA of the cell; and performing reverse transcription to generate the cDNA comprising the RT primer and the molecular tag. In various embodiments, the oligonucleotides comprising molecular tags further comprise RT primers comprising one or more ribonucleotides. In various embodiments, the oligonucleotides comprising molecular tags further comprise RT primers comprising one or more uracil. In various embodiments, generating amplicons comprising sequences derived from the oligonucleotides comprising molecular tags comprises: digesting the oligonucleotides comprising molecular tags; and performing nucleic acid amplification to generate amplicons comprising sequences derived from the oligonucleotides comprising molecular tags. In various embodiments, digesting the oligonucleotides comprises exposing the oligonucleotides to a ribonuclease or uracil-DNA glycosylase (UDG).
[0006] In various embodiments, methods disclosed herein further comprise: prior to digesting the oligonucleotides comprising molecular tags: providing a primer to the cDNA; and extending the primer to generate a sequence complementary to the oligonucleotide comprising a molecular tag. In various embodiments, generating amplicons comprising sequences derived from the oligonucleotides comprising molecular tags further comprises incorporating cellular barcodes into the amplicons, wherein the cellular barcodes identify the cell from which the amplicons originate from. In various embodiments, the oligonucleotides comprising molecular tags further comprise TSOs, and wherein generating the cell lysate comprising cDNA derived from RNA of the cell comprises: performing reverse transcription to generate a first cDNA molecule; performing template switching by providing a TSO to the first cDNA molecule; and generating the cDNA from the hybridized TSO and the first cDNA molecule. In various embodiments, the TSO comprises a sequence that hybridizes with untemplated cytosine nucleotides of the first cDNA molecule. In various embodiments, the TSO comprises a rGrGrG sequence that hybridizes with three untemplated cytosine nucleotides of the first cDNA molecule. In various embodiments, the reactants comprise oligonucleotides comprising molecular tags. In various embodiments, the oligonucleotides comprising molecular tags further comprise forward primers. In various embodiments, the oligonucleotides comprising molecular tags further comprise reverse primers.
[0007] In various embodiments, generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate further comprises: providing the forward primers of the oligonucleotides to the cDNA of the cell lysate; and extending the forward primers to generate sequences that incorporate the molecular tags of the oligonucleotides. In various embodiments, generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate further comprises: performing nucleic acid amplification to generate amplicons comprising sequences derived from the oligonucleotides comprising molecular tags. In various embodiments, the forward primers are gene specific primers. In various embodiments, performing nucleic acid amplification to generate amplicons further comprises incorporating cellular barcodes into the amplicons, wherein the cellular barcodes identify the cell from which the amplicons originate from.
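To illustrate how amplicons carrying both a cellular barcode and a molecular tag could be used to quantify RNA, the following minimal sketch counts distinct molecular tags per cell and gene. It assumes that each sequence read has already been parsed into a (cell barcode, gene, molecular tag) triple; the function name, input format, and example sequences are hypothetical and are not specified by this disclosure.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular tags for each (cell barcode, gene) pair.

    reads: iterable of (cell_barcode, gene, molecular_tag) tuples,
    one tuple per sequenced amplicon (hypothetical pre-parsed input).
    """
    tags = defaultdict(set)
    for cell_barcode, gene, molecular_tag in reads:
        # Reads sharing a tag within the same cell and gene are treated as
        # copies of one original molecule, so the tag is recorded once.
        tags[(cell_barcode, gene)].add(molecular_tag)
    return {key: len(tag_set) for key, tag_set in tags.items()}


reads = [
    ("CELL01", "GENE_A", "ACGTACG"),
    ("CELL01", "GENE_A", "ACGTACG"),  # amplification duplicate of the read above
    ("CELL01", "GENE_A", "TTGCAAC"),
    ("CELL02", "GENE_A", "ACGTACG"),  # same tag, different cell: counted separately
]
counts = count_molecules(reads)
print(counts)
```

The sketch ignores sequencing errors in the tags; a practical pipeline would likely merge tags that differ by a small number of bases before counting.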
[0008] Additionally disclosed herein is a method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons comprising sequences derived from the cDNA of the cell lysate; breaking the second droplet to obtain the generated amplicons in bulk; providing oligonucleotides comprising molecular tags to the generated amplicons in bulk; and sequencing at least the oligonucleotides comprising molecular tags to analyze RNA of the cell. In various embodiments, generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises: providing oligonucleotides comprising primers and cellular barcodes to the cDNA of the cell lysate, wherein the cellular barcodes identify the cell; and extending the primers to generate sequences that incorporate the cellular barcodes. In various embodiments, generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises performing nucleic acid amplification. In various embodiments, the primers are gene specific primers. In various embodiments, generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises: providing oligonucleotides comprising primers and gene tags; extending the primers to generate amplicons that further incorporate the gene tags.
[0009] In various embodiments, methods disclosed herein further comprise: hybridizing the provided oligonucleotides comprising molecular tags with the amplicons that incorporate the gene tags; generating nucleic acid sequences by extending the hybridized oligonucleotides comprising molecular tags, wherein the generated nucleic sequences comprise molecular tags and gene tags. In various embodiments, methods disclosed herein further comprise: sequencing gene tags of the nucleic acid sequences to analyze RNA of the cell. In various embodiments, the nucleic acid sequences do not include the cDNA of the cell lysate or sequences derived from the cDNA of the cell lysate. In various embodiments, sequencing at least the oligonucleotides comprising molecular tags and sequencing gene tags of the nucleic acid sequences do not include sequencing cDNA of the cell lysate or sequences derived from the cDNA of the cell lysate.
[0010] Additionally disclosed herein is a method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants, wherein the reactants comprise oligonucleotides comprising one or more universal bases; within the second droplet, generating amplicons comprising molecular tags derived from oligonucleotides comprising one or more universal bases using the reactants and the cDNA of the cell lysate or complements of the cDNA, wherein amplicons from different cDNA are distinguishable according to the molecular tags derived from oligonucleotides comprising one or more universal bases; and sequencing the generated amplicons to analyze RNA of the cell, wherein oligonucleotides comprising one or more universal bases are included in either the reagents in the droplet or in the reactants in the second droplet. In various embodiments, generating amplicons comprising sequences derived from oligonucleotides comprising one or more universal bases using the reactants and the cDNA of the cell lysate comprises: performing a first cycle of nucleic acid amplification to incorporate the oligonucleotides comprising one or more universal bases; and performing a second cycle of nucleic acid amplification to generate the amplicons, wherein the molecular tags are generated within the amplicons during the second cycle of nucleic acid amplification.
[0011] In various embodiments, the molecular tags are generated within the amplicons during the second cycle of nucleic acid amplification by polymerases that generate strands complementary to the one or more universal bases of the oligonucleotides. In various embodiments, the oligonucleotides comprising one or more universal bases comprise two or more consecutive universal bases. In various embodiments, the oligonucleotides comprising one or more universal bases comprise three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive universal bases. In various embodiments, each of the universal bases is independently any one of an inosine base, 2'-DeoxyNebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-Nitropyrrole. In various embodiments, each molecular tag differs from other molecular tags. In various embodiments, at least one molecular tag has a same sequence as another molecular tag. In various embodiments, at least 0.1% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 0.5% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 1% of molecular tags have a same sequence as another molecular tag. In various embodiments, each of the molecular tags independently comprises any one of three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty nucleotide bases. In various embodiments, each of the molecular tags independently comprises either seven or eight nucleotide bases.
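The relationship between tag length and the fraction of molecular tags that share a sequence with another tag can be estimated with a short calculation. The sketch below assumes that each tag is drawn uniformly at random from the four standard bases, which is an idealization not stated in this disclosure (polymerases may pair universal bases with a bias). Under that assumption, a tag of length L has 4^L possible sequences, and a given tag among N molecules is shared with at least one other with probability 1 - (1 - 1/4^L)^(N-1).

```python
def shared_tag_fraction(num_molecules, tag_length, alphabet_size=4):
    """Expected fraction of tags that match at least one other tag.

    Assumes each of the num_molecules tags is drawn independently and
    uniformly from alphabet_size ** tag_length possible sequences.
    """
    num_tags = alphabet_size ** tag_length
    return 1.0 - (1.0 - 1.0 / num_tags) ** (num_molecules - 1)


# Compare seven- and eight-base tags for 100 molecules of the same target.
for length in (7, 8):
    fraction = shared_tag_fraction(num_molecules=100, tag_length=length)
    print(f"tag length {length}: {4 ** length} sequences, "
          f"shared fraction {fraction:.4%}")
```

Longer tags reduce the shared fraction for a fixed number of molecules, at the cost of additional sequenced bases per amplicon.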
[0012] Additionally disclosed herein is a method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA with different start and stop sites derived from RNA of the cell that have been differentially cleaved or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons using the reactants and the cDNA of the cell lysate, wherein the amplicons are derived from the cDNA with different start and stop sites; and sequencing the generated amplicons to analyze RNA of the cell. In various embodiments, the RNA of the cell has been differentially cleaved by an RNase included in the reagents. In various embodiments, the reagents further comprise a plurality of truncation oligonucleotides, wherein the plurality of truncation oligonucleotides comprise DNA nucleobases. In various embodiments, generating the cell lysate comprising cDNA with different start and stop sites derived from RNA of the cell that have been differentially cleaved further comprises: hybridizing the plurality of truncation oligonucleotides to RNA of the cell; and differentially cleaving the RNA of the cell at locations where the plurality of truncation oligonucleotides are hybridized to RNA of the cell. In various embodiments, the different start and stop sites include at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, or at least 10^9 different start and stop sites. In various embodiments, sequencing the generated amplicons to analyze RNA of the cell comprises: identifying amplicons with sequences corresponding to the same start and stop site; and correlating the identified amplicons with sequences corresponding to the same start and stop site to a common RNA of the cell.
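The step of identifying amplicons with the same start and stop site and correlating them to a common RNA can be sketched as a grouping operation. The example below is a minimal illustration under stated assumptions: reads are assumed to be already aligned and annotated with a cell barcode, a gene, and start and stop coordinates, and the function names and record layout are hypothetical.

```python
from collections import defaultdict


def group_by_fragment_ends(alignments):
    """Group read identifiers by (cell barcode, gene, start, stop).

    alignments: iterable of (read_id, cell_barcode, gene, start, stop)
    tuples (hypothetical pre-aligned input).
    """
    groups = defaultdict(list)
    for read_id, cell_barcode, gene, start, stop in alignments:
        groups[(cell_barcode, gene, start, stop)].append(read_id)
    return dict(groups)


def molecules_per_gene(groups):
    """Count distinct start/stop pairs per (cell barcode, gene)."""
    counts = defaultdict(int)
    for cell_barcode, gene, _start, _stop in groups:
        counts[(cell_barcode, gene)] += 1
    return dict(counts)


alignments = [
    ("r1", "CELL01", "GENE_A", 120, 310),
    ("r2", "CELL01", "GENE_A", 120, 310),  # same ends: same original RNA
    ("r3", "CELL01", "GENE_A", 95, 288),   # different ends: different RNA
    ("r4", "CELL02", "GENE_A", 120, 310),  # different cell
]
groups = group_by_fragment_ends(alignments)
per_gene = molecules_per_gene(groups)
print(per_gene)
```

Here the start and stop coordinates play the role that a molecular tag sequence plays in the other embodiments; exact coordinate matching is a simplification, and a tolerance of a few bases could be substituted.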
[0013] Additionally disclosed herein is a method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, performing nucleic acid amplification to generate amplicons comprising nucleotide bases derived from alternate bases introduced in one or both of the reagents or reactants, wherein the alternate bases are propagated through one or more cycles of the nucleic acid amplification; and sequencing the generated amplicons comprising nucleotide bases derived from alternate bases to analyze RNA of the cell. In various embodiments, alternate bases are introduced in the reagents, and wherein generating the cell lysate comprises incorporating alternate bases into the cDNA or complements of the cDNA. In various embodiments, generating the cell lysate comprises incorporating alternate bases into the cDNA during reverse transcription. In various embodiments, the alternate bases are incorporated into the cDNA or complements of the cDNA in a random manner. In various embodiments, performing nucleic acid amplification to generate amplicons comprising nucleotide bases derived from alternate bases comprises amplifying the cDNA comprising the incorporated alternate bases. In various embodiments, alternate bases are introduced in the reactants, and wherein performing nucleic acid amplification comprises incorporating alternate bases during a first cycle of the nucleic acid amplification. In various embodiments, the alternate bases are incorporated during the first cycle of the nucleic acid amplification in a random manner. In various embodiments, additional alternate bases are randomly incorporated in the one or more cycles of the nucleic acid amplification. 
In various embodiments, the alternate bases comprise any one of an inosine base, 2'-DeoxyNebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-Nitropyrrole. In various embodiments, sequencing the generated amplicons comprising common alternate bases to analyze RNA of the cell comprises: identifying one or more sequence reads of amplicons comprising a plurality of common alternate bases; and assigning the identified one or more sequence reads of amplicons to an RNA of the cell. In various embodiments, generating the cell lysate using the reagents comprises releasing genomic DNA (gDNA) of the cell such that the cell lysate comprises the gDNA. In various embodiments, releasing gDNA of the cell comprises exposing the cell to proteinase K. In various embodiments, methods disclosed herein further comprise: within the second droplet, generating amplicons derived from the released gDNA of the cell lysate using the reactants; and sequencing the generated amplicons derived from the released gDNA. In various embodiments, generating amplicons derived from the released gDNA comprises incorporating cellular barcodes into amplicons derived from the released gDNA using the reactants, wherein the cellular barcodes identify the cell from which the amplicons originate from.
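Assigning sequence reads that share a plurality of common alternate bases to one RNA can be illustrated as grouping reads by their pattern of differences from a reference sequence. The sketch below assumes that alternate bases appear in the reads as substitutions relative to a known reference, that reads are aligned to and the same length as that reference, and that sequencing errors are absent; the function names and example sequences are hypothetical.

```python
from collections import defaultdict


def mismatch_signature(read, reference):
    """Return the (position, base) pairs where read differs from reference."""
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    return tuple(
        (position, base)
        for position, (base, ref_base) in enumerate(zip(read, reference))
        if base != ref_base
    )


def group_reads_by_signature(reads, reference):
    """Group reads that carry an identical set of substitutions."""
    groups = defaultdict(list)
    for read in reads:
        groups[mismatch_signature(read, reference)].append(read)
    return dict(groups)


reference = "ACGTACGT"
reads = [
    "ACGTGCGT",  # substitution at position 4
    "ACGTGCGT",  # same substitution: assigned to the same original molecule
    "ATGTACGT",  # substitution at position 1: a different original molecule
]
signature_groups = group_reads_by_signature(reads, reference)
print(len(signature_groups))
```

Because additional alternate bases may be incorporated in later amplification cycles, a practical implementation would likely group reads that share a subset of substitutions rather than requiring identical signatures.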
[0014] Additionally disclosed herein is a method for analyzing RNA of a cell, the method comprising: performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell; encapsulating, within a first droplet, the cell comprising the nucleic acid derived from RNA of the cell and reagents; further encapsulating the nucleic acid derived from RNA in a second droplet with reactants; within the second droplet, performing nucleic acid amplification to incorporate a cell barcode into amplicons using the reactants, wherein the amplicons comprise the one or more molecular tags, the cell barcode, and a gene specific sequence of the RNA or a complement thereof; and sequencing the generated amplicons to analyze RNA of the cell. In various embodiments, performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a primer comprising a sequence complementary to a sequence of the RNA of the cell and a molecular tag; providing a gene specific sequence comprising a sequence complementary to a second sequence of the RNA of the cell; and ligating the gene specific sequence and the primer to generate the nucleic acid molecule comprising the molecular tag. In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA. In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence. In various embodiments, the sequence of the RNA and the second sequence of the RNA are adjacent to one another. 
In various embodiments, performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a first primer comprising a sequence complementary to a sequence of the RNA of the cell; providing a second primer comprising a sequence complementary to a second sequence of the RNA of the cell; and ligating the first primer and the second primer to generate the nucleic acid molecule comprising the molecular tag, wherein one or both of the first primer and the second primer comprise a molecular tag.
[0015] In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA. In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence. In various embodiments, the sequence of the RNA and the second sequence of the RNA are adjacent to one another. In various embodiments, the first primer or the second primer further comprises a constant region. In various embodiments, performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a primer comprising a sequence complementary to a sequence of the RNA of the cell and a molecular tag; using the primer, reverse transcribing the nucleic acid derived from RNA of the cell. In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA. In various embodiments, the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence. In various embodiments, methods disclosed herein further comprise: subsequent to encapsulating the cell comprising the nucleic acid derived from RNA of the cell and reagents in the first droplet, releasing genomic DNA from the cell using the reagents. In various embodiments, the nucleic acid derived from RNA of the cell comprises a constant region with a primer annealing temperature that differs from a primer annealing temperature of the released genomic DNA. In various embodiments, the primer annealing temperature of the constant region is lower than the primer annealing temperature of the released genomic DNA. 
In various embodiments, performing nucleic acid amplification to incorporate a cell barcode comprises performing nucleic acid amplification cycles at an annealing temperature of the genomic DNA to preferentially amplify the genomic DNA in comparison to the nucleic acid derived from RNA of the cell.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0016] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, and accompanying drawings, where:
[0017] Figure (FIG.) 1A shows an overall system environment for analyzing cell(s) through a single cell workflow analysis, in accordance with an embodiment.
[0018] FIG. 1B depicts a single cell workflow analysis to generate amplicons for sequencing, in accordance with an embodiment.
[0019] FIG. 2 is a flow process for analyzing nucleic acid sequences derived from analytes of the single cell, in accordance with an embodiment.
[0020] FIGs. 3A-3C depict the processing and releasing of analytes of a single cell in a droplet, in accordance with an embodiment.
[0021] FIG. 4A depicts the processing of RNA and gDNA in a droplet, in accordance with a first embodiment using digestible ribonucleotides.
[0022] FIG. 4B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4A.
[0023] FIG. 4C depicts the processing of RNA and gDNA in a droplet, in accordance with a second embodiment using digestible uracils.
[0024] FIG. 4D depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4C.
[0025] FIG. 4E depicts the processing of RNA and gDNA in a droplet, in accordance with a third embodiment using digestible primers.
[0026] FIGs. 4F and 4G depict the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the third embodiment shown in FIG. 4E.
[0027] FIG. 5A depicts the processing of RNA in a droplet, in accordance with an embodiment using template switching oligonucleotides.
[0028] FIG. 5B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 5A.
[0029] FIG. 6A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using gene specific primers comprising molecular tags.
[0030] FIG. 6B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 6A.
[0031] FIG. 7A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment in which molecular tags are introduced in bulk.
[0032] FIG. 7B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 7A.
[0033] FIG. 7C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 7A and 7B.
[0034] FIG. 8A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating molecular tags and gene tags.
[0035] FIG. 8B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 8A.
[0036] FIG. 8C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 8A and 8B.
[0037] FIG. 9A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment incorporating universal bases.
[0038] FIG. 9B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 9A.
[0039] FIG. 10 depicts the processing of RNA in a droplet, in accordance with an embodiment involving differentially cleaving RNA.
[0040] FIG. 11A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating alternate bases.
[0041] FIG. 11B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 11A.
[0042] FIG. 12A depicts the in situ processing of RNA, in accordance with a first embodiment.
[0043] FIG. 12B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 12A.
[0044] FIG. 13A depicts the in situ processing of RNA, in accordance with a second embodiment.
[0045] FIG. 13B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 13A.
[0046] FIG. 14A depicts the in situ processing of RNA, in accordance with a third embodiment.
[0047] FIGs. 14B and 14C depict the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 14A.
[0048] FIG. 14D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 14A-14C.
[0049] FIG. 15A depicts the in situ processing of RNA involving reverse transcription, in accordance with an embodiment.
[0050] FIG. 15B depicts the in situ processing of RNA involving reverse transcription, in accordance with a second embodiment.
[0051] FIG. 15C depicts the in droplet processing (e.g., encapsulation and barcoding), in accordance with an embodiment.
[0052] FIG. 15D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 15A-15C.
[0053] FIG. 16 depicts an example computing device for implementing systems and methods described in reference to the above figures.
DETAILED DESCRIPTION
Definitions
[0054] Terms used in the claims and specification are defined as set forth below unless otherwise specified.
[0055] The terms “subject” and “patient” are used interchangeably and encompass an organism, human or non-human, mammal or non-mammal, male or female.
[0056] The term “sample” or “test sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, such as a blood sample, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art.
[0057] The term “analyte” refers to a component of a cell. Cell analytes can be informative for characterizing a cell. Therefore, performing single-cell analysis of one or more analytes of a cell using the systems and methods described herein is informative for determining a state or behavior of a cell. Examples of an analyte include a nucleic acid (e.g., RNA, DNA, cDNA), a protein, a peptide, an antibody, an antibody fragment, a polysaccharide, a sugar, a lipid, a small molecule, or combinations thereof. In particular embodiments, a single-cell analysis involves analyzing RNA. In particular embodiments, a single-cell analysis involves analyzing two different analytes such as RNA and DNA. In particular embodiments, a single-cell analysis involves analyzing three or more different analytes of a cell, such as RNA, DNA, and protein.
[0058] In some embodiments, the discrete entities as described herein are droplets. The terms “emulsion,” "drop," "droplet," and "microdroplet" are used interchangeably herein to refer to small, generally spherical structures containing at least a first fluid phase, e.g., an aqueous phase (e.g., water), bounded by a second fluid phase (e.g., oil) which is immiscible with the first fluid phase. In some embodiments, droplets according to the present disclosure may contain a first fluid phase, e.g., oil, bounded by a second immiscible fluid phase, e.g., an aqueous phase fluid (e.g., water). In some embodiments, the second fluid phase will be an immiscible phase carrier fluid. Thus droplets according to the present disclosure may be provided as aqueous-in-oil emulsions or oil-in-aqueous emulsions. Droplets may be sized and/or shaped as described herein for discrete entities. For example, droplets according to the present disclosure generally range from 1 μm to 1000 μm, inclusive, in diameter. Droplets according to the present disclosure may be used to encapsulate cells, nucleic acids (e.g., DNA), enzymes, reagents, reaction mixture, and a variety of other components. The term emulsion may be used to refer to an emulsion produced in, on, or by a microfluidic device and/or flowed from or applied by a microfluidic device.
[0059] "Complementarity" or “complementary” refers to the ability of a nucleic acid to form hydrogen bond(s) or hybridize with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types of pairing. As used herein, "hybridization" refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under low, medium, or highly stringent conditions, including when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA. See, e.g., Ausubel, et al., Current Protocols In Molecular Biology, John Wiley & Sons, New York, N.Y., 1993. If a nucleotide at a certain position of a polynucleotide is capable of forming a Watson-Crick pairing with a nucleotide at the same position in an antiparallel DNA or RNA strand, then the polynucleotide and the DNA or RNA molecule are complementary to each other at that position. The polynucleotide and the DNA or RNA molecule are "substantially complementary" to each other when a sufficient number of corresponding positions in each molecule are occupied by nucleotides that can hybridize or anneal with each other in order to effect the desired process. A complementary sequence is a sequence capable of annealing under stringent conditions to provide a 3' terminus serving as the origin of synthesis of a complementary chain.
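As a non-limiting illustration of the position-by-position definition above, the following Python sketch scores Watson-Crick pairing between two equal-length DNA strands aligned antiparallel. The function names and the 0.8 threshold for "substantially complementary" are hypothetical choices made for illustration only; the disclosure does not fix a numerical threshold, and non-traditional pairings are not modeled.

```python
# Watson-Crick partners for DNA bases (illustrative; RNA "U" is not handled here).
WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_fraction(strand_5to3: str, other_5to3: str) -> float:
    """Fraction of positions that pair when the two strands are aligned antiparallel."""
    if not strand_5to3 or len(strand_5to3) != len(other_5to3):
        raise ValueError("strands must be non-empty and of equal length")
    antiparallel = other_5to3[::-1]  # read the second strand 3'->5'
    paired = sum(WC_PAIR.get(a) == b for a, b in zip(strand_5to3, antiparallel))
    return paired / len(strand_5to3)


def is_substantially_complementary(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical cutoff: 'sufficient' pairing is taken as >= threshold."""
    return complementary_fraction(a, b) >= threshold
```

For example, "ATGC" and "GCAT" pair at every position, whereas "ATGC" and "GCAA" pair at three of four positions and fall below the illustrative threshold.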
[0060] The terms "amplify," "amplifying," "amplification reaction" and their variants, refer generally to any action or process whereby at least a portion of a nucleic acid molecule (referred to as a template nucleic acid molecule) is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule. The template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. In some embodiments, amplification includes a template-dependent in vitro enzyme-catalyzed reaction for the production of at least one copy of at least some portion of the nucleic acid molecule or the production of at least one copy of a nucleic acid sequence that is complementary to at least some portion of the nucleic acid molecule. Amplification optionally includes linear or exponential replication of a nucleic acid molecule. In some embodiments, such amplification is performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. At least some of the target sequences can be situated on the same nucleic acid molecule or on different target nucleic acid molecules included in the single amplification reaction. In some embodiments, "amplification" includes amplification of at least some portion of DNA- and RNA-based nucleic acids alone, or in combination. The amplification reaction can include single or double-stranded nucleic acid substrates and can further include any of the amplification processes known to one of ordinary skill in the art.
In some embodiments, the amplification reaction includes polymerase chain reaction (PCR). In some embodiments, the amplification reaction includes an isothermal amplification reaction such as LAMP. In the present invention, the terms "synthesis" and "amplification" of nucleic acid are used. The synthesis of nucleic acid in the present invention means the elongation or extension of nucleic acid from an oligonucleotide serving as the origin of synthesis. If not only this synthesis but also the formation of other nucleic acid and the elongation or extension reaction of this formed nucleic acid occur continuously, a series of these reactions is comprehensively called amplification. The polynucleic acid produced by the amplification technology employed is generically referred to as an "amplicon" or "amplification product."
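The distinction between linear and exponential replication noted above can be illustrated with a simple idealized model. The following Python sketch is offered only as an illustration; the function names and the per-cycle efficiency parameter are assumptions and do not describe any particular amplification chemistry of the disclosure.

```python
def linear_copies(templates: int, cycles: int) -> int:
    """Linear replication: only the original templates are copied in each cycle."""
    return templates * (1 + cycles)


def exponential_copies(templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Exponential replication: every molecule present is copied in each cycle.

    `efficiency` is an assumed per-cycle yield between 0 and 1 (1.0 = perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return templates * (1.0 + efficiency) ** cycles
```

Under this model, ten cycles starting from a single template yield 11 molecules with linear replication and 1024 molecules with ideal exponential replication.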
[0061] Any nucleic acid amplification method, such as a PCR-based assay (e.g., quantitative PCR (qPCR)) or an isothermal amplification, may be used to detect the presence of certain nucleic acids, e.g., genes of interest, present in discrete entities or one or more components thereof, e.g., cells encapsulated therein. Such assays can be applied to discrete entities within a microfluidic device or a portion thereof or any other suitable location. The conditions of such amplification or PCR-based assays may include detecting nucleic acid amplification over time and may vary in one or more ways.
[0062] A number of nucleic acid polymerases can be used in the amplification reactions utilized in certain embodiments provided herein, including any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Such nucleotide polymerization can occur in a template-dependent fashion. Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization. Optionally, the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases. Typically, the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases.
The term "polymerase" and its variants, as used herein, also includes fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide. In some embodiments, the second polypeptide can include a reporter enzyme or a processivity-enhancing domain. Optionally, the polymerase can possess 5' exonuclease activity or terminal transferase activity. In some embodiments, the polymerase can be optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture. In some embodiments, the polymerase can include a hot-start polymerase or an aptamer-based polymerase that optionally can be reactivated.
[0063] "Forward primer binding site" and "reverse primer binding site" refer to the regions on the template nucleic acid and/or the amplicon to which the forward and reverse primers bind. The primers act to delimit the region of the original template polynucleotide which is exponentially amplified during amplification. In some embodiments, additional primers may bind to the region 5' of the forward primer and/or reverse primers. Where such additional primers are used, the forward primer binding site and/or the reverse primer binding site may encompass the binding regions of these additional primers as well as the binding regions of the primers themselves. For example, in some embodiments, the method may use one or more additional primers which bind to a region that lies 5' of the forward and/or reverse primer binding region. Such a method was disclosed, for example, in WO0028082, which discloses the use of "displacement primers" or "outer primers."
[0064] The terms "nucleic acid," "polynucleotides," and "oligonucleotides" refer to biopolymers of nucleotides and, unless the context indicates otherwise, include modified and unmodified nucleotides, and both DNA and RNA, and modified nucleic acid backbones. For example, in certain embodiments, the nucleic acid is a peptide nucleic acid (PNA) or a locked nucleic acid (LNA). Typically, the methods as described herein are performed using DNA as the nucleic acid template for amplification. However, nucleic acid whose nucleotide is replaced by an artificial derivative or modified nucleic acid from natural DNA or RNA is also included in the nucleic acid of the present invention insofar as it functions as a template for synthesis of complementary chain. The nucleic acid of the present invention is generally contained in a biological sample. The biological sample includes animal, plant or microbial tissues, cells, cultures and excretions, or extracts therefrom. In certain aspects, the biological sample includes intracellular parasitic genomic DNA or RNA such as virus or mycoplasma. The nucleic acid may be derived from nucleic acid contained in said biological sample. For example, genomic DNA, or cDNA synthesized from mRNA, or nucleic acid amplified on the basis of nucleic acid derived from the biological sample, are preferably used in the described methods. Unless denoted otherwise, whenever an oligonucleotide sequence is represented, it will be understood that the nucleotides are in 5' to 3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, "T" denotes deoxythymidine, and "U" denotes uridine.
Oligonucleotides are said to have "5' ends" and "3' ends" because mononucleotides are typically reacted to form oligonucleotides via attachment of the 5' phosphate or equivalent group of one nucleotide to the 3' hydroxyl or equivalent group of its neighboring nucleotide, optionally via a phosphodiester or other suitable linkage.
[0065] The phrase “molecular tag” refers to an entity that allows for distinguishing between analytes of a cell, such as RNA analytes of a cell. For example, molecular tags are present in amplicons and, upon sequencing, enable the identification of the particular analytes (e.g., RNA analytes) from which the amplicons were generated. In various embodiments, a molecular tag is a short nucleotide sequence comprising between 3 and 20 nucleobases, between 5 and 18 nucleobases, between 6 and 15 nucleobases, or between 7 and 12 nucleobases. Molecular tags can be incorporated via a variety of methods, including via primers (e.g., digestible primers such as ribonucleotide or uracil primers, or gene specific primers), within a droplet or within bulk solution. In various embodiments, molecular tags need not be contiguous nucleotide sequences. For example, a molecular tag may be nucleobases located at variable positions within an amplicon. In such embodiments, the nucleobases may be alternate bases (e.g., bases different from reference bases, such as wild-type nucleotide bases) that are located at variable positions within the amplicon. In various embodiments, molecular tags are represented by sequences corresponding to different start and stop sites in RNA analytes which were differentially cleaved to create the start and stop sites. In various embodiments, molecular tags are represented by varying lengths of amplicons, which correspond to degenerate breaking sites on RNA analytes, such as manipulated or artificially induced breaking sites on RNA analytes. Thus, the differing sequences of amplicons corresponding to different start and stop sites enable identification of the particular analytes (e.g., RNA analytes) from which the amplicons were generated. In various embodiments, each molecular tag differs from all other molecular tags. In various embodiments, at least one molecular tag has a same sequence as another molecular tag.
In various embodiments, at least 0.1% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 0.5% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 1% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 2% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 3% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 4% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 5% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 6% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 7% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 8% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 9% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 10% of molecular tags have a same sequence as another molecular tag. In various embodiments, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, at least 20%, at least 21%, at least 22%, at least 23%, at least 24%, at least 25%, at least 26%, at least 27%, at least 28%, at least 29%, at least 30%, at least 31%, at least 32%, at least 33%, at least 34%, at least 35%, at least 36%, at least 37%, at least 38%, at least 39%, at least 40%, at least 41%, at least 42%, at least 43%, at least 44%, at least 45%, at least 46%, at least 47%, at least 48%, at least 49%, or at least 50% of molecular tags have a same sequence as another molecular tag.
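By way of a non-limiting sketch, the use of molecular tags to distinguish analytes after sequencing may be illustrated as follows. The Python code below assumes that each sequence read has already been parsed into a hypothetical (cell barcode, gene, molecular tag) record; that record layout and the function name are illustrative assumptions and are not prescribed by the disclosure. Reads that share a cell, gene, and molecular tag are treated as amplicons of the same original analyte.

```python
from collections import defaultdict


def count_tagged_molecules(reads):
    """Count distinct molecular tags per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, gene, molecular_tag) tuples,
    a hypothetical pre-parsed representation of sequence reads.
    """
    tags_seen = defaultdict(set)
    for cell_barcode, gene, molecular_tag in reads:
        # Duplicate tags collapse in the set, so amplification copies count once.
        tags_seen[(cell_barcode, gene)].add(molecular_tag)
    return {key: len(tags) for key, tags in tags_seen.items()}
```

Where two different analytes happen to carry the same molecular tag sequence, as contemplated in the embodiments above, this simple count would treat them as one molecule; the sketch does not attempt to correct for such collisions.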
[0066] The phrase “gene tag” refers to an entity that allows for distinguishing between analytes of different gene targets. For example, gene tags can be sequences that are unique for a particular gene target and therefore, they enable differentiation between amplicons that derive from analytes of different genes. In various embodiments, gene tags can be sequences that enable differentiation between amplicons derived from the same gene. For example, gene tags can enable differentiation between amplicons derived from different RNA transcripts, such as different RNA transcripts from the same gene or different RNA transcripts from different genes. As another example, gene tags can enable differentiation between amplicons derived from different genomic DNA sequences, such as different genomic DNA sequences of the same gene or different genomic DNA sequences from different genes. In various embodiments, a sequence of each gene tag may differ from a sequence of other gene tags. In various embodiments, a gene tag is a short nucleotide sequence comprising between 3 and 20 nucleobases, between 5 and 18 nucleobases, between 6 and 15 nucleobases, or between 7 and 12 nucleobases. In various embodiments, gene tags can be incorporated with reverse primers that hybridize with RNA analytes for performing reverse transcription. Therefore, gene tags are incorporated into cDNA following reverse transcription.
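As one illustrative possibility, sequence reads may be assigned to gene targets by comparing an observed gene tag against a table of known gene tags. The Python sketch below assumes, purely for illustration, that the gene tag occupies the first bases of the read, that all gene tags have the same length, and that up to one mismatch is tolerated; the tag sequences and gene names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_gene(read: str, gene_tags: dict, max_mismatches: int = 1):
    """Return the gene whose tag uniquely matches the start of the read, else None.

    `gene_tags` maps tag sequence -> gene name; all tags are assumed equal length.
    """
    tag_len = len(next(iter(gene_tags)))
    observed = read[:tag_len]
    if len(observed) < tag_len:
        return None  # read too short to contain a full tag
    hits = [gene for tag, gene in gene_tags.items()
            if hamming(observed, tag) <= max_mismatches]
    # Ambiguous or absent matches are left unassigned.
    return hits[0] if len(hits) == 1 else None
```

A read beginning with a sequence that differs from a known gene tag at a single position is still assigned under these assumptions, while a read matching no tag, or more than one tag, is left unassigned.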
Overview
[0067] Figure (FIG.) 1A shows an overall system environment for analyzing cell(s) through a single cell workflow analysis, in accordance with an embodiment. Generally, the single cell workflow device 100 is configured to process the cell(s) 110 and generate sequence reads derived from individual cell(s) 110. Further details as to the processes of the single cell workflow device 100 are described below in reference to FIG. 1B. The computing device 180 can analyze the sequence reads, e.g., for purposes of building RNA/DNA libraries and/or characterizing individual cells. In various embodiments, the single cell workflow device 100 includes at least a microfluidic device that is configured to encapsulate cells with reagents to generate cell lysates comprising RNA and/or genomic DNA (gDNA), encapsulate cell lysates with reaction mixtures, and perform nucleic acid amplification reactions. For example, the microfluidic device can include one or more fluidic channels that are fluidically connected. Therefore, the combining of an aqueous fluid through a first channel and a carrier fluid through a second channel results in the generation of emulsion droplets. In various embodiments, the fluidic channels of the microfluidic device may have at least one cross-sectional dimension on the order of a millimeter or smaller (e.g., less than or equal to about 1 millimeter). Additional details of microchannel design and dimensions are described in International Patent Application No. PCT/US2016/016444 and US Patent Application No. 14/420,646, each of which is hereby incorporated by reference in its entirety. An example of a microfluidic device is the Tapestri™ Platform. While the instant disclosure provides a specific example, it is understood by one of ordinary skill in the art that the disclosed principles are not limited thereto and may be implemented independently of the Tapestri™, MiSeq™ and NovaSeq™ devices.
[0068] In various embodiments, the single cell workflow device 100 may also include one or more of (a) a temperature control module for controlling the temperature of one or more portions of the subject devices and/or droplets therein and which is operably connected to the microfluidic device(s), (b) a detection means, i.e., a detector, e.g., an optical imager, operably connected to the microfluidic device(s), (c) an incubator, e.g., a cell incubator, operably connected to the microfluidic device(s), and (d) a sequencer operably connected to the microfluidic device(s). The one or more temperature and/or pressure control modules provide control over the temperature and/or pressure of a carrier fluid in one or more flow channels of a device. As an example, a temperature control module may be one or more thermal cyclers that regulate the temperature for performing nucleic acid amplification. The one or more detection means, i.e., a detector, e.g., an optical imager, are configured for detecting the presence of one or more droplets, or one or more characteristics thereof, including their composition. In some embodiments, detection means are configured to recognize one or more components of one or more droplets, in one or more flow channels. The sequencer is a hardware device configured to perform sequencing, such as next generation sequencing. Examples of sequencers include Illumina sequencers (e.g., MiniSeq™, MiSeq™, NextSeq™ 550 Series, or NextSeq™ 2000), Roche sequencing system 454, and Thermo Fisher Scientific sequencers (e.g., Ion GeneStudio S5 system, Ion Torrent Genexus System).
[0069] Reference is now made to FIG. 1B, which depicts an embodiment of processing single cells to generate amplified nucleic acid molecules for sequencing. In various embodiments, the amplified nucleic acid molecules include molecular tags. Here, the processing of single cells can be performed by a single cell workflow device (e.g., the single cell workflow device 100 disclosed in FIG. 1A). Specifically, FIG. 1B depicts a workflow process including the steps of cell encapsulation 160, analyte release 165, cell barcoding 170, and target amplification 175 of target nucleic acid molecules.
[0070] Generally, the cell encapsulation step 160 involves encapsulating a single cell 110 with reagents 120 into a droplet. In various embodiments, the droplet is formed by partitioning aqueous fluid containing the cell 110 and reagents 120 into a carrier fluid (e.g., oil 115), thereby resulting in an aqueous fluid-in-oil emulsion. The droplet includes encapsulated cell 125 and the reagents 120. The encapsulated cell undergoes an analyte release at step 165. Generally, the reagents cause the cell to lyse, thereby generating a cell lysate 130 within the droplet. The cell lysate 130 includes the contents of the cell, which can include one or more different types of analytes (e.g., RNA transcripts, DNA, protein, lipids, or carbohydrates). In various embodiments, the different analytes of the cell lysate 130 can interact with reagents 120 within the droplet. In particular embodiments, reverse transcriptase in the reagents 120 can reverse transcribe cDNA molecules from RNA transcripts that are present in the cell lysate 130.
[0071] In particular embodiments, the reagents 120 include primers. The primers are useful for conducting a reaction, such as for conducting reverse transcription to generate cDNA. In particular embodiments, the primers include molecular tags and therefore, the molecular tags are incorporated into the cDNA following reverse transcription. In some embodiments, the primers are gene specific primers. In various embodiments, the primers are reverse primers that are capable of hybridizing to a portion of a nucleic acid, such as an RNA transcript. In such embodiments, the primers enable the reverse transcription of RNA transcripts to generate cDNA. Therefore, the reverse primers participate in the reverse transcription reaction, thereby generating cDNA molecules that incorporate molecular tags. In particular embodiments, the primers are digestible primers. For example, digestible primers can participate in the reverse transcription of RNA transcripts to generate cDNA, but are later digested such that the digestible primers do not participate in subsequent reactions involving the cDNA (e.g., amplification of cDNA). Further details on digestible primers are described below. In various embodiments, the reagents 120 include primers that further include gene tags. Gene tags can be sequences that are unique for a particular gene target and therefore, they enable differentiation between amplicons that derive from analytes of different genes.
[0072] In various embodiments, the reagents 120 include template switching oligonucleotides (TSOs). In particular embodiments, TSOs are DNA oligonucleotide sequences. A portion of the TSOs hybridizes with cDNA molecules, thereby enabling the subsequent template switching. In various embodiments, the TSOs include guanosines at their 3’ ends. In various embodiments, the TSOs include riboguanosines at their 3’ ends. In various embodiments, the TSOs include three riboguanosines (rGrGrG) at their 3’ ends, which hybridize with the 3’ dC extension of cDNA molecules. In various embodiments, TSOs include molecular tags. Therefore, the molecular tags of the TSOs are incorporated through the template switching.
[0073] In various embodiments, the reagents 120 include truncation oligonucleotides and cleaving enzymes, such as RNase. In various embodiments, the truncation oligonucleotides are DNA oligonucleotides that are complementary to specific portions of RNA sequences. Therefore, different truncation oligonucleotides can be designed and included in the reagents 120. In particular embodiments, the RNase is RNase H. Generally, RNase, such as RNase H, cleaves the RNA strand of RNA/DNA duplexes. Therefore, RNase H can cleave RNA at sites where truncation oligonucleotides hybridize with the RNA. Altogether, different truncation oligonucleotides can be designed to hybridize with different regions of different RNA analytes of the cell. For example, a first truncation oligonucleotide can be designed to hybridize at the 3’ end of a first RNA analyte of the cell whereas a second truncation oligonucleotide can be designed to hybridize with a portion of a second RNA analyte that is a number of positions away from the 3’ end of the second RNA analyte. Thus, RNase H differentially cleaves the different RNA analytes based on the location of the hybridized truncation oligonucleotides (e.g., within the hybridized region of the RNA/DNA duplex), thereby generating RNA analytes with different start and stop sites. As discussed further below, these RNA analytes with different sequences (e.g., corresponding to different start and stop sites) can lead to amplicons with different sequences, which serve as the molecular tags that enable distinguishing between different RNA analytes.
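The manner in which differently positioned truncation oligonucleotides yield RNA analytes with different stop sites may be sketched in simplified form as follows. This Python sketch is a toy model only: it assumes perfect complementarity between the DNA oligonucleotide and the RNA, and it places the cut at the midpoint of the duplex, which is an arbitrary modeling assumption and not a statement about where RNase H actually cleaves. The sequences and function names are hypothetical.

```python
# DNA base -> pairing RNA base
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_target_of(oligo_dna_5to3: str) -> str:
    """RNA sequence (5'->3') to which a DNA truncation oligonucleotide hybridizes."""
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(oligo_dna_5to3))


def cleave(rna_5to3: str, oligo_dna_5to3: str):
    """Return (5' fragment, 3' fragment) of the RNA, or None if the oligo finds no site.

    Assumption: the cut is modeled at the midpoint of the RNA/DNA duplex.
    """
    target = rna_target_of(oligo_dna_5to3)
    start = rna_5to3.find(target)
    if start < 0:
        return None
    cut = start + len(target) // 2
    return rna_5to3[:cut], rna_5to3[cut:]
```

Applying two different oligonucleotides to copies of the same RNA produces 5' fragments of different lengths in this model, which is the property relied upon to distinguish analytes by their start and stop sites.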
[0074] In various embodiments, the reagents 120 include universal bases, examples of which include an inosine base, 2'-DeoxyNebularine, 2’-deoxyinosine, 5-nitroindole, 5' 5-nitroindole, or 3-nitropyrrole. Generally, universal bases exhibit base pairing with any of adenine, cytosine, guanine, and thymine. In various embodiments, the universal bases are introduced in conjunction with primers, such as reverse primers for reverse transcription. Therefore, following reverse transcription, random sequences complementary to universal base sequences are incorporated into cDNA molecules. In various embodiments, sequences of at least 3 universal bases are included in the reagents 120. In various embodiments, sequences of one or more universal bases are included in the reagents 120. In various embodiments, sequences of at least two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive universal bases are included in the reagents 120. Thus, random sequences complementary to the sequence of universal bases can be incorporated into the cDNA, and further propagated to the amplicons that are later sequenced.
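The relationship between the number of consecutive universal bases and the diversity of the resulting random sequences may be estimated as sketched below. The Python code assumes, for illustration only, that each position opposite a universal base receives one of four bases uniformly and independently; actual incorporation preferences may differ, and the function names are hypothetical.

```python
def tag_space(n_universal_bases: int) -> int:
    """Number of distinct random sequences possible opposite n universal bases."""
    if n_universal_bases < 1:
        raise ValueError("at least one universal base is required")
    return 4 ** n_universal_bases


def shared_tag_probability(n_universal_bases: int, n_molecules: int) -> float:
    """Probability that a given molecule shares its random sequence with at least
    one of the other molecules, under the uniform-incorporation assumption."""
    if n_molecules < 1:
        raise ValueError("at least one molecule is required")
    space = tag_space(n_universal_bases)
    return 1.0 - (1.0 - 1.0 / space) ** (n_molecules - 1)
```

Under these assumptions, three universal bases give 64 possible sequences and ten give 1,048,576, so longer runs of universal bases reduce the chance that two analytes in the same droplet carry the same random sequence.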
[0075] In various embodiments, the reagents 120 include alternate bases. In various embodiments, alternate bases include bases other than deoxynucleotides (dNTPs) (e.g., other than deoxyadenosine 5’-triphosphate, deoxyguanosine 5’-triphosphate, deoxycytidine 5’-triphosphate, and deoxythymidine 5’-triphosphate). Example alternate bases include, but are not limited to: inosine base, 2'-DeoxyNebularine, 2’-deoxyinosine, 5-nitroindole, 5' 5-nitroindole, or 3-nitropyrrole. In various embodiments, alternate bases include universal bases. Further examples of universal bases are described in Liang, F., et al., “Universal Base Analogues and their Applications in DNA Sequencing Technology.” RSC Advances, 35, 2013, which is hereby incorporated by reference in its entirety.
[0076] In particular embodiments, alternate bases are included in the reagents 120 at a particular ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:2 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:5 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:10 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:20 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:30 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:40 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:50 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:75 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:100 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:125 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:150 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:175 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:200 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:225 ratio relative to dNTPs.
In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:250 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:275 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:300 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:400 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 at a ratio less than a 1:500 ratio relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:10 relative to dNTPs and 1:500 relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:50 relative to dNTPs and 1:300 relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:100 relative to dNTPs and 1:250 relative to dNTPs. In various embodiments, alternate bases are included in the reagents 120 between a ratio of 1:150 relative to dNTPs and 1:200 relative to dNTPs. Thus, during reverse transcription, one or more alternate bases can be incorporated into the cDNA, and further propagated to the amplicons that are later sequenced.
[0077] The cell barcoding step 170 involves encapsulating the cell lysate 130 into a second droplet along with a barcode 145 and/or reaction mixture 140. In various embodiments, the second emulsion is formed by partitioning aqueous fluid containing the cell lysate 130 into immiscible oil 135. As shown in FIG. 1B, the reaction mixture 140 and barcode 145 can be introduced through a separate stream of aqueous fluid, thereby partitioning the reaction mixture 140 and barcode 145 into the second droplet along with the cell lysate 130.
[0078] Generally, the reaction mixture 140 enables the performance of a reaction, such as a nucleic acid amplification reaction. In various embodiments, the reaction mixture 140 includes primers for introducing molecular tags. In various embodiments, the primers are gene specific primers. Thus, through one or more rounds of nucleic acid amplification, the molecular tags are incorporated into amplicons. In various embodiments, the reaction mixture 140 includes one or more enzymes capable of digesting primers and/or molecular tags. In such embodiments where the reaction mixture 140 includes one or more enzymes capable of digesting the digestible primers, the enzymes digest the digestible primers here in this droplet at step 170. In other embodiments, the digestible primers are previously digested in the droplet at step 165 and therefore, need not be digested here at step 170. In various embodiments, the enzymes digest the digestible primers prior to a first cycle of nucleic acid amplification. In various embodiments, the enzymes digest the digestible primers subsequent to a first cycle of nucleic acid amplification. In various embodiments, the enzymes digest the digestible primers subsequent to a first cycle of nucleic acid amplification, but prior to a second cycle of nucleic acid amplification.
[0079] The target amplification step 175 involves amplifying target nucleic acids. For example, target nucleic acids of the cell lysate undergo amplification using the reaction mixture 140 in the second emulsion, thereby generating amplicons derived from the target nucleic acids. Generally, at step 175, any digestible primers that were previously introduced (e.g., previously introduced as part of the reagents 120) have been digested, thereby reducing or completely eliminating the presence of digestible primers. Therefore, digestible primers do not play a role in the target amplification 175 step.
[0080] Generally, a barcode 145 can label a target nucleic acid or amplicon to be analyzed (e.g., an analyte of the cell lysate such as genomic DNA or cDNA that has been reverse transcribed from RNA), which enables subsequent identification of the origin of a sequence read (e.g., a cellular origin) that is derived from the target nucleic acid. In various embodiments, multiple barcodes 145 can label multiple target nucleic acids of the cell lysate, thereby enabling the subsequent identification of the origin of large quantities of sequence reads.
[0081] As referred to herein, the workflow process shown in FIG. 1B is a two-step workflow process in which analyte release 165 from the cell occurs separately from the steps of cell barcoding 170 and target amplification 175. Specifically, analyte release 165 from a cell occurs within a first droplet, followed by cell barcoding 170 and target amplification 175 in a second emulsion. In various embodiments, alternative workflow processes (e.g., workflow processes other than the two-step workflow process shown in FIG. 1A) can be employed. For example, the cell 110, reagents 120, reaction mixture 140, and barcode 145 can be encapsulated in a single emulsion. Thus, analyte release 165 can occur within the droplet, followed by cell barcoding 170 and target amplification 175 within the same droplet. Additionally, although FIG. 1B depicts cell barcoding 170 and target amplification 175 as two separate steps, in various embodiments, the target nucleic acid is labeled with a barcode 145 through the nucleic acid amplification step.
[0082] FIG. 2 is a flow process for analyzing nucleic acid sequences derived from analytes of the single cell, in accordance with an embodiment. Specifically, FIG. 2 depicts the steps of pooling amplified nucleic acids at step 205, sequencing the amplified nucleic acids at step 210, read alignment at step 215, and characterization at step 220. Generally, the flow process shown in FIG. 2 is a continuation of the workflow process shown in FIG. 1B.
[0083] For example, after target amplification at step 175 of FIG. 1B, the amplified nucleic acids 250A, 250B, and 250C are pooled at step 205 shown in FIG. 2. For example, individual droplets containing amplified nucleic acids are pooled and collected, and the immiscible oil of the emulsions is removed. Thus, amplified nucleic acids from multiple cells can be pooled together. FIG. 2 depicts three amplified nucleic acids 250A, 250B, and 250C. In various embodiments, pooled nucleic acids can include hundreds, thousands, or millions of nucleic acids derived from analytes of multiple cells. In various embodiments, molecular tags are introduced in bulk to the pooled amplified nucleic acids 205, otherwise referred to as amplicons. Thus, the molecular tags can be incorporated into the amplicons in bulk prior to sequencing.
[0084] In various embodiments, each amplified nucleic acid 250 includes at least a sequence of a target nucleic acid 240 and a barcode 230. In various embodiments, an amplified nucleic acid 250 can include additional sequences, such as any of a universal primer sequence, a random primer sequence, a gene specific primer forward sequence, a gene specific primer reverse sequence, a constant region, or sequencing adapters. In various embodiments, each amplified nucleic acid 250 need not include the sequence of the target nucleic acid 240. For example, instead of the sequence of the target nucleic acid 240, an amplified nucleic acid can include a gene tag.
[0085] In various embodiments, the amplified nucleic acids 250A, 250B, and 250C are derived from the same single cell and therefore, the barcodes 230A, 230B, and 230C are the same. Therefore, sequencing of the barcodes 230 enables the determination that the amplified nucleic acids 250 are derived from the same cell. In various embodiments, the amplified nucleic acids 250A, 250B, and 250C are pooled and derived from different cells. Therefore, the barcodes 230A, 230B, and 230C are different from one another and sequencing of the barcodes 230 enables the determination that the amplified nucleic acids 250 are derived from different cells.
[0086] Although not shown in FIG. 2, amplified nucleic acids 205 may further include a molecular tag that enables distinguishing between analytes (e.g., RNA analytes) of a cell. In various embodiments, the molecular tag is located on the 3’ end of the amplified nucleic acid 205. In various embodiments, the molecular tag is located on the 5’ end of the amplified nucleic acid 205. In various embodiments, the molecular tag is not located on an end of the amplified nucleic acid 205, but rather within the amplified nucleic acid 205.
[0087] At step 210, the pooled amplified nucleic acids 250 undergo sequencing to generate sequence reads. Sequence reads originating from individual cells are clustered according to the barcode sequences included in the amplicons. At step 215, the sequence reads for each single cell are aligned (e.g., to a reference genome). Aligning the sequence reads to the reference genome enables the determination of where in the genome each sequence read is derived from. For example, multiple sequence reads generated from amplicons derived from an RNA transcript molecule, when aligned to a position of the genome, can reveal that a gene at the position of the genome was transcribed. As another example, multiple sequence reads generated from amplicons derived from a genomic DNA molecule, when aligned to a position of the genome, can reveal the sequence of the gene at the position of the genome.
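As a non-limiting illustration, the clustering of sequence reads according to barcode sequence can be sketched in Python as follows. The function name `group_reads_by_barcode` and the representation of each read as a (barcode, sequence) pair are assumptions made for this sketch and are not part of the workflow described herein.

```python
from collections import defaultdict

def group_reads_by_barcode(reads):
    """Cluster sequence reads by cell barcode.

    `reads` is an iterable of (barcode, sequence) pairs; this layout is a
    hypothetical simplification of real sequencer output.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)

# Toy input: two reads share the barcode "ACGT" and so share a cell of origin.
reads = [
    ("ACGT", "TTGACC"),
    ("GGCA", "CATTAG"),
    ("ACGT", "GGATCC"),
]
by_cell = group_reads_by_barcode(reads)
```

Each key of `by_cell` stands for one barcode group, and the reads in a group can then be aligned together as reads of a single cell.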
[0088] The alignment of sequence reads at step 215 generates libraries, such as single cell DNA libraries or single cell RNA libraries. Therefore, at step 220, characterization of the libraries and/or the single cells can be performed. In various embodiments, characterization of a library (e.g., DNA library or RNA library) can involve determining sequencing metrics. In various embodiments, characterization of single cells can involve identifying one or more mutations (e.g., allelic variants, point mutations, single nucleotide variations/polymorphisms, translocations, DNA/RNA fusions, loss of heterozygosity) that are present in one or more of the single cells. Further description regarding characterization of single cells is found in PCT/US2020/026480 and PCT/US2020/026482, each of which is hereby incorporated by reference in its entirety.
[0089] In various embodiments, characterization at step 220 involves quantifying the number of analytes present in a cell based on the molecular tags included on the amplicons. For example, different molecular tags enable distinguishing between amplicons derived from different RNA analytes of a cell. Thus, by quantifying the number of different molecular tags, the number of RNA analytes for particular genes can be quantified as a measure of gene expression in the cell. For example, a larger quantified number of molecular tags can indicate the presence of a larger number of RNA analytes for a particular gene, thereby representing higher gene expression. In contrast, a lower quantified number of molecular tags can indicate the presence of fewer RNA analytes for a particular gene, thereby representing lower gene expression.
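As a non-limiting illustration, quantifying distinct molecular tags per cell and per gene can be sketched in Python as follows. The function name `count_molecules`, the (barcode, gene, molecular tag) input layout, and the toy identifiers are assumptions made for this sketch. The sketch also assumes exact tag matching and does not correct for sequencing errors within tags.

```python
from collections import defaultdict

def count_molecules(tagged_reads):
    """Count distinct molecular tags per (cell barcode, gene).

    `tagged_reads` is an iterable of (barcode, gene, molecular_tag) triples.
    Reads sharing all three values are treated as copies of one original
    RNA analyte, so each distinct tag is counted once.
    """
    tags = defaultdict(set)
    for barcode, gene, tag in tagged_reads:
        tags[(barcode, gene)].add(tag)
    return {key: len(values) for key, values in tags.items()}

tagged_reads = [
    ("CELL1", "GENE_A", "AAT"),
    ("CELL1", "GENE_A", "AAT"),  # amplification duplicate of the same analyte
    ("CELL1", "GENE_A", "CGG"),
    ("CELL1", "GENE_B", "TTA"),
    ("CELL2", "GENE_A", "GCA"),
]
counts = count_molecules(tagged_reads)
```

In this toy input, the two reads carrying tag "AAT" collapse into one counted analyte, so the count reflects the number of original RNA analytes and not the number of amplicons.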
Methods for Performing Single-Cell Analysis
Encapsulation, Analyte Release, Barcoding, and Amplification
[0090] Embodiments described herein involve encapsulating one or more cells (e.g., at step 160 in FIG. 1B) to perform single-cell analysis on the one or more cells. In various embodiments, the one or more cells can be isolated from a test sample obtained from a subject or a patient. In various embodiments, the one or more cells are healthy cells taken from a healthy subject. In various embodiments, the one or more cells include cancer cells taken from a subject previously diagnosed with cancer. For example, such cancer cells can be tumor cells available in the bloodstream of the subject diagnosed with cancer. Thus, single-cell analysis of the tumor cells enables cellular and sub-cellular prediction of the subject’s cancer. In various embodiments, the test sample is obtained from a subject following treatment of the subject (e.g., following a therapy such as cancer therapy). Thus, single-cell analysis of the cells enables cellular and sub-cellular prediction of the subject’s response to a therapy.
[0091] In various embodiments, encapsulating a cell with reagents is accomplished by combining an aqueous phase including the cell and reagents with an immiscible oil phase. In one embodiment, an aqueous phase including the cell and reagents is flowed together with a flowing immiscible oil phase such that water-in-oil emulsions are formed, where at least one emulsion includes a single cell and the reagents. In various embodiments, the immiscible oil phase includes a fluorous oil, a fluorous non-ionic surfactant, or both. In various embodiments, emulsions can have an internal volume of about 0.001 to 1000 picoliters or more and can range from 0.1 to 1000 μm in diameter.
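As a non-limiting illustration, the relationship between the diameter of an emulsion droplet and its internal volume can be sketched in Python as follows. The sketch assumes an ideal spherical droplet, for which V = πd³/6. The function name `droplet_volume_pl` and the example diameters are hypothetical.

```python
import math

def droplet_volume_pl(diameter_um):
    """Volume in picoliters of a sphere with the given diameter in micrometers."""
    volume_um3 = math.pi * diameter_um ** 3 / 6.0
    return volume_um3 * 1e-3  # 1 cubic micrometer = 1e-3 picoliter

# Example diameters chosen only for illustration.
for d in (10.0, 50.0, 100.0):
    print(f"{d:.0f} um diameter -> {droplet_volume_pl(d):.2f} pL")
```

Because volume scales with the cube of diameter, a tenfold change in diameter corresponds to a thousandfold change in internal volume.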
[0092] In various embodiments, the aqueous phase including the cell and reagents need not be simultaneously flowing with the immiscible oil phase. For example, the aqueous phase can be flowed to contact a stationary reservoir of the immiscible oil phase, thereby enabling the budding of water in oil emulsions within the stationary oil reservoir.
[0093] In various embodiments, combining the aqueous phase and the immiscible oil phase can be performed in a microfluidic device. For example, the aqueous phase can flow through a microchannel of the microfluidic device to contact the immiscible oil phase, which is simultaneously flowing through a separate microchannel or is held in a stationary reservoir of the microfluidic device. The encapsulated cell and reagents within an emulsion can then be flowed through the microfluidic device to undergo cell lysis.
[0094] Further example embodiments of adding reagents and cells to emulsions can include merging emulsions that separately contain the cells and reagents or picoinjecting reagents into an emulsion. Further description of example embodiments is described in US Application No. 14/420,646, which is hereby incorporated by reference in its entirety.
[0095] Generally, the encapsulated cell in an emulsion is lysed to generate cell lysate. In various embodiments, the cell is lysed due to the reagents which include one or more lysing agents that cause the cell to lyse. Examples of lysing agents include detergents such as Triton X-100, NP-40 (e.g., Tergitol-type NP-40 or nonyl phenoxypolyethoxylethanol), as well as cytotoxins. Examples of NP-40 include Thermo Scientific NP-40 Surfact-Amps Detergent solution, Igepal® ca-630, and Sigma Aldrich NP-40 (TERGITOL Type NP-40). In some embodiments, cell lysis may also, or instead, rely on techniques that do not involve a lysing agent in the reagent. For example, lysis may be achieved by mechanical techniques that may employ various geometric features to effect piercing, shearing, abrading, etc. of cells. Other types of mechanical breakage such as acoustic techniques may also be used. Further, thermal energy can also be used to lyse cells. Any convenient means of effecting cell lysis may be employed in the methods described herein.
[0096] In various embodiments, the reagents include reverse transcriptase, which reverse transcribes mRNA transcripts released from the cell to generate corresponding cDNA, and further include primers that hybridize with mRNA transcripts, thereby enabling the reverse transcription reaction to occur. In various embodiments, molecular tags are introduced in the reagents and therefore, are incorporated into cDNA after reverse transcription.
[0097] FIGs. 3A-3C depict the processing and releasing of analytes of a single cell in a droplet, in accordance with an embodiment. In FIG. 3A, the cell is lysed, as indicated by the dotted line of the cell membrane. In some embodiments, the reagents include a detergent, such as NP40 (e.g., 0.01% or 1.0% NP40) or Triton X-100, which causes the cell to lyse. The lysed cell includes analytes such as RNA transcripts within the cytoplasm of the cell as well as packaged DNA 302, which refers to the organization of DNA with histones, thereby forming nucleosomes that are packaged as chromatin. As shown in FIG. 3A, the reagents included in the emulsion 300A further include reverse transcriptase (abbreviated as “RT” 310). Furthermore, the reagents included in the emulsion 300A further include an enzyme 312 that digests the packaged DNA 302. In various embodiments, the enzyme 312 is proteinase K.
[0098] FIG. 3B depicts the emulsion 300B in a second state as reverse transcriptase performs reverse transcription on the RNA transcripts and the enzymes 312 digest the packaged DNA 302. In various embodiments, cDNA is generated as a result of reverse transcription. In particular embodiments, the generated cDNA includes molecular tags. Additionally, the genomic DNA is released from the packaged DNA 302 form. FIG. 3C depicts the emulsion 300C in a third state that includes synthesized cDNA 306. FIG. 3C also depicts freed gDNA 340 that is released from the packaged DNA 302. In various embodiments, the cDNA 306 includes molecular tags.
[0099] In various embodiments, the emulsion 300C can be exposed to conditions to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 50°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 60°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 70°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 80°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 90°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 95°C to inactivate the enzymes 312. In various embodiments, the emulsion 300C is exposed to an elevated temperature of at least 100°C to inactivate the enzymes 312.
[00100] Returning to the step of cell barcoding 170 in FIG. 1B, it includes encapsulating a cell lysate 130 with a reaction mixture 140 and a barcode 145. Generally, the reaction mixture includes reactants sufficient for performing a reaction, such as nucleic acid amplification, on analytes of the cell lysate. In various embodiments, the reaction mixture 140 includes components, such as primers, for performing the nucleic acid reaction on the analytes. Such primers are capable of acting as a point of initiation of synthesis along a complementary strand when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is catalyzed.
[00101] In various embodiments, a cell lysate is encapsulated with a reaction mixture and a barcode by combining an aqueous phase including the reaction mixture and the barcode with the cell lysate and an immiscible oil phase. In one embodiment, an aqueous phase including the reaction mixture and the barcode is flowed together with a flowing cell lysate and a flowing immiscible oil phase such that water-in-oil emulsions are formed, where at least one emulsion includes a cell lysate, the reaction mixture, and the barcode. In various embodiments, the immiscible oil phase includes a fluorous oil, a fluorous non-ionic surfactant, or both. In various embodiments, emulsions can have an internal volume of about 0.001 to 1000 picoliters or more and can range from 0.1 to 1000 μm in diameter.
[00102] In various embodiments, combining the aqueous phase and the immiscible oil phase can be performed in a microfluidic device. For example, the aqueous phase can flow through a microchannel of the microfluidic device to contact the immiscible oil phase, which is simultaneously flowing through a separate microchannel or is held in a stationary reservoir of the microfluidic device. The encapsulated cell lysate, reaction mixture, and barcode within an emulsion can then be flowed through the microfluidic device to perform amplification of target nucleic acids.
[00103] Further example embodiments of adding reaction mixture and barcodes to emulsions can include merging emulsions that separately contain the cell lysate and reaction mixture and barcodes or picoinjecting the reaction mixture and/or barcode into an emulsion. Further description of example embodiments of merging emulsions or picoinjecting substances into an emulsion is found in US Application No. 14/420,646, which is hereby incorporated by reference in its entirety.
[00104] In various embodiments, subsequent to adding the reaction mixture and barcode to an emulsion, the digestible primers are digested. As used herein, “digested primers” encompass primers that are broken down such that the primer can no longer hybridize with a target sequence. In various embodiments, “digested primers” further encompasses completely digested primers that are reduced to individual nucleotides. Digestible primers are digested to remove their subsequent participation in reactions such as nucleic acid amplification. In various embodiments, the digestion of digestible primers reduces or eliminates presence of the digestible primers. This can include digestible primers that have formed primer byproducts and misprimed digestible primers (e.g., digestible primers that have primed a different nucleic acid such as genomic DNA).
[00105] The emulsion may be incubated under conditions that facilitate the nucleic acid amplification reaction. In various embodiments, the emulsion may be incubated on the same microfluidic device as was used to add the reaction mixture and/or barcode, or may be incubated on a separate device. In certain embodiments, incubating the emulsion under conditions that facilitate nucleic acid amplification is performed on the same microfluidic device used to encapsulate the cells and lyse the cells. Incubating the emulsions may take a variety of forms. In certain aspects, the emulsions containing the reaction mix, barcode, and cell lysate may be flowed through a channel that incubates the emulsions under conditions effective for nucleic acid amplification. Flowing the microdroplets through a channel may involve a channel that snakes over various temperature zones maintained at temperatures effective for PCR. Such channels may, for example, cycle over two or more temperature zones, wherein at least one zone is maintained at about 65°C and at least one zone is maintained at about 95°C. As the drops move through such zones, their temperature cycles, as needed for nucleic acid amplification. The number of zones, and the respective temperature of each zone, may be readily determined by those of skill in the art to achieve the desired nucleic acid amplification. Additionally, the extent of nucleic acid amplification can be controlled by modulating the concentration of the reactants in the reaction mixture. In some instances, this is useful for fine tuning of the reactions in which the amplified products are used.
[00106] In various embodiments, following nucleic acid amplification, emulsions containing the amplified nucleic acids are collected. In various embodiments, the emulsions are collected in a well, such as a well of a microfluidic device. In various embodiments, the emulsions are collected in a reservoir or a tube, such as an Eppendorf tube. Once collected, the amplified nucleic acids across the different emulsions are pooled. In one embodiment, the emulsions are broken by providing an external stimulus to pool the amplified nucleic acids. In one embodiment, the emulsions naturally aggregate over time given the density differences between the aqueous phase and immiscible oil phase. Thus, the amplified nucleic acids pool in the aqueous phase.
[00107] Following pooling, the amplified nucleic acids can undergo further preparation for sequencing. For example, sequencing adapters can be added to the pooled nucleic acids. Example sequencing adapters are P5 and P7 sequencing adapters. The sequencing adapters enable the subsequent sequencing of the nucleic acids.
Sequencing and Read Alignment
[00108] Amplified nucleic acids are sequenced to obtain sequence reads for generating a sequencing library. Here, the amplified nucleic acids include molecular tags and are sequenced to generate sequence reads with molecular tags. Sequence reads can be achieved with commercially available next generation sequencing (NGS) platforms, including platforms that perform any of sequencing by synthesis, sequencing by ligation, pyrosequencing, using reversible terminator chemistry, using phospholinked fluorescent nucleotides, or real-time sequencing. As an example, amplified nucleic acids may be sequenced on an Illumina MiSeq platform.
[00109] In pyrosequencing, NGS fragment libraries are clonally amplified in situ by capturing single template molecules on beads coated with oligonucleotides complementary to the adapters. Each bead bearing a single template type is compartmentalized into a water-in-oil microdroplet, and the template is clonally amplified using a technique referred to as emulsion PCR. After amplification, the emulsion is broken and the beads are deposited into individual wells of a picotiter plate that functions as a flow cell during the sequencing reactions. Ordered, iterative introduction of each of the four dNTP reagents into the flow cell occurs in the presence of sequencing enzymes and a luminescent reporter, such as luciferase. When an appropriate dNTP is added to the 3' end of the sequencing primer, the resulting production of ATP causes a burst of luminescence within the well, which is recorded using a CCD camera. It is possible to achieve read lengths greater than or equal to 400 bases, and 10^6 sequence reads can be obtained, resulting in up to 500 million base pairs (Mb) of sequence. Additional details for pyrosequencing are described in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US patent No. 6,210,891; US patent No. 6,258,568; each of which is hereby incorporated by reference in its entirety.
[00110] On the Solexa/Illumina platform, sequencing data are produced in the form of short reads. In this method, fragments of an NGS fragment library are captured on the surface of a flow cell that is coated with oligonucleotide anchor molecules. The anchor molecule is used as a PCR primer, but because of the length of the template and its proximity to other nearby anchor oligonucleotides, extension by PCR results in the molecule arching over and hybridizing with an adjacent anchor oligonucleotide, forming a bridge structure on the surface of the flow cell. These loops of DNA are denatured and cleaved. The resulting single strands are then sequenced using reversible dye terminators. The incorporated nucleotides are identified by detecting fluorescence after incorporation, with each fluorophore and blocking group removed prior to the next cycle of dNTP addition. Additional details for sequencing using the Illumina platform are found in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US patent No. 6,833,246; US patent No. 7,115,400; US patent No. 6,969,488; each of which is hereby incorporated by reference in its entirety.
[00111] Sequencing of nucleic acid molecules using SOLiD technology includes clonal amplification of the NGS fragment library using emulsion PCR. The beads bearing template are then immobilized on the derivatized surface of a glass flow cell and annealed with a primer complementary to the adapter oligonucleotide. However, rather than being used for 3' extension, this primer provides a 5' phosphate group for ligation to interrogation probes containing two probe-specific bases followed by 6 degenerate bases and one of four fluorescent labels. In the SOLiD system, interrogation probes have 16 possible combinations of the two bases at the 3' end of each probe and one of four fluorescent dyes at the 5' end. The color of the fluorescent dye, and thus the identity of each probe, corresponds to a specified color-space coding scheme. Multiple cycles of probe annealing, probe ligation, and fluorescent signal detection are followed by denaturation and then a second round of sequencing using a primer that is offset by one base relative to the initial primer. In this way, the template sequence can be computationally reconstructed; template bases are interrogated twice, which results in increased accuracy. Additional details for sequencing using SOLiD technology are found in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US patent No. 5,912,148; US patent No. 6,130,073; each of which is incorporated by reference in its entirety.
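As a non-limiting illustration, a two-base color-space encoding in which 16 dinucleotide combinations map onto four colors can be sketched in Python as follows. The particular mapping used here, in which each base is assigned a two-bit code and the color is the exclusive OR of adjacent codes, is the commonly described form of such a scheme and is an assumption of this sketch; it is not drawn from the present disclosure. The function names are hypothetical.

```python
# Two-bit codes for the four bases; the XOR of adjacent codes gives the color.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_color_space(sequence):
    """Encode each adjacent pair of bases as one of four colors (0-3)."""
    codes = [BASE_CODE[base] for base in sequence]
    return [a ^ b for a, b in zip(codes, codes[1:])]

def from_color_space(first_base, colors):
    """Decode colors back to bases, given the known first base."""
    code_to_base = {v: k for k, v in BASE_CODE.items()}
    code = BASE_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color  # each color, with the previous base, fixes the next base
        bases.append(code_to_base[code])
    return "".join(bases)

colors = to_color_space("ATGGC")
decoded = from_color_space("A", colors)
```

Under this mapping, each base contributes to two adjacent colors, which is one way to picture how each template base is interrogated twice.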
[00112] In particular embodiments, HeliScope from Helicos BioSciences is used. Sequencing is achieved by the addition of polymerase and serial additions of fluorescently-labeled dNTP reagents. Incorporation events result in a fluorescent signal corresponding to the dNTP, and the signal is captured by a CCD camera before each round of dNTP addition. The read length ranges from 25 to 50 nucleotides, with a total yield exceeding 1 billion nucleotide pairs per analytical run. Additional details for performing sequencing using HeliScope are found in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US Patent No. 7,169,560; US patent No. 7,282,337; US patent No. 7,482,120; US patent No. 7,501,245; US patent No. 6,818,395; US patent No. 6,911,345; each of which is incorporated by reference in its entirety.
[00113] In some embodiments, a Roche 454 sequencing system is used. 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt-ended. Oligonucleotide adapters are then ligated to the ends of the fragments. The adapters serve as primers for amplification and sequencing of the fragments. Fragments can be attached to DNA-capture beads, for example, streptavidin-coated beads, using, for example, an adapter that contains a 5'-biotin tag. Fragments attached to the beads are amplified by PCR within the droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (several picoliters in volume). Pyrosequencing is carried out on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal, which is recorded by the CCD camera of the sequencing instrument. The signal intensity is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is detected and analyzed. Additional details for performing 454 sequencing are found in Margulies et al. (2005) Nature 437: 376-380, which is hereby incorporated by reference in its entirety.
[00114] Ion Torrent technology is a DNA sequencing method based on the detection of hydrogen ions that are released during DNA polymerization. A microwell contains a fragment of the NGS fragment library to be sequenced. Beneath the layer of microwells is a hypersensitive ISFET ion sensor. All layers are contained within a CMOS semiconductor chip, similar to the chips used in the electronics industry. When a dNTP is incorporated into the growing complementary strand, a hydrogen ion is released that triggers the hypersensitive ion sensor. If homopolymer repeats are present in the template sequence, multiple dNTP molecules will be incorporated in a single cycle. This results in a corresponding number of hydrogen ions being released and a proportionally higher electrical signal. This technology differs from other sequencing technologies in that no modified nucleotides or optical devices are used. Additional details for Ion Torrent technology are found in Science 327 (5970): 1190 (2010); US Patent Application Publication Nos. 20090026082, 20090127589, 20100301398, 20100197507, 20100188073, and 20100137143, each of which is incorporated by reference in its entirety.
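As a non-limiting illustration, the proportionality between homopolymer length and signal in flow-based sequencing can be sketched in Python as follows. The sketch idealizes each nucleotide flow as yielding a signal equal to the number of bases incorporated in that flow. The function name `flowgram`, the flow order "TACG", and the noise-free integer signals are assumptions made for this sketch.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Return (base, signal) pairs for an idealized flow-based sequencing run.

    `synthesized` is the strand built by the polymerase (the complement of
    the template). In each flow one dNTP type is offered, and the signal
    equals the number of consecutive bases of that type that are incorporated.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        incorporated = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            incorporated += 1
            pos += 1
        signals.append((base, incorporated))
        flow += 1
    return signals

signals = flowgram("TTAGC")
```

In this toy run, the first flow of T yields a signal of 2 because the homopolymer "TT" is incorporated within a single flow, while flows of a non-matching base yield a signal of 0.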
[00115] In various embodiments, sequencing is performed using Oxford Nanopore technologies. Additional details for the Oxford Nanopore technology are described in Jain, M., et al. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol 17, 239 (2016), which is incorporated by reference in its entirety.
[00116] In various embodiments, sequencing is performed using PacBio technologies. Additional details for PacBio sequencing are described in Rhoads, A. et al, PacBio Sequencing and its Applications. Genomics, Proteomics, & Bioinformatics, 13(5), (2015), 278-289, which is incorporated by reference in its entirety.
[00117] In various embodiments, sequencing reads obtained from the NGS methods can be filtered by quality and grouped by barcode sequence using any algorithm known in the art, e.g., the Python script barcodeCleanup.py. In some embodiments, a given sequencing read may be discarded if more than about 20% of its bases have a quality score (Q-score) less than Q20, where Q20 indicates a base call accuracy of about 99%. In some embodiments, a given sequencing read may be discarded if more than about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of its bases have a Q-score less than Q10, Q20, Q30, Q40, Q50, Q60, or more, indicating a base call accuracy of about 90%, about 99%, about 99.9%, about 99.99%, about 99.999%, about 99.9999%, or more, respectively.
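As a non-limiting illustration, the quality filter described above can be sketched in Python as follows. The sketch relies on the standard Phred relationship, in which a Q-score of Q corresponds to a base call accuracy of 1 - 10^(-Q/10). The function names `phred_accuracy` and `keep_read`, and the representation of a read as a list of per-base Q-scores, are assumptions made for this sketch and do not describe the script named above.

```python
def phred_accuracy(q):
    """Base call accuracy implied by a Phred quality score Q."""
    return 1.0 - 10.0 ** (-q / 10.0)

def keep_read(qualities, min_q=20, max_low_fraction=0.20):
    """Return True unless more than `max_low_fraction` of bases fall below `min_q`."""
    if not qualities:
        return False  # an empty read carries no usable bases
    low = sum(1 for q in qualities if q < min_q)
    return low / len(qualities) <= max_low_fraction

good = [30] * 9 + [10]      # 10% of bases below Q20, so the read is kept
poor = [30] * 7 + [10] * 3  # 30% of bases below Q20, so the read is discarded
```

Both thresholds are exposed as parameters so that the alternative percentages and Q-scores listed above can be substituted.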
[00118] In some embodiments, all sequencing reads associated with a barcode containing fewer than 50 reads may be discarded to ensure that all barcode groups, representing single cells, contain a sufficient number of high-quality reads. In some embodiments, all sequencing reads associated with a barcode containing fewer than 30, 40, 50, 60, 70, 80, 90, 100, or more reads may be discarded to ensure the quality of the barcode groups representing single cells.
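A minimal sketch of the barcode-level filter is shown below. It assumes that reads have already passed quality filtering and are available as (barcode, sequence) pairs; the function names and this input representation are illustrative assumptions.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group (barcode, sequence) pairs into a dict of barcode -> sequences."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


def drop_sparse_barcodes(groups, min_reads=50):
    """Discard every barcode group holding fewer than `min_reads` reads."""
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_reads}
```

With the default threshold, a barcode supported by 49 reads is dropped together with all of its reads, while a barcode supported by 50 reads is retained.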
[00119] Sequence reads with common barcode sequences (e.g., meaning that sequence reads originated from the same cell) may be aligned to a reference genome using known methods in the art to determine alignment position information. The alignment position information may indicate a beginning position and an end position of a region in the reference genome that corresponds to a beginning nucleotide base and end nucleotide base of a given sequence read. A region in the reference genome may be associated with a target gene or a segment of a gene. Example aligner algorithms include BWA, Bowtie, Spliced Transcripts Alignment to a Reference (STAR), Tophat, or HISAT2. Further details for aligning sequence reads to reference sequences are described in US Application No. 16/279,315, which is hereby incorporated by reference in its entirety. In various embodiments, an output file having SAM (sequence alignment map) format or BAM (binary alignment map) format may be generated and output for subsequent analysis.
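As an illustration of the alignment position information, the sketch below derives the beginning and end reference positions of a read from one tab-delimited SAM record. The function name is hypothetical. The calculation relies on the SAM convention that POS is 1-based and that only the CIGAR operations M, D, N, =, and X consume reference bases.

```python
import re

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def alignment_span(sam_line):
    """Return (reference name, start, end) for a mapped SAM record,
    with 1-based inclusive coordinates, or None if the read is unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read or no alignment description available
    ref_len = sum(
        int(length)
        for length, op in _CIGAR_RE.findall(cigar)
        if op in _REF_CONSUMING
    )
    return rname, pos, pos + ref_len - 1
```

Soft-clipped and inserted bases (S and I) are part of the read but do not extend the region covered on the reference, so they are excluded from the end-position calculation.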
Example Methods of Incorporating Molecular Tags
[00120] Embodiments disclosed herein refer to a process for analyzing analytes of a cell through a single-cell workflow using molecular tags. The embodiments described herein, including the embodiments described below in reference to FIGs. 4A-4G, 5A-5B, 6A-6B, 7A-7C, 8A-8C, 9A-9B, 10, and 11A-11B, refer to the incorporation of molecular tags in any one of the 1) encapsulation stage, 2) barcoding stage, or 3) bulk stage. Namely, the steps that occur during the encapsulation stage refer to at least cell encapsulation 160 and analyte release 165 shown in FIG. 1B. Additionally, the steps that occur during the barcoding stage occur during cell barcoding 170 and target amplification 175 shown in FIG. 1B. Additionally, the steps that occur during the bulk stage occur after target amplification 175 shown in FIG. 1B after the amplicons from different cells have been pooled.
Incorporating Molecular Tags using Digestible Ribonucleotides or Digestible Uracils
[00121] Methods disclosed herein involve incorporating molecular tags using digestible primers, such as primers including digestible ribonucleotides or digestible uracils. Generally, removal of digestible primers, such as digestible reverse transcription primers, ensures that such primers are not present during downstream reactions (e.g., during a cell barcoding step 170 as shown in FIG. 1B). The presence of such primers during downstream reactions (e.g., during nucleic acid amplification) would result in their participation in the downstream reactions, thereby resulting in multiple molecular tags per unique molecule. Thus, by using digestible primers, a single molecular tag is incorporated per unique molecule.
[00122] FIG. 4A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using digestible ribonucleotides. Specifically, FIG. 4A shows the processing of RNA and DNA during the encapsulation stage. Here, a digestible oligonucleotide comprising a reverse primer (rev primer), a molecular tag, and a seq8F is provided to the RNA. The reverse primer, molecular tag, and seq8F each include one or more ribonucleotides, which ensures that the oligonucleotide can be subsequently digested. The reverse primer hybridizes with a portion of the RNA and reverse transcriptase performs reverse transcription. Thus, this results in the generation of a cDNA that includes the oligonucleotide (e.g., reverse primer, molecular tag, and seq8F). Additionally, genomic DNA (gDNA) is released using a protease, such as proteinase K.
[00123] FIG. 4B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4A. FIG. 4B shows the steps performed during the barcoding stage. Referring first to the cDNA strand, a forward primer hybridizes with a portion of the cDNA. Here, the forward primer may be a gene specific primer. The strand is extended along the forward primer. Referring to the gDNA, a forward and reverse primer pair hybridize with portions of the gDNA. Here, both the forward primer and reverse primer may be gene specific primers. Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
[00124] After the first nucleic acid amplification (e.g., PCR) cycle, a corresponding cDNA strand is now generated. Here, the corresponding cDNA strand includes a complementary sequence of the reverse primer, molecular tag, and seq8F. Furthermore, the complementary sequence of the reverse primer, molecular tag, and seq8F does not include ribonucleotides. An RNase, such as RNase H, is provided to digest the digestible oligonucleotide including the reverse primer, molecular tag, and seq8F that include ribonucleotides. Thus, this prevents the digestible oligonucleotide from participating in subsequent nucleic acid amplification reactions. Furthermore, it removes the original molecular tag including ribonucleotides that was first introduced.
[00125] In the later cycles of nucleic acid amplification (e.g., PCR), cell barcodes (CBC) are incorporated into the amplicons. As shown in the bottom panel of FIG. 4B, cell barcodes are incorporated by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region is complementary to another constant region on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode. Furthermore, the subsequent amplicon additionally includes the molecular tag.
[00126] Similarly, cell barcodes are incorporated into amplicons deriving from genomic DNA. An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle. Thus, extension of the nucleic acid along the amplicon derived from gDNA results in a subsequent amplicon that incorporates the cell barcode.
[00127] FIG. 4C depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using digestible uracils. Specifically, FIG. 4C shows the processing of RNA and DNA during the encapsulation stage. Here, a digestible oligonucleotide comprising a reverse primer (rev primer), a molecular tag, and a seq8F is provided to the RNA. The reverse primer, molecular tag, and seq8F each include one or more uracil bases, which ensures that the digestible oligonucleotide can be subsequently digested. The reverse primer hybridizes with a portion of the RNA and reverse transcriptase performs reverse transcription. Thus, this results in the generation of a cDNA that includes the digestible oligonucleotide (e.g., reverse primer, molecular tag, and seq8F). Furthermore, a forward primer (fwd primer) is provided to generate a further cDNA strand. The forward primer hybridizes with a portion of the cDNA, and extension occurs along the cDNA. Here, the further cDNA strand includes a sequence complementary to the reverse primer, molecular tag, and seq8F. Notably, the sequence complementary to the reverse primer, molecular tag, and seq8F does not include uracils. Additionally, genomic DNA (gDNA) is released using a protease, such as proteinase K.
[00128] FIG. 4D depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 4C. FIG. 4D shows the steps performed during the barcoding stage. Referring first to the cDNA strand, uracil-DNA glycosylase (UDG) is provided to digest the digestible oligonucleotide including the reverse primer, molecular tag, and seq8F that include uracils. Thus, this prevents the digestible oligonucleotide from participating in subsequent nucleic acid amplification reactions. Furthermore, it removes the original molecular tag including uracils that was first introduced.
[00129] Referring to the gDNA, a forward and reverse primer pair hybridize with portions of the gDNA. Here, both the forward primer and reverse primer may be gene specific primers. Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
[00130] During the cycles of nucleic acid amplification (e.g., PCR), cell barcodes (CBC) are incorporated into the amplicons. Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region is complementary to another constant region on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode. Furthermore, the subsequent amplicon additionally includes the molecular tag.
[00131] Similarly, cell barcodes are incorporated into amplicons deriving from genomic DNA. An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle. Thus, extension of the nucleic acid along the amplicon derived from gDNA results in a subsequent amplicon that incorporates the cell barcode.
[00132] In various embodiments, the primers (e.g., forward primers and reverse primers) included in the encapsulation step (e.g., shown in FIG. 4A and FIG. 4C) and the barcoding step (shown in FIG. 4B and FIG. 4D) can be differently designed such that the barcode can be differently incorporated into the amplicons derived from RNA. For example, as shown in FIG. 4A and 4C, the reverse primers added in the encapsulation step include a seq8F constant region. In the barcoding step, the seq8F constant region enables the incorporation of the cell barcode such that both the molecular tag and the cell barcode are on the same end of the amplicon derived from RNA. In various embodiments, the seq8F constant region can be introduced in the barcoding step via a forward primer that hybridizes with the cDNA. Thus, the seq8F region enables the incorporation of the cell barcode such that the molecular tag and the cell barcode are on opposite ends of the amplicon derived from RNA.
[00133] Additional reference is now made to FIGs. 4E-4G, which depict the processing of RNA and gDNA in a droplet and the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with a third embodiment using digestible primers.
[00134] First, FIG. 4E depicts the processing of RNA and gDNA in an encapsulation droplet (e.g., at step 160 and 165 shown in FIG. 1B). The top panel shows the processing of RNA, whereas the bottom panel shows the processing of DNA. Referring first to RNA, a digestible primer (e.g., with digestible ribonucleotides or uracils) is provided which hybridizes with a sequence of the RNA. As shown in FIG. 4E, the digestible primer includes a reverse primer, a molecular tag (Ml) and a 32902 sequence. The reverse primer and 32902 sequence may include digestible ribonucleotides or uracils. Following reverse transcription, a corresponding cDNA strand is generated, the cDNA strand incorporating the reverse primer, molecular tag, and 32902 sequence. The RNA-cDNA hybrid is exposed to RNaseH, which randomly nicks the RNA. Here, after nicking with RNaseH, the still hybridized RNA strand serves as a primer for generating a second strand of cDNA. Thus, a second strand of cDNA is generated by DNA polymerase using the first cDNA strand as a template.
[00135] Referring to the DNA, it is released from chromatin packaging by being exposed to a protease, such as proteinase K. Thus, as shown in FIG. 4E, double stranded genomic DNA is released into the droplet.
[00136] Reference is now made to FIG. 4F which shows the amplification and barcoding of nucleic acids derived from RNA. In various embodiments, the steps shown in FIG. 4F occur in a second droplet after the steps in FIG. 4E occur in a first droplet. As shown in FIG. 4F, the digestible primer sequences of the first cDNA strand are digested. For example, if the digestible primer sequences include uracils, uracil-DNA glycosylase (UDG) is provided to digest the sequences. As another example, if the digestible primer sequences include ribonucleotides, RNaseH is provided to digest the sequences.
[00137] Through one or more subsequent amplification cycles, a cell barcode is incorporated into the amplicon derived from the RNA molecule. As shown in FIG. 4F, a forward primer and reverse primer (e.g., 32902 reverse primer) can be provided to amplify the second cDNA strand. Here, the forward primer can include a constant sequence (Seq8F sequence) which is useful in subsequent amplification rounds for incorporating the cell barcode. For example, after a first round of amplification, the amplicon now includes a Seq8F sequence. Here, a primer sequence including a complementary constant region (Seq8F), cell barcode (CBC), and a read sequence is provided. The complementary constant region hybridizes with the Seq8F constant region, and therefore, through a subsequent amplification cycle, the cell barcode is incorporated into an amplicon.
[00138] The additional steps in FIG. 4F show the optional step of a streptavidin bead pull down to obtain the amplicon sequence of the following format: Read 1 - cell barcode - seq8F - forward primer - cDNA sequence - reverse primer - molecular tag - 32902 sequence. Subsequent PCR cycles can take place for building the library.
[00139] Reference is now made to FIG. 4G which shows the amplification and barcoding of nucleic acids derived from DNA. Here, a forward and reverse primer pair is provided. Here, the forward and reverse primers may be gene specific primers that target specific sequences of the genomic DNA. Such forward and reverse primers may anneal with the DNA strand at a particular annealing temperature (e.g., between 50°C and 70°C, and preferably about 61°C). As shown in FIG. 4G, the forward primer may include a constant region (Seq8F) which is subsequently incorporated into the amplicon following the amplification cycle. Next, a primer sequence including a complementary constant region (Seq8F), cell barcode (CBC), and a read sequence (Read 1) is provided. The primer sequence may anneal with the constant region (Seq8F) at a second annealing temperature that is lower than the annealing temperature described above in relation to the forward and reverse primers. In various embodiments, the second annealing temperature is between 40°C and 60°C, and preferably about 51°C. This enables incorporation of the cell barcode into amplicons following a subsequent cycle of amplification.
Incorporating Molecular Tags using Template Switching Oligonucleotides
[00140] Methods disclosed herein involve incorporating molecular tags using template switching oligonucleotides. Here, such methods can be advantageous to incorporate molecular barcodes into full length transcripts. Thus, subsequent downstream analysis (e.g., sequencing) can capture information of the full length transcripts as opposed to only a portion of the transcripts. This can be particularly valuable for applications that focus on quantifying full length molecules based on presence of molecular barcodes on an end of the molecules (e.g., on the 5’ end).
[00141] FIG. 5A depicts the processing of RNA in a droplet, in accordance with an embodiment using template switching oligonucleotides. Specifically, FIG. 5A shows the processing of RNA during the encapsulation stage. Although not shown in FIG. 5A, the processing of DNA can occur in parallel using methods disclosed herein.
[00142] The top panel of FIG. 5A shows the introduction of an oligonucleotide including a reverse primer and seq8F. The reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer and seq8F. Here, the reverse transcription process generates a 3’ dC extension on the cDNA.
[00143] In the bottom panel of FIG. 5A, a template switching oligonucleotide (TSO) is introduced. Here, the TSO includes a template switching (TS) sequence, a molecular tag, and a sequence that enables hybridization with the cDNA molecule. As shown in FIG. 5A, the sequence that enables hybridization with the cDNA molecule is a repeating guanine unit that hybridizes with the 3’ dC extension of the cDNA. In various embodiments, the repeating guanine unit is a rGrGrG sequence. Here, the TSO causes template switching and therefore, extension further occurs along the cDNA molecule beginning at the 3’ dC extension. Thus, following this template switching extension, a complementary sequence of the molecular tag is incorporated into the cDNA.
[00144] FIG. 5B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 5A. FIG. 5B shows the steps performed during the barcoding stage. In the top panel of FIG. 5B, a forward primer is hybridized to the cDNA. Here, the forward primer hybridizes with the TS sequence and nucleic acid extension proceeds to generate a complementary cDNA that incorporates the molecular tag.
[00145] In the middle panel of FIG. 5B, a cell barcode (CBC) is incorporated. Cell barcodes are incorporated by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region is complementary to another constant region on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode. Furthermore, the subsequent amplicon additionally includes the molecular tag.
Incorporating Molecular Tags using Gene Specific Primers
[00146] Methods disclosed herein involve incorporating molecular tags using gene specific primers. Here, such methods include providing gene specific primers with molecular tags independent of reverse transcription. This avoids reaction inefficiencies arising from undesired interactions between different reagents for reverse transcription (e.g., reverse transcription primers and enzymes) and reagents for nucleic acid amplification (e.g., gene specific primers with molecular tags and enzymes). Additionally, higher temperatures can be applied during extension in nucleic acid amplification (thereby incorporating the molecular tags) than during reverse transcription. Therefore, fewer byproducts arising from interactions between the molecular tags and other primers are generated. Furthermore, by separating reverse transcription and nucleic acid amplification, the number of nucleic acid amplification cycles can be limited, which minimizes amplification bias.
[00147] FIG. 6A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment using gene specific primers comprising molecular tags. Specifically, FIG. 6A shows the processing of RNA and gDNA during the encapsulation stage. The top panel of FIG. 6A shows the introduction of an oligonucleotide including a reverse primer. As shown in FIG. 6A, the reverse primer may be a digestible primer that includes uracils. In various embodiments, the reverse primer may be a digestible primer that includes ribonucleotides. In various embodiments, the reverse primer is not a digestible primer. The reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer. Additionally, genomic DNA (gDNA) is released using a protease, such as proteinase K.
[00148] FIG. 6B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 6A. FIG. 6B shows the steps performed during the barcoding stage. Referring first to the cDNA strand, uracil-DNA glycosylase (UDG) is provided to digest the digestible oligonucleotide including the reverse primer. Thus, this prevents the digestible oligonucleotide from participating in subsequent nucleic acid amplification reactions. Additionally, an oligonucleotide including a forward primer, a molecular tag, and seq8F (constant region) is introduced. In various embodiments, the forward primer is a gene specific primer. Thus, the forward primer hybridizes with a specific region of the cDNA. Nucleic acid extension results in the generation of a complementary cDNA amplicon that incorporates the molecular tag.
[00149] Referring to the gDNA, a forward and reverse primer pair hybridize with portions of the gDNA. Here, both the forward primer and reverse primer may be gene specific primers. Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
[00150] In the middle panel of FIG. 6B, a second reverse primer (different from reverse primer used for initiating reverse transcription) is introduced, which hybridizes with the cDNA. This enables nucleic acid extension along the cDNA to further generate a cDNA amplicon that also includes the molecular tag.
[00151] During the cycles of nucleic acid amplification (e.g., PCR), cell barcodes (CBC) are incorporated into the amplicons. Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode. Furthermore, the subsequent amplicon additionally includes the molecular tag.
[00152] Similarly, cell barcodes are incorporated into amplicons deriving from genomic DNA. An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle. Thus, extension of the nucleic acid along the amplicon derived from gDNA results in a subsequent amplicon that incorporates the cell barcode.
Incorporating Molecular Tags using Bulk Processing
[00153] Methods disclosed herein involve incorporating molecular tags in bulk following single cell encapsulation, lysis, and barcoding. Here, introducing molecular tags in bulk enables a broader range of experimental conditions (e.g., temperatures, buffer conditions, concentrations of molecular tags and reagents) than if the molecular tags are introduced within droplets. Furthermore, performing bulk molecular tagging may be able to achieve higher efficiency in comparison to molecular tagging in droplets.
[00154] FIG. 7A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment in which molecular tags are introduced in bulk. Specifically, FIG. 7A shows the processing of RNA and gDNA during the encapsulation stage. The top panel of FIG. 7A shows the introduction of an oligonucleotide including a reverse primer. As shown in FIG. 7A, the reverse primer may be a digestible primer that includes uracils. In various embodiments, the reverse primer may be a digestible primer that includes ribonucleotides. Although FIG. 7A shows the reverse primer as a digestible primer, in various embodiments, the reverse primer is not digestible. Therefore, the reverse primer remains present throughout the subsequent steps. The reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer. Additionally, genomic DNA (gDNA) is released using a protease, such as proteinase K.
[00155] FIG. 7B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 7A. FIG. 7B shows the steps performed during the barcoding stage. Referring first to the cDNA strand, uracil-DNA glycosylase (UDG) (or RNaseH) is provided to digest the digestible oligonucleotide including the reverse primer. Thus, this prevents the digestible oligonucleotide from participating in subsequent nucleic acid amplification reactions. Additionally, an oligonucleotide including a forward primer and seq8F (constant region) is introduced. In various embodiments, the forward primer is a gene specific primer. Thus, the forward primer hybridizes with a specific region of the cDNA. Nucleic acid extension results in the generation of a complementary cDNA amplicon that incorporates the forward primer.
[00156] Referring to the gDNA, a forward and reverse primer pair hybridize with portions of the gDNA. Here, both the forward primer and reverse primer may be gene specific primers. Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
[00157] In the middle panel of FIG. 7B, a second reverse primer (different from reverse primer used for initiating reverse transcription) is introduced, which hybridizes with the cDNA. This enables nucleic acid extension along the cDNA to further generate a cDNA amplicon.
[00158] During the cycles of nucleic acid amplification (e.g., PCR), cell barcodes (CBC) are incorporated into the amplicons. Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode.
[00159] Similarly, cell barcodes are incorporated into amplicons deriving from genomic DNA. An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region on the genomic DNA that was introduced through a forward primer in a previous PCR cycle. Thus, extension of the nucleic acid along the amplicon derived from gDNA results in a subsequent amplicon that incorporates the cell barcode.
[00160] Here, at the end of FIG. 7B, cell barcodes are incorporated into the amplicons, but no molecular tags have yet been incorporated. FIG. 7C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 7A and 7B. Referring first to the cDNA in the top panel, an oligonucleotide is introduced, the oligonucleotide including the molecular tag. Here, the oligonucleotide further includes a sequence (“32092”) that hybridizes with a corresponding sequence (“32092”) of the cDNA amplicon. Nucleic acid extension occurs in bulk, which generates a complementary cDNA amplicon that incorporates the molecular tag.
[00161] In these embodiments, 2 cycles of PCR are performed prior to extension in FIG. 7C. Therefore, each original molecule can only make one strand to which the 32092-tag-const can hybridize. For example, if the reverse primer participates in reverse transcription during encapsulation, then a forward primer extends in a 1st cycle of PCR during barcoding. Next, the reverse primer - 32092 shown in FIG. 7B can extend in the 2nd cycle if the forward primer is extended. Therefore, only a single molecule from each RNA molecule would have a 32092-tag-const prime.
Incorporating Molecular Tags and Gene Tags
[00162] Methods disclosed herein involve incorporating molecular tags and gene tags in sequencing libraries (e.g., DNA or RNA sequencing libraries). Here, the final nucleic acids of the sequencing libraries do not contain sequences corresponding to sequences of the original genomic DNA or RNA transcripts. Thus, although the original DNA or RNA sequence is lost, the information can be replaced and/or maintained by the presence of a short gene tag in the final amplicon. Altogether, by incorporating a gene tag, the original DNA or RNA sequences need not be amplified and/or sequenced. By only amplifying the tags, PCR bias is minimized. Furthermore, shorter amplicons are more efficiently amplified and more cheaply sequenced in comparison to longer amplicons including the original DNA or RNA sequence.
[00163] FIG. 8A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating molecular tags and gene tags. Specifically, FIG. 8A shows the processing of RNA during the encapsulation stage. Although not shown in FIG. 8A, the processing of DNA can occur in parallel using methods disclosed herein.
[00164] The top panel of FIG. 8A shows the introduction of an oligonucleotide including a reverse primer. As shown in FIG. 8A, the reverse primer may be a digestible primer that includes uracils. In various embodiments, the reverse primer may be a digestible primer that includes ribonucleotides. Although FIG. 8A shows the reverse primer as a digestible primer, in various embodiments, the reverse primer is not digestible. Therefore, the reverse primer remains present throughout the subsequent steps. The reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer. As shown in FIG. 8A, the oligonucleotide further includes a handle. In various embodiments, the handle includes a gene tag (e.g., a gene specific amplicon tag), which is a nucleotide sequence that identifies the specific gene that is targeted by the reverse primer. Thus, oligonucleotides may include the same reverse primer and same gene tag, given that the design of the reverse primer controls the targeting of the corresponding sequence of the RNA.
[00165] In various embodiments, a gene tag may include about 6 to about 20 nucleotides. In some embodiments, a gene tag includes 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides. In some embodiments, a gene tag includes at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 nucleotides in length. In various embodiments, every gene tag is different from every other gene tag (e.g., each gene tag is unique). In various embodiments, a gene tag is randomly generated. In such embodiments, gene tags may not be unique. For example, a first randomly generated gene tag may have a same sequence as a sequence of a second randomly generated gene tag.
[00166] Following RT extension, RNase H removes the RNA analytes from the RNA-cDNA hybrid. This enables a forward primer to hybridize with the cDNA and initiate nucleic acid extension. Thus, the complementary cDNA strand that is generated from the forward primer further includes a sequence complementary to the reverse primer and the handle. As shown in FIG. 8A, the forward primer may be a digestible primer (e.g., includes one or more ribonucleotides or uracils). In various embodiments, the forward primer is a gene specific primer.
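To illustrate why randomly generated gene tags may not be unique, the sketch below estimates the chance that at least two of n random tags of a given length coincide. The function names are hypothetical, and the calculation assumes each tag is drawn uniformly and independently from the four bases A, C, G, and T, which is an idealization.

```python
def tag_space(length):
    """Number of distinct tags of `length` nucleotides over A, C, G, T."""
    return 4 ** length


def collision_probability(n_tags, length):
    """Probability that at least two of `n_tags` uniformly random tags of
    the given length share a sequence (birthday-problem calculation)."""
    space = tag_space(length)
    if n_tags > space:
        return 1.0  # more tags than sequences: a repeat is unavoidable
    p_all_unique = 1.0
    for i in range(n_tags):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


def find_duplicate_tags(tags):
    """Return the set of tag sequences that occur more than once."""
    seen, duplicates = set(), set()
    for tag in tags:
        if tag in seen:
            duplicates.add(tag)
        seen.add(tag)
    return duplicates
```

Under these assumptions a 6-nucleotide tag has 4,096 possible sequences, so two random tags coincide with probability 1/4096. Longer tags, or tags assigned from a designed list, reduce or remove the chance of repeats.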
[00167] FIG. 8B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 8A. In the top panel, RNase (e.g., RNase A or RNase H) is introduced in the barcoding step which digests the RNA portions including the original reverse primer, handle, and forward primer. Thus, this leaves a cDNA that includes the reverse primer and handle sequence.
[00168] During the cycles of nucleic acid amplification (e.g., PCR), cell barcodes (CBC) are incorporated into the amplicons. Here, the nucleic acid amplification can involve linear amplification. Cell barcodes are incorporated into amplicons derived from cDNA by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode. Furthermore, each cDNA amplicon further includes the handle, which also includes the gene tag. Here, at the end of FIG. 8B, cell barcodes and gene tags are incorporated into the cDNA amplicons, but no molecular tags have yet been incorporated.
[00169] FIG. 8C depicts the molecular tagging in bulk, in accordance with the embodiments shown in FIGs. 8A and 8B. A first step in bulk may involve performing a pulldown of RNA. As shown in FIG. 8C, a pulldown can be achieved via a 32092 biotin pulldown, where the biotin is conjugated to a bead, such as a streptavidin bead. Thus, a nucleic acid sequence attached to the biotin-streptavidin conjugate is complementary to the 32092 sequence of the cDNA amplicon, thereby enabling the pulldown.
[00170] Next, an oligonucleotide including the molecular tag (“MolTag”) is introduced. The oligonucleotide further includes a constant region that is complementary to a corresponding constant region of the cDNA amplicon. For example, as shown in FIG. 8C, the constant region of the cDNA amplicon may be included in the handle, which was previously incorporated during reverse transcription. Notably, nucleic acid extension beginning at the constant region results in the generation of a complementary strand that excludes the original gene sequence. For example, as shown in FIG. 8C, the extension occurs on the second cDNA strand in a 5’ to 3’ direction. Thus, the resulting amplicon for sequencing includes the molecular tag, gene tag, and cell barcode, but does not include the original gene sequence.
Incorporating Molecular Tags through Universal Bases
[00171] Methods disclosed herein involve incorporating molecular tags through conversion of universal bases. Here, universal bases can be introduced via a primer. A polymerase would create the molecular tag as a complementary sequence to the universal bases. This process can achieve low variability across runs (e.g., arising from different ratios of randomers incorporated during synthesis). For example, different runs may include different ratios of nucleotide bases (e.g., different ratios of A, C, G, T), leading to lot-to-lot variability. Thus, each run may have bias depending on the ratio of nucleotide bases added in the reagents or reaction mixture. Through the incorporation of molecular tags through conversion of universal bases, the ratio of universal bases is dependent on polymerase bias and mix of nucleotides (provided in excess). Therefore, the known bias can be accounted for, resulting in more accurate results.
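As a non-limiting sketch of how a known base bias might be accounted for, the calculation below takes a set of per-base incorporation frequencies and derives the probability of a given molecular tag and an effective number of distinguishable tags. The frequencies shown are hypothetical values chosen for illustration, positions are assumed to be independent, and the function names are not part of the disclosed methods.

```python
def tag_probability(tag, base_freqs):
    """Probability of observing a specific tag given per-base frequencies."""
    p = 1.0
    for base in tag:
        p *= base_freqs[base]
    return p

def effective_tag_count(length, base_freqs):
    """Inverse of the probability that two independent tags are identical."""
    per_position_match = sum(f * f for f in base_freqs.values())
    return 1.0 / (per_position_match ** length)

# Hypothetical frequencies: an unbiased case and a GC-biased case.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
biased = {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10}

print(effective_tag_count(8, uniform))
print(effective_tag_count(8, biased))
print(tag_probability("CCGG", biased))
```

When the bias is known, tags that are more likely to arise by chance (here, GC-rich tags) can be weighted accordingly when estimating how many distinct molecules share a tag.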
[00172] FIG. 9A depicts the processing of RNA and gDNA in a droplet, in accordance with an embodiment incorporating universal bases. Specifically, FIG. 9A shows the processing of RNA and gDNA during the encapsulation stage. The top panel of FIG. 9A shows the introduction of an oligonucleotide including a reverse primer. The oligonucleotide can further include one or more universal bases, examples of which include any of inosine, 2’-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-nitropyrrole. The reverse primer hybridizes with a portion of the RNA analyte and reverse transcription occurs to generate cDNA that includes the reverse primer and the universal bases.
[00173] Next, a forward primer is introduced that hybridizes to the cDNA. Nucleic acid extension is performed (e.g., via DNA polymerase) beginning at the forward primer to generate a second cDNA that includes a sequence that is complementary to the reverse primer and the universal bases. Here, the universal bases are characterized in that they are complementary to any of the four natural DNA nucleotides (adenine, thymine, guanine, and cytosine). Thus, the sequence complementary to the universal base represents a molecular tag that can be generally different from other molecular tags of other cDNA molecules. In other words, the molecular tag of the cDNA is generated by a DNA polymerase as a result of the nucleic acid extension reaction. Additionally, genomic DNA (gDNA) is released using a protease, such as proteinase K.
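The generation of a molecular tag by the polymerase can be sketched as a simple simulation. The example below is illustrative only: the symbol `N` is an assumed placeholder for a universal base, the optional `weights` argument stands in for a hypothetical polymerase preference, and the function `extend_opposite` is not part of the disclosed methods.

```python
import random

UNIVERSAL = "N"  # assumed placeholder symbol for a universal base
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def extend_opposite(template, rng, weights=None):
    """Return the strand synthesized opposite `template`, written 5' to 3'.

    Natural bases are copied as their complements. Opposite each universal
    base, one of A, C, G, or T is drawn at random (optionally weighted).
    """
    weights = weights or {"A": 1, "C": 1, "G": 1, "T": 1}
    bases = list(weights)
    probs = [weights[b] for b in bases]
    new_strand = []
    for base in reversed(template):  # the new strand runs antiparallel
        if base == UNIVERSAL:
            new_strand.append(rng.choices(bases, weights=probs, k=1)[0])
        else:
            new_strand.append(COMPLEMENT[base])
    return "".join(new_strand)

rng = random.Random(7)
first_strand = "AANNNNNNCC"  # hypothetical cDNA segment with six universal bases
for _ in range(3):
    # Each extension may yield a different six-base molecular tag.
    print(extend_opposite(first_strand, rng))
```

In this sketch, the six bases written opposite the universal bases play the role of the molecular tag, and the flanking bases are ordinary complements of the template.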
[00174] FIG. 9B depicts the amplification and barcoding of nucleic acids derived from RNA and gDNA, in accordance with the embodiment shown in FIG. 9A. FIG. 9B shows the steps performed during the barcoding stage. Referring to the gDNA, a forward and reverse primer pair hybridize with portions of the gDNA. Here, both the forward primer and reverse primer may be gene specific primers. Nucleic acid extension (e.g., using DNA polymerase) occurs and extends the gDNA from the forward primer and reverse primer.
[00175] After the first nucleic acid amplification (e.g., PCR) cycle, a corresponding cDNA strand is now generated. Here, the corresponding cDNA strand includes a complementary sequence of the reverse primer, molecular tag, and seq8F. In the later cycles of nucleic acid amplification (e.g., PCR), cell barcodes (CBC) are incorporated into the amplicons. As shown in the bottom panel of FIG. 9B, cell barcodes are incorporated by providing an oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC). Here, the constant region (seq8F) is complementary to another constant region (seq8F) on the cDNA amplicon. Thus, extension of the nucleic acid along the cDNA amplicon generates a subsequent amplicon that incorporates the cell barcode. Furthermore, the subsequent amplicon additionally includes the molecular tag.
[00176] Similarly, cell barcodes are incorporated into amplicons deriving from genomic DNA. An oligonucleotide that includes a constant region (e.g., seq8F) and the cell barcode (CBC) hybridizes with a complementary constant region (seq8F) on the genomic DNA that was introduced through a forward primer in a previous PCR cycle. Thus, extension of the nucleic acid along the amplicon derived from gDNA results in a subsequent amplicon that incorporates the cell barcode.
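The barcoding step described above, in which a barcode oligonucleotide anneals through its constant region and is extended along the amplicon, can be sketched with simple string operations. This is a minimal sketch under stated assumptions: the constant sequence `CONSTANT_SEQ8` is a made-up placeholder (the actual seq8F sequence is not given here), and `barcode_by_extension` is a hypothetical helper.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CONSTANT_SEQ8 = "GTACTCGCAGTAGTC"  # placeholder, not the actual seq8F sequence

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def barcode_by_extension(template, constant, cell_barcode):
    """Model one extension of a 5'-CBC-constant-3' oligonucleotide.

    The oligonucleotide anneals where the template carries the reverse
    complement of the constant region at its 3' end. Returns the extension
    product (5' to 3'), or None if there is no annealing site.
    """
    site = revcomp(constant)
    if not template.endswith(site):
        return None
    copied = revcomp(template[: -len(site)])
    return cell_barcode + constant + copied

insert = "ATGGCCTTAGCA"  # hypothetical target sequence (from cDNA or gDNA)
template = revcomp(CONSTANT_SEQ8 + insert)
print(barcode_by_extension(template, CONSTANT_SEQ8, "AACCGGTT"))
```

Because the annealing site is the shared constant region, the same helper applies equally to amplicons derived from cDNA and to amplicons derived from gDNA.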
Differentially Cleaving RNA for Distinguishing Amplicons
[00177] Methods disclosed herein involve using differential cleaving of RNA transcripts as a means of distinguishing amplicons. Here, using cleavage sites as molecular barcodes eliminates the use of primer sequences for introducing molecular tags. Therefore, this methodology avoids the need to design particular primers, synthesize the primers, and incorporate such primers with nucleic acid barcode sequences. Furthermore, this differential cleavage methodology increases amplification efficiency due to template length. For example, increased amplification efficiency can be due to reduced inhibitory effects of secondary structures associated with long templates. Altogether, this may be a more efficient (e.g., time-efficient and cost-efficient) process for distinguishing amplicons.
[00178] FIG. 10 depicts the processing of RNA in a droplet, in accordance with an embodiment involving differentially cleaving RNA. Specifically, FIG. 10 shows the processing of RNA during the encapsulation stage. Although not explicitly shown in FIG. 10, gDNA can also be processed in parallel.
[00179] Truncation oligonucleotides are introduced, which hybridize to regions of the single stranded RNA. In various embodiments, the truncation oligonucleotides are DNA oligonucleotides. Following hybridization of a truncation oligonucleotide to the RNA, RNase (e.g., RNase H) cleaves the RNA-DNA duplex. Here, depending on where the truncation oligonucleotide hybridizes with the RNA, the RNase will differentially cleave the RNA at different start and/or stop sites. Therefore, by designing different truncation oligonucleotides to hybridize with different regions of RNA analytes, the RNA analytes can be differentially cleaved, thereby resulting in RNA analytes with differing start and/or stop sites. As shown in FIG. 10, differential cleavage of RNA by RNase results in RNA of different start and/or stop sites. A reverse primer can hybridize with the RNA and a complementary cDNA is generated. Given the different sequences of RNA analytes corresponding to different start and stop sites, the cDNA will similarly have different sequences corresponding to different start and stop sites. Thus, the different sequences corresponding to different start and stop sites can be propagated during subsequent nucleic acid amplification, thereby generating amplicons of different sequences. The different sequences of the amplicons corresponding to different start and stop sites represent molecular tags, which enable distinguishing of amplicons that derive from different RNA analytes.
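For illustration, the use of start and stop sites as molecular tags can be sketched as a counting step over aligned sequence reads. The record layout `(cell_barcode, gene, start, stop)` and the function name below are assumptions made for this example; the sketch also assumes that reads have already been aligned so that start and stop coordinates are available.

```python
from collections import defaultdict

def count_molecules_by_ends(reads):
    """Count distinct (start, stop) pairs per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, gene, start, stop) tuples.
    Reads sharing a cell, gene, start, and stop are treated as copies of
    the same original RNA analyte.
    """
    ends = defaultdict(set)
    for cell, gene, start, stop in reads:
        ends[(cell, gene)].add((start, stop))
    return {key: len(value) for key, value in ends.items()}

# Hypothetical aligned reads.
example_reads = [
    ("CELL1", "GENE_A", 120, 310),
    ("CELL1", "GENE_A", 120, 310),  # amplification duplicate
    ("CELL1", "GENE_A", 145, 310),  # different start site
    ("CELL2", "GENE_A", 120, 310),
]
print(count_molecules_by_ends(example_reads))
```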
Incorporating Alternate Bases for Distinguishing Amplicons
[00180] Methods disclosed herein involve incorporating alternate bases into amplicons as a means of distinguishing amplicons. Here, alternate bases can be incorporated during reverse transcription and/or nucleic acid amplification. Thus, molecular tags can be created during real time amplification of template strands. The presence and location of alternate bases within amplicons can be informative for distinguishing amplicons. This methodology eliminates the use of primer sequences for introducing molecular tags. Therefore, this methodology avoids the need to design particular primers, synthesize the primers, and incorporate such primers with nucleic acid barcode sequences. Additionally, as primers do not include an additional adaptor, this improves flexibility in targeting specific regions and further improves amplification efficiency as the resulting amplicon is shortened in length. Furthermore, the methodology remains compatible with nucleic acid amplification systems. Altogether, this may be a more efficient (e.g., time-efficient and cost-efficient) process for distinguishing amplicons.
[00181] FIG. 11A depicts the processing of RNA in a droplet, in accordance with an embodiment incorporating alternate bases. Specifically, FIG. 11A shows the processing of RNA during the encapsulation stage. Although not explicitly shown in FIG. 11A, gDNA can also be processed in parallel. The top panel shows the introduction of a reverse primer. Here, alternate bases can be introduced. For example, alternate bases are included in the reagents 120 (shown in FIG. 1) at a particular ratio relative to dNTPs. In various embodiments, alternate bases include universal bases. In various embodiments, alternate bases include bases other than deoxynucleotides (dNTPs) (e.g., other than deoxyadenosine 5’-triphosphate, deoxyguanosine 5’-triphosphate, deoxycytidine 5’-triphosphate, and deoxythymidine 5’-triphosphate).
[00182] Therefore, as reverse transcription is performed, a cDNA is generated which may include one or more alternate bases. As shown in FIG. 11A, the cDNA includes alternate base 1 and alternate base 2. In various embodiments, a cDNA can include fewer or additional alternate bases. The number of alternate bases introduced into the cDNA can be controlled based on the ratio of alternate bases that are included in the reagents 120 relative to dNTPs. In various embodiments, alternate bases are incorporated at random locations in the cDNA.
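The relationship between the ratio of alternate bases to dNTPs and the number of alternate bases in a cDNA can be sketched as follows. This is a simplified model offered for illustration: it assumes each position independently incorporates an alternate base with a probability equal to the alternate-base fraction of the nucleotide pool, and it ignores any enzyme preference. All function names are hypothetical.

```python
import random

def fraction_from_ratio(alternate_parts, dntp_parts):
    """Convert an alternate-base:dNTP ratio into a fraction of the pool."""
    return alternate_parts / (alternate_parts + dntp_parts)

def expected_alternate_bases(cdna_length, alt_fraction):
    """Expected number of alternate bases in a cDNA of the given length."""
    return cdna_length * alt_fraction

def simulate_alternate_positions(cdna_length, alt_fraction, rng):
    """Draw the random positions at which alternate bases are incorporated."""
    return [i for i in range(cdna_length) if rng.random() < alt_fraction]

rng = random.Random(3)
fraction = fraction_from_ratio(1, 99)  # hypothetical 1:99 ratio
print(expected_alternate_bases(200, fraction))
print(simulate_alternate_positions(200, fraction, rng))
```

Raising or lowering the ratio shifts the expected count proportionally, which is one way the number of alternate bases per cDNA could be tuned.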
[00183] FIG. 11B depicts the amplification and barcoding of nucleic acids derived from RNA, in accordance with the embodiment shown in FIG. 11A. FIG. 11B shows the steps performed during the barcoding stage. In various embodiments, during nucleic acid amplification cycles, one or more alternate bases can be introduced into amplicons based on the ratio of alternate bases that are included in the reaction mixture 140 relative to dNTPs. As shown in FIG. 11B, alternate base 3 can be introduced into a cDNA amplicon in an earlier cycle of PCR and alternate base 4 can be introduced into a cDNA amplicon in a later cycle of PCR. Additionally, alternate base 1 and alternate base 2 are propagated through the cycles of nucleic acid amplification. Therefore, the final cDNA amplicon can include alternate bases 1, 2, 3, and 4.
[00184] The cDNA amplicons including the alternate bases are subsequently sequenced. The presence of the multiple alternate bases is detected in the sequences and can be correlated across the different amplicon sequences. For example, one or more sequence reads of amplicons with common alternate bases can be identified. Here, the common alternate bases indicate that the corresponding amplicons are derived from the same RNA analyte. Thus, the sequence reads of amplicons including the common alternate bases can be assigned to a single RNA analyte of the cell.
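One possible way to correlate sequence reads by common alternate bases is sketched below using a union-find grouping. The sketch makes several assumptions that are not part of the disclosure: each read has already been reduced to a set of `(position, alternate_base)` markers, all reads belong to the same cell and gene, and any shared marker is treated as evidence of a common origin (chance coincidences are ignored).

```python
def group_reads_by_alternate_bases(reads):
    """Group read ids that share at least one (position, alternate base) marker.

    `reads` maps a read id to a set of (position, alternate_base) tuples.
    Returns a list of sets of read ids, one set per inferred RNA analyte.
    """
    parent = {rid: rid for rid in reads}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for rid, markers in reads.items():
        for marker in markers:
            if marker in first_seen:
                parent[find(rid)] = find(first_seen[marker])
            else:
                first_seen[marker] = rid

    groups = {}
    for rid in reads:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(groups.values(), key=lambda group: sorted(group))

# Hypothetical reads; "X1" to "X4" stand for alternate bases 1 to 4.
example = {
    "r1": {(10, "X1"), (42, "X2")},
    "r2": {(10, "X1"), (42, "X2"), (77, "X3")},  # gained a base in a later cycle
    "r3": {(5, "X1")},
    "r4": {(5, "X1"), (90, "X4")},
    "r5": set(),  # no alternate base detected
}
print(group_reads_by_alternate_bases(example))
```

Reads without any detected alternate base end up in their own group in this sketch, reflecting that they cannot be assigned to a shared analyte on this basis alone.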
In Situ Processing for Single Cell RNA and/or DNA analysis
[00185] In various embodiments, methods disclosed herein involve in situ processing of RNA of cells followed by single cell analysis. The in situ processing can be performed while the cells are in bulk (e.g., non-single cell) and therefore, a plurality of cells can be processed simultaneously. By performing certain steps in situ, a wider range of experimental optimizations can be explored than in a droplet. Furthermore, wash steps to remove excess reagents can be performed in situ, whereas wash steps cannot be performed in a droplet. In certain embodiments where molecular tags are introduced in situ, excess molecular tags can be removed through a wash process. Removing excess reagents, including excess molecular tags, enables more accurate downstream quantification of molecules (e.g., RNA transcripts).
[00186] Generally, the in situ processing involves providing primer sequences that are capable of hybridizing with a corresponding sequence of RNA molecules in cells. In some embodiments, the primer sequences include reverse primers that can be used to initiate reverse transcription to generate complementary DNA (cDNA) molecules. In various embodiments, the primer sequences are not used to initiate reverse transcription. For example, primer sequences can undergo ligation to generate sequences for subsequent single cell analysis. Further examples of in situ reverse transcription and in situ ligation are described in further detail below.
[00187] In various embodiments, in situ processing of cells can involve fixing and/or permeabilizing cells prior to providing the primer sequences. In various embodiments, fixing the cells involves providing a fixing agent to the cells. In various embodiments, the fixing agent can include any of methanol, acetone, a mixture of methanol and acetone (e.g., 1:1 mixture), paraformaldehyde (e.g., 1-10%), DSP (dithiobis(succinimidyl propionate)) (e.g., 0.1 mM - 10 mM), SPDP (succinimidyl 3-(2-pyridyldithio)propionate) (e.g., 0.1 mM - 10 mM), and a mixture of DSP and SPDP (mix of 0.1 mM - 10 mM each). For methanol, acetone, and methanol/acetone fixatives, cells can be incubated with the fixative agent at -20°C, on ice, at 4°C, or at 10°C. For the paraformaldehyde, DSP, SPDP, and DSP/SPDP methods, cells can be incubated with the fixative agent at 4°C, 10°C, 20°C, 25°C, room temperature, or 37°C. In various embodiments, the cells can be incubated with the fixative agent for at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 45 minutes, at least 1 hour, at least 2 hours, or at least 3 hours.
[00188] In various embodiments, permeabilizing the cells involves exposing the cells to a permeabilizing agent. Example permeabilizing agents include Tween-20 (e.g., 0.01-1%), Triton X-100 (e.g., 0.01-1%), or saponin (e.g., 0.01-1%). In various embodiments, the permeabilization step is optional and need not be performed. Cell permeabilization may occur at an incubation temperature of any of the following: on ice, 4°C, 10°C, 20°C, room temperature, 25°C, or 37°C. The incubation duration for cell permeabilization can be any of the following: 1 min, 3 min, 5 min, 10 min, 15 min, 20 min, or 30 min.
[00189] In various embodiments, in situ processing of cells can involve washing the cells. Washing the cells involves exposing the cells to a cell wash buffer. Example cell wash buffers include any of DPBS, FBS, or a combination of DPBS and FBS (e.g., DPBS + 0.5% FBS, DPBS + 1% FBS, DPBS + 2.5% FBS, or DPBS + 5% FBS). After exposing the cells to a cell wash buffer, the cells can undergo centrifugation (e.g., at 300 x g or 400 x g) to remove the cell wash buffer. In various embodiments, cells are washed at any of the following temperatures: 4°C, 10°C, 15°C, 20°C, 25°C, or 37°C. In various embodiments, cells can be washed more than once. For example, cells can be washed 2 times, 3 times, 4 times, or 5 times.
[00190] Following in situ processing, cells are loaded into a single cell analysis system (e.g., Tapestri® platform). As described herein in further detail, cells can undergo cell encapsulation, analyte release, cell barcoding, and target amplification.
[00191] Reference is now made to FIGs. 12A-12B, 13A-13B, 14A-14D, and 15A-15D which show various embodiments of in situ processing of RNA and single cell analysis. Specifically, FIG. 12A depicts the in situ processing of RNA, in accordance with a first embodiment. Here, FIG. 12A depicts an in situ processing methodology that involves providing primer sequences that undergo ligation to generate sequences for subsequent single cell analysis. The primer sequences are not used to initiate reverse transcription. FIG. 12A shows an RNA molecule from a cell, which includes an RNA sequence (labeled as “RNA” in FIG. 12A) and a poly-A tail. The in situ processing involves providing a primer sequence, which includes a poly-T sequence, a molecular tag sequence (labeled as “MT” in FIG. 12A), and a PCR adaptor sequence. The poly-T sequence of the primer sequence hybridizes with a sequence of the poly-A tail of the RNA molecule. Additionally, the in situ processing involves providing a second sequence, shown in FIG. 12A as a “gene specific” sequence. Here, the gene specific sequence is designed to hybridize with a corresponding sequence of the RNA sequence of the RNA molecule. In various embodiments, the gene specific sequence includes a phosphate group, such as a 5’ phosphate group.
[00192] Next, the poly-T sequence of the primer sequence and the gene specific sequence undergo ligation. Here, a ligase (e.g., DNA ligase) is provided which ligates the gene specific sequence and the poly-T sequence. In particular embodiments, the ligase ligates the gene specific sequence by catalyzing the formation of a phosphodiester bond between the 5’ phosphate group of the gene specific sequence and the 3’-hydroxyl group of the poly-T sequence. This now generates an intermediate nucleic acid that includes 1) the gene specific sequence, 2) the poly-T sequence, 3) the molecular tag (MT), and 4) the PCR adaptor. Following ligation, the cell can be provided for single cell partitioning.
[00193] FIG. 12B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 12A. In various embodiments, the in droplet processing shown in FIG. 12B refers to the steps that occur in a first droplet during analyte release 165 (see FIG. 1B) of a two step process. FIG. 12B depicts the intermediate nucleic acid generated via the in situ processing, the intermediate nucleic acid including 1) the gene specific sequence, 2) the poly-T sequence, 3) the molecular tag (MT), and 4) the PCR adaptor. Within the droplet, an additional primer sequence is provided which enables nucleic acid amplification. As shown in FIG. 12B, the additional primer sequence includes a gene specific sequence and a constant sequence (shown as “seq8” sequence in FIG. 12B). The gene specific sequence of the additional primer sequence hybridizes with the gene specific sequence of the intermediate nucleic acid and serves to initiate nucleic acid amplification to generate amplicons.
[00194] Although not explicitly shown in FIG. 12B, the in-droplet processing can further include processing of genomic DNA (gDNA), in accordance with the methods described herein. For example, within the droplet, gDNA can be released by exposure to a protease (e.g., proteinase K), as is described in further detail in FIGs. 3A-3C. Thus, the free gDNA can additionally undergo priming and nucleic acid amplification.
[00195] In various embodiments, although not shown in FIG. 12B, an additional cell barcoding step (e.g., a cell barcoding 170 step shown in FIG. 1B) is performed to incorporate a cell barcode. For example, a barcode bead including a plurality of cell barcodes can be provided in a reaction mixture in a droplet, such that the cell barcodes can be incorporated into the amplicons. Specifically, a cell barcode can be included in a primer sequence that includes a constant region sequence that is complementary to the constant sequence (shown as “seq8” sequence). Thus, through subsequent nucleic acid amplification cycles, the cell barcode is incorporated into the amplicon.
[00196] The bottom panel of FIG. 12B shows an example final amplicon following single cell processing. Here, the amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) gene specific sequence, 4) poly-T sequence (or a complement such as a poly-A sequence), and 5) molecular tag. The amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence. The amplicon can undergo subsequent sequencing. Given the read sequences, the amplicon can be attributed to a particular cell via the cell barcode. Additionally and/or alternatively, the amplicon can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag.
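The attribution of amplicon sequences to a cell and to an RNA molecule can be sketched as a parse-and-count step. The example below is illustrative only and rests on assumptions not stated in the disclosure: a fixed read layout of cell barcode, constant region, gene specific sequence, a poly-A run, and molecular tag; fixed lengths for each element; a placeholder constant sequence; and a hypothetical two-gene panel.

```python
from collections import defaultdict

# All values below are hypothetical placeholders for illustration.
CBC_LEN = 8
MT_LEN = 6
POLY_LEN = 12
SEQ8 = "GTACTCGCAGTAGTC"
PANEL = {"GENE_A": "ATGGCCATTGTA", "GENE_B": "CCGTTAGGCTAA"}

def parse_read(read):
    """Split a read into (cell barcode, gene, molecular tag), or return None."""
    cbc, rest = read[:CBC_LEN], read[CBC_LEN:]
    if len(cbc) < CBC_LEN or not rest.startswith(SEQ8):
        return None
    rest = rest[len(SEQ8):]
    for gene, probe in PANEL.items():
        if rest.startswith(probe):
            tail = rest[len(probe):]
            poly, tag = tail[:POLY_LEN], tail[POLY_LEN:POLY_LEN + MT_LEN]
            if poly == "A" * POLY_LEN and len(tag) == MT_LEN:
                return cbc, gene, tag
    return None

def count_molecules(reads):
    """Count distinct molecular tags per (cell barcode, gene)."""
    tags = defaultdict(set)
    for read in reads:
        parsed = parse_read(read)
        if parsed:
            cbc, gene, tag = parsed
            tags[(cbc, gene)].add(tag)
    return {key: len(value) for key, value in tags.items()}

example_read = "AAAACCCC" + SEQ8 + PANEL["GENE_A"] + "A" * POLY_LEN + "ACGTAC"
print(parse_read(example_read))
```

Reads that share a cell barcode, gene, and molecular tag are collapsed into one count, so amplification duplicates do not inflate the estimated number of RNA molecules.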
[00197] Reference is now made to FIG. 13A, which depicts the in situ processing of RNA, in accordance with a second embodiment. Here, FIG. 13A depicts an in situ processing methodology that involves providing primer sequences that undergo ligation to generate sequences for subsequent single cell analysis. The primer sequences are not used to initiate reverse transcription. FIG. 13A differs from FIG. 12A in that the provided primers can be gene specific primers that hybridize with a specific RNA sequence as opposed to a poly-A tail.
[00198] FIG. 13A shows an RNA molecule from a cell, which includes a first RNA sequence (labeled as “RNA sequence 1” in FIG. 13A), a second RNA sequence (labeled as “RNA sequence 2”) and a poly-A tail. The first RNA sequence and second RNA sequence may be adjacent to one another. The in situ processing involves providing a primer sequence, which includes a gene specific sequence (shown in FIG. 13A as “Gene specific 2”), a molecular tag sequence (labeled as “MT” in FIG. 13A), and a PCR adaptor sequence. The gene specific sequence of the primer sequence hybridizes with an RNA specific sequence (e.g., RNA sequence 2) of the RNA molecule. Additionally, the in situ processing involves providing a second gene specific sequence, shown in FIG. 13A as a “gene specific 1” sequence. Here, the gene specific 1 sequence is designed to hybridize with a corresponding RNA sequence (e.g., RNA sequence 1) of the RNA molecule. In various embodiments, the gene specific 1 sequence includes a phosphate group, such as a 5’ phosphate group.
[00199] Next, the two gene specific sequences (e.g., gene specific 1 and gene specific 2) undergo ligation. A ligase (e.g., DNA ligase) is provided which ligates the two gene specific sequences. In particular embodiments, the ligase ligates the gene specific sequences by catalyzing the formation of a phosphodiester bond between the 5’ phosphate group of the gene specific 1 sequence and the 3’-hydroxyl group of the gene specific 2 sequence. This generates an intermediate nucleic acid that includes 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) the molecular tag (MT), and 4) the PCR adaptor. Following ligation, the cell can be provided for single cell partitioning.
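The geometry of the ligation can be sketched in a few lines. The example below is a simplified illustration: the transcript is written with DNA letters (T in place of U), the two target regions are required to be directly adjacent, and the order of adaptor and molecular tag on the first primer (adaptor outermost at the 5’ end) is an assumption, since only the 3’-hydroxyl end of gene specific 2 is specified above. The helper `ligate_probes` is hypothetical and considers only the first occurrence of the first target region.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def ligate_probes(transcript, target_1, target_2, adaptor, mol_tag):
    """Return the ligated intermediate (5' to 3'), or None if ligation cannot occur.

    target_1 and target_2 are the transcript regions bound by gene specific 1
    and gene specific 2. Ligation is modeled as possible only when target_2
    begins immediately after target_1 on the transcript.
    """
    start_1 = transcript.find(target_1)
    if start_1 < 0:
        return None
    end_1 = start_1 + len(target_1)
    if not transcript.startswith(target_2, end_1):
        return None
    probe_2 = adaptor + mol_tag + revcomp(target_2)  # 3'-hydroxyl on gene specific 2
    probe_1 = revcomp(target_1)                      # carries the 5' phosphate
    return probe_2 + probe_1

# Hypothetical sequences.
transcript = "GGG" + "ATGGCC" + "TTAGCA" + "AAAAAA"
product = ligate_probes(transcript, "ATGGCC", "TTAGCA", "CTCTCT", "ACGT")
print(product)
```

In this sketch the gene specific portion of the product is the reverse complement of the two adjacent target regions taken together, which is what allows a single gene specific primer to amplify across the ligation junction in later steps.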
[00200] FIG. 13B depicts the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 13A. In various embodiments, the in droplet processing shown in FIG. 13B refers to the steps that occur in a first droplet during analyte release 165 (see FIG. 1B) of a two step process. FIG. 13B depicts the intermediate nucleic acid generated via the in situ processing, the intermediate nucleic acid including 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) the molecular tag (MT), and 4) the PCR adaptor. Within the droplet, an additional primer sequence is provided which enables nucleic acid amplification. As shown in FIG. 13B, the additional primer sequence includes a gene specific sequence and a constant sequence (shown as “seq8” sequence in FIG. 13B). The gene specific sequence of the additional primer sequence hybridizes with the gene specific 1 sequence of the intermediate nucleic acid and serves to initiate nucleic acid amplification to generate amplicons.
[00201] Although not explicitly shown in FIG. 13B, the in-droplet processing can further include processing of genomic DNA (gDNA), in accordance with the methods described herein. For example, within the droplet, gDNA can be released by exposure to a protease (e.g., proteinase K), as is described in further detail in FIGs. 3A-3C. Thus, the free gDNA can additionally undergo priming and nucleic acid amplification.
[00202] In various embodiments, although not shown in FIG. 13B, an additional cell barcoding step (e.g., a cell barcoding 170 step shown in FIG. 1B) is performed to incorporate a cell barcode. For example, a barcode bead including a plurality of cell barcodes can be provided in a reaction mixture in a droplet, such that the cell barcodes can be incorporated into the amplicons. Specifically, a cell barcode can be included in a primer sequence that includes a constant region sequence that is complementary to the constant sequence (shown as “seq8” sequence). Thus, through subsequent nucleic acid amplification cycles, the cell barcode is incorporated into the amplicon.
[00203] The bottom panel of FIG. 13B shows an example final amplicon following single cell processing. Here, the amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) a first gene specific sequence (e.g., gene specific 1), 4) a second gene specific sequence (e.g., gene specific 2), and 5) molecular tag. The amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence. The amplicon can undergo subsequent sequencing. Given the read sequences, the amplicon can be attributed to a particular cell via the cell barcode. Additionally and/or alternatively, the amplicon can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag.
[00204] Reference is now made to FIG. 14A, which depicts the in situ processing of RNA, in accordance with a third embodiment. FIG. 14A differs from FIG. 13A in that a second primer sequence is provided which includes 1) a gene specific sequence (labeled as “gene specific 1”), 2) a molecular tag, and 3) a constant region (labeled as “seq8”).
[00205] Specifically, FIG. 14A shows an RNA molecule from a cell, which includes a first RNA sequence (labeled as “RNA sequence 1” in FIG. 14A), a second RNA sequence (labeled as “RNA sequence 2”) and a poly-A tail. The first RNA sequence and second RNA sequence may be adjacent to one another. The in situ processing involves providing a primer sequence, which includes a gene specific sequence (shown in FIG. 14A as “Gene specific 2”), a molecular tag sequence (labeled as “MT” in FIG. 14A), and a PCR adaptor sequence. The gene specific sequence of the primer sequence hybridizes with an RNA specific sequence (e.g., RNA sequence 2) of the RNA molecule. Additionally, the in situ processing involves providing a second primer sequence which includes 1) a gene specific sequence (labeled as “gene specific 1”), 2) a molecular tag, and 3) a constant region (labeled as “seq8”). Here, the gene specific 1 sequence is designed to hybridize with a corresponding RNA sequence (e.g., RNA sequence 1) of the RNA molecule. In various embodiments, the gene specific 1 sequence includes a phosphate group, such as a 5’ phosphate group.
[00206] As shown in FIG. 14A, two molecular tags are provided via two primer sequences. However, in some embodiments, only a single molecular tag is included. For example, in some embodiments, a first primer sequence includes 1) gene specific sequence (labeled as “gene specific 2”), 2) a molecular tag, and 3) PCR adaptor, and the second primer sequence includes 1) gene specific sequence (labeled as “gene specific 1”) and 2) constant region (labeled as “seq8”). In some embodiments, a first primer sequence includes 1) gene specific sequence (labeled as “gene specific 2”) and 2) PCR adaptor, and the second primer sequence includes 1) gene specific sequence (labeled as “gene specific 1”), 2) molecular tag, and 3) constant region (labeled as “seq8”).
[00207] The two gene specific sequences (e.g., gene specific 1 and gene specific 2) undergo ligation. A ligase (e.g., DNA ligase) is provided which ligates the two gene specific sequences. In particular embodiments, the ligase ligates the gene specific sequences by catalyzing the formation of a phosphodiester bond between the 5’ phosphate group of the gene specific 1 sequence and the 3’-hydroxyl group of the gene specific 2 sequence. This generates an intermediate nucleic acid that includes 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) one or two molecular tags, and 4) the PCR adaptor. Following ligation, the cell can be provided for single cell partitioning.
[00208] FIGs. 14B and 14C depict the in droplet processing of RNA, in accordance with the embodiment shown in FIG. 14A. Specifically, FIG. 14B shows in droplet processing of RNA as well as genomic DNA. In various embodiments, the in droplet processing shown in FIG. 14B refers to the steps that occur in a first droplet during analyte release 165 (see FIG. 1B) of a two step process. In various embodiments, the in droplet processing shown in FIG. 14B refers to the steps that occur in a second droplet during cell barcoding 170 (see FIG. 1B).
[00209] Referring first to the DNA template, a forward and reverse primer pair can be provided. As shown in FIG. 14B, the forward primer can include a gene specific sequence (labeled as “Gene specific FW”) and a constant region (labeled as “seq8”). The reverse primer can include another gene specific sequence (labeled as “gene specific RV”) and a read sequence (“Read 2”). The DNA template can undergo nucleic acid amplification to generate amplicons that include 1) constant region (“seq8”), 2) DNA template sequence, and 3) read sequence (“Read 2”).
[00210] Referring to the RNA sequence, FIG. 14B shows the intermediate nucleic acid generated via the in situ processing that includes 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) one or two molecular tags, and 4) the PCR adaptor.
[00211] FIG. 14B depicts the intermediate nucleic acid generated via the in situ processing, the intermediate nucleic acid including 1) the first gene specific sequence (e.g., gene specific 1), 2) the second gene specific sequence (e.g., gene specific 2), 3) one or two molecular tags, and 4) the PCR adaptor. Within the droplet, an additional primer sequence is provided which enables nucleic acid amplification. As shown in FIG. 14B, the additional primer sequence includes a gene specific sequence and a constant sequence (shown as “seq8” sequence in FIG. 14B). The gene specific sequence of the additional primer sequence hybridizes with the gene specific 1 sequence of the intermediate nucleic acid and serves to initiate nucleic acid amplification to generate amplicons.
[00212] Interestingly, the inclusion of the constant region “seq8” in the intermediate nucleic acid derived from RNA solves an issue that arises when processing both genomic DNA and RNA. Specifically, the quantity of RNA molecules and gDNA molecules in a single cell is biased towards RNA (e.g., there are 100- to 1000-fold more RNA molecules than gDNA molecules). Therefore, a final sequencing library would typically be biased towards RNA, resulting in underrepresentation of DNA. Here, the inclusion of the constant region “seq8” in the intermediate nucleic acid derived from RNA reduces exponential amplification of the intermediate nucleic acid derived from RNA relative to the DNA template. This is due to different annealing temperatures of gene specific primers for amplifying the DNA template and gene specific primers for amplifying the intermediate nucleic acid derived from RNA. Therefore, by controlling the temperature of the droplet, the exponential amplification of RNA and DNA amplicons can be controlled to mitigate the RNA:DNA biasing. In particular embodiments, the steps shown in FIG. 14B are performed at an elevated temperature between 55°C and 65°C. In particular embodiments, the steps shown in FIG. 14B are performed at an elevated temperature between 56°C and 64°C, between 57°C and 63°C, between 58°C and 62°C, or between 59°C and 61°C. In particular embodiments, the steps shown in FIG. 14B are performed at an elevated temperature of about 61°C.
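The effect of differing amplification efficiency on library composition can be illustrated with simple arithmetic. The sketch below uses a textbook per-cycle model in which copies grow by a factor of (1 + efficiency) each cycle; the starting copy numbers and the efficiencies are hypothetical values chosen for illustration and are not measurements from the disclosed methods.

```python
def copies_after_pcr(start_copies, efficiency, cycles):
    """Copies after `cycles` rounds, growing by (1 + efficiency) per cycle."""
    return start_copies * (1.0 + efficiency) ** cycles

def dna_fraction(dna_start, rna_start, dna_eff, rna_eff, cycles):
    """Fraction of the final library that derives from the DNA template."""
    dna = copies_after_pcr(dna_start, dna_eff, cycles)
    rna = copies_after_pcr(rna_start, rna_eff, cycles)
    return dna / (dna + rna)

# Hypothetical cell: 2 gDNA copies and 1000 RNA-derived molecules (500-fold).
equal = dna_fraction(2, 1000, dna_eff=1.0, rna_eff=1.0, cycles=20)
reduced = dna_fraction(2, 1000, dna_eff=1.0, rna_eff=0.5, cycles=20)
print("equal efficiency:", round(equal, 4))
print("reduced RNA efficiency:", round(reduced, 4))
```

Under this model, equal efficiencies preserve the starting RNA:DNA imbalance regardless of cycle number, whereas a lower efficiency for the RNA-derived intermediate increases the share of DNA-derived amplicons in the final library.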
[00213] Referring to FIG. 14C, further amplification takes place to incorporate cell barcode sequences into both DNA and RNA amplicons. Here, FIG. 14C shows steps that occur in a second droplet during cell barcoding 170 (see FIG. 1B). In particular, amplification includes providing primer sequences which include 1) common sequence (labeled as “seq8”), 2) cell barcode sequence, and 3) read sequence (“Read 1”). These primer sequences hybridize with constant sequences (labeled as “seq8”) that are present on both the DNA and RNA sequences. Thus, nucleic acid amplification is initiated using the primer sequences to incorporate cell barcodes into the resulting DNA and RNA amplicons.
[00214] Here, the steps shown in FIG. 14C can be performed at a temperature that facilitates the annealing of the primer sequences. In particular embodiments, the steps shown in FIG. 14C are performed at an elevated temperature between 45°C and 55°C. In particular embodiments, the steps shown in FIG. 14C are performed at an elevated temperature between 45°C and 52°C, between 45°C and 50°C, between 46°C and 49°C, or between 47°C and 48°C. In particular embodiments, the steps shown in FIG. 14C are performed at an elevated temperature of about 48°C.
[00215] FIG. 14D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 14A-14C. Here, the final DNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) DNA sequence (e.g., DNA template specific 1). The DNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence. A molecular tag is not present in the DNA amplicon. The final RNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) one or two gene specific sequences (e.g., gene specific 1 and/or gene specific 2), and 4) one or two molecular tags. The RNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence. The DNA and RNA amplicons can undergo subsequent sequencing. Given the read sequences, DNA and RNA amplicon sequences can be attributed to a particular cell via the cell barcode. Additionally, RNA amplicons can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag(s).
[00216] Embodiments disclosed herein further involve performing in situ reverse transcription (RT). By performing in situ RT as opposed to in droplet RT, a wider range of experimental optimizations can be explored. For example, higher efficiency RT can be achieved in situ, whereas in droplet, RT efficiency can be constrained by droplet characteristics (e.g., volume, concentration of reagents). Furthermore, wash steps to remove excess reagents can be performed in situ, whereas wash steps cannot be performed in a droplet. As RT reagents are introduced in situ, excess RT reagents can be removed through a wash process. Removing excess RT reagents, including excess molecular tags, enables more accurate downstream quantification of molecules (e.g., RNA transcripts).
[00217] Reference is now made to FIG. 15A, which depicts the in situ processing of RNA involving reverse transcription, in accordance with an embodiment. Here, FIG. 15A depicts an in situ processing methodology that involves providing primer sequences that initiate reverse transcription to generate sequences for subsequent single cell analysis. FIG. 15A shows a RNA molecule from a cell, which includes a RNA sequence (labeled as “RNA” in FIG. 15A) and a poly-A tail. The in situ processing involves providing a primer sequence, which includes a poly-T sequence, a molecular tag sequence (labeled as “MT” in FIG. 15A), and a PCR adaptor sequence. The poly-T sequence of the primer sequence hybridizes with a sequence of the poly-A tail of the RNA molecule. Additionally, the in situ processing involves providing a reverse transcriptase enzyme for performing reverse transcription. Reverse transcriptase extends the poly-T sequence to generate a complementary strand with a sequence complementary to the RNA sequence. This now generates an intermediate nucleic acid that includes 1) a gene specific sequence (including a sequence complementary to the RNA sequence and the poly-T sequence), 2) the molecular tag (MT), and 3) the PCR adaptor. Following reverse transcription, the cell can be provided for single cell partitioning.
[00218] Reference is made to FIG. 15B, which depicts the in situ processing of RNA involving reverse transcription, in accordance with a second embodiment. FIG. 15B differs from FIG. 15A in that the provided primers are gene specific primers that hybridize with a specific RNA sequence as opposed to a poly-A tail. FIG. 15B shows a RNA molecule from a cell, which includes a first RNA sequence (labeled as “RNA sequence 1” in FIG. 15B), a second RNA sequence (labeled as “RNA sequence 2” in FIG. 15B), and a poly-A tail. The first RNA sequence and second RNA sequence may be adjacent to one another. The in situ processing involves providing a reverse primer sequence, which includes a gene specific sequence that is complementary to a RNA sequence (e.g., RNA sequence 2), a molecular tag sequence (labeled as “MT” in FIG. 15B), and a PCR adaptor sequence. The gene specific sequence of the primer sequence hybridizes with a RNA specific sequence (e.g., RNA sequence 2) of the RNA molecule. Additionally, the in situ processing involves providing a reverse transcriptase enzyme for performing reverse transcription. Reverse transcriptase extends the gene specific sequence to generate a complementary strand with a sequence complementary to RNA sequence 1. This now generates an intermediate nucleic acid that includes 1) a gene specific sequence (including a sequence complementary to RNA sequence 1 and complementary to RNA sequence 2), 2) the molecular tag (MT), and 3) the PCR adaptor. Following reverse transcription, the cell can be provided for single cell partitioning.
[00219] FIG. 15C depicts the in droplet processing (e.g., encapsulation and barcoding), in accordance with an embodiment. FIG. 15C shows a top panel which includes steps that occur in a first droplet during analyte release 165 (see FIG. 1B), and a bottom panel which includes steps that occur in a second droplet during cell barcoding 170 (see FIG. 1B).
[00220] Referring to the top panel, a gene specific primer is provided to hybridize with a complementary gene specific sequence of the intermediate nucleic acid derived from RNA. Here, the gene specific primer includes a gene specific sequence (labeled as “gene specific” in FIG. 15C) and a constant region (labeled as “seq8” in FIG. 15C). Additionally, a forward and reverse primer pair is provided for hybridizing with complementary sequences of the genomic DNA sequence. As shown in FIG. 15C, the forward primer can include a forward primer sequence (“fwd primer”) that hybridizes with a complementary sequence of the genomic DNA sequence, and further includes a constant region (“seq8F”). The reverse primer can include a reverse primer sequence (“rev primer”) that hybridizes with a complementary sequence of the genomic DNA sequence, and further includes a read 2 sequence.
[00221] Referring to the bottom panel, further amplification takes place to incorporate cell barcode sequences into both DNA and RNA amplicons. In particular, amplification includes providing primer sequences which include 1) common sequence (labeled as “seq8”), 2) cell barcode sequence (labeled as “CBC”), and 3) read sequence (“Read 1”). These primer sequences hybridize with constant sequences (labeled as “seq8”) that are present on both the DNA and RNA amplicons. Thus, nucleic acid amplification is initiated using the primer sequences to incorporate cell barcodes into the resulting DNA and RNA amplicons.
[00222] FIG. 15D depicts example amplicons of the DNA and RNA library, in accordance with the embodiments shown in FIGs. 15A-15C. Here, the final DNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), and 3) gDNA sequence. The DNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence. A molecular tag is not present in the DNA amplicon. The final RNA amplicon includes at least 1) a cell barcode sequence, 2) constant region (shown as “seq8”), 3) a gene specific sequence, and 4) a molecular tag. The RNA amplicon can further include read sequences such as P5 index and P7 index sequences and/or a read 2 sequence. The DNA and RNA amplicons can undergo subsequent sequencing. Given the read sequences, DNA and RNA amplicon sequences can be attributed to a particular cell via the cell barcode. Additionally, RNA amplicons can be attributed to a particular RNA molecule via the gene specific sequence and the molecular tag(s).
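The attribution described above (reads to a cell via the cell barcode, and RNA reads to an individual RNA molecule via the gene specific sequence and molecular tag) can be sketched as follows. This is a minimal illustration, assuming that each sequencing read has already been parsed into a (cell barcode, gene, molecular tag) tuple; the parsing step, the function name, and the example sequences are hypothetical and are not specified by the disclosure.

```python
from collections import defaultdict


def count_rna_molecules(parsed_reads):
    """Count distinct RNA molecules per (cell barcode, gene).

    `parsed_reads` is an iterable of (cell_barcode, gene, molecular_tag)
    tuples. Reads that share all three fields are treated as PCR copies of
    the same original molecule and are counted once.
    """
    tags_seen = defaultdict(set)
    for cell_barcode, gene, molecular_tag in parsed_reads:
        tags_seen[(cell_barcode, gene)].add(molecular_tag)
    return {key: len(tags) for key, tags in tags_seen.items()}


if __name__ == "__main__":
    # Hypothetical parsed reads: two cells, two genes, with PCR duplicates.
    reads = [
        ("CELLAAAA", "TP53", "ACGTAC"),
        ("CELLAAAA", "TP53", "ACGTAC"),  # duplicate of the read above
        ("CELLAAAA", "TP53", "TTGCAA"),
        ("CELLAAAA", "KRAS", "GGATCC"),
        ("CELLCCCC", "TP53", "ACGTAC"),  # same tag, different cell
    ]
    for key, n in sorted(count_rna_molecules(reads).items()):
        print(key, n)
```

Grouping first by cell barcode and then collapsing by molecular tag keeps the two identifiers in their separate roles: the barcode distinguishes cells, and the tag distinguishes molecules within a cell.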
Barcodes and Barcoded Beads
[00223] Embodiments of the invention involve providing one or more barcode sequences for labeling analytes of a single cell during step 170 shown in FIG. 1B. The one or more barcode sequences are encapsulated in an emulsion with a cell lysate derived from a single cell. As such, the one or more barcodes label analytes of the cell, thereby enabling the subsequent determination that sequence reads derived from the analytes originated from the cell.
[00224] In various embodiments, a plurality of barcodes are added to an emulsion with a cell lysate. In various embodiments, the plurality of barcodes added to an emulsion includes at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, or at least 10^8 barcodes. In various embodiments, the plurality of barcodes added to an emulsion have the same barcode sequence. In various embodiments, the plurality of barcodes added to an emulsion comprise molecular tag sequences. In some embodiments, molecular tag sequences can be provided in another stage of the single cell analysis, examples of which are described herein. In various embodiments, a molecular tag has a sequence which can be used to identify and/or distinguish one or more first molecules to which the molecular tag is conjugated from one or more second molecules. In some embodiments, both a barcode sequence and a molecular tag are incorporated into a barcode. In various embodiments, a molecular tag is used to distinguish between molecules of a similar type within a population or group, whereas a barcode sequence is used to distinguish between populations or groups of molecules that are derived from different cells. In various embodiments, a molecular tag can be used to count or quantify numbers of particular molecules (e.g., quantify number of RNA transcripts). In some embodiments, where both a molecular tag and a barcode sequence are utilized, the molecular tag is shorter in sequence length than the barcode sequence. The use of barcodes is further described in US Patent Application No. 15/940,850, which is hereby incorporated by reference in its entirety.
[00225] In some embodiments, the barcodes are single-stranded barcodes. Single-stranded barcodes can be generated using a number of techniques. For example, they can be generated by obtaining a plurality of DNA barcode molecules in which the sequences of the different molecules are at least partially different. These molecules can then be amplified so as to produce single-stranded copies using, for instance, asymmetric PCR. Alternatively, the barcode molecules can be circularized and then subjected to rolling circle amplification. This will yield a product molecule in which the original DNA barcode is concatenated numerous times as a single long molecule.
[00226] In some embodiments, circular barcode DNA containing a barcode sequence flanked by any number of constant sequences can be obtained by circularizing linear DNA. Primers that anneal to any constant sequence can initiate rolling circle amplification by the use of a strand displacing polymerase (such as Phi29 polymerase), generating long linear concatemers of barcode DNA.
[00227] In various embodiments, barcodes can be linked to a primer sequence that enables the barcode to label a target nucleic acid. In one embodiment, the barcode is linked to a forward primer sequence. In various embodiments, the forward primer sequence is a gene specific primer that hybridizes with a forward target of a nucleic acid. In various embodiments, the forward primer sequence is a constant region, such as a PCR handle, that hybridizes with a complementary sequence attached to a gene specific primer. The complementary sequence attached to a gene specific primer can be provided in the reaction mixture (e.g., reaction mixture 140 in FIG. 1B). Including a constant forward primer sequence on barcodes may be preferable as the barcodes can have the same forward primer and need not be individually designed to be linked to gene specific forward primers.
[00228] In various embodiments, barcodes can be releasably attached to a support structure, such as a bead. Therefore, a single bead with multiple copies of barcodes can be partitioned into an emulsion with a cell lysate, thereby enabling labeling of analytes of the cell lysate with the barcodes of the bead. Example beads include solid beads (e.g., silica beads), polymeric beads, or hydrogel beads (e.g., polyacrylamide, agarose, or alginate beads). Beads can be synthesized using a variety of techniques. For example, using a mix-split technique, beads with many copies of the same, random barcode sequence can be synthesized. This can be accomplished by, for example, creating a plurality of beads including sites on which DNA can be synthesized. The beads can be divided into four collections and each mixed with a buffer that will add a base to it, such as an A, T, G, or C. By dividing the population into four subpopulations, each subpopulation can have one of the bases added to its surface. This reaction can be accomplished in such a way that only a single base is added and no further bases are added. The beads from all four subpopulations can be combined and mixed together, and divided into four populations a second time. In this division step, the beads from the previous four populations may be mixed together randomly. They can then be added to the four different solutions, adding another, random base on the surface of each bead. This process can be repeated to generate sequences on the surface of the bead of a length approximately equal to the number of times that the population is split and mixed. If this was done 10 times, for example, the result would be a population of beads in which each bead has many copies of the same random 10-base sequence synthesized on its surface. The sequence on each bead would be determined by the particular sequence of reactors it ended up in through each mix-split cycle.
Additional details of example beads and their synthesis are described in International Application No. PCT/US2016/016444, which is hereby incorporated by reference in its entirety.
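The mix-split procedure described above can be illustrated with a short simulation. This is a sketch under simplifying assumptions: each bead is represented by a single string standing for the many identical copies on its surface, the four reactors are assumed to receive equal shares of the pool, and the function name and parameters are hypothetical.

```python
import random

BASES = "ATGC"


def mix_split_synthesis(n_beads, n_rounds, seed=0):
    """Simulate mix-split barcode synthesis on a population of beads.

    In each round the pooled beads are mixed (shuffled), divided into four
    sub-populations, and each sub-population has exactly one base added.
    The returned list holds one barcode string per bead.
    """
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_rounds):
        rng.shuffle(beads)  # pool and mix all beads
        for i in range(n_beads):
            # Position in the shuffled pool decides which reactor the bead enters.
            beads[i] += BASES[i % 4]
    return beads


if __name__ == "__main__":
    barcodes = mix_split_synthesis(n_beads=12, n_rounds=10)
    for barcode in barcodes:
        print(barcode)
```

After 10 rounds every simulated bead carries a 10-base sequence determined by the reactors it passed through, matching the example given in the paragraph above.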
Molecular Tags
[00229] Methods disclosed herein involve incorporating one or more molecular tags. Generally, a molecular tag is a nucleic acid sequence which can be used to identify and/or distinguish one or more first analytes to which the molecular tag is conjugated from one or more second analytes to which a different molecular tag is conjugated. Generally, a molecular tag includes at least a contiguous string of nucleotides. Molecular tags may be single or double stranded. In various embodiments, a nucleic acid for sequencing, such as a DNA amplicon or a RNA amplicon, includes one or more molecular tags. Each amplicon (e.g., DNA amplicon or RNA amplicon) may include one or more (e.g., two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more) molecular tags.
[00230] In various embodiments, a molecular tag may include about 6 to about 20 nucleotides. In some embodiments, a molecular tag includes 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides. In some embodiments, a molecular tag includes at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 nucleotides in length. In various embodiments, every molecular tag is different from every other molecular tag (e.g., each molecular tag is unique). In various embodiments, a molecular tag is randomly generated. In such embodiments, molecular tags may not be unique. For example, a first randomly generated molecular tag may have a same sequence as a sequence of a second randomly generated molecular tag.
[00231] In various embodiments, a molecular tag includes a nucleic acid sequence that is not complementary to a corresponding nucleic acid sequence of an analyte (or a sequence derived from an analyte). For example, a molecular tag includes a nucleic acid sequence that is less than 80% complementary, less than 70% complementary, less than 60% complementary, less than 50% complementary, less than 40% complementary, less than 30% complementary, less than 20% complementary, or less than 10% complementary to a corresponding nucleic acid sequence of an analyte.
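The possibility noted above, that two randomly generated molecular tags share the same sequence, can be estimated with a standard birthday-problem calculation. The sketch below assumes that tags are drawn uniformly and independently from all 4^L sequences of length L; that uniformity assumption and the function name are illustrative and are not stated in the disclosure.

```python
def tag_collision_probability(n_tags, tag_length):
    """Probability that at least two of `n_tags` random tags are identical.

    Assumes each tag is drawn uniformly at random from the 4 ** tag_length
    possible nucleotide sequences of the given length.
    """
    space = 4 ** tag_length
    if n_tags > space:
        return 1.0  # more tags than possible sequences: a repeat is certain
    p_all_unique = 1.0
    for k in range(n_tags):
        p_all_unique *= (space - k) / space
    return 1.0 - p_all_unique


if __name__ == "__main__":
    # Tag lengths taken from the 6 to 20 nucleotide range given in the text.
    for length in (6, 10, 20):
        p = tag_collision_probability(n_tags=100, tag_length=length)
        print(f"length {length}: P(collision among 100 tags) = {p:.3e}")
```

Longer tags enlarge the sequence space and so reduce the chance that two molecules of the same gene in the same cell receive identical tags.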
[00232] As described herein, one or more molecular tags can be incorporated to distinguish between analytes via a variety of methods. For example, one or more molecular tags are incorporated during in situ processing of cells. As another example, one or more molecular tags are incorporated during cell encapsulation 160 or analyte release 165 (shown in FIG. 1B). As another example, one or more molecular tags are incorporated during cell barcoding 170 and/or target amplification 175 (shown in FIG. 1B).
Reagents
[00233] Embodiments described herein include the encapsulation of a cell with reagents within an emulsion. In various embodiments, the reagents interact with the encapsulated cell under conditions in which the cell is lysed, thereby releasing target analytes of the cell. The reagents can further interact with target analytes to prepare for subsequent barcoding and/or amplification.
[00234] In various embodiments, the reagents include one or more lysing agents that cause the cell to lyse. Examples of lysing agents include detergents such as Triton X-100, Nonidet P-40 (NP40) as well as cytotoxins. In various embodiments, the reagents further include agents that interact with target analytes that are released from a single cell. One example of such an agent includes reverse transcriptase which reverse transcribes messenger RNA transcripts released from the cell to generate corresponding cDNA.
[00235] In various embodiments, the reagents encapsulated with the cell include ddNTPs, inhibitors such as ribonuclease inhibitor, and stabilization agents such as dithiothreitol (DTT). In various embodiments, the reagents further include proteases that assist in the lysing of the cell and/or accessing of genomic DNA. In various embodiments, proteases in the reagents can include any of proteinase K, pepsin, protease (subtilisin Carlsberg), protease type X (Bacillus thermoproteolyticus), or protease type XIII (Aspergillus saitoi). In various embodiments, the reagents include deoxyribonucleotide triphosphate (dNTP) reagents including deoxyadenosine triphosphate, deoxycytidine triphosphate, deoxyguanosine triphosphate, and deoxythymidine triphosphate.
[00236] In various embodiments, the reagents include agents that interact with target analytes that are released from a single cell. For example, the reagents include reverse transcriptase which reverse transcribes mRNA transcripts released from the cell to generate corresponding cDNA. As another example, the reagents include primers that hybridize with mRNA transcripts, thereby enabling the reverse transcription reaction to occur. In various embodiments, such primers are digestible oligonucleotides that participate in the reverse transcription reaction, but are subsequently digested to prevent their participation in subsequent reactions.
[00237] In various embodiments, the reagents include agents for digesting the digestible oligonucleotides. In such embodiments, the agents digest the digestible oligonucleotides while in a droplet, such as a first droplet generated during the cell encapsulation step (step 160 in FIG. 1B). In various embodiments, agents for digesting the digestible oligonucleotides are enzymes. In some embodiments, an agent for digesting the digestible oligonucleotides is a RNaseH enzyme.
Reaction Mixture
[00238] As described herein, a reaction mixture is provided into an emulsion with a cell lysate (e.g., see cell barcoding step 170 in FIG. 1B). Generally, the reaction mixture includes reactants sufficient for performing a reaction, such as nucleic acid amplification, on analytes of the cell lysate.
[00239] In various embodiments, the reaction mixture includes primers that are capable of acting as a point of initiation of synthesis along a complementary strand when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is catalyzed. In various embodiments, the reaction mixture includes the four different deoxyribonucleoside triphosphates (dATP, dGTP, dCTP, and dTTP). In various embodiments, the reaction mixture includes enzymes for nucleic acid amplification. Examples of enzymes for nucleic acid amplification include DNA polymerase, thermostable polymerases for thermal cycled amplification, or polymerases for multiple-displacement amplification for isothermal amplification. Other, less common forms of amplification may also be applied, such as amplification using DNA-dependent RNA polymerases to create multiple copies of RNA from the original DNA target which themselves can be converted back into DNA, resulting, in essence, in amplification of the target. Living organisms can also be used to amplify the target by, for example, transforming the targets into the organism which can then be allowed or induced to copy the targets with or without replication of the organisms.
[00240] In various embodiments, the reagents include deoxyribonucleotide triphosphate (dNTP) reagents including deoxyadenosine triphosphate, deoxycytidine triphosphate, deoxyguanosine triphosphate, and deoxythymidine triphosphate.
[00241] The extent of nucleic acid amplification can be controlled by modulating the concentration of the reactants in the reaction mixture. In some instances, this is useful for fine tuning of the reactions in which the amplified products are used.
[00242] In various embodiments, the reaction mixture includes agents for digesting the digestible oligonucleotides. In various embodiments, agents for digesting the digestible oligonucleotides are enzymes. In such embodiments, the agents digest the digestible oligonucleotides while in a droplet, such as a second droplet generated during the barcoding step (step 170 in FIG. 1B). The reaction mixture can include enzymes selected from any of UDG, RNaseH, or RNaseA. Here in the second droplet, the digestible oligonucleotides have already primed the RNA transcript and reverse transcription has occurred. Therefore, providing any of these enzymes in the second droplet enables the digestion of the digestible oligonucleotides after they participated in the reverse transcription reaction.
Primers
[00243] Embodiments of the invention described herein use primers to conduct the single-cell analysis. For example, primers are implemented during the workflow process shown in FIG. 1B. Primers can be used to prime (e.g., hybridize) with specific sequences of nucleic acids of interest, such that the nucleic acids of interest can be processed (e.g., reverse transcribed, barcoded, and/or amplified). Additionally, primers enable the identification of target regions following sequencing.
[00244] In various embodiments, primers described herein are between 5 and 50 nucleobases in length. In various embodiments, primers described herein are between 7 and 45 nucleobases in length. In various embodiments, primers described herein are between 10 and 40 nucleobases in length. In various embodiments, primers described herein are between 12 and 35 nucleobases in length. In various embodiments, primers described herein are between 15 and 32 nucleobases in length. In various embodiments, primers described herein are between 18 and 30 nucleobases in length. In various embodiments, primers described herein are between 18 and 25 nucleobases in length.
[00245] Referring again to FIG. 1B, in various embodiments, primers can be included in the reagents 120 that are encapsulated with the cell 110. In various embodiments, primers included in the reagents are useful for priming RNA transcripts and enabling reverse transcription of the RNA transcripts. In various embodiments, primers in the reagents 120 can include RNA primers for priming RNA and/or for priming genomic DNA. In various embodiments, the primers included in the reagents are digestible primers included within digestible oligonucleotides. Digestible primers can be digested at the appropriate time to ensure that subsequent reactions are not impacted by the presence of the digestible primers. In particular embodiments, digestible primers participate in a first reaction, such as a reverse transcriptase reaction, and are digested to prevent their participation in a second reaction, such as a nucleic acid amplification reaction.
[00246] In various embodiments, primers can be included in the reaction mixture 140 that is encapsulated with the cell lysate 130. In various embodiments, primers included in the reaction mixture are useful for priming nucleic acids (e.g., cDNA, gDNA, and/or amplicons of cDNA/gDNA) and enabling nucleic acid amplification of the nucleic acids. Such primers in the reaction mixture 140 can include cDNA primers for priming cDNA that has been reverse transcribed from RNA and/or DNA primers for priming genomic DNA and/or for priming products that have been generated from the genomic DNA. In various embodiments, primers of the reagents and primers of the reaction mixture form primer sets (e.g., forward primer and reverse primer) for a region of interest on a nucleic acid. In various embodiments, primers can be included in or linked with a barcode 145 that is encapsulated with the cell lysate 130. Further description and examples of primers that are used in a single-cell analysis workflow process are provided in US Application No. 16/749,731, which is hereby incorporated by reference in its entirety.
[00247] In various embodiments, the number of primers in any of the reagents, the reaction mixture, or with barcodes may range from about 1 to about 500 or more, e.g., about 2 to 100 primers, about 2 to 10 primers, about 10 to 20 primers, about 20 to 30 primers, about 30 to 40 primers, about 40 to 50 primers, about 50 to 60 primers, about 60 to 70 primers, about 70 to 80 primers, about 80 to 90 primers, about 90 to 100 primers, about 100 to 150 primers, about 150 to 200 primers, about 200 to 250 primers, about 250 to 300 primers, about 300 to 350 primers, about 350 to 400 primers, about 400 to 450 primers, about 450 to 500 primers, or about 500 primers or more.
[00248] For targeted nucleic acid (e.g., targeted DNA or targeted RNA) sequencing, primers in the reagents (e.g., reagents 120 in FIG. 1B) may include primers that are complementary to a target on a nucleic acid of interest (e.g., DNA or RNA). In various embodiments, primers in the reagents are gene-specific primers. In various embodiments, primers in the reagents are universal primers. Example universal primers include primers including at least 3 consecutive deoxythymidine nucleobases (e.g., oligo dT primer), at least 3 consecutive deoxyuridine sequences (e.g., oligo dU primer), or at least 3 consecutive ribouridine sequences (e.g., oligo rU primer).
[00249] In various embodiments, such primers in the reagents are reverse primers. In particular embodiments, primers in the reagents are only reverse primers and do not include forward primers. In various embodiments, for targeted nucleic acid (e.g., targeted DNA or targeted RNA) sequencing, primers in the reaction mixture (e.g., reaction mixture 140 in FIG. 1B) include forward primers that are complementary to a forward target on a nucleic acid of interest (e.g., RNA or gDNA). In particular embodiments, the reaction mixture includes forward primers that are complementary to a forward target on a cDNA strand (generated from a RNA transcript) and further includes forward primers that are complementary to a forward target on gDNA. In various embodiments, primers in the reaction mixture are gene-specific primers that target a forward target of a gene of interest.
[00250] The number of forward or reverse primers for genes of interest that are added may be from about one to 500, e.g., about 1 to 10 primers, about 10 to 20 primers, about 20 to 30 primers, about 30 to 40 primers, about 40 to 50 primers, about 50 to 60 primers, about 60 to 70 primers, about 70 to 80 primers, about 80 to 90 primers, about 90 to 100 primers, about 100 to 150 primers, about 150 to 200 primers, about 200 to 250 primers, about 250 to 300 primers, about 300 to 350 primers, about 350 to 400 primers, about 400 to 450 primers, about 450 to 500 primers, or about 500 primers or more. In various embodiments, genes of interest for either DNA-sequencing or RNA-sequencing include, but are not limited to: CCND3, CD44, CCND1, CD33, CDK6, CDK4, CDKN1B, CREB3L4, CDKN1A, CREBBP, CREB3L1, CREB5, CREB1, ELK1, FOS, FHL1, FASLG, GNG12, GSK3B, BAD, FOXO4, FOXO1, HIF1A, HSPB1, IKBKG, IRF9, BCL2, BCL2L11, MAP2K1, MAPK1, BCL2L1, MYB, NF1, NFKB1, MYC, PIK3CB, PIM1, PIAS1, PRKCB, PTEN, HSPA1A, HSPA2, IL2RB, IL2RA, SIRT1, NCL, RHOA, MCM4, NASP, SOS1, TCL1B, SOCS3, SOCS2, STAT4, STAT6, SRF, TP53, CASP9, CASP3, CASP8, UBB, MRPL16, MRPL21, FAM32A, ABCB7, PCBP1, EPS15, NRAS, RPS27A, AFF3, PAX3, CMTM6, RHOA, PIK3CA, MAP3K13, NSD1, PTPRK, CARD11, EGFR, EZH2, WRN, JAK2, GATA3, DKK1, POLA2, CCND1, ATM, ARHGEF12, KRAS, COL2A1, KMT2D, CLIP1, FLT3, BRCA2, BUB1B, PALB2, FANCA, NCOR1, ERBB2, KAT2A, RAB5C, METTL23, SRSF2, MFSD11, DNM2, CIC, BCR, MYH9, EP300, and SSX1.
[00251] For whole transcriptome RNA sequencing, in various embodiments, the primers of the reagents (e.g., reagents 120 in FIG. 1B) can include a random primer sequence. In various embodiments, the random primer hybridizes with a sequence of reverse transcribed cDNA, thereby enabling priming off of the cDNA. In various embodiments, the reagents 120 include various different random primers that enable priming off of all or a majority of cDNA generated from mRNA transcripts across the transcriptome. This enables the processing and analysis of mRNA transcripts across the whole transcriptome. In various embodiments, a random primer comprises a sequence of 5 nucleobases. In various embodiments, a random primer comprises a sequence of 6 nucleobases. In various embodiments, a random primer comprises a sequence of 9 nucleobases. In various embodiments, a random primer comprises a sequence of at least 5 nucleobases. In various embodiments, a random primer comprises a sequence of at least 6 nucleobases. In various embodiments, a random primer comprises a sequence of at least 9 nucleobases. In various embodiments, a random primer comprises a sequence of at least 6 nucleobases, at least 7 nucleobases, at least 8 nucleobases, at least 9 nucleobases, at least 10 nucleobases, at least 11 nucleobases, at least 12 nucleobases, at least 13 nucleobases, at least 14 nucleobases, at least 15 nucleobases, at least 16 nucleobases, at least 17 nucleobases, at least 18 nucleobases, at least 19 nucleobases, at least 20 nucleobases, at least 21 nucleobases, at least 22 nucleobases, at least 23 nucleobases, at least 24 nucleobases, at least 25 nucleobases, at least 26 nucleobases, at least 27 nucleobases, at least 28 nucleobases, at least 29 nucleobases, at least 30 nucleobases, at least 31 nucleobases, at least 32 nucleobases, at least 33 nucleobases, at least 34 nucleobases, or at least 35 nucleobases.
[00252] In various embodiments, a random primer includes one or more ribonucleotide nucleobases. In some embodiments, the random primer 624 includes one ribonucleotide nucleobase on the 3’ end. In some embodiments, the random primer 624 includes two ribonucleotide nucleobases on the 3’ end. In some embodiments, the random primer 624 includes three, four, five, six, seven, eight, nine, or ten ribonucleotide nucleobases on the 3’ end. The presence of ribonucleotide nucleobases on the 3’ end of the random primer ensures that the random primer enables extension only on cDNA and not on RNA.
[00253] In various embodiments, the reagents include a reverse primer that is complementary to a portion of mRNA transcripts. In various embodiments, the reverse primer is a universal primer, such as any one of an oligo dT primer, oligo dU primer, or an oligo rU primer. For example, the universal primer region can be an oligo dT sequence that hybridizes with the poly A tail of messenger RNA transcripts. Therefore, the reverse primer hybridizes with a portion of mRNA transcripts and enables generation of cDNA strands through reverse transcription of the mRNA transcripts.
[00254] In various embodiments, for whole transcriptome RNA sequencing, the primers of the reaction mixture (e.g., reaction mixture 140 in FIG. 1B) include constant forward primers and constant reverse primers. The constant forward primers hybridize with the random forward primer that enabled priming off the cDNA. The constant reverse primers hybridize with a sequence of the reverse constant region, such as a PCR handle, that previously enabled reverse transcription of the mRNA transcript.
[00255] In various embodiments, primers included in the reagents (e.g., reagents 120 in FIG. 1B) or the reaction mixture (e.g., reaction mixture 140 in FIG. 1B) include additional sequences. Such additional sequences may have functional purposes. For example, a primer may include a read sequence for sequencing purposes. As another example, a primer may include a constant region. Generally, the constant region of a primer can hybridize with a complementary constant region on another nucleic acid sequence for incorporation of the nucleic acid sequence during nucleic acid amplification. For example, the constant region of a primer can be complementary to a complementary constant region of a barcode sequence. Thus, during nucleic acid amplification, the barcode sequence is incorporated into generated amplicons.
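By way of non-limiting illustration, the complementarity between a constant region of a primer and a constant region of a barcode sequence can be checked computationally. The Python sketch below assumes perfect Watson-Crick complementarity over the full constant region and uses hypothetical example sequences; neither the sequences nor the function names are taken from the disclosure.

```python
# Translation table for Watson-Crick complements of DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_hybridize(primer_constant: str, barcode_constant: str) -> bool:
    """True if the two constant regions are perfect reverse complements.

    Simplifying assumption: hybridization is modeled as exact, full-length
    complementarity, ignoring mismatches and melting temperature.
    """
    return reverse_complement(primer_constant) == barcode_constant.upper()


if __name__ == "__main__":
    primer_constant = "GTACTCGCAG"   # hypothetical constant region on a primer
    barcode_constant = "CTGCGAGTAC"  # hypothetical constant region on a barcode
    print(can_hybridize(primer_constant, barcode_constant))
```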
[00256] In various embodiments, instead of the primers being included in the reaction mixture (e.g., reaction mixture 140 in FIG. 1B), such primers can be included in or linked to a barcode (e.g., barcode 145 in FIG. 1B). In particular embodiments, the primers are linked to an end of the barcode and therefore, are available to hybridize with target sequences of nucleic acids in the cell lysate.
[00257] In various embodiments, primers of the reaction mixture, primers of the reagents, or primers of barcodes may be added to an emulsion in one step, or in more than one step. For instance, the primers may be added in two or more steps, three or more steps, four or more steps, or five or more steps. Regardless of whether the primers are added in one step or in more than one step, they may be added after the addition of a lysing agent, prior to the addition of a lysing agent, or concomitantly with the addition of a lysing agent. When added before or after the addition of a lysing agent, the primers of the reaction mixture may be added in a separate step from the addition of a lysing agent (e.g., as exemplified in the two step workflow process shown in FIG. 1B).
[00258] A primer set for the amplification of a target nucleic acid typically includes a forward primer and a reverse primer that are complementary to a target nucleic acid or the complement thereof. In some embodiments, amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, where each includes at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair has a different corresponding target sequence. Accordingly, certain methods herein are used to detect or identify multiple target sequences from a single cell sample.
[00259] Embodiments disclosed herein involve the use of digestible oligonucleotides. Digestible oligonucleotides include a digestible primer. In various embodiments, digestible oligonucleotides further include digestible molecular tags. Digestible primers can be primers that participate in the reverse transcription of RNA transcripts to generate cDNA, but are digested such that the digestible primers do not participate in subsequent reactions involving the cDNA (e.g., amplification of cDNA). In various embodiments, the step of digestion reduces or eliminates the presence of digestible primers (e.g., digestible primers that are primed on RNA transcripts, digestible primers that have formed undesired byproducts, and/or digestible primers that have misprimed genomic DNA). In some embodiments, digestible primers are reverse primers. In some embodiments, digestible primers are gene specific primers.
[00260] In particular embodiments, digestible oligonucleotides have one of the following characteristics: A) one or more ribonucleotide nucleobases, B) one or more uracil nucleobases, C) a repeating deoxyuridine sequence (e.g., oligo dUracil or oligo dU), or D) a repeating ribouridine sequence (e.g., oligo rUracil or oligo rU).
[00261] In various embodiments, digestible oligonucleotides include one or more ribonucleotide nucleobases. In various embodiments, every nucleobase of a ribonucleotide primer is a ribonucleotide nucleobase. In various embodiments, a ribonucleotide oligonucleotide includes a combination of deoxyribonucleotide and ribonucleotide nucleobases. In various embodiments, ribonucleotide oligonucleotides have more ribonucleotide nucleobases than deoxyribonucleotide nucleobases. In various embodiments, at least 60% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, at least 70% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, at least 80% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, at least 90% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, between 55 and 90% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, between 60 and 85% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases. In various embodiments, between 70 and 80% of nucleobases of a ribonucleotide oligonucleotide are ribonucleotide nucleobases.
[00262] In various embodiments, ribonucleotide oligonucleotides have more deoxyribonucleotide nucleobases than ribonucleotide nucleobases. In various embodiments, at least 60% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 70% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 80% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 90% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 55 and 90% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 60 and 85% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 70 and 80% of nucleobases of a ribonucleotide oligonucleotide are deoxyribonucleotide nucleobases.
[00263] In various embodiments, every other base of a ribonucleotide oligonucleotide is a ribonucleotide nucleobase. In various embodiments, the ribonucleotide oligonucleotide comprises a ribonucleotide nucleobase every 3 nucleobases. In various embodiments, the ribonucleotide oligonucleotide comprises a ribonucleotide nucleobase every 4 nucleobases. In various embodiments, the ribonucleotide oligonucleotide comprises one ribonucleotide nucleobase every 5 nucleobases, every 6 nucleobases, every 7 nucleobases, every 8 nucleobases, every 9 nucleobases, or every 10 nucleobases.
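By way of non-limiting illustration, the periodic placement of ribonucleotide nucleobases and the resulting ribonucleotide percentage of an oligonucleotide can be sketched in Python. The notation is an assumption made for this illustration: a ribonucleotide is written with an "r" prefix (e.g., "rA"), a deoxyribonucleotide is written as a bare letter, and a thymine selected for substitution is written as "rU". The example sequence is hypothetical.

```python
import re

# Assumed notation: "rA", "rC", "rG", "rU" mark ribonucleotides; a bare
# "A", "C", "G", "T" or "U" marks a deoxyribonucleotide.
_TOKEN = re.compile(r"r[ACGU]|[ACGTU]")


def parse_oligo(oligo: str) -> list[str]:
    """Split an oligonucleotide string into per-position tokens."""
    tokens = _TOKEN.findall(oligo)
    if not tokens or "".join(tokens) != oligo:
        raise ValueError(f"cannot parse oligonucleotide {oligo!r}")
    return tokens


def ribo_fraction(oligo: str) -> float:
    """Fraction of positions that are ribonucleotides."""
    tokens = parse_oligo(oligo)
    return sum(t.startswith("r") for t in tokens) / len(tokens)


def ribo_every(dna: str, spacing: int) -> str:
    """Write every `spacing`-th base, counted from the 5' end, as a ribonucleotide.

    Assumption for this sketch: a T at a selected position is written as rU.
    """
    out = []
    for i, base in enumerate(dna.upper(), start=1):
        if i % spacing == 0:
            out.append("r" + ("U" if base == "T" else base))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    oligo = ribo_every("ACGTACGT", 2)  # every other base is a ribonucleotide
    print(oligo, ribo_fraction(oligo))
```

With a spacing of 2 the sketch yields an oligonucleotide in which half of the positions are ribonucleotides, and larger spacings give proportionally lower ribonucleotide percentages.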
[00264] In various embodiments, digestible oligonucleotides have one or more uracil nucleobases, hereafter referred to as “uracil oligonucleotides.” In various embodiments, uracil oligonucleotides have a combination of deoxyribonucleotide and ribonucleotide nucleobases. In some embodiments, one or more thymidine nucleobases of a deoxyribonucleotide oligonucleotide can be replaced with uracil to generate a uracil oligonucleotide. In some embodiments, all thymidine nucleobases of a deoxyribonucleotide oligonucleotide can be replaced with uracils to generate a uracil oligonucleotide. In various embodiments, a uracil oligonucleotide has more deoxyribonucleotide nucleobases than uracil nucleobases. In some embodiments, a uracil oligonucleotide has more uracil nucleobases than deoxyribonucleotide nucleobases. In various embodiments, every other base of a uracil oligonucleotide is a uracil nucleobase. In various embodiments, the uracil oligonucleotide comprises a uracil nucleobase every 3 nucleobases. In various embodiments, the uracil oligonucleotide comprises a uracil nucleobase every 4 nucleobases. In various embodiments, the uracil oligonucleotide comprises a uracil nucleobase every 5 nucleobases, every 6 nucleobases, every 7 nucleobases, every 8 nucleobases, every 9 nucleobases, or every 10 nucleobases.
[00265] In various embodiments, at least 30% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 40% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 50% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 60% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 70% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 80% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, at least 95% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 40 and 95% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 50 and 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 60 and 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 60 and 80% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 70 and 90% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, between 70 and 80% of nucleobases of a uracil oligonucleotide are deoxyribonucleotide nucleobases. In various embodiments, the uracil oligonucleotide has a sequence comprising two or more consecutive uracil nucleobases. In various embodiments, the uracil oligonucleotide has a sequence comprising three or more consecutive uracil nucleobases. In various embodiments, the uracil oligonucleotide has a sequence comprising four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive uracil nucleobases.
[00266] In various embodiments, the digestible oligonucleotide having one or more uracil nucleobases includes a gene specific primer. Here, the digestible oligonucleotide would be designed in accordance with the target sequence on the specific gene. For example, based on the presence of an adenosine in the target sequence on the specific gene, the complementary base in the digestible uracil primer would be designed as a uracil. Thus, in such embodiments, the locations of uracil nucleobases in the uracil primer would be based on the target sequence and not positioned in any pattern.
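By way of non-limiting illustration, the design rule described above, in which a uracil is placed in the primer wherever the target sequence contains an adenosine, can be sketched in Python. The target sequence and function names are hypothetical and are used only for this illustration; the sketch assumes a fully complementary primer.

```python
# Complement map in which adenine pairs with uracil rather than thymine.
_COMPLEMENT_WITH_U = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}


def uracil_primer(target: str) -> str:
    """Reverse complement of `target` (both 5'->3'), with U placed opposite A."""
    return "".join(_COMPLEMENT_WITH_U[base] for base in reversed(target.upper()))


def uracil_positions(primer: str) -> list[int]:
    """Zero-based positions of uracil in the primer, counted from the 5' end."""
    return [i for i, base in enumerate(primer) if base == "U"]


if __name__ == "__main__":
    target = "GCAATGCA"  # hypothetical target region on a gene of interest
    primer = uracil_primer(target)
    print(primer, uracil_positions(primer))
```

In this sketch the uracil positions are dictated entirely by where adenosines occur in the target, so they follow no fixed spacing.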
[00267] In various embodiments, digestible primers have a repeating deoxyuridine sequence, hereafter referred to as “oligo dU primers.” In various embodiments, the repeating deoxyuridine sequence comprises three or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises four or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises five or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises six or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises seven or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises eight or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises nine or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises ten or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or thirty or more consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises between 5 and 30 consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises between 8 and 25 consecutive deoxyuridine nucleobases. In various embodiments, the repeating deoxyuridine sequence comprises between 12 and 18 consecutive deoxyuridine nucleobases.
[00268] In various embodiments, an oligo dU primer comprises a V or VN sequence, where “V” is any of an adenine (A), guanine (G), or cytosine (C) nucleobase and “N” is any of adenine (A), guanine (G), cytosine (C), or thymine (T) nucleobase. In various embodiments, the oligo dU primer terminates in the V or VN sequence (e.g., 3’ end of oligo dU contains the V or VN sequence).
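By way of non-limiting illustration, the set of oligo dU primers that terminate in a V or VN sequence can be enumerated in Python. The sketch assumes that "U" denotes deoxyuridine, that sequences are written 5' to 3', and that the anchor sits at the 3' end; the function name and the chosen repeat length of 15 are illustrative only.

```python
from itertools import product

V_BASES = "ACG"   # "V": adenine, guanine, or cytosine
N_BASES = "ACGT"  # "N": adenine, guanine, cytosine, or thymine


def oligo_du_primers(n_du: int, anchor: str = "VN") -> list[str]:
    """Enumerate oligo dU primers that end in a V or VN anchor at the 3' end."""
    if anchor not in ("V", "VN"):
        raise ValueError("anchor must be 'V' or 'VN'")
    if n_du < 3:
        raise ValueError("this sketch assumes three or more consecutive dU")
    pools = [V_BASES] if anchor == "V" else [V_BASES, N_BASES]
    repeat = "U" * n_du  # the repeating deoxyuridine sequence
    return [repeat + "".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    primers = oligo_du_primers(15, "VN")
    print(len(primers), primers[0], primers[-1])
```

Under these assumptions a V anchor gives 3 primer variants and a VN anchor gives 12, independent of the length of the deoxyuridine repeat.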
[00269] In various embodiments, digestible primers have a repeating ribouridine sequence, hereafter referred to as “oligo rU primers.” In various embodiments, the repeating ribouridine sequence comprises three or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises four or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises five or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises six or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises seven or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises eight or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises nine or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises ten or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or thirty or more consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises between 5 and 30 consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises between 8 and 25 consecutive ribouridine nucleobases. In various embodiments, the repeating ribouridine sequence comprises between 12 and 18 consecutive ribouridine nucleobases.
[00270] In various embodiments, an oligo rU primer comprises a V or VN sequence, where “V” is any of an adenine (A), guanine (G), or cytosine (C) nucleobase and “N” is any of adenine (A), guanine (G), cytosine (C), or thymine (T) nucleobase. In various embodiments, the oligo rU primer terminates in the V or VN sequence (e.g., 3’ end of oligo rU contains the V or VN sequence).
Example System and/or Computer Embodiments
[00271] FIG. 16 depicts an example computing device (e.g., computing device 180 shown in FIG. 1A) for implementing systems and methods described in reference to FIGs. 1-15. For example, the example computing device 180 is configured to perform the in silico steps of read alignment 215 and/or characterization 220. Examples of a computing device can include a personal computer, desktop computer, laptop, server computer, a computing node within a cluster, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like.
[00272] FIG. 16 illustrates an example computing device 180 for implementing systems and methods described in the figures above. In some embodiments, the computing device 180 includes at least one processor 1602 coupled to a chipset 1604. The chipset 1604 includes a memory controller hub 1620 and an input/output (I/O) controller hub 1622. A memory 1606 and a graphics adapter 1612 are coupled to the memory controller hub 1620, and a display 1618 is coupled to the graphics adapter 1612. A storage device 1608, an input interface 1614, and network adapter 1616 are coupled to the I/O controller hub 1622. Other embodiments of the computing device 180 have different architectures.
[00273] The storage device 1608 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 1606 holds instructions and data used by the processor 1602. The input interface 1614 is a touch-screen interface, a mouse, track ball, or other type of input interface, a keyboard, or some combination thereof, and is used to input data into the computing device 180. In some embodiments, the computing device 180 may be configured to receive input (e.g., commands) from the input interface 1614 via gestures from the user. The graphics adapter 1612 displays images and other information on the display 1618. For example, the display 1618 can show metrics pertaining to the generated libraries (e.g., DNA or RNA libraries) and/or any characterization of single cells. The network adapter 1616 couples the computing device 180 to one or more computer networks.
[00274] The computing device 180 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 1608, loaded into the memory 1606, and executed by the processor 1602.
[00275] The types of computing devices 180 can vary from the embodiments described herein. For example, the computing device 180 can lack some of the components described above, such as graphics adapters 1612, input interface 1614, and displays 1618. In some embodiments, a computing device 180 can include a processor 1602 for executing instructions stored on a memory 1606.
[00276] The methods of aligning sequence reads and characterizing libraries and/or cells can be implemented in hardware or software, or a combination of both. In one embodiment, a non-transitory machine-readable storage medium, such as one described above, is provided, the medium comprising a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and execution and results of this invention. Such data can be used for a variety of purposes, such as patient monitoring, treatment considerations, and the like. Embodiments of the methods described above can be implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), a graphics adapter, an input interface, a network adapter, at least one input device, and at least one output device. A display is coupled to the graphics adapter. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion. The computer can be, for example, a personal computer, microcomputer, or workstation of conventional design.
[00277] Each program can be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language can be a compiled or interpreted language. Each such computer program is preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The system can also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
[00278] The signature patterns and databases thereof can be provided in a variety of media to facilitate their use. “Media” refers to a manufacture that contains the signature pattern information of the present invention. The databases of the present invention can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising a recording of the present database information. "Recorded" refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.
Example Kit Embodiments
[00279] Also provided herein are kits for performing single cell analysis of RNA transcripts and genomic DNA of individual or populations of cells. The kits may include one or more of the following: fluids for forming emulsions (e.g., carrier phase, aqueous phase), barcoded beads, microfluidic devices for processing single cells, reagents for lysing cells and releasing cell analytes, reaction mixtures for performing nucleic acid amplification reactions, and instructions for using any of the kit components according to the methods described herein. In particular embodiments, the kits include digestible primers that can be used for performing reverse transcription of RNA transcripts as well as agents for digesting the digestible primers to prevent the involvement of the digestible primers in subsequent reactions, such as nucleic acid amplification reactions.
EXAMPLES
Example 1: In Situ Reverse Transcription for Single Cell DNA and RNA Analysis
[00280] Generally, cells are treated in situ to perform reverse transcription, followed by single cell DNA and RNA sequencing. In particular, cells are treated in situ to perform reverse transcription (RT). To achieve this, cells are fixed and permeabilized, reagents for reverse transcription are provided, and cells are incubated within a suitable temperature range for RT to occur. Cells are washed and loaded into a single cell analysis system (e.g., Tapestri® single cell analysis system) for single cell encapsulation. Barcoding PCR is performed in emulsified droplets for both cDNA and DNA. RNA and DNA sequencing libraries are prepared from these amplified sequences.
[00281] Regarding the fixation step, the following exemplary fixatives and concentration ranges (in parentheses) are used for cell fixation:
- Methanol (100%), acetone (100%)
- Methanol and acetone (1:1 mix of 100% solution each)
- Paraformaldehyde (1-10%)
- DSP (dithiobis(succinimidyl propionate)) (0.1 mM - 10 mM)
- SPDP (succinimidyl 3-(2-pyridyldithio)propionate) (0.1 mM - 10 mM)
- DSP and SPDP (mix of 0.1 mM - 10 mM each).
[00282] For fixation, cells are incubated with the fixatives for any of 5 minutes, 10 minutes, 30 minutes, 45 minutes, 1 hour, 2 hours, or 3 hours. Incubation temperature is at any of the following:
- For methanol, acetone, and methanol/acetone fixatives: -20C, on ice, 4C, or 10C
- For the paraformaldehyde, DSP, SPDP, and DSP/SPDP methods: On ice, 4C, 10C, 20C, 25C, room temperature, or 37C
[00283] For cell permeabilization, any of the following reagents are used: Tween-20 (0.01-1%), Triton X-100 (0.01-1%), or saponin (0.01-1%). Alternatively, no extra permeabilizing reagent is used, relying on some cell permeabilization caused by the cell fixation step.
[00284] Cell permeabilization takes place at an incubation temperature of any of the following: on ice, 4C, 10C, 20C, room temperature, 25C, or 37C. The incubation duration for cell permeabilization is any of the following: 1 min, 3 min, 5 min, 10 min, 15 min, 20 min, or 30 min. Alternatively, this step can be performed concurrently with the following step (reverse transcription). In this case, the reaction proceeds to the next step without extra incubation time.
[00285] Reverse transcription is performed on the cells by providing reverse transcriptase, along with its buffer, dNTPs, and RT primers. RT primers include any of:
- A primer or a mix of primers targeting specific region(s) of intended RNA(s). The target RNA can be a fusion/translocated RNA, a transcribed gene, non-coding RNA, nuclear RNA, or any other RNA products. The number of RNA targets and therefore specific RT primers can be from 1 to 20,000,
- Random hexamers or random oligonucleotides that target random regions,
- Oligo dTs that target poly-A tails in RNA, and
- Combinations of any of the above.
[00286] The RT primer sequence includes any of:
- Complementary sequence to a segment on the bead oligo.
- A unique sequence for pull down or isolation purposes, e.g., for the use of biotin-streptavidin pull down method.
- A molecular tag.
[00287] Reverse transcription is performed with or without the permeabilization reagent at any of the following temperatures: 25C, 30C, 37C, 40C, 50C, 55C, or 60C. Incubation time for RT is any of the following: 5 min, 10 min, 15 min, 20 min, 30 min, 40 min, or 45 min.
[00288] After RT, cells are washed by adding cell wash buffer, centrifuging at 300 to 400 x g and removing the cell wash buffer without disturbing the cells. Cells are washed in any of DPBS, DPBS + 0.5% FBS, DPBS + 1% FBS, DPBS + 2.5% FBS, or DPBS + 5% FBS. Volume of wash buffer is any of 0.5 mL, 1 mL, 1.5 mL, 2 mL, 5 mL, 10 mL, or 15 mL. Cells are washed at any of the following temperatures: 4C, 10C, 15C, 20C, 25C, or 37C. Cells are washed 1 to 5 times. Alternatively cells are pelleted to remove the RT reagent without extra washing.
[00289] Cells that have been fixed, permeabilized, subjected to reverse transcription, and washed are resuspended and provided to a single cell workflow (e.g., Tapestri® platform). Cells are encapsulated in droplets along with cell lysis reagent, protease (e.g., proteinase K), and reverse primers for DNA targets. Reverse primers for cDNAs (RNA targets) are added here.
[00290] In the barcoding step, forward primers for DNA and cDNA targets are added. PCR is performed to amplify these targets. In some iterations, both forward and reverse primers for the cDNAs are added in this step. In some iterations, only the forward primers are used for the cDNA targets.
[00291] The amplicons are processed to generate sequenceable libraries by adding appropriate adaptors for the intended sequencing platform. In some iterations, the RNA target amplicons are processed together with the DNA targets. In some iterations, the RNA target amplicons are separated from the DNA targets and processed separately. This is achieved by performing separation by size or by biotin-streptavidin pull down.

Claims

CLAIMS A method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate using the reagents, the cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate; and sequencing the generated amplicons to analyze RNA of the cell, wherein oligonucleotides comprising molecular tags are included in either the reagents in the droplet or in the reactants in the second droplet. The method of claim 1, wherein the reagents comprise oligonucleotides comprising molecular tags. The method of claim 1 or 2, wherein the cDNA comprises the oligonucleotides comprising molecular tags. The method of claim 2 or 3, wherein the oligonucleotides comprising molecular tags further comprise reverse transcription (RT) primers or template switching oligonucleotides (TSOs). The method of claim 4, wherein for each of the oligonucleotides comprising molecular tags, a molecular tag is located at an end of a RT primer or TSO. The method of any one of claims 2-5, wherein the oligonucleotides comprising molecular tags further comprise RT primers, and wherein generating the cell lysate comprising cDNA derived from RNA of the cell comprises: providing a RT primer to the RNA of the cell; and performing reverse transcription to generate the cDNA comprising the RT primer and the molecular tag. The method of any one of claims 2-6, wherein the oligonucleotides comprising molecular tags further comprise RT primers comprising one or more ribonucleotides. The method of any one of claims 2-6, wherein the oligonucleotides comprising molecular tags further comprise RT primers comprising one or more uracil. 
The method of any one of claims 2-8, wherein generating amplicons comprising sequences derived from the oligonucleotides comprising molecular tags comprises: digesting the oligonucleotides comprising molecular tags; and performing nucleic acid amplification to generate amplicons comprising sequences derived from the oligonucleotides comprising molecular tags. The method of claim 9, wherein digesting the oligonucleotides comprises exposing the oligonucleotides to a ribonuclease or uracil-DNA glycosylase (UDG). The method of claim 9 or 10, further comprising: prior to digesting the oligonucleotides comprising molecular tags: providing a primer to the cDNA; and extending the primer to generate a sequence complementary to the oligonucleotide comprising a molecular tag. The method of any one of claims 9-11, wherein generating amplicons comprising sequences derived from the oligonucleotides comprising molecular tags further comprises incorporating cellular barcodes into the amplicons, wherein the cellular barcodes identify the cell from which the amplicons originate from. The method of claim 1, wherein the oligonucleotides comprising molecular tags further comprise TSOs, and wherein generating the cell lysate comprising cDNA derived from RNA of the cell comprises: performing reverse transcription to generate a first cDNA molecule; performing template switching by providing a TSO to the first cDNA molecule; and generating the cDNA from the hybridized TSO and the first cDNA molecule. The method of claim 13, wherein the TSO comprises a sequence that hybridizes with untemplated cytosine nucleotides of the first cDNA molecule. The method of claim 14, wherein the TSO comprises a rGrGrG sequence that hybridizes with three untemplated cytosine nucleotides of the first cDNA molecule. The method of claim 1, wherein the reactants comprise oligonucleotides comprising molecular tags. 
17. The method of claim 16, wherein the oligonucleotides comprising molecular tags further comprise forward primers.
18. The method of claim 16, wherein the oligonucleotides comprising molecular tags further comprise reverse primers.
19. The method of claim 17, wherein generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate further comprises: providing the forward primers of the oligonucleotides to the cDNA of the cell lysate; and extending the forward primers to generate sequences that incorporate the molecular tags of the oligonucleotides.
20. The method of claim 19, wherein generating amplicons comprising sequences derived from oligonucleotides comprising molecular tags using the reactants and the cDNA of the cell lysate further comprises: performing nucleic acid amplification to generate amplicons comprising sequences derived from the oligonucleotides comprising molecular tags.
21. The method of any one of claims 17-20, wherein the forward primers are gene specific primers.
22. The method of claim 20 or 21, wherein performing nucleic acid amplification to generate amplicons further comprises incorporating cellular barcodes into the amplicons, wherein the cellular barcodes identify the cell from which the amplicons originate.
23. A method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons comprising sequences derived from the cDNA of the cell lysate; breaking the second droplet to obtain the generated amplicons in bulk; providing oligonucleotides comprising molecular tags to the generated amplicons in bulk; and sequencing at least the oligonucleotides comprising molecular tags to analyze RNA of the cell.
24. The method of claim 23, wherein generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises: providing oligonucleotides comprising primers and cellular barcodes to the cDNA of the cell lysate, wherein the cellular barcodes identify the cell; and extending the primers to generate sequences that incorporate the cellular barcodes.
25. The method of claim 24, wherein generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises performing nucleic acid amplification.
26. The method of claim 24 or 25, wherein the primers are gene specific primers.
27. The method of claim 23, wherein generating amplicons comprising sequences derived from the cDNA of the cell lysate further comprises: providing oligonucleotides comprising primers and gene tags; and extending the primers to generate amplicons that further incorporate the gene tags.
28. The method of claim 27, further comprising: hybridizing the provided oligonucleotides comprising molecular tags with the amplicons that incorporate the gene tags; and generating nucleic acid sequences by extending the hybridized oligonucleotides comprising molecular tags, wherein the generated nucleic acid sequences comprise molecular tags and gene tags.
29. The method of claim 28, further comprising sequencing gene tags of the nucleic acid sequences to analyze RNA of the cell.
30. The method of claim 28 or 29, wherein the nucleic acid sequences do not include the cDNA of the cell lysate or sequences derived from the cDNA of the cell lysate.
31. The method of claim 29, wherein sequencing at least the oligonucleotides comprising molecular tags and sequencing gene tags of the nucleic acid sequences do not include sequencing cDNA of the cell lysate or sequences derived from the cDNA of the cell lysate.
32. A method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants, wherein the reactants comprise oligonucleotides comprising one or more universal bases; within the second droplet, generating amplicons comprising molecular tags derived from oligonucleotides comprising one or more universal bases using the reactants and the cDNA of the cell lysate or complements of the cDNA, wherein amplicons from different cDNA are distinguishable according to the molecular tags derived from oligonucleotides comprising one or more universal bases; and sequencing the generated amplicons to analyze RNA of the cell, wherein oligonucleotides comprising one or more universal bases are included in either the reagents in the droplet or in the reactants in the second droplet.
33. The method of claim 32, wherein generating amplicons comprising sequences derived from oligonucleotides comprising one or more universal bases using the reactants and the cDNA of the cell lysate comprises: performing a first cycle of nucleic acid amplification to incorporate the oligonucleotides comprising one or more universal bases; and performing a second cycle of nucleic acid amplification to generate the amplicons, wherein the molecular tags are generated within the amplicons during the second cycle of nucleic acid amplification.
34. The method of claim 33, wherein the molecular tags are generated within the amplicons during the second cycle of nucleic acid amplification by polymerases that generate strands complementary to the one or more universal bases of the oligonucleotides.
35. The method of any one of claims 32-34, wherein the oligonucleotides comprising one or more universal bases comprise two or more consecutive universal bases.
36. The method of any one of claims 32-35, wherein the oligonucleotides comprising one or more universal bases comprise three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive universal bases.
37. The method of any one of claims 32-36, wherein each of the universal bases is independently any one of an inosine base, 2'-DeoxyNebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-Nitropyrrole.
38. The method of any one of claims 1-37, wherein each molecular tag differs from other molecular tags.
39. The method of any one of claims 1-38, wherein at least one molecular tag has a same sequence as another molecular tag.
40. The method of any one of claims 1-39, wherein at least 0.1% of molecular tags have a same sequence as another molecular tag.
41. The method of any one of claims 1-40, wherein at least 0.5% of molecular tags have a same sequence as another molecular tag.
42. The method of any one of claims 1-41, wherein at least 1% of molecular tags have a same sequence as another molecular tag.
43. The method of any one of claims 1-42, wherein each of the molecular tags independently comprises any one of three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty nucleotide bases.
44. The method of any one of claims 1-42, wherein each of the molecular tags independently comprises either seven or eight nucleotide bases.
45. A method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA with different start and stop sites derived from RNA of the cell that have been differentially cleaved or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, generating amplicons using the reactants and the cDNA of the cell lysate, wherein the amplicons are derived from the cDNA with different start and stop sites; and sequencing the generated amplicons to analyze RNA of the cell.
46. The method of claim 45, wherein the RNA of the cell has been differentially cleaved by an RNase included in the reagents.
47. The method of claim 45 or 46, wherein the reagents further comprise a plurality of truncation oligonucleotides, wherein the plurality of truncation oligonucleotides comprise DNA nucleobases.
48. The method of any one of claims 45-47, wherein generating the cell lysate comprising cDNA with different start and stop sites derived from RNA of the cell that have been differentially cleaved further comprises: hybridizing the plurality of truncation oligonucleotides to RNA of the cell; and differentially cleaving the RNA of the cell at locations where the plurality of truncation oligonucleotides are hybridized to RNA of the cell.
49. The method of any one of claims 45-48, wherein the different start and stop sites include at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, or at least 10^9 different start and stop sites.
50. The method of any one of claims 45-49, wherein sequencing the generated amplicons to analyze RNA of the cell comprises: identifying amplicons with sequences corresponding to the same start and stop site; and correlating the identified amplicons with sequences corresponding to the same start and stop site to a common RNA of the cell.
51. A method for analyzing RNA of a cell, the method comprising: encapsulating a cell within a droplet with reagents; within the droplet, generating a cell lysate comprising A) cDNA derived from RNA of the cell or B) complements of the cDNA; encapsulating the cell lysate in a second droplet with reactants; within the second droplet, performing nucleic acid amplification to generate amplicons comprising nucleotide bases derived from alternate bases introduced in one or both of the reagents or reactants, wherein the alternate bases are propagated through one or more cycles of the nucleic acid amplification; and sequencing the generated amplicons comprising nucleotide bases derived from alternate bases to analyze RNA of the cell.
52. The method of claim 51, wherein alternate bases are introduced in the reagents, and wherein generating the cell lysate comprises incorporating alternate bases into the cDNA or complements of the cDNA.
53. The method of claim 52, wherein generating the cell lysate comprises incorporating alternate bases into the cDNA during reverse transcription.
54. The method of claim 52 or 53, wherein the alternate bases are incorporated into the cDNA or complements of the cDNA in a random manner.
55. The method of any one of claims 51-54, wherein performing nucleic acid amplification to generate amplicons comprising nucleotide bases derived from alternate bases comprises amplifying the cDNA comprising the incorporated alternate bases.
56. The method of claim 51, wherein alternate bases are introduced in the reactants, and wherein performing nucleic acid amplification comprises incorporating alternate bases during a first cycle of the nucleic acid amplification.
57. The method of claim 56, wherein the alternate bases are incorporated during the first cycle of the nucleic acid amplification in a random manner.
58. The method of any one of claims 51-57, wherein additional alternate bases are randomly incorporated in the one or more cycles of the nucleic acid amplification.
59. The method of any one of claims 51-58, wherein the alternate bases comprise any one of inosine base, 2'-DeoxyNebularine, 2'-deoxyinosine, 5-nitroindole, 5' 5-Nitroindole, or 3-Nitropyrrole.
60. The method of any one of claims 51-59, wherein sequencing the generated amplicons comprising common alternate bases to analyze RNA of the cell comprises: identifying one or more sequence reads of amplicons comprising a plurality of common alternate bases; and assigning the identified one or more sequence reads of amplicons to an RNA of the cell.
61. The method of any one of claims 1-60, wherein generating the cell lysate using the reagents comprises releasing genomic DNA (gDNA) of the cell such that the cell lysate comprises the gDNA.
62. The method of claim 61, wherein releasing gDNA of the cell comprises exposing the cell to proteinase K.
63. The method of claim 61 or 62, further comprising: within the second droplet, generating amplicons derived from the released gDNA of the cell lysate using the reactants; and sequencing the generated amplicons derived from the released gDNA.
64. The method of claim 63, wherein generating amplicons derived from the released gDNA comprises incorporating cellular barcodes into amplicons derived from the released gDNA using the reactants, wherein the cellular barcodes identify the cell from which the amplicons originate.
65. A method for analyzing RNA of a cell, the method comprising: performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell; encapsulating, within a first droplet, the cell comprising the nucleic acid derived from RNA of the cell and reagents; further encapsulating the nucleic acid derived from RNA in a second droplet with reactants; within the second droplet, performing nucleic acid amplification to incorporate a cell barcode into amplicons using the reactants, wherein the amplicons comprise the one or more molecular tags, the cell barcode, and a gene specific sequence of the RNA or a complement thereof; and sequencing the generated amplicons to analyze RNA of the cell.
66. The method of claim 65, wherein performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a primer comprising a sequence complementary to a sequence of the RNA of the cell and a molecular tag; providing a gene specific sequence comprising a sequence complementary to a second sequence of the RNA of the cell; and ligating the gene specific sequence and the primer to generate the nucleic acid molecule comprising the molecular tag.
67. The method of claim 66, wherein the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA.
68. The method of claim 66, wherein the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence.
69. The method of any one of claims 66-68, wherein the sequence of the RNA and the second sequence of the RNA are adjacent to one another.
70. The method of claim 65, wherein performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a first primer comprising a sequence complementary to a sequence of the RNA of the cell; providing a second primer comprising a sequence complementary to a second sequence of the RNA of the cell; and ligating the first primer and the second primer to generate the nucleic acid molecule comprising the molecular tag, wherein one or both of the first primer and the second primer comprises a molecular tag.
71. The method of claim 70, wherein the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA.
72. The method of claim 70, wherein the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence.
73. The method of any one of claims 70-72, wherein the sequence of the RNA and the second sequence of the RNA are adjacent to one another.
74. The method of any one of claims 70-73, wherein the first primer or the second primer further comprises a constant region.
75. The method of claim 65, wherein performing in situ processing of the cell to incorporate one or more molecular tags into a nucleic acid derived from RNA of the cell comprises: providing a primer comprising a sequence complementary to a sequence of the RNA of the cell and a molecular tag; and using the primer, reverse transcribing the nucleic acid derived from RNA of the cell.
76. The method of claim 75, wherein the sequence complementary to a sequence of the RNA of the cell comprises a poly-T sequence complementary to a poly-A tail of the RNA.
77. The method of claim 75, wherein the sequence complementary to a sequence of the RNA of the cell comprises a gene specific sequence.
78. The method of any one of claims 65-77, further comprising: subsequent to encapsulating the cell comprising the nucleic acid derived from RNA of the cell and reagents in the first droplet, releasing genomic DNA from the cell using the reagents.
79. The method of claim 78, wherein the nucleic acid derived from RNA of the cell comprises a constant region with a primer annealing temperature that differs from a primer annealing temperature of the released genomic DNA.
80. The method of claim 79, wherein the primer annealing temperature of the constant region is lower than the primer annealing temperature of the released genomic DNA.
81. The method of claim 79 or 80, wherein performing nucleic acid amplification to incorporate a cell barcode comprises performing nucleic acid amplification cycles at an annealing temperature of the genomic DNA to preferentially amplify the genomic DNA in comparison to the nucleic acid derived from RNA of the cell.
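
Illustrative sketch (not part of the claims): the claims recite sequencing amplicons that carry both a cellular barcode and a molecular tag. One common way such reads could be used downstream is to collapse reads that share the same cell barcode, gene, and molecular tag, so that amplification duplicates count as a single original RNA molecule. The Python below is a minimal sketch under that assumption; the function name `count_molecules`, the tuple layout, and the example barcodes, genes, and tag sequences are all hypothetical and do not come from the application.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular tags per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, molecular_tag) tuples.
    This layout is an assumption made for illustration only.
    """
    tags_seen = defaultdict(set)
    for cell_barcode, gene, molecular_tag in reads:
        # Reads sharing all three fields are treated as amplification
        # duplicates of one original molecule.
        tags_seen[(cell_barcode, gene)].add(molecular_tag)
    return {key: len(tags) for key, tags in tags_seen.items()}


# Hypothetical example reads; sequences are placeholders.
reads = [
    ("CELL_A", "GENE1", "ACGTACG"),
    ("CELL_A", "GENE1", "ACGTACG"),  # duplicate of the read above
    ("CELL_A", "GENE1", "TTGCAAC"),
    ("CELL_B", "GENE1", "ACGTACG"),  # same tag, different cell
]
counts = count_molecules(reads)
print(counts)
```

Exact matching is used here for simplicity. It would undercount when two distinct molecules happen to share a tag, a case claims 39-42 expressly contemplate, and it does not correct sequencing errors within a tag.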
PCT/US2023/061040 2022-01-21 2023-01-20 Methods of molecular tagging for single-cell analysis WO2023141604A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263301706P 2022-01-21 2022-01-21
US63/301,706 2022-01-21

Publications (2)

Publication Number Publication Date
WO2023141604A2 (en) 2023-07-27
WO2023141604A3 (en) 2023-09-28

Family

ID=87349174

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/061040 WO2023141604A2 (en) 2022-01-21 2023-01-20 Methods of molecular tagging for single-cell analysis

Country Status (1)

Country Link
WO (1) WO2023141604A2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018218222A1 (en) * 2017-05-26 2018-11-29 Goldfless Stephen Jacob High-throughput polynucleotide library sequencing and transcriptome analysis
US10501739B2 (en) * 2017-10-18 2019-12-10 Mission Bio, Inc. Method, systems and apparatus for single cell analysis
US20220325357A1 (en) * 2019-08-12 2022-10-13 Mission Bio, Inc. Method and Apparatus for Multi-Omic Simultaneous Detection of Protein Expression, Single Nucleotide Variations, and Copy Number Variations in the Same Single Cells

Also Published As

Publication number Publication date
WO2023141604A3 (en) 2023-09-28

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23743974; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)