WO2022143221A1 - Method and kit for labeling nucleic acid molecules - Google Patents

Method and kit for labeling nucleic acid molecules

Info

Publication number
WO2022143221A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nucleic acid
tag
transposase
cell
Prior art date
Application number
PCT/CN2021/139123
Other languages
English (en)
French (fr)
Inventor
蒋岚
李芸
Original Assignee
中国科学院北京基因组研究所(国家生物信息中心)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院北京基因组研究所(国家生物信息中心)
Priority to EP21913962.3A (EP4279609A1)
Publication of WO2022143221A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/10 - Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096 - Processes for the isolation, preparation or purification of DNA or RNA; cDNA synthesis; subtracted cDNA library construction, e.g. RT, RT-PCR
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 - Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/10 - Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 - Isolating an individual clone by screening libraries
    • C12N15/1065 - Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 - Methods for sequencing
    • C - CHEMISTRY; METALLURGY
    • C40 - COMBINATORIAL TECHNOLOGY
    • C40B - COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 - Libraries per se, e.g. arrays, mixtures
    • C40B40/04 - Libraries containing only organic compounds
    • C40B40/06 - Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C - CHEMISTRY; METALLURGY
    • C40 - COMBINATORIAL TECHNOLOGY
    • C40B - COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 - Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 - Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • The present application relates to the technical field of transcriptome sequencing, in particular to high-throughput single-cell transcriptome sequencing.
  • The present application relates to a method of treating a cell or nucleus to generate a population of nucleic acid fragments, and to methods of using that population to generate labeled nucleic acid molecules, to construct a nucleic acid molecule library for transcriptome sequencing, or to perform high-throughput single-cell transcriptome sequencing.
  • The present application also relates to a nucleic acid molecule library constructed using the method, and to a kit for implementing the method.
  • Single-cell sequencing mainly includes single-cell genome sequencing, transcriptome sequencing, methylation sequencing, chromatin accessibility sequencing, and single-cell multi-omics sequencing containing the above omics information. Its essence is to reveal the genome, transcriptome, methylation, chromatin open state and other omics changes of a single cell by analyzing the sequence, copy number, modification state, and interaction of DNA and RNA in a single cell.
  • Single-cell transcriptome sequencing is currently the most widely used of these approaches, and captures the transcriptome of a single cell at a given point in time.
  • The method includes reverse transcription of the transcriptome of a single cell at a given point in time to obtain cDNA, amplification of the cDNA, construction of a sequencing library, and sequencing to obtain the transcript information of that specific cell.
  • The emergence of single-cell transcriptome technology has refined research resolution from the multi-cellular tissue level down to the single cell. It allows the specific characteristics of one cell or one group of cells to be studied in isolation, and plays a key role in the study of cell development, the tumor microenvironment, and single-cell atlases.
  • High-throughput single-cell transcriptome library technologies mainly include the following.
  • The barcodes include a cell barcode specific to each microsphere and a single-molecule barcode (unique molecular identifier, UMI) specific to each single molecule.
  • Thousands of cells can be reverse transcribed simultaneously, single-cell and single-molecule labeling can be achieved, and the resulting pooled cDNA library can be sequenced.
  • The main disadvantages of this library construction technology are low cell throughput, a high empty-loading rate of the microreaction system, low sample throughput, and high library construction cost.
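  • To make the roles of the two barcodes concrete, the following is a minimal sketch rather than the method of the application. It assumes a hypothetical read layout in which the first 16 bases are the cell barcode and the next 10 bases are the UMI; those lengths and the names `split_read` and `count_molecules` are illustrative assumptions.

```python
from collections import defaultdict

CELL_BARCODE_LEN = 16  # assumed length, for illustration only
UMI_LEN = 10           # assumed length, for illustration only


def split_read(read):
    """Split a barcode read into (cell_barcode, umi)."""
    cell_barcode = read[:CELL_BARCODE_LEN]
    umi = read[CELL_BARCODE_LEN:CELL_BARCODE_LEN + UMI_LEN]
    return cell_barcode, umi


def count_molecules(records):
    """Count distinct UMIs per (cell barcode, gene).

    `records` is an iterable of (barcode_read, gene) pairs. Reads that share
    a cell barcode, gene, and UMI are treated as copies of one molecule.
    """
    umis = defaultdict(set)
    for read, gene in records:
        cell_barcode, umi = split_read(read)
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}
```

    The cell barcode groups reads by microsphere (and therefore by cell), while the UMI collapses amplification duplicates so that each original molecule is counted once.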
  • The basic principle of this type of technology is to use a chip or microplate with hundreds of thousands of microwells as a carrier, combined with magnetic beads carrying polythymidine (PolyT) and cell barcodes, for single-cell capture. After the single-cell suspension is loaded onto the microplate, the cells settle into the microwells. After free cells are washed off, magnetic beads carrying polythymidine and cell barcodes are added. Each well is equivalent to an independent reaction chamber. After lysis buffer is added to lyse the cells, the RNA released by single cells in different wells is adsorbed onto the primers of the magnetic beads and labeled with different cell barcodes.
  • The magnetic beads with adsorbed RNA are transferred to EP tubes for reverse transcription, which labels the RNA molecules of thousands of cells simultaneously; the cDNA is then amplified and a library is constructed.
  • The main disadvantages of this library construction technology are low cell throughput, low sample throughput, a cumbersome workflow, and high library construction cost.
  • Single-cell transcriptome sequencing technology overcomes sample limitations, making it possible to study the single-cell transcriptome of frozen samples, especially frozen clinical samples.
  • Sample multiplexing based on additional tags has also been developed.
  • Samples can be pre-barcoded, then pooled and sequenced together. After sequencing, the single-cell transcriptome information of the individual samples can be separated from the sequencing data with the help of the sample barcodes.
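  • The splitting step described above can be sketched as follows. This is an illustrative sketch only: it assumes each cell already has a table of read counts per sample barcode, and the 0.8 dominance threshold and the names `assign_sample` and `split_by_sample` are hypothetical choices, not part of the application.

```python
def assign_sample(tag_counts, min_fraction=0.8):
    """Assign a cell to the sample whose barcode dominates its tag reads.

    `tag_counts` maps sample barcode -> read count for one cell.
    Returns the sample barcode, or None when no barcode reaches
    `min_fraction` of the reads (an assumed, illustrative threshold).
    """
    total = sum(tag_counts.values())
    if total == 0:
        return None
    best = max(tag_counts, key=tag_counts.get)
    if tag_counts[best] / total < min_fraction:
        return None
    return best


def split_by_sample(cells):
    """Group cell identifiers by their assigned sample barcode."""
    groups = {}
    for cell_id, tag_counts in cells.items():
        sample = assign_sample(tag_counts)
        if sample is not None:
            groups.setdefault(sample, []).append(cell_id)
    return groups
```

    Cells without a clearly dominant sample barcode are left unassigned rather than forced into a sample.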
  • Representative examples of high-throughput transcriptome library technologies based on additional tags and microfluidic droplets include the Feature Barcoding technology developed by 10X Genomics based on BioLegend's TotalSeq™ antibodies (Single Cell 3' Feature Barcode Library Kit, #PN-1000079; Single Cell 5' Feature Barcode Kit, #PN-1000256), and the MULTI-seq technology reported by the Zev J. Gartner team in 2019 (McGinnis, C.S., et al., Nature Methods, 2019, 16(7): 619-626). This type of technology is based on microfluidic platforms such as 10X Genomics Chromium and Fluidigm C1.
  • Each cell sample is first labeled with a specific tag, and multiple samples labeled with different tags are then mixed for microdroplet preparation.
  • TotalSeq™ series antibodies, which specifically bind to different proteins on the cell membrane surface, are coupled to oligonucleotides carrying a specific tag and a sequence complementary to the 10X Genomics microbead barcode sequence.
  • Different cell samples can be labeled in advance with different TotalSeq™ antibodies.
  • These labeled samples can be pooled and processed through the standard 10X Genomics transcriptome library construction workflow; the library is then enriched with the Feature Barcoding kit and sequenced.
  • The library construction process of the method includes: dividing the cell sample into multiple portions, and reverse-transcribing each portion with reverse transcription primers carrying a specific oligonucleotide sequence tag, so that the nucleic acid molecules in each portion receive the first round of labeling. These portions are then pooled, and the scATAC kit from 10X Genomics is used to add a second round of labels (including cell barcodes and single-molecule barcodes) to the nucleic acid molecules in the library.
  • This method can increase cell throughput by approximately 15-fold over the standard protocol of the 10X Genomics Chromium platform.
  • This method can only be used for 3'-end library construction: only the 3'-end information of transcriptome mRNA molecules can be collected, and the 5'-end information of mRNA molecules cannot be obtained.
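  • The benefit of two rounds of labeling can be illustrated with a short sketch. It is an illustration under stated assumptions, not the published protocol: reads are represented as hypothetical `(first_round_tag, cell_barcode, sequence)` tuples, and `shared_tag_probability` assumes cells are distributed evenly and independently over the first-round tags.

```python
from collections import defaultdict


def group_by_combined_label(reads):
    """Group reads by (first-round tag, second-round cell barcode).

    `reads` is an iterable of (first_round_tag, cell_barcode, sequence).
    Cells that shared a droplet, and therefore a cell barcode, stay
    separable as long as their first-round tags differ.
    """
    groups = defaultdict(list)
    for first_round_tag, cell_barcode, sequence in reads:
        groups[(first_round_tag, cell_barcode)].append(sequence)
    return dict(groups)


def shared_tag_probability(n_tags):
    """Chance that two co-encapsulated cells carry the same first-round tag,
    assuming cells are spread evenly and independently over `n_tags` tags."""
    if n_tags < 1:
        raise ValueError("n_tags must be at least 1")
    return 1.0 / n_tags
```

    Because the combined label distinguishes cells that share a droplet, droplets can be loaded with more cells, which is the source of the throughput gain described above.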
  • High-throughput single-cell transcriptome end-sequencing library construction technology can be divided into 5'-end library construction technology and 3'-end library construction technology for transcriptome sequencing. Both can be used for non-full-length mRNA end sequencing, but they are two different techniques: 3'-end library construction technology is used to enrich and determine the 3'-end information of transcriptome mRNA molecules, whereas 5'-end library construction technology is used to enrich and determine the 5'-end information of transcriptome mRNA molecules and can provide transcription start site information. The two achieve different goals and are suited to different scenarios.
  • T lymphocytes (T cells) and B lymphocytes (B cells) are mainly responsible for adaptive immune responses, which mainly rely on T cell receptors (TCR) and B cell receptors (BCR) to recognize antigens.
  • The common feature of these two types of cell surface molecules is that they are diverse and can recognize a wide variety of antigenic molecules.
  • The heavy chain of the BCR and the TCR β chain are each composed of four gene segments, V, D, J, and C, and the light chain of the BCR and the TCR α chain are each composed of three gene segments, V, J, and C. These gene segments undergo recombination and rearrangement, combining into different forms and thereby ensuring receptor diversity.
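  • As a rough illustration of how segment recombination produces diversity, the sketch below multiplies the number of available segments of each type. The segment counts used in the usage note are placeholders chosen for illustration, not values from the application, and the calculation ignores junctional diversity and chain pairing.

```python
from math import prod


def combinatorial_diversity(segment_counts):
    """Number of distinct segment combinations for one receptor chain.

    `segment_counts` maps a segment type (e.g. "V", "D", "J") to the number
    of alternative gene segments of that type; one segment of each type is
    chosen per rearranged chain.
    """
    if any(count < 1 for count in segment_counts.values()):
        raise ValueError("every segment type needs at least one segment")
    return prod(segment_counts.values())
```

    For example, with placeholder counts `{"V": 40, "D": 25, "J": 6}` a chain with V, D, and J segments has 6000 combinations, while a chain with only V and J segments and counts `{"V": 40, "J": 5}` has 200.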
  • Through V(D)J sequencing, immune mechanisms and the relationship between the immune repertoire and disease can be explored. Because the V(D)J region is located at the 5' end of the mRNA, it is easier to enrich sequences of the full-length V(D)J region of the T-cell receptor (TCR) and B-cell receptor (BCR) using 5'-end library technology.
  • The main disadvantages of commercial 5'-end library construction methods for transcriptome sequencing are low cell throughput, a high empty-loading rate of the microreaction system, low sample throughput, and high library construction cost.
  • Although the 5'-end library construction scheme based on 10X Genomics' Feature Barcoding technology can label multiple samples in a single reaction, it requires additional, expensive Feature Barcoding kits and TotalSeq™ antibodies.
  • Barcode tags introduced by TotalSeq™ antibodies cannot resolve the transcriptomes of different cells from the same droplet.
  • Existing high-throughput single-cell transcriptome library technology (especially 5'-end library construction technology) still has the following defects: low cell throughput, a high empty-loading rate of the microreaction system, and high library construction cost. Therefore, there is an urgent need to develop new high-throughput single-cell transcriptome library technologies (especially 5'-end library technology).
  • The term "pseudo-single cell" refers to the situation in which a microreaction system (e.g., a water-in-oil droplet or a microwell) contains two or more cells.
  • In this situation, the cell-specific tags introduced by the microreaction system alone cannot label each cell in the microreaction system one-to-one.
  • The sequencing data generated by a "pseudo-single-cell" microreaction system cannot be used to analyze the transcriptome of a single cell, because it contains sequencing results from two or more cells. Therefore, in traditional high-throughput single-cell transcriptome sequencing methods, the sequencing data generated by "pseudo-single-cell" microreaction systems must be filtered out of the final sequencing data. To avoid wasting large amounts of data, the number or proportion of "pseudo-single-cell" microreaction systems must be reduced or controlled as far as possible.
  • The term "pseudo-single-cell rate" refers to the ratio of the number of "pseudo-single-cell" microreaction systems to the number of all microreaction systems that contain cells.
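  • The definition of the pseudo-single-cell rate translates directly into a small calculation. The sketch below assumes the number of cells in each microreaction system is known (for example from a simulation or a species-mixing control); the function name and input format are illustrative.

```python
def pseudo_single_cell_rate(cells_per_system):
    """Fraction of cell-containing microreaction systems holding >= 2 cells.

    `cells_per_system` lists the number of cells in each microreaction
    system; empty systems (0 cells) are excluded from the denominator.
    """
    occupied = [n for n in cells_per_system if n >= 1]
    if not occupied:
        raise ValueError("no microreaction system contains a cell")
    pseudo = sum(1 for n in occupied if n >= 2)
    return pseudo / len(occupied)
```

    For eight systems holding `[0, 0, 1, 1, 1, 2, 3, 0]` cells, five systems contain cells and two of those contain more than one, giving a rate of 0.4.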
  • The term "cell throughput" refers to the number of cells that can be labeled simultaneously in a single library construction reaction for a given single-cell library construction protocol.
  • The term "sample throughput" refers to the number of samples that can be labeled simultaneously in a single library construction reaction for a given single-cell library construction protocol.
  • A cell or nucleus useful in the methods of the invention can be any cell of interest or its nucleus, e.g., cancer cells, stem cells, nerve cells, fetal cells, and immune cells involved in immune responses, or the nuclei of such cells.
  • The cell may be a single cell or a plurality of cells.
  • The cells may be a mixture of cells of the same type, or a completely heterogeneous mixture of cells of different types. Different cell types may include cells derived from different tissues (e.g., epithelial, connective, muscle, neural), body fluids (e.g., blood), etc.
  • Different cell types can include normal cells and cancer cells of an individual; various cell types obtained from human subjects, such as various immune cells; many different bacterial species, strains and/or variants from environmental, forensic, microbiome, or other samples; or any other mixture of cell types.
  • Non-limiting examples of cancer cells that can be treated or analyzed using the methods described herein include cells of acanthoma, acinic cell carcinoma, acoustic neuroma, acral melanoma, acral hidradenoma, acute eosinophilic leukemia, acute lymphoblastic leukemia, acute megakaryocytic leukemia, acute monocytic leukemia, mature acute myeloid leukemia, acute myeloid dendritic cell leukemia, acute myeloid leukemia, acute promyelocytic leukemia, adamantinoma, adenocarcinoma, adenoid cystic carcinoma, adenoma, adenomatoid odontogenic tumor, adrenocortical carcinoma, adult T-cell leukemia, aggressive NK-cell leukemia, AIDS-related cancer, AIDS-related lymphoma, alveolar soft tissue sarcoma, ameloblastoid
  • Non-limiting examples of immune cells that can be treated or analyzed using the methods described herein include B cells; T cells (e.g., cytotoxic T cells, natural killer T cells, regulatory T cells, and T helper cells); natural killer cells; cytokine-induced killer (CIK) cells; myeloid cells such as granulocytes (basophils, eosinophils, neutrophils/hypersegmented neutrophils), monocytes/macrophages, mast cells, platelets/megakaryocytes, and dendritic cells; and combinations thereof.
  • The nuclei of the above-described cells can also be processed or analyzed using the methods described herein.
  • Long non-coding RNA has the meaning commonly understood by those skilled in the art, and is used interchangeably with “lncRNA”.
  • Long non-coding RNAs are a class of RNA molecules with a transcript length of more than 200 nt, which usually do not encode proteins, and regulate the expression level of target genes in the form of RNA.
  • The term "enhancer RNA" (eRNA) refers to RNA transcribed by RNA polymerase II (RNA pol II) from a transcriptional enhancer region.
  • A "population of nucleic acid fragments" refers to a population or collection of nucleic acid fragments derived from a target nucleic acid molecule (e.g., a DNA double-stranded molecule, an RNA/cDNA hybrid double-stranded molecule, a DNA single-stranded molecule, or an RNA single-stranded molecule).
  • A population of nucleic acid fragments may comprise a library of nucleic acid fragments comprising sequences that qualitatively and/or quantitatively represent the sequence of a target nucleic acid molecule.
  • Alternatively, the population of nucleic acid fragments may comprise a subset of the library of nucleic acid fragments.
  • A "library of nucleic acid molecules" refers to a collection or population of labeled nucleic acid fragments (e.g., labeled DNA double-stranded fragments, labeled RNA/cDNA hybrid double-stranded fragments, labeled DNA single-stranded fragments, or labeled RNA single-stranded fragments) generated from target nucleic acid molecules, wherein the combination of labeled nucleic acid fragments in the collection or population exhibits sequences that qualitatively and/or quantitatively represent the sequence of the target nucleic acid molecule from which the labeled nucleic acid fragments were generated.
  • In such a library, labeled nucleic acid fragments have not been intentionally selected for or against inclusion in the collection or population by methods that include or exclude labeled nucleic acid fragments based on the nucleotide or sequence composition of the target nucleic acid molecule.
  • The term "cDNA" or "complementary DNA" refers to DNA synthesized by RNA-dependent DNA polymerase-catalyzed or reverse transcriptase-catalyzed extension of a primer that anneals to an RNA molecule of interest, using at least a portion of that RNA molecule as a template (this process is also known as "reverse transcription").
  • The synthesized cDNA molecule is "complementary" to, "base paired" with, or "complexed" with at least a portion of the template.
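  • The complementarity between a cDNA and its RNA template can be shown in a few lines. This is a purely in-silico sketch: the function name `first_strand_cdna` is hypothetical, and both the input and output are written 5' to 3'.

```python
# Watson-Crick pairing from an RNA template base to the DNA base opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Return the first-strand cDNA (5'->3') complementary to an RNA (5'->3').

    The template is read from its 3' end, so the result is the reverse
    complement of the RNA written in DNA bases.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))
```

    For the template `AUGGCC`, the function returns `GGCCAT`.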
  • The term "transposase" refers to an enzyme that is capable of forming a functional complex with a composition comprising a transposon end (e.g., a transposon, a transposon end, or a transposon end composition) and of catalyzing the insertion or transposition of that composition into a double-stranded nucleic acid molecule (e.g., a DNA duplex or an RNA/cDNA hybrid duplex) with which it is incubated in a transposition reaction (e.g., an in vitro transposition reaction).
  • Transposases include Tn5 transposase, MuA transposase, Sleeping Beauty transposase, Mariner transposase, Tn7 transposase, Tn10 transposase, Ty1 transposase, and Tn552 transposase, as well as variants, modified products and derivatives having the transposition activity of the above transposases (e.g., having a higher transposition activity).
  • The term "transposon end" or "transposase recognition sequence" refers to a double-stranded nucleic acid molecule comprising the nucleotide sequences necessary to form a functional complex with a transposase in a transposition reaction.
  • "Transposon end" and "transposase recognition sequence" have the same meaning and are used interchangeably.
  • A transposon end forms a "transposase complex", "transposome complex" or "transposome composition" with a transposase that recognizes and binds the transposon end, and the complex is capable of inserting or transposing the transposon end into the target double-stranded nucleic acid molecule with which it is incubated in an in vitro transposition reaction.
  • The transposon end contains two complementary sequences consisting of a "transferred transposon end sequence" and a "non-transferred transposon end sequence".
  • The nucleic acid strand containing the transferred transposon end sequence is referred to as the "transferred strand".
  • The nucleic acid strand containing the non-transferred transposon end sequence is referred to as the "non-transferred strand".
  • The 3' end of the transferred strand is joined or transferred to a target nucleic acid molecule (e.g., a DNA molecule or an RNA molecule).
  • The 5' end of the transposon end sequence complementary to the transferred transposon end sequence (i.e., the non-transferred transposon end sequence) is not joined or transferred to the target nucleic acid molecule.
  • The transferred and non-transferred strands may be non-covalently joined (e.g., by hydrogen bonds formed between bases).
  • Alternatively, the transferred strand and the non-transferred strand may be covalently attached.
  • For example, the transferred strand sequence and the non-transferred strand sequence may be provided on a single oligonucleotide, e.g., in a hairpin configuration.
  • The term "transposon end composition" or "transposition sequence" means a composition comprising a transposon end (i.e., the smallest double-stranded DNA fragment capable of undergoing a transposition reaction under the action of a transposase), optionally plus additional sequences 5' of the transferred transposon end sequence and/or 3' of the non-transferred transposon end sequence.
  • "Transposon end composition" and "transposition sequence" have the same meaning and are used interchangeably.
  • For example, a transposon end (transposase recognition sequence) linked to a first tag sequence and/or a first consensus sequence is a "transposon end composition" or "transposition sequence".
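  • The layout of a transposon end composition can be modeled as a small data structure. The sketch below is a hypothetical model for illustration only: the class and field names are not from the application, the sequences in the usage note are placeholders rather than real transposase recognition sequences, and only additional sequence 5' of the transferred end is modeled.

```python
from dataclasses import dataclass

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass(frozen=True)
class TransposonEndComposition:
    transferred_end: str        # transferred transposon end sequence, 5'->3'
    five_prime_extra: str = ""  # additional sequence 5' of it, e.g. a tag

    @property
    def transferred_strand(self):
        """Transferred strand, 5'->3': additional sequence, then the end."""
        return self.five_prime_extra + self.transferred_end

    @property
    def non_transferred_end(self):
        """Non-transferred end sequence, 5'->3' (reverse complement)."""
        return "".join(DNA_COMPLEMENT[b] for b in reversed(self.transferred_end))
```

    With the placeholder values `transferred_end="AAGC"` and `five_prime_extra="TTTT"`, the transferred strand is `TTTTAAGC` and the non-transferred end sequence is `GCTT`.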
  • The transposon end composition comprises or consists of two transposon end oligonucleotides, namely a "transferred transposon end oligonucleotide" or "transferred strand" and a "non-transferred transposon end oligonucleotide" or "non-transferred strand", which in combination exhibit the sequences of the transposon end, and in which one or both strands include additional sequences.
  • The term "transferred strand" refers to the transferred portion of both a "transposon end" and a "transposon end composition", i.e., regardless of whether the transposon end is attached to a tag sequence or other portion.
  • The term "non-transferred strand" refers to the non-transferred portion of both a "transposon end" and a "transposon end composition".
  • The transposon end composition or transposition sequence may be provided as two single oligonucleotide strands linked by hydrogen bonding between bases to form a linear duplex.
  • The 3' terminal nucleotide of the non-transferred strand in the transposon end composition or transposition sequence may be blocked (e.g., a dideoxynucleotide).
  • The transposon end composition may be a "hairpin transposon end composition", which refers to a transposon end composition consisting of a single oligodeoxyribonucleotide that exhibits a non-transferred transposon end sequence at its 5' end, a transferred transposon end sequence at its 3' end, and an intervening arbitrary sequence between the non-transferred transposon end sequence and the transferred transposon end sequence that is long enough to allow the transposon end portion to function in a transposition reaction.
  • The 5' end of the hairpin transposon end composition has a phosphate group at the 5' position of the 5' terminal nucleotide.
  • The arbitrary sequence intervening between the non-transferred transposon end sequence and the transferred transposon end sequence of the hairpin transposon end composition may provide a tag sequence for a particular use or application.
  • The term "upstream" is used to describe the relative positional relationship of two nucleic acid sequences (or two nucleic acid molecules) and has the meaning commonly understood by those skilled in the art.
  • The expression "a nucleic acid sequence is located upstream of another nucleic acid sequence" means that, when arranged in a 5' to 3' direction, the former is located at a more forward position (i.e., closer to the 5' end) than the latter.
  • The term "downstream" has the opposite meaning to "upstream".
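  • The upstream/downstream convention can be checked programmatically by comparing start positions on a strand written 5' to 3'. The helper below is an illustrative sketch with a hypothetical name; it uses the first occurrence of each sequence.

```python
def relative_position(strand, seq_a, seq_b):
    """Describe where seq_a starts relative to seq_b on a 5'->3' strand."""
    pos_a = strand.find(seq_a)
    pos_b = strand.find(seq_b)
    if pos_a < 0 or pos_b < 0:
        raise ValueError("both sequences must occur in the strand")
    if pos_a < pos_b:
        return "upstream"    # seq_a is closer to the 5' end
    if pos_a > pos_b:
        return "downstream"  # seq_a is closer to the 3' end
    return "same position"
```

    On the strand `AAACCCGGGTTT`, `AAA` is upstream of `GGG`, and `TTT` is downstream of `CCC`.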
  • The term "tag sequence" refers to an oligonucleotide, a non-target nucleic acid component, that is joined to a nucleic acid fragment and serves that fragment or its derivative products (e.g., complementary fragments of the nucleic acid fragment, shorter fragments produced by fragmenting it, etc.), for example as a site for a primer for DNA polymerase extension or for an oligonucleotide used in a capture or ligation reaction.
  • The tag sequence can consist of at least two consecutive nucleotides (preferably about 6 to 100, although there is no definite limit to the length of the oligonucleotide; the exact size depends on many factors, which in turn depend on the ultimate function or use of the oligonucleotide), and can also be composed of multiple oligonucleotide segments arranged contiguously or non-contiguously.
  • A tag sequence may be unique to each nucleic acid fragment to which it is joined, or unique to a class of nucleic acid fragments to which it is joined.
  • The tag sequence can be reversibly or irreversibly joined to the polynucleotide sequence to be "labeled" by any method, including ligation, hybridization or other methods.
  • The process of attaching a tag sequence to a nucleic acid molecule is sometimes referred to herein as "labeling", and a nucleic acid molecule that has undergone the addition of a tag or a tag-containing sequence is referred to as a "labeled nucleic acid molecule".
  • Nucleic acids or polynucleotides of the invention may include one or more modified nucleic acid bases, sugar moieties or internucleoside linkages.
  • Reasons for using nucleic acids or polynucleotides comprising modified bases, sugar moieties, or internucleoside linkages include, but are not limited to: (1) changing the Tm; (2) changing the susceptibility of the polynucleotide to one or more nucleases; (3) providing a moiety for attaching a label; (4) providing a label or a label quencher; or (5) providing a moiety, such as biotin, for attaching to another molecule in solution or bound to a surface.
  • Oligonucleotides such as primers can be synthesized such that the random portion comprises one or more conformationally constrained nucleic acid analogs, such as, but not limited to, one or more ribonucleic acid analogs in which the ribose ring is "locked" by a methylene bridge connecting the 2'-O atom to the 4'-C atom; these modified nucleotides increase the Tm (melting temperature) by approximately 2 degrees Celsius to about 8 degrees Celsius per nucleotide monomer.
  • One criterion for using modified nucleotides in the method may be that oligonucleotides comprising the modified nucleotides can be digested by a single-strand-specific RNase.
  • The nucleic acid bases in a single nucleotide at one or more positions in a polynucleotide or oligonucleotide may include guanine, adenine, uracil, thymine, or cytosine; alternatively, one or more of the nucleic acid bases may comprise modified bases such as, but not limited to, xanthine, allylamino-uracil, allylamino-thymidine, hypoxanthine, 2-aminoadenine, 5-propynyluracil, 5-propynylcytosine, 4-thiouracil, 6-thioguanine, and aza- and deaza-uracils, thymidines, cytosines, adenines or guanines.
  • Nucleic acid bases may also comprise nucleic acid bases derivatized with a biotin moiety, a digoxigenin moiety, a fluorescent or chemiluminescent moiety, a quenching moiety, or some other moiety.
  • The present invention is not limited to the listed nucleic acid bases; the list gives examples of the wide range of bases that can be used in the methods of the present invention.
  • One or more of the sugar moieties may comprise 2'-deoxyribose; alternatively, one or more of the sugar moieties may comprise some other sugar moiety, such as, but not limited to: ribose; 2'-fluoro-2'-deoxyribose or 2'-O-methyl-ribose, which provide resistance to some nucleases; or 2'-amino-2'-deoxyribose or 2'-azido-2'-deoxyribose, which can be labeled by reaction with visible, fluorescent, infrared fluorescent or other detectable dyes or with chemicals having electrophilic, photoreactive, alkynyl or other reactive chemical moieties.
  • The internucleoside linkages of the nucleic acids or polynucleotides of the invention can be phosphodiester linkages; alternatively, one or more of the internucleoside linkages can include modified linkages such as, but not limited to, phosphorothioate, phosphorodithioate, phosphoroselenate, or phosphorodiselenate linkages, which are resistant to some nucleases.
  • a "reverse transcriptase having terminal transfer activity" refers to a reverse transcriptase capable of template-independent addition of one or more nucleotides (a "tail") to the 3' end of the cDNA.
  • examples of such reverse transcriptases include, but are not limited to, M-MLV reverse transcriptase, HIV-1 reverse transcriptase, AMV reverse transcriptase, telomerase reverse transcriptase, and variants, modified products, and derivatives thereof that retain the reverse transcriptase activity and the terminal transfer activity.
  • the reverse transcriptase used to reverse transcribe the RNA to generate cDNA has no or reduced RNase activity (especially RNase H activity) to avoid degradation of the RNA.
  • the reverse transcriptase used to reverse transcribe RNA to generate cDNA has end-transfer activity and no or reduced RNase activity (especially RNase H activity).
  • examples of such reverse transcriptases include, but are not limited to, M-MLV reverse transcriptase, HIV-1 reverse transcriptase, AMV reverse transcriptase, and telomerase reverse transcriptase that have been modified or mutated to remove RNase activity, particularly RNase H activity.
  • the expression "has reduced RNase activity" means that the modified or mutated reverse transcriptase has reduced RNase activity compared to the naturally occurring wild-type reverse transcriptase.
  • a nucleic acid polymerase with "strand displacement activity" is one that, when it encounters a downstream nucleic acid strand complementary to the template strand while extending a new nucleic acid strand, can continue the extension reaction and displace that downstream strand from the template strand.
  • a nucleic acid polymerase (or DNA polymerase) with "high fidelity" is one whose probability of introducing erroneous nucleotides during nucleic acid amplification (i.e., its error rate) is lower than that of the wild-type Taq enzyme.
  • the terms "anneal" or "hybridize" and "annealing" or "hybridization" refer to the formation of complexes between nucleotide sequences that are sufficiently complementary to form complexes via Watson-Crick base pairing.
  • nucleic acid sequences that are "complementary to" or "complementary with" each other, or that "hybridize" or "anneal" to each other, should form or be capable of forming "hybrids" or "complexes" that are sufficiently stable to serve the intended purpose.
  • it is not required that every nucleic acid base within the sequence displayed by one nucleic acid molecule be capable of base-pairing with, or be paired or complexed with, every nucleic acid base within the sequence displayed by a second nucleic acid molecule in order for the two nucleic acid molecules, or the corresponding sequences displayed in them, to be "complementary" or "annealed" or "hybridized" to each other.
  • the terms “complementary” or “complementarity” are used when referring to sequences of nucleotides that are related by base pairing. For example, the sequence 5'-A-G-T-3' is complementary to the sequence 3'-T-C-A-5'.
  • Complementarity can be “partial” in which only some of the nucleic acid bases match according to the base pairing laws. Alternatively, there may be “complete” or “total” complementarity between nucleic acids. The degree of complementarity between nucleic acid strands has a significant effect on the efficiency and strength of hybridization between nucleic acid strands. This is particularly important in amplification reactions and detection methods that rely on hybridization of nucleic acids. As used herein, the terms “annealing” or “hybridization” are used in reference to the pairing of complementary nucleic acid strands.
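  • The 5'-A-G-T-3' example and the notion of "partial" versus "complete" complementarity can be illustrated with a minimal Python sketch. It assumes unmodified DNA bases and two strands of equal length aligned end to end; the function names are illustrative and do not come from the application.

```python
# Watson-Crick pairing for unmodified DNA bases (illustrative only).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that base-pair when two equal-length
    strands (both written 5'->3') are aligned antiparallel."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    ideal_partner = reverse_complement(strand_b)
    matches = sum(a == b for a, b in zip(strand_a.upper(), ideal_partner))
    return matches / len(strand_a)


# 3'-T-C-A-5' from the text is "ACT" when written 5'->3'.
full = complementarity("AGT", "ACT")      # complete complementarity
partial = complementarity("AGT", "ACA")   # one mismatched position
```

  • A value of 1.0 corresponds to "complete" complementarity; anything lower is "partial". Real hybridization also depends on the conditions listed in the text, which this sketch ignores.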
  • Hybridization and the strength of hybridization are affected by a number of factors well known in the art, including the degree of complementarity between the nucleic acids, the stringency of the conditions (which is affected by factors such as salt concentration), the Tm (melting temperature) of the resulting hybrid, the presence of other components (e.g., the presence or absence of polyethylene glycol or betaine), the molarity of the hybridizing strands, and the G:C content of the nucleic acid strands.
  • annealing or hybridization can be performed using low, medium, or high stringency conditions, which are known in the art.
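  • As a rough illustration of how G:C content feeds into melting temperature, the sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides (about 2 °C per A/T and 4 °C per G/C). This rule is not taken from the application and ignores salt, additives, and strand concentration; it is shown only as an assumption-laden estimate.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule estimate of Tm in degrees Celsius for a short oligo.

    Rule of thumb only: 2 * (A + T) + 4 * (G + C).
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


example_gc = gc_fraction("AGTC")
example_tm = wallace_tm("AGTC")
```

  • Under this rule, a strand richer in G:C pairs gets a higher estimated Tm than an A:T-rich strand of the same length, which is consistent with the factors listed above.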
  • The term "bead" generally refers to a particle. Beads can be porous, non-porous, solid, semi-solid, semi-fluid or fluid. Beads can be magnetic or non-magnetic. In some embodiments, the beads may be dissolvable, breakable, or degradable. In some cases, the beads may be non-degradable. In some embodiments, the beads may be gel beads. The gel beads may be hydrogel beads. Gel beads can be formed from molecular precursors, such as polymeric or monomeric species. The semi-solid beads can be liposomal beads. Solid beads may contain metals, including iron oxides, gold and silver. In some cases, the beads are silica beads. In some cases, the beads are rigid. In some cases, the beads may be flexible and/or compressible.
  • the beads can contain molecular precursors (eg, monomers or polymers) that can form a polymer network through polymerization of the precursors.
  • the precursor may be an already polymerized species that is capable of further polymerizing, eg, by chemical cross-linking.
  • the precursor comprises one or more of acrylamide or methacrylamide monomers, oligomers or polymers.
  • the beads may contain prepolymers, which are oligomers capable of further polymerization. For example, prepolymers can be used to prepare polyurethane beads.
  • the beads may contain individual polymers that can be further polymerized together.
  • the beads can be generated by the polymerization of different precursors such that they comprise mixed polymers, copolymers and/or block copolymers.
  • the beads may contain natural and/or synthetic materials.
  • the polymer can be a natural polymer or a synthetic polymer.
  • the beads comprise natural and synthetic polymers.
  • examples of natural polymers include proteins and sugars, such as DNA, rubber, cellulose, starch, proteins, enzymes, polysaccharides, silk, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, psyllium, gum arabic, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, karaya gum, agarose, alginic acid, alginates, or natural polymers thereof.
  • Examples of synthetic polymers include acrylics, nylons, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetates, polyacrylamides, polyacrylates, polyethylene glycols, polyurethanes, polylactic acids, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(formaldehyde), polyoxymethylene, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene chloride), poly(vinylidene fluoride), poly(vinyl fluoride), and any combination thereof (e.g., a copolymer).
  • the beads can be of uniform size or non-uniform size.
  • the diameter of the beads can be about 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, or 1 mm.
  • the diameter of the beads can be at least about 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1 mm, or larger. In some cases, the diameter of the beads can be less than about 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, or 1 mm.
  • the diameter of the beads can be in the range of about 40-75 μm, 30-75 μm, 20-75 μm, 40-85 μm, 40-95 μm, 20-100 μm, 10-100 μm, 1-100 μm, 20-250 μm, or 20-500 μm.
  • the beads are provided as a population or plurality of beads having a relatively monodisperse size distribution. Maintaining relatively consistent bead characteristics (e.g., size) can contribute to overall consistency in situations where relatively consistent amounts of reagents need to be provided within a partition.
  • the beads described herein can have a size distribution with a coefficient of variation in their cross-sectional dimensions of less than 50%, less than 40%, less than 30%, less than 20%, and in some cases less than 15%, less than 10%, or even less than 5%.
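  • The coefficient of variation (CV) used above to describe monodispersity is the standard deviation of the bead dimension divided by its mean. A minimal sketch follows; the measurements are made-up illustrative values, and the use of the sample standard deviation is an assumption, since the text does not specify an estimator.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """CV = sample standard deviation / mean (dimensionless)."""
    return statistics.stdev(diameters_um) / statistics.mean(diameters_um)


def is_monodisperse(diameters_um, max_cv=0.05):
    """True if the population meets the chosen CV threshold
    (0.05 corresponds to the 'less than 5%' case in the text)."""
    return coefficient_of_variation(diameters_um) < max_cv


# Hypothetical bead diameters in micrometres.
sample = [50, 52, 48, 51, 49]
sample_cv = coefficient_of_variation(sample)
```

  • The threshold is a parameter so that the same check can be applied to any of the listed limits (50%, 40%, ..., 5%).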
  • the beads can have any suitable shape.
  • bead shapes include, but are not limited to, spherical, aspherical, elliptical, oblong, amorphous, circular, cylindrical, and variations thereof.
  • oligonucleotide molecules containing marker sequences can be coupled to the surface of the beads and/or enclosed within the beads.
  • Functionalization of beads for attachment of oligonucleotides can be accomplished by a number of different methods, including activation of chemical groups within the polymer, incorporation of reactive or activatable functional groups into the polymer structure, or attachment at the prepolymer or monomer stage of bead production.
  • precursors (e.g., monomers, cross-linking agents) that are polymerized to form the beads may contain acrylamide phosphoramidite moieties, such that when the beads are formed, the beads also contain acrylamide phosphoramidite moieties.
  • Acrylamide phosphoramidite moieties can be attached to oligonucleotides.
  • beads can release the oligonucleotides either spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to specific chemicals or phases, exposure to light, reducing agents, etc.). Adding multiple types of labile bonds to gel beads can result in beads that can respond to different stimuli. Each type of labile bond can be sensitive to associated stimuli (e.g., chemical stimuli, light, temperature, etc.), such that by applying the appropriate stimulus, the release of substances attached to the beads through each labile bond can be controlled. Such functional groups can be used for controlled release of substances from gel beads.
  • oligonucleotides releasably, cleavably, or reversibly attached to the beads described herein may include barcode or marker sequences that are released or releasable by cleavage of the linkage between the oligonucleotide molecule and the bead, or that are released by degradation of the bead itself, or both, so that the barcode or marker sequences can be accessed by other reagents.
  • labile bonds that can be coupled to precursors or beads include ester bonds (e.g., cleavable with acid, base, or hydroxylamine), vicinal diol bonds (e.g., cleavable by sodium periodate), Diels-Alder bonds (e.g., cleavable by heat), sulfone bonds (e.g., cleavable by base), silyl ether bonds (e.g., cleavable by acid), glycosidic bonds (e.g., cleavable by amylase), peptide bonds (e.g., cleavable by proteases), or phosphodiester bonds (e.g., cleavable by nucleases (e.g., DNase)).
  • the beads can be degradable, destructible, or soluble, either spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to specific chemicals or phases, exposure to light, reducing agents, etc.).
  • the beads may be soluble, such that the material components of the beads dissolve upon exposure to specific chemicals or environmental changes (eg, changing temperatures or pH changes).
  • the gel beads degrade or dissolve under elevated temperature and/or alkaline conditions.
  • the beads can be thermally degradable, such that when the beads are exposed to a suitable temperature change (eg, heat), the beads degrade. Degradation or solubilization of beads bound to substances (eg, oligonucleotides, eg, barcoded oligonucleotides) can result in the release of substances from the beads.
  • substances that do not participate in polymerization can also be encapsulated in the beads during bead formation (eg, during polymerization of the precursor). Such species may enter the polymerization reaction mixture such that the resulting beads contain the species as the beads are formed. In some cases, such materials may be added to the gel beads after formation.
  • Such substances can include, for example, oligonucleotides, reagents for nucleic acid amplification reactions (eg, primers, polymerases, dNTPs, cofactors (eg, ionic cofactors)), reagents for enzymatic reactions (eg, enzymes, cofactors, substrates) or reagents for nucleic acid modification reactions such as polymerization, ligation or digestion.
  • the entrapment of such species can be controlled by the density of the polymer network generated during the polymerization of the precursor, by control of the ionic charge within the gel beads (for example, by ionic species attached to the polymer species), or by the release of other species.
  • the encapsulated material can be released from the beads upon degradation of the beads and/or by applying a stimulus capable of releasing the material from the beads.
  • the terms "transposase", "reverse transcriptase", and "nucleic acid polymerase" refer to protein molecules or aggregates of protein molecules responsible for catalyzing specific chemical and biological reactions.
  • the methods, compositions or kits of the invention are not limited to the use of a specific transposase, reverse transcriptase or nucleic acid polymerase from a specific source.
  • the methods, compositions or kits of the invention include any transposase, reverse transcriptase or nucleic acid polymerase from any source having enzymatic activity equivalent to that of the specific enzymes disclosed herein for the specific method, composition or kit.
  • the methods of the present invention also include embodiments wherein any one particular enzyme provided and used in a step of the method is replaced by a combination of two or more enzymes which, when used in combination, whether used separately in a step-by-step fashion or together in the same reaction mixture, produce the same results as would be obtained with that particular enzyme.
  • the methods, buffers and reaction conditions provided herein, including those in the Examples, are presently preferred for embodiments of the methods, compositions and kits of the invention.
  • other enzyme storage buffers, reaction buffers and reaction conditions using some of the enzymes of the present invention are known in the art, which may also be suitable for use in the present invention and are included herein.
  • the labeled nucleic acid molecules produced according to the methods of the present application can be conveniently used to construct a library of nucleic acid molecules, in particular a transcriptome sequencing library, wherein the library of nucleic acid molecules contains 5'-terminal sequence information of RNA molecules (e.g., mRNA molecules) and can be used to analyze the abundance and 5'-terminal sequences of RNA molecules (e.g., mRNA molecules) in the transcriptome, as well as the transcription start positions.
  • the nucleic acid molecule library constructed by the method of the present application has dual cell tags (e.g., a first tag and a second tag), which can significantly reduce the adverse effects of "pseudo-single cells" on the sequencing process and sequencing data. Therefore, the method of the present application can greatly reduce the proportion of empty micro-reaction systems during library construction, improve the cell throughput and sample throughput of a single library construction reaction, and greatly reduce the library construction cost and the sequencing cost.
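  • A minimal sketch of why dual cell tags help with "pseudo-single cells": if two cells share one micro-reaction system, they receive the same second tag, but they can still be separated when their first tags differ. The read tuples, tag values, and the equal-subset-size assumption below are all hypothetical and are not specified in the application.

```python
from collections import defaultdict


def group_by_cell(reads):
    """Group reads by the (first_tag, second_tag) pair.

    reads: iterable of (first_tag, second_tag, payload) tuples.
    """
    cells = defaultdict(list)
    for first_tag, second_tag, payload in reads:
        cells[(first_tag, second_tag)].append(payload)
    return dict(cells)


def unresolved_doublet_fraction(n_first_tags: int) -> float:
    """Chance that two co-partitioned cells carry the same first tag,
    assuming equal-size subsets and independent sampling."""
    return 1.0 / n_first_tags


# Two cells from different subsets that ended up with the same second tag.
reads = [
    ("ACGT", "BEAD1", "read_1"),
    ("TGCA", "BEAD1", "read_2"),
    ("ACGT", "BEAD1", "read_3"),
]
cells = group_by_cell(reads)
```

  • With the second tag alone, all three reads would be assigned to one apparent cell; keyed on both tags, they resolve into two. Under the stated assumption, using more first tags lowers the fraction of doublets that remain unresolved.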
  • in addition, the method of the present application for labeling nucleic acid molecules is: 1) compatible with the current mainstream transcriptome library construction technologies and platforms (including microfluidic droplet-based high-throughput single-cell transcriptome library technology, microplate-based high-throughput single-cell transcriptome library technology, etc.), so that it can be easily commercialized; and 2) compatible with cell-based or nucleus-based library construction schemes, overcoming sample limitations (for example, single-cell and single-nucleus transcriptome libraries can be constructed from frozen samples).
  • the application provides a method of treating a cell or nucleus to produce a population of nucleic acid fragments, comprising the steps of:
  • (1) providing one or more cells or nuclei;
  • (2) subjecting RNA (e.g., mRNA, long non-coding RNA, eRNA) of the cells or nuclei to a treatment including a reverse transcription step to form a double-stranded nucleic acid containing a cDNA strand (e.g., a hybrid double-stranded nucleic acid containing an RNA (e.g., mRNA, long non-coding RNA, eRNA) strand and a cDNA strand);
  • (3) incubating the double-stranded nucleic acid (e.g., the hybrid double-stranded nucleic acid) with a transposase complex; wherein the transposase complex contains a transposase and a transposition sequence that the transposase can recognize and bind, and is capable of cleaving or breaking a double-stranded nucleic acid (e.g., a hybrid double-stranded nucleic acid containing RNA and DNA); the transposition sequence comprises a transferred strand and a non-transferred strand; the transferred strand comprises a transposase recognition sequence, a first tag sequence, and a first consensus sequence, wherein the first tag sequence is located upstream (e.g., at the 5' end) of the transposase recognition sequence and the first consensus sequence is located upstream (e.g., at the 5' end) of the first tag sequence; and the incubation allows the transposase complex to break the double-stranded nucleic acid into nucleic acid fragments and to link the transferred strand to the fragments, thereby producing a population of nucleic acid fragments.
  • nucleic acid fragments comprise a cDNA fragment, and the sequence of the transferred strand linked to the 5' end of the cDNA fragment.
  • the nucleic acid fragment comprises a first consensus sequence, a first tag sequence, a transposase recognition sequence and a cDNA fragment from the 5' end to the 3' end.
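  • The 5'-to-3' layout described above (first consensus sequence, first tag sequence, transposase recognition sequence, cDNA fragment) can be sketched as simple string handling. The consensus sequence below is a made-up placeholder, and the recognition sequence is assumed to be the commonly used 19-nt Tn5 mosaic end; the application's own SEQ ID NO:99 is not reproduced in this section, so both values are assumptions.

```python
# Hypothetical 16-nt first consensus sequence (placeholder only).
CONSENSUS = "ACGTACGTACGTACGT"
# Assumed recognition sequence: the widely used Tn5 mosaic end.
RECOGNITION = "AGATGTGTATAAGAGACAG"


def build_transferred_strand(first_tag: str) -> str:
    """Concatenate consensus, tag, and recognition sequence 5'->3'."""
    return CONSENSUS + first_tag + RECOGNITION


def parse_fragment(fragment: str, tag_len: int):
    """Split a tagged fragment into (first_tag, cdna_part).

    Returns None if the fragment does not have the expected layout.
    """
    if not fragment.startswith(CONSENSUS):
        return None
    tag_start = len(CONSENSUS)
    tag_end = tag_start + tag_len
    rec_end = tag_end + len(RECOGNITION)
    if fragment[tag_end:rec_end] != RECOGNITION:
        return None
    return fragment[tag_start:tag_end], fragment[rec_end:]


fragment = build_transferred_strand("GATTAC") + "TTTGGGCCCAAA"
parsed = parse_fragment(fragment, tag_len=6)
```

  • Exact string matching is used for brevity; a practical pipeline would typically tolerate sequencing errors in the fixed parts.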
  • cells are permeabilized and/or fixed prior to step (2).
  • the cells may be treated with methanol and/or formaldehyde prior to performing step (2).
  • cells can be permeabilized using various known methods. Such permeabilization can allow various reactive reagents, including, for example, enzymes such as reverse transcriptases and transposases, nucleic acid molecules such as reverse transcription primers and transposable sequences, to penetrate the cell membrane, enter the cell, and function.
  • cells can be permeabilized with methanol.
  • cells can be fixed using various known methods.
  • cells can be fixed using formaldehyde.
  • the methods of the present application can be used to treat one or more cells or nuclei.
  • at least 2 (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, or more) cells or nuclei are provided in step (1).
  • the cells or nuclei to be treated can be grouped, and each group of cells or nuclei can be treated with the same or different transposition sequences, so that nucleic acid molecules derived from different groups of cells or nuclei are labeled with the same or different sequences (e.g., first tag sequences).
  • the cells or nuclei are divided into at least 2 (e.g., at least 3, at least 4, at least 5, at least 8, at least 10, at least 12, at least 20, at least 24, at least 50, at least 96, at least 100, at least 200, at least 384, at least 400, or more) subsets, wherein each subset contains at least one cell or nucleus.
  • in step (3), the double-stranded nucleic acid (e.g., hybrid double-stranded nucleic acid) in each subset of cells or nuclei is incubated separately with a transposase complex.
  • the transposase complexes used for the respective subsets have first tag sequences that are different from each other, whereby the nucleic acid fragments generated from the cells or nuclei of the respective subsets contain mutually different first tag sequences.
  • the transposase complexes used for the respective subsets have the same transposase, the same transposase recognition sequence, the same first consensus sequence, and/or the same non-transferred strand.
  • the transposase complexes used for the respective subsets have the same transposase, the same transposase recognition sequence, the same first consensus sequence, and the same non-transferred strand.
  • nucleic acid fragments produced from cells or nuclei of the respective subsets have the same first consensus sequence and the same transposase recognition sequence; nucleic acid fragments produced from cells or nuclei of the same subset have the same first tag sequence; and nucleic acid fragments produced from cells or nuclei of different subsets have first tag sequences that are different from each other.
  • the first tag sequence can be used to determine the subset from which cells or nuclei are derived, and can be used to distinguish cells or nuclei derived from different subsets. Thus, following transposition, different subsets of cells or nuclei can be merged, and the first tag sequence can be used to distinguish the different subsets of cells or nuclei.
  • after performing step (3), at least 2 subsets of cells or nuclei are pooled. In certain preferred embodiments, after performing step (3), the cells or nuclei of all subsets are pooled.
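  • The split, tag, and pool scheme described above can be sketched as follows: each subset gets its own first tag sequence, and after pooling the tag identifies the subset of origin. The tag length of 6 and the simple enumeration of tags are assumptions made for illustration; the application only requires that the tags differ between subsets.

```python
from itertools import islice, product


def assign_first_tags(n_subsets: int, tag_len: int = 6) -> dict:
    """Give each subset index a distinct first tag sequence."""
    if n_subsets > 4 ** tag_len:
        raise ValueError("not enough distinct tags of this length")
    all_tags = ("".join(p) for p in product("ACGT", repeat=tag_len))
    return dict(enumerate(islice(all_tags, n_subsets)))


def subset_of(first_tag: str, tag_table: dict):
    """Return the subset index for a tag, or None if it is unknown."""
    lookup = {tag: index for index, tag in tag_table.items()}
    return lookup.get(first_tag)


# Hypothetical experiment with 96 subsets (e.g., one per well).
tag_table = assign_first_tags(96)
origin = subset_of(tag_table[5], tag_table)
```

  • Enumerating tags in lexical order keeps the sketch short. In practice tags are usually chosen to differ at several positions so that a single sequencing error does not convert one tag into another; that design step is outside this sketch.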
  • the methods of the present application are applicable to any cell or nucleus, including but not limited to cancer cells, stem cells, neural cells, fetal cells, and immune cells involved in an immune response, or nuclei thereof.
  • the cells are cells or cell lines derived from animals, plants, or microorganisms, or any combination thereof.
  • the cell is a mammalian (eg, human) derived cell or cell line, or any combination thereof.
  • the cells are cancer cells, stem cells, neural cells, fetal cells, immune cells, or any combination thereof.
  • the cells are immune cells, such as B cells or T cells. Accordingly, nuclei from such cells can be used in the methods of the present application.
  • the nucleus is from an immune cell, such as a B cell or a T cell.
  • the population of nucleic acid fragments comprises a T cell receptor gene or gene product, or a B cell receptor gene or gene product.
  • in step (2), the RNA (e.g., mRNA, long non-coding RNA, eRNA) is reverse transcribed using a reverse transcriptase to produce a hybrid double-stranded nucleic acid containing an RNA (e.g., mRNA, long non-coding RNA, eRNA) strand and a cDNA strand.
  • the hybrid double-stranded nucleic acid has an overhang at the 3' end of the cDNA strand.
  • the overhang has a length of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more nucleotides.
  • the overhang is an overhang of 2-5 cytosine nucleotides (eg, a CCC overhang).
  • an overhang can be formed or added to the 3' end of a cDNA strand by using a reverse transcriptase with end-transfer activity.
  • the reverse transcriptase has end-transfer activity.
  • the reverse transcriptase is capable of synthesizing a cDNA strand using RNA (eg, mRNA, long non-coding RNA, eRNA) as a template, and adding an overhang to the 3' end of the cDNA strand .
  • the reverse transcriptase is capable of adding to the 3' end of a cDNA strand an overhang with a length of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more nucleotides.
  • the reverse transcriptase is capable of adding an overhang of 2-5 cytosine nucleotides (eg, a CCC overhang) to the 3' end of the cDNA strand.
  • any reverse transcriptase capable of synthesizing a cDNA strand using an RNA molecule as a template and adding an overhang to the 3' end of the cDNA strand (i.e., a reverse transcriptase with end-transfer activity) can be used. Examples of reverse transcriptases with end-transfer activity include, but are not limited to, M-MLV reverse transcriptase, HIV-1 reverse transcriptase, AMV reverse transcriptase, and telomerase reverse transcriptase.
  • the reverse transcriptase used preferably has no or reduced RNase activity (especially RNase H activity).
  • the reverse transcriptase is selected from the group consisting of M-MLV reverse transcriptase, HIV-1 reverse transcriptase, AMV reverse transcriptase, and telomerase reverse transcriptase that have been modified or mutated to remove RNase activity, especially RNase H activity (e.g., M-MLV reverse transcriptase without RNase H activity).
  • the RNA (e.g., mRNA, long non-coding RNA, eRNA) is reverse transcribed using primers comprising poly(T) sequences and/or primers comprising random oligonucleotide sequences.
  • the poly(T) sequence and/or random oligonucleotide sequence is located at the 3' end of the primer.
  • the poly(T) sequence comprises at least 5 (eg, at least 10, at least 15, or at least 20) thymine nucleotide residues.
  • the random oligonucleotide sequence has a length of 5-30 nt (eg, 5-10 nt, 10-20 nt, 20-30 nt).
  • the primers contain no modifications, or contain modified nucleotides.
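  • The two primer types described above can be sketched in a few lines: a primer whose 3' end is a poly(T) run of at least 5 residues, and a primer whose 3' end is a random oligonucleotide of 5-30 nt. The optional 5' `handle` argument is a hypothetical placeholder for whatever sequence precedes the 3' portion; it is not defined in the application.

```python
import random


def poly_t_primer(n_t: int = 20, handle: str = "") -> str:
    """Primer with a poly(T) sequence at its 3' end."""
    if n_t < 5:
        raise ValueError("poly(T) sequence needs at least 5 T residues")
    return handle + "T" * n_t


def random_primer(length: int = 8, handle: str = "", rng=None) -> str:
    """Primer with a random oligonucleotide sequence at its 3' end."""
    if not 5 <= length <= 30:
        raise ValueError("random sequence length must be 5-30 nt")
    rng = rng or random.Random()
    return handle + "".join(rng.choice("ACGT") for _ in range(length))


oligo_dt = poly_t_primer(20, handle="ACGT")
random_8mer = random_primer(8, handle="ACGT", rng=random.Random(0))
```

  • Passing a seeded `random.Random` makes the random primer reproducible, which is convenient for testing; a synthesized random primer pool is of course a mixture of many sequences rather than one.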
  • any transposase that is capable of forming a functional complex with a composition comprising a transposase recognition sequence, and of catalyzing the transposition of part or all of that composition into a double-stranded nucleic acid molecule incubated with the enzyme in a transposition reaction, is suitable for use in the methods of the present application.
  • the transposase complex is capable of randomly cleaving or breaking a hybrid double-stranded nucleic acid containing RNA and DNA.
  • the transposase is selected from the group consisting of Tn5 transposase, MuA transposase, Sleeping Beauty transposase, Mariner transposase, Tn7 transposase, Tn10 transposase, Ty1 transposase, Tn552 transposase, and variants, modified products and derivatives having the transposition activity of the above transposases.
  • the transposase is a Tn5 transposase.
  • the first tag sequence is not limited by its composition or length, as long as it can play the role of identification.
  • the first tag sequence has a length of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more nucleotides.
  • the first tag sequence is 4-8 nucleotides in length.
  • the first tag sequence is linked (eg, directly linked) to the 5' end of the transposase recognition sequence.
  • the first consensus sequence is not limited by its composition or length.
  • the first consensus sequence has a length of at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 12, at least 15, at least 18, at least 20, at least 25, or more nucleotides.
  • the first consensus sequence is 12-25 nucleotides in length.
  • the first consensus sequence is linked (e.g., directly linked) to the 5' end of the first tag sequence.
  • the transferred strand comprises a first consensus sequence, a first tag sequence, and a transposase recognition sequence from the 5' end to the 3' end.
  • the transposase recognition sequence has the sequence set forth in SEQ ID NO:99. This sequence contains the recognition sequence for the Tn5 transposase.
  • the non-transferred strand is capable of annealing or hybridizing to the transferred strand to form a duplex.
  • the non-transferred strand comprises a sequence complementary to the transposase recognition sequence in the transferred strand.
  • the non-transferred strand has the sequence set forth in SEQ ID NO:1.
  • the transferred or non-transferred strand may or may not be modified as desired.
  • the transferred strand contains no modifications, or contains modified nucleotides; and/or, the non-transferred strand contains no modifications, or contains modified nucleotides.
  • the 5' end of the non-transferred strand is modified with a phosphate group; and/or, the 3' end of the non-transferred strand is blocked (e.g., the 3'-terminal nucleotide of the non-transferred strand is a dideoxynucleotide).
  • in step (3), the population of nucleic acid fragments is formed within the cell or nucleus.
  • the population of nucleic acid fragments is used to construct a transcriptome library (e.g., 5' transcriptome library) or for transcriptome sequencing (e.g., 5' transcriptome sequencing).
  • the population of nucleic acid fragments is used to construct a library of target nucleic acids (eg, V(D)J sequences) or for sequencing of target nucleic acids (eg, V(D)J sequences).
  • the target nucleic acid comprises the sequence of the nucleic acid of interest produced by cellular transcription or its complement.
  • the target nucleic acid comprises (1) a nucleotide sequence encoding a T cell receptor (TCR) or a B cell receptor (BCR), or a portion thereof (e.g., a V(D)J sequence), or (2) the complement of (1).
  • the target nucleic acid comprises the sequence of the V(D)J gene or its complement.
  • the application provides a method of generating a labeled nucleic acid molecule, comprising the steps of:
  • in step (a), at least 2 (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶) cells or nuclei are provided; and/or, at least 2 (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, or more) beads are provided.
  • the cells or nuclei and beads can be provided in various suitable reaction systems.
  • the cells or nuclei, and the beads are provided in microwells or droplets (eg, in a plurality of microwells or droplets).
  • the droplets are water-in-oil droplets.
  • Various means can be used to prepare water-in-oil droplets containing cells or nuclei and beads coupled to oligonucleotide molecules.
  • the preparation of water-in-oil droplets can be performed using a 10X GENOMICS Chromium platform or controller.
  • each bead is coupled to a plurality (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁷, at least 10⁸, or more) of oligonucleotide molecules.
  • Oligonucleotide molecules can be coupled to beads using various known methods. Such methods are described in detail in the Definitions of Terms section above, and are not limited to the specific examples set forth therein.
  • oligonucleotide molecules can be coupled to the surface of the beads or enclosed within the beads. In certain preferred embodiments, the oligonucleotide molecules are coupled to the surface of the bead, and/or, enclosed within the bead.
  • the beads are capable of releasing the oligonucleotides spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to specific chemicals or phases, exposure to light, reducing agents, etc.).
  • the beads may be prepared using any suitable material, and may have any desired size, shape, particle size distribution, and/or modification, as described in detail in the Definitions of Terms section above.
  • the beads are gel beads.
  • the marker sequence comprises an element selected from the group consisting of a first amplification primer sequence, a second consensus sequence, a second tag sequence, a unique molecular tag sequence, a template switching sequence, or any combination thereof .
  • the marker sequence comprises a second consensus sequence, a second tag sequence, a unique molecular tag sequence and a template switching sequence.
  • the marker sequence further comprises a first amplification primer sequence.
  • Template switching sequences can be designed so that oligonucleotide molecules (tag sequences) capture (eg, anneal or hybridize to) nucleic acid fragments within a cell or nucleus.
  • the template switching sequence comprises a sequence complementary to the 3' end overhang of the cDNA strand.
  • the overhang is an overhang of 2-5 cytosine nucleotides (eg, a CCC overhang), and the 3' end of the template switching sequence comprises 2-5 guanine nucleotides (eg, GGG).
  • template switching sequences are not limited by their length.
  • the template switching sequence is at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, or more nucleotides in length.
  • the template switching sequence contains no modifications, or contains modified nucleotides (eg, locked nucleic acids).
  • the use of modified nucleotides may be advantageous.
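  The pairing between the cDNA 3' overhang and the 3' end of the template switching sequence described above can be illustrated with a minimal Python sketch. The function names and the example sequences are hypothetical and chosen only for illustration; the sketch merely checks that a run of C at the 3' end of the cDNA can pair with a run of G at the 3' end of the template switching sequence, and it ignores modified nucleotides.

```python
# Hypothetical helper names; sequences are arbitrary placeholders,
# not sequences disclosed in the application.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def trailing_run(seq: str, base: str) -> int:
    """Length of the run of `base` at the 3' end of `seq`."""
    return len(seq) - len(seq.rstrip(base))


def can_template_switch(cdna: str, template_switch: str, min_pairs: int = 2) -> bool:
    """True if the C overhang of the cDNA can pair with the G run of the
    template switching sequence over at least `min_pairs` positions."""
    pairs = min(trailing_run(cdna, "C"), trailing_run(template_switch, "G"))
    return pairs >= min_pairs


print(can_template_switch("TTGACCC", "ACGTGGG"))  # cDNA ends in CCC, oligo ends in GGG
print(can_template_switch("TTGACA", "ACGTGGG"))   # no C overhang on the cDNA
```

  The `min_pairs` default of 2 mirrors the lower bound of the 2-5 nucleotide overhang mentioned in the text; it is an assumption of this sketch rather than a stated requirement.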
  • the unique molecular tag sequence is not limited by its composition or length, as long as it can play a labeling role.
  • the unique molecular tag sequence is at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, or more nucleotides in length.
  • the unique molecular tag sequence contains no modifications, or contains modified nucleotides.
  • the second tag sequence is not limited by its composition or length, as long as it can play a role of identification.
  • the second tag sequence has at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15 , at least 20, at least 25 or more nucleotides in length.
  • the second tag sequence contains no modifications, or contains modified nucleotides.
  • the second consensus sequence is not limited by its composition or length. In certain preferred embodiments, the second consensus sequence has at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15 , at least 20, at least 25 or more nucleotides in length. In certain preferred embodiments, the second consensus sequence contains no modifications, or contains modified nucleotides.
  • the beads are coupled with a plurality of oligonucleotide molecules, and each oligonucleotide molecule has a unique molecular tag sequence that is different from one another. In certain preferred embodiments, each oligonucleotide molecule has the same second tag sequence and/or the same second consensus sequence.
  • the method uses a plurality of beads, and each bead carries a plurality of oligonucleotide molecules; the oligonucleotide molecules on the same bead have the same second tag sequence, and the oligonucleotide molecules on different beads have second tag sequences that are different from each other.
  • labeled nucleic acid molecules generated from nucleic acid fragments in the same droplet can carry the same second tag sequence or its complement, as well as unique molecular tag sequences or their complements that are different from each other (for labeling different nucleic acid fragments within the same droplet); labeled nucleic acid molecules generated from nucleic acid fragments in different droplets may carry second tag sequences or their complements that are different from each other.
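  The roles of the two tags described above (a second tag sequence shared within a droplet, and unique molecular tag sequences that differ between fragments) can be sketched in Python as follows. The read tuples, tag values, and the function name `count_molecules` are hypothetical; this is an illustrative sketch of how such tags are typically used downstream, not a procedure stated in the application.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct unique molecular tags per second tag sequence.

    `reads` is an iterable of (second_tag, unique_molecular_tag, fragment)
    tuples; reads sharing both tags are treated as copies of one molecule.
    """
    tags_per_droplet = defaultdict(set)
    for second_tag, molecular_tag, _fragment in reads:
        tags_per_droplet[second_tag].add(molecular_tag)
    return {tag: len(molecular_tags) for tag, molecular_tags in tags_per_droplet.items()}


reads = [
    ("AAAC", "GTCA", "fragment_1"),
    ("AAAC", "GTCA", "fragment_1"),  # amplification duplicate of the read above
    ("AAAC", "TTGA", "fragment_2"),  # same droplet, different molecule
    ("CCGT", "GTCA", "fragment_3"),  # different droplet
]
print(count_molecules(reads))
```

  Treating reads with identical tag pairs as duplicates is an assumption of this sketch; real pipelines usually also tolerate sequencing errors in the tags.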
  • each oligonucleotide molecule on each bead may have the same second consensus sequence and/or the same first amplification primer sequence.
  • the labeled nucleic acid molecules generated by the nucleic acid fragments in each droplet may carry the same second consensus sequence or its complement and/or the same first amplification primer sequence or its complement.
  • the oligonucleotide molecules on each bead have the same second consensus sequence.
  • the oligonucleotide molecules on each bead also have the same first amplification primer sequence.
  • a template switching sequence can be used to capture the desired nucleic acid fragment and initiate an extension reaction.
  • a template switching sequence can be provided, for example, at the 3' end of the marker sequence.
  • the second consensus sequence and/or the first amplification primer sequence can be used to provide primer binding sites.
  • the second consensus sequence and/or the first amplification primer sequence may, for example, be provided at the 5' end of the marker sequence.
  • the template switching sequence is located 3' to the marker sequence.
  • the second consensus sequence is located upstream of the second tag sequence, unique molecular tag sequence and/or template switching sequence.
  • the first amplification primer sequence is located upstream of the second consensus sequence.
  • the marker sequence comprises, from the 5' end to the 3' end, an optional first amplification primer sequence, a second consensus sequence, a second tag sequence, a unique molecular tag sequence and a template switching sequence.
  • the marker sequence comprises, from the 5' end to the 3' end, an optional first amplification primer sequence, a second consensus sequence, a unique molecular tag sequence, a second tag sequence and a template switching sequence.
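  The two 5'-to-3' element orders described above can be expressed as a small Python sketch. The function name, parameter names, and all sequences are hypothetical placeholders; the sketch only concatenates the elements in the stated order, with the first amplification primer sequence optional.

```python
from typing import Optional


def build_marker_sequence(
    second_consensus: str,
    second_tag: str,
    molecular_tag: str,
    template_switch: str,
    first_amp_primer: Optional[str] = None,
    molecular_tag_first: bool = False,
) -> str:
    """Concatenate the marker sequence elements from 5' to 3'.

    With molecular_tag_first=False the second tag precedes the unique
    molecular tag; with True the order of these two elements is swapped.
    """
    middle = [molecular_tag, second_tag] if molecular_tag_first else [second_tag, molecular_tag]
    parts = [first_amp_primer or "", second_consensus, *middle, template_switch]
    return "".join(parts)


# Placeholder sequences, for illustration only.
print(build_marker_sequence("CTAC", "AAAC", "GTCA", "TTGGG", first_amp_primer="ACAC"))
print(build_marker_sequence("CTAC", "AAAC", "GTCA", "TTGGG", molecular_tag_first=True))
```

  In both orders the template switching sequence stays at the 3' end and the consensus and primer sequences stay at the 5' end, which matches the positions given for primer binding and capture.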
  • step (b) the nucleic acid fragment and the oligonucleotide molecule are contacted by means selected from the group consisting of:
  • the oligonucleotide molecule anneals or hybridizes, via the template switching sequence, to the nucleic acid fragment containing the 3' end overhang of the cDNA strand, wherein the template switching sequence comprises a sequence complementary to the 3' end overhang of the cDNA strand; and, the nucleic acid fragment (or the oligonucleotide molecule) is extended under the action of a nucleic acid polymerase (eg, DNA polymerase or reverse transcriptase), using the oligonucleotide molecule (or the nucleic acid fragment) as a template.
  • Various nucleic acid polymerases (eg, DNA polymerases or reverse transcriptases) can be used to carry out the extension reaction, so long as they are capable of extending the captured nucleic acid fragment (or the oligonucleotide molecule) using the oligonucleotide molecule (or the captured nucleic acid fragment) as a template.
  • the nucleic acid polymerase used in step (b) is the same reverse transcriptase used in step (2).
  • In step (b), usually only a nucleic acid fragment containing the 3' end of the cDNA strand can be captured by the oligonucleotide molecule, through the overhang at the 3' end of the cDNA strand. The resulting labeled nucleic acid molecule therefore usually contains the sequence at the 3' end of the cDNA strand (which corresponds to the sequence at the 5' end of the RNA (eg, mRNA, long non-coding RNA, eRNA)) or its complement. Thus, by sequencing the resulting labeled nucleic acid molecules or derivatives thereof, sequence information on the 5' end of cellular or nuclear RNA (eg, mRNA, long non-coding RNA, eRNA) can be obtained.
  • the labeled nucleic acid molecule comprises, from the 5' end to the 3' end, the marker sequence and the complementary sequence of the nucleic acid fragment, wherein the nucleic acid fragment comprises a sequence complementary to the 5' end sequence of an RNA (eg, mRNA, long non-coding RNA, eRNA).
  • the labeled nucleic acid molecule comprises, from the 5' end to the 3' end, the sequence of the nucleic acid fragment and the complementary sequence of the marker sequence, wherein the nucleic acid fragment comprises a sequence complementary to the 5' end sequence of an RNA (eg, mRNA, long non-coding RNA, eRNA).
  • the labeled nucleic acid molecule comprises, from the 5' end to the 3' end, a first consensus sequence, a first tag sequence, a transposase recognition sequence, a sequence of a cDNA fragment, the complementary sequence of the template switching sequence, the complementary sequence of the unique molecular tag sequence, the complementary sequence of the second tag sequence, the complementary sequence of the second consensus sequence, and optionally the complementary sequence of the first amplification primer sequence.
  • the cDNA fragment comprises a sequence complementary to a sequence at the 5' end of an RNA (e.g., mRNA, long non-coding RNA, eRNA).
  • the method further comprises: (c) recovering and purifying the labeled nucleic acid molecule.
  • the labeled nucleic acid molecules are used to construct transcriptome libraries (e.g., 5' transcriptome libraries) or for transcriptome sequencing (e.g., 5' transcriptome sequencing).
  • the population of nucleic acid fragments is used to construct a library of target nucleic acids (eg, V(D)J sequences) or for sequencing of target nucleic acids (eg, V(D)J sequences).
  • the target nucleic acid comprises the sequence of the nucleic acid of interest produced by cellular transcription or its complement.
  • the target nucleic acid comprises: (1) a nucleotide sequence encoding a T cell receptor (TCR) or a B cell receptor (BCR), or a portion thereof (eg, a V(D)J sequence); or (2) the complement of (1).
  • the target nucleic acid comprises the sequence of the V(D)J gene or its complement.
  • the application also provides a method for constructing a library of nucleic acid molecules, comprising,
  • In step (ii), the labeled nucleic acid molecules derived from the plurality of beads are recovered and/or combined.
  • the labeled nucleic acid molecules can be enriched as desired.
  • nucleic acid amplification reactions can be performed on labeled nucleic acid molecules to generate enriched products.
  • the method further comprises, (iii) enriching the labeled nucleic acid molecule.
  • a nucleic acid amplification reaction is performed on the labeled nucleic acid molecules to produce enriched products.
  • the nucleic acid amplification reaction is performed using at least a first primer, wherein the first primer is capable of hybridizing or annealing to the complementary sequence of the first amplification primer sequence and/or the complementary sequence of the second consensus sequence.
  • the nucleic acid amplification reaction also employs a second primer capable of hybridizing or annealing to a sequence complementary to the first consensus sequence.
  • the first primer comprises: (1) the first amplification primer sequence or a partial sequence thereof, or (2) the second consensus sequence or a partial sequence thereof, or (3) a combination of (1) and (2).
  • the second primer contains the first consensus sequence or a portion thereof.
  • Various nucleic acid polymerases can be used to perform the nucleic acid amplification reaction for enriching labeled nucleic acid molecules, so long as they can carry out an amplification reaction using the labeled nucleic acid molecules as templates.
  • the nucleic acid amplification reaction can be performed using a nucleic acid polymerase having strand displacement activity (eg, a DNA polymerase having strand displacement activity).
  • the nucleic acid amplification reaction can be performed using a high-fidelity nucleic acid polymerase (eg, a high-fidelity DNA polymerase).
  • the nucleic acid amplification reaction in step (iii) is performed using a nucleic acid polymerase (eg, a DNA polymerase; eg, a DNA polymerase having strand displacement activity and/or high fidelity).
  • the method further comprises, prior to performing step (iii), a step of degrading the oligonucleotide molecule or template switching sequence. It is readily understood that degrading the oligonucleotide molecule or template switching sequence may be advantageous in certain circumstances, for example, to prevent the oligonucleotide molecule or template switching sequence from hindering the nucleic acid amplification reaction.
  • the annealing temperature of the first primer to the labeled nucleic acid molecule is higher than the annealing temperature of the oligonucleotide molecule to the labeled nucleic acid molecule.
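  One rough way to compare the two annealing temperatures mentioned above is to estimate melting temperatures with the Wallace rule of thumb, Tm ≈ 2·(A+T) + 4·(G+C) °C, which is only a coarse approximation intended for short oligonucleotides. The Python sketch below uses that rule as a stand-in; the function names and sequences are hypothetical, and using an estimated melting temperature as a proxy for annealing temperature is an assumption of this sketch, not something specified in the application.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature (deg C) with the Wallace rule:
    2 degrees per A/T and 4 degrees per G/C."""
    oligo = oligo.upper()
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    return 2 * at_count + 4 * gc_count


def primer_anneals_hotter(first_primer: str, oligo_binding_region: str) -> bool:
    """True if the first primer has the higher estimated melting temperature."""
    return wallace_tm(first_primer) > wallace_tm(oligo_binding_region)


# Placeholder sequences, for illustration only.
print(wallace_tm("ATGC"))
print(primer_anneals_hotter("GCGCGCGCGCGCGCGCGCGC", "ATATGGG"))
```

  For real primer design, a nearest-neighbor thermodynamic model with salt correction would normally replace this estimate.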
  • the method further comprises, (iv) recovering and purifying the enriched product of step (iii).
  • To facilitate the recovery and purification of the enriched product of step (iii) in step (iv), optionally, in step (iii), a first primer linked to a labeling molecule and/or a second primer linked to a labeling molecule can be used to perform the nucleic acid amplification reaction on the labeled nucleic acid molecule.
  • the enriched product of step (iii) can be recovered and purified using a binding molecule capable of interacting with the labeling molecule.
  • the binding molecule is capable of interacting specifically or non-specifically with the labeling molecule.
  • the binding molecule interacts with the labeling molecule in a manner selected from the group consisting of positive and negative charge interactions (eg, polylysine-glycoprotein), affinity interactions (eg, biotin-avidin, biotin-streptavidin, antigen-antibody, receptor-ligand, enzyme-cofactor), click chemistry (eg, alkyne group-azide compound), or any combination thereof.
  • the labeling molecule is polylysine, and the binding molecule is a glycoprotein; or, the labeling molecule is an antibody, and the binding molecule is an antigen that can bind to the antibody; or, the labeling molecule is biotin, and the binding molecule is streptavidin.
  • the binding molecule is polylysine, and the labeling molecule is a glycoprotein; or, the binding molecule is an antibody, and the labeling molecule is an antigen that can bind to the antibody; or, the binding molecule is biotin, and the labeling molecule is streptavidin.
  • In step (iii), the first primer is linked to a first labeling molecule, and the first labeling molecule is capable of interacting with a first binding molecule.
  • In step (iv), the enriched product of step (iii) is recovered and purified using the first binding molecule.
  • a nucleic acid amplification reaction is performed on the labeled nucleic acid molecule using at least the first primer and the second primer to generate an enriched product; wherein the first primer is linked to a first labeling molecule, and/or the second primer is linked to a second labeling molecule; the first labeling molecule can interact with a first binding molecule, and the second labeling molecule can interact with a second binding molecule.
  • the enriched product of step (iii) is recovered and purified using the first binding molecule and/or the second binding molecule.
  • the first labeling molecule is the same as or different from the second labeling molecule, and/or the first binding molecule is the same as or different from the second binding molecule.
  • a nucleic acid amplification reaction may first be performed on the labeled nucleic acid molecule using a first primer without a labeling molecule and/or a second primer without a labeling molecule; then, an additional nucleic acid amplification reaction is performed on the labeled nucleic acid molecule using the first primer linked to the first labeling molecule and/or the second primer linked to the second labeling molecule.
  • The detailed descriptions and definitions of binding molecules above also apply to the first binding molecule and the second binding molecule; the detailed descriptions and definitions of labeling molecules above also apply to the first labeling molecule and the second labeling molecule.
  • a nucleic acid amplification reaction can be performed on the recovered labeled nucleic acid molecules or the recovered enrichment products to generate amplification products for sequencing. Therefore, in certain preferred embodiments, the method further comprises the steps of:
  • step (v) subjecting the labeled nucleic acid molecules recovered in step (ii) or the enrichment products recovered in step (iv) to a nucleic acid amplification reaction to generate amplification products.
  • the nucleic acid amplification reaction is performed using at least a third primer and a fourth primer.
  • the third primer is capable of hybridizing or annealing to the complementary sequence of the first amplification primer sequence and/or the complementary sequence of the second consensus sequence, and optionally contains a third tag sequence; and, the fourth primer is capable of hybridizing or annealing to the complement of the first consensus sequence, and optionally contains a second amplification primer sequence and/or a fourth tag sequence.
  • the third and fourth tag sequences may not be used.
  • the third tag sequence can be introduced in the third primer without the fourth tag sequence in the fourth primer.
  • the fourth tag sequence can be introduced in the fourth primer without the third tag sequence in the third primer.
  • third and fourth tag sequences can be introduced in the third and fourth primers, respectively.
  • the third and/or fourth tag sequences can, for example, be used to distinguish labeled nucleic acid molecules from different libraries.
  • the third primer contains the first amplification primer sequence or a portion thereof, an optional third tag sequence, and an optional second consensus sequence or a portion thereof .
  • the third primer contains: (1) the first amplification primer sequence or a partial sequence thereof; or, (2) the first amplification primer sequence or a partial sequence thereof, and the second consensus sequence or a partial sequence thereof; or, (3) the first amplification primer sequence or a partial sequence thereof, the third tag sequence, and the second consensus sequence or a partial sequence thereof.
  • the fourth primer contains a second amplification primer sequence, an optional fourth tag sequence, and a first consensus sequence or a portion thereof.
  • the fourth primer contains: (1) the second amplification primer sequence, and the first consensus sequence or a partial sequence thereof; or, (2) the second amplification primer sequence, the fourth tag sequence, and the first consensus sequence or a partial sequence thereof.
  • the third tag sequence is not limited by its composition or length, as long as it can play a role of identification.
  • the third tag sequence is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more nucleotides in length.
  • the third tag sequence contains no modifications, or contains modified nucleotides.
  • the fourth tag sequence is not limited by its composition or length, as long as it can play the role of identification.
  • the fourth tag sequence is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more nucleotides in length.
  • the fourth tag sequence contains no modifications, or contains modified nucleotides.
  • Various nucleic acid polymerases can be used to perform the nucleic acid amplification reaction that produces amplification products for sequencing, so long as they are capable of carrying out a nucleic acid amplification reaction (eg, extension of the third primer and the fourth primer) using the labeled nucleic acid molecules or enrichment products as templates.
  • the nucleic acid amplification reaction can be performed using a nucleic acid polymerase having strand displacement activity (eg, a DNA polymerase having strand displacement activity).
  • the nucleic acid amplification reaction can be performed using a high-fidelity nucleic acid polymerase (eg, a high-fidelity DNA polymerase).
  • the nucleic acid amplification reaction in step (v) is performed using a nucleic acid polymerase (eg, a DNA polymerase; eg, a DNA polymerase having strand displacement activity and/or high fidelity).
  • the nucleic acid amplification reaction used to enrich the labeled nucleic acid molecules and the nucleic acid amplification reaction used to generate the nucleic acid molecules to be sequenced may use the same or different nucleic acid polymerases (eg, DNA polymerases).
  • the nucleic acid polymerase (eg, DNA polymerase) used in step (v) is the same as or different from that used in step (iii).
  • the library of nucleic acid molecules comprises the amplification product of step (v).
  • one nucleic acid strand of the amplification product comprises, from the 5' end to the 3' end, a second amplification primer sequence, an optional fourth tag sequence, a first consensus sequence, a first tag sequence, a transposase recognition sequence, a sequence of a cDNA fragment, the complementary sequence of the template switching sequence, the complementary sequence of the unique molecular tag sequence, the complementary sequence of the second tag sequence, the complementary sequence of the second consensus sequence, the complementary sequence of an optional third tag sequence, and the complementary sequence of the first amplification primer sequence.
  • the cDNA fragment comprises a sequence complementary to a sequence at the 5' end of an RNA (e.g., mRNA, long non-coding RNA, eRNA).
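  The 5'-to-3' layout of the amplification product strand described above can be sketched in Python by concatenating the transposase-side elements, the cDNA fragment, and the reverse complement of the bead-side elements. All names and sequences are hypothetical placeholders, and the bead-side order used here (second tag before unique molecular tag) is one of the orders the text allows; the sketch only shows how the listed complementary sequences arise from reverse-complementing that side.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplification_product_strand(
    second_amp_primer: str,
    first_consensus: str,
    first_tag: str,
    recognition: str,
    cdna: str,
    template_switch: str,
    molecular_tag: str,
    second_tag: str,
    second_consensus: str,
    first_amp_primer: str,
    fourth_tag: str = "",
    third_tag: str = "",
) -> str:
    """Assemble one strand of the amplification product, 5'->3'."""
    transposase_side = (
        second_amp_primer + fourth_tag + first_consensus + first_tag + recognition
    )
    # Bead/primer side written 5'->3' on the opposite strand.
    bead_side = (
        first_amp_primer + third_tag + second_consensus
        + second_tag + molecular_tag + template_switch
    )
    return transposase_side + cdna + reverse_complement(bead_side)


# Placeholder sequences, for illustration only.
strand = amplification_product_strand(
    second_amp_primer="AATG", first_consensus="TCGT", first_tag="CAGA",
    recognition="AGAT", cdna="GGGTTACA", template_switch="ACGGG",
    molecular_tag="GTCA", second_tag="AAAC", second_consensus="CTAC",
    first_amp_primer="ACAC",
)
print(strand)
```

  Reverse-complementing the bead side as a whole yields the complements in exactly the listed order: template switching sequence, unique molecular tag, second tag, second consensus, optional third tag, first amplification primer.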
  • the library of nucleic acid molecules is used for transcriptome sequencing (e.g., 5' transcriptome sequencing) or for sequencing of target nucleic acids (e.g., V(D)J sequences).
  • the target nucleic acid comprises the sequence of the nucleic acid of interest produced by cellular transcription or its complement.
  • the target nucleic acid comprises: (1) a nucleotide sequence encoding a T cell receptor (TCR) or a B cell receptor (BCR), or a portion thereof (eg, a V(D)J sequence); or (2) the complement of (1).
  • the target nucleic acid comprises the sequence of the V(D)J gene or its complement.
  • the method further comprises the step of enriching the target nucleic acid molecule.
  • The step of enriching target nucleic acid molecules can be carried out at any stage subsequent to step (i) of the method.
  • the target nucleic acid molecule is enriched .
  • oligonucleotide probes can be used to specifically enrich for target nucleic acid molecules in the plurality of labeled nucleic acid molecules.
  • the oligonucleotide probe contains an oligonucleotide sequence capable of specifically binding or annealing to the target nucleic acid molecule.
  • the oligonucleotide probe contains a labeling molecule; and, target nucleic acid molecules that specifically bind or anneal to the oligonucleotide probe can be recovered and purified using one or more binding molecules; wherein the binding molecule and the labeling molecule can interact specifically or non-specifically.
  • the target nucleic acid molecule comprises: (i) a nucleotide sequence encoding a T cell receptor (TCR), or a partial sequence thereof (eg, a V(D)J sequence), and/or, (ii) the complementary sequence of (i).
  • an oligonucleotide probe set comprising a first oligonucleotide probe and a second oligonucleotide probe can be used to specifically enrich for target nucleic acid molecules in the plurality of labeled nucleic acid molecules; wherein the first oligonucleotide probe contains a first specific oligonucleotide sequence capable of specifically binding or annealing to the nucleotide sequence encoding the alpha chain constant region of the TCR or its complementary sequence, and a first labeling molecule; the second oligonucleotide probe contains a second specific oligonucleotide sequence capable of specifically binding or annealing to the nucleotide sequence encoding the beta chain constant region of the TCR or its complementary sequence, and a second labeling molecule; and, a first binding molecule capable of interacting with the first labeling molecule and/or a second binding molecule capable of interacting with the second labeling molecule may be used to recover and purify the target nucleic acid molecules bound or annealed to the probes.
  • the target nucleic acid molecules are enriched using the method described in Tu, A.A. et al. TCR sequencing paired with massively parallel 3' RNA-seq reveals clonotypic T cell signatures. Nat Immunol 20, 1692-1699 (2019).
  • the target nucleic acid molecule comprises: (i) a nucleotide sequence encoding a B cell receptor (BCR), or a partial sequence thereof (eg, a V(D)J sequence), and/or, (ii) the complementary sequence of (i).
  • an oligonucleotide probe set comprising a third oligonucleotide probe and a fourth oligonucleotide probe can be used to specifically enrich for target nucleic acid molecules in the plurality of labeled nucleic acid molecules; wherein the third oligonucleotide probe contains a third specific oligonucleotide sequence capable of specifically binding or annealing to the nucleotide sequence encoding the light chain constant region of the BCR or its complementary sequence, and a third labeling molecule; the fourth oligonucleotide probe contains a fourth specific oligonucleotide sequence capable of specifically binding or annealing to the nucleotide sequence encoding the heavy chain constant region of the BCR or its complementary sequence, and a fourth labeling molecule; and, a third binding molecule capable of interacting with the third labeling molecule and/or a fourth binding molecule capable of interacting with the fourth labeling molecule may be used to recover and purify the target nucleic acid molecules bound or annealed to the probes.
  • the application also provides a method for nucleic acid sequencing of cells or nuclei, comprising:
  • a library of nucleic acid molecules is constructed according to the methods described above in this application.
  • the library of nucleic acid molecules is sequenced.
  • prior to sequencing, at least 2, at least 3, at least 4, at least 5, at least 8, at least 10, at least 12, at least 15, at least 18, at least 20, at least 25, or more nucleic acid molecule libraries are combined and then sequenced; wherein each nucleic acid molecule library contains a plurality of nucleic acid molecules (ie, amplification products), and the nucleic acid molecules in the same library have the same third tag sequence or the same fourth tag sequence; and, nucleic acid molecules derived from different libraries have mutually different third tag sequences or mutually different fourth tag sequences.
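  Because molecules from the same library share a third (or fourth) tag sequence and different libraries carry different tags, pooled reads can be assigned back to their library by that tag. The Python sketch below illustrates the idea; the function name, the tag-to-library mapping, and the read tuples are hypothetical, and exact matching of tags is an assumption of this sketch.

```python
def demultiplex(reads, library_by_tag):
    """Split pooled reads by library tag.

    `reads` is an iterable of (library_tag, sequence) tuples and
    `library_by_tag` maps each known tag to a library name.
    Returns (pools, unassigned).
    """
    pools = {name: [] for name in library_by_tag.values()}
    unassigned = []
    for library_tag, sequence in reads:
        name = library_by_tag.get(library_tag)
        if name is None:
            unassigned.append(sequence)  # tag not recognised
        else:
            pools[name].append(sequence)
    return pools, unassigned


# Placeholder tags and reads, for illustration only.
library_by_tag = {"ACGTAC": "library_1", "TTGCAA": "library_2"}
pooled_reads = [
    ("ACGTAC", "read_a"),
    ("TTGCAA", "read_b"),
    ("ACGTAC", "read_c"),
    ("GGGGGG", "read_d"),
]
pools, unassigned = demultiplex(pooled_reads, library_by_tag)
print(pools)
print(unassigned)
```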
  • the application also provides a nucleic acid molecule library comprising a plurality of nucleic acid molecules, wherein,
  • One nucleic acid strand of the nucleic acid molecule comprises, from the 5' end to the 3' end, a first consensus sequence, a first tag sequence, a transposase recognition sequence, a sequence of a cDNA fragment, the complementary sequence of the template switching sequence, the complementary sequence of the unique molecular tag sequence, the complementary sequence of the second tag sequence, and the complementary sequence of the second consensus sequence.
  • the cDNA fragment comprises a sequence complementary to the 5' end sequence of RNA (eg, mRNA, long non-coding RNA, eRNA).
  • the nucleic acid strands of each nucleic acid molecule have the same first consensus sequence, the same transposase recognition sequence, the same complementary sequence of the template switching sequence, and the same complementary sequence of the second consensus sequence.
  • the nucleic acid strands of nucleic acid molecules whose cDNA fragments are derived from the same cell have the same first tag sequence and the same complementary sequence of the second tag sequence.
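  Since molecules derived from the same cell share both the first tag sequence and the complement of the second tag sequence, the pair of tags can serve as a grouping key during analysis. The Python sketch below groups molecules by that pair; the names and values are hypothetical, and treating every distinct pair as one cell is an assumption of this sketch (the text states only that molecules from the same cell share both tags).

```python
from collections import defaultdict


def group_by_tag_pair(molecules):
    """Group cDNA fragments by (first tag, second tag complement).

    `molecules` is an iterable of (first_tag, second_tag_complement, cdna)
    tuples read from library strands.
    """
    groups = defaultdict(list)
    for first_tag, second_tag_complement, cdna in molecules:
        groups[(first_tag, second_tag_complement)].append(cdna)
    return dict(groups)


# Placeholder tags and fragments, for illustration only.
molecules = [
    ("CAGA", "GTTT", "cdna_1"),
    ("CAGA", "GTTT", "cdna_2"),  # same tag pair as cdna_1
    ("TGCT", "GTTT", "cdna_3"),  # same second tag, different first tag
    ("CAGA", "ACGG", "cdna_4"),  # same first tag, different second tag
]
groups = group_by_tag_pair(molecules)
print(groups)
```

  Combining two independent tags multiplies the number of distinguishable groups: n first tags and m second tags give up to n × m pairs.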
  • the nucleic acid strand also has a second amplification primer sequence and an optional fourth tag sequence upstream of the first consensus sequence.
  • downstream of the complementary sequence of the second consensus sequence, the nucleic acid strand also has the complementary sequence of an optional third tag sequence and the complementary sequence of the first amplification primer sequence.
  • The nucleic acid molecule library can be constructed using the method for constructing a nucleic acid molecule library provided in this application. Therefore, the detailed descriptions and definitions given above for each element (including but not limited to, the second amplification primer sequence, the fourth tag sequence, the first consensus sequence, the first tag sequence, the transposase recognition sequence, the cDNA fragment, the template switching sequence, the unique molecular tag sequence, the second tag sequence, the second consensus sequence, the third tag sequence, and/or the first amplification primer sequence) also apply to this aspect.
  • the library of nucleic acid molecules is a transcriptome library.
  • the nucleic acid molecules in the library of nucleic acid molecules are derived from immune cells.
  • the immune cells are selected from B cells and T cells.
  • the library of nucleic acid molecules is constructed by the methods provided herein.
  • the application also provides a kit comprising: a reverse transcriptase, a transposase, and one or more transposition sequences that the transposase can recognize and bind, wherein,
  • the transposase and the transposition sequence are capable of forming a transposase complex capable of cleaving or breaking a double-stranded nucleic acid (eg, a hybrid double-stranded nucleic acid comprising RNA and DNA); and,
  • the transposition sequence includes a transferred strand and a non-transferred strand.
  • the transferred strand comprises a transposase recognition sequence, a first tag sequence, and a first consensus sequence.
  • the first tag sequence is located upstream (e.g., 5' end) of the transposase recognition sequence
  • the first consensus sequence is located upstream (e.g., 5' end) of the first tag sequence.
  • the kit comprises at least 2 (eg, at least 3, at least 4, at least 5, at least 8, at least 10, at least 20, at least 50, at least 100, at least 200, or more) kinds of transposition sequences.
  • various transposition sequences have different first tag sequences from each other.
  • the various transposition sequences have the same transposase recognition sequence, the same first consensus sequence, and/or the same non-transferred strand.
  • the reverse transcriptase has terminal transferase activity.
  • the reverse transcriptase is capable of synthesizing a cDNA strand using RNA (eg, mRNA, long non-coding RNA, eRNA) as a template, and adding an overhang to the 3' end of the cDNA strand.
  • the reverse transcriptase is capable of adding, to the 3' end of a cDNA strand, an overhang of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more nucleotides in length.
  • the reverse transcriptase is capable of adding an overhang of 2-5 cytosine nucleotides (eg, a CCC overhang) to the 3' end of the cDNA strand.
  • the reverse transcriptase has no or reduced RNase activity (especially RNase H activity).
  • the reverse transcriptase is selected from the group consisting of M-MLV reverse transcriptase, HIV-1 reverse transcriptase, AMV reverse transcriptase, and telomerase reverse transcriptase that have been modified or mutated to remove RNase activity (especially RNase H activity) (e.g., M-MLV reverse transcriptase without RNase H activity).
  • the transposase is selected from the group consisting of Tn5 transposase, MuA transposase, Sleeping Beauty transposase, Mariner transposase, Tn7 transposase, Tn10 transposase, Ty1 transposase, Tn552 transposase, and variants, modified products and derivatives having the transposition activity of the above transposases.
  • the transposase is a Tn5 transposase.
  • the first tag sequence is linked (eg, directly linked) to the 5' end of the transposase recognition sequence.
  • the first consensus sequence is linked (e.g., directly linked) to the 5' end of the first tag sequence.
  • the transfer strand comprises a first consensus sequence, a first tag sequence, and a transposase recognition sequence from the 5' end to the 3' end.
  • the transposase recognition sequence has the sequence set forth in SEQ ID NO:99.
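The 5'-to-3' layout of the transferred strand described above (first consensus sequence, then first tag sequence, then transposase recognition sequence) can be sketched in Python. This is only an illustration: every sequence below is a made-up placeholder (none is SEQ ID NO: 1 or SEQ ID NO: 99), and `build_transfer_strand` and `reverse_complement` are hypothetical helper names introduced for this example.

```python
# All sequences are hypothetical placeholders, not the SEQ ID NOs of the application.
FIRST_CONSENSUS = "ACGTACGTACGT"   # first consensus sequence (C1), placeholder
RECOGNITION_SEQ = "TTGACCATGGCA"   # transposase recognition sequence, placeholder
FIRST_TAGS = ["AACCGGTT", "GGTTAACC", "CATGCATG"]  # one first tag (Tag1) per subset

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_transfer_strand(consensus: str, tag: str, recognition: str) -> str:
    """Concatenate 5'->3': first consensus, first tag, recognition sequence."""
    return consensus + tag + recognition


# One transferred strand per first tag; only the tag differs between them.
TRANSFER_STRANDS = {
    tag: build_transfer_strand(FIRST_CONSENSUS, tag, RECOGNITION_SEQ)
    for tag in FIRST_TAGS
}

# The non-transferred strand contains the complement of the recognition sequence,
# so it is shared by all transposition sequences.
NON_TRANSFERRED_CORE = reverse_complement(RECOGNITION_SEQ)
```

Because the tag sits between two shared elements, swapping the tag is the only change needed to prepare a transposition sequence for another subset of cells or nuclei.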
  • the non-transferring strand is capable of annealing or hybridizing to the transferring strand to form a duplex.
  • the non-transferring strand comprises a sequence complementary to a transposase recognition sequence in the transferring strand.
  • the non-transferred strand has the sequence set forth in SEQ ID NO:1.
  • the transferred strand contains no modifications, or contains modified nucleotides; and/or, the non-transferred strand contains no modifications, or contains modified nucleotides.
  • the 5' end of the non-transferred strand is modified with a phosphate group; and/or, the 3' end of the non-transferred strand is blocked (e.g., the 3'-terminal nucleotide of the non-transferred strand is a dideoxynucleotide).
  • the kit further comprises reverse transcription primers, eg, primers comprising poly(T) sequences and/or primers comprising random oligonucleotide sequences.
  • the poly(T) sequence or the random oligonucleotide sequence is located at the 3' end of the primer.
  • the poly(T) sequence comprises at least 5 (eg, at least 10, at least 15, or at least 20) thymine nucleotide residues.
  • the random oligonucleotide sequence has a length of 5-30 nt (eg, 5-10 nt, 10-20 nt, 20-30 nt).
  • the primers contain no modifications, or contain modified nucleotides.
  • kits described herein further comprise reagents for constructing transcriptome sequencing libraries.
  • the reagents for constructing a transcriptome sequencing library comprise: beads coupled to oligonucleotide molecules containing marker sequences.
  • the oligonucleotide molecules are coupled to the surface of the bead, and/or, enclosed within the bead.
  • the beads are capable of releasing the oligonucleotide spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to specific chemicals or phases, exposure to light, reducing agents, etc.).
  • the beads are gel beads.
  • the marker sequence comprises an element selected from the group consisting of a first amplification primer sequence, a second consensus sequence, a second tag sequence, a unique molecular tag sequence, a template switching sequence, or any combination thereof.
  • the marker sequence comprises a second consensus sequence, a second tag sequence, a unique molecular tag sequence and a template switching sequence. In certain preferred embodiments, the marker sequence further comprises a first amplification primer sequence.
  • the template switching sequence comprises a sequence complementary to the overhang added by the reverse transcriptase at the 3' end of the cDNA strand.
  • the overhang is an overhang of 2-5 cytosine nucleotides (e.g., a CCC overhang), and the 3' end of the template switching sequence comprises 2-5 guanine nucleotides (e.g., GGG).
  • the template switching sequence contains no modifications, or contains modified nucleotides (eg, locked nucleic acids).
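The complementarity between the cDNA 3' overhang and the 3' end of the template switching sequence can be checked with a few lines of Python. This is a sketch only: `TSO_PLACEHOLDER` is an invented sequence and `tso_matches_overhang` is a hypothetical helper; neither comes from the application.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical template switching sequence ending in GGG (placeholder only).
TSO_PLACEHOLDER = "ACGTTGCAGGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tso_matches_overhang(tso: str, overhang: str) -> bool:
    """True if the 3' end of the TSO can base-pair with the cDNA 3' overhang.

    Both sequences are written 5'->3'. Pairing is antiparallel, so the TSO
    must end with the reverse complement of the overhang.
    """
    if not overhang:
        return False
    return tso.endswith(reverse_complement(overhang))
```

For a CCC overhang the reverse complement is GGG, so any template switching sequence ending in GGG passes the check, which matches the CCC/GGG example given above.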
  • the unique molecular tag sequence is at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more nucleotides in length. In certain preferred embodiments, the unique molecular tag sequence contains no modifications, or contains modified nucleotides.
  • the second tag sequence is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more nucleotides in length. In certain preferred embodiments, the second tag sequence contains no modifications, or contains modified nucleotides.
  • the second consensus sequence is at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more nucleotides in length. In certain preferred embodiments, the second consensus sequence contains no modifications, or contains modified nucleotides.
  • the beads are coupled with a plurality of oligonucleotide molecules, and each oligonucleotide molecule has a unique molecular tag sequence that is different from one another. In certain preferred embodiments, each oligonucleotide molecule has the same second tag sequence and/or the same second consensus sequence.
  • the reagent contains a plurality of beads, and each bead has a plurality of oligonucleotide molecules; and, the plurality of oligonucleotide molecules on the same bead have the same second tag sequence, and the oligonucleotide molecules on different beads have second tag sequences that are different from each other.
  • the oligonucleotide molecules on each bead have the same second consensus sequence.
  • the oligonucleotide molecules on each bead also have the same first amplification primer sequence.
  • the template switching sequence is located 3' to the marker sequence.
  • the second consensus sequence is located upstream of the second tag sequence, unique molecular tag sequence and/or template switching sequence.
  • the first amplification primer sequence is located upstream of the second consensus sequence.
  • the marker sequence comprises an optional first amplification primer sequence, a second consensus sequence, a second tag sequence, a unique molecular tag sequence and a template switching sequence from the 5' end to the 3' end.
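The bead-bound oligonucleotide layout described above can be sketched as follows: all oligonucleotides on one bead share the second tag, each carries its own unique molecular tag, and the template switching sequence sits at the 3' end. This is a minimal sketch under stated assumptions: the fixed sequences, the tag length of 16 nt, the unique molecular tag length of 10 nt, and the helper names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical placeholder sequences (not taken from the application).
P1 = "GATTACAGATTACA"   # first amplification primer sequence (optional element)
C2 = "CTTCAGGTCA"       # second consensus sequence
TSO = "ACGTTGCAGGG"     # template switching sequence, ends in GGG


def random_seq(rng: random.Random, length: int) -> str:
    """Draw a random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_marker(tag2: str, umi: str, p1: str = P1, c2: str = C2, tso: str = TSO) -> str:
    """Concatenate 5'->3': P1, C2, second tag, unique molecular tag, TSO."""
    return p1 + c2 + tag2 + umi + tso


def make_bead(rng: random.Random, n_oligos: int, tag2_len: int = 16, umi_len: int = 10):
    """Return (second tag, oligos) for one bead.

    Every oligo on the bead shares the second tag; each oligo gets a
    distinct unique molecular tag.
    """
    tag2 = random_seq(rng, tag2_len)
    umis = set()
    while len(umis) < n_oligos:          # keep drawing until all UMIs are distinct
        umis.add(random_seq(rng, umi_len))
    return tag2, [build_marker(tag2, umi) for umi in sorted(umis)]
```

Calling `make_bead(random.Random(0), 50)` yields 50 distinct oligonucleotides that differ only in the unique molecular tag segment.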
  • the kit further comprises mineral oil, buffer, dNTPs, one or more nucleic acid polymerases (e.g., DNA polymerases, such as DNA polymerases with strand displacement activity and/or high-fidelity DNA polymerases), reagents for recovering or purifying nucleic acids (e.g., magnetic beads), primers for amplifying nucleic acids (e.g., the first primer, second primer, third primer, and fourth primer as defined above, or any combination thereof), or any combination thereof.
  • the kit further comprises reagents for sequencing, eg, reagents for next-generation sequencing.
  • the kits described herein can be used to implement the methods provided herein (e.g., the methods described above of treating cells or nuclei to generate populations of nucleic acid fragments; methods of generating labeled nucleic acid molecules; methods of constructing nucleic acid molecule libraries; and/or methods of transcriptome sequencing of cells or nuclei).
  • the application also provides use of the methods described herein (e.g., methods of treating cells or nuclei to generate populations of nucleic acid fragments; methods of generating labeled nucleic acid molecules; and/or methods of constructing nucleic acid molecule libraries) or of the kit for constructing a nucleic acid molecule library or for transcriptome sequencing.
  • the library of nucleic acid molecules is used for transcriptome sequencing (eg, single cell transcriptome sequencing).
  • the method or kit is used to perform single-cell transcriptome sequencing. In certain preferred embodiments, the method or kit is used to analyze the level of gene expression in cells or nuclei (e.g., immune cells or nuclei thereof), the location of gene transcription initiation, and/or the 5'-end sequences of RNA molecules (e.g., mRNA, long non-coding RNA, eRNA).
  • the method or kit is used to construct a transcriptome library of cells or nuclei (e.g., immune cells or nuclei thereof) or to perform transcriptome sequencing of cells or nuclei (e.g., immune cells or nuclei thereof).
  • the immune cells are selected from B cells and T cells.
  • the present application provides a new method for labeling nucleic acid molecules (eg, RNA molecules, such as mRNA molecules, long non-coding RNAs, eRNAs), wherein the generated labeled nucleic acid molecules can be conveniently used to construct nucleic acid molecule libraries (especially transcriptome sequencing libraries), and can be conveniently used for high-throughput sequencing (especially, high-throughput single-cell transcriptome sequencing).
  • the method of the present application has one or more beneficial technical effects selected from the following:
  • the nucleic acid molecule library constructed by the method of the present application contains the 5'-end sequence information of RNA molecules (e.g., mRNA molecules, long non-coding RNAs, eRNAs); accordingly, the high-throughput sequencing data obtained from the library can be used to analyze not only the abundance of RNA molecules in the transcriptome but also the 5'-end sequences of those RNA molecules. Therefore, the method of the present application can be conveniently used to analyze the sequences of TCR and BCR, and is compatible with the V(D)J sequencing method.
  • the method for labeling nucleic acid molecules (e.g., RNA molecules, such as mRNA molecules, long non-coding RNAs, and eRNAs) of the present application is compatible with current mainstream transcriptome library construction technologies and platforms (including microfluidic droplet-based high-throughput single-cell transcriptome library technology, microplate-based high-throughput single-cell transcriptome library technology, etc.), and can therefore be easily commercialized.
  • the nucleic acid molecule library constructed by the method of the present application can significantly reduce the adverse effect of the "pseudo-single cell rate" on the sequencing process and sequencing data.
  • in current mainstream library construction and sequencing protocols for transcriptome sequencing (especially 5'-end library construction and sequencing protocols), the sequencing data generated by "pseudo-single cells" must be filtered out because they cannot accurately reflect the transcriptome information of a single cell. This wastes sequencing data and increases sequencing costs.
  • the nucleic acid molecule library constructed by the method of the present application has dual cell tags (a first tag and a second tag), which makes it possible to split the sequencing data generated by "pseudo-single cells" and thereby accurately trace and determine the cell of origin of the sequencing data.
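How dual cell tags allow "pseudo-single cell" data to be split can be illustrated with a small Python sketch. It assumes, purely for illustration, that each read has already been parsed into a `(first_tag, second_tag, payload)` tuple; the function names and the toy data are hypothetical and are not part of the application.

```python
from collections import defaultdict


def split_by_dual_tags(reads):
    """Group read payloads by (first tag, second tag), i.e. by inferred cell."""
    cells = defaultdict(list)
    for first_tag, second_tag, payload in reads:
        cells[(first_tag, second_tag)].append(payload)
    return dict(cells)


def droplets_with_multiple_first_tags(reads):
    """Return second tags (droplets) that carry more than one first tag.

    Such droplets held cells from at least two subsets; their reads can
    still be separated because the first tags differ.
    """
    tags_per_droplet = defaultdict(set)
    for first_tag, second_tag, _ in reads:
        tags_per_droplet[second_tag].add(first_tag)
    return {
        second_tag: sorted(first_tags)
        for second_tag, first_tags in tags_per_droplet.items()
        if len(first_tags) > 1
    }


# Toy data: droplet B1 contains cells from subsets T1 and T2.
TOY_READS = [
    ("T1", "B1", "geneA"),
    ("T2", "B1", "geneB"),
    ("T1", "B2", "geneA"),
    ("T1", "B1", "geneC"),
]
```

One limitation of this scheme is worth noting: two cells from the same subset that land in the same droplet share both tags and cannot be separated this way, which is why a larger number of first tags helps.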
  • the method of the present application can greatly reduce the empty-load rate of the micro-reaction systems during library construction. Constrained by the "pseudo-single-cell phenomenon", the empty-load rate of the micro-reaction systems required by current mainstream transcriptome library construction and sequencing protocols (especially 5'-end library construction and sequencing protocols) is very high (close to 99%). In other words, out of one hundred micro-reaction systems, only about one is loaded with cells. This wastes reaction systems and reagents and makes library construction very expensive.
  • the method of the present application largely eliminates the negative impact of the "pseudo-single-cell phenomenon", which allows higher cell throughput to be used during library construction (e.g., cell throughput can be increased by at least 5-fold, at least 10-fold, or even 100-fold), greatly reducing both the empty-load rate of the micro-reaction systems and the cost of library construction.
  • cell throughput for a single library can be increased to 100,000-1,000,000 cells.
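The trade-off between empty-load rate and pseudo-single cells can be made concrete with a rough calculation. The sketch below assumes that cells are distributed over micro-reaction systems according to a Poisson distribution with mean `lam` cells per system, and that cells are spread evenly over the first tags; both are modeling assumptions made here for illustration and are not stated in the application. All function names are hypothetical.

```python
import math


def empty_rate(lam: float) -> float:
    """Fraction of micro-reaction systems that receive no cell (Poisson model)."""
    return math.exp(-lam)


def multiplet_rate(lam: float) -> float:
    """Among systems with at least one cell, the fraction holding two or more."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    p0 = math.exp(-lam)          # probability of zero cells
    p1 = lam * p0                # probability of exactly one cell
    return (1.0 - p0 - p1) / (1.0 - p0)


def same_first_tag_probability(n_first_tags: int) -> float:
    """Probability that two co-encapsulated cells carry the same first tag,
    assuming cells are spread evenly over the first tags."""
    if n_first_tags < 1:
        raise ValueError("need at least one first tag")
    return 1.0 / n_first_tags
```

Under this model, a mean loading of about 0.01 cells per system corresponds to an empty-load rate close to 99%, consistent with the figure quoted above. Raising the loading increases the share of systems with several cells, but with many first tags only a small fraction of those co-encapsulated cells share a tag and remain unresolved.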
  • the nucleic acid molecule library constructed by the method of the present application in a single reaction can be derived from different samples (for example, cells from 2, 94, or 384 people).
  • the number of samples in a single reaction depends on the number of first tag species, is easily scalable, and has no theoretical upper limit.
  • the method of the present application greatly increases the sample throughput of a single reaction at a lower cost, and is suitable for application scenarios such as single-cell transcriptome sequencing of large-scale population samples, immune profiling analysis of large-scale population samples, multi-condition drug screening based on cell lines and organoids, and massively parallel single-cell interpretation of gene editing results.
  • RNA molecules eg, mRNA molecules, long non-coding RNAs, eRNAs
  • the application is more flexible and extensive.
  • Figure 1 shows an exemplary scheme for constructing a library for single-cell transcriptome sequencing using the methods of the present invention, and an exemplary structure of nucleic acid molecules in the library for sequencing.
  • the exemplary protocol includes the following steps.
  • the permeabilized cells or nuclei are divided into one or more subsets (e.g., at least 1, at least 2, at least 3, at least 4, at least 5, at least 8, at least 10, at least 20, at least 50, at least 100, at least 200, or more subsets); and a reverse transcriptase (e.g., a reverse transcriptase with terminal transferase activity) and a reverse transcription primer (e.g., a reverse transcription primer carrying a poly(T) sequence or a random oligonucleotide sequence at its 3' end) are used to reverse transcribe the RNA molecules (e.g., mRNA molecules, long non-coding RNAs, eRNAs) in the nuclei or permeabilized cells to generate cDNA and to add an overhang (e.g., an overhang containing 3 cytosine nucleotides) to the 3' end of the cDNA.
  • RNA within the cell or nucleus forms a hybrid double-stranded nucleic acid with the resulting cDNA.
  • the hybrid double-stranded nucleic acid comprising RNA (e.g., mRNA, long non-coding RNA, eRNA) and cDNA is treated with a transposase complex (e.g., a Tn5 transposase complex) capable of cleaving or breaking hybrid double-stranded nucleic acids.
  • the nucleic acid undergoes transposition, causing random breaks in the hybrid double-stranded nucleic acid.
  • the transposase complex used contains a transposase (e.g., a Tn5 transposase) and a transposition sequence that the transposase can recognize and bind to (e.g., a transposition sequence containing a Tn5 transposase recognition sequence);
  • the transposition sequence comprises a transferred strand and a non-transferred strand;
  • the transferred strand comprises a transposase recognition sequence (e.g., a Tn5 transposase recognition sequence; Tn5-S), a first tag sequence (Tag1) and a first consensus sequence (C1);
  • the first tag sequence is located upstream (e.g., at the 5' end) of the transposase recognition sequence;
  • the first consensus sequence is located upstream (e.g., at the 5' end) of the first tag sequence; and
  • the non-transferred strand comprises a sequence complementary to the transposase recognition sequence in the transferred strand.
  • the transposase complex randomly breaks the hybrid double-stranded nucleic acid into nucleic acid fragments, and the transferred strand carrying the first tag sequence and the first consensus sequence is ligated to the 5' end of the broken cDNA strand.
  • for the different subsets, the transposase complexes used have first tag sequences that differ from one another; thus, the nucleic acid fragments generated from the cells or nuclei of the respective subsets contain first tag sequences that differ from one another.
  • the transposase complexes used for each subset may have the same transposase, the same transposase recognition sequence, the same first consensus sequence, and the same non-transferred strand.
  • nucleic acid fragments produced from cells or nuclei of each subset have the same first consensus sequence and transposase recognition sequence; and nucleic acid fragments produced from cells or nuclei of the same subset have the same first tag sequence and nucleic acid fragments produced from different subsets of cells or nuclei have first tag sequences that are different from each other.
  • nuclei or cells from multiple subsets are combined and contacted with multiple beads coupled to oligonucleotide molecules to generate labeled nucleic acid molecules.
  • nuclei or cells are contacted with beads coupled to oligonucleotide molecules (e.g., beads for transcriptome library construction provided by 10X Genomics), wherein the oligonucleotide molecule contains a marker sequence.
  • Exemplary marker sequences can contain a first amplification primer sequence (P1), a second consensus sequence (C2), a second tag sequence (Tag2), a unique molecular tag sequence (UMI), a template switching sequence (TSO), or any combination thereof.
  • the marker sequence may contain a first amplification primer sequence, a second consensus sequence, a second tag sequence, a unique molecular tag sequence, and a template switching sequence.
  • the template switching sequence can generally be located 3' to the marker sequence.
  • the first amplification primer sequence and/or the second consensus sequence can generally be located at the 5' end of the marker sequence.
  • Various means can be used to prepare water-in-oil droplets containing nuclei or cells together with beads coupled to oligonucleotide molecules.
  • the preparation of water-in-oil droplets can be performed using a 10X GENOMICS Chromium platform or controller.
  • the template switching sequence may comprise a sequence complementary to the 3' end overhang of the cDNA strand.
  • the template switching sequence may contain GGG at its 3' end.
  • the nucleotides of the template switching sequence can also be modified (eg, using locked nucleic acids) to enhance complementary pairing between the template switching sequence and the 3' end overhang of the cDNA strand.
  • the captured nucleic acid fragment can be extended with the oligonucleotide molecule as a template, so that the complementary sequence of the marker sequence is added to the 3' end of the cDNA strand, thereby generating a labeled nucleic acid molecule that carries the first tag sequence and the first consensus sequence at its 5' end and the complement of the marker sequence at its 3' end.
  • suitable nucleic acid polymerases (e.g., DNA polymerase or reverse transcriptase) can be used for this extension.
  • the same reverse transcriptase enzyme as previously described for the reverse transcription step can be used to extend the captured nucleic acid fragments.
  • typically, only nucleic acid fragments that contain the 3' end of the cDNA strand can be captured by the oligonucleotide molecule, via the overhang at the 3' end of the cDNA strand; the resulting labeled nucleic acid molecule will therefore typically contain the sequence of the 3' end of the cDNA strand (which corresponds to the sequence of the 5' end of the RNA (e.g., mRNA, long non-coding RNA, eRNA)).
  • sequence information on the 5' end of RNA e.g., mRNA
  • each bead is individually coupled to a plurality of oligonucleotide molecules; the oligonucleotide molecules on the same bead can have unique molecular tag sequences that differ from one another and can have the same second tag sequence; and oligonucleotide molecules on different beads can have second tag sequences that differ from one another.
  • tagged nucleic acid molecules generated from nucleic acid fragments in the same droplet can carry the same complement of the second tag sequence, as well as the complement of unique molecular tag sequences that are different from each other (for labeling different nucleic acid fragments within the same droplet); labeled nucleic acid molecules generated from nucleic acid fragments in different droplets may carry complementary sequences of second tag sequences that are different from each other.
  • each oligonucleotide molecule on each bead may have the same second consensus sequence and/or the same first amplification primer sequence.
  • the labeled nucleic acid molecules generated by the nucleic acid fragments in each droplet may carry the same complementary sequence of the second consensus sequence and/or the complementary sequence of the same first amplification primer sequence.
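The role of the unique molecular tag can be shown with a short counting sketch: reads that share the same cell tags, the same gene, and the same unique molecular tag are treated as copies of one original molecule. The sketch assumes, for illustration only, that reads have already been parsed and assigned to genes as `(first_tag, second_tag, umi, gene)` tuples; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct unique molecular tags per (cell, gene).

    A cell is identified by its (first tag, second tag) pair; duplicate
    reads with the same unique molecular tag are counted once.
    """
    umis = defaultdict(set)
    for first_tag, second_tag, umi, gene in reads:
        umis[((first_tag, second_tag), gene)].add(umi)
    return {key: len(tags) for key, tags in umis.items()}


# Toy data: the first two reads are amplification duplicates of one molecule.
TOY_READS = [
    ("T1", "B1", "U1", "geneA"),
    ("T1", "B1", "U1", "geneA"),
    ("T1", "B1", "U2", "geneA"),
    ("T2", "B1", "U1", "geneA"),
]
```

In the toy data, cell `("T1", "B1")` ends up with two counted molecules for `geneA`, while cell `("T2", "B1")`, which shares the droplet but not the first tag, has one.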
  • the first amplification primer sequence can comprise a library adapter sequence (e.g., a P5 adapter sequence).
  • library adapters can be added to the ends of the labeled nucleic acid molecules to facilitate subsequent sequencing. Subsequently, the resulting plurality of labeled nucleic acid molecules can be recovered and combined.
  • the labeled nucleic acid molecules can be enriched as desired.
  • nucleic acid amplification reactions can be performed on labeled nucleic acid molecules to generate enriched products.
  • the nucleic acid amplification reaction can be performed using at least a first primer.
  • the first primer can be designed to hybridize or anneal to the complement of the first amplification primer sequence and/or the complement of the second consensus sequence.
  • Exemplary first primers contain: the first amplification primer sequence or a portion thereof, or the second consensus sequence or a portion thereof, or a combination of the two.
  • various nucleic acid polymerases can be used to perform the nucleic acid amplification reaction for enriching labeled nucleic acid molecules, so long as they are capable of extending the first primer using the labeled nucleic acid molecules as templates.
  • the nucleic acid amplification reaction can be performed using a nucleic acid polymerase having strand displacement activity (eg, a DNA polymerase having strand displacement activity).
  • the nucleic acid amplification reaction can be performed using a high-fidelity nucleic acid polymerase (eg, a high-fidelity DNA polymerase). Subsequently, the resulting enriched product can be recovered and purified. It is easy to understand that the enrichment step is not necessary and can be carried out according to the actual situation.
  • nucleic acid amplification reactions can be performed on the recovered labeled nucleic acid molecules or the recovered enrichment products to generate amplification products for sequencing.
  • the nucleic acid amplification reaction may be performed using at least a third primer and a fourth primer.
  • the third primer can be designed to hybridize or anneal to the complement of the first amplification primer sequence and/or the complement of the second consensus sequence, and optionally contains a third tag sequence (Tag3).
  • the fourth primer can be designed to hybridize or anneal to the complement of the first consensus sequence, and optionally contains a second amplification primer sequence (P2) and/or a fourth tag sequence (Tag4).
  • the third and fourth tags may not be used.
  • the third tag can be introduced in the third primer without introducing the fourth tag in the fourth primer.
  • the fourth tag can be introduced in the fourth primer without introducing the third tag in the third primer.
  • third and fourth tags can be introduced in the third and fourth primers, respectively.
  • the third and/or fourth tags can be used, for example, to distinguish labeled nucleic acid molecules from different libraries.
  • the third primer may contain: (1) the first amplification primer sequence or a partial sequence thereof; or (2) the first amplification primer sequence or a partial sequence thereof, and the second consensus sequence or a partial sequence thereof; or (3) the first amplification primer sequence or a partial sequence thereof, a third tag sequence, and the second consensus sequence or a partial sequence thereof.
  • the fourth primer may contain: (1) a second amplification primer sequence, and the first consensus sequence or a partial sequence thereof; or (2) a second amplification primer sequence, a fourth tag sequence, and the first consensus sequence or a partial sequence thereof.
  • various nucleic acid polymerases can be used to perform the nucleic acid amplification reaction that produces amplification products for sequencing, so long as they are capable of extending the third and fourth primers using the labeled nucleic acid molecules or the enrichment products as templates.
  • the nucleic acid amplification reaction can be performed using a nucleic acid polymerase having strand displacement activity (eg, a DNA polymerase having strand displacement activity).
  • the nucleic acid amplification reaction can be performed using a high-fidelity nucleic acid polymerase (eg, a high-fidelity DNA polymerase).
  • the nucleic acid amplification reaction used to enrich the labeled nucleic acid molecules and the nucleic acid amplification reaction used to generate the nucleic acid molecules to be sequenced may use the same or different nucleic acid polymerases (eg, DNA polymerases).
  • the second amplification primer sequence may comprise a library adapter sequence (e.g., a P7 adapter sequence).
  • the resulting amplification products can contain library adapter sequences (eg, P5 adapter sequences and P7 adapter sequences) at both ends, respectively, and can be used for subsequent sequencing (eg, performing next-generation sequencing).
  • Figure 1 also shows an exemplary structure of a nucleic acid strand of a nucleic acid molecule (amplification product) to be sequenced in the library constructed by the above-described exemplary embodiment, which comprises: a second amplification primer sequence (e.g., a P7 adapter sequence), the fourth tag sequence, the first consensus sequence, the first tag sequence, the transposase recognition sequence, the sequence of the cDNA fragment, the complementary sequence of the template switching sequence, the complementary sequence of the unique molecular tag sequence, the complementary sequence of the second tag sequence, the complementary sequence of the second consensus sequence, and the complementary sequence of the first amplification primer sequence (e.g., a P5 adapter sequence); wherein the cDNA fragment comprises a sequence complementary to the 5'-end sequence of the RNA (e.g., mRNA).
  • FIG 2 shows the results of gel electrophoresis of the products of mouse genomic DNA transposition with the Tn5 transposase complex.
  • the experimental results show that the Tn5 transposase complex can break mouse genomic DNA (full-length 23kb) into 300-600bp bands.
  • Figure 3 shows how the number of cells captured (recovered) and the pseudo-single-cell rate vary with the number of cells loaded when the 10X Genomics Chromium platform and kit are used to construct a library for 5' transcriptome sequencing. The results show a linear relationship between the pseudo-single-cell rate and the number of cells loaded: the more cells loaded, the higher the pseudo-single-cell rate.
  • Figure 4 shows the number and ratio of droplets containing different numbers of first labels (1-9 labels) in water-in-oil droplets prepared using permeabilized cells or nuclei in Example 4. The results showed that the ratio of droplets containing two or more first tags was 34.63% (nucleus samples) or 42.15% (cellular samples).
  • Figure 5 shows the single-cell transcriptome analysis results obtained in Example 4 by using nuclei from HEK293T cells, HeLa cells and K562 cells for 5'-end transcriptome library construction and sequencing; Figure 5A shows the expression of the marker genes of each cell line; Figure 5B shows the visualization results of the clustering of each cell line.
  • Figure 6 shows the single-cell transcriptome analysis results obtained in Example 4 by using permeabilized HEK293T cells, HeLa cells and K562 cells for 5'-end transcriptome library construction and sequencing; Figure 6A shows the expression of the marker genes of each cell line; Figure 6B shows the visualization results of the clustering of each cell line.
  • Figure 7 shows the single-cell transcriptome analysis results obtained in Example 5 by using permeabilized cell samples and nucleus samples of HeLa cells with 3 different reverse transcription primers for 5'-end transcriptome library construction and sequencing; the abscissa is the sequencing depth and the ordinate is the number of detected genes.
  • Figure 8 shows the results of single-cell transcriptome data analysis in Example 6 using T cells enriched from the peripheral blood of 14 people.
  • Figure 8A shows the visualization results of various cell populations;
  • Figure 8B shows the number of cells in each cell population.
  • Figure 9 shows the results of data analysis of the TCR VDJ region in Example 6 using T cells enriched from the peripheral blood of 14 people.
  • Fig. 9A shows the visualization results of the T cell distribution corresponding to the different TCR clonotypes detected;
  • Fig. 9B shows the number of cells in each population for the detected TCR clones.
  • Figures 10-11 show the basic information of the TCR clones in Example 6 using T cells enriched from the peripheral blood of 14 people.
  • Figure 10 shows the distribution of the main TCR clonotypes detected in the 14 samples, indicating that TCR clonotypes are diverse across individuals;
  • Figure 11A and Figure 11B show the distribution of the detected TRB genes (Figure 11A) and TRA genes (Figure 11B) of the 14 people.
  • Example 1 Preparation of single-end specific oligonucleotide-tagged Tn5 transposase complexes
  • 5Phos means phosphorylation at the 5' end
  • 3ddC means cytosine dideoxyribonucleotide at the 3' end.
  • Transposon linker 1 SEQ ID NO: 1
  • tagged transposon linker 2 SEQ ID NO: 2
  • The adapters were dissolved to 100uM in the Annealing buffer from the Tagment Enzyme kit, and transposon adapter 1 was then mixed 1:1 by volume with each of the 96 tagged transposon adapter 2 oligonucleotides (the tag sequences are shown in SEQ ID NOs: 3-98).
  • The mixing steps are as follows: take 10ul each of adapter 1 and adapter 2, mix them thoroughly in a 96-well PCR plate, cover the plate with a sealing silicone mat, and centrifuge briefly in a microplate mini centrifuge so that the solution collects at the bottom of the wells.
  • Place the transposon adapter 1 and tagged transposon adapter 2 mixed in step (1) above in a PCR machine and run the following annealing program: (hot cover 105°C) 75°C for 15 minutes, 60°C for 10 minutes, 50°C for 10 minutes, 40°C for 10 minutes, and 25°C for 30 minutes.
  • The annealed adapter mixture is the transposon, which is stored at -20°C until use.
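The annealing program above is a simple step-down temperature schedule. As a minimal sketch, it can be written down as data and sanity-checked; the names `ANNEALING_STEPS`, `total_hold_minutes` and `is_stepdown` are illustrative and not part of the source, and ramp times between steps are ignored.

```python
# Annealing program from the text, as (temperature in °C, hold time in minutes).
ANNEALING_STEPS = [(75, 15), (60, 10), (50, 10), (40, 10), (25, 30)]
HEATED_LID_C = 105  # hot cover temperature stated in the text


def total_hold_minutes(steps):
    """Sum the hold times, ignoring ramping between steps."""
    return sum(minutes for _, minutes in steps)


def is_stepdown(steps):
    """Return True when every step is cooler than the one before it."""
    temps = [temp for temp, _ in steps]
    return all(a > b for a, b in zip(temps, temps[1:]))


print(total_hold_minutes(ANNEALING_STEPS))  # 75
print(is_stepdown(ANNEALING_STEPS))         # True
```

Under these assumptions the program holds for 75 minutes in total, cooling monotonically from 75°C to 25°C.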
  • The assembled TN5 transposase complex must be tested for fragmentation efficiency before it is used in subsequent experiments.
  • Intact mouse genomic DNA is used as the test substrate.
  • The reaction product of step (2) above is analyzed by agarose gel electrophoresis to verify the fragmentation efficiency of the TN5 transposase complex.
  • the experimental results are shown in Figure 2.
  • the experimental results show that the Tn5 transposase complex can break mouse genomic DNA (full-length 23kb) into 300-600bp bands.
  • fresh tissues, fresh cell lines, fresh blood samples, primary cells, cryopreserved cells, and liquid nitrogen quick-frozen tissues can be used for cell nucleus extraction.
  • HEK293T cell line purchased from the Cell Bank of the Chinese Academy of Sciences, catalog number GNHu17
  • Hela cell line purchased from the Cell Bank of the Chinese Academy of Sciences, catalog number TCHu187
  • K562 cell line purchased from the Cell Bank of the Chinese Academy of Sciences, catalog number SCSP-5054
  • step (2) Resuspend the precipitate obtained in step (1) in 50 ul of cell nucleus dilution solution, examine a small amount under a microscope, and filter through a 40um cell strainer if cell clumps are present.
  • The prepared cell nuclei were placed on ice until use; after thorough mixing, 1 ul was taken and counted with an automatic fluorescent cell counter. The nucleus extraction process loses around 40-50% of the cells.
  • Reagents and instruments (name; brand; catalog number):
  • Methanol; Fisher Chemical; M/4000/17
  • 1x PBS; Gibco; 14190-094
  • BSA (bovine serum albumin); Sigma; A8806-5
  • SUPERase-In RNase Inhibitor; Thermo Fisher Scientific; AM2696
  • Nuclease-free water; Invitrogen; AM9932
  • Flowmi Cell Strainer, 40 μm; Bel-Art; H13680-0040
  • 1.5ml DNA low adsorption tube; Eppendorf; 30108051
  • Centrifuge; Thermo Fisher Scientific; Micro21R
  • Automatic Fluorescence Cell Counter; LUNA; LUNA FL
  • fresh tissues, fresh cell lines, fresh blood samples, primary cells, and frozen cell samples can be used to build a complete single cell bank.
  • the cells need to be permeabilized.
  • cell permeabilization is carried out for HEK293T cell line, Hela cell line and K562 cell line, and the permeabilization experimental steps are as follows:
  • After counting the single cell nuclei obtained in Example 2 and the permeabilized cell samples obtained in Example 3, take 200,000 of each into a 200ul PCR tube and add 3ul of 25uM reverse transcription primer (SEQ ID NO: 100); the total reaction volume is 10ul, made up with nuclease-free water.
  • The reverse transcription reaction system of this experiment can usually be used for 50,000-500,000 cells/nuclei; 200,000 cells/nuclei are used here as an example. It is readily understood that if fewer or more cells/nuclei are to be used, the volume of the reverse transcription reaction system can be reduced or increased as required.
  • the cDNA product obtained in the above step 1 was subjected to a transposition reaction with the TN5 transposase complex labeled with the single-end specific oligonucleotide prepared in Example 1, and the first tag sequence was loaded.
  • the experimental steps are as follows:
  • the reaction solution in each well is as follows: 4ul 5x Reaction Buffer (purchased from Vazyme, product number: S601-01), 2000-10000 cell nuclei/reaction, add nuclease-free water to 18.2ul.
  • the reaction solution in each well is as follows: 4ul 5x Reaction Buffer (purchased from Vazyme, product number: S601-01), 2000-10000 cells/reaction, 0.2ul 1% Digitonin, add nuclease-free water to 18.2ul .
  • reaction stop solution 100mM Tris-HCl pH8.0, 200mM EDTA
  • The 10X Genomics Chromium platform and 10X Single Cell 5' Gel Beads are used as an example to prepare water-in-oil microdroplets, and each microdroplet is given a unique tag (the second tag sequence). The water-in-oil droplet preparation and the cell-barcoded beads can be replaced by those of other platforms.
  • the experimental steps are as follows:
  • Prepare the PCR amplification reaction in a 200ul PCR tube: 50ul NEBNext High-Fidelity 2x PCR Master Mix, 0.5ul 100mM S-P5 primer (SEQ ID NO: 102), 35ul of the product purified in step 4 above, and 11.5ul nuclease-free water. After mixing, place the tube in a PCR machine.
  • the reaction conditions are: (hot cover 105°C) 72°C for 3 minutes, 98°C for 45 seconds, 13 cycles (depending on the number of cells loaded) [98°C for 20 seconds, 67°C for 30 seconds, 72°C for 1 minute], 72°C for 1 minute, and temporary storage at 4°C.
  • The reaction system is as follows: 50ul KAPA HiFi HotStart 2X ReadyMix, 1ul 100mM S-bio-P5 primer (SEQ ID NO: 103), 4ul 25mM S-P7 primer (SEQ ID NO: 104; take four 25mM S-P7 primers, whose tag sequences are shown in SEQ ID NOs: 105-108 respectively, 1ul of each), 20ul of the product of step 6 above, and 25ul nuclease-free water. After mixing, place the tube in the PCR machine.
  • The reaction conditions are: (hot cover 105°C) 98°C for 45 seconds, 8 cycles (the number of cycles can be adjusted according to the product concentration in step 6 above) of [98°C for 20 seconds, 54°C for 30 seconds, 72°C for 20 seconds], 72°C for 1 minute, and temporary storage at 4°C.
  • The reaction system is as follows: 50ul KAPA HiFi HotStart 2X ReadyMix, 1ul 100mM S-P5 primer (SEQ ID NO: 102), 4ul 25mM S-P7 primer (take four 25mM S-P7 primers, 1ul of each), 20ul of the product of step 6 above, and 25ul nuclease-free water.
  • reaction conditions (hot cover 105°C) 98°C for 45 seconds, 8 cycles (the number of cycles can be adjusted according to the product concentration in the above step 7) [98°C for 20 seconds, 54°C for 30 seconds, 72 °C for 20 seconds], 72 °C for 1 minute, and temporarily stored at 4 °C.
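The reaction systems above are specified as component volumes topped up with nuclease-free water. A small sketch of that bookkeeping follows; the helper names are hypothetical, and the 100ul target is inferred from the listed volumes (50 + 1 + 4 + 20 + 25) rather than stated in the text.

```python
def total_volume_ul(mix):
    """Total volume of a reaction mix given as {component: volume in ul}."""
    return sum(mix.values())


def water_to_top_up(components, target_ul):
    """Volume of nuclease-free water needed to bring the mix to target_ul."""
    water = target_ul - sum(components.values())
    if water < 0:
        raise ValueError("components already exceed the target volume")
    return water


# Library amplification mix as listed in the text (volumes in ul), without water.
library_pcr = {
    "KAPA HiFi HotStart 2X ReadyMix": 50,
    "S-P5 primer": 1,
    "S-P7 primers (4 x 1 ul)": 4,
    "product of step 6": 20,
}

print(water_to_top_up(library_pcr, 100))  # 25
```

The computed 25ul of water matches the volume given in the text, which is a quick way to catch transcription errors in a reaction table.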
  • The product of the previous step was purified and size-selected with 0.55X and 0.2X SPRIselect magnetic beads, yielding a sequencing library with a fragment size of about 300-600 bp.
  • The constructed library was sequenced on a NovaSeq 6000 (Illumina, San Diego, CA) with 150 bp paired-end reads, at 50,000 reads per cell.
  • When the 10X Genomics Chromium platform and kit are used for 5'-end transcriptome library construction, the number of cells actually captured is usually around 57% of the number of cells used to prepare the water-in-oil droplets (i.e., a capture rate of around 57%). Moreover, the pseudo-single-cell rate is a linear function of the number of cells loaded: the more cells are loaded, the higher the pseudo-single-cell rate (see Figure 3).
  • The analysis also showed that, for the experiments using nuclei, 65.37% of droplets contained only 1 first label, 25.93% contained 2 first labels, 6.95% contained 3 first labels, and 1.75% contained more than 3 first labels (see Figure 4).
  • In other words, the proportion of droplets containing two or more first labels was 34.63%.
  • The number of first labels in a single droplet essentially reflects the number of nuclei in that droplet (the case where a single droplet contains two or more nuclei carrying the same first label is not considered).
  • That is, the proportion of droplets containing two or more nuclei was 34.63%. Similar results were obtained for the experiments using permeabilized cells: 57.85% of droplets contained only 1 first label, 28.71% contained 2 first labels, 9.82% contained 3 first labels, and 3.62% contained more than 3 first labels (see Figure 4). In other words, for the experiments using permeabilized cells, the proportion of droplets containing two or more first labels was 42.15%.
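The percentages above can be cross-checked, and the case set aside in the text (two cells in one droplet sharing the same first label) can be estimated. The sketch below assumes each cell receives one of `n_tags` first labels uniformly at random; the value 96 is taken from the number of tagged adapters prepared in Example 1 and is only an assumption about how many were used here.

```python
from math import prod

# Share of droplets by number of distinct first labels, as reported (percent).
NUCLEI = {"1": 65.37, "2": 25.93, "3": 6.95, ">3": 1.75}
CELLS = {"1": 57.85, "2": 28.71, "3": 9.82, ">3": 3.62}


def multi_label_share(dist):
    """Percent of droplets carrying two or more first labels."""
    return round(sum(v for k, v in dist.items() if k != "1"), 2)


def p_all_labels_distinct(k, n_tags):
    """Probability that k cells in one droplet all carry different first
    labels, assuming uniform random assignment among n_tags labels."""
    return prod(1 - i / n_tags for i in range(k))


print(multi_label_share(NUCLEI))                # 34.63
print(multi_label_share(CELLS))                 # 42.15
print(round(p_all_labels_distinct(2, 96), 4))   # 0.9896
```

Under the uniform assumption, two cells sharing a droplet would carry the same first label only about 1% of the time with 96 labels, which is why the label count is a reasonable proxy for the cell count.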
  • Three human cell lines (HEK293T cells, Hela cells and K562 cells) were mixed and sequenced.
  • The sequencing results showed that each cell line had unique highly expressed genes, and that the sequencing data obtained from permeabilized cell samples (Fig. 5A) and nuclear samples (Fig. 6A) were highly consistent.
  • A dimensionality-reduction visualization analysis was performed on the expression matrix of the sequencing data.
  • The results showed that the sequencing data obtained from both the permeabilized cell samples (Fig. 5B) and the nuclear samples (Fig. 6B) distinguished the three cell lines well (i.e., the three cell lines were clearly separated into three independent groups).
  • Example 5 Effects of different reverse transcription primers on the quality of single-cell transcriptome data of nuclei and permeabilized cells
  • The basic steps of this example follow those of Example 2 (single cell nucleus preparation), Example 3 (single-cell suspension permeabilization) and Example 4 (single-cell transcriptome library preparation). The specific differences are described below:
  • a cell line was selected for testing.
  • Hela cell line purchased from the Cell Bank of the Chinese Academy of Sciences, catalog number TCHu187
  • the cells were permeabilized according to the method of Example 3.
  • permeabilized cell samples and cell nucleus samples were respectively placed in three 200ul PCR tubes at an amount of 50,000/tube, and 3ul of the reverse transcription primers (poly T primers, random primers, or mixed primers), a total of 6 experiments were performed.
  • the total reaction system of a single tube is 10ul, and the insufficient part is supplemented with nuclease-free water.
  • The TN5 transposase complexes labeled with single-end specific oligonucleotides prepared in Example 1 were used to perform a transposition reaction on the six reverse transcription products obtained in step 3 above and to load the first tag sequence; each reverse transcription product was transposition-labeled with TN5 transposase complexes carrying 8 different single-end specific oligonucleotides.
  • the samples were gently resuspended in 20ul of sample diluent (1x PBS, supplemented with 1% BSA, 1% SUPERase-In RNase Inhibitor) and counted.
  • 28,000 cells and 28,000 nuclei from the products obtained in step 4 were taken separately for water-in-oil microdroplet preparation and the template switching reaction. For the remaining experimental steps and conditions, refer to step 3 of Example 4.
  • the constructed library was sequenced with NovaSeq 6000 (Illumina, San Diego, CA), with a read length of 150bp paired-end sequencing, and a total of 125G of raw data was measured.
  • the data obtained in this example is shown in Fig. 7.
  • The data show that all 3 reverse transcription primers enable nucleic acid detection in permeabilized cells or nuclei. In addition, whether the sequencing depth is 100G or 125G, and with the same primers under the same conditions, more genes are detected in permeabilized cell samples than in nuclear samples; in permeabilized cell samples, more genes are detected with poly T primers than with random primers; and in nuclear samples, more genes are detected with a mixture of poly T primers and random primers than with either primer alone.
  • Example 6 This single-cell sequencing technology is compatible with the enrichment of immune cell VDJ sequences
  • The enriched VDJ region (containing the BCR region of B cells or the TCR region of T cells) can be obtained from the enriched cDNA products.
  • This example only takes human peripheral blood-derived T cells as an example, but this method is also applicable to the enrichment and sequencing of VDJ regions of immune cells such as T cells and B cells from other sources and species, as well as the capture of other target genes.
  • The enrichment method provided in this example is as follows: a primer specific to the target fragment is designed according to the characteristics of the target gene or fragment and is used together with the S-P5 primer to enrich the target fragment specifically. Obtaining target fragments linked to the second tag in this way is simple and easy to implement.
  • 3ml of human peripheral blood was diluted with an equal amount of PBS, followed by gradient centrifugation according to the instructions of Ficoll Paque PLUS to separate and extract PBMC.
  • CD3 antibody incubation: resuspend the isolated PBMC in 100ul washing solution (1X PBS with 2% BSA), add 5ul APC anti-human CD3 antibody, incubate on ice in the dark for 30 minutes, wash twice with 500ul washing solution (1X PBS with 2% BSA), and centrifuge at 350g to remove the supernatant.
  • T cell permeabilization See Example 3 for the permeabilization process.
  • T cells were enriched from the peripheral blood of 14 individuals according to the above method, and single-cell transcriptome libraries were prepared as described in Example 4.
  • Each healthy individual's sample was transposition-labeled with TN5 transposase complexes carrying 12 different single-end specific oligonucleotides, and each cancer patient's sample with TN5 transposase complexes carrying 6 different single-end specific oligonucleotides; 67,000 cells were collected after the single-end transposition reaction, and all cells were used for water-in-oil droplet preparation and the template switching reaction.
  • the first round of nested PCR prepare a PCR amplification reaction system in a 200ul PCR tube: 50ul KAPA HiFi HotStart 2X ReadyMix, 5ul 10mM T MIX 1 (Human TCR outer-1 primer, SEQ ID NO: 117 and Human TCR outer-2 primer, SEQ ID NO: 118 mixed in an equimolar ratio), 5ul of 10mM S-P5 primer (SEQ ID NO:102), 5ul of the cDNA enrichment product of the above step, and 35ul of nuclease-free water. After mixing, place it in the PCR machine.
  • the reaction conditions are: (hot cover 105°C) 98°C for 45 seconds, 12 cycles (the number of cycles is adjusted according to the concentration of the cDNA enriched product) [98°C for 20 seconds, 67°C for 30 seconds, 72 °C for 60 seconds], 72 °C for 1 minute, and temporary storage at 4 °C.
  • The product of the previous step was purified and size-selected with 0.5X and 0.3X SPRIselect magnetic beads to enrich fragments of about 600-1000bp, and the beads were eluted with 25ul EB.
  • the second round of nested PCR prepare a PCR amplification reaction system in a 200ul PCR tube: 50ul KAPA HiFi HotStart 2X ReadyMix, 5ul 10mM T MIX 2 (Human TCR inner-1 primer, SEQ ID NO: 119 and Human TCR inner-2 primer, SEQ ID NO:120 mixed in equimolar ratio), 5ul 10mM S-P5 primer (SEQ ID NO:102), 25ul first-round nested PCR product, 15ul nuclease-free water.
  • The reaction conditions are: (hot cover 105°C) 98°C for 45 seconds, 10 cycles (the number of cycles is adjusted according to the concentration of the first-round nested PCR product) of [98°C for 20 seconds, 67°C for 30 seconds, 72°C for 60 seconds], 72°C for 1 minute, and temporary storage at 4°C.
  • Libraries can be constructed from the amplified products by conventional transcriptome library construction methods.
  • Chromium Single Cell 5'Library Construction Kit is used to construct libraries.
  • Take 50ng of the amplified product, make up to 20ul with nuclease-free water, and add the fragmentation reaction solution (containing 5ul fragmentation buffer, 15ul Fragmentation Enzyme Blend, and 15ul nuclease-free water). Mix well on ice and place in a PCR instrument.
  • the reaction conditions are: (hot cover 65°C) 32°C for 2 minutes, 65°C for 30 minutes, and temporary storage at 4°C.
  • Add the end repair and adapter ligation reaction solution (20ul Ligation Buffer, 10ul DNA Ligase, 2.5ul Adaptor Mix, 17.5ul nuclease-free water).
  • the reaction conditions (Hot lid at 30°C) 20°C for 15 minutes, temporarily stored at 4°C.
  • the product of the previous step was purified using 0.8X SPRIselect magnetic beads and eluted with 30ul EB.
  • Add 70ul of reaction solution, composed as follows: 50ul KAPA HiFi HotStart 2X ReadyMix, 2ul SI-PCR Primer, 10ul individual Chromium i7 Sample Index, and 8ul nuclease-free water. After mixing, place the tube in the PCR machine.
  • The reaction conditions are: (hot cover 105°C) 98°C for 45 seconds, 9 cycles of [98°C for 20 seconds, 54°C for 30 seconds, 72°C for 20 seconds], 72°C for 1 minute, and temporary storage at 4°C.
  • the product of the previous step was purified with 0.8X SPRIselect magnetic beads.
  • the constructed library was sequenced with NovaSeq 6000 (Illumina, San Diego, CA), with a read length of 150 bp paired-end sequencing, with 12,500 reads per cell.
  • A total of 41,337 cells were detected in the transcriptome data obtained in this example, and 4,719 single cells with high TCR expression were enriched at the current sequencing depth.
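The counts reported for Example 6 can be turned into simple yield figures. This is only an arithmetic sketch: it assumes the 67,000 cells collected after the transposition reaction were all loaded, as the text states, and the variable names are illustrative.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


cells_loaded = 67_000     # collected after the single-end transposition reaction
cells_detected = 41_337   # cells found in the transcriptome data
cells_with_tcr = 4_719    # cells with high TCR expression at this depth

print(percent(cells_detected, cells_loaded))    # 61.7
print(percent(cells_with_tcr, cells_detected))  # 11.4
```

About 61.7% of the loaded cells were recovered, compared with the roughly 57% capture rate quoted for the standard workflow in Example 4, and about 11.4% of the detected cells had an enriched TCR at this sequencing depth.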
  • Analysis of single-cell transcriptomic data of enriched T cells from 14 human peripheral blood showed that 12 T cell types were detected, and 12 (100%) T cell types carrying TCR information were detected. See Figure 8 for visualization results and cell numbers.
  • the corresponding TCR clonotypes were detected in the VDJ library sequencing data.
  • the visualization results and corresponding cell numbers of different TCR clonotypes detected in the VDJ library sequencing data are shown in Figure 9.
  • The results in Figure 9 show that the proportion of TCR-bearing cells detected in each cell type is consistent with the proportion of cells of that type detected in the transcriptome data.
  • The distribution of the main TCR clonotypes detected in the 14 samples was further analyzed, showing that the TCR clonotypes of different individuals are diverse (see Figure 10); in addition, the distributions of the TRB genes and TRA genes detected in the 14 samples were analyzed separately, showing no bias in the distribution of the TRB and TRA genes of the TCRs detected in the different samples (see Figure 11A and Figure 11B).

Abstract

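Provided are a method for processing cells or cell nuclei to produce a population of nucleic acid fragments, and methods that use the population of nucleic acid fragments to generate labeled nucleic acid molecules, to construct a nucleic acid molecule library for transcriptome sequencing, or to perform high-throughput sequencing of single-cell transcriptomes; also provided are a nucleic acid molecule library constructed by the method and a kit for carrying out the method.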

Description

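Methods and kits for labeling nucleic acid molecules
Technical Field
The present application relates to the technical field of transcriptome sequencing, in particular high-throughput single-cell transcriptome sequencing. Specifically, the present application relates to a method for processing cells or cell nuclei to produce a population of nucleic acid fragments, and to methods that use the population of nucleic acid fragments to generate labeled nucleic acid molecules, to construct a nucleic acid molecule library for transcriptome sequencing, or to perform high-throughput sequencing of single-cell transcriptomes. In addition, the present application relates to a nucleic acid molecule library constructed by the method and to a kit for carrying out the method.
Background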
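The cell is the basic structural and functional unit of living organisms, and the study of its functions and heterogeneity has long been a major challenge in biology. The development of single-cell omics sequencing has advanced our understanding of cell diversity and heterogeneity and has revolutionized many fields of biological and biomedical research, including developmental biology, tumors and other diseases, assisted reproduction, immunology, neuroscience, and microbiology. Single-cell sequencing mainly includes single-cell genome sequencing, transcriptome sequencing, methylation sequencing, chromatin accessibility sequencing, and single-cell multi-omics sequencing that combines the above omics information. In essence, it analyzes the sequence, copy number, modification state, and interactions of DNA and RNA within a single cell to reveal omics changes in that cell's genome, transcriptome, methylation, chromatin accessibility, and so on.
Single-cell transcriptome sequencing is currently the most widely used and can capture the transcriptome of a single cell at a given moment. Briefly, the method comprises reverse transcribing the transcriptome of a single cell at a given moment to obtain cDNA, amplifying the cDNA, constructing a sequencing library, and sequencing it, thereby obtaining the transcript information of a specific cell. The advent of single-cell transcriptomics has refined the resolution of research from the multicellular tissue level to the single cell, so that the specific characteristics of an individual cell or a group of cells can be studied separately; it has played a key role in studying cell development, the tumor microenvironment, single-cell atlas construction, and other areas. Since the first single-cell transcriptome library construction technique was established in 2009, and as demand for single-cell omics sequencing has grown, a large number of such techniques have sprung up, progressing from early low-throughput methods that detected few genes to methods with improved data quality and higher throughput. Researchers are pursuing single-cell sequencing techniques with higher throughput and lower cost in order to characterize the cellular heterogeneity of different samples in finer detail, analyze single-cell gene regulatory networks, and depict the full cellular landscape of a sample.
At present, high-throughput single-cell transcriptome library construction techniques mainly include the following.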
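(1) High-throughput single-cell transcriptome library construction with cell barcoding in microfluidic droplets. This class of techniques is represented by the inDrop system (Cell, 2015, 161(5): p. 1187-1201) and the Drop-seq system (Cell, 2015, 161(5): p. 1202-1214), developed in 2015 by David Weitz of Harvard University and others, and by the Chromium platform of 10X Genomics. The basic principle of these techniques is to use a microfluidic system to generate nanoliter-scale water-in-oil reaction systems, each containing a single cell plus a single microbead. The surface of each microbead carries millions of barcoded poly-thymine (PolyT) primers. The barcodes comprise a cell barcode specific to each microbead and a unique molecular identifier (UMI) specific to each single molecule. In a single run, reverse transcription can be carried out on thousands of cells simultaneously, labeling single cells and single molecules, and the resulting cDNA is pooled for library construction and sequencing. The main drawbacks of this technique are low cell throughput, a high proportion of empty micro-reaction systems, low sample throughput, and high library construction cost.
(2) High-throughput single-cell transcriptome library construction with cell barcoding in microwell plates. This class of techniques is represented by the Seq-Well technique developed in 2017 by Alex K Shalek and others (Nature Methods, 2017. 14(7): p. 752-752) and the Microwell-seq platform established in 2018 by the team of Guoji Guo at Zhejiang University in China (Cell, 2018. 172(5): p. 1091-1107). The basic principle is to use a chip or microwell plate with hundreds of thousands of microwells as the carrier, combined with magnetic beads bearing poly-thymine (PolyT) and cell barcodes, to capture single cells: after a single-cell suspension is loaded onto the microwell plate, the cells settle into the microwells; free single cells are washed away, and magnetic beads bearing poly-thymine and cell barcodes are then added. Each well acts as an independent reaction chamber. After lysis buffer is added to lyse the cells, the RNA released by the single cells in different wells is adsorbed onto the bead primers and labeled with different cell barcodes. The RNA-bearing beads are then transferred to an EP tube for reverse transcription, labeling the RNA molecules of thousands of cells simultaneously, after which the cDNA is amplified and a library is constructed. The main drawbacks of this technique are low cell throughput, low sample throughput, cumbersome operation, and high library construction cost.
(3) High-throughput single-cell transcriptome library construction based on multiple rounds of split-pool combinatorial tagging. This class of techniques is represented by SPLiT-seq (split-pool ligation-based transcriptome sequencing), established in 2018 by the team of A. B. Rosenberg (Nature Methods, 2018. 15(12): p. 1126-1126). The technique does not rely on expensive microfluidic or microwell fabrication equipment; instead, different combinations of barcodes are added to cells through multiple rounds of ligation, allowing libraries to be built from tens of thousands of cells at once. The main drawbacks of this technique are modest cell throughput, cumbersome operation, and high library construction cost, and it has not yet been put into commercial use.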
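After years of development, high-throughput single-cell transcriptome sequencing has made remarkable progress.
For example, early single-cell transcriptome studies mainly used mRNA captured from intact living cells and placed strict demands on the time between sampling of fresh specimens and the start of library construction. This greatly limited the application of single-cell transcriptome sequencing. The advent of single-nucleus transcriptome sequencing removed this limitation. The technique can extract nuclei directly from frozen tissue and capture the RNA in single nuclei for sequencing, and the single-cell transcriptome data generated from nuclei are comparable to those from intact cells. Single-nucleus transcriptome sequencing thus lifts the sample restriction and makes single-cell transcriptome studies of frozen samples, especially frozen clinical samples, possible.
In addition, to increase sample throughput (that is, to construct transcriptome libraries from, and sequence, cells from multiple samples in a single experiment), sample multiplexing strategies based on additional tags have been developed. In such methods, a sample barcode is added to each sample in advance, and multiple samples are then pooled for library construction and sequencing. After sequencing, the single-cell transcriptome information of each sample can be separated from the sequencing data by means of the sample barcodes. Representative examples of high-throughput transcriptome library construction based on additional tags and microfluidic droplets include the Feature Barcoding technology developed by 10X Genomics on the basis of BioLegend's TotalSeq™ antibodies (Single Cell 3′ Feature Barcode Library Kit, #PN-1000079; Single Cell 5' Feature Barcode Kit, #PN-1000256), and the MULTI-seq technique reported in 2019 by the team of Zev J. Gartner (McGinnis, C.S., et al., Nature Methods, 2019. 16(7): p. 619-626). These techniques build on microfluidic platforms such as 10X Genomics Chromium and Fluidigm C1: before the cell-barcoded single-cell microdroplets are prepared, each cell sample is first given a round of sample-specific tagging, and the differently tagged samples are then pooled for microdroplet preparation. For example, in the Feature Barcode-based workflow, TotalSeq™ antibodies that specifically bind different proteins on the cell membrane surface are each conjugated to an oligonucleotide containing a specific tag and a sequence complementary to the barcode sequence of the 10X Genomics beads. Different cell samples can thus be labeled in advance with different TotalSeq™ antibodies. The labeled samples can be pooled and loaded for standard 10X Genomics transcriptome library construction, followed by library enrichment with the Feature Barcoding kit and sequencing. The specific tags introduced by the TotalSeq™ antibodies identify which sample each single-cell transcriptome came from. In the MULTI-seq-based workflow, liposomes that adsorb to the cell membrane by affinity and carry specific oligonucleotide sequences are used to label different samples; the labeled samples are then pooled, loaded, and subjected to transcriptome library construction and sequencing. With Feature Barcoding or MULTI-seq, high-throughput transcriptome library construction and sequencing can be performed on multiple cells derived from different samples.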
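Furthermore, for high-throughput single-cell transcriptome library construction with cell barcoding in microfluidic droplets or microwell plates, if a single droplet or microwell contains two or more cells, the sequencing result obtained from that droplet or microwell cannot accurately reflect the transcriptome of a single cell and cannot be used. Pseudo-single-cell events (that is, a single droplet or microwell containing two or more cells) should therefore be avoided as far as possible during library construction. Taking the standard protocol of the 10X Genomics Chromium platform as an example, to keep the pseudo-single-cell rate within an acceptable range (for example 5%), it is usually recommended to limit cell throughput to fewer than 10,000 cells per reaction. This means that, of the roughly one hundred thousand droplets containing reagents and beads that a single reaction generates, fewer than 10% are put to effective use. The vast majority of droplets are empty and contain no cell, which is a huge waste. To solve this problem, the team of Christoph Bock established the scifi-RNA-seq method (bioRxiv, 2019: p. 2019.12.17.879304.). Its library construction workflow comprises: dividing the cell sample into multiple portions and reverse transcribing each portion with a reverse transcription primer carrying a specific oligonucleotide tag, so that the nucleic acid molecules in each portion receive a first round of labeling; then pooling the portions and constructing the library with the 10X Genomics scATAC kit, so that the nucleic acid molecules in the library receive a second round of labeling (comprising the cell barcode and the unique molecular identifier). By combining the first and second rounds of labeling, the method can raise the cell throughput of the standard 10X Genomics Chromium protocol about 15-fold. However, the method can only be used for 3'-end library construction: it captures only the 3'-end information of transcriptome mRNA molecules and cannot obtain their 5'-end information.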
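The idea of combining a first-round tag with a second-round droplet barcode can be sketched as a grouping step: reads that share a droplet barcode but differ in first-round tag are assigned to different cells. The tuple layout, the barcode strings and the function name below are hypothetical simplifications for illustration, not a real file format or any published pipeline.

```python
from collections import defaultdict


def group_reads_by_cell(reads):
    """Group UMIs by the (droplet barcode, first-round tag) pair.

    `reads` is an iterable of (droplet_barcode, first_tag, umi) tuples.
    """
    cells = defaultdict(set)
    for droplet_barcode, first_tag, umi in reads:
        cells[(droplet_barcode, first_tag)].add(umi)
    return dict(cells)


reads = [
    ("DROP_A", "TAG_01", "umi1"),
    ("DROP_A", "TAG_01", "umi2"),
    ("DROP_A", "TAG_07", "umi3"),  # a second cell in the same droplet
    ("DROP_B", "TAG_01", "umi4"),
]

cells = group_reads_by_cell(reads)
print(len(cells))  # 3
```

With the droplet barcode alone, `DROP_A` would be treated as one (pseudo-single) cell; keying on the pair separates it into two, which is what allows droplets to be loaded with more than one cell.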
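According to the RNA information enriched in the library, high-throughput single-cell transcriptome end-sequencing library construction techniques can be divided into 5'-end and 3'-end library construction techniques for transcriptome sequencing. Both can be used for non-full-length sequencing of mRNA ends, but they are two different techniques: 3'-end library construction enriches and measures the 3'-end information of transcriptome mRNA molecules, whereas 5'-end library construction enriches and measures the 5'-end information and can provide transcription start site information; the two serve different goals and suit different scenarios. 10X Genomics currently offers separate kits for the two techniques: Chromium Next GEM Single Cell 3′ GEM, Library & Gel Bead Kit, #PN-1000075; and Chromium Single Cell 5' Library & Gel Bead Kit, #PN-1000006.
T lymphocytes (T cells) and B lymphocytes (B cells) are mainly responsible for the adaptive immune response, relying chiefly on the T cell receptor (TCR) and the B cell receptor (BCR) to recognize antigens. A common feature of these two classes of cell-surface molecules is their diversity, which allows them to recognize a wide variety of antigen molecules. The BCR heavy chain and the TCR β chain consist of four gene segments (V, D, J, and C), while the BCR light chain and the TCR α chain consist of three gene segments (V, J, and C). These gene segments undergo recombination and rearrangement and combine in different forms, which ensures receptor diversity. VDJ sequencing makes it possible to explore immune mechanisms and to mine the relationship between the immune repertoire and disease. Because the VDJ region lies at the 5' end of the mRNA, 5'-end library construction makes it easier to enrich the full-length V(D)J sequences of T cell receptors (TCR) and B cell receptors (BCR).
At present, the main drawbacks of commercial 5'-end library construction methods for transcriptome sequencing (for example, the 5'-end protocol developed by 10X Genomics) are low cell throughput, a high proportion of empty micro-reaction systems, low sample throughput, and high library construction cost. For example, the 5'-end protocol based on 10X Genomics Feature Barcoding can label multiple samples in a single reaction, but it requires the additional, expensive Feature Barcoding kit and TotalSeq™ antibodies. Moreover, the barcode tags introduced by TotalSeq™ antibodies cannot resolve the transcriptomes of different cells originating from the same droplet. Consequently, when the sequencing data are analyzed, data judged to be "pseudo-single-cell" can only be discarded. The more cells are loaded, the more "pseudo-single-cell" data must be removed and the more sequencing cost is wasted. This 5'-end protocol therefore still cannot solve the problems of low cell throughput and a high proportion of empty micro-reaction systems, and the cost of transcriptome sequencing remains high.
In summary, existing high-throughput single-cell transcriptome library construction techniques (especially 5'-end techniques) still have the following shortcomings: low cell throughput, a high proportion of empty micro-reaction systems, and high library construction cost. There is therefore an urgent need to develop new high-throughput single-cell transcriptome library construction techniques (especially 5'-end techniques).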
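Summary of the Invention
In the present application, unless otherwise stated, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. The nucleic acid chemistry laboratory procedures used herein are all routine procedures widely used in the corresponding fields. For a better understanding of the present invention, definitions and explanations of relevant terms are provided below. Unless specifically defined or described differently elsewhere herein, the following terms and descriptions relating to the present invention shall be understood according to the definitions given below.
When the terms "for example", "such as", "including", "comprising", or variants thereof are used herein, they are not to be regarded as limiting terms and are to be interpreted as meaning "but not limited to" or "without limitation".
Unless otherwise indicated herein or clearly contradicted by context, the terms "a", "an", and "the" and similar referents in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural.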
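As used herein, the term "pseudo-single-cell" refers to the situation, in an experiment analyzing single-cell transcriptomics, in which one micro-reaction system (for example, one water-in-oil droplet or one microwell) contains two or more cells. In a "pseudo-single-cell" situation, the two or more cells in the same micro-reaction system (for example, the same droplet or microwell) are labeled with the same cell-specific tag. As a result, the cell-specific tag introduced by the micro-reaction system alone cannot identify the individual cells in that system one-to-one. Accordingly, the sequencing data produced by a "pseudo-single-cell" micro-reaction system, because they contain results derived from two or more cells, cannot be used to analyze the transcriptome of a single cell. In traditional high-throughput single-cell transcriptome sequencing methods, the sequencing data produced by "pseudo-single-cell" micro-reaction systems therefore have to be filtered out or removed from the final sequencing data; and, to avoid wasting large amounts of sequencing data, the number or rate of "pseudo-single-cell" micro-reaction systems has to be reduced or controlled as far as possible. As used herein, the term "pseudo-single-cell rate" refers to the ratio of the number of "pseudo-single-cell" micro-reaction systems to the number of all micro-reaction systems that contain cells.
As used herein, "cell throughput" refers to the number of cells that can be labeled simultaneously in a single library construction reaction for a given single-cell library construction scheme.
As used herein, "sample throughput" refers to the number of samples that can be labeled simultaneously in a single library construction reaction for a given single-cell library construction scheme.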
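As used herein, the cells or nuclei thereof that can be used in the method of the present invention (for example, cells or nuclei that can be processed using the method of the invention to produce a population of nucleic acid fragments) may be any cells or nuclei of interest, for example, cancer cells, stem cells, nerve cells, fetal cells, and immune cells involved in an immune response, or nuclei thereof. The cells may be one cell or a plurality of cells. The cells may be a mixture of cells of the same type or a completely heterogeneous mixture of different cell types. Different cell types may include cells from different tissues (for example, epithelial, connective, muscle, or nervous tissue) or body fluids (for example, blood) of one individual, cells from the same tissue (for example, epithelial, connective, muscle, or nervous tissue) or body fluid (for example, blood) of different individuals, or cells of microorganisms from different genera, species, strains, variants, or any combination of any or all of the foregoing. For example, different cell types may include normal cells and cancer cells of an individual; various cell types obtained from a human subject, such as a variety of immune cells; a variety of different bacterial species, strains, and/or variants from environmental, forensic, microbiome, or other samples; or any other mixture of cell types.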
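For example, non-limiting examples of cancer cells that can be processed or analyzed using the methods described herein include cells of cancers such as acanthoma, acinic cell carcinoma, acoustic neuroma, acral lentiginous melanoma, acrospiroma, acute eosinophilic leukemia, acute lymphoblastic leukemia, acute megakaryoblastic leukemia, acute monocytic leukemia, acute myeloblastic leukemia with maturation, acute myeloid dendritic cell leukemia, acute myeloid leukemia, acute promyelocytic leukemia, adamantinoma, adenocarcinoma, adenoid cystic carcinoma, adenoma, adenomatoid odontogenic tumor, adrenocortical carcinoma, adult T-cell leukemia, aggressive NK-cell leukemia, AIDS-related cancers, AIDS-related lymphoma, alveolar soft part sarcoma, ameloblastic fibroma, anal cancer, anaplastic large cell lymphoma, anaplastic thyroid cancer, angioimmunoblastic T-cell lymphoma, angiomyolipoma, angiosarcoma, adnexal cancer, astrocytoma, atypical teratoid rhabdoid tumor, basal cell carcinoma, basal-like carcinoma, B-cell leukemia, B-cell lymphoma, Bellini duct carcinoma, biliary tract cancer, bladder cancer, blastoma, bone cancer, bone tumor, brain stem glioma, brain tumor, breast cancer, Brenner tumor, bronchial tumor, bronchioloalveolar carcinoma, brown tumor, Burkitt lymphoma, cancer of unknown primary site, carcinoid tumor, carcinoma in situ, carcinoma of the penis, carcinosarcoma, Castleman disease, central nervous system embryonal tumor, cerebellar astrocytoma, cerebral astrocytoma, cervical cancer, cholangiocarcinoma, chondroma, chondrosarcoma, chordoma, choriocarcinoma, choroid plexus papilloma, chronic lymphocytic leukemia, chronic monocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorder, chronic neutrophilic leukemia, clear-cell tumor, colon cancer, colorectal cancer, craniopharyngioma, cutaneous T-cell lymphoma, Degos disease, mucocutaneous cyst, dermoid cyst, desmoplastic small round cell tumor, diffuse large B-cell lymphoma, dysembryoplastic neuroepithelial tumor, embryonal carcinoma, endodermal sinus tumor, endometrial cancer, endometrioid tumor, enteropathy-associated T-cell lymphoma, ependymoblastoma, ependymoma, epithelioid sarcoma, erythroleukemia, esophageal cancer, nasal glioma, Ewing family of tumors, Ewing family sarcoma, Ewing sarcoma, extracranial germ cell tumor, extragonadal germ cell tumor, extrahepatic bile duct cancer, extramammary Paget's disease, fallopian tube cancer, fetus in fetu, fibroma, fibrosarcoma, follicular lymphoma, follicular thyroid cancer, gallbladder cancer, glioma, ganglioneuroma, gastric cancer, gastric lymphoma, gastrointestinal cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumor, germ cell tumor, germinoma, gestational choriocarcinoma, gestational trophoblastic tumor, giant cell tumor of bone, glioblastoma multiforme, glioma, gliomatosis cerebri, glomus tumor, glucagonoma, gonadoblastoma, granulosa cell tumor, hairy cell leukemia, head and neck cancer, heart cancer, hemangioblastoma, hemangiopericytoma, hemangiosarcoma, hematological malignancy, hepatocellular carcinoma, hepatosplenic T-cell lymphoma, hereditary breast cancer syndrome, Hodgkin lymphoma, hypopharyngeal cancer, hypothalamic glioma, inflammatory breast cancer, intraocular melanoma, islet cell carcinoma, islet cell tumor, juvenile myelomonocytic leukemia, Kaposi sarcoma, kidney cancer, Klatskin tumor, Krukenberg tumor, laryngeal cancer, lentigo maligna melanoma, leukemia, lip and oral cavity cancer, liposarcoma, lung cancer, luteoma, lymphangioma, lymphangiosarcoma, lymphoepithelioma, lymphoid leukemia, lymphoma, macroglobulinemia, malignant fibrous histiocytoma, malignant fibrous histiocytoma of bone, malignant glioma, malignant mesothelioma, malignant peripheral nerve sheath tumor, malignant rhabdoid tumor, malignant triton tumor, MALT lymphoma, mantle cell lymphoma, mast cell leukemia, mediastinal germ cell tumor, mediastinal tumor, medullary thyroid cancer, medulloblastoma, medulloepithelioma, melanoma, meningioma, Merkel cell carcinoma, mesothelioma, metastatic squamous neck cancer with occult primary, metastatic urothelial carcinoma, mixed Müllerian tumor, monocytic leukemia, mouth cancer, myxoma, multiple endocrine neoplasia syndrome, multiple myeloma, myelodysplastic disease, myelodysplastic syndrome, myeloid leukemia, myeloma, myeloproliferative disease, nasal cavity cancer, nasopharyngeal cancer, neoplasm, neurinoma, neuroblastoma, neurofibroma, neuroma, nodular melanoma, non-Hodgkin lymphoma, non-melanoma skin cancer, non-small cell lung cancer, ocular tumor, oligoastrocytoma, oligodendroglioma, oncocytoma, optic nerve sheath meningioma, oropharyngeal cancer, osteosarcoma, ovarian cancer, ovarian epithelial cancer, ovarian germ cell tumor, ovarian low malignant potential tumor, Paget's disease, Pancoast tumor, pancreatic cancer, papillary thyroid cancer, papillomatosis, paraganglioma, paranasal sinus cancer, parathyroid cancer, penile cancer, perivascular epithelioid cell tumor, pharyngeal cancer, pheochromocytoma, pineal parenchymal tumor of intermediate differentiation, pineoblastoma, pituicytoma, pituitary adenoma, pituitary tumor, plasma cell neoplasm, pleuropulmonary blastoma, polyembryoma, precursor T-lymphoblastic lymphoma, primary central nervous system lymphoma, primary effusion lymphoma, primary hepatocellular carcinoma, primary liver cancer, primary peritoneal cancer, primitive neuroectodermal tumor, prostate cancer, pseudomyxoma peritonei, rectal cancer, renal cell carcinoma, respiratory tract carcinoma involving the NUT gene on chromosome 15, retinoblastoma, rhabdomyoma, rhabdomyosarcoma, Richter's transformation, sacrococcygeal teratoma, salivary gland cancer, sarcoma, schwannomatosis, sebaceous gland carcinoma, secondary neoplasm, seminoma, serous tumor, Sertoli-Leydig cell tumor, sex cord-stromal tumor, Sézary syndrome, signet ring cell carcinoma, skin cancer, small blue round cell tumor, small cell carcinoma, small cell lung cancer, small cell lymphoma, small intestine cancer, soft tissue sarcoma, somatostatinoma, soot wart, spinal cord tumor, spinal tumor, splenic marginal zone lymphoma, squamous cell carcinoma, superficial spreading melanoma, supratentorial primitive neuroectodermal tumor, surface epithelial-stromal tumor, synovial sarcoma, T-cell acute lymphoblastic leukemia, T-cell large granular lymphocytic leukemia, T-cell leukemia, T-cell lymphoma, T-cell lymphocytic leukemia, teratoma, terminal lymphatic cancer, testicular cancer, thecoma, thymic carcinoma, thymoma, thyroid cancer, transitional cell cancer of the renal pelvis and ureter, transitional cell carcinoma, urachal cancer, urethral cancer, urogenital neoplasm, uterine sarcoma, uveal melanoma, vaginal cancer, Verner-Morrison syndrome, verrucous carcinoma, visual pathway glioma, vulvar cancer, Waldenström macroglobulinemia, papillary cystadenoma lymphomatosum (Warthin's tumor), Wilms' tumor, and combinations thereof.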
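Non-limiting examples of immune cells that can be processed or analyzed using the methods described herein include B cells, T cells (for example, cytotoxic T cells, natural killer T cells, regulatory T cells, and T helper cells), natural killer cells, and cytokine-induced killer (CIK) cells; myeloid cells, such as granulocytes (basophils, eosinophils, neutrophils/hypersegmented neutrophils), monocytes/macrophages, mast cells, platelets/megakaryocytes, and dendritic cells; and combinations thereof.
Accordingly, the nuclei of the above cells can also be processed or analyzed using the methods described herein.
As used herein, "long non-coding RNA" has the meaning commonly understood by those skilled in the art and is used interchangeably with "lncRNA". Long non-coding RNAs are a class of RNA molecules whose transcripts exceed 200 nt in length; they generally do not encode proteins and regulate the expression level of target genes in the form of RNA.
As used herein, "eRNA" (enhancer RNA) has the meaning commonly understood by those skilled in the art and denotes a class of RNA transcribed by RNA pol II from transcriptional enhancer regions.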
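As used herein, a "population of nucleic acid fragments" refers to a population or collection of nucleic acid fragments derived, for example, from a target nucleic acid molecule (for example, a double-stranded DNA molecule, an RNA/cDNA hybrid duplex molecule, a single-stranded DNA molecule, or a single-stranded RNA molecule). In some embodiments, the population of nucleic acid fragments comprises a nucleic acid fragment library containing sequences that qualitatively and/or quantitatively represent the sequence of the target nucleic acid molecule. In other embodiments, the population of nucleic acid fragments comprises a subset of a nucleic acid fragment library.
As used herein, a "nucleic acid molecule library" denotes a collection or population of labeled nucleic acid fragments (for example, labeled double-stranded DNA fragments, labeled RNA/cDNA hybrid duplex fragments, labeled single-stranded DNA fragments, or labeled single-stranded RNA fragments) generated from target nucleic acid molecules, wherein the combination of labeled nucleic acid fragments in the collection or population exhibits sequences that qualitatively and/or quantitatively represent the sequences of the target nucleic acid molecules from which the labeled fragments were generated. Preferably, for a nucleic acid molecule library, the labeled nucleic acid fragments in the collection or population have not been selected for or against by intentionally using a method that includes or excludes labeled fragments on the basis of the nucleotide or sequence composition of the target nucleic acid molecule.
As used herein, "cDNA", "cDNA strand", or "cDNA molecule" refers to "complementary DNA" synthesized by extension of a primer annealed to an RNA molecule of interest, catalyzed by an RNA-dependent DNA polymerase or reverse transcriptase, using at least a portion of that RNA molecule as template (a process also called "reverse transcription"). The synthesized cDNA molecule is "complementary" to, "base-paired" with, or "forms a complex" with at least a portion of the template.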
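As used herein, "transposase" denotes an enzyme that is capable of forming a functional complex with a composition comprising a transposon end (for example, a transposon, a transposon end, or a transposon end composition) and of catalyzing insertion or transposition of that composition into a double-stranded nucleic acid molecule (for example, double-stranded DNA or an RNA/cDNA hybrid duplex) with which it is incubated in a transposition reaction (for example, an in vitro transposition reaction). Non-limiting examples of transposases include Tn5 transposase, MuA transposase, Sleeping Beauty transposase, Mariner transposase, Tn7 transposase, Tn10 transposase, Ty1 transposase, and Tn552 transposase, as well as variants, modified products, and derivatives having the transposition activity of these transposases (for example, having higher transposition activity).
As used herein, the term "transposon end" or "transposase recognition sequence" denotes a double-stranded nucleic acid molecule having the nucleotide sequences necessary to form a functional complex with a transposase in a transposition reaction. Herein, "transposon end" and "transposase recognition sequence" have the same meaning and are used interchangeably. A transposon end forms a "transposase complex", "transposome complex", or "transposome composition" with a transposase that recognizes and binds that transposon end, and the complex is capable of inserting or transposing the transposon end into a target double-stranded nucleic acid molecule with which it is incubated in an in vitro transposition reaction. A transposon end comprises two complementary sequences consisting of a "transferred transposon end sequence" and a "non-transferred transposon end sequence". The nucleic acid strand containing the transferred transposon end sequence is called the "transferred strand". The nucleic acid strand containing the non-transferred transposon end sequence is called the "non-transferred strand".
In an in vitro transposition reaction, the 3' end of the transferred strand is joined or transferred to the target nucleic acid molecule (for example, a DNA molecule or an RNA molecule). In an in vitro transposition reaction, the 5' end of the transposon end sequence complementary to the transferred transposon end sequence (that is, the non-transferred transposon end sequence) is not joined or transferred to the target nucleic acid molecule.
In some embodiments, the transferred strand and the non-transferred strand are joined non-covalently (for example, through hydrogen bonds formed between bases). In some embodiments, the transferred strand and the non-transferred strand are joined covalently. For example, in some embodiments, the transferred strand sequence and the non-transferred strand sequence are provided on a single oligonucleotide, for example in a hairpin configuration. Thus, although the free end (5' end) of the non-transferred strand is not joined directly to the target DNA by the transposition reaction, the non-transferred strand is linked indirectly to the DNA fragment, because it is connected to the transferred strand through the loop of the hairpin structure.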
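A "transposon end composition" or "transposition sequence" denotes a composition comprising a transposon end (that is, the minimal double-stranded DNA segment capable of acting with a transposase to undergo a transposition reaction), optionally plus additional sequence 5' of the transferred transposon end sequence and/or 3' of the non-transferred transposon end sequence. Herein, "transposon end composition" and "transposition sequence" have the same meaning and are used interchangeably. For example, a transposon end (transposase recognition sequence) linked to a first tag sequence and/or a first consensus sequence is a "transposon end composition" or "transposition sequence". In some embodiments, the transposon end composition comprises or consists of two transposon end oligonucleotides, namely a "transferred transposon end oligonucleotide" or "transferred strand" and a "non-transferred strand end oligonucleotide" or "non-transferred strand", which in combination exhibit the sequences of the transposon end and in which one or both strands comprise additional sequence.
The term "transferred strand" refers to the transferred portion of both the "transposon end" and the "transposon end composition", that is, regardless of whether the transposon end is attached to a labeling sequence or other moiety. Similarly, the term "non-transferred strand" refers to the non-transferred portion of both the "transposon end" and the "transposon end composition". In some embodiments, the transposon end composition or transposition sequence is provided as a linear duplex formed by two single oligonucleotide strands joined through hydrogen bonds between bases. In some embodiments, the 5′ end of the non-transferred strand in the transposon end composition or transposition sequence is phosphorylated. In some embodiments, the 3′-terminal nucleotide of the non-transferred strand in the transposon end composition or transposition sequence is blocked (for example, a dideoxynucleotide). In some embodiments, the transposon end composition is a "hairpin transposon end composition", which denotes a transposon end composition consisting of a single oligodeoxyribonucleotide that exhibits a non-transferred strand transposon end sequence at its 5' end, a transferred transposon end sequence at its 3' end, and, between the non-transferred and the transferred transposon end sequences, an intervening arbitrary sequence long enough to allow intramolecular stem-loop formation, so that the transposon end portion can function in a transposition reaction. In some embodiments, the 5' end of the hairpin transposon end composition has a phosphate group at the 5' position of the 5'-terminal nucleotide. In some embodiments, the intervening arbitrary sequence between the non-transferred transposon end sequence and the transferred transposon end sequence of the hairpin transposon end composition provides a labeling sequence for a particular use or application.
As used herein, the term "upstream" describes the relative positions of two nucleic acid sequences (or two nucleic acid molecules) and has the meaning commonly understood by those skilled in the art. For example, the expression "one nucleic acid sequence is located upstream of another nucleic acid sequence" means that, when arranged in the 5' to 3' direction, the former lies at a more forward position (that is, closer to the 5' end) than the latter. As used herein, the term "downstream" has the opposite meaning to "upstream".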
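As used herein, a "labeling sequence" (for example, a "first tag sequence", "second tag sequence", "third tag sequence", "fourth tag sequence", "unique molecular tag sequence", "first consensus sequence", "second consensus sequence", "first amplification primer sequence", "second amplification primer sequence", "template switching sequence", and the like) refers to an oligonucleotide, not a component of the target nucleic acid, that provides the nucleic acid fragment to which it is joined, or derivatives of that fragment (for example, its complementary fragment or shorter fragments produced by breaking it), with a means of identification, recognition, and/or molecular or biochemical manipulation (for example, by providing a site for annealing an oligonucleotide, such as a primer for DNA polymerase extension or an oligonucleotide for a capture reaction or ligation reaction). A labeling sequence may consist of at least two contiguous nucleotides (preferably about 6 to 100, although there is no definite limit on oligonucleotide length; the exact size depends on many factors, which in turn depend on the ultimate function or use of the oligonucleotide), or may be composed of several oligonucleotide segments arranged contiguously or non-contiguously. A labeling sequence may be unique to each nucleic acid fragment to which it is joined, or unique to a particular class of nucleic acid fragments to which it is joined. A labeling sequence may be joined reversibly or irreversibly to the polynucleotide sequence to be "labeled" by any method, including ligation, hybridization, or other methods. The process of joining a labeling sequence to a nucleic acid molecule is sometimes referred to herein as "adding a label", and a nucleic acid molecule that has undergone label addition or contains a labeling sequence is called a "labeled nucleic acid molecule".
For a variety of reasons, the nucleic acids or polynucleotides of the invention (for example, labeling sequences, transposase recognition sequences, the first primer, second primer, third primer, and fourth primer) may include one or more modified nucleobases, sugar moieties, or internucleoside linkages. For example, some reasons for using nucleic acids or polynucleotides containing modified bases, sugar moieties, or internucleoside linkages include, but are not limited to: (1) altering the Tm; (2) altering the susceptibility of the polynucleotide to one or more nucleases; (3) providing a moiety for attachment of a label; (4) providing a label or a label quencher; or (5) providing a moiety, such as biotin, for attachment to another molecule in solution or bound to a surface. For example, in some embodiments, an oligonucleotide such as a primer may be synthesized so that the random portion contains one or more conformationally restricted nucleic acid analogs, such as, but not limited to, one or more ribonucleic acid analogs in which the ribose ring is "locked" by a methylene bridge connecting the 2'-O atom with the 4'-C atom; these modified nucleotides raise the Tm, or melting temperature, by about 2 to about 8 degrees Celsius per nucleotide monomer. For example, in some embodiments in which an oligonucleotide primer containing ribonucleotides is used, one indicator for using a modified nucleotide in the method may be that the oligonucleotide containing the modified nucleotide can be digested by a single-strand-specific RNase.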
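In the methods of the invention, for example, the nucleobases in the mononucleotides at one or more positions in a polynucleotide or oligonucleotide may include guanine, adenine, uracil, thymine, or cytosine, or alternatively one or more of the nucleobases may comprise a modified base, such as, but not limited to, xanthine, allylamino-uracil, allylamino-thymidine, hypoxanthine, 2-aminoadenine, 5-propynyluracil, 5-propynylcytosine, 4-thiouracil, 6-thioguanine, azauracil and deazauracil, thymidine, cytosine, adenine, or guanine. In addition, they may comprise nucleobases derivatized with a biotin moiety, a digoxigenin moiety, a fluorescent or chemiluminescent moiety, a quenching moiety, or some other moiety. The invention is not limited to the nucleobases listed; the list gives examples of the wide range of bases that can be used in the methods of the invention.
With respect to the nucleic acids or polynucleotides of the invention, one or more of the sugar moieties may comprise 2′-deoxyribose, or alternatively one or more of the sugar moieties may comprise some other sugar moiety, such as, but not limited to: ribose, 2'-fluoro-2'-deoxyribose, or 2'-O-methyl-ribose, which provide resistance to some nucleases; or 2'-amino-2'-deoxyribose or 2'-azido-2'-deoxyribose, which can be labeled by reaction with visible, fluorescent, infrared fluorescent, or other detectable dyes or with chemicals having electrophilic, photoreactive, alkynyl, or other reactive chemical moieties.
The internucleoside linkages of the nucleic acids or polynucleotides of the invention may be phosphodiester linkages, or alternatively one or more of the internucleoside linkages may comprise modified linkages, such as, but not limited to, phosphorothioate, phosphorodithioate, phosphoroselenate, or phosphorodiselenate linkages, which are resistant to some nucleases.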
如本文所用,术语“具有末端转移活性的逆转录酶”是指,能催化一个或多个脱氧核糖核苷三磷酸(dNTP)或单个双脱氧核糖核苷三磷酸不依赖模板地添加(或“加尾”)至cDNA的3’末端的逆转录酶。此类逆转录酶的实例包括但不限于,M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶、端粒酶逆转录酶,以及具有所述逆转录酶的逆转录活性和末端转移活性的变体、修饰产物和衍生物。在优选的实施方案中,用于逆转录RNA以生成cDNA的逆转录酶不具有或者具有降低的RNase活性(特别是RNase H活性),以避免RNA的降解。因此,在优选的实施方案中,用于逆转录RNA以生成cDNA的逆转录酶具有末端转移活性,且不具有或者具有降低的RNase活性(特别是RNase H活性)。此类逆转录酶的实例包括但不限于,经修饰或突变以去除RNase活性(特别是RNase H活性)的M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶和端粒酶逆转录酶。如本文所用,表述“具有降低的RNase活性”是指,与天然存在的野生型逆转录酶相比,经修饰或突变的逆转录酶的RNase活性降低。
如本文所用,具有“链置换活性”的核酸聚合酶是指,在延伸新核酸链的过程中,如果遇到下游与模板链互补的核酸链,可以继续延伸反应并将所述与模板链互补的核酸链剥离(而非降解)的核酸聚合酶。
如本文所用,具有“高保真性”的核酸聚合酶(或DNA聚合酶)是指,在扩增核酸的过程中,引入错误核苷酸的概率(即,错误率)低于野生型Taq酶的核酸聚合酶(或DNA聚合酶)。
如本文所用,术语“使退火”或“使杂交”以及“退火”或“杂交”是指,具有经由沃森-克里克碱基配对形成复合物的充分互补性的核苷酸序列之间形成复合物。就本发明来说,彼此之间“对其互补”或“与之互补”或与其“杂交”或“退火”的核酸序列应该能形成或形成服务于预定目的的足够稳定的“杂交体”或“复合物”。不要求由一个核酸分子显示的序列内的每个核酸碱基能够与由第二核酸分子显示的序列内的每个核酸碱基进行碱基配对或配对或复合,以便这两个核酸分子或其中显示的相应序列与彼此“互补”或“退火”或“杂交”。如本文所述,在提及按碱基配对法则联系的核苷酸的序列时使用术语“互补的”或“互补性”。例如,序列5’-A-G-T-3’与序列3’-T-C-A-5’互补。互补性可以是“部分的”,其中核酸碱基中只有一些根据碱基配对法则匹配。或者,在核酸之间可具有“完全的”或“全部的”互补性。核酸链之间的互补性的程度对核酸链之间的杂交的效率和强度具有显著影响。这在扩增反应以及依赖于核酸的杂交的检测方法中是特别重要的。如本文所用,在提及互补 的核酸链的配对时使用术语“退火”或“杂交”。杂交和杂交强度(即,核酸链之间的缔合强度)受本领域中公知的许多因素影响,包括核酸之间的互补性程度,包括受诸如盐浓度影响的条件的严格度,形成的杂交体的Tm(解链温度),其他组分的存在(如,存在或不存在聚乙二醇或甜菜碱),杂交链的摩尔浓度以及核酸链的G:C含量。如本文所用,可以采用低严格度条件、中严格度条件或高严格度条件来进行退火或杂交,所述低严格度条件、中严格度条件、高严格度条件是本领域已知的。
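The 5'-A-G-T-3' / 3'-T-C-A-5' example above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical and not part of the application, and it handles unmodified A/C/G/T bases only. It computes the reverse complement of a strand and the fraction of positions that pair when two equal-length strands are aligned antiparallel, which is one simple way to express "partial" versus "complete" complementarity.

```python
# Watson-Crick pairing for unmodified DNA bases (assumption: no degenerate or modified bases).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq_5to3: str) -> str:
    """Return the fully complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3.upper()))


def fraction_paired(strand_5to3: str, other_5to3: str) -> float:
    """Fraction of positions that base-pair when the two strands are aligned antiparallel."""
    if len(strand_5to3) != len(other_5to3):
        raise ValueError("this sketch only compares strands of equal length")
    other_3to5 = other_5to3.upper()[::-1]  # read the second strand 3'->5'
    paired = sum(COMPLEMENT[a] == b for a, b in zip(strand_5to3.upper(), other_3to5))
    return paired / len(strand_5to3)


# 3'-T-C-A-5' written 5'->3' is ACT.
full = fraction_paired("AGT", "ACT")      # complete complementarity
partial = fraction_paired("AGT", "ACC")   # only two of three positions pair
```

Whether a partially complementary pair actually forms a stable hybrid depends on the stringency conditions described above, which this sketch does not model.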
如本文所用,术语“珠粒”通常是指颗粒。珠粒可以是多孔的、无孔的、固体的、半固体的、半流体的或流体的。珠粒可以是磁性的或非磁性的。在一些实施方案中,珠粒可以是可溶解的、可破裂的或可降解的。在一些情况下,珠粒可以是不可降解的。在一些实施方案中,珠粒可以是凝胶珠粒。凝胶珠粒可以是水凝胶珠粒。凝胶珠粒可以由分子前体形成,例如聚合物或单体物质。半固体珠粒可以是脂质体珠粒。固体珠粒可包含金属,包括氧化铁、金和银。在一些情况下,珠粒是二氧化硅珠粒。在一些情况下,珠粒是刚性的。在一些情况下,珠粒可以是柔性的和/或可压缩的。
在一些实施方案中,珠粒可含有分子前体(例如,单体或聚合物),其可通过前体的聚合来形成聚合物网络。在一些情况下,前体可以是已经聚合的物质,其能够通过例如化学交联进行进一步聚合。在一些情况下,前体包含丙烯酰胺或甲基丙烯酰胺单体、低聚物或聚合物中的一种或多种。在一些情况下,珠粒可包含预聚物,其是能够进一步聚合的低聚物。例如,可以使用预聚物制备聚氨酯珠粒。在一些情况下,珠粒可含有可进一步聚合在一起的单个聚合物。在一些情况下,可以通过不同前体的聚合生成珠粒,使得它们包含混合聚合物、共聚物和/或嵌段共聚物。
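Beads may comprise natural and/or synthetic materials. For example, a polymer may be a natural polymer or a synthetic polymer. In some cases, a bead comprises both natural and synthetic polymers. Examples of natural polymers include proteins and sugars, such as deoxyribonucleic acid, rubber, cellulose, starch, proteins, enzymes, polysaccharides, silk, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, ispaghula, gum arabic, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, karaya gum, agarose, alginic acid, alginate, or natural polymers thereof. Examples of synthetic polymers include acrylics, nylon, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethane, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, poly(ethylene terephthalate), poly(chlorotrifluoroethylene), poly(ethylene oxide), polyisobutylene, poly(methyl methacrylate), polyoxymethylene, polypropylene, poly(tetrafluoroethylene), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride), and any combination thereof (e.g., copolymers). Beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, and other inorganic materials.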
珠粒可具有均匀尺寸或不均匀尺寸。在一些情况下,珠粒的直径可以是约1μm、5μm、10μm、20μm、30μm、40μm、50μm、60μm、70μm、80μm、90μm、100μm、250μm、500μm或1mm。在一些情况下,珠粒的直径可以是至少约1μm、5μm、10μm、20μm、30μm、40μm、50μm、60μm、70μm、80μm、90μm、100μm、250μm、500μm、1mm或更大。在一些情况下,珠粒的直径可以小于约1μm、5μm、10μm、20μm、30μm、40μm、50μm、60μm、70μm、80μm、90μm、100μm、250μm、500μm或1mm。在一些情况下,珠粒的直径可以在约40-75μm、30-75μm、20-75μm、40-85μm、40-95μm、20-100μm、10-100μm、1-100μm、20-250μm、或20-500μm的范围内。
在某些方面中,珠粒作为具有相对单分散尺寸分布的珠粒群或多个珠粒提供。在需要在分区内提供相对一致量的试剂的情况下,保持相对一致的珠粒特征(例如尺寸)可有助于总体一致性。特别地,本文所述的珠粒可具有其横截面尺寸的变异系数小于50%、小于40%、小于30%、小于20%,并且在一些情况下小于15%、小于10%、或甚至小于5%的尺寸分布。
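The coefficient of variation (CV) thresholds above can be evaluated from measured bead diameters. The sketch below is a minimal illustration, assuming the CV is taken as the sample standard deviation divided by the mean; the diameters are invented example values, not data from the application.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """CV of bead cross-sectional size: sample standard deviation divided by the mean."""
    mean = statistics.fmean(diameters_um)
    return statistics.stdev(diameters_um) / mean


# Hypothetical measurements of five beads, in micrometres.
example_diameters_um = [50, 52, 48, 51, 49]
cv = coefficient_of_variation(example_diameters_um)
is_monodisperse_5pct = cv < 0.05  # the tightest threshold named in the text
```

With these example values the CV is about 3.2%, so the population would fall under the "less than 5%" category.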
珠粒可以具有任何合适的形状。珠粒形状的实例包括但不限于球形、非球形、椭圆形、长圆形、无定形、圆形、圆柱形及其变型形式。
如本文所述,含有标记序列的寡核酸分子可以偶联至珠粒的表面和/或封闭在珠粒内。用于连接寡核苷酸的珠粒的官能化可以通过多种不同的方法实现,包括活化聚合物内的化学基团、将活性或可活化的官能团掺入聚合物结构中或者在珠粒生产中的预聚物或单体阶段进行连接。例如,聚合形成珠粒的前体(例如,单体,交联剂)可包含丙烯酰胺亚磷酰胺部分,使得当生成珠粒时,珠粒还包含丙烯酰胺亚磷酰胺部分。丙烯酰胺亚磷酰胺部分可以连接到寡核苷酸。
如本文所述,珠粒可以自发地或在暴露于一种或多种刺激(例如,温度变化、pH变化、暴露于特定化学物质或相、暴露于光、还原剂等)时释放所述寡核苷酸。向凝胶珠粒中添加多种类型的不稳定键可导致生成能够对不同刺激有反应的珠粒。每种类型的不稳定键可以对相关的刺激(例如,化学刺激、光、温度等)敏感,使得通过施加适当的 刺激可以控制通过每个不稳定键连接到珠粒的物质的释放。这种官能团可用于从凝胶珠粒受控地释放物质。在一些情况下,包含不稳定键的另一物质可以在凝胶珠粒形成之后通过例如如上所述的凝胶珠粒的活化官能团与凝胶珠粒连接。应当理解的是,可释放地、可裂解地或可逆地连接到本文所述的珠粒的寡核苷酸可包括,通过寡核苷酸分子与珠粒之间的键联的裂解来释放或可释放的条形码或标记序列,或通过珠粒本身的降解来释放的条形码或标记序列,或两者兼而有之,所述条形码或标记序列允许被其他试剂接近或可被其他试剂接近。
除了可热裂解的键、二硫键和UV敏感键之外,可以与前体或珠粒偶合的不稳定键的其他非限制性实例包括酯键(例如,可用酸、碱或羟胺裂解)、邻位二醇键(例如,可通过高碘酸钠裂解)、狄尔斯-阿尔德(Diels-Alder)键(例如,可通过热裂解)、砜键(例如,可通过碱裂解)、甲硅烷基醚键(例如,可通过酸裂解)、糖苷键(例如,可通过淀粉酶裂解)、肽键(例如,可通过蛋白酶裂解)或磷酸二酯键(例如,可通过核酸酶(例如,DNA酶)裂解))。
除了上文所述的珠粒与寡核苷酸之间的可裂解键之外或作为其替代,珠粒可以在自发地或在暴露于一种或多种刺激(例如,温度变化、pH变化、暴露于特定化学物质或相、暴露于光、还原剂等)时为可降解、可破坏或可溶解的。在一些情况下,珠粒可以是可溶解的,使得珠粒的材料组分在暴露于特定化学物质或环境变化(例如变化温度或pH变化)时溶解。在一些情况下,凝胶珠粒在升高的温度和/或碱性条件下降解或溶解。在一些情况下,珠粒可以是可热降解的,使得当珠粒暴露于适当的温度变化(例如,加热)时,珠粒降解。与物质(例如,寡核苷酸,例如条形码化寡核苷酸)结合的珠粒的降解或溶解可导致物质从珠粒中释放。
此外,不参与聚合的物质也可以在珠粒生成期间(例如,在前体的聚合期间)被包封在珠粒中。此类物质可以进入聚合反应混合物中,使得生成的珠粒在珠粒形成时包含各物质。在一些情况下,可在形成之后将此类物质加入凝胶珠粒中。此类物质可包括例如寡核苷酸、用于核酸扩增反应的试剂(例如,引物、聚合酶、dNTP、辅因子(例如,离子辅因子))、用于酶促反应的试剂(例如,酶、辅因子、底物)或用于核酸修饰反应如聚合、连接或消化的试剂。此类物质的捕集可以通过在前体的聚合期间生成的聚合物网络密度、凝胶珠粒内离子电荷的控制(例如,通过与聚合物质连接的离子物质)或通过其他物质的释放来控制。可以在珠粒降解时和/或通过施加能够从珠粒释放物质的刺激从珠粒释放包 封的物质。
如本文所用,术语“转座酶”和“逆转录酶”以及“核酸聚合酶”是指负责催化特异性化学反应和生物学反应的蛋白质分子或蛋白质分子聚集体。一般来说,本发明的方法、组合物或试剂盒不限于使用来自特定来源的特定的转座酶、逆转录酶或核酸聚合酶。反之,本发明的方法、组合物或试剂盒包括与根据特定方法、组合物或试剂盒的本文公开的特定酶具有等同酶活性的来自任何来源的任何转座酶、逆转录酶或核酸聚合酶。更进一步,本发明的方法还包括如下实施方案:其中在所述方法的步骤中提供和使用的任何一种特定的酶被两种或多种酶的组合取代,所述两种或多种酶在组合使用时,不论是以分步方式分别使用还是同时一起使用,反应混合物产生的结果与使用该一种特定的酶获得的结果相同。本文提供的方法、缓冲液和反应条件,包括在实施例中的方法、缓冲液和反应条件目前对于本发明的方法、组合物和试剂盒的实施方案是优选的。然而,使用本发明的一些酶的其他的酶储存缓冲液、反应缓冲液和反应条件是本领域已知的,其也可能适于在本发明中使用并且被包括在本文中。
本申请的发明人基于深入的研究,开发了一种新的标记核酸分子的方法。根据本申请的方法所产生的经标记的核酸分子可以方便地用于构建核酸分子文库(特别是转录组测序文库),其中,所述核酸分子文库含有RNA分子(例如,mRNA分子)的5’末端序列的信息,可用于分析转录组中的RNA分子(例如,mRNA分子)的丰度和5’末端序列,以及转录起始位置。并且,利用本申请方法构建的核酸分子文库具有双重的细胞标签(例如第一标签和第二标签),由此能够显著降低“假单细胞”对测序过程和测序数据的不利影响。因此,本申请的方法能够大幅降低建库过程中的微反应体系空载率和提高单次建库反应的细胞通量和样品通量,能够大幅降低建库成本和测序成本。
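The benefit of the dual cell tags can be sketched in code. The example below is a hypothetical illustration, not the applicants' analysis pipeline: it assumes each sequencing read has already been reduced to a tuple of (first tag, second tag, UMI), and it groups reads by the pair (second tag, first tag) rather than by the second tag alone. All tag names are placeholders.

```python
from collections import defaultdict


def split_by_dual_tags(reads):
    """Group UMIs by (second tag, first tag); each key is treated as one putative cell."""
    cells = defaultdict(set)
    for tag1, tag2, umi in reads:
        cells[(tag2, tag1)].add(umi)
    return dict(cells)


# Hypothetical reads: BEAD_1 marks a droplet that captured two cells from different subsets.
reads = [
    ("TAG1_A", "BEAD_1", "UMI1"),
    ("TAG1_A", "BEAD_1", "UMI2"),
    ("TAG1_B", "BEAD_1", "UMI3"),
    ("TAG1_B", "BEAD_2", "UMI4"),
]
cells = split_by_dual_tags(reads)
```

Grouping by the second tag alone would merge the first three reads into one "pseudo-single cell"; adding the first tag separates them. Two cells that share both tags (same subset, same droplet) remain indistinguishable in this scheme.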
此外,本申请的标记核酸分子的方法:1)兼容当前主要的转录组建库技术和平台(包括,基于微流控液滴的高通量单细胞转录组建库技术,基于微孔板的高通量单细胞转录组建库技术等),可方便地进行商业化应用;2)兼容基于细胞或细胞核的建库方案,突破了样本限制(例如,可基于冷冻样品建立单细胞和单细胞核转录组文库)。
因此,在一个方面,本申请提供了一种处理细胞或细胞核以产生核酸片段群的方法,其包括下述步骤:
(1)提供一个或多个细胞或细胞核;
(2)对所述细胞或细胞核内的RNA(例如,mRNA、长链非编码RNA、eRNA)进行包括逆转录步骤的处理,形成含有cDNA链的双链核酸(例如,含有RNA(例如,mRNA、长链非编码RNA、eRNA)链和cDNA链的杂合双链核酸);
(3)将所述双链核酸(例如,所述杂合双链核酸)与转座酶复合体孵育;其中,所述转座酶复合体含有转座酶和所述转座酶能够识别并结合的转座序列,且能够切割或断裂双链核酸(例如,含有RNA和DNA的杂合双链核酸);并且,所述转座序列包含转移链和非转移链;其中,所述转移链包含转座酶识别序列,第一标签序列,以及,第一共有序列;其中,所述第一标签序列位于所述转座酶识别序列的上游(例如5’端),且,所述第一共有序列位于所述第一标签序列的上游(例如5’端);并且所述孵育在允许所述双链核酸(例如,所述杂合双链核酸)被所述转座酶复合体断裂成核酸片段且所述转移链被连接至所述核酸片段的末端(例如,所述核酸片段的5’端)的条件下进行;
从而,形成核酸片段群;其中,所述核酸片段包含cDNA片段,以及连接至所述cDNA片段的5’端的转移链的序列。
在某些优选的实施方案中,所述核酸片段从5’端至3’端包含第一共有序列,第一标签序列,转座酶识别序列和cDNA片段。
在某些优选的实施方案中,在进行步骤(2)之前,对细胞进行透化和/或固定处理。在某些示例性的实施方案中,在进行步骤(2)之前,可使用甲醇和/或甲醛对细胞进行处理。
不受理论限制,在本申请的方法中,可使用各种已知的方法对细胞进行透化处理。此类透化处理可使得各种反应试剂(包括例如,酶例如逆转录酶和转座酶,核酸分子例如逆转录引物和转座序列)能够透过细胞膜,进入细胞内,并发挥功能。在某些示例性实施方案中,可使用甲醇对细胞进行透化处理。
不受理论限制,在本申请的方法中,可使用各种已知的方法对细胞进行固定处理。在某些示例性实施方案中,可使用甲醛对细胞进行固定处理。
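The method of the present application can be used to process one or more cells or nuclei. In certain preferred embodiments, at least 2 (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, or more) cells or nuclei are provided in step (1).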
当本申请的方法用于处理多个细胞或细胞核时,可以将待处理的细胞或细胞核进行分组,并且对于各组细胞或细胞核,可以使用相同或不同的转座序列进行处理,由此,可以对衍生自不同组的细胞或细胞核的核酸分子标记上相同或者不同的序列(例如第一标签序列)。
因此,在某些优选的实施方案中,在进行步骤(3)之前(例如,在进行步骤(2)之前,或者在进行步骤(2)之后且在进行步骤(3)之前),将所述细胞或细胞核分成至少2个(例如,至少3个,至少4个,至少5个,至少8个,至少10个,至少12个,至少20个,至少24个,至少50个,至少96个,至少100个,至少200个,至少384个,至少400个,或更多个)亚集,其中,每个亚集含有至少一个细胞或细胞核。
在某些优选的实施方案中,在步骤(3)中,将各个亚集的细胞或细胞核内的所述双链核酸(例如,杂合双链核酸)分别与转座酶复合体孵育。
在某些优选的实施方案中,对于每个亚集,所述转座酶复合体具有彼此不同的第一标签序列,由此,从各个亚集的细胞或细胞核所产生的核酸片段含有彼此不同的第一标签序列。
在某些优选的实施方案中,对于每个亚集,所述转座酶复合体具有相同的转座酶,相同的转座酶识别序列,相同的第一共有序列,和/或,相同的非转移链。
在某些优选的实施方案中,对于每个亚集,除了所述第一标签序列之外,所述转座酶复合体具有相同的转座酶,相同的转座酶识别序列,相同的第一共有序列,以及,相同的非转移链。在此类实施方案中,从各个亚集的细胞或细胞核产生的核酸片段具有相同的第一共有序列和转座酶识别序列;且从同一亚集的细胞或细胞核产生的核酸片段具有相同的第一标签序列;且从不同亚集的细胞或细胞核产生的核酸片段具有彼此不同的第一标签序列。
在某些优选的实施方案中,各个亚集所产生的核酸片段具有相同的第一共有序列,且同一亚集所产生的核酸片段具有相同的第一标签序列,且不同亚集所产生的核酸片段具有彼此不同的第一标签序列。
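As will be readily understood, in embodiments that use multiple first tag sequences, the first tag sequence can be used to determine the subset from which a cell or nucleus originates and to distinguish cells or nuclei that originate from different subsets. Therefore, after transposition, the cells or nuclei of different subsets can be pooled, and the first tag sequence can be used to tell them apart. Accordingly, in certain preferred embodiments, the cells or nuclei of at least 2 subsets are pooled after step (3). In certain preferred embodiments, the cells or nuclei of all subsets are pooled after step (3).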
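Assigning pooled material back to its subset by first tag sequence can be sketched as follows. This is a minimal illustration under stated assumptions: the tag sequences and subset names are invented, and the tolerance of one mismatch is an illustrative choice that the application does not specify.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_subset(observed_tag, tag_to_subset, max_mismatches=1):
    """Return the subset whose first tag matches uniquely, or None if there is no unique match."""
    hits = [
        subset
        for tag, subset in tag_to_subset.items()
        if len(tag) == len(observed_tag) and hamming(tag, observed_tag) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 6-nt first tags, one per subset.
FIRST_TAGS = {"ACGTAC": "subset_1", "TGCATG": "subset_2", "GATCGA": "subset_3"}

exact = assign_subset("ACGTAC", FIRST_TAGS)
one_error = assign_subset("ACGTAA", FIRST_TAGS)
unknown = assign_subset("NNNNNN", FIRST_TAGS)
```

Returning `None` for ambiguous or unmatched tags keeps such reads out of every subset instead of guessing.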
易于理解,本申请的方法适用于任何细胞或细胞核,包括但不限于癌细胞、干细胞、神经细胞、胎儿细胞和参与免疫应答的免疫细胞或其细胞核。此类细胞的详细描述提供于上文的术语定义部分,但不限于其中所列举的具体实例。在某些优选的实施方案中,所述细胞是来自动物、植物或微生物的细胞或细胞系,或其任何组合。在某些优选的实施方案中,所述细胞是来自哺乳动物(例如人)的细胞或细胞系,或其任何组合。在某些优选的实施方案中,所述细胞是癌细胞、干细胞、神经细胞、胎儿细胞、免疫细胞,或其任何组合。在某些优选的实施方案中,所述细胞是免疫细胞,例如B细胞或T细胞。相应地,来自所述细胞的细胞核可用于本申请的方法中。在某些优选的实施方案中,所述细胞核来自免疫细胞,例如B细胞或T细胞。在某些优选的实施方案中,所述核酸片段群包含T细胞受体基因或基因产物,或B细胞受体基因或基因产物。
不受理论限制,可使用各种逆转录酶来实施逆转录反应。在某些优选的实施方案中,使用逆转录酶对所述RNA(例如,mRNA、长链非编码RNA、eRNA)进行逆转录,形成含有RNA(例如,mRNA、长链非编码RNA、eRNA)链和cDNA链的杂合双链核酸。
在某些情况下,在杂合双链核酸的cDNA链的3’端形成或添加悬突是有利的,例如可用于后续的核酸操作。因此,在某些优选的实施方案中,所述杂合双链核酸在cDNA链的3’端具有悬突。在某些优选的实施方案中,所述悬突具有至少1个,至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。在某些优选的实施方案中,所述悬突为2-5个胞嘧啶核苷酸的悬突(例如CCC悬突)。
可使用各种合适的方法来在cDNA链的3’端形成或添加悬突。在某些实施方案中,可通过使用具有末端转移活性的逆转录酶来在cDNA链的3’端形成或添加悬突。因此,在某些优选的实施方案中,所述逆转录酶具有末端转移活性。在某些优选的实施方案中,所述逆转录酶能够以RNA(例如,mRNA、长链非编码RNA、eRNA)为模板,合成cDNA链,且在所述cDNA链的3’端添加悬突。在某些优选的实施方案中,所述 逆转录酶能够在cDNA链的3’末端添加长度为至少1个,至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的悬突。在某些优选的实施方案中,所述逆转录酶能够在cDNA链的3’末端添加2-5个胞嘧啶核苷酸的悬突(例如CCC悬突)。
易于理解,任何能够以RNA分子为模板合成cDNA链,并在所述cDNA链的3’端添加悬突的逆转录酶(即,具有末端转移活性的逆转录酶)均适用于本方法。具有末端转移活性的逆转录酶的实例包括但不限于,M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶和端粒酶逆转录酶。此外,为避免RNA的不必要的降解,所使用的逆转录酶优选地不具有或者具有降低的RNase活性(特别是RNase H活性)。因此,在某些优选的实施方案中,所述逆转录酶选自,经修饰或突变以去除RNase活性(特别是RNase H活性)的M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶和端粒酶逆转录酶(例如,不具有RNase H活性的M-MLV逆转录酶)。
在某些优选的实施方案中,使用包含poly(T)序列的引物和/或包含随机寡核苷酸序列的引物对所述RNA(例如,mRNA、长链非编码RNA、eRNA)进行逆转录。在某些优选的实施方案中,所述poly(T)序列和/或随机寡核苷酸序列位于所述引物的3’端。在某些优选的实施方案中,所述poly(T)序列包含至少5个(例如,至少10个、至少15个、或至少20个)胸腺嘧啶核苷酸残基。在某些优选的实施方案中,所述随机寡核苷酸序列具有5-30nt(例如,5-10nt,10-20nt,20-30nt)的长度。在某些优选的实施方案中,所述引物不包含修饰,或者包含修饰的核苷酸。
易于理解,任何能够与包含转座酶识别序列的组合物形成功能复合物并催化该包含转座酶识别序列的组合物部分或全部转座进入在转座反应中与该酶孵育的双链核酸分子中的转座酶均适用于本申请的方法。在某些优选的实施方案中,所述转座酶复合体能够随机切割或断裂含有RNA和DNA的杂合双链核酸。
在某些优选的实施方案中,所述转座酶选自Tn5转座酶、MuA转座酶、睡美人转座酶、Mariner转座酶、Tn7转座酶、Tn10转座酶、Ty1转座酶、Tn552转座酶,以及具有上述转座酶的转座活性的变体、修饰产物和衍生物。在某些优选的实施方案中,所述转座酶为Tn5转座酶。
在本申请的方法中,第一标签序列不受其组成或长度的限制,只要其能发挥标识作用即可。在某些优选的实施方案中,所述第一标签序列具有至少2个,至少3个, 至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。例如,所述第一标签序列的长度为4-8个核苷酸。在某些优选的实施方案中,所述第一标签序列连接(例如直接连接)至所述转座酶识别序列的5’端。
在本申请的方法中,第一共有序列不受其组成或长度的限制。在某些优选的实施方案中,所述第一共有序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少12个,至少15个,至少18个,至少20个,至少25个或更多个核苷酸的长度。例如,所述第一共有序列的长度为12-25个核苷酸。在某些优选的实施方案中,所述第一共有序列连接(例如直接连接)至所述第一标签序列的5’端。
在某些优选的实施方案中,所述转移链从5’端至3’端包含第一共有序列,第一标签序列,和转座酶识别序列。在某些优选的实施方案中,所述转座酶识别序列具有如SEQ ID NO:99所示的序列。该序列包含Tn5转座酶的识别序列。
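The 5'-to-3' layout of the transferred strand (first consensus sequence, first tag sequence, transposase recognition sequence) can be written out as a small sketch. All sequences below are placeholders chosen for illustration. In particular, the 19-nt Tn5 mosaic end is used as a stand-in recognition sequence; whether SEQ ID NO: 99 is identical to it cannot be confirmed from this text.

```python
# Placeholder elements; only their 5'->3' order is taken from the text.
C1 = "TCGTCGGCAGCGTC"        # hypothetical first consensus sequence (14 nt)
TAG1 = "ACGTAC"              # hypothetical first tag sequence (6 nt)
ME = "AGATGTGTATAAGAGACAG"   # 19-nt Tn5 mosaic end, stand-in for the recognition sequence


def build_transferred_strand(c1: str, tag1: str, recognition: str) -> str:
    """Concatenate the elements 5'->3': consensus, then tag, then recognition sequence."""
    return c1 + tag1 + recognition


def split_transferred_strand(strand: str, c1_len: int, tag1_len: int):
    """Recover the three elements from a strand of known element lengths."""
    c1 = strand[:c1_len]
    tag1 = strand[c1_len:c1_len + tag1_len]
    recognition = strand[c1_len + tag1_len:]
    return c1, tag1, recognition


strand = build_transferred_strand(C1, TAG1, ME)
parts = split_transferred_strand(strand, len(C1), len(TAG1))
```

Placing the recognition sequence at the 3' end mirrors the statement that the tag is upstream of the recognition sequence and the consensus sequence is upstream of the tag.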
在某些优选的实施方案中,所述非转移链能够与所述转移链退火或杂交形成双链体。在某些优选的实施方案中,所述非转移链包含与转移链中的转座酶识别序列互补的序列。在某些优选的实施方案中,所述非转移链具有如SEQ ID NO:1所示的序列。
可根据需要,对转移链或非转移链进行修饰或者不进行修饰。因此,在某些优选的实施方案中,所述转移链不包含修饰,或者包含修饰的核苷酸;和/或,所述非转移链不包含修饰,或者包含修饰的核苷酸。在某些优选的实施方案中,所述非转移链的5’末端具有磷酸基团修饰;和/或,所述非转移链的3’末端是封闭的(例如,所述非转移链的3’末端核苷酸为双脱氧的核苷酸)。
在某些优选的实施方案中,在步骤(3)中,在所述细胞或细胞核内形成所述核酸片段群。
在某些优选的实施方案中,所述核酸片段群用于构建转录组文库(例如,5’端转录组文库)或用于转录组测序(例如,5’端转录组测序)。
在某些优选的实施方案中,所述核酸片段群用于构建靶核酸(例如,V(D)J序列)的文库或用于靶核酸(例如,V(D)J序列)的测序。在某些优选的实施方案中,所述靶核酸含有细胞转录产生的目的核酸的序列或其互补序列。在某些优选的实施方案中,所述靶核酸包含,(1)编码T细胞受体(TCR)或B细胞受体(BCR)的核苷酸序列或其部分序列(例如,V(D)J序列),或(2)(1)的互补序列。在某些优选的实施方案中,所述靶核酸包含,V(D)J基因的序列或其互补序列。
在一个方面,本申请提供了一种生成经标记的核酸分子的方法,其包括下述步骤:
(a)提供:
一个或多个细胞或细胞核,所述细胞或细胞核是根据本申请上文所述的方法进行了处理的细胞或细胞核,其含有核酸片段群;和
一个或多个偶联寡核苷酸分子的珠粒,所述寡核苷酸分子含有标记序列;和
(b)使用所述核酸片段和所述寡核苷酸分子生成经标记的核酸分子,所述经标记的核酸分子从5'末端到3'末端包含所述核酸片段的序列以及所述标记序列的互补序列,或者包含所述标记序列以及所述核酸片段的互补序列。
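As will be readily understood, the method can be carried out with multiple cells or nuclei and multiple beads. In certain preferred embodiments, in step (a), at least 2 (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, or more) cells or nuclei are provided; and/or at least 2 (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, or more) beads are provided.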
可以在各种适当的反应体系中提供所述细胞或细胞核以及珠粒。在某些优选的实施方案中,在步骤(a)中,在微孔或液滴中(例如,在多个微孔或液滴中)提供所述细胞或细胞核,以及所述珠粒。在某些优选的实施方案中,所述液滴是油包水液滴。可以使用各种方式来制备含有细胞核或细胞与偶联寡核苷酸分子的珠粒的油包水液滴。例如,在某些示例性实施方案中,可以使用10X GENOMICS Chromium平台或控制器进行油包水液滴的制备。
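Coupling multiple oligonucleotide molecules to a bead is advantageous, because they can then capture multiple nucleic acid fragments from among the fragments inside a cell or nucleus. In certain preferred embodiments, the bead is coupled to a plurality of (e.g., at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, or more) oligonucleotide molecules.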
可使用各种已知的方法将寡核苷酸分子与珠粒偶联。此类方法在上文的术语定义部分进行了详细描述,并且不限于其中所列举的具体实例。此外,可以将寡核苷酸分子偶联至珠粒的表面或封闭在珠粒内。在某些优选的实施方案中,所述寡核苷酸分子偶联至珠粒的表面,和/或,封闭在珠粒内。
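In certain preferred embodiments, the bead can release the oligonucleotides spontaneously or upon exposure to one or more stimuli (e.g., a temperature change, a pH change, exposure to a particular chemical species or phase, exposure to light, a reducing agent, etc.).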
可以使用任何合适的材料来制备所述珠粒,并且所述珠粒可具有任何期望的尺寸、形状、粒径分布、和/或修饰,如上文术语定义部分所详细描述的。在某些优选的实施方案中,所述珠粒是凝胶珠粒。
可根据需要,设计和使用各种标记序列。在某些优选的实施方案中,所述标记序列包含选自下列的元件:第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列,模板转换序列,或其任何组合。在某些优选的实施方案中,所述标记序列包含第二共有序列,第二标签序列,独特分子标签序列和模板转换序列。在某些优选的实施方案中,所述标记序列还包含第一扩增引物序列。
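The template-switching sequence can be designed so that the oligonucleotide molecule (labeling sequence) captures (e.g., anneals or hybridizes to) nucleic acid fragments inside the cell or nucleus. In certain preferred embodiments, the template-switching sequence comprises a sequence complementary to the 3'-terminal overhang of the cDNA strand. In certain preferred embodiments, the overhang consists of 2-5 cytosine nucleotides (e.g., a CCC overhang), and the 3' end of the template-switching sequence comprises 2-5 guanine nucleotides (e.g., GGG). With a template-switching sequence designed in this way, nucleic acid fragments in the nucleus or cell that contain the 3' end of the cDNA strand are captured by the oligonucleotide molecule, and the two can anneal or hybridize.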
易于理解,模板转换序列不受其长度的限制。在某些优选的实施方案中,所述模板转换序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度。在某些优选的实施方案中,所述模板转换序列不包含修饰,或者包含修饰的核苷酸(例如锁核酸)。在某些情况下,修饰的核苷酸的使用是有利的。例如,修饰的核苷酸(例如锁核酸)可有助于增强模板转换序列与核酸片段之间的结合(碱基互补配对)。
在本申请的方法中,独特分子标签序列不受其组成或长度的限制,只要其能发挥标识作用即可。在某些优选的实施方案中,所述独特分子标签序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度。在某些优选的实施方案中,所述独特分子标签序列不包含修饰,或者包含修饰的核苷酸。
在本申请的方法中,第二标签序列不受其组成或长度的限制,只要其能发挥标识 作用即可。在某些优选的实施方案中,所述第二标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度。在某些优选的实施方案中,所述第二标签序列不包含修饰,或者包含修饰的核苷酸。
在本申请的方法中,第二共有序列不受其组成或长度的限制。在某些优选的实施方案中,所述第二共有序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度。在某些优选的实施方案中,所述第二共有序列不包含修饰,或者包含修饰的核苷酸。
在某些优选的实施方案中,所述珠粒偶联了多个寡核苷酸分子,并且,各个寡核苷酸分子具有彼此不同的独特分子标签序列。在某些优选的实施方案中,各个寡核苷酸分子具有相同的第二标签序列和/或相同的第二共有序列。
在某些优选的实施方案中,所述方法使用了多个珠粒,并且,每个珠粒各自具有多个寡核苷酸分子;并且,同一个珠粒上的所述多个寡核苷酸分子具有相同的第二标签序列,并且,不同珠粒上的寡核苷酸分子具有彼此不同的第二标签序列。借助于标记序列的这一设计,由同一个液滴中的核酸片段所生成的经标记的核酸分子可携带相同的第二标签序列或其互补序列,以及彼此不同的独特分子标签序列或其互补序列(用于标记同一个液滴内的不同核酸片段);由不同液滴中的核酸片段所生成的经标记的核酸分子可携带彼此不同的第二标签序列或其互补序列。
此外,为了便于后续的核酸操作,各个珠粒上的各个寡核苷酸分子可具有相同的第二共有序列和/或相同的第一扩增引物序列。由此,各个液滴中的核酸片段所生成的经标记的核酸分子可携带相同的第二共有序列或其互补序列和/或相同的第一扩增引物序列或其互补序列。因此,在某些优选的实施方案中,各个珠粒上的寡核苷酸分子具有相同的第二共有序列。在某些优选的实施方案中,各个珠粒上的寡核苷酸分子还具有相同的第一扩增引物序列。
可根据各个元件的期望功能,设置各个元件的排列顺序。例如,模板转换序列可用于捕获期望的核酸片段,并启动延伸反应。相应地,模板转换序列例如可设置于标记序列的3’末端。例如,第二共有序列和/或第一扩增引物序列可用于提供引物结合位点。相应地,第二共有序列和/或第一扩增引物序列例如可设置于标记序列的5’末端。 因此,在某些优选的实施方案中,所述模板转换序列位于所述标记序列的3’末端。在某些优选的实施方案中,所述第二共有序列位于所述第二标签序列,独特分子标签序列和/或模板转换序列的上游。在某些优选的实施方案中,所述第一扩增引物序列位于所述第二共有序列的上游。在某些优选的实施方案中,所述标记序列从5’端至3’端包含任选的第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列和模板转换序列。在某些优选的实施方案中,所述标记序列从5’端至3’端包含任选的第一扩增引物序列,第二共有序列,独特分子标签序列,第二标签序列和模板转换序列。
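The element order described above can be illustrated with a sketch that builds the oligonucleotides carried by one bead. Everything concrete here is an assumption made for illustration: the P1, C2 and TSO sequences are placeholders, the UMI length of 10 nt is arbitrary, and UMIs are drawn at random, so two oligonucleotides on one bead could by chance share a UMI.

```python
import random

# 5'->3' order of elements in the labeling sequence, as in the first layout described above.
ORDER = ("P1", "C2", "Tag2", "UMI", "TSO")


def make_bead_oligos(tag2, n_oligos, umi_len=10,
                     p1="AATGATACGG", c2="CTACACGACG", tso="TTTCTTATATGGG", seed=0):
    """Build oligo records for one bead: shared P1/C2/Tag2/TSO, a random UMI per oligo."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    oligos = []
    for _ in range(n_oligos):
        umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
        oligos.append({"P1": p1, "C2": c2, "Tag2": tag2, "UMI": umi, "TSO": tso})
    return oligos


def to_sequence(oligo):
    """Concatenate the elements of one oligo 5'->3'."""
    return "".join(oligo[name] for name in ORDER)


bead_oligos = make_bead_oligos(tag2="ACGTACGT", n_oligos=5)
sequences = [to_sequence(o) for o in bead_oligos]
```

Keeping the template-switching sequence last puts its GGG at the 3' end, where it can pair with the CCC overhang of the cDNA strand.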
在某些优选的实施方案中,在步骤(b)中,通过选自下列的方式,使所述核酸片段和所述寡核苷酸分子接触:
(b1)将细胞或细胞核裂解以释放核酸片段;
(b2)将寡核苷酸分子从珠粒上释放;或者
(b3)(b1)和(b2)的组合。
在某些优选的实施方案中,在步骤(b)中,所述寡核苷酸分子通过模板转换序列与含有cDNA链的3’末端悬突的核酸片段退火或杂交,其中,所述模板转换序列包含与所述cDNA链的3’末端悬突互补的序列;并且,所述核酸片段(或所述寡核苷酸分子)在核酸聚合酶(例如,DNA聚合酶或逆转录酶)的作用下,以所述寡核苷酸分子(或所述核酸片段)为模板被延伸,生成经标记的核酸分子。不受理论限制,可以使用各种合适的核酸聚合酶(例如,DNA聚合酶或逆转录酶)来进行延伸反应,只要其能够以寡核苷酸分子(或被捕获的核酸片段)为模板延伸被捕获的核酸片段(或寡核苷酸分子)即可。在某些优选的实施方案中,步骤(b)中使用的核酸聚合酶与步骤(2)中使用的逆转录酶是相同的。
在步骤(b)中,通常仅含有cDNA链3’末端的核酸片段能够通过cDNA链3’末端的悬突,而被寡核苷酸分子所捕获,因此,所生成的经标记的核酸分子通常将含有cDNA链3’末端的序列(其对应于RNA(例如,mRNA、长链非编码RNA、eRNA)的5’末端的序列)或其互补序列。由此,对所生成的经标记的核酸分子或其衍生物进行测序,可以获得细胞或细胞核内RNA(例如,mRNA、长链非编码RNA、eRNA)的5’末端的序列信息。
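In certain preferred embodiments, the labeled nucleic acid molecule comprises, from the 5' end to the 3' end, the labeling sequence and the complement of the nucleic acid fragment, wherein the nucleic acid fragment comprises a sequence complementary to the 5'-end sequence of an RNA (e.g., mRNA, long non-coding RNA, eRNA).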
在某些优选的实施方案中,所述经标记的核酸分子从5'末端到3'末端包含所述核酸片段的序列以及所述标记序列的互补序列,其中所述核酸片段包含与RNA(例如,mRNA、长链非编码RNA、eRNA)的5’端序列互补的序列。在某些优选的实施方案中,所述经标记的核酸分子从5'末端到3'末端包含第一共有序列,第一标签序列,转座酶识别序列,cDNA片段的序列,模板转换序列的互补序列,独特分子标签序列的互补序列,第二标签序列的互补序列,第二共有序列的互补序列,以及任选的第一扩增引物序列的互补序列。在某些优选的实施方案中,所述cDNA片段包含与RNA(例如,mRNA、长链非编码RNA、eRNA)的5’端序列互补的序列。
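The second layout above, in which the captured fragment is extended using the bead oligonucleotide as template, can be sketched as string operations. This is a simplified model with placeholder sequences: the cDNA 3' overhang is treated as part of the complement of the template-switching sequence, so the product is the fragment body followed by the reverse complement of the whole oligonucleotide.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unmodified DNA sequence written 5'->3'."""
    return seq.translate(_COMP)[::-1]


def labeled_molecule(c1, tag1, recognition, cdna_body, bead_oligo):
    """Fragment (C1-Tag1-recognition-cDNA) extended by the complement of the bead oligo."""
    return c1 + tag1 + recognition + cdna_body + reverse_complement(bead_oligo)


# Placeholder elements (hypothetical sequences).
C1, TAG1, ME = "TCGTCGGCAGCGTC", "ACGTAC", "AGATGTGTATAAGAGACAG"
CDNA_BODY = "GATTACAGATTACA"
P1, C2, TAG2, UMI, TSO = "AATGATACGG", "CTACACGACG", "ACGTACGT", "TTGACCATGA", "TTTCTTATATGGG"

bead_oligo = P1 + C2 + TAG2 + UMI + TSO
molecule = labeled_molecule(C1, TAG1, ME, CDNA_BODY, bead_oligo)
tail = molecule[len(C1 + TAG1 + ME + CDNA_BODY):]
```

Because the oligonucleotide is read as a template, its elements appear in the product in reverse order and as complements: TSO, UMI, second tag, second consensus sequence, then the first amplification primer sequence.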
在某些优选的实施方案中,所述方法还包括:(c)回收和纯化所述经标记的核酸分子。
在某些优选的实施方案中,所述经标记的核酸分子用于构建转录组文库(例如,5’端转录组文库)或用于转录组测序(例如,5’端转录组测序)。
在某些优选的实施方案中,所述核酸片段群用于构建靶核酸(例如,V(D)J序列)的文库或用于靶核酸(例如,V(D)J序列)的测序。在某些优选的实施方案中,所述靶核酸含有细胞转录产生的目的核酸的序列或其互补序列。在某些优选的实施方案中,所述靶核酸包含,(1)编码T细胞受体(TCR)或B细胞受体(BCR)的核苷酸序列或其部分序列(例如,V(D)J序列),或(2)(1)的互补序列。在某些优选的实施方案中,所述靶核酸包含,V(D)J基因的序列或其互补序列。
在一个方面,本申请还提供了一种构建核酸分子文库的方法,其包括,
(i)根据本申请上文所述的生成经标记的核酸分子的方法生成多个经标记的核酸分子,以及,
(ii)回收和/或合并多个经标记的核酸分子,
从而获得核酸分子文库。
在某些优选的实施方案中,在步骤(ii)中,回收和/或合并由多个珠粒衍生的经标记的核酸分子。
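The labeled nucleic acid molecules can be enriched as needed. For example, a nucleic acid amplification reaction can be performed on the labeled nucleic acid molecules to produce an enriched product. Accordingly, in certain preferred embodiments, the method further comprises (iii) enriching the labeled nucleic acid molecules.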
在某些优选的实施方案中,在步骤(iii)中,对所述经标记的核酸分子进行核酸扩增反应,以产生富集产物。在某些优选的实施方案中,所述核酸扩增反应使用至少第一引物来进行,其中,所述第一引物能够与所述第一扩增引物序列的互补序列和/或所述第二共有序列的互补序列杂交或退火。任选地,所述核酸扩增反应还使用第二引物,所述第二引物能够与所述第一共有序列的互补序列杂交或退火。
在某些优选的实施方案中,所述第一引物含有:①所述第一扩增引物序列或其部分序列,或者②所述第二共有序列或其部分序列,或者③①和②的组合。
在某些优选的实施方案中,所述第二引物含有所述第一共有序列或其部分序列。
不受理论限制,可以使用各种合适的核酸聚合酶(例如,DNA聚合酶)来进行用于富集经标记的核酸分子的核酸扩增反应,只要其能够以经标记的核酸分子为模板进行扩增反应即可。在某些示例性实施方案中,可使用具有链置换活性的核酸聚合酶(例如,具有链置换活性的DNA聚合酶)来进行所述核酸扩增反应。在某些示例性实施方案中,可使用具有高保真性的核酸聚合酶(例如,具有高保真性的DNA聚合酶)来进行所述核酸扩增反应。在某些优选的实施方案中,步骤(iii)中的所述核酸扩增反应使用核酸聚合酶(例如DNA聚合酶;例如具有链置换活性和/或高保真性的DNA聚合酶)来进行。
在某些实施方案中,特别是在所述经标记的核酸分子包含所述核酸片段的序列以及所述标记序列的互补序列的实施方案中,优选地,所述方法还包括,在进行步骤(iii)之前,将所述寡核苷酸分子或模板转换序列降解的步骤。易于理解,将所述寡核苷酸分子或模板转换序列降解在某些情况下是有利的,其例如可以避免所述寡核苷酸分子或模板转换序列对核酸扩增反应的阻碍作用。
在某些优选的实施方案中,所述第一引物与所述经标记的核酸分子的退火温度高于所述寡核苷酸分子与所述经标记的核酸分子的退火温度。
在某些优选的实施方案中,所述方法还包括,(iv)回收和纯化步骤(iii)的富集产物。
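As will be readily understood, to facilitate the recovery and purification in step (iv) of the enriched product of step (iii), the nucleic acid amplification reaction in step (iii) can optionally be performed on the labeled nucleic acid molecules using a first primer carrying a label molecule and/or a second primer carrying a label molecule. In step (iv), the enriched product of step (iii) can then be recovered and purified using a binding molecule capable of interacting with the label molecule.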
在某些实施方案中,所述结合分子能与所述标记分子发生特异性相互作用或者非特异性相互作用。
在某些实施方案中,所述结合分子与所述标记分子通过选自下述的方式发生相互作用:正负电荷相互作用(例如多聚赖氨酸-糖蛋白),亲和相互作用(例如生物素-亲和素,生物素-链霉亲和素,抗原-抗体,受体-配体,酶-辅因子),点击化学反应(例如含炔基基团-叠氮基化合物),或其任意组合。
例如,所述标记分子为多聚赖氨酸,所述结合分子为糖蛋白;或者,所述标记分子为抗体,所述结合分子为能与所述抗体结合的抗原;或者,所述标记分子为生物素,所述结合分子为链霉亲和素。
例如,所述结合分子为多聚赖氨酸,所述标记分子为糖蛋白;或者,所述结合分子为抗体,所述标记分子为能与所述抗体结合的抗原;或者,所述结合分子为生物素,所述标记分子为链霉亲和素。
在某些实施方案中,步骤(iii)中,所述第一引物连接有第一标记分子,所述第一标记分子能与第一结合分子发生相互作用。在某些实施方案中,步骤(iv)中,使用所述第一结合分子回收和纯化步骤(iii)的富集产物。
在某些实施方案中,步骤(iii)中,使用至少所述第一引物和所述第二引物对所述经标记的核酸分子进行核酸扩增反应,以产生富集产物;其中,所述第一引物连接有第一标记分子,和/或,所述第二引物连接有第二标记分子;所述第一标记分子能与第一结合分子发生相互作用,所述第二标记分子能与第二结合分子发生相互作用。在某些实施方案中,步骤(iv)中,使用所述第一结合分子和/或所述第二结合分子回收和纯化步骤(iii)的富集产物。在某些实施方案中,所述第一标记分子与所述第二标记分子相同或不相同,和/或,所述第一结合分子与所述第二结合分子相同或不相同。
易于理解,为提高核酸扩增的效率,在某些实施方案中,步骤(iii)中,可以先使用不带所述标记分子的所述第一引物和/或不带所述标记分子的所述第二引物对所述经标记的核酸分子进行核酸扩增反应;然后,再使用连接有所述第一标记分子的所述第一引物和/或连接有所述第二标记分子的所述第二引物对所述经标记的核酸分子进行额外的核酸扩增反应。
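As will be readily understood, the detailed descriptions and definitions given above for the binding molecule apply equally to the first binding molecule and the second binding molecule, and those given above for the label molecule apply equally to the first label molecule and the second label molecule.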
在本申请的方法中,可以对回收的经标记的核酸分子或回收的富集产物进行核酸扩增反应,以产生用于测序的扩增产物。因此,在某些优选的实施方案中,所述方法还包括下述步骤:
(v)对步骤(ii)回收的经标记的核酸分子或步骤(iv)回收的富集产物进行核酸扩增反应,以产生扩增产物。
在某些优选的实施方案中,在步骤(v)中,所述核酸扩增反应使用至少第三引物和第四引物来进行。其中,所述第三引物能够与所述第一扩增引物序列的互补序列和/或所述第二共有序列的互补序列杂交或退火,且任选地含有第三标签序列;且,所述第四引物能够与所述第一共有序列的互补序列杂交或退火,且任选地含有第二扩增引物序列和/或第四标签序列。
在某些情况下,可以不使用第三和第四标签序列。在某些情况下,可以在第三引物中引入第三标签序列,而不在第四引物中引入第四标签序列。在某些情况下,可以在第四引物中引入第四标签序列,而不在第三引物中引入第三标签序列。在某些情况下,可以在第三和第四引物中分别引入第三和第四标签序列。不受理论限制,第三和/或第四标签序列可例如用于区分来自不同文库的经标记的核酸分子。
因此,在某些优选的实施方案中,所述第三引物含有所述第一扩增引物序列或其部分序列,任选的第三标签序列,以及任选的第二共有序列或其部分序列。
例如,所述第三引物含有:①所述第一扩增引物序列或其部分序列;或者,②所述第一扩增引物序列或其部分序列,以及所述第二共有序列或其部分序列,或者,③所述第一扩增引物序列或其部分序列,第三标签序列,以及所述第二共有序列或其部分序列。
在某些优选的实施方案中,所述第四引物含有第二扩增引物序列,任选的第四标签序列,以及第一共有序列或其部分序列。
例如,所述第四引物含有:①第二扩增引物序列,以及第一共有序列或其部分序列;或,②第二扩增引物序列,第四标签序列,以及第一共有序列或其部分序列。
在本申请的方法中,第三标签序列不受其组成或长度的限制,只要其能发挥标识 作用即可。在某些优选的实施方案中,所述第三标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。在某些优选的实施方案中,所述第三标签序列不包含修饰,或者包含修饰的核苷酸。
在本申请的方法中,第四标签序列不受其组成或长度的限制,只要其能发挥标识作用即可。在某些优选的实施方案中,所述第四标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。在某些优选的实施方案中,所述第四标签序列不包含修饰,或者包含修饰的核苷酸。
不受理论限制,可以使用各种合适的核酸聚合酶(例如,DNA聚合酶)来进行产生用于测序的扩增产物的核酸扩增反应,只要其能够以经标记的核酸分子或富集产物为模板进行核酸扩增反应(例如延伸第三引物和第四引物)即可。在某些示例性实施方案中,可使用具有链置换活性的核酸聚合酶(例如,具有链置换活性的DNA聚合酶)来进行所述核酸扩增反应。在某些示例性实施方案中,可使用具有高保真性的核酸聚合酶(例如,具有高保真性的DNA聚合酶)来进行所述核酸扩增反应。在某些优选的实施方案中,步骤(v)中的所述核酸扩增反应使用核酸聚合酶(例如DNA聚合酶;例如具有链置换活性和/或高保真性的DNA聚合酶)来进行。
用于富集经标记的核酸分子的核酸扩增反应和用于产生待测序的核酸分子的核酸扩增反应可使用相同或者不同的核酸聚合酶(例如DNA聚合酶)。因此,在某些优选的实施方案中,步骤(v)使用的核酸聚合酶(例如DNA聚合酶)与步骤(iii)相同或者不同。
在某些优选的实施方案中,所述核酸分子文库包含步骤(v)的扩增产物。
在某些优选的实施方案中,所述扩增产物的一条核酸链从5'末端到3'末端包含第二扩增引物序列,任选的第四标签序列,第一共有序列,第一标签序列,转座酶识别序列,cDNA片段的序列,模板转换序列的互补序列,独特分子标签序列的互补序列,第二标签序列的互补序列,第二共有序列的互补序列,任选的第三标签序列的互补序列,以及第一扩增引物序列的互补序列。在某些优选的实施方案中,所述cDNA片段包含与RNA(例如,mRNA、长链非编码RNA、eRNA)的5’端序列互补的序列。
在某些优选的实施方案中,所述核酸分子文库用于转录组测序(例如,5’端转录组 测序)或用于靶核酸(例如,V(D)J序列)的测序。在某些优选的实施方案中,所述靶核酸含有细胞转录产生的目的核酸的序列或其互补序列。在某些优选的实施方案中,所述靶核酸包含,(1)编码T细胞受体(TCR)或B细胞受体(BCR)的核苷酸序列或其部分序列(例如,V(D)J序列),或(2)(1)的互补序列。在某些优选的实施方案中,所述靶核酸包含,V(D)J基因的序列或其互补序列。
在某些实施方案中,所述方法还包括,对靶核酸分子进行富集的步骤。
易于理解,对靶核酸分子进行富集的步骤可在所述方法的步骤(i)之后的任意过程中进行。
在某些实施方案中,所述方法中,在所述步骤(ii)之后,在所述步骤(iii)之后,或者,在所述步骤(v)之后,对所述靶核酸分子进行富集。
可通过各种已知的方式来进行靶核酸分子的富集和回收。例如,可使用寡核苷酸探针来对所述多个经标记的核酸分子中的靶核酸分子进行特异性富集。在一些实施方案中,所述寡核苷酸探针含有能够与所述靶核酸分子特异性结合或退火的寡核苷酸序列。在一些实施方案中,所述寡核苷酸探针含有标记分子;并且,可使用一种或多种结合分子回收和纯化与所述寡核苷酸探针特异性结合或退火的靶核酸分子;其中,所述结合分子与所述标记分子能发生特异性相互作用或者非特异性相互作用。
在某些实施方案中,所述靶核酸分子包含:(i)编码T细胞受体(TCR)的核苷酸序列或其部分序列(例如,V(D)J序列),和/或,(ii)(i)的互补序列。
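In certain embodiments, the target nucleic acid molecules among the plurality of labeled nucleic acid molecules can be specifically enriched using an oligonucleotide probe set comprising a first oligonucleotide probe and a second oligonucleotide probe; wherein the first oligonucleotide probe comprises a first specific oligonucleotide sequence capable of specifically binding or annealing to a nucleotide sequence encoding the constant region of the TCR α chain, or to its complement, and a first label molecule; the second oligonucleotide probe comprises a second specific oligonucleotide sequence capable of specifically binding or annealing to a nucleotide sequence encoding the constant region of the TCR β chain, or to its complement, and a second label molecule; and the target nucleic acid molecules annealed to the oligonucleotide probes can be recovered and purified using a first binding molecule capable of interacting with the first label molecule and/or a second binding molecule capable of interacting with the second label molecule. In certain embodiments, the first label molecule is the same as the second label molecule. In certain embodiments, the first label molecule is different from the second label molecule.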
In certain embodiments, the target nucleic acid molecules can be enriched with reference to the method described in Tu, A.A. et al. TCR sequencing paired with massively parallel 3' RNA-seq reveals clonotypic T cell signatures. Nat Immunol 20, 1692-1699 (2019).
在某些实施方案中,所述靶核酸分子包含:(i)编码B细胞受体(BCR)的核苷酸序列或其部分序列(例如,V(D)J序列),和/或,(ii)(i)的互补序列。
在某些实施方案中,可使用包含第三寡核苷酸探针和第四寡核苷酸探针的寡核苷酸探针组对所述多个经标记的核酸分子中的所述靶核酸分子进行特异性富集;其中,所述第三寡核苷酸探针含有能够与编码BCR的轻链恒定区的核苷酸序列或其互补序列特异性结合或退火的第三特异性寡核苷酸序列,以及第三标记分子;所述第四寡核苷酸探针含有能够与编码BCR的重链恒定区的核苷酸序列或其互补序列特异性结合或退火的第四特异性寡核苷酸序列,以及第四标记分子;并且,可使用能与所述第三标记分子发生相互作用的第三结合分子和/或能与所述第四标记分子发生相互作用的第四结合分子回收和纯化与所述寡核苷酸探针退火的靶核酸分子。在某些实施方案中,所述第三标记分子与所述第四标记分子相同。在某些实施方案中,所述第三标记分子与所述第四标记分子不相同。
易于理解,上文中对于有关标记分子和结合分子的详细描述和定义同样适用于此处。
在一个方面,本申请还提供了一种对细胞或细胞核进行核酸测序的方法,其包括:
根据本申请上文所述的方法构建核酸分子文库;和,
对所述核酸分子文库进行测序。
在某些优选的实施方案中,在测序之前,将至少2个,至少3个,至少4个,至少5个,至少8个,至少10个,至少12个,至少15个,至少18个,至少20个,至少25个或更多个核酸分子文库合并,然后进行测序;其中,每个核酸分子文库各自具有多个核酸分子(即,扩增产物),且同一个文库中的所述多个核酸分子具有相同的第三标签序列或者相同的第四标签序列;且,来源于不同文库的核酸分子具有彼此不同的第三标签序列或者彼此不同的第四标签序列。
在一个方面,本申请还提供了一种核酸分子文库,其包含多个核酸分子,其中,
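one nucleic acid strand of each nucleic acid molecule comprises, from the 5' end to the 3' end, a first consensus sequence, a first tag sequence, a transposase recognition sequence, the sequence of a cDNA fragment, the complement of a template-switching sequence, the complement of a unique molecular identifier sequence, the complement of a second tag sequence, and the complement of a second consensus sequence, wherein the cDNA fragment comprises a sequence complementary to the 5'-end sequence of an RNA (e.g., mRNA, long non-coding RNA, eRNA).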
在某些优选的实施方案中,各个核酸分子的所述核酸链具有相同的第一共有序列,相同的转座酶识别序列,相同的模板转换序列的互补序列,和相同的第二共有序列的互补序列。
在某些优选的实施方案中,cDNA片段衍生自同一个细胞的核酸分子的所述核酸链具有相同的第一标签序列,和相同的第二标签序列的互补序列。
在某些优选的实施方案中,所述核酸链还具有位于第一共有序列上游的第二扩增引物序列和任选的第四标签序列。
在某些优选的实施方案中,所述核酸链还具有位于第二共有序列的互补序列下游的任选的第三标签序列的互补序列和第一扩增引物序列的互补序列。
易于理解,所述核酸分子文库可使用本申请提供的构建核酸分子文库的方法来构建。因此,上文中对于各个元件(包括但不限于,所述第二扩增引物序列,第四标签序列,第一共有序列,第一标签序列,转座酶识别序列,cDNA片段,模板转换序列,独特分子标签序列,第二标签序列,第二共有序列,第三标签序列,和/或第一扩增引物序列)的详细描述和定义同样适用于本方面。
在某些优选的实施方案中,所述核酸分子文库是转录组文库。
在某些优选的实施方案中,所述核酸分子文库中的核酸分子衍生自免疫细胞。
在某些优选的实施方案中,所述免疫细胞选自B细胞和T细胞。
在某些优选的实施方案中,所述核酸分子文库是通过本申请提供的方法构建的。
在一个方面,本申请还提供了一种试剂盒,其包含:逆转录酶,转座酶,和一种或多种所述转座酶能够识别并结合的转座序列,其中,
所述转座酶和转座序列能够形成转座酶复合体,所述转座酶复合体能够切割或断裂双链核酸(例如,含有RNA和DNA的杂合双链核酸);并且,
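the transposon sequence comprises a transferred strand and a non-transferred strand, wherein the transferred strand comprises a transposase recognition sequence, a first tag sequence, and a first consensus sequence; the first tag sequence is located upstream of (e.g., 5' of) the transposase recognition sequence, and the first consensus sequence is located upstream of (e.g., 5' of) the first tag sequence.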
在某些优选的实施方案中,所述试剂盒包含至少2种(例如,至少3种,至少4种,至少5种,至少8种,至少10种,至少20种,至少50种,至少100种,至少200种,或更多种)转座序列。其中,各种转座序列具有彼此不同的第一标签序列。在某些优选的实施方案中,各种转座序列具有相同的转座酶识别序列,相同的第一共有序列,和/或,相同的非转移链。
在某些优选的实施方案中,所述逆转录酶具有末端转移活性。在某些优选的实施方案中,所述逆转录酶能够以RNA(例如,mRNA、长链非编码RNA、eRNA)为模板,合成cDNA链,且在所述cDNA链的3’端添加悬突。在某些优选的实施方案中,所述逆转录酶能够在cDNA链的3’末端添加长度为至少1个,至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的悬突。在某些优选的实施方案中,所述逆转录酶能够在cDNA链的3’末端添加2-5个胞嘧啶核苷酸的悬突(例如CCC悬突)。在某些优选的实施方案中,所述逆转录酶不具有或者具有降低的RNase活性(特别是RNase H活性)。在某些优选的实施方案中,所述逆转录酶选自,经修饰或突变以去除RNase活性(特别是RNase H活性)的M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶和端粒酶逆转录酶(例如,不具有RNase H活性的M-MLV逆转录酶)。
在某些优选的实施方案中,所述转座酶选自Tn5转座酶、MuA转座酶、睡美人转座酶、Mariner转座酶、Tn7转座酶、Tn10转座酶、Ty1转座酶、Tn552转座酶,以及具有上述转座酶的转座活性的变体、修饰产物和衍生物。在某些优选的实施方案中,所述转座酶为Tn5转座酶。
在某些优选的实施方案中,所述第一标签序列连接(例如直接连接)至所述转座酶识别序列的5’端。
在某些优选的实施方案中,所述第一共有序列连接(例如直接连接)至所述第一标签序列的5’端。
在某些优选的实施方案中,所述转移链从5’端至3’端包含第一共有序列,第一标签序列,和转座酶识别序列。在某些优选的实施方案中,所述转座酶识别序列具有如SEQ ID NO:99所示的序列。
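In certain preferred embodiments, the non-transferred strand can anneal or hybridize to the transferred strand to form a duplex. In certain preferred embodiments, the non-transferred strand comprises a sequence complementary to the transposase recognition sequence in the transferred strand. In certain preferred embodiments, the non-transferred strand has the sequence shown in SEQ ID NO: 1.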
在某些优选的实施方案中,所述转移链不包含修饰,或者包含修饰的核苷酸;和/或,所述非转移链不包含修饰,或者包含修饰的核苷酸。在某些优选的实施方案中,所述非转移链的5’末端具有磷酸基团修饰;和/或,所述非转移链的3’末端是封闭的(例如,所述非转移链的3’末端核苷酸为双脱氧的核苷酸)。
在某些优选的实施方案中,所述试剂盒还包含逆转录引物,例如包含poly(T)序列的引物和/或包含随机寡核苷酸序列的引物。在某些优选的实施方案中,所述poly(T)序列或所述随机寡核苷酸序列位于所述引物的3’端。在某些优选的实施方案中,所述poly(T)序列包含至少5个(例如,至少10个、至少15个、或至少20个)胸腺嘧啶核苷酸残基。在某些优选的实施方案中,所述随机寡核苷酸序列具有5-30nt(例如,5-10nt,10-20nt,20-30nt)的长度。在某些优选的实施方案中,所述引物不包含修饰,或者包含修饰的核苷酸。
在某些优选的实施方案中,本申请所述的试剂盒还包含,用于构建转录组测序文库的试剂。
在某些优选的实施方案中,所述用于构建转录组测序文库的试剂包括:偶联寡核苷酸分子的珠粒,所述寡核苷酸分子含有标记序列。
在某些优选的实施方案中,所述寡核苷酸分子偶联至珠粒的表面,和/或,封闭在珠粒内。
在某些优选的实施方案中,所述珠粒能够自发地或在暴露于一种或多种刺激(例如,温度变化、pH变化、暴露于特定化学物质或相、暴露于光、还原剂等)时释放所述寡核苷酸。
在某些优选的实施方案中,所述珠粒是凝胶珠粒。
在某些优选的实施方案中,所述标记序列包含选自下列的元件:第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列,模板转换序列,或其任何组合。
在某些优选的实施方案中,所述标记序列包含第二共有序列,第二标签序列,独特分子标签序列和模板转换序列。在某些优选的实施方案中,所述标记序列还包含第一扩增引物序列。
在某些优选的实施方案中,所述模板转换序列包含与所述逆转录酶在cDNA链的3’末端添加的悬突互补的序列。在某些优选的实施方案中,所述悬突为2-5个胞嘧啶核苷酸的悬突(例如CCC悬突),且所述模板转换序列的3’末端包含2-5个鸟嘌呤核苷酸(例如GGG)。在某些优选的实施方案中,所述模板转换序列不包含修饰,或者包含修饰的核苷酸(例如锁核酸)。
在某些优选的实施方案中,所述独特分子标签序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。在某些优选的实施方案中,所述独特分子标签序列不包含修饰,或者包含修饰的核苷酸。
在某些优选的实施方案中,所述第二标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。在某些优选的实施方案中,所述第二标签序列不包含修饰,或者包含修饰的核苷酸。
在某些优选的实施方案中,所述第二共有序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度。在某些优选的实施方案中,所述第二共有序列不包含修饰,或者包含修饰的核苷酸。
在某些优选的实施方案中,所述珠粒偶联了多个寡核苷酸分子,并且,各个寡核苷酸分子具有彼此不同的独特分子标签序列。在某些优选的实施方案中,各个寡核苷酸分子具有相同的第二标签序列和/或相同的第二共有序列。
在某些优选的实施方案中,所述试剂含有多个珠粒,并且,每个珠粒各自具有多个寡核苷酸分子;并且,同一个珠粒上的所述多个寡核苷酸分子具有相同的第二标签序列,并且,不同珠粒上的寡核苷酸分子具有彼此不同的第二标签序列。在某些优选的实施方案中,各个珠粒上的寡核苷酸分子具有相同的第二共有序列。在某些优选的实施方案中,各个珠粒上的寡核苷酸分子还具有相同的第一扩增引物序列。
在某些优选的实施方案中,所述模板转换序列位于所述标记序列的3’末端。
在某些优选的实施方案中,所述第二共有序列位于所述第二标签序列,独特分子标签序列和/或模板转换序列的上游。
在某些优选的实施方案中,所述第一扩增引物序列位于所述第二共有序列的上游。
在某些优选的实施方案中,所述标记序列从5’端至3’端包含任选的第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列和模板转换序列。
在某些优选的实施方案中,所述试剂盒还包含矿物油,缓冲液,dNTP,一种或多 种核酸聚合酶(例如DNA聚合酶;例如具有链置换活性和/或高保真性的DNA聚合酶),用于回收或纯化核酸的试剂(例如磁珠),用于扩增核酸的引物(例如上文所定义的第一引物,第二引物,第三引物,第四引物,或其任何组合),或其任何组合。
在某些优选的实施方案中,所述试剂盒还包含用于测序的试剂,例如用于二代测序的试剂。
易于理解,本申请所述的试剂盒可用于实施本申请提供的方法(例如,如上文所述的处理细胞或细胞核以产生核酸片段群的方法;生成经标记的核酸分子的方法;构建核酸分子文库的方法;和/或,对细胞或细胞核进行转录组测序的方法)。因此,上文对于各种组分和元件(包括但不限于,逆转录酶,转座酶,转座序列,寡核苷酸分子,珠粒,标记序列,核酸聚合酶,第一、第二、第三和第四引物,以及其中的元件(例如,第二扩增引物序列,第四标签序列,第一共有序列,第一标签序列,转座酶识别序列,cDNA片段,模板转换序列,独特分子标签序列,第二标签序列,第二共有序列,第三标签序列,和/或第一扩增引物序列))的详细描述和定义同样适用于本方面。
在一个方面,本申请还提供了所述方法(例如,如本文所述的,处理细胞或细胞核以产生核酸片段群的方法;生成经标记的核酸分子的方法;和/或,构建核酸分子文库的方法)或所述试剂盒用于构建核酸分子文库或用于进行转录组测序的用途。
在某些优选的实施方案中,所述核酸分子文库用于进行转录组测序(例如,单细胞转录组测序)。
在某些优选的实施方案中,所述方法或试剂盒用于进行单细胞转录组测序。在某些优选的实施方案中,所述方法或试剂盒用于分析细胞或细胞核(例如,免疫细胞或其细胞核)的基因表达水平,基因转录起始位置,和/或,RNA(例如,mRNA、长链非编码RNA、eRNA)分子的5’末端序列。
在某些优选的实施方案中,所述方法或试剂盒用于构建细胞或细胞核(例如,免疫细胞或其细胞核)的转录组文库或用于进行细胞或细胞核(例如,免疫细胞或其细胞核)的转录组测序。
在某些优选的实施方案中,所述免疫细胞选自B细胞和T细胞。
发明的有益效果
本申请提供了一种新的标记核酸分子(例如RNA分子,例如mRNA分子、长链非编码RNA、eRNA)的方法,其中,所产生的经标记的核酸分子可以方便地用于构建核酸分子文库(特别是转录组测序文库),且可以方便地用于高通量测序(特别是,高通量的单细胞转录组测序)。本申请的方法具有一个或多个选自下列的有益技术效果:
(1)利用本申请方法构建的核酸分子文库含有RNA分子(例如,mRNA分子、长链非编码RNA、eRNA)的5’末端序列的信息;相应地,从所述核酸分子文库获得的高通量测序数据不仅可以用于分析转录组中的RNA分子(例如,mRNA分子、长链非编码RNA、eRNA)丰度,而且可以用于分析RNA分子(例如,mRNA分子、长链非编码RNA、eRNA)的转录起始位置,分析RNA分子(例如,mRNA分子)的5’末端序列。因此,本申请方法可方便地用于分析TCR和BCR的序列,与V(D)J测序方法兼容。
(2)本申请的标记核酸分子(例如RNA分子,例如mRNA分子、长链非编码RNA、eRNA)的方法兼容当前主要的转录组建库技术和平台(包括,基于微流控液滴的高通量单细胞转录组建库技术,基于微孔板的高通量单细胞转录组建库技术等),可方便地进行商业化应用。
(3)利用本申请方法构建的核酸分子文库可以显著降低“假单细胞率”对测序过程和测序数据的不利影响。当前,用于转录组测序的主要建库和测序方案(特别是5’末端建库和测序方案)受到“假单细胞现象”的严重限制。通常情况下,由“假单细胞”产生的测序数据因无法准确反映单个细胞的转录组信息,而需要被过滤和去除。这导致了测序数据的浪费,和测序成本的提高。利用本申请方法构建的核酸分子文库具有双重的细胞标签(例如第一标签和第二标签),这使得能够对由“假单细胞”产生的测序数据进行拆分,进而准确追踪和确定测序数据的细胞来源。换言之,通过利用本申请的方法,即使是由“假单细胞”产生的测序数据,也能够被使用;建库过程中出现的“假单细胞现象”的负面影响被大大降低。相应地,测序数据的利用率大大提高,测序成本显著降低。
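(4) Nucleic acid molecule libraries constructed with the method of the present application allow the empty rate of micro-reaction systems during library construction to be greatly reduced. Because of the "pseudo-single-cell phenomenon", the main current transcriptome library construction and sequencing schemes (especially 5'-end schemes) require a very high empty rate of micro-reaction systems (close to 99%); in other words, only about 1 in 100 micro-reaction systems is loaded with a cell. This wastes reaction systems and reagents and makes library construction very expensive. The method of the present application largely eliminates the negative impact of the pseudo-single-cell phenomenon, so that a higher cell throughput can be used during library construction (e.g., cell throughput can be increased at least 5-fold, at least 10-fold, or even 100-fold), which greatly reduces the empty rate of micro-reaction systems and the cost of library construction. For example, in certain exemplary embodiments, with the method of the present application the cell throughput of a single library construction can be increased to 100,000 to 1,000,000 cells.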
(5)本申请方法单次反应构建的核酸分子文库,可以来自不同的样品(例如,来自2个,或者94个,或者384个人体的细胞)。单次反应的样品数量,取决于第一标签的种类,易于扩展,理论上没有上限。本申请方法以较低成本大幅增加了单次反应的样品通量,适用于大规模人群样本的单细胞转录组测序,大规模人群样本的免疫图谱分析,基于细胞系和类器官的多种条件的药物筛选,大规模并行的基因编辑结果的单细胞解读等应用场景。
(6)本申请方法能够兼容基于细胞或细胞核的建库方案,并且,能够对单个细胞或细胞核中的RNA分子(例如,mRNA分子、长链非编码RNA、eRNA)进行多重标记,应用更加灵活和广泛。
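The trade-off between empty rate and pseudo-single cells described in effects (3) and (4) can be explored with a simple model. The sketch below is not taken from the application: it assumes that the number of cells per micro-reaction follows a Poisson distribution with mean `lam`, and that each cell carries one of `n_tags` first tags chosen uniformly at random. Under those assumptions it estimates the fraction of empty micro-reactions and the fraction of loaded micro-reactions that still contain two or more cells sharing the same first tag.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in one micro-reaction under Poisson loading."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def empty_fraction(lam: float) -> float:
    """Fraction of micro-reactions that receive no cell."""
    return math.exp(-lam)


def p_all_distinct(k: int, n_tags: int) -> float:
    """Probability that k cells with uniformly random first tags all carry different tags."""
    p = 1.0
    for i in range(k):
        p *= max(n_tags - i, 0) / n_tags
    return p


def unresolved_fraction(lam: float, n_tags: int, k_max: int = 50) -> float:
    """Among loaded micro-reactions, fraction holding >= 2 cells that share a first tag."""
    loaded = 1.0 - empty_fraction(lam)
    bad = sum(
        poisson_pmf(k, lam) * (1.0 - p_all_distinct(k, n_tags))
        for k in range(2, k_max + 1)
    )
    return bad / loaded


sparse_empty = empty_fraction(0.01)        # about 1 cell per 100 micro-reactions
single_tag = unresolved_fraction(1.0, 1)   # no first tag: every multiplet is unresolved
many_tags = unresolved_fraction(1.0, 96)   # 96 first tags at the same loading
```

With `n_tags = 1` the function reduces to the ordinary multiplet rate, and a mean loading of 0.01 reproduces an empty rate close to 99%. Real loading statistics and subset sizes may deviate from these idealized assumptions.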
下面将结合附图和实施例对本发明的实施方案进行详细描述,但是本领域技术人员将理解,下列附图和实施例仅用于说明本发明,而不是对本发明的范围的限定。根据附图和优选实施方案的下列详细描述,本发明的各种目的和有利方面对于本领域技术人员来说将变得显然。
附图说明
图1显示了,利用本发明的方法构建用于单细胞转录组测序的文库的示例性方案,以及,文库中用于测序的核酸分子的示例性结构。所述示例性方案包含以下步骤。
首先,将透化的细胞或细胞核分为一个或多个亚集(例如至少1个,至少2个,至少3个,至少4个,至少5个,至少8个,至少10个,至少20个,至少50个,至少100个,至少200个,或更多个亚集);并且,用逆转录酶(例如,具有末端转移活性的逆转录酶)和逆转录引物(例如,3’端携带poly(T)序列或随机寡核苷酸序列的逆转录引物)对细胞核/透化的细胞内的RNA分子(例如,mRNA分子、长链非编码RNA、eRNA) 进行逆转录,以生成cDNA,并在cDNA的3’端添加悬突(例如,包含3个胞嘧啶核苷酸的悬突)。细胞或细胞核内的RNA(例如,mRNA、长链非编码RNA、eRNA)与生成的cDNA形成了杂合双链核酸。在该步骤中,也可以先用逆转录酶(例如,具有末端转移活性的逆转录酶;例如,M-MLV逆转录酶)和逆转录引物(例如,3’端携带poly(T)序列或随机寡核苷酸序列的逆转录引物)对细胞核/透化的细胞内的RNA分子(例如,mRNA分子、长链非编码RNA、eRNA)进行逆转录,以生成包含RNA(例如,mRNA、长链非编码RNA、eRNA)与cDNA的杂合双链核酸,再将透化的细胞或细胞核分为多个亚集。可使用各种具有末端转移活性的逆转录酶来进行逆转录反应。在某些优选的实施方案中,所使用的逆转录酶不具有或者具有降低的RNase活性(特别是RNase H活性)。
其次,用能够切割或断裂杂合双链核酸的转座酶复合体(例如Tn5转座酶复合体)对包含RNA(例如,mRNA、长链非编码RNA、eRNA)与cDNA的杂合双链核酸进行转座,使所述杂合双链核酸随机断裂。在该步骤中,所使用的转座酶复合体含有转座酶(例如,Tn5转座酶)和所述转座酶能够识别并结合的转座序列(例如,含有Tn5转座酶识别序列的转座序列);其中,所述转座序列包含转移链和非转移链;所述转移链包含转座酶识别序列(Tn5转座酶识别序列;Tn5-S),第一标签序列(Tag1)和第一共有序列(C1);其中,所述第一标签序列位于所述转座酶识别序列的上游(例如5’端),且,所述第一共有序列位于所述第一标签序列的上游(例如5’端);并且,所述非转移链包含与转移链中的转座酶识别序列互补的序列。由此,在转座后,转座酶复合体将杂合双链核酸随机断裂成核酸片段,并且,携带第一标签序列和第一共有序列的转移链连接至断裂的cDNA链的5’端。
此外,在该步骤中,针对每个亚集,所使用的转座酶复合体具有彼此不同的第一标签序列;由此,从各个亚集的细胞或细胞核产生的核酸片段含有彼此不同的第一标签序列。在优选的实施方案中,除了所述第一标签序列之外,各个亚集所使用的转座酶复合体可以具有相同的转座酶,相同的转座酶识别序列,相同的第一共有序列,以及,相同的非转移链。由此,从各个亚集的细胞或细胞核产生的核酸片段具有相同的第一共有序列和转座酶识别序列;且从同一亚集的细胞或细胞核产生的核酸片段具有相同的第一标签序列;且从不同亚集的细胞或细胞核产生的核酸片段具有彼此不同的第一标签序列。
随后,将多个亚集(例如,所有亚集)的细胞核或细胞合并,并与多个偶联寡核苷酸 分子的珠粒接触,以生成经标记的核酸分子。在优选的实施方案中,在液滴(例如油包水液滴)中,使细胞核或细胞与偶联寡核苷酸分子的珠粒(例如,10X Genomics提供的用于转录组建库的珠粒)接触;其中,所述寡核苷酸分子含有标记序列。示例性的标记序列可含有第一扩增引物序列(P1),第二共有序列(C2),第二标签序列(Tag2),独特分子标签序列(UMI),模板转换序列(TSO),或其任何组合。在优选的实施方案中,标记序列可含有第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列,和模板转换序列。模板转换序列通常可位于标记序列的3’末端。第一扩增引物序列和/或第二共有序列通常可位于标记序列的5’末端。可以使用各种方式来制备含有细胞核或细胞与偶联寡核苷酸分子的珠粒的油包水液滴。例如,在某些示例性实施方案中,可以使用10X GENOMICS Chromium平台或控制器进行油包水液滴的制备。
在示例性实施方案中,模板转换序列可包含与cDNA链的3’末端悬突互补的序列。例如,当cDNA链的3’末端包含3个胞嘧啶核苷酸的悬突时,模板转换序列可在其3’端包含GGG。此外,还可以对模板转换序列的核苷酸进行修饰(例如,使用锁核酸),以增强模板转换序列与cDNA链的3’末端悬突之间的互补配对。借助于如此设计的模板转换序列,细胞核或细胞内含有cDNA链3’末端的核酸片段将被寡核苷酸分子所捕获,二者可以进行退火或杂交。随后,被捕获的核酸片段在核酸聚合酶的作用下,可以以寡核苷酸分子为模板进行延伸,在cDNA链3’末端添加标记序列的互补序列,从而生成5’端携带第一标签序列和第一共有序列、且3’端携带标记序列的互补序列的经标记的核酸分子。不受理论限制,可以使用各种合适的核酸聚合酶(例如,DNA聚合酶或逆转录酶)来进行延伸反应,只要其能够以寡核苷酸分子为模板延伸被捕获的核酸片段即可。在某些示例性实施方案中,可使用与前述逆转录步骤相同的逆转录酶来延伸被捕获的核酸片段。在该过程中,通常仅含有cDNA链3’末端的核酸片段能够通过cDNA链3’末端的悬突,而被寡核苷酸分子所捕获,因此,所生成的经标记的核酸分子通常将含有cDNA链3’末端的序列(其对应于RNA(例如,mRNA、长链非编码RNA、eRNA)的5’末端的序列)。由此,对所生成的经标记的核酸分子或其衍生物进行测序,可以获得细胞或细胞核内RNA(例如,mRNA)的5’末端的序列信息。
在示例性实施方案中,每个珠粒各自偶联有多个寡核苷酸分子;并且,同一个珠粒上的各个寡核苷酸分子可具有彼此不同的独特分子标签序列;并且,同一个珠粒上的各个寡核苷酸分子可具有相同的第二标签序列;并且,不同珠粒上的寡核苷酸分子 可具有彼此不同的第二标签序列。借助于标记序列的这一设计,由同一个液滴中的核酸片段所生成的经标记的核酸分子可携带相同的第二标签序列的互补序列,以及彼此不同的独特分子标签序列的互补序列(用于标记同一个液滴内的不同核酸片段);由不同液滴中的核酸片段所生成的经标记的核酸分子可携带彼此不同的第二标签序列的互补序列。
此外,为了便于后续的核酸操作,各个珠粒上的各个寡核苷酸分子可具有相同的第二共有序列和/或相同的第一扩增引物序列。由此,各个液滴中的核酸片段所生成的经标记的核酸分子可携带相同的第二共有序列的互补序列和/或相同的第一扩增引物序列的互补序列。例如,第一扩增引物序列可包含文库接头序列(例如,P5接头序列)。由此,可以在经标记的核酸分子的末端添加文库接头,从而便于后续的测序。随后,可回收和合并生成的多个经标记的核酸分子。
可根据需要,富集所述经标记的核酸分子。例如,可以对经标记的核酸分子进行核酸扩增反应,以产生富集产物。在示例性实施方案中,所述核酸扩增反应可使用至少第一引物来进行。在一些情况下,可以设计第一引物,以使其能够与所述第一扩增引物序列的互补序列和/或所述第二共有序列的互补序列杂交或退火。示例性的第一引物含有:所述第一扩增引物序列或其部分序列,或者所述第二共有序列或其部分序列,或者,二者的组合。不受理论限制,可以使用各种合适的核酸聚合酶(例如,DNA聚合酶)来进行用于富集经标记的核酸分子的核酸扩增反应,只要其能够以经标记的核酸分子为模板延伸第一引物即可。在某些示例性实施方案中,可使用具有链置换活性的核酸聚合酶(例如,具有链置换活性的DNA聚合酶)来进行所述核酸扩增反应。在某些示例性实施方案中,可使用具有高保真性的核酸聚合酶(例如,具有高保真性的DNA聚合酶)来进行所述核酸扩增反应。随后,可以回收和纯化生成的富集产物。易于理解,富集步骤并非是必需的,且可以视实际情况进行。
随后,可以对回收的经标记的核酸分子或回收的富集产物进行核酸扩增反应,以产生用于测序的扩增产物。在示例性实施方案中,所述核酸扩增反应可使用至少第三引物和第四引物来进行。可以设计第三引物,以使其能够与所述第一扩增引物序列的互补序列和/或所述第二共有序列的互补序列杂交或退火,且任选地含有第三标签序列(Tag3)。此外,可以设计第四引物,以使其能够与所述第一共有序列的互补序列杂交或退火,且任选地含有第二扩增引物序列(P2)和/或第四标签序列(Tag4)。
在某些情况下,可以不使用第三和第四标签。在某些情况下,可以在第三引物中引入第三标签,而不在第四引物中引入第四标签。在某些情况下,可以在第四引物中引入第四标签,而不在第三引物中引入第三标签。在某些情况下,可以在第三和第四引物中分别引入第三和第四标签。不受理论限制,第三和/或第四标签可例如用于区分来自不同文库的经标记的核酸分子。
因此,在示例性实施方案中,第三引物可含有:①所述第一扩增引物序列或其部分序列;或者,②所述第一扩增引物序列或其部分序列,以及所述第二共有序列或其部分序列,或者,③所述第一扩增引物序列或其部分序列,第三标签序列,以及所述第二共有序列或其部分序列。在示例性实施方案中,所述第四引物可含有:①第二扩增引物序列,以及第一共有序列或其部分序列;或,②含有第二扩增引物序列,第四标签序列,以及第一共有序列或其部分序列。
不受理论限制,可以使用各种合适的核酸聚合酶(例如,DNA聚合酶)来进行产生用于测序的扩增产物的核酸扩增反应,只要其能够以经标记的核酸分子或富集产物为模板延伸第三引物和第四引物即可。在某些示例性实施方案中,可使用具有链置换活性的核酸聚合酶(例如,具有链置换活性的DNA聚合酶)来进行所述核酸扩增反应。在某些示例性实施方案中,可使用具有高保真性的核酸聚合酶(例如,具有高保真性的DNA聚合酶)来进行所述核酸扩增反应。用于富集经标记的核酸分子的核酸扩增反应和用于产生待测序的核酸分子的核酸扩增反应可使用相同或者不同的核酸聚合酶(例如DNA聚合酶)。
在示例性实施方案中,第二扩增引物序列可包含文库接头序列(例如,P7接头序列)。由此,生成的扩增产物可在两端分别包含文库接头序列(例如,P5接头序列和P7接头序列),并且可用于后续的测序(例如,进行二代测序)。
图1还显示了通过上述示例性实施方案所构建的文库中,待测序的核酸分子(扩增产物)的一条核酸链的示例性结构,其包含:第二扩增引物序列(例如,P7接头序列),第四标签序列,第一共有序列,第一标签序列,转座酶识别序列,cDNA片段的序列,模板转换序列的互补序列,独特分子标签序列的互补序列,第二标签序列的互补序列,第二共有序列的互补序列,以及第一扩增引物序列(例如,P5接头序列)的互补序列;其中,所述cDNA片段包含与RNA(例如,mRNA)的5’端序列互补的序列。
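Figure 2 shows the gel electrophoresis result for the product obtained by transposing mouse genomic DNA with the Tn5 transposase complex. The results indicate that the Tn5 transposase complex can fragment mouse genomic DNA (23 kb before fragmentation) into bands of 300-600 bp.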
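Figure 3 shows how the number of captured (recovered) cells and the pseudo-single-cell rate relate to the number of loaded cells when libraries for 5' transcriptome sequencing are constructed with the 10X Genomics Chromium platform and kits. The results show that the pseudo-single-cell rate is a linear function of the number of loaded cells: the more cells are loaded, the higher the pseudo-single-cell rate.
Figure 4 shows the number and proportion of droplets containing different numbers of first tags (1-9 tags) among the water-in-oil droplets prepared from permeabilized cells or nuclei in Example 4. The results show that the proportion of droplets containing two or more first tags is 34.63% (nucleus sample) or 42.15% (cell sample).
Figure 5 shows the single-cell transcriptome analysis results obtained in Example 4 by 5' transcriptome library construction and sequencing using nuclei from HEK293T, HeLa, and K562 cells; Figure 5A shows the expression of marker genes for each cell line, and Figure 5B shows the clustering visualization for each cell line.
Figure 6 shows the single-cell transcriptome analysis results obtained in Example 4 by 5' transcriptome library construction and sequencing using permeabilized HEK293T, HeLa, and K562 cells; Figure 6A shows the expression of marker genes for each cell line, and Figure 6B shows the clustering visualization for each cell line.
Figure 7 shows the single-cell transcriptome analysis results obtained in Example 5 by 5' transcriptome library construction and sequencing of permeabilized-cell and nucleus samples of HeLa cells using three different reverse transcription primers; the x-axis is sequencing depth and the y-axis is the number of genes detected.
Figure 8 shows the analysis of the single-cell transcriptome data from T cells enriched from the peripheral blood of 14 individuals in Example 6. Figure 8A shows the visualization of the cell populations; Figure 8B shows the number of cells in each population.
Figure 9 shows the analysis of the TCR V(D)J region data from T cells enriched from the peripheral blood of 14 individuals in Example 6. Figure 9A shows a visualization of the distribution of T cells corresponding to the different TCR clonotypes detected; Figure 9B shows the number of cells in each population among the detected TCR clones.
Figures 10-11 show basic information on the TCR clones of T cells enriched from the peripheral blood of 14 individuals in Example 6. Figure 10 shows the distribution of the major TCR clonotypes detected in each of the 14 samples, indicating that TCR clonotypes are diverse across individuals; Figures 11A and 11B show the distribution of the TRB genes (Figure 11A) and TRA genes (Figure 11B) detected in the 14 individuals.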
序列信息
本发明涉及的序列的信息提供于下面的表1中。
表1
Figure PCTCN2021139123-appb-000001
Figure PCTCN2021139123-appb-000002
Figure PCTCN2021139123-appb-000003
Figure PCTCN2021139123-appb-000004
Figure PCTCN2021139123-appb-000005
注:N=A,T,G or C;“bio”表示生物素(biotin)修饰;V=A,G or C。
具体实施方式
现参照下列意在举例说明本发明(而非限定本发明)的实施例来描述本发明。
实施例中未注明具体条件者,按照常规条件或制造商建议的条件进行。所用试剂或仪器未注明生产厂商者,均为可以通过市购获得的常规产品。本领域技术人员知晓,实施例以举例方式描述本发明,且不意欲限制本申请所要求保护的范围。
实施例1:单端特异性寡核苷酸标记的TN5转座酶复合体的制备
本实施例所用试剂/仪器信息如表2所示:
表2试剂与仪器
Figure PCTCN2021139123-appb-000006
注:“5Phos”表示5’端含磷酸化修饰;“3ddC”表示3’末端为胞嘧啶双脱氧核糖核苷酸。
实验方法:
1.转座子的制备
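(1) Dissolve transposon adapter 1 (SEQ ID NO:1) and the tagged transposon adapter 2 (SEQ ID NO:2) separately to 100 μM in the Annealing buffer from the
Figure PCTCN2021139123-appb-000007
Tagment Enzyme kit. Then mix transposon adapter 1 with each of the 96 tagged transposon adapters 2 (tag sequences shown in SEQ ID NOs:3-98) at a 1:1 volume ratio. To mix, take 10 μl each of adapter 1 and adapter 2, mix thoroughly in the wells of a 96-well PCR plate, seal the plate with a silicone sealing mat, and spin briefly in a mini plate centrifuge so that all of the liquid collects at the bottom of the wells.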
(2)将上述步骤(1)混匀后的转座子接头1和带标签转座子接头2置于PCR仪内,进行如下退火反应程序:(热盖105℃)75℃ 15分钟,60℃ 10分钟,50℃ 10分钟,40℃ 10分钟,25℃ 30分钟。经过退火后的接头混合液即为转座子,-20℃保存备用。
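The pairing of adapter 1 with the 96 tagged adapters 2 amounts to a plate map from wells to tag sequences. The Python sketch below builds such a map for SEQ ID NOs:3-98. The row-by-row filling order and the helper name `build_plate_map` are assumptions made for illustration; the protocol does not say which tag goes into which well.

```python
import string

FIRST_TAG_SEQ_ID = 3    # SEQ ID NO of the first tag sequence
LAST_TAG_SEQ_ID = 98    # SEQ ID NO of the last tag sequence


def build_plate_map(rows=8, cols=12):
    """Map each well of a 96-well plate (A1..H12) to one tag SEQ ID NO.

    Wells are filled row by row; this ordering is an assumption made for
    the sketch, not something the protocol specifies.
    """
    seq_ids = iter(range(FIRST_TAG_SEQ_ID, LAST_TAG_SEQ_ID + 1))
    plate = {}
    for row in string.ascii_uppercase[:rows]:
        for col in range(1, cols + 1):
            plate[f"{row}{col}"] = next(seq_ids)
    return plate


if __name__ == "__main__":
    plate_map = build_plate_map()
    print(plate_map["A1"], plate_map["H12"])
```

Keeping a map like this alongside the plate makes it possible to trace each first tag seen in the sequencing data back to the well, and therefore the sample, it was used for.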
2.TN5转座酶复合体的包埋
(1)取一个新的96孔PCR板,用
Figure PCTCN2021139123-appb-000008
Tagment Enzyme试剂盒中的TruePrep Tagment Enzyme(2μg/μl)和Coupling Buffer配制如下反应液。每个孔包含:2.5ul TruePrep Tagment Enzyme(2μg/μl)、8.25ul Coupling Buffer和1.75ul带标签的转座子(上述步骤1所得)。
(2)用移液器轻轻吹打反应液20次充分混匀,盖好96孔PCR板密封硅胶盖,瞬时离心后,置于PCR仪内,30℃反应1小时(热盖50℃)。反应结束后即可得到单端特异性寡核苷酸标记的TN5转座酶复合体,-20℃保存,最多可存放1年。
3.转座酶复合体效率测试
(1)包埋好的TN5转座酶复合体需要检测片段化效率才能用于后续的实验,本实施例用完整的小鼠基因组DNA作为检测对象。
(2)取
Figure PCTCN2021139123-appb-000009
Tagment Enzyme试剂盒中的5×Tagment Buffer L和上述步骤2得到的TN5转座酶复合体配制以下反应体系:4ul 5×Tagment Buffer L、1ul 100ng/μl DNA和1ul TN5转座酶复合体,用移液器轻轻吹打反应液20次充分混匀,瞬时离心后,置于PCR仪内,55℃反应10分钟(热盖105℃)。待样品温度降低至4℃后,取出PCR管,加5ul终止反应液(100mM Tris-HCl pH8.0,200mM EDTA)充分混匀后室温放置5分钟。
(3)将上述步骤(2)的反应产物进行琼脂糖凝胶电泳检测,以验证TN5转座酶复合体的片段化效率。
实验结果如图2所示。实验结果表明,Tn5转座酶复合体可以将小鼠基因组DNA(全长23kb)打断为300-600bp的条带。
实施例2:单细胞核的制备
本实施例所用试剂/仪器信息如表3所示:
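Table 3. Reagents and instruments
Reagent / instrument | Brand | Catalog no.
1x PBS | Gibco | 14190-094
1M Tris-HCl pH7.5 | Jena Bioscience | BU-125S
1M MgCl2 | Merck Millipore | 20-303
5M NaCl | Sangon Biotech (生工) | B548121-0100
Tween-20 | Sigma | P7949-500ML
IGEPAL CA-630 | Sigma | I8896-50ML
BSA (bovine serum albumin) | Sigma | A8806-5
Digitonin | Abcam | ab141501
SUPERase-In RNase Inhibitor | Thermo Fisher Scientific | AM2696
nuclease-free water | Invitrogen | AM9932
Flowmi Cell Strainer, 40 μm | Bel-Art | H13680-0040
1.5 ml DNA low-binding tube | Eppendorf | 30108051
HEK293T cell line | Cell Bank of the Chinese Academy of Sciences | GNHu17
Hela cell line | Cell Bank of the Chinese Academy of Sciences | TCHu187
K562 cell line | Cell Bank of the Chinese Academy of Sciences | SCSP-5054
Centrifuge | Thermo Fisher Scientific | Micro21R
Automated fluorescence cell counter | LUNA | LUNA FL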
实验方法:
1. Prepare fresh lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 0.1% IGEPAL CA-630, 1% BSA, 0.01% digitonin, 1% SUPERase-In RNase Inhibitor), wash buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 1% BSA, 1% SUPERase-In RNase Inhibitor) and sample dilution buffer (1x PBS supplemented with 1% BSA and 1% SUPERase-In RNase Inhibitor), and pre-chill them on ice.
2.根据实验需要,可以用新鲜组织、新鲜细胞系、新鲜血样、原代细胞、冻存细胞、液氮速冻保存的组织等进行细胞核提取。本实施例选用HEK293T细胞系(购自中国科学院细胞库,目录号GNHu17)、Hela细胞系(购自中国科学院细胞库,目录号TCHu187)和K562细胞系(购自中国科学院细胞库,目录号SCSP-5054)进行实验。培养好细胞系获得单细胞悬液后,用1X PBS洗2次,然后分别取200万个细胞到1.5ml离心管中进行单细胞核的提取,细胞核提取步骤如下:
(1)将细胞沉淀用500ul预冷的裂解液重悬,用1mL移液器轻轻吹打10次,让细胞均匀重悬,置于冰上孵育5分钟后,立即加入1mL预冷洗涤液轻轻吹打5次,4℃500g离心,去上清。
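(2) Resuspend the pellet from step (1) in 50 μl of the sample dilution buffer prepared in step 1 and examine a small aliquot under a microscope; if cell clumps are present, filter through a 40 μm cell strainer. Keep the prepared nuclei on ice until use. After thorough mixing, take 1 μl for counting on the automated fluorescence cell counter. About 40-50% of the cells are lost during nuclei extraction.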
实施例3:单细胞悬液透化
本实施例所用试剂/仪器信息如表4所示:
Table 4. Reagents and instruments
Reagent / instrument | Brand | Catalog no.
Methanol | Fisher Chemical | M/4000/17
1x PBS | Gibco | 14190-094
BSA (bovine serum albumin) | Sigma | A8806-5
SUPERase-In RNase Inhibitor | Thermo Fisher Scientific | AM2696
nuclease-free water | Invitrogen | AM9932
Flowmi Cell Strainer, 40 μm | Bel-Art | H13680-0040
1.5 ml DNA low-binding tube | Eppendorf | 30108051
Centrifuge | Thermo Fisher Scientific | Micro21R
Automated fluorescence cell counter | LUNA | LUNA FL
实验方法:
1.配制新鲜样品稀释液(1x PBS,添加1%BSA,1%SUPERase-In RNase Inhibitor);
2.根据实验需要,可以用新鲜组织、新鲜细胞系、新鲜血样、原代细胞、冻存细胞样品进行完整单细胞的建库,在建库之前,需要对细胞进行透化。本实施例针对HEK293T细胞系、Hela细胞系和K562细胞系进行细胞透化,透化实验步骤如下:
(1)培养好细胞系获得单细胞悬液后,用1X PBS洗2次,然后分别取200万个细胞到1.5ml离心管中,用1.5mL预冷甲醇重悬细胞,1mL移液器充分混匀后,-20℃孵育10分钟。
(2)4℃ 500g离心,去上清,用1ml样品稀释液分别洗两次,4℃ 500g离心,去上清后,用50ul样品稀释液重悬。
(3)取少量在显微镜中镜检,如果有细胞团块可用40um细胞筛过滤,制备好的细胞放冰上备用,充分混匀后取1ul用全自动荧光细胞计数仪计数,透化后会损失30-40%左右的细胞。
实施例4:单细胞转录组文库的制备
本实施例所用试剂/仪器信息如表5所示:
表5试剂与仪器
Figure PCTCN2021139123-appb-000010
实验方法:
1.逆转录反应
(1)对实施例2获得的单细胞核和实施例3获得的透化细胞样品计数后,各取20万于200ul PCR管中,加入3ul 25uM逆转录引物(SEQ ID NO:100),总反应体系为10ul,不足的部分用无核酸酶的水补足。本实验的逆转录反应体系通常可用于对5万-50万细胞/细胞核进行逆转录反应。在本实验中示例性使用了20万个细胞/细胞核。然而,易于理解的是,如果需要使用更少或更多的细胞/细胞核,那么可以根据需要,减少或增加逆转录反应体系的体积。
(2)移液器轻轻混匀后,置于PCR仪(105℃热盖)55℃孵育5分钟,迅速置于冰上至少2分钟。
(3)加入30ul逆转录反应液(8μl 5x Reverse Transcription Buffer,2μl 100mM DTT,2μl 10mM dNTPs,2μl RNaseOUT RNase inhibitor,2.5μl Maxima H Minus Reverse Transcriptase,13.5μl无核酸酶的水),移液器充分混匀后,置于PCR仪中进行如下反应:(热盖60℃)50℃ 10分;3个循环[8℃ 12秒,15℃ 45秒,20℃ 45秒,30℃ 30秒,42℃ 2分,50℃ 3分];50℃ 5分,4℃暂存。
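The reverse transcription mix in step (3) can be checked and scaled programmatically. The Python sketch below confirms that the listed components add up to the stated 30 μl per reaction and scales them for several tubes. The function name `scale_mix` and the default 10% pipetting overage are assumptions made for illustration and are not part of the protocol.

```python
# Per-reaction volumes (in microlitres) of the reverse transcription mix.
RT_MIX_UL = {
    "5x Reverse Transcription Buffer": 8.0,
    "100 mM DTT": 2.0,
    "10 mM dNTPs": 2.0,
    "RNaseOUT RNase inhibitor": 2.0,
    "Maxima H Minus Reverse Transcriptase": 2.5,
    "nuclease-free water": 13.5,
}
EXPECTED_TOTAL_UL = 30.0  # volume of mix added to each 10 ul sample


def scale_mix(mix, n_reactions, overage=0.1):
    """Scale per-reaction volumes to n_reactions plus a pipetting overage."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in mix.items()}


if __name__ == "__main__":
    assert sum(RT_MIX_UL.values()) == EXPECTED_TOTAL_UL
    for name, volume in scale_mix(RT_MIX_UL, n_reactions=6).items():
        print(f"{name}: {volume} ul")
```

The same pattern works for the other reaction mixes in this example, as long as the per-reaction volumes are copied from the protocol rather than inferred.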
2.单端转座反应(加载第一标签序列)
用实施例1制备好的单端特异性寡核苷酸标记的TN5转座酶复合体对上述步骤1得到的cDNA产物进行转座反应,加载第一标签序列。实验步骤如下:
(1)按需要测序的细胞数量,对逆转录后的样品配制转座反应液。当样品为细胞核时,每孔反应液如下:4ul 5x Reaction Buffer(购自Vazyme,货号:S601-01),2000-10000细胞核/反应,加无核酸酶的水至18.2ul。当样品为细胞时,每孔反应液如下:4ul 5x Reaction Buffer(购自Vazyme,货号:S601-01),2000-10000细胞/反应,0.2ul 1%Digitonin,加无核酸酶的水至18.2ul。
(2)将配制好的细胞反应液加到已提前分装有单端特异性寡核苷酸标记的TN5转座酶复合体(每孔已分装1.8ul)的96孔板中。移液器充分吹打均匀,盖好96孔PCR板密封硅胶盖,瞬时离心后,置于恒温震荡仪37℃,1000转每分钟孵育震荡30分钟。
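(3) After incubation, collect all reaction mixes from the 96-well plate into two 1.5 ml centrifuge tubes (about 1 ml per tube), add 200 μl of stop solution (100 mM Tris-HCl pH 8.0, 200 mM EDTA), mix thoroughly and leave at room temperature for 5 minutes. Centrifuge at 500 g at 4℃, discard the supernatant, and wash twice with 200 μl of sample dilution buffer (1x PBS supplemented with 1% BSA and 1% SUPERase-In RNase Inhibitor).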
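3. Water-in-oil droplet preparation and template-switching reaction (loading the second tag sequence)
In this example, the 10X Genomics Chromium platform and 10X Single Cell 5′ Gel Beads are used to prepare water-in-oil droplets, so that each droplet receives its own unique tag (the second tag sequence). Other platforms can be substituted for droplet preparation and for the cell-barcoded beads. The procedure is as follows:
(1) Prepare the template-switching reaction mix (20 μl 5x Reverse Transcription Buffer, 1 μl 100 mM DTT, 10 μl 10 mM dNTPs, 2 μl RNaseOUT RNase inhibitor, 2.4 μl Additive A, 10 μl 20% Ficoll, 5 μl Maxima H Minus Reverse Transcriptase, 49 μl nuclease-free water) and use it to resuspend the cell or nuclei pellet obtained in step 2 above. About 110,000 cells or nuclei are resuspended in this step.
(2) Following the 10X Chromium Single Cell V(D)J Reagent Kits User Guide, load 90 μl of the cell reaction mix, 40 μl of 10X Single Cell 5′ Gel Beads and 270 μl of mineral oil onto a 10X Chip A (load unused wells with 50% glycerol as required by the user guide), and generate water-in-oil droplets on the 10X Genomics Chromium instrument, which takes about 7 minutes. About 100,000 cells are loaded for droplet generation in this step.
(3) Carefully collect the emulsion into a 200 μl PCR tube and place it promptly in a thermal cycler under the following conditions: (heated lid 105℃) 25℃ for 30 min, 42℃ for 90 min, 53℃ for 10 min, hold at 4℃.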
4.油包水微液滴纯化
按照10X Chromium Single Cell V(D)J Reagent Kits User Guide说明书,对油包水微液滴纯化,最后用35.5ul EB洗脱。
5.cDNA富集
在200ul PCR管中配制PCR扩增反应体系:50ul NEBNext High-Fidelity 2x PCR Master Mix,0.5ul 100mM S-P5引物(SEQ ID NO:102),35ul上述步骤4纯化获得的产物,11.5ul无核酸酶的水。混匀后置于PCR仪,反应条件:(热盖105℃)72℃ 3分,98℃ 45秒,13个循环(根据上样的细胞数量决定)[98℃ 20秒,67℃ 30秒,72℃ 1分],72℃ 1分钟,4℃暂存。
6.cDNA富集产物纯化回收
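(1) Purify the cDNA with 0.8x SPRIselect beads: add 80 μl of SPRIselect beads (a 0.8x ratio) to the 100 μl reaction product from step 5 above and mix thoroughly by pipetting.
(2) Incubate at room temperature for 5 minutes, place on a magnetic stand, and discard the supernatant once the beads are fully captured and the solution is clear.
(3) Wash twice with 200 μl of freshly prepared 80% ethanol and discard the supernatant. Once the beads are dry, resuspend them in 41 μl of EB buffer, leave at room temperature for 2 minutes and place on the magnetic stand. Once the beads are fully captured and the solution is clear, transfer the supernatant to a new centrifuge tube.
(4) Take 1 μl to measure the concentration with Qubit. The sample can be stored at -80℃ for 3 months.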
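The bead ratio used in this purification is simply the bead volume divided by the sample volume, so a 0.8x cleanup of a 100 μl product takes 80 μl of beads. A minimal Python sketch of that arithmetic follows; the helper name `bead_volume_ul` is hypothetical.

```python
def bead_volume_ul(sample_volume_ul, ratio):
    """Volume of SPRI beads (in microlitres) for a given bead-to-sample ratio."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return round(sample_volume_ul * ratio, 2)


if __name__ == "__main__":
    # 0.8x cleanup of the 100 ul cDNA enrichment product.
    print(bead_volume_ul(100, 0.8))
```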
7.测序文库初步扩增
(1)取20ul上述步骤6的产物进行PCR扩增,反应体系如下:50ul KAPA HiFi HotStart 2X ReadyMix,1ul 100mM S-bio-P5引物(SEQ ID NO:103),4ul 25mM S-P7引物(SEQ ID NO:104,取4种25mM S-P7引物(所述4种P7引物的标签序列分别如SEQ ID NOs:105-108所示),每种取1ul),20ul上述步骤6的产物,25ul无核酸酶的水。混匀后置于PCR仪,反应条件:(热盖105℃)98℃ 45秒,8个循环(循环数可根据上述步骤6的产物浓度调整)[98℃ 20秒,54℃ 30秒,72℃ 20秒],72℃ 1分钟,4℃暂存。
(2)用10ul Dynabeads MyOne Streptavidin C1 beads对上步产物进行带生物素(biotin)修饰的核酸的特异性富集,以进一步富集目的的cDNA来源的片段。富集后使用20.5ul EB缓冲液重悬磁珠(beads),备用。
8.测序文库二次扩增
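Take 20 μl of the product from step 7 above for PCR amplification in the following reaction: 50 μl KAPA HiFi HotStart 2X ReadyMix, 1 μl 100mM S-P5 primer (SEQ ID NO:102), 4 μl 25mM S-P7 primers (four different 25mM S-P7 primers, 1 μl each), 20 μl of the product from step 7 above, and 25 μl nuclease-free water. Mix and place in a thermal cycler under the following conditions: (heated lid 105℃) 98℃ for 45 s; 8 cycles (the cycle number can be adjusted according to the concentration of the step 7 product) of [98℃ for 20 s, 54℃ for 30 s, 72℃ for 20 s]; 72℃ for 1 min; hold at 4℃.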
9.测序文库纯化和片段筛选
用0.55X和0.2X SPRIselect磁珠对上步产物进行纯化和片段筛选。最后得到片段大小为300-600bp左右的测序文库。
10.测序
构建好的文库用NovaSeq 6000(Illumina,San Diego,CA)测序,读长150bp双端测序,每个细胞测50,000个reads。
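11. Data analysis
11.1 Multiplet rate
According to data published by 10X Genomics, when the 10X Genomics Chromium platform and kits for 5' transcriptome sequencing are used for library construction, the number of cells actually captured is typically about 57% of the number of cells loaded for droplet generation (that is, the capture rate is about 57%). In addition, the multiplet rate (the fraction of droplets that contain two or more cells) increases linearly with the number of cells loaded (see Figure 3). When 100,000 cells are loaded, about 57,000 cells are captured in droplets and the multiplet rate is about 45.65% (data source: USER GUIDE of Chromium Next GEM Single Cell V(D)J Reagent Kits v1.1, 10X Genomics). The sequencing data generated from such multiplets consume a large share of the sequencing cost yet have no value and must be filtered out. To keep the multiplet rate within a reasonable range (for example, below 8%), 10X Genomics recommends loading no more than 10,000 cells.
In this example, 100,000 nuclei or 100,000 permeabilized cells that had already been loaded with first tags were used, in separate experiments, for droplet generation and transcriptome library construction, followed by sequencing. After sequencing, the data were analyzed using the first-tag and second-tag sequences together. The analysis showed that 59,622 nuclei (capture rate 59.62%) and 58,771 permeabilized cells (capture rate 58.77%) were captured during droplet generation, consistent with the capture efficiency reported by 10X Genomics. For the nuclei experiment, 65.37% of droplets contained only one first tag, 25.93% contained two, 6.95% contained three, and 1.75% contained more than three (see Figure 4); in other words, 34.63% of droplets contained two or more first tags. Here, the number of first tags in a droplet essentially reflects the number of nuclei in that droplet (the case in which one droplet contains two or more nuclei carrying the same first tag is not considered). Therefore, 34.63% of the droplets prepared in this experiment contained two or more nuclei. Similar results were obtained with permeabilized cells: 57.85% of droplets contained only one first tag, 28.71% contained two, 9.82% contained three, and 3.62% contained more than three (see Figure 4); in other words, 42.15% of droplets contained two or more first tags.
For a conventional 10X Genomics transcriptome sequencing workflow, multiplet rates this high (34.63% or 42.15%) would be unacceptable, because they would produce a large amount of unusable sequencing data. In the method of this example, however, each cell or nucleus is loaded with a first tag before droplet generation. Even when a single droplet contains two or more cells or nuclei, the sequencing data from that droplet can therefore be split by first-tag sequence to recover the data of each individual cell in the droplet. As a result, sequencing data from multiplet droplets (droplets containing two or more cells or nuclei) can be used to analyze individual cells and need not be filtered out, which greatly reduces sequencing cost. Because the method also tolerates a much higher loading (for example, at least 100,000 cells or nuclei), the cost of library construction is greatly reduced as well.
11.2 Data quality
In this example, three human cell lines (HEK293T, Hela and K562 cells) were pooled for library construction and sequencing. The sequencing results show that each cell line has its own distinct set of highly expressed genes, and that the data obtained from nuclei samples (Figure 5A) and permeabilized cell samples (Figure 6A) are highly consistent. Dimensionality reduction and visualization of the expression matrices further show that the data from both nuclei samples (Figure 5B) and permeabilized cell samples (Figure 6B) separate the three cell lines well (that is, the cells fall into three distinct clusters). These results indicate that the method of this example can be used for high-throughput single-cell transcriptome sequencing, can accurately measure RNA abundance in single cells, and can accurately distinguish and identify transcriptome differences between cells.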
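The splitting of multi-cell droplets by first tag, described in the data analysis above, can be sketched in Python as follows. The sketch assumes that each read has already been reduced to a (second tag, first tag, read ID) tuple; the function names and the toy tag values are hypothetical and do not come from the source or from any specific analysis pipeline.

```python
from collections import defaultdict


def split_by_tags(reads):
    """Group reads by (second tag, first tag).

    `reads` is an iterable of (second_tag, first_tag, read_id) tuples.
    The second tag identifies the droplet; the first tag distinguishes
    cells or nuclei that ended up in the same droplet.
    """
    cells = defaultdict(list)
    for second_tag, first_tag, read_id in reads:
        cells[(second_tag, first_tag)].append(read_id)
    return dict(cells)


def first_tags_per_droplet(cells):
    """Count how many distinct first tags were seen in each droplet."""
    tags = defaultdict(set)
    for second_tag, first_tag in cells:
        tags[second_tag].add(first_tag)
    return {droplet: len(found) for droplet, found in tags.items()}


if __name__ == "__main__":
    # Toy reads with made-up tag values, for illustration only.
    toy_reads = [
        ("DROPLET_A", "TAG01", "r1"),
        ("DROPLET_A", "TAG01", "r2"),
        ("DROPLET_A", "TAG07", "r3"),
        ("DROPLET_B", "TAG03", "r4"),
    ]
    cells = split_by_tags(toy_reads)
    print(len(cells), first_tags_per_droplet(cells))
```

As noted in the text, two cells or nuclei that share both a droplet and a first tag cannot be separated this way, so the count of first tags per droplet is a lower bound on the number of cells in it.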
实施例5:不同逆转录引物对细胞核及透化细胞的单细胞转录组数据质量的影响
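The reagents and instruments used in this example are the same as in Tables 3-5.
Experimental methods:
This example follows the basic steps of Example 2 (preparation of single nuclei), Example 3 (permeabilization of single-cell suspensions) and Example 4 (preparation of single-cell transcriptome libraries). The differences are described below.
1. Preparation of reverse transcription primers
(1) 25 μM poly T primer (SEQ ID NO:100);
(2) 25 μM random primer (SEQ ID NO:101);
(3) 25 μM mixed primer, obtained by mixing the poly T primer and the random primer at a 1:1 equimolar ratio.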
2.单细胞样品制备:
取一种细胞系进行测试,本实施例用Hela细胞系(购自中国科学院细胞库,目录号TCHu187)按实施例2的方法提取细胞核,按实施例3的方法透化细胞。
3.逆转录反应
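Place the permeabilized cell sample and the nuclei sample above, at 50,000 per tube, into three 200 μl PCR tubes each, and add 3 μl of one of the reverse transcription primers prepared in step 1 (poly T primer, random primer or mixed primer) to each of the three tubes, giving six experiments in total. The total reaction volume per tube is 10 μl; make up the difference with nuclease-free water. Mix gently by pipetting, incubate in a thermal cycler (heated lid 105℃) at 55℃ for 5 minutes, and immediately place on ice for at least 2 minutes. Add 30 μl of reverse transcription mix (8 μl 5x Reverse Transcription Buffer, 2 μl 100 mM DTT, 2 μl 10 mM dNTPs, 2 μl RNaseOUT RNase inhibitor, 2.5 μl Maxima H Minus Reverse Transcriptase, 13.5 μl nuclease-free water) to each PCR tube, mix thoroughly by pipetting, and run the following program in a thermal cycler: (heated lid 60℃) 50℃ for 10 min; 3 cycles of [8℃ for 12 s, 15℃ for 45 s, 20℃ for 45 s, 30℃ for 30 s, 42℃ for 2 min, 50℃ for 3 min]; 50℃ for 5 min; hold at 4℃.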
4.单端转座反应(加载第一标签序列)
用实施例1制备的单端特异性寡核苷酸标记的TN5转座酶复合体对上述步骤3得到的6种逆转录产物进行转座反应,加载第一标签序列,每种逆转录产物分别用8种不同的单端特异性寡核苷酸标记的TN5转座酶复合体分别进行转座标记。具体实验步骤参见实施例4步骤2的单端转座反应。最后用20ul样品稀释液(1x PBS,添加1%BSA,1%SUPERase-In RNase Inhibitor)轻轻重悬样品,并计数。
5.油包水微液滴制备和模板置换反应(加载第二标签序列)
从步骤4中获得的细胞及细胞核产物中分别取2.8万个进行油包水微液滴制备和模板置换反应,其余实验步骤和条件参见实施例4步骤3。
6.油包水微液滴纯化到测序文库纯化和片段筛选
具体实验步骤参见实施例4步骤4-9。
7.测序
构建好的文库用NovaSeq 6000(Illumina,San Diego,CA)测序,读长150bp双端测序,总共测125G原始数据。
8.数据分析
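The data obtained in this example are shown in Figure 7. All three reverse transcription primers allowed nucleic acids to be detected in permeabilized cells or nuclei. At a sequencing depth of either 100G or 125G and with the same primer, more genes were detected in permeabilized cell samples than in nuclei samples. For permeabilized cell samples, the poly T primer detected more genes than the random primer; for nuclei samples, the mixed primer containing both the poly T primer and the random primer detected more genes than either primer alone. These results suggest that all three reverse transcription primers (poly T primer, random primer or mixed primer) can be used to detect nucleic acids in permeabilized cells or nuclei, and that using the mixed primer on nuclei samples improves gene detection in single-cell transcriptomes.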
实施例6:本单细胞测序技术兼容免疫细胞VDJ序列的富集
本实施例所用试剂、仪器信息如表6所示,其余同表3-5:
表6试剂与仪器
Figure PCTCN2021139123-appb-000011
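Using T cells enriched from human peripheral blood as an example, this example describes how the single-cell library construction method of the present invention can be used to enrich VDJ regions (the BCR region of B cells or the TCR region of T cells) from the enriched cDNA product while single-cell transcriptome data are obtained.
Although this example uses only T cells from human peripheral blood, the method is equally applicable to the enrichment and sequencing of VDJ regions of immune cells such as T cells and B cells from other sources and other species, and to the capture of other genes of interest.
The enrichment approach provided in this example is as follows: primers specific to the target fragment are designed according to the characteristics of the target gene or fragment and are used together with the S-P5 primer to specifically enrich the target fragment. This simple procedure yields target fragments coupled to the second tag.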
实验方法:
1.PBMC的提取
3ml人外周血中,加等量的PBS稀释,按照Ficoll Paque PLUS说明书,梯度离心,分离提取PBMC。
2.T细胞富集
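(1) CD3 antibody incubation: resuspend the isolated PBMCs in 100 μl of wash buffer (1X PBS supplemented with 2% BSA), add 5 μl of APC anti-human CD3 antibody and incubate on ice in the dark for 30 minutes. Wash twice with 500 μl of wash buffer (1X PBS supplemented with 2% BSA), centrifuging at 350 g and discarding the supernatant each time.
(2) 7-AAD staining: resuspend the CD3 antibody-incubated cells in 100 μl of wash buffer (1X PBS supplemented with 2% BSA), add 5 μl of 7-AAD and incubate on ice in the dark for 5 minutes. After incubation, add 500 μl of wash buffer (1X PBS supplemented with 2% BSA) to resuspend the cells, pass them through a 70 μm cell strainer, transfer them to a flow cytometry tube and keep them on ice until sorting.
(3) Flow sorting: sort CD3+, 7-AAD- cells on a flow cytometer; these cells are live T cells.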
3.T细胞透化:透化过程参见实施例3。
4.T细胞单细胞转录组文库的制备
本实施例对14个人类(包括2个健康人和12个癌症患者)的外周血按上述方法富集得到T细胞并进行单细胞转录组文库制备,制备方式参见实施例4。
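Briefly, 15,000 cells were taken from each sample and reverse transcribed with the poly T primer. In the single-ended transposition reaction, each healthy donor was tagged with 12 different single-end specific oligonucleotide-labeled TN5 transposase complexes, and each cancer patient with 6 different single-end specific oligonucleotide-labeled TN5 transposase complexes. After the single-ended transposition reaction, 67,000 cells were collected, and all of them were used for water-in-oil droplet preparation and the template-switching reaction.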
剩余的步骤与实施例4的对应步骤一致,以获得单细胞转录组数据。
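The tag allocation described above (12 first tags for each of 2 healthy donors and 6 for each of 12 cancer patients) adds up to 96 tags, which matches the 96 tagged transposase complexes prepared in Example 1. The Python sketch below reproduces that bookkeeping. The sample names, the consecutive assignment order and the helper `allocate_tags` are assumptions made for illustration; the source does not state which tags were given to which donor.

```python
def allocate_tags(samples, total_tags=96):
    """Assign consecutive first-tag indices (1-based) to each sample.

    `samples` is a list of (sample_name, n_tags) pairs. Raises ValueError
    if more tags are requested than are available.
    """
    needed = sum(n_tags for _, n_tags in samples)
    if needed > total_tags:
        raise ValueError(f"{needed} tags requested, only {total_tags} available")
    allocation, next_tag = {}, 1
    for name, n_tags in samples:
        allocation[name] = list(range(next_tag, next_tag + n_tags))
        next_tag += n_tags
    return allocation


if __name__ == "__main__":
    # Sample names are placeholders; tag counts follow the example.
    samples = [(f"healthy_{i}", 12) for i in (1, 2)]
    samples += [(f"patient_{i}", 6) for i in range(1, 13)]
    allocation = allocate_tags(samples)
    print(sum(len(tags) for tags in allocation.values()))
```

Because every first tag belongs to exactly one sample, the first tag alone is enough to assign a cell back to its donor after all samples are pooled for droplet generation.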
5.T细胞VDJ富集
从单细胞转录组文库的制备过程获得的cDNA产物中,取5ul进行后续的扩增富集。
(1)第一轮巢式PCR:在200ul PCR管中配制PCR扩增反应体系:50ul KAPA HiFi  HotStart 2X ReadyMix,5ul 10mM T MIX 1(Human TCR outer-1引物,SEQ ID NO:117和Human TCR outer-2引物,SEQ ID NO:118等摩尔比混合),5ul 10mM S-P5引物(SEQ ID NO:102),5ul上述步骤cDNA富集产物,35ul无核酸酶的水。混匀后置于PCR仪,反应条件:(热盖105℃)98℃ 45秒,12个循环(循环数根据所述cDNA富集产物浓度调整)[98℃ 20秒,67℃ 30秒,72℃ 60秒],72℃ 1分钟,4℃暂存。
(2)初扩增产物纯化
用0.5X和0.3X SPRIselect磁珠对上步产物进行纯化和片段筛选,用25ul EB洗脱磁珠。以期富集600-1000bp左右的片段。
(3)第二轮巢式PCR:在200ul PCR管中配制PCR扩增反应体系:50ul KAPA HiFi HotStart 2X ReadyMix,5ul 10mM T MIX 2(Human TCR inner-1引物,SEQ ID NO:119和Human TCR inner-2引物,SEQ ID NO:120等摩尔比混合),5ul 10mM S-P5引物(SEQ ID NO:102),25ul第一轮巢式PCR产物,15ul无核酸酶的水。混匀后置于PCR仪,反应条件:(热盖105℃)98℃ 45秒,10个循环(循环数根据所述第一轮巢式PCR产物浓度调整)[98℃ 20秒,67℃ 30秒,72℃ 60秒],72℃ 1分钟,4℃暂存。
(4)扩增产物纯化,同上(2)。
6.VDJ文库构建
扩增产物可以用传统的转录组建库方法建库,本实施例使用Chromium Single Cell 5'Library Construction Kit建库。
(1)产物片段化
取50ng扩增产物,加无核酸酶的水补齐至20ul,加入片段化反应液(包含5ul fragmentation buffer,15ul Fragmentation Enzyme Blend,15ul无核酸酶的水)。在冰上充分混匀后置于PCR仪,反应条件:(热盖65℃)32℃ 2分,65℃ 30分,4℃暂存。
(2)末端修复&加接头
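Add the end-repair and adapter-ligation mix (20 μl Ligation Buffer, 10 μl DNA Ligase, 2.5 μl Adaptor Mix, 17.5 μl nuclease-free water) to the fragmented product, mix thoroughly and place in a thermal cycler under the following conditions: (heated lid 30℃) 20℃ for 15 min, hold at 4℃.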
(3)产物纯化
使用0.8X SPRIselect磁珠对上步产物进行纯化,用30ul EB进行洗脱。
(4)VDJ测序文库扩增
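Add 70 μl of reaction mix to the product above, composed as follows: 50 μl KAPA HiFi HotStart 2X ReadyMix, 2 μl SI-PCR Primer, 10 μl individual Chromium i7 Sample Index, 8 μl nuclease-free water. Mix and place in a thermal cycler under the following conditions: (heated lid 105℃) 98℃ for 45 s; 9 cycles of [98℃ for 20 s, 54℃ for 30 s, 72℃ for 20 s]; 72℃ for 1 min; hold at 4℃.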
7.VDJ测序文库纯化
用0.8X SPRIselect磁珠对上步产物进行纯化。
8.VDJ测序
构建好的文库用NovaSeq 6000(Illumina,San Diego,CA)测序,读长150bp双端测序,每个细胞测12,500个reads。
9.数据分析
本实施例获得的数据显示:在本实施例获得的转录组数据中检测到总计41,337个细胞,在当前的测序深度下富集到4,719个高表达TCR的单细胞。14个人外周血的富集到的T细胞的单细胞转录组数据分析结果显示,检测到12种T细胞类型,携带TCR信息的T细胞类型也为12种(100%),每种细胞类群的可视化结果和细胞数量参见图8。
对于转录组数据中检测到的12种T细胞类型,VDJ文库测序数据中都有与之对应的TCR克隆型被检测到。VDJ文库测序数据中检测到的不同TCR克隆型的细胞类群的可视化结果和对应的细胞数量参见图9,图9的结果表明,各种细胞类型中检测到TCR的细胞数量占比与该种细胞转录组数据检测到的细胞数量占比一致。本实施例进一步分析了14个样品分别检测到的主要的TCR克隆型的分布情况,显示不同人的TCR克隆型具有多样性(参见图10);另外,分别分析了14个样品检测到的TRB基因和TRA基因分布情况,发现不同样品检测到的TCR的TRB基因和TRA基因分布无偏好性(参见图11A和图11B)。
综合以上结果,可以明确本发明能兼容单细胞TCR VDJ区域的富集及测序分析。
尽管本发明的具体实施方式已经得到详细的描述,但本领域技术人员将理解:根据已经公开的所有教导,可以对细节进行各种修改和变动,并且这些改变均在本发明的保护范围之内。本发明的全部范围由所附权利要求及其任何等同物给出。

Claims (23)

  1. 一种处理细胞或细胞核以产生核酸片段群的方法,其包括下述步骤:
    (1)提供一个或多个细胞或细胞核;
    (2)对所述细胞或细胞核内的RNA(例如,mRNA、非编码RNA、eRNA)进行包括逆转录步骤的处理,形成含有cDNA链的双链核酸(例如,含有RNA(例如,mRNA、非编码RNA、eRNA)链和cDNA链的杂合双链核酸);
    (3)将所述双链核酸(例如,所述杂合双链核酸)与转座酶复合体孵育;其中,所述转座酶复合体含有转座酶和所述转座酶能够识别并结合的转座序列,且能够切割或断裂双链核酸(例如,含有RNA和DNA的杂合双链核酸);并且,所述转座序列包含转移链和非转移链;其中,所述转移链包含转座酶识别序列,第一标签序列,以及,第一共有序列;其中,所述第一标签序列位于所述转座酶识别序列的上游(例如5’端),且,所述第一共有序列位于所述第一标签序列的上游(例如5’端);并且
    所述孵育在允许所述双链核酸(例如,所述杂合双链核酸)被所述转座酶复合体断裂成核酸片段且所述转移链被连接至所述核酸片段的末端(例如,所述核酸片段的5’端)的条件下进行;
    从而,形成核酸片段群;其中,所述核酸片段包含cDNA片段,以及连接至所述cDNA片段的5’端的转移链的序列;
    优选地,所述核酸片段从5’端至3’端包含第一共有序列,第一标签序列,转座酶识别序列和cDNA片段;
    优选地,在进行步骤(2)之前,对细胞进行透化和/或固定处理。
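  2. The method of claim 1, wherein at least 2 (for example, at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, or more) cells or nuclei are provided in step (1);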
    优选地,在进行步骤(3)之前(例如,在进行步骤(2)之前,或者在进行步骤(2)之后且在进行步骤(3)之前),将所述细胞或细胞核分成至少2个(例如,至少3个,至少4个,至少5个,至少8个,至少10个,至少12个,至少20个,至少24个,至少50个,至少96个,至少100个,至少200个,至少384个,至少400个, 或更多个)亚集,其中,每个亚集含有至少一个细胞或细胞核;
    优选地,在步骤(3)中,将各个亚集的细胞或细胞核内的所述双链核酸(例如,杂合双链核酸)分别与转座酶复合体孵育;
    优选地,对于每个亚集,所述转座酶复合体具有彼此不同的第一标签序列;由此,从各个亚集的细胞或细胞核所产生的核酸片段含有彼此不同的第一标签序列;
    优选地,对于每个亚集,所述转座酶复合体具有相同的转座酶,相同的转座酶识别序列,相同的第一共有序列,和/或,相同的非转移链;
    优选地,对于每个亚集,除了所述第一标签序列之外,所述转座酶复合体具有相同的转座酶,相同的转座酶识别序列,相同的第一共有序列,以及,相同的非转移链;
    优选地,各个亚集所产生的核酸片段具有相同的第一共有序列;且同一亚集所产生的核酸片段具有相同的第一标签序列;且不同亚集所产生的核酸片段具有彼此不同的第一标签序列;
    优选地,在进行步骤(3)之后,将至少2个亚集的细胞或细胞核合并;
    preferably, after step (3) is performed, the cells or nuclei of all of the subsets are combined.
  3. 权利要求1或2的方法,其中,所述细胞是来自动物、植物或微生物的细胞或细胞系,或其任何组合;
    例如,所述细胞是来自哺乳动物(例如人)的细胞或细胞系,或其任何组合;
    例如,所述细胞是癌细胞、干细胞、神经细胞、胎儿细胞、免疫细胞,或其任何组合;
    例如,所述细胞是免疫细胞,例如B细胞或T细胞;或者,所述细胞核来自免疫细胞,例如B细胞或T细胞;
    优选地,所述核酸片段群包含T细胞受体基因或基因产物,或B细胞受体基因或基因产物。
  4. 权利要求1-3任一项的方法,其中,使用逆转录酶对所述RNA(例如,mRNA、长链非编码RNA、eRNA)进行逆转录,形成含有RNA(例如,mRNA、长链非编码RNA、eRNA)链和cDNA链的杂合双链核酸;
    优选地,所述杂合双链核酸在cDNA链的3’端具有悬突;优选地,所述悬突具有 至少1个,至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;优选地,所述悬突为2-5个胞嘧啶核苷酸的悬突(例如CCC悬突);
    优选地,所述逆转录酶具有末端转移活性;优选地,所述逆转录酶能够以RNA(例如,mRNA、长链非编码RNA、eRNA)为模板,合成cDNA链,且在所述cDNA链的3’端添加悬突;优选地,所述逆转录酶能够在cDNA链的3’末端添加长度为至少1个,至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的悬突;优选地,所述逆转录酶能够在cDNA链的3’末端添加2-5个胞嘧啶核苷酸的悬突(例如CCC悬突);优选地,所述逆转录酶不具有或者具有降低的RNase活性(特别是RNase H活性);优选地,所述逆转录酶选自,经修饰或突变以去除RNase活性(特别是RNase H活性)的M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶和端粒酶逆转录酶(例如,不具有RNase H活性的M-MLV逆转录酶);
    优选地,使用包含poly(T)序列的引物和/或包含随机寡核苷酸序列的引物对所述RNA(例如,mRNA、长链非编码RNA、eRNA)进行逆转录;优选地,所述poly(T)序列或所述随机寡核苷酸序列位于所述引物的3’端;优选地,所述poly(T)序列包含至少5个(例如,至少10个、至少15个、或至少20个)胸腺嘧啶核苷酸残基;优选地,所述随机寡核苷酸序列具有5-30nt(例如,5-10nt,10-20nt,20-30nt)的长度;优选地,所述引物不包含修饰,或者包含修饰的核苷酸。
  5. 权利要求1-4任一项的方法,其中,所述转座酶复合体能够随机切割或断裂含有RNA和DNA的杂合双链核酸;
    优选地,所述转座酶选自Tn5转座酶、MuA转座酶、睡美人转座酶、Mariner转座酶、Tn7转座酶、Tn10转座酶、Ty1转座酶、Tn552转座酶,以及具有上述转座酶的转座活性的变体、修饰产物和衍生物;
    优选地,所述转座酶为Tn5转座酶;
    优选地,所述第一标签序列具有至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;例如,所述第一标签序列的长度为4-8个核苷酸;优选地,所述第一标签序列连接(例如直接 连接)至所述转座酶识别序列的5’端;
    优选地,所述第一共有序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少12个,至少15个,至少18个,至少20个,至少25个或更多个核苷酸的长度;例如,所述第一共有序列的长度为12-25个核苷酸;优选地,所述第一共有序列连接(例如直接连接)至所述第一标签序列的5’端;
    优选地,所述转移链从5’端至3’端包含第一共有序列,第一标签序列,和转座酶识别序列;优选地,所述转座酶识别序列具有如SEQ ID NO:99所示的序列;
    优选地,所述非转移链能够与所述转移链退火或杂交形成双链体;优选地,所述非转移链包含与转移链中的转座酶识别序列互补的序列;优选地,所述非转移链具有如SEQ ID NO:1所示的序列;
    优选地,所述转移链不包含修饰,或者包含修饰的核苷酸;和/或,所述非转移链不包含修饰,或者包含修饰的核苷酸;优选地,所述非转移链的5’末端具有磷酸基团修饰;和/或,所述非转移链的3’末端是封闭的(例如,所述非转移链的3’末端核苷酸为双脱氧的核苷酸)。
  6. 权利要求1-5任一项的方法,其中,在步骤(3)中,在所述细胞或细胞核内形成所述核酸片段群;
    优选地,所述核酸片段群用于构建转录组文库(例如,5’端转录组文库)或用于转录组测序(例如,5’端转录组测序);
    优选地,所述核酸片段群用于构建靶核酸(例如,V(D)J序列)的文库或用于靶核酸(例如,V(D)J序列)的测序。
  7. 一种生成经标记的核酸分子的方法,其包括下述步骤:
    (a)提供:
    一个或多个细胞或细胞核,所述细胞或细胞核是根据权利要求1-6任一项所述的方法进行了处理的细胞或细胞核,其含有核酸片段群;和
    一个或多个偶联寡核苷酸分子的珠粒,所述寡核苷酸分子含有标记序列;和
    (b)使用所述核酸片段和所述寡核苷酸分子生成经标记的核酸分子,所述经标记的核酸分子从5'末端到3'末端包含所述核酸片段的序列以及所述标记序列的互补序列, 或者包含所述标记序列以及所述核酸片段的互补序列。
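  8. The method of claim 7, wherein, in step (a), at least 2 (for example, at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, or more) cells or nuclei are provided; and/or at least 2 (for example, at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, or more) beads are provided;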
    优选地,在步骤(a)中,在微孔或液滴中(例如,在多个微孔或液滴中)提供所述细胞或细胞核,以及所述珠粒;
    优选地,所述液滴是油包水液滴。
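  9. The method of claim 7 or 8, wherein the bead is coupled to a plurality of (for example, at least 10, at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, or more) oligonucleotide molecules;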
    优选地,所述寡核苷酸分子偶联至珠粒的表面,和/或,封闭在珠粒内;
    优选地,所述珠粒能够自发地或在暴露于一种或多种刺激(例如,温度变化、pH变化、暴露于特定化学物质或相、暴露于光、还原剂等)时释放所述寡核苷酸;
    优选地,所述珠粒是凝胶珠粒。
  10. 权利要求7-9任一项的方法,其中,所述标记序列包含选自下列的元件:第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列,模板转换序列,或其任何组合;
    优选地,所述标记序列包含第二共有序列,第二标签序列,独特分子标签序列和模板转换序列;优选地,所述标记序列还包含第一扩增引物序列;
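    preferably, the template switching sequence comprises a sequence complementary to the 3'-end overhang of the cDNA strand; preferably, the overhang is an overhang of 2-5 cytosine nucleotides (for example, a CCC overhang), and the 3' end of the template switching sequence comprises 2-5 guanine nucleotides (for example, GGG); preferably, the template switching sequence has a length of at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25 or more nucleotides; preferably, the template switching sequence comprises no modification, or comprises modified nucleotides (for example, locked nucleic acids);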
    优选地,所述独特分子标签序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度;优选地,所述独特分子标签序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述第二标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度;优选地,所述第二标签序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述第二共有序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个,至少15个,至少20个,至少25个或更多个核苷酸的长度;优选地,所述第二共有序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述珠粒偶联了多个寡核苷酸分子,并且,各个寡核苷酸分子具有彼此不同的独特分子标签序列;优选地,各个寡核苷酸分子具有相同的第二标签序列和/或相同的第二共有序列;
    优选地,所述方法使用了多个珠粒,并且,每个珠粒各自具有多个寡核苷酸分子;并且,同一个珠粒上的所述多个寡核苷酸分子具有相同的第二标签序列,并且,不同珠粒上的寡核苷酸分子具有彼此不同的第二标签序列;优选地,各个珠粒上的寡核苷酸分子具有相同的第二共有序列;优选地,各个珠粒上的寡核苷酸分子还具有相同的第一扩增引物序列;
    优选地,所述模板转换序列位于所述标记序列的3’末端;
    优选地,所述第二共有序列位于所述第二标签序列,独特分子标签序列和/或模板转换序列的上游;
    优选地,所述第一扩增引物序列位于所述第二共有序列的上游;
    优选地,所述标记序列从5’端至3’端包含任选的第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列和模板转换序列。
  11. 权利要求7-10任一项的方法,其中,在步骤(b)中,通过选自下列的方式,使所述核酸片段和所述寡核苷酸分子接触:
    (b1)将细胞或细胞核裂解以释放核酸片段;
    (b2)将寡核苷酸分子从珠粒上释放;或者
    (b3)(b1)和(b2)的组合;
    优选地,在步骤(b)中,所述寡核苷酸分子通过模板转换序列与含有cDNA链的3’末端悬突的核酸片段退火或杂交,其中,所述模板转换序列包含与所述cDNA链的3’末端悬突互补的序列;并且,在核酸聚合酶(例如,DNA聚合酶或逆转录酶)的作用下,所述核酸片段以所述寡核苷酸分子为模板被延伸,或所述寡核苷酸分子以所述核酸片段为模板被延伸,从而生成经标记的核酸分子;优选地,步骤(b)中使用的核酸聚合酶与步骤(2)中使用的逆转录酶是相同的;
    优选地,所述经标记的核酸分子从5'末端到3'末端包含所述核酸片段的序列以及所述标记序列的互补序列,其中所述核酸片段包含与RNA(例如,mRNA、非编码RNA、eRNA)的5’端序列互补的序列;
    优选地,所述经标记的核酸分子从5'末端到3'末端包含第一共有序列,第一标签序列,转座酶识别序列,cDNA片段的序列,模板转换序列的互补序列,独特分子标签序列的互补序列,第二标签序列的互补序列,第二共有序列的互补序列,以及任选的第一扩增引物序列的互补序列;优选地,所述cDNA片段包含与RNA(例如,mRNA、长链非编码RNA、eRNA)的5’端序列互补的序列;
    优选地,所述方法还包括:(c)回收和纯化所述经标记的核酸分子;
    优选地,所述经标记的核酸分子用于构建转录组文库(例如,5’端转录组文库)或用于转录组测序(例如,5’端转录组测序);
    优选地,所述核酸片段群用于构建靶核酸(例如,V(D)J序列)的文库或用于靶核酸(例如,V(D)J序列)的测序。
  12. 一种构建核酸分子文库的方法,其包括,
    (i)根据权利要求7-11任一项的方法生成多个经标记的核酸分子,以及,
    (ii)回收和/或合并多个经标记的核酸分子,
    从而获得核酸分子文库;
    优选地,在步骤(ii)中,回收和/或合并由多个珠粒衍生的经标记的核酸分子。
  13. 权利要求12的方法,其中,所述方法还包括,(iii)富集所述经标记的核酸分子;
    优选地,在步骤(iii)中,对所述经标记的核酸分子进行核酸扩增反应,以产生富集产物;优选地,所述核酸扩增反应使用至少第一引物来进行,其中,所述第一引物能够与所述第一扩增引物序列的互补序列和/或所述第二共有序列的互补序列杂交或退火;任选地,所述核酸扩增反应还使用第二引物,所述第二引物能够与所述第一共有序列的互补序列杂交或退火;
    优选地,步骤(iii)中的所述核酸扩增反应使用核酸聚合酶(例如DNA聚合酶;例如具有链置换活性和/或高保真性的DNA聚合酶)来进行;
    优选地,所述第一引物含有:①所述第一扩增引物序列或其部分序列,或者②所述第二共有序列或其部分序列,或者③①和②的组合;
    优选地,所述第二引物含有所述第一共有序列或其部分序列;
    优选地,所述方法还包括,在进行步骤(iii)之前,将所述寡核苷酸分子或模板转换序列降解的步骤;
    优选地,所述第一引物与所述经标记的核酸分子的退火温度高于所述寡核苷酸分子与所述经标记的核酸分子的退火温度;
    优选地,所述方法还包括,(iv)回收和纯化步骤(iii)的富集产物。
  14. 权利要求13的方法,其中,步骤(iii)中,所述第一引物连接有第一标记分子,所述第一标记分子能与第一结合分子发生相互作用;
    优选地,步骤(iv)中,使用所述第一结合分子回收和纯化步骤(iii)的富集产物。
  15. 权利要求13的方法,其中,步骤(iii)中,使用至少所述第一引物和所述第二引物对所述经标记的核酸分子进行核酸扩增反应,以产生富集产物;其中,所述第一引物连接有第一标记分子,和/或,所述第二引物连接有第二标记分子;所述第一标记分子能与第一结合分子发生相互作用,所述第二标记分子能与第二结合分子发生相互作用;
    优选地,步骤(iv)中,使用所述第一结合分子和/或所述第二结合分子回收和纯化步骤(iii)的富集产物。
  16. 权利要求12-15任一项的方法,其中,所述方法还包括,
    (v)对步骤(ii)回收的经标记的核酸分子或步骤(iv)回收的富集产物进行核酸扩增反应,以产生扩增产物;
    优选地,在步骤(v)中,所述核酸扩增反应使用至少第三引物和第四引物来进行;其中,所述第三引物能够与所述第一扩增引物序列的互补序列和/或所述第二共有序列的互补序列杂交或退火,且任选地含有第三标签序列;且,所述第四引物能够与所述第一共有序列的互补序列杂交或退火,且任选地含有第二扩增引物序列和/或第四标签序列;
    优选地,所述第三引物含有所述第一扩增引物序列或其部分序列,任选的第三标签序列,以及任选的第二共有序列或其部分序列;
    例如,所述第三引物含有:①所述第一扩增引物序列或其部分序列;或者,②所述第一扩增引物序列或其部分序列,以及所述第二共有序列或其部分序列,或者,③所述第一扩增引物序列或其部分序列,第三标签序列,以及所述第二共有序列或其部分序列;
    优选地,所述第四引物含有第二扩增引物序列,任选的第四标签序列,以及第一共有序列或其部分序列;
    例如,所述第四引物含有:①第二扩增引物序列,以及第一共有序列或其部分序列;或,②第二扩增引物序列,第四标签序列,以及第一共有序列或其部分序列;
    优选地,所述第三标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;优选地,所述第三标签序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述第四标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;优选地,所述第四标签序列不包含修饰,或者包含修饰的核苷酸;
    优选地,步骤(v)中的所述核酸扩增反应使用核酸聚合酶(例如DNA聚合酶;例如具有链置换活性和/或高保真性的DNA聚合酶)来进行;优选地,步骤(v)使用的核酸聚合酶(例如DNA聚合酶)与步骤(iii)相同或者不同;
    优选地,所述核酸分子文库包含步骤(v)的扩增产物;
    优选地,所述扩增产物的一条核酸链从5'末端到3'末端包含第二扩增引物序列,任选的第四标签序列,第一共有序列,第一标签序列,转座酶识别序列,cDNA片段的 序列,模板转换序列的互补序列,独特分子标签序列的互补序列,第二标签序列的互补序列,第二共有序列的互补序列,任选的第三标签序列的互补序列,以及第一扩增引物序列的互补序列;
    优选地,所述cDNA片段包含与RNA(例如,mRNA、长链非编码RNA、eRNA)的5’端序列互补的序列;
    优选地,所述核酸分子文库用于转录组测序(例如,5’端转录组测序)或用于靶核酸(例如,V(D)J序列)的测序。
  17. 权利要求12-16任一项的方法,其中,所述方法还包括对靶核酸分子进行富集的步骤;
    优选地,在所述步骤(ii)之后,在所述步骤(iii)之后,或者,在所述步骤(v)之后,对所述靶核酸分子进行富集。
  18. 权利要求17的方法,其中,所述靶核酸分子包含:(i)编码T细胞受体(TCR)或B细胞受体(BCR)的核苷酸序列或其部分序列(例如,V(D)J序列),和/或,(ii)(i)的互补序列。
  19. 一种对细胞或细胞核进行核酸测序的方法,其包括:
    根据权利要求12-18任一项所述的方法构建核酸分子文库;和,
    对所述核酸分子文库进行测序;
    优选地,在测序之前,将至少2个,至少3个,至少4个,至少5个,至少8个,至少10个,至少12个,至少15个,至少18个,至少20个,至少25个或更多个核酸分子文库合并,然后进行测序;其中,每个核酸分子文库各自具有多个核酸分子(即,扩增产物),且同一个文库中的所述多个核酸分子具有相同的第三标签序列或者相同的第四标签序列;且,来源于不同文库的核酸分子具有彼此不同的第三标签序列或者彼此不同的第四标签序列。
  20. 一种核酸分子文库,其包含多个核酸分子,其中,所述核酸分子的一条核酸链从5'末端到3'末端包含第一共有序列,第一标签序列,转座酶识别序列,cDNA片段 的序列,模板转换序列的互补序列,独特分子标签序列的互补序列,第二标签序列的互补序列,第二共有序列的互补序列;其中,所述cDNA片段包含与RNA(例如,mRNA、非编码RNA、eRNA)的5’端序列互补的序列;
    优选地,各个核酸分子的所述核酸链具有相同的第一共有序列,相同的转座酶识别序列,相同的模板转换序列的互补序列,和相同的第二共有序列的互补序列;
    优选地,cDNA片段衍生自同一个细胞的核酸分子的所述核酸链具有相同的第一标签序列,和相同的第二标签序列的互补序列;
    优选地,所述核酸链还具有位于第一共有序列上游的第二扩增引物序列和任选的第四标签序列;
    优选地,所述核酸链还具有位于第二共有序列的互补序列下游的任选的第三标签序列的互补序列和第一扩增引物序列的互补序列;
    优选地,所述第二扩增引物序列,第四标签序列,第一共有序列,第一标签序列,转座酶识别序列,cDNA片段,模板转换序列,独特分子标签序列,第二标签序列,第二共有序列,第三标签序列,和/或第一扩增引物序列分别如上文所定义;
    优选地,所述核酸分子文库是转录组文库;
    优选地,所述核酸分子文库中的核酸分子衍生自免疫细胞;
    优选地,所述免疫细胞选自B细胞和T细胞;
    优选地,所述核酸分子文库是通过权利要求12-18任一项所述的方法构建的。
  21. 一种试剂盒,其包含:逆转录酶,转座酶,和一种或多种所述转座酶能够识别并结合的转座序列,其中,
    所述转座酶和转座序列能够形成转座酶复合体,所述转座酶复合体能够切割或断裂双链核酸(例如,含有RNA和DNA的杂合双链核酸);并且,
    所述转座序列包含转移链和非转移链;其中,所述转移链包含转座酶识别序列,第一标签序列,以及,第一共有序列;其中,所述第一标签序列位于所述转座酶识别序列的上游(例如5’端),且,所述第一共有序列位于所述第一标签序列的上游(例如5’端);
    优选地,所述试剂盒包含至少2种(例如,至少3种,至少4种,至少5种,至少8种,至少10种,至少20种,至少50种,至少100种,至少200种,或更多种) 转座序列;其中,各种转座序列具有彼此不同的第一标签序列;优选地,各种转座序列具有相同的转座酶识别序列,相同的第一共有序列,和/或,相同的非转移链;
    优选地,所述逆转录酶具有末端转移活性;优选地,所述逆转录酶能够以RNA(例如,mRNA、非编码RNA、eRNA)为模板,合成cDNA链,且在所述cDNA链的3’端添加悬突;优选地,所述逆转录酶能够在cDNA链的3’末端添加长度为至少1个,至少2个,至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的悬突;优选地,所述逆转录酶能够在cDNA链的3’末端添加2-5个胞嘧啶核苷酸的悬突(例如CCC悬突);优选地,所述逆转录酶不具有或者具有降低的RNase活性(特别是RNase H活性);优选地,所述逆转录酶选自,经修饰或突变以去除RNase活性(特别是RNase H活性)的M-MLV逆转录酶、HIV-1逆转录酶、AMV逆转录酶和端粒酶逆转录酶(例如,不具有RNase H活性的M-MLV逆转录酶);
    优选地,所述转座酶选自Tn5转座酶、MuA转座酶、睡美人转座酶、Mariner转座酶、Tn7转座酶、Tn10转座酶、Ty1转座酶、Tn552转座酶,以及具有上述转座酶的转座活性的变体、修饰产物和衍生物;优选地,所述转座酶为Tn5转座酶;
    优选地,所述第一标签序列连接(例如直接连接)至所述转座酶识别序列的5’端;
    优选地,所述第一共有序列连接(例如直接连接)至所述第一标签序列的5’端;
    优选地,所述转移链从5’端至3’端包含第一共有序列,第一标签序列,和转座酶识别序列;优选地,所述转座酶识别序列具有如SEQ ID NO:99所示的序列;
    优选地,所述非转移链能够与所述转移链退火或杂交形成双链体;优选地,所述非转移链包含与转移链中的转座酶识别序列互补的序列;优选地,所述非转移链具有如SEQ ID NO:1所示的序列;
    优选地,所述转移链不包含修饰,或者包含修饰的核苷酸;和/或,所述非转移链不包含修饰,或者包含修饰的核苷酸;优选地,所述非转移链的5’末端具有磷酸基团修饰;和/或,所述非转移链的3’末端是封闭的(例如,所述非转移链的3’末端核苷酸为双脱氧的核苷酸);
    优选地,所述试剂盒还包含逆转录引物,例如包含poly(T)序列的引物和/或包含随机寡核苷酸序列的引物;优选地,所述poly(T)序列或所述随机寡核苷酸序列位于所述引物的3’端;优选地,所述poly(T)序列包含至少5个(例如,至少10个、至少15 个、或至少20个)胸腺嘧啶核苷酸残基;优选地,所述随机寡核苷酸序列具有5-30nt(例如,5-10nt,10-20nt,20-30nt)的长度;优选地,所述引物不包含修饰,或者包含修饰的核苷酸。
  22. 权利要求21的试剂盒,其还包含,用于构建转录组测序文库的试剂;
    优选地,所述用于构建转录组测序文库的试剂包括:偶联寡核苷酸分子的珠粒,所述寡核苷酸分子含有标记序列;
    优选地,所述寡核苷酸分子偶联至珠粒的表面,和/或,封闭在珠粒内;
    优选地,所述珠粒能够自发地或在暴露于一种或多种刺激(例如,温度变化、pH变化、暴露于特定化学物质或相、暴露于光、还原剂等)时释放所述寡核苷酸;
    优选地,所述珠粒是凝胶珠粒;
    优选地,所述标记序列包含选自下列的元件:第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列,模板转换序列,或其任何组合;
    优选地,所述标记序列包含第二共有序列,第二标签序列,独特分子标签序列和模板转换序列;优选地,所述标记序列还包含第一扩增引物序列;
    优选地,所述模板转换序列包含与所述逆转录酶在cDNA链的3’末端添加的悬突互补的序列;优选地,所述悬突为2-5个胞嘧啶核苷酸的悬突(例如CCC悬突),且所述模板转换序列的3’末端包含2-5个鸟嘌呤核苷酸(例如GGG);优选地,所述模板转换序列不包含修饰,或者包含修饰的核苷酸(例如锁核酸);
    优选地,所述独特分子标签序列具有至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;优选地,所述独特分子标签序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述第二标签序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;优选地,所述第二标签序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述第二共有序列具有至少3个,至少4个,至少5个,至少6个,至少7个,至少8个,至少9个,至少10个或更多个核苷酸的长度;优选地,所述第二共有序列不包含修饰,或者包含修饰的核苷酸;
    优选地,所述珠粒偶联了多个寡核苷酸分子,并且,各个寡核苷酸分子具有彼此 不同的独特分子标签序列;优选地,各个寡核苷酸分子具有相同的第二标签序列和/或相同的第二共有序列;
    优选地,所述试剂含有多个珠粒,并且,每个珠粒各自具有多个寡核苷酸分子;并且,同一个珠粒上的所述多个寡核苷酸分子具有相同的第二标签序列,并且,不同珠粒上的寡核苷酸分子具有彼此不同的第二标签序列;优选地,各个珠粒上的寡核苷酸分子具有相同的第二共有序列;优选地,各个珠粒上的寡核苷酸分子还具有相同的第一扩增引物序列;
    优选地,所述模板转换序列位于所述标记序列的3’末端;
    优选地,所述第二共有序列位于所述第二标签序列,独特分子标签序列和/或模板转换序列的上游;
    优选地,所述第一扩增引物序列位于所述第二共有序列的上游;
    优选地,所述标记序列从5’端至3’端包含任选的第一扩增引物序列,第二共有序列,第二标签序列,独特分子标签序列和模板转换序列;
    优选地,所述试剂盒还包含矿物油,缓冲液,dNTP,一种或多种核酸聚合酶(例如DNA聚合酶;例如具有链置换活性和/或高保真性的DNA聚合酶),用于回收或纯化核酸的试剂(例如磁珠),用于扩增核酸的引物(例如上文所定义的第一引物,第二引物,第三引物,第四引物,或其任何组合),或其任何组合;
    优选地,所述试剂盒还包含用于测序的试剂;例如用于二代测序的试剂。
  23. 权利要求1-11任一项的方法或权利要求21或22的试剂盒用于构建核酸分子文库或用于进行转录组测序的用途;
    优选地,所述核酸分子文库用于进行转录组测序(例如,单细胞转录组测序);
    优选地,所述方法或试剂盒用于进行单细胞转录组测序;优选地,所述方法或试剂盒用于分析细胞或细胞核(例如,免疫细胞或其细胞核)的基因表达水平,基因转录起始位置,和/或,RNA(例如,mRNA、长链非编码RNA、eRNA)分子的5’末端序列;
    优选地,所述方法或试剂盒用于构建细胞或细胞核(例如,免疫细胞或其细胞核)的转录组文库或用于进行细胞或细胞核(例如,免疫细胞或其细胞核)的转录组测序;
    优选地,所述免疫细胞选自B细胞和T细胞。
PCT/CN2021/139123 2020-12-31 2021-12-17 用于标记核酸分子的方法和试剂盒 WO2022143221A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21913962.3A EP4279609A1 (en) 2020-12-31 2021-12-17 Method and kit for labeling nucleic acid molecules

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011639159 2020-12-31
CN202011639159.X 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022143221A1 true WO2022143221A1 (zh) 2022-07-07

Family

ID=80069080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139123 WO2022143221A1 (zh) 2020-12-31 2021-12-17 用于标记核酸分子的方法和试剂盒

Country Status (3)

Country Link
EP (1) EP4279609A1 (zh)
CN (1) CN114015755B (zh)
WO (1) WO2022143221A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115386624A (zh) * 2022-10-26 2022-11-25 北京寻因生物科技有限公司 一种单细胞全序列标记的方法及其应用
WO2023044307A1 (en) * 2021-09-14 2023-03-23 Becton, Dickinson And Company Full length single cell rna sequencing
US11634708B2 (en) 2012-02-27 2023-04-25 Becton, Dickinson And Company Compositions and kits for molecular counting
US11639517B2 (en) 2018-10-01 2023-05-02 Becton, Dickinson And Company Determining 5′ transcript sequences
US11649497B2 (en) 2020-01-13 2023-05-16 Becton, Dickinson And Company Methods and compositions for quantitation of proteins and RNA
US11702706B2 (en) 2013-08-28 2023-07-18 Becton, Dickinson And Company Massively parallel single cell analysis
US11739443B2 (en) 2020-11-20 2023-08-29 Becton, Dickinson And Company Profiling of highly expressed and lowly expressed proteins
US11773436B2 (en) 2019-11-08 2023-10-03 Becton, Dickinson And Company Using random priming to obtain full-length V(D)J information for immune repertoire sequencing
US11782059B2 (en) 2016-09-26 2023-10-10 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11845986B2 (en) 2016-05-25 2023-12-19 Becton, Dickinson And Company Normalization of nucleic acid libraries
US11932901B2 (en) 2020-07-13 2024-03-19 Becton, Dickinson And Company Target enrichment using nucleic acid probes for scRNAseq
US11939622B2 (en) 2019-07-22 2024-03-26 Becton, Dickinson And Company Single cell chromatin immunoprecipitation sequencing assay

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114507711A (zh) * 2022-02-24 2022-05-17 浙江大学 一种单细胞转录组测序方法及其应用
CN114807305A (zh) * 2022-04-13 2022-07-29 首都医科大学附属北京口腔医院 一种构建原核生物单细胞rna测序文库的方法
CN117089607A (zh) * 2022-05-11 2023-11-21 中国科学院北京基因组研究所(国家生物信息中心) 一种单细胞RNA m5C修饰的分析方法
CN115386622B (zh) * 2022-10-26 2023-10-27 北京寻因生物科技有限公司 一种转录组文库的建库方法及其应用
CN116606905B (zh) * 2023-04-03 2024-01-05 中山大学 一种在细胞内原位进行全长mRNA反转录和转座的试剂盒、方法及其应用

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102264914A (zh) * 2008-10-24 2011-11-30 阿霹震中科技公司 用于修饰核酸的转座子末端组合物和方法
US20140093916A1 (en) * 2012-10-01 2014-04-03 Agilent Technologies, Inc. Immobilized transposase complexes for dna fragmentation and tagging
CN109207572A (zh) * 2018-09-29 2019-01-15 苏州贝康医疗器械有限公司 单细胞高通量测序文库构建方法及其试剂盒
CN109526228A (zh) * 2017-05-26 2019-03-26 10X基因组学有限公司 转座酶可接近性染色质的单细胞分析

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10017759B2 (en) * 2014-06-26 2018-07-10 Illumina, Inc. Library preparation of tagged nucleic acid
FI3810774T3 (fi) * 2018-06-04 2023-12-11 Illumina Inc Menetelmiä suuritehoisten yksittäissolutranskriptomikirjastojen valmistamiseksi

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102264914A (zh) * 2008-10-24 2011-11-30 阿霹震中科技公司 用于修饰核酸的转座子末端组合物和方法
US20140093916A1 (en) * 2012-10-01 2014-04-03 Agilent Technologies, Inc. Immobilized transposase complexes for dna fragmentation and tagging
CN109526228A (zh) * 2017-05-26 2019-03-26 10X基因组学有限公司 转座酶可接近性染色质的单细胞分析
CN109207572A (zh) * 2018-09-29 2019-01-15 苏州贝康医疗器械有限公司 单细胞高通量测序文库构建方法及其试剂盒

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
A. B. ROSENBERG, NATURE METHODS, vol. 172, no. 12, 2018, pages 1126 - 1126
ALEX K SHALEK ET AL., NATURE METHODS, vol. 14, no. 7, 2017, pages 752 - 752
BIORXIV, 2019, pages 2019
CELL, vol. 161, no. 5, 2015, pages 1202 - 1214
CHARLES COLE, ASHLEY BYRNE, ANNA E BEAUDIN, E CAMILLA FORSBERG, CHRISTOPHER VOLLMERS: "Tn5Prime, a Tn5 based 5′ capture method for single cell RNA-seq", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, GB, vol. 46, no. 10, 1 June 2018 (2018-06-01), GB , pages e62 - e62, XP055637367, ISSN: 0305-1048, DOI: 10.1093/nar/gky182 *
MCGINNIS, C.S. ET AL., NATURE METHODS, vol. 16, no. 7, 2019, pages 619 - 626
SIMONE PICELLI, ÅSA K. BJÖRKLUND, BJÖRN REINIUS, SVEN SAGASSER, GÖSTA WINBERG, RICKARD SANDBERG: "Tn5 transposase and tagmentation procedures for massively scaled sequencing projects", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, US, vol. 24, no. 12, 1 December 2014 (2014-12-01), US , pages 2033 - 2040, XP055236186, ISSN: 1088-9051, DOI: 10.1101/gr.177881.114 *
TANG JIANGTAO, ET AL.: "Research development of Tn5 transposition mechanism", JI YIN ZU XUE YU YING YONG SHENG WU XUE [GENOMICS AND APPLIED BIOLOGY], CHINA, vol. 22, no. 4, 31 December 2003 (2003-12-31), China , pages 316 - 321, XP055947312, ISSN: 1008-3464 *
TU, A.A. ET AL.: "TCR sequencing paired with massively parallel 3' RNA-seq reveals clonotypic T cell signatures", NAT IMMUNOL, vol. 20, 2019, pages 1692 - 1699, XP036928307, DOI: 10.1038/s41590-019-0544-5

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11634708B2 (en) 2012-02-27 2023-04-25 Becton, Dickinson And Company Compositions and kits for molecular counting
US11702706B2 (en) 2013-08-28 2023-07-18 Becton, Dickinson And Company Massively parallel single cell analysis
US11845986B2 (en) 2016-05-25 2023-12-19 Becton, Dickinson And Company Normalization of nucleic acid libraries
US11782059B2 (en) 2016-09-26 2023-10-10 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11639517B2 (en) 2018-10-01 2023-05-02 Becton, Dickinson And Company Determining 5′ transcript sequences
US11939622B2 (en) 2019-07-22 2024-03-26 Becton, Dickinson And Company Single cell chromatin immunoprecipitation sequencing assay
US11773436B2 (en) 2019-11-08 2023-10-03 Becton, Dickinson And Company Using random priming to obtain full-length V(D)J information for immune repertoire sequencing
US11649497B2 (en) 2020-01-13 2023-05-16 Becton, Dickinson And Company Methods and compositions for quantitation of proteins and RNA
US11932901B2 (en) 2020-07-13 2024-03-19 Becton, Dickinson And Company Target enrichment using nucleic acid probes for scRNAseq
US11739443B2 (en) 2020-11-20 2023-08-29 Becton, Dickinson And Company Profiling of highly expressed and lowly expressed proteins
WO2023044307A1 (en) * 2021-09-14 2023-03-23 Becton, Dickinson And Company Full length single cell rna sequencing
CN115386624A (zh) * 2022-10-26 2022-11-25 Beijing SeekGene BioSciences Co., Ltd. Method for single-cell full-sequence labeling and application thereof

Also Published As

Publication number Publication date
CN114015755B (zh) 2024-03-01
CN114015755A (zh) 2022-02-08
EP4279609A1 (en) 2023-11-22

Similar Documents

Publication Publication Date Title
WO2022143221A1 (zh) Method and kit for labeling nucleic acid molecules
US20220154288A1 (en) Combined analysis of cell-free nucleic acids and single cells for oncology diagnostics
US11932849B2 (en) Whole transcriptome analysis of single cells using random priming
EP4158055B1 (en) Oligonucleotides and beads for 5 prime gene expression assay
US20230203577A1 (en) Methods and systems for processing polynucleotides
EP3978622B1 (en) Composition for processing polynucleotides
US20210238661A1 (en) Mesophilic dna polymerase extension blockers
US11841371B2 (en) Proteomics and spatial patterning using antenna networks
US20200109437A1 (en) Determining 5' transcript sequences
EP4090763A1 (en) Methods and compositions for quantitation of proteins and rna
EP4150118A1 (en) Primers for immune repertoire profiling
US11939622B2 (en) Single cell chromatin immunoprecipitation sequencing assay
WO2021046232A1 (en) Optically readable barcodes and systems and methods for characterizing molecular interactions
CA3153296A1 (en) Single cell genetic analysis
US11946095B2 (en) Particles associated with oligonucleotides
WO2020167830A1 (en) Determining expressions of transcript variants and polyadenylation sites
WO2020150356A1 (en) Polymerase chain reaction normalization through primer titration
CN109680343A (zh) Library construction method for trace exosomal DNA
EP3615683B1 (en) Methods for linking polynucleotides
WO2019129133A1 (zh) Method for obtaining single-cell mRNA sequences
WO2023116376A1 (zh) Method for labeling and analyzing single-cell nucleic acids
Wang Droplet microfluidics for high-throughput single-cell analysis
WO2023028582A1 (en) Controlled cell-cell interaction assay
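CN115175985A (zh) Method for extracting and sequencing single-stranded DNA and RNA from untreated biological samples
CN117651611A (zh) High-throughput analysis of biomolecules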

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21913962

Country of ref document: EP

Kind code of ref document: A1

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021913962

Country of ref document: EP

Effective date: 20230731