WO2022018055A1 - Circulation method to sequence immune repertoires of individual cells - Google Patents

Info

Publication number
WO2022018055A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sequencing
rna
dna
barcode
Prior art date
Application number
PCT/EP2021/070210
Other languages
French (fr)
Inventor
Gerd MEYER ZU HÖRSTE
Xiaolin Li
Original Assignee
Westfälische Wilhelms-Universität Münster
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Westfälische Wilhelms-Universität Münster
Publication of WO2022018055A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • the present invention relates to methods (e.g., sequencing and/or nucleic acid library construction methods) comprising self-circularization of nucleic acids of interest (e.g., barcoded/labelled nucleic acids) to bring an off-barcode region of said nucleic acid closer to a barcode, thereby allowing the off-barcode region to be sequenced together with the barcode after circularization.
  • a read length limitation is the longest fragment that can be sequenced by a single-end sequencing run and the insert fragment length is the longest fragment that can be sequenced by a paired-end sequencing run.
  • the present invention relates to a method for producing/modifying a nucleic acid of interest (e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising: (a) circularizing (e.g., self-circularizing) said nucleic acid of interest into a circular nucleic acid, preferably said circularizing (e.g., self-circularizing) is carried out by the means of an enzymatic ligation (e.g., by the means of: DNA ligase (e.g., T4 DNA Ligase) or RNA ligase, e.g., DNA ligase having EC:6.5.1.1, EC:6.5.1.2, EC:6.5.
  • SEQ ID NO: 1 is the DNA sequence of the exemplary sequencing primer 1 (“read1” sequencing primer).
  • SEQ ID NO: 2 is the DNA sequence of the exemplary sequencing primer 2 (“read2” sequencing primer).
  • SEQ ID NO: 3 is the amino acid sequence of the Clonotype 1 IGL CDR3 (e.g., Figure 3).
  • SEQ ID NO: 4 is the amino acid sequence of the Clonotype 2 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 5 is the amino acid sequence of the Clonotype 3 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 6 is the amino acid sequence of the Clonotype 4 IGL CDR3 (e.g., Figure 3).
  • SEQ ID NO: 7 is the amino acid sequence of the Clonotype 9 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 8 is the amino acid sequence of the Clonotype 8 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 9 is the amino acid sequence of the Clonotype 5 IGL CDR3 (e.g., Figure 3).
  • SEQ ID NO: 10 is the amino acid sequence of the Clonotype 7 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 11 is the amino acid sequence of the Clonotype 6 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 12 is the amino acid sequence of the Clonotype 10 IGK CDR3 (e.g., Figure 3).
  • SEQ ID NO: 13 is the amino acid sequence of the Clonotype 1 TRA CDR3 (e.g., Figure 10).
  • SEQ ID NO: 14 is the amino acid sequence of the Clonotype 1 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 15 is the amino acid sequence of the Clonotype 2 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 16 is the amino acid sequence of the Clonotype 3 TRA CDR3 (e.g., Figure 10).
  • SEQ ID NO: 18 is the amino acid sequence of the Clonotype 4 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 19 is the amino acid sequence of the Clonotype 4 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 20 is the amino acid sequence of the Clonotype 5 TRA CDR3 (e.g., Figure 10).
  • SEQ ID NO: 21 is the amino acid sequence of the Clonotype 5 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 22 is the amino acid sequence of the Clonotype 6 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 23 is the amino acid sequence of the Clonotype 6 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 25 is the amino acid sequence of the Clonotype 12 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 26 is the amino acid sequence of the Clonotype 9 TRA CDR3 (e.g., Figure 10).
  • SEQ ID NO: 27 is the amino acid sequence of the Clonotype 8 TRA CDR3 (e.g., Figure 10).
  • SEQ ID NO: 28 is the amino acid sequence of the Clonotype 13 TRB CDR3 (e.g., Figure 10).
  • SEQ ID NO: 31 is the DNA sequence of the exemplary “Trxc rev pool_in” primer (content of pool: mTRBC_1).
  • SEQ ID NO: 32 is the DNA sequence of the exemplary “Trxc rev pool_out” primer (content of pool: mTRAC_2).
  • SEQ ID NO: 33 is the DNA sequence of the exemplary “Trxc rev pool_out” primer (content of pool: mTRBC_2).
  • SEQ ID NO: 34 is the DNA sequence of the exemplary “TSO” primer.
  • Figure 1 schematically shows a “near-barcode region”, “off barcode region” and sequencing library construction of barcoded RNA/DNA.
  • Current barcoding techniques add the barcode sequences at either the 3’- or 5’-end of DNA/RNA fragments. Because of the short read length and small insert fragment size, short read sequencers can only sequence the barcode together with the region near the barcode (shorter than the limit of either the read length or the insert fragment length of the sequencing library).
  • Read1 e.g., SEQ ID NO: 1
  • Read2 e.g., SEQ ID NO: 2
  • P5 and P7 are the sequences that bind to the sequencing chips of the “Illumina” sequencers.
  • i5 and i7 are indexes to identify libraries. The final library is sequenced by a short read sequencer, but only the barcode and the near-barcode region are read.
  • Figure 2 shows an exemplary embodiment of the method of the present invention together with exemplary molecular constructs/structures of the corresponding method steps (e.g., carried out with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 and/or 34).
  • FIG. 3 shows the BCR (IGH) annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12).
  • V(D)J variation regions of BCR.
  • IGK Immunoglobulin light chain kappa.
  • IGL Immunoglobulin light chain lambda.
  • IGH Immunoglobulin heavy chain.
  • Contig a set of overlapping DNA segments that together represent a consensus region of DNA.
  • CDR3 the main CDR (complementarity determining regions) responsible for recognizing processed antigen.
  • V-J spanning pair fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair.
  • Clonotype The phenotype of a clone of a cell.
  • Figure 4 shows an exemplary embodiment of the invention utilizing circularization of the barcoded DNA/RNA.
  • the off-barcode region is ligated to a barcode on the other side; the off-barcode region thus becomes the near-barcode region and can be sequenced together with the barcode by a short-read sequencer (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
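The effect of circularization on distance can be illustrated with a small Python sketch. It assumes a 3’-barcoded molecule laid out as V, D, J, C, poly-A and barcode; the segment lengths and the function names `gap_linear` and `gap_circular` are hypothetical and chosen only for illustration, not taken from the application.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (label, length in bp)

# Hypothetical 5'->3' layout of a 3'-barcoded molecule; lengths are illustrative.
MOLECULE: List[Segment] = [
    ("V", 300), ("D", 20), ("J", 50), ("C", 600), ("polyA", 30), ("barcode", 28),
]


def gap_linear(segments: List[Segment], a: str, b: str) -> int:
    """Bases lying between segments a and b on the linear molecule."""
    labels = [name for name, _ in segments]
    i, j = sorted((labels.index(a), labels.index(b)))
    return sum(length for _, length in segments[i + 1:j])


def gap_circular(segments: List[Segment], a: str, b: str) -> int:
    """Shortest separation of a and b once the two ends are ligated together."""
    lengths = dict(segments)
    total = sum(lengths.values())
    inner = gap_linear(segments, a, b)                # path through the middle
    outer = total - inner - lengths[a] - lengths[b]   # path across the ligation junction
    return min(inner, outer)


print(gap_linear(MOLECULE, "V", "barcode"))    # 700
print(gap_circular(MOLECULE, "V", "barcode"))  # 0
```

With these assumed lengths, the V segment is 700 bp away from the barcode on the linear molecule but directly adjacent to it after ligation of the two ends.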
  • Figure 5 shows a further exemplary embodiment of the invention where exemplary method steps are shown together with corresponding exemplary molecular constructs/structures (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
  • Figure 6 shows yet another exemplary embodiment of the invention where exemplary method steps are shown together with corresponding exemplary molecular constructs/structures (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
  • Figure 7 shows the valid data rate, defined as the percentage of data that maps to BCR after data filtering and UMI adjustment. For the total data, the valid rate is 98.95%; averaged over cells, the valid rate is 98.73%.
  • Figure 8 shows the BCR analysis data, including sequencing statistics, the number of BCR-containing cells and the enrichment rate (Note: the enrichment rate is based on total reads).
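A pooled rate and a per-cell average generally differ because cells contribute different numbers of reads. The Python sketch below shows the two calculations under the assumption that each cell is summarized as a (reads mapped to BCR, total reads) pair; the counts and the function names are hypothetical and do not reproduce the data of Figure 7.

```python
from typing import List, Tuple

CellCounts = Tuple[int, int]  # (reads mapped to BCR, total reads) for one cell


def pooled_valid_rate(cells: List[CellCounts]) -> float:
    """Percentage of all reads, pooled over cells, that map to BCR."""
    mapped = sum(m for m, _ in cells)
    total = sum(t for _, t in cells)
    return 100.0 * mapped / total


def mean_cell_valid_rate(cells: List[CellCounts]) -> float:
    """Average of the per-cell percentages; every cell has equal weight."""
    return sum(100.0 * m / t for m, t in cells) / len(cells)


# Hypothetical counts for three cells.
cells = [(990, 1000), (95, 100), (49, 50)]
print(round(pooled_valid_rate(cells), 2))     # 98.61
print(round(mean_cell_valid_rate(cells), 2))  # 97.33
```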
  • Figure 9 shows a schematic view of the single cell TCR sequencing from 3’ single cell cDNA library as used in Example 2 herein (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
  • FIG. 10 shows the TCR annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28).
  • V(D)J variation regions of TCR.
  • TRA T cell Receptor Alpha.
  • TRB T cell Receptor Beta.
  • Contig a set of overlapping DNA segments that together represent a consensus region of DNA.
  • CDR3 the main CDR (complementarity determining regions) responsible for recognizing processed antigen.
  • V-J spanning pair fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair.
  • Clonotype The phenotype of a clone of a cell.
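The “V-J spanning pair” statistic defined above can be sketched in Python as follows. The sketch assumes that the chains detected in each cell-associated barcode are already available as a mapping from barcode to chain labels; the barcodes, the data layout and the function name `spanning_pair_fraction` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def spanning_pair_fraction(
    chains_by_barcode: Dict[str, Iterable[str]],
    pair: Tuple[str, str] = ("TRA", "TRB"),
) -> float:
    """Fraction of barcodes with at least one contig for each chain of the pair."""
    if not chains_by_barcode:
        return 0.0
    needed = set(pair)
    paired = sum(1 for chains in chains_by_barcode.values() if needed <= set(chains))
    return paired / len(chains_by_barcode)


# Hypothetical cell barcodes and the chains of their contigs.
example = {
    "AAAC": ["TRA", "TRB"],
    "AAAG": ["TRB"],
    "AACC": ["TRA", "TRB", "TRB"],
    "AACG": [],
}
print(spanning_pair_fraction(example))  # 0.5
```

For BCR data, the same function could be called with a heavy/light chain pair such as `("IGH", "IGK")`, again as an illustrative assumption.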
  • barcode or “barcode sequence” may refer to any unique sequence label that can be coupled to at least one nucleotide sequence for, e.g., later identification of the at least one nucleotide sequence.
  • patient may be used interchangeably and refer to either a human or a non-human animal. These terms include mammals such as humans, primates, livestock animals (e.g., bovines, porcines), companion animals (e.g., canines, felines) and rodents (e.g., mice and rats).
  • diagnosis may refer to methods by which the skilled artisan can estimate and/or determine whether or not a patient is afflicted with a given disease or condition.
  • the skilled worker often makes a diagnosis based on one or more diagnostic indicators.
  • Exemplary diagnostic indicators may include the manifestation of symptoms or the presence, absence, or change in one or more markers for the disease or condition.
  • a diagnosis may indicate the presence or absence, or severity, of the disease or condition.
  • prognosis may refer to the likelihood of the progression or regression of a disease or condition, including likelihood of the recurrence of a disease or condition.
  • treating may refer to taking steps to obtain beneficial or desired results, including clinical results.
  • beneficial or desired clinical results include, but are not limited to, reduction, alleviation or amelioration of one or more symptoms associated with the disease or condition.
  • administering or “administration of” a compound or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art.
  • a compound or an agent can be administered orally, intravenously, arterially, intradermally, intramuscularly, intraperitoneally, subcutaneously, ocularly, sublingually, intranasally, intraspinally, intracerebrally, and transdermally.
  • a compound or agent can appropriately be introduced by rechargeable or biodegradable polymeric devices or other devices, e.g., patches and pumps, or formulations, which provide for the extended, slow, or controlled release of the compound or agent.
  • Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods.
  • Administration of a compound may include both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, a physician who instructs a patient to self-administer a therapeutic agent, or to have the agent administered by another, and/or who provides a patient with a prescription for a drug has administered the drug to the patient.
  • nucleic acid may refer to DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), DNA-RNA hybrids, and analogs of the DNA or RNA generated using nucleotide analogs.
  • the nucleic acid molecule can be a nucleotide, oligonucleotide, double- stranded DNA, single-stranded DNA, multi-stranded DNA, complementary DNA, genomic DNA, non-coding DNA, messenger RNA (mRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), small interfering RNA (siRNA), heterogeneous nuclear RNAs (hnRNA), or small hairpin RNA (shRNA).
  • mRNA messenger RNA
  • miRNA microRNA
  • rRNA ribosomal RNA
  • tRNA transfer RNA
  • siRNA small interfering RNA
  • hnRNA heterogeneous nuclear RNAs
  • shRNA small hairpin RNA
  • linearized nucleic acid may refer to a nucleic acid with one or two ends on each side of the nucleic acid molecule.
  • Linearized DNA may refer to the DNA with two ends on each side of the DNA molecule.
  • Linearized RNA may refer to the RNA with one end on each side of the RNA molecule.
  • enriching may refer to increasing the quantity or amount of nucleic acid (e.g., by the means of PCR or any other suitable technique as described herein).
  • the term “adapter” or “adaptor” may refer to a linker in genetic engineering that is a short, chemically synthesized, single-stranded or double-stranded oligonucleotide that can be ligated to the ends of other DNA or RNA molecules.
  • polypeptide is equally used herein with the term “protein”. Proteins (including fragments thereof, preferably biologically active fragments, and peptides, usually having less than 30 amino acids) comprise one or more amino acids coupled to each other via a covalent peptide bond (resulting in a chain of amino acids, e.g., SEQ ID NOs: 3-28).
  • polypeptide as used herein describes a group of molecules, which, for example, consist of more than 30 amino acids. Polypeptides may further form multimers such as dimers, trimers and higher oligomers, i.e. consisting of more than one polypeptide molecule. Polypeptide molecules forming such dimers, trimers etc. may be identical or non-identical.
  • heteromultimer is an antibody molecule, which, in its naturally occurring form, consists of two identical light polypeptide chains and two identical heavy polypeptide chains.
  • polypeptide and protein may also refer to naturally modified polypeptides/proteins wherein the modification is effected e.g. by post-translational modifications like glycosylation, acetylation, phosphorylation and the like. Such modifications are well known in the art.
  • variable refers to the portions of the immunoglobulin domains that exhibit variability in their sequence and that are involved in determining the specificity and binding affinity of a particular antibody (i.e., the "variable domain(s)"). Variability is not evenly distributed throughout the variable domains of antibodies; it is concentrated in sub-domains of each of the heavy and light chain variable regions. These sub-domains are called “complementarity determining regions” (CDRs).
  • CDRs complementarity determining regions
  • each subunit structure e.g., a CH, VH, CL, VL, CDR, FR structure
  • comprises active fragments e.g., the portion of the VH, VL, or CDR subunit that binds to the antigen, i.e., the antigen-binding fragment, or, e.g., the portion of the CH subunit that binds to and/or activates, e.g., an Fc receptor and/or complement.
  • the CDRs typically refer to the Kabat CDRs, as described in Sequences of Proteins of Immunological Interest, US Department of Health and Human Services (1991), eds. Kabat et al.
  • a “profile” of a transcriptome or portion of a transcriptome can refer to any sequencing or gene expression information concerning the transcriptome or portion thereof. This information can be either qualitative (e.g., presence or absence) or quantitative (e.g., levels or mRNA copy numbers). In some embodiments, a profile can indicate a lack of expression of one or more genes.
  • a “single cell” may refer to one cell.
  • Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Cells can be cultured cells or cells from a dissociated tissue, and can be fresh or preserved in a preservative buffer such as RNAprotect. Furthermore, in general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic single-celled organisms including bacteria or yeast.
  • the method of preparing the cDNA library can include the step of obtaining single cells.
  • a single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample.
  • Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually. For example a 96-well plate, such that each single cell is placed in a single well.
  • an “oligonucleotide” or “polynucleotide” may refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and can perform any function.
  • a polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T).
  • Uracil (U) substitutes for thymine when the polynucleotide is RNA.
  • the sequence can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
  • a “primer” may refer to a polynucleotide that hybridizes to a target or template that may be present in a sample of interest. After hybridization, the primer promotes the polymerization of a polynucleotide complementary to the target, for example in a reverse transcription or amplification reaction (e.g., SEQ ID NOs: 1, 2, 29, 30, 31, 32, 33 or 34).
  • sequence identity may refer to the relatedness between two amino acid sequences or between two nucleotide sequences and is described by the parameter “sequence identity”.
  • sequence identity between two amino acid sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 5.0.0 or later.
  • the parameters used may be gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix.
  • the output of Needle labeled “longest identity” (obtained using the no-brief option) is used as the percent identity and is calculated as follows:
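A minimal Python sketch of the percent identity calculation, assuming the definition commonly given for Needle's “longest identity”: (identical residues × 100) / (length of alignment − total number of gaps in alignment). The input is assumed to be a pair of already aligned sequences in which gaps are written as `-`; the function name and the example sequences are hypothetical.

```python
def longest_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    gaps = sum(1 for x, y in columns if x == "-" or y == "-")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    compared = len(columns) - gaps
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned CDR3-like sequences: 10 columns, 1 gap, 8 identical residues.
print(round(longest_identity("CASSL-GQYF", "CASSLAGQFF"), 2))  # 88.89
```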
  • amplification may refer to a process by which multiple copies of a particular polynucleotide are formed, and includes methods such as the polymerase chain reaction (PCR), ligation amplification (also known as ligase chain reaction, or LCR), and other amplification methods.
  • PCR polymerase chain reaction
  • LCR ligation amplification
  • amplification refers specifically to PCR.
  • Amplification methods are widely known in the art.
  • PCR refers to a method of amplification comprising hybridization of primers to specific sequences within a DNA sample and amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase.
  • the resulting DNA products are then often screened for a band of the correct size.
  • the primers used are oligonucleotides of appropriate length and sequence to provide initiation of polymerization (e.g., SEQ ID NOs: 1, 2, 29, 30, 31, 32, 33 or 34). Reagents and hardware for conducting amplification reactions are widely known and commercially available.
  • As used herein, “sequencing” may refer to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid.
  • sequencing comprises detecting a sequencing product using an instrument, for example but not limited to an ABI PRISM™ 377 DNA Sequencer, an ABI PRISM™ 310, 3100, 3100-Avant, 3730, or 3730xl Genetic Analyzer, an ABI PRISM™ 3700 DNA Analyzer, or an Applied Biosystems SOLiD™ System (all from Applied Biosystems), a Genome Sequencer 20 System (Roche Applied Science), or a mass spectrometer.
  • “High-throughput” or “next-generation sequencing” can sequence massive numbers of DNA fragments in parallel, thus reducing the cost and time of large-scale sequencing. It can be categorized into short read sequencing and long read sequencing. Short read sequencing is currently the most commonly used technique because of its cost effectiveness and high throughput.
  • the invention inter alia is useful in generating gene expression profiles for a plurality of cells. These gene expression profiles can be used in a number of applications related to the diagnosis, prognosis, and treatment of subjects.
  • the term “at least” preceding a series of elements is to be understood to refer to every element in the series.
  • the term “at least one” refers, if not particularly defined differently, to one or more such as two, three, four, five, six, seven, eight, nine, ten or more.
  • Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the present invention.
  • the term “and/or” wherever used herein includes the meaning of “and”, “or” and “all or any other combination of the elements connected by said term”.
  • Lymphocytes (T and B cells) recognize antigens through their highly variable antigen receptor (AgR), and each lymphocyte expresses a single variant.
  • the immune repertoire (ImR) denotes the number of different AgR variants an organism's adaptive immune system makes.
  • a typical way to characterize the ImR is to sequence the highly variable V(D)J region of RNA or DNA molecules derived from the immunoglobulin gene of B cells or from the T cell receptor of T cells.
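Counting a repertoire from sequenced cells can be sketched in Python as below. The sketch assumes that one CDR3 amino acid sequence per cell barcode has already been called and uses it as a stand-in for the clonotype; the sequences, the barcodes and the function name `clonotype_frequencies` are made up for illustration and are not the sequences of SEQ ID NOs: 3-28.

```python
from collections import Counter
from typing import Dict, List, Tuple


def clonotype_frequencies(
    cdr3_by_cell: Dict[str, str], top: int = 10
) -> List[Tuple[str, int, float]]:
    """Return (CDR3, cell count, fraction of cells) for the most frequent clonotypes."""
    counts = Counter(cdr3_by_cell.values())
    total = sum(counts.values())
    return [(cdr3, n, n / total) for cdr3, n in counts.most_common(top)]


# Hypothetical cells and CDR3 calls.
cells = {
    "cell1": "CASSLGQYF",
    "cell2": "CASSLGQYF",
    "cell3": "CAVRDNYGQNFVF",
    "cell4": "CASSLGQYF",
}
print(len(set(cells.values())))         # 2 distinct variants in this toy repertoire
print(clonotype_frequencies(cells)[0])  # ('CASSLGQYF', 3, 0.75)
```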
  • Single cell RNA and DNA sequencing can sequence individual genes at single cell resolution.
  • the most efficient and commonly used way of single cell RNA sequencing (scRNA-seq) is to use 3’ barcoding technology to bio-informatically identify cells in the sequencing library.
  • the highly variable V(D)J region is outside of the regions that can be sequenced together with cell barcodes.
  • the inventors denote this as the “off-barcode” region of the cDNA libraries.
  • the ImR can thus currently not be sequenced by 3’ barcoding scRNA-seq.
  • the present invention utilizes self-circularization technology to bring the V(D)J region closer to the cell barcodes. This makes it possible for the first time to sequence the ImR with 3’ barcoding single cell sequencing technology while maintaining the advantages of short-read sequencers.
  • the present invention can also be modified to sequence the “off-barcode” region for all other sequencing technologies that utilize barcoding.
  • transcriptome profiling with UMI 5’ transcriptome profiling with UMI, 5’ single cell transcriptome profiling, barcoding genome sequencing, barcoding de novo sequencing, barcoding single cell genome sequencing, barcoding single cell de novo sequencing, barcoding Hi-C, barcoding single cell Hi-C, barcoding exome sequencing, single cell barcoding exome sequencing, barcoding target enrichment sequencing, single cell barcoding target enrichment sequencing, RNA-seq with UMIs, DNA seq with UMIs, single cell RNA sequencing with barcoding, single cell DNA seq with barcoding, ATAC seq with barcoding technologies, single cell ATAC sequencing with barcoding techniques, barcoding cancer panel enrichment sequencing, barcoding cancer
  • AgR sequences contain regions denoted variable (V), joining (J), and in some cases, diversity (D) followed by a constant (C) region.
  • V variable
  • J joining
  • D diversity
  • C constant
  • Immune repertoire sequencing means sequencing the variable V(D)J region to identify its sequence, which allows an understanding of recombination events, clonal expansion and trafficking of lymphocytes in health and disease.
  • the inventors aim to sequence the V, D and J regions.
  • the cell barcode is added at the 3’ end of the mRNA, resulting in 3’ barcoded cDNA.
  • the barcodes are close to the constant region but far away from the variable region, which leaves barely a chance to sequence the V, D or J region together with the barcode.
  • An alternative method would be utilizing 5’ barcoding single cell sequencing.
  • the barcode is added at the 5’ end of the mRNA, which is near the V(D)J region of the immune repertoire. In that case, the V(D)J region lies in the near-barcode region, so it is possible to sequence it together with the barcode.
  • the most popular single cell sequencing is based on a 3’ barcoding technique, and the results of 5’ barcoding single cell sequencing are not totally comparable with 3’ barcoding single cell sequencing.
  • the present invention is based on the 3’ barcoding technique, but the inventors can bring the V, D and J regions into close proximity to the 3’ barcode, and thus make it possible to sequence them together with the barcode on a short read sequencer.
  • Next generation sequencing (NGS) technologies can sequence millions to billions of DNA fragments in a single sequencing run.
  • the revolutionary speed and considerably lower cost per base have facilitated a wide range of applications.
  • NGS techniques can be categorized into sequencing by ligation and sequencing by synthesis. Both have a relatively short read length (a common limit is 150-250 bp; for some sequencers the limit can be up to 700 bp), and short read sequencers are also limited in insert fragment size.
  • for one of the most cost-effective current sequencers (e.g., the “NovaSeq” from “Illumina”), the recommended insert size ranges between 100 bp and 500 bp.
  • Barcoding technology in general denotes using a short section of DNA sequence as an identifier of a fragment of DNA/RNA. Barcoding technology can be used in molecular identification, cell identification, tissue/organ identification, species identification, sample identification, group identification, antibody identification, chemical identification, molecular quantification and de-multiplexing. In essence, barcodes are used to identify DNA/RNA molecules bio-informatically through the barcode sequence rather than physically by separating DNA/RNA from different sources.
  • The most popular usages of barcoding are RNA unique molecular identifiers (UMIs) and cell barcodes.
  • A UMI is a specific sequencing linker added to the 3’ or 5’ end of RNA primers, DNA primers or oligonucleotides.
  • the unique sequence of a UMI can identify unique mRNA transcripts or DNA fragments, and therefore helps to profile mRNA/DNA free of PCR errors.
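How UMIs support molecular quantification can be sketched in Python: reads that share the same cell barcode, gene and UMI are treated as copies of one original molecule and counted once. The read tuples, gene names and function name below are hypothetical, and the sketch ignores sequencing errors within the UMI itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell barcode, UMI, gene)


def count_molecules(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell barcode, gene); duplicate reads collapse."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: the first two are amplification copies of one molecule.
reads = [
    ("AAAC", "TTGCA", "IGHM"),
    ("AAAC", "TTGCA", "IGHM"),
    ("AAAC", "GGATC", "IGHM"),
    ("CCGT", "TTGCA", "IGKC"),
]
print(count_molecules(reads))  # {('AAAC', 'IGHM'): 2, ('CCGT', 'IGKC'): 1}
```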
  • UMIs are widely used in RNA-sequencing (RNA-seq), ImR sequencing and single cell RNA sequencing (scRNAseq).
  • DNA/RNA barcoding is a method used for analyzing short sections of DNA/RNA from one or more specific gene(s).
  • the barcode can be placed on one or both sides of a DNA/RNA fragment, and may be used for the identification of molecules, cells, tissues/organs, species and samples, as well as for molecular quantification and de-multiplexing.
  • a typical procedure is to sequence DNA/RNA with barcodes.
  • the near-barcode region is close to the barcode, and its size is not larger than the insert size limitation of the sequencer used.
  • the off-barcode region is the rest of DNA/RNA.
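The split into the region next to the barcode and the off-barcode region can be expressed as simple coordinate arithmetic. The Python sketch below assumes positions are counted from the barcoded end of the molecule and that the sequencer's insert size limit is the only constraint; the lengths, the 500 bp limit and the function name `split_regions` are illustrative assumptions.

```python
from typing import Tuple

Range = Tuple[int, int]  # half-open [start, end), counted from the barcoded end


def split_regions(
    molecule_len: int, barcode_len: int, insert_limit: int
) -> Tuple[Range, Range]:
    """Return (region sequenceable with the barcode, off-barcode region)."""
    if not 0 < barcode_len <= min(molecule_len, insert_limit):
        raise ValueError("barcode must fit inside the molecule and the insert limit")
    reachable = min(molecule_len, insert_limit)
    near = (barcode_len, reachable)   # read together with the barcode
    off = (reachable, molecule_len)   # out of reach of the barcode
    return near, off


# Hypothetical 1028 bp molecule, 28 bp barcode, 500 bp insert limit.
print(split_regions(1028, 28, 500))  # ((28, 500), (500, 1028))
print(split_regions(400, 28, 500))   # ((28, 400), (400, 400)), no off-barcode region
```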
  • a final amplification with primer 1 (containing P5 plus read1) and primer 2 (containing P7, the i7 index and read2) produces the final sequencing libraries.
  • the final library is sequenced by a short read sequencer, but only the barcode and the near-barcode region are read.
  • RNA sequencing barcodes can be added at the 3’ end of the mRNA. Because only short regions near the barcode can be sequenced together with the barcode, 3’ barcoding of single cells allows only the sequencing of parts near the 3’ end of the mRNA.
  • Single cell RNA sequencing applies next generation sequencing to examine the sequence information of RNA from individual cells. It reveals the heterogeneity of individual cells which brings research and application to a new level.
  • cell barcodes were also introduced to identify cells. Cell barcodes are specific sequencing linkers added to oligonucleotides that can be used to uniquely identify cells.
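Bio-informatic identification of cells typically starts by splitting the barcode-carrying read into its cell barcode and UMI. The Python sketch below assumes that this read begins with the cell barcode followed directly by the UMI; the lengths of 16 and 12 bases, the example read and the function name `split_barcode_read` are assumptions made only for illustration.

```python
from typing import Tuple


def split_barcode_read(
    read: str, barcode_len: int = 16, umi_len: int = 12
) -> Tuple[str, str]:
    """Split a barcode-carrying read into (cell barcode, UMI)."""
    if len(read) < barcode_len + umi_len:
        raise ValueError("read is shorter than barcode plus UMI")
    cell_barcode = read[:barcode_len]
    umi = read[barcode_len:barcode_len + umi_len]
    return cell_barcode, umi


# Hypothetical read: 16 nt cell barcode, 12 nt UMI, then the start of the poly-T stretch.
read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "TTTT"
print(split_barcode_read(read))  # ('ACGTACGTACGTACGT', 'TTTTGGGGCCCC')
```

Reads whose cell barcodes are identical can then be assigned to the same cell.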
  • Cell barcoding techniques are broadly used in single cell RNA-sequencing methods, such as MARS-seq, CytoSeq, Drop-seq, InDrop, Chromium, sci-RNA-seq, Seq-Well, DroNC-seq, SPLiT-seq, Quartz-Seq, Microwell-seq.
  • the first step is to reverse transcribe the mRNA transcript using primers containing oligo(dT) to match the 3’ poly-A tail of mRNAs.
  • 2nd strand synthesis is then performed using the triple “C” cap that is added at the 5’ end of the mRNA sequence during the first step of reverse transcription. Therefore, there are 2 ways to add barcodes to the mRNA sequences: 1) the 3’ barcoding technique, which adds barcodes at the 3’ end of the mRNA sequences during the first step of reverse transcription; 2) the 5’ barcoding technique, which adds barcodes at the 5’ end of the mRNA sequences during the second step of 2nd strand synthesis.
  • the V(D)J region of interest is located close to the 5’ end of mRNA transcripts but not the 3’ end
  • the V(D)J region is too long to be sequenced in its entirety by short-read sequencers.
  • barcodes at the 3’ end of RNA molecules will cause a problem when sequencing the ImR of single cells.
  • the constant C region will become the near-barcode region since this region is closest to the 3’ end of the mRNA, and most of the V(D)J region will become the off-barcode region.
  • the V(D)J region cannot be sequenced by short read sequencers to profile the ImR with 3’ barcoding single cell sequencing methods.
  • scRNA-seq kits are known in the art (e.g., 5’-kit from 10x Genomics), but this has the following limitations: 1) it has been recently developed and many available datasets were generated with the 3’-approach, impeding comparability, 2) 5’-barcoding is considered to have worse performance than 3’-barcoding on scRNA-seq (see above), 3) when it comes to combining scRNA-seq with other approaches (such as combining it with oligo-barcoded antibodies to do CITE-seq), there are more compatible reagents for 3’ scRNA-seq than for 5’ scRNA-seq. Thus, a method to sequence ImR at the single cell level with 3’-barcoding would be a better choice than 5’ barcoding.
  • DNA/RNA barcoding is a method of species identification using a short section of DNA/RNA from a specific gene or genes. It can be placed on one or both sides of a DNA/RNA fragment, and can be used in molecular identification, cell identification, tissue/organ identification, species identification, sample identification, molecular quantification and demultiplexing.
  • P5 and P7 are the sequences that bind to the sequencing chips of Illumina sequencers.
  • the i7 index is the index used to identify a library when a sequencing lane includes more than 1 library. For each short read sequencer, there is a limitation on the insert size.
  • The near-barcode region is close to the barcode and its size is not larger than the insert size limitation of the sequencer to be used.
  • Off-barcode region is the rest of DNA/RNA.
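The two definitions above can be illustrated with a minimal sketch. It assumes that the barcode sits at the 3’ end of the molecule and that the insert size limit can be treated as a single number; the function name and the example lengths (a 1500 nt molecule, a 600 nt limit) are hypothetical and are not taken from the source.

```python
def split_by_insert_limit(molecule_len, max_insert):
    """Split a 3'-barcoded molecule into (off_barcode, near_barcode) coordinate ranges.

    Coordinates run 5'->3' starting at 0; the barcode is assumed to sit at the 3' end.
    """
    if molecule_len < 0 or max_insert < 0:
        raise ValueError("lengths must be non-negative")
    near_start = max(0, molecule_len - max_insert)
    off_barcode = range(0, near_start)               # cannot be read together with the barcode
    near_barcode = range(near_start, molecule_len)   # fits within the insert size limit
    return off_barcode, near_barcode


off, near = split_by_insert_limit(molecule_len=1500, max_insert=600)
print(len(off), len(near))  # 900 600
```

Under these assumptions, anything in the first 900 nt (counted from the 5’ end) falls into the off-barcode region.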
  • a final amplification with primer 1, which contains P5 plus read1, and primer 2, which contains P7, the i7 index plus read2, produces the final sequencing libraries. The final library is sequenced by a short read sequencer, but only the barcode and the near-barcode region are sequenced (e.g., Figure 1).
  • 3’-barcoding can be used, in which the barcodes are added at the 3’ of mRNA. Because only short region near barcode can be sequenced together with barcode, 3’ barcoding single cell sequencing only allows sequencing of parts/fragments near the 3’-end of mRNA.
  • Immune repertoire is the number of different sub-types an organism's immune system makes, either immunoglobulin or T cell receptor. It can be measured in either mRNA or genomic DNA. Each immunoglobulin or T cell receptor RNA contains 4 regions from 5’ to 3’: V, D, J and C. The recombination of V, D and J forms the variable region, and C is the constant region.
  • V, D and J region are preferably sequenced.
  • the cell barcode will be added at the 3’ of the mRNA, resulting in 3’ barcoded cDNA. In this case, the barcode is close to the constant region but away from the variable region, which leaves barely any chance to sequence the V, D or J region together with the barcode by a short read sequencer.
  • V, D and J are the variable region of the immune repertoire and C is the constant region. The recombination of different V, D and J leads to the differences between immune cell receptors.
  • the C region is generally longer than the insert size limitation of a short read sequencer, so V, D and J are in the off-barcode region, which makes it very rare to reach the V, D and J region during 3’ barcoding single cell sequencing. Therefore it is not possible to sequence the immune repertoire by 3’ barcoding single cell sequencing technology known from the prior art.
  • An alternative method is to utilize 5’-barcoding single cell sequencing.
  • the barcode is added at the 5’-end of mRNA, which is near the V(D)J region of immune repertoire.
  • the V(D)J region becomes the near-barcode region, thus it can be sequenced together with the barcode.
  • the most popular single cell sequencing is based on 3’ barcoding technique, and the result of 5’ barcoding single cell sequencing is not totally comparable with 3’ barcoding single cell sequencing.
  • the present invention is based on 3’ barcoding technique, in that V, D, J region are moved closer to 3’ barcode, thus making it possible to sequence them together by short read sequencer.
  • the present invention aims to solve the problem of combining 3’-barcoding based scRNA-seq with ImR sequencing and to make it possible to sequence the “off-barcode” region by a short read sequencer while maintaining the barcode information.
  • the inventors here utilize a self-circularization method to circularize the barcoded DNA/RNA in order to bring barcodes closer to the off-barcode region, thus enabling sequencing of the barcode and the off-barcode region together in a cost effective way.
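The effect of circularization on the barcode-to-V(D)J distance can be sketched with a simple segment model. This is an illustration only: the segment names follow the text, but every length is a made-up placeholder, and reopening the circle exactly at the poly-A boundary is a simplifying assumption (the text describes PCR primers in the poly-A and constant regions, which would additionally shorten the constant region).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    length: int  # nucleotides; all values below are hypothetical


# Assumed 5'->3' layout of a 3'-barcoded immune receptor cDNA.
LINEAR = [
    Segment("TSO", 30),
    Segment("VDJ", 400),
    Segment("C", 1000),
    Segment("polyA", 30),
    Segment("barcode_UMI", 28),
    Segment("read1", 33),
]


def gap_between(layout, first, second):
    """Nucleotides separating segment `first` from a later segment `second`."""
    names = [s.name for s in layout]
    i, j = names.index(first), names.index(second)
    if i >= j:
        raise ValueError("`first` must precede `second` in the layout")
    return sum(s.length for s in layout[i + 1:j])


def reopen_circle(layout, new_first):
    """Join the 3' end to the 5' end, then reopen the circle so `new_first` leads."""
    names = [s.name for s in layout]
    k = names.index(new_first)
    return layout[k:] + layout[:k]


print(gap_between(LINEAR, "VDJ", "barcode_UMI"))    # 1030
reopened = reopen_circle(LINEAR, "polyA")
print(gap_between(reopened, "barcode_UMI", "VDJ"))  # 63
```

With these placeholder lengths, the barcode and the V(D)J region are separated by the constant region and poly-A stretch before circularization, and only by the read1 and TSO segments afterwards.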
  • the inventors demonstrate applicability of the methods of the present invention for 3’ scRNA-seq, but the library preparing method of the present invention can be more generally used for immune repertoire sequencing, single cell full length RNA sequencing or other applications which require breaking the limitation of sequencing barcodes together with the off-barcode region by short read sequencers.
  • the present invention relates to a method for/of producing (and/or modifying) a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., any unique sequence label) (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising: circularizing (e.g., self-circularizing) of nucleic acids (e.g., barcoded nucleic acids of interest).
  • the present invention relates to a method for/of producing (and/or modifying) a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., any unique sequence label) (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising: preferably: (i) providing: a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes
  • the present invention relates to the method of the present invention, wherein the methods steps (a) to (e) or (f) or (g) are carried out consecutively.
  • the present invention relates to the method of the present invention, wherein said method comprises no amplification and/or molecular modification of said nucleic acid of interest prior to the circularizing of step (a).
  • the present invention relates to the method of the present invention, wherein said method comprises an amplification and/or molecular modification of said nucleic acid of interest prior to the circularizing of step (a).
  • the present invention relates to the method of the present invention, wherein said adapter sequence: (i) does not comprise restriction site/s for a restriction endonuclease (e.g., having EC:3.1.21.4 enzymatic activity, e.g., does not comprise restriction site/s for Not I restriction endonuclease (e.g., 5’-GCGGCCGC-3’), e.g., wherein Not I restriction endonuclease is a restriction endonuclease derived from Nocardia otitidiscaviarum, e.g., having UniProtKB - Q2I6W2); and/or (ii) can not be recognized and/or cleaved by a restriction endonuclease (e.g., Not I restriction endonuclease, e.g., having UniProtKB - Q2I6W2).
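Whether a candidate adapter is free of the Not I recognition sequence quoted above can be checked with a few lines of code. The sketch below is an assumption-laden illustration: the function names and the two test sequences are invented for the example, and only the site 5’-GCGGCCGC-3’ comes from the text.

```python
NOTI_SITE = "GCGGCCGC"  # recognition sequence quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T; other symbols are left unchanged)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains_site(seq, site=NOTI_SITE):
    """True if `site` occurs on either strand of `seq`."""
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


# Hypothetical adapter candidates, not sequences from the source.
print(contains_site("AAGCGGCCGCTT"))   # True
print(contains_site("ACGTACGTTGCA"))   # False
```

Because the Not I site is palindromic, the reverse-strand check is redundant for this particular enzyme; it is kept so the helper also works for non-palindromic sites.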
  • the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said sequencing of step (g) is a single- or paired-end sequencing (e.g., as described in the Examples section herein).
  • the present invention relates to the method of the present invention, wherein said nucleic acid of interest comprising at least 1 specific barcode sequence (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 specific barcode sequences).
  • the present invention relates to the method of the present invention, wherein said nucleic acid of interest (e.g., a farthest nucleotide of the nucleic acid of interest to be sequenced) and said barcode are at least 100 nucleotides apart, e.g., at least about 500-700 nucleotides apart, e.g., at least about 700 nucleotides apart.
  • the present invention relates to the method of the present invention, wherein said method is/suitable for a short read sequencing (e.g., a short read high-throughput sequencing), preferably with sequencing read length not longer than 1000 nucleotides.
  • the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method has the sequencing read accuracy (e.g., single read-based, e.g., not consensus based) of at least 50% (e.g., 60%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8% or 99.9%).
  • said method comprising step (g), wherein said method has the time per sequencing run of at least 10 minutes.
  • the present invention relates to the method of the present invention, wherein: (i) said nucleic acid comprises a plurality of nucleic acids (e.g., cDNA library or sequencing library), and preferably said plurality of nucleic acids is derived from a single cell (e.g., as described in the Examples section herein); and/or (ii) said method comprises/applied to multiple (e.g., non-identical) nucleic acids modified/processed according to the method steps of the present invention, preferably said method is a method for nucleic acid library construction (e.g., cDNA library or sequencing library).
  • the present invention relates to the method of the present invention, wherein said nucleic acid is/comprises a plurality of nucleic acids (e.g., cDNA library or sequencing library), wherein said method, comprising step (g), is a method for multiplex sequencing of said plurality of nucleic acids (e.g., as described in the Examples section herein).
  • said nucleic acid of interest is an amplification and/or reverse transcription product (e.g., as described in the Examples section herein).
  • the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method is suitable for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein).
  • the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method is suitable for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein).
  • the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method is suitable for a full length RNA sequencing (e.g., full length single cell RNA sequencing), preferably said method is a method for a full length single cell RNA target enrichment sequencing (e.g., as described in the Examples section herein).
  • the present invention relates to the method of the present invention for sequencing the 3’-end barcoded nucleic acid, e.g., derived from a single cell, wherein the nucleic acid produced/modified according to the method of the present invention comprises from 5’ to 3’-end: a first primer binding site; a first index sequence as library identifier; a binding site for a sequencing primer (e.g., read2, e.g., SEQ ID NO: 2); a sequence of interest with binding sites (e.g., for primer “TSO”, “C” and Poly-A, e.g., SEQ ID NOs: 29, 30, 31, 32, 33 or 34); a cell barcode and/or unique molecular identifier; a binding site for another sequencing primer (e.g., read1, e.g., SEQ ID NO: 1); a second index sequence as library identifier; a second primer binding site, and wherein the method comprises: amplifying the barcoded
  • the present invention relates to the method of the present invention, wherein the barcoded cDNA is circularized to enable sequencing of a sequence of interest positioned in distance to the cell barcode.
  • the present invention relates to the method of the present invention, wherein the method is used to sequence the variable regions of antigen receptors or antibodies.
  • the present invention relates to the method of the present invention, wherein the clone type frequencies of antigen receptors or antibodies can be determined.
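Once each cell barcode has been assigned a clone type, the clone type frequencies mentioned above reduce to a counting step. The following is a minimal sketch under stated assumptions: the input mapping, the barcode strings and the clonotype labels are hypothetical, and how a clone type is actually called from V(D)J reads is outside its scope.

```python
from collections import Counter


def clonotype_frequencies(cell_to_clonotype):
    """Fraction of cells carrying each clone type.

    `cell_to_clonotype` maps a cell barcode to its assigned clone type label.
    """
    counts = Counter(cell_to_clonotype.values())
    total = sum(counts.values())
    return {clonotype: n / total for clonotype, n in counts.items()}


# Hypothetical assignments for four cells.
cells = {
    "CELL_BC_1": "clonotype_A",
    "CELL_BC_2": "clonotype_A",
    "CELL_BC_3": "clonotype_B",
    "CELL_BC_4": "clonotype_C",
}
print(clonotype_frequencies(cells))
# {'clonotype_A': 0.5, 'clonotype_B': 0.25, 'clonotype_C': 0.25}
```

Counting per cell barcode rather than per read is the design choice that makes the barcode information essential here: without it, highly expressed receptors would be overcounted.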
  • the present invention relates to the method of the present invention, wherein the constant region of the antigen receptor is on the 5’-end of the sequence of interest in close proximity to the cell barcode.
  • the present invention relates to the method of the present invention, wherein the sequence of interest is the highly variable V(D)J region, which is positioned at the 3’-end of the sequence of interest in far distance to the cell barcode.
  • the present invention relates to the method of the present invention, wherein the sequence of interest comprises before sequencing a reduced constant region at the 5’-end, and the full length highly variable V(D)J region at the 3’-end of the sequence of interest.
  • the present invention relates to the method of the present invention, wherein the barcode of the present invention is a unique sequence used to identify a specific cell.
  • the present invention relates to the method of the present invention, wherein the barcode of the present invention (e.g., cell barcode) is selected from the group consisting of: cell identifying barcodes, molecular identifying barcodes, DNA or RNA identifying barcodes, sample identifying barcodes, chemical identifying barcodes, protein identifying barcodes, quantification barcodes.
  • the present invention relates to the method of the present invention, wherein the barcode of the present invention (e.g., cell barcode) has molecular modifications selected from the group consisting of: fluorophores and dark quenchers labeling, non-fluorescent labeling, fluorescent labeling, biotinylation, avidinylation, attachment chemistry/linkers modifications, adenylation, spacer modifications, phosphorylation, phosphorothioate bonds, click chemistry modifications, and base modifications.
  • the present invention relates to the method of the present invention, wherein the barcode of the present invention (e.g., cell barcode) is combined with a unique molecular identifier.
  • the present invention relates to the method of the present invention, wherein one or more cell barcode or one or more unique molecular identifier is added.
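How a cell barcode combined with a unique molecular identifier (UMI) is typically used downstream can be sketched as follows. The layout (barcode first, UMI directly after it) and the lengths of 16 nt and 12 nt are assumptions chosen for illustration, not values stated in the source, and the deduplication uses exact matching without any error correction.

```python
from collections import defaultdict

CB_LEN = 16   # assumed cell barcode length (not specified in the source)
UMI_LEN = 12  # assumed UMI length (not specified in the source)


def split_read1(read1):
    """Split a barcode read into (cell_barcode, umi) under the assumed layout."""
    if len(read1) < CB_LEN + UMI_LEN:
        raise ValueError("read is shorter than barcode plus UMI")
    return read1[:CB_LEN], read1[CB_LEN:CB_LEN + UMI_LEN]


def count_molecules(read1_seqs):
    """Count distinct UMIs per cell barcode (exact-match deduplication)."""
    umis_per_cell = defaultdict(set)
    for read in read1_seqs:
        cell_barcode, umi = split_read1(read)
        umis_per_cell[cell_barcode].add(umi)
    return {cb: len(umis) for cb, umis in umis_per_cell.items()}


# Toy reads: two PCR duplicates and one distinct molecule in the first cell.
reads = [
    "A" * 16 + "C" * 12,
    "A" * 16 + "C" * 12,
    "A" * 16 + "G" * 12,
    "T" * 16 + "C" * 12,
]
print(count_molecules(reads))
# {'AAAAAAAAAAAAAAAA': 2, 'TTTTTTTTTTTTTTTT': 1}
```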
  • the present invention relates to the method of the present invention, wherein a short read sequencer is used for sequencing.
  • the present invention relates to the method of the present invention, wherein the percentage of valid data is the ratio of cell barcode counts of fragments to total barcode counts and is at least about 80%, preferably at least about 90%, more preferably at least about 95%.
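The percentage of valid data defined above is a simple ratio, which can be made concrete with a short sketch. One reading is assumed here: a fragment counts as valid if its barcode matches one of the known cell barcodes. The function names and the toy barcodes are hypothetical.

```python
def valid_fraction(observed_barcodes, cell_barcodes):
    """Fraction of observed barcodes that match a known cell barcode."""
    observed = list(observed_barcodes)
    if not observed:
        raise ValueError("no barcodes observed")
    allowed = set(cell_barcodes)
    return sum(barcode in allowed for barcode in observed) / len(observed)


def meets_threshold(fraction, threshold=0.80):
    """Compare against the 'at least about 80%' level named in the text."""
    return fraction >= threshold


observed = ["AA", "AA", "CC", "GG", "AA"]   # toy barcodes read from fragments
known_cells = {"AA", "CC"}                   # toy list of cell barcodes
fraction = valid_fraction(observed, known_cells)
print(fraction, meets_threshold(fraction))  # 0.8 True
```

The stricter levels mentioned in the text (about 90% and about 95%) can be checked by passing a different `threshold`.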
  • the present invention relates to the method of the present invention, wherein said method is the method for profiling variable regions of antigen receptors or antibodies, comprising: (a) isolating mRNA from a plurality of single cells to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell; (b) reverse-transcribing the mRNA samples of a cell, producing cDNA incorporating a cell barcode sequence; (c) pooling and purifying the barcoded cDNA produced from the separate cells; (d) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; (e) circulating the barcoded double-stranded cDNA; (f) linearizing the circulated barcoded cDNA by PCR and target enrichment with Poly-A (e.g., SEQ ID NO: 29) and “C” primer/s (e.g., SEQ ID NOs: 30, 31, 32 or 33);
  • the present invention relates to the method of the present invention, wherein adding/providing a barcode/s to a nucleic acid of the present invention, e.g., RNA or DNA, is carried out by the means of: ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, real-time PCR, Digital PCR (dPCR), Droplet Digital PCR (ddPCR), recombination, biotin capture, transposition, enzyme reaction, exonuclease digestion, endonuclease digestion, digestion and/or 2nd strand synthesizing.
  • the present invention relates to the method of the present invention, wherein the mRNA is isolated from animals, cells, single cell, tissue, biopsies, blood, and cell cultures.
  • the present invention relates to the method of the present invention, wherein the mRNA is further isolated from virus, bacteria, micro-beings, and plants.
  • the present invention relates to the method of the present invention, wherein circularization of the nucleic acid of interest of the present invention (e.g., DNA or RNA) is carried out by the means of: ligation, RNA ligation, T4 DNA Ligase, Cre lox recombination, transposition, and/or DNA circulating enzyme use.
  • the present invention relates to the method of the present invention, wherein the enrichment method step is carried out by the means of: Hybridization-Based Capture, PCR-Based Capture, biotin capture, exonuclease digestion, endonuclease digestion, digestion, and/or PCR.
  • the present invention relates to the method of the present invention, wherein after each circularization method step a DNA digestion of left over linear DNA is optionally carried out.
  • the present invention relates to the method of the present invention, said method further comprising: one or more nucleic acid purification step/s.
  • the present invention relates to the method of the present invention, wherein said method is further combined with one or more of the following: MARS-sequencing, Cyto-sequencing, Drop-sequencing, InDrop, Chromium, sciRNA-sequencing, sequencing-Well, DroNC-sequencing, SPLiT-sequencing, Quartz-sequencing, Microwell-sequencing, 3’-transcriptome profiling with UMI, 5’ transcriptome profiling with UMI, 5’-single cell transcriptome profiling, barcoding genome sequencing, barcoding de novo sequencing, barcoding single cell genome sequencing, barcoding single cell de novo sequencing, barcoding Hi-C, barcoding single cell Hi-C, barcoding Exome sequencing, single cell barcoding Exome sequencing, barcoding target enrichment sequencing, single cell barcoding target enrichment sequencing, RNA-sequencing with UMIs, DNA sequencing with UMIs, single cell RNA sequencing with barcoding, single cell DNA sequencing with barcoding, AT
  • the present invention relates to the method of the present invention, wherein said method is suitable for / compatible with a single cell full length RNA sequencing method carried out with existing single cell RNA sequencing kits (both 3’ and 5’ kits).
  • the present invention relates to the method of the present invention, wherein said method is suitable for / compatible with a single cell immune repertoire sequencing carried out with cDNA samples derived from storage.
  • the present invention relates to the method of the present invention, wherein said method is suitable for identifying RNA location in a tissue sample.
  • the present invention relates to the method of the present invention, wherein the re-linearization step (e.g., method step (d)) is carried out by the means of a PCR, preferably said re-linearization step does not comprise a restriction enzyme digestion.
  • the present invention relates to the method of the present invention, wherein said method is compatible with existing methods of single cell RNA 3’-capture.
  • the present invention relates to the method of the present invention, wherein said method can utilize polyA region and BCR constant region to enrich a nucleic acid of interest (e.g., BCR).
  • the present invention relates to the method of the present invention, wherein said method utilizes PCR to re-linearize circulated cDNA.
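Re-linearization of a circle by PCR can be pictured at the sequence level with a toy model. Everything in it is an assumption made for illustration: the sequences are invented, primers are represented only by the top-strand sites where the product starts and ends, and real primer design, strand orientation and PCR kinetics are ignored.

```python
def inverse_pcr_product(circle, start_site, end_site):
    """Top-strand product that starts at `start_site` and runs around the circle to `end_site`.

    `circle` is the top strand of a circular molecule written from an arbitrary origin.
    """
    doubled = circle + circle  # lets a product cross the origin of the circle
    start = doubled.find(start_site)
    if start == -1:
        raise ValueError("start site not found")
    end = doubled.find(end_site, start + len(start_site))
    if end == -1:
        raise ValueError("end site not found")
    return doubled[start:end + len(end_site)]


# Invented toy segments in the 5'->3' order described in the text.
tso, vdj = "AAGCAGT", "TGTGCCAGC"
c_region, poly_a = "GAGGACCTGAAC", "AAAAAAAAAA"
barcode, read1 = "CTAGCTAG", "TCTACAC"
linear = tso + vdj + c_region + poly_a + barcode + read1

# Self-circularization joins the read1 end to the TSO end; the product is then
# defined by a site in the poly-A stretch and a site at the start of the C region.
product = inverse_pcr_product(linear, start_site=poly_a, end_site="GAGGAC")
print(product)
print(len(linear), len(product))  # 53 47
```

In this toy product the barcode sits a few bases upstream of the V(D)J segment, and only the first part of the constant region is retained.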
  • the present invention relates to the method of the present invention, wherein the circularization step of said method is carried out before the enrichment step.
  • the present invention relates to the method of the present invention, which is capable of utilizing cDNA from the samples that have been already processed by other single cell 3’-capture methods.
  • the present invention relates to the method of the present invention, wherein said method is an in vitro or ex vivo or in vivo method.
  • the present invention relates to/provides a nucleic acid (e.g., a nucleic acid of interest, e.g., DNA, RNA or cDNA) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes) at its 3’ and/or 5’-end, e.g., produced (or modified) by the method of the present invention (e.g., as described in the Examples section herein).
  • the present invention relates to/provides the nucleic acid, e.g., produced (or modified) by the method of the present invention, wherein said nucleic acid is an intermediate product, e.g., in another method.
  • the present invention relates to/provides the nucleic acid/s and/or polypeptide/s and/or nucleic acid/s encoding said polypeptides, e.g., SEQ ID NOs: 1-34 and/or nucleic acid/s and/or polypeptide/s and/or nucleic acid/s encoding said polypeptides being at least 80% (e.g., at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99%) sequence identity to any one of SEQ ID NOs: 1-34, e.g., for use in the method/s, composition/s and/or kit/s of the present invention.
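The sequence identity thresholds above can be made concrete with a small calculation. The source does not state how identity is computed, so the sketch below assumes one common convention: two sequences that are already aligned to equal length (gaps written as "-"), with identity taken as matching positions divided by alignment length. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences; gap columns never match."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(
        x == y and x != "-"
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ACGTACGTAC", "ACGTACGTTT")
print(identity, identity >= 80.0)  # 80.0 True
```

Other conventions (for example, dividing by the length of the shorter sequence) would give different numbers for gapped alignments.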
  • the present invention relates to/provides the nucleic acid, e.g., produced (or modified) by the method of the present invention, wherein said nucleic acid further comprising: any one of the following sequences: a first primer binding site; a first index sequence as library identifier; a second index sequence as library identifier; or a second primer binding site.
  • the present invention relates to/provides the nucleic acid of the present invention, wherein functional equivalent sequences are used.
  • the present invention relates to/provides the methods/nucleic acids/compositions and/or kits of the present invention as depicted in the Examples and/or Figures as described herein (e.g., as depicted in Figure 1-10) carried out, for example, with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34.
  • the present invention relates to/provides a composition or kit comprising the nucleic acid and/or polypeptide/s of the present invention (e.g., as described in the Examples section herein), e.g., for use in the methods of the present invention.
  • the present invention relates to the nucleic acid/s, polypeptide/s, composition/s or kit/s of the present invention for use as a medicament and/or diagnostic marker.
  • the present invention relates to the method/s, nucleic acid/s, polypeptide/s, composition/s or kit/s of the present invention for use in a diagnostic and/or screening (e.g., disease susceptibility screening) and/or prognostic and/or prediction (e.g., disease outcome and/or course prognosis/prediction) and/or phenotyping method (e.g., immunodiagnostic method, e.g., for an autoimmune disease, e.g., lupus erythematosus, immune disease, inflammatory disease, neuroinflammatory disease, meningitis, interleukin (IL)-17 producing T helper (Th17)-cells associated disease, cell-dominated meningeal inflammation, infectious disease, genetic disorder, tissue typing/compatibility).
  • the present invention relates to the method/s, nucleic acid/s, polypeptide/s, composition/s or kit/s of the present invention for use in one or more of the following methods: (i) sequencing method (e.g., as described in the Examples section herein); (ii) library construction (e.g., cDNA library or sequencing library) method (e.g., as described in the Examples section herein); (iii) method for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); (iv) method for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the following methods: (i)
  • the present invention relates to use of the nucleic acid or composition or kit of the present invention for one or more of the following: (i) for sequencing (e.g., as described in the Examples section herein); (ii) for library construction (e.g., cDNA library or sequencing library) (e.g., as described in the Examples section herein); (iii) for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); (iv) for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein); (v) for diagnostics and/or screening (e.g.
  • the invention is also characterized by the following items:
  • a method for/of producing (and/or modifying) a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., any unique sequence label) (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end
  • said method comprising: preferably: (i) providing: a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at
  • said adapter sequence does not comprise restriction site/s for a restriction endonuclease (e.g., having EC:3.1.21.4 enzymatic activity, e.g., does not comprise restriction site/s for Not I restriction endonuclease (e.g., 5’-GCGGCCGC-3’), e.g., wherein Not I is a restriction endonuclease derived from Nocardia otitidiscaviarum, e.g., having UniProtKB - Q2I6W2); and/or (ii) can not be recognized and/or cleaved by a restriction endonuclease (e.g., Not I restriction endonuclease, e.g., having UniProtKB - Q2I6W2).
  • The method of any one of the preceding items comprising step (g), wherein said sequencing of step (g) is a single- or paired-end sequencing (e.g., as described in the Examples section herein).
  • said nucleic acid of interest comprising at least 1 specific barcode sequence (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 specific barcode sequences).
  • said nucleic acid of interest (e.g., a farthest nucleotide of the nucleic acid of interest to be sequenced) and said barcode are at least 100 nucleotides apart, e.g., at least about 500-700 nucleotides apart, e.g., at least about 700 nucleotides apart.
  • a short read sequencing (e.g., a short read high-throughput sequencing)
  • The method of any one of the preceding items comprising step (g), wherein said method has the sequencing read accuracy (e.g., single read-based, e.g., not consensus based) of at least 50% (e.g., 60%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8% or 99.9%).
  • comprising step (g), wherein said method has the time per sequencing run of at least 10 minutes.
  • said nucleic acid is a plurality of nucleic acids (e.g., cDNA library or sequencing library), and preferably said plurality of nucleic acids is derived from a single cell (e.g., as described in the Examples section herein); and/or (ii) said method comprises multiple (e.g., non-identical) nucleic acids modified/processed according to the method steps according to any one of the preceding items, preferably said method is a method for nucleic acid library construction (e.g., cDNA library or sequencing library).
  • nucleic acid is a plurality of nucleic acids (e.g., cDNA library or sequencing library), wherein said method, comprising step (g), is a method for multiplex sequencing of said plurality of nucleic acids (e.g., as described in the Examples section herein).
  • nucleic acid of interest is an amplification and/or reverse transcription product (e.g., as described in the Examples section herein).
  • step (g) wherein said method is suitable for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein).
  • step (g) wherein said method is suitable for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein).
  • step (g) wherein said method is suitable for a full length RNA sequencing (e.g., full length single cell RNA sequencing), preferably said method is a method for a full length single cell RNA target enrichment sequencing (e.g., as described in the Examples section herein).
  • a nucleic acid (e.g., DNA, RNA or cDNA, e.g., SEQ ID NOs: 1-34, e.g., as described in the Examples section herein)
  • carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes) at its 3’ and/or 5’-end, produced (or modified) by the method according to any one of preceding items (e.g., as described in the Examples section herein).
  • a composition or kit comprising the nucleic acid according to any one of preceding items (e.g., as described in the Examples section herein).
  • nucleic acid or polypeptide (e.g., SEQ ID NOs: 1-34, e.g., as described in the Examples section herein)
  • composition or kit according to any one of preceding items for use in one or more of the following methods: i) sequencing method (e.g., as described in the Examples section herein); ii) library construction (e.g., cDNA library or sequencing library) method (e.g., as described in the Examples section herein); iii) method for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); iv) method for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e
  • nucleic acid or polypeptide (e.g., SEQ ID NOs: 1-34, e.g., as described in the Examples section herein)
  • composition or kit according to any one of preceding items for/in one or more of the following: i) for sequencing (e.g., as described in the Examples section herein); ii) for library construction (e.g., cDNA library or sequencing library) (e.g., as described in the Examples section herein); iii) for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); iv) for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.
  • barcoded DNA/RNA subsequently can be called “barcoded DNA/RNA”. It can be synthesized or prepared from viruses, bacteria, microorganisms, plants, animals, cells, single cells, tissue, biopsies, blood, or cultures.
  • the barcodes include cell identifying barcodes, molecular identifying barcodes, DNA/RNA identifying barcodes, sample identifying barcodes, chemical identifying barcodes, protein identifying barcodes, and quantification barcodes.
  • Barcodes can be added to RNA/DNA by ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, real-time PCR, Digital PCR (dPCR), Droplet Digital PCR (ddPCR), recombination, biotin capture, transposition, enzyme reaction, Exonuclease digestion, Endonuclease digestion, digestion or 2nd strand synthesis. Barcodes can be located within 0-10000 bp of the 5’ and/or 3’ end of DNA/RNA fragments. The end of the barcoded DNA/RNA can be a blunt end or a sticky end.
  • Barcoded DNA/RNA contains 0-100% of Deoxyribonucleic Acid or Ribonucleic Acid, and can have molecular modifications of fluorophores and dark quenchers labeling, nonfluorescent labeling, fluorescent labeling, biotinylation, avidinylation, attachment chemistry/linkers modifications, adenylation, spacer modifications, phosphorylation, phosphorothioate bonds, click chemistry modifications, base modifications (like 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), 5-Bromo dU, deoxyUridine, Inverted dT, Inverted Dideoxy-T, Dideoxy-C, 5-Methyl dC, deoxyInosine, Super T (5-hydroxybutynl-2’-deoxyuridine), Super G (8-aza-7-deazaguanosine), Locked nucleic acids - Affinity Plus modified bases, 5-Nitroindole, 2'-O-Methyl RNA Base
  • Target enrichment of a population of DNA/RNA molecules containing immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
  • [00205] (optional) Amplify barcoded DNA/RNA by ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, real-time PCR, dPCR, ddPCR, recombination, biotin capture, transposition, enzyme reaction, transfection, culture, digestion, Exonuclease digestion, Endonuclease digestion, or 2nd strand synthesis.
  • Target enrichment of a population of DNA/RNA molecules containing immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
  • [00214] Sequencing with single-end or paired-end method by high-throughput sequencers.
  • BCR human single cell B cell receptor
  • Ligate adaptor [00262] Prepare mixture on ice: [00263] Incubate at 20°C for 15 minutes.
  • a paired-end sequencing is performed on an Illumina sequencer. Sequencing strategy: PE150.
  • Figure 3 shows the BCR (IGH) annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12).
  • V(D)J variation regions of BCR.
  • IGK Immunoglobulin light chain kappa.
  • IGL Immunoglobulin light chain lambda.
  • IGH Immunoglobulin heavy chain.
  • Contig a set of overlapping DNA segments that together represent a consensus region of DNA.
  • CDR3 the main CDR (complementarity determining regions) responsible for recognizing processed antigen.
  • V-J spanning pair fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair.
  • Clonotype The phenotype of a clone of a cell.
  • Figure 7 shows the valid data rate, defined as the percentage of data that maps to BCR after data filtering and UMI adjustment. For the total data, the valid rate is 98.95%; averaged over cells, the valid rate is 98.73%.
  • Figure 8 shows the BCR analysis data, including sequencing statistics, the number of BCR-containing cells and the enrichment rate (Note: enrichment rate is based on total reads).
  • Example 2 Single cell TCR sequencing from mice 3’ single cell cDNA library.
  • Circularized cDNA libraries were purified by 0.7x Ampure XP beads.
  • a PCR enrichment of the T cell receptor (TCR) variable region was performed with 25 µl Kapa hotstart amplification mix (KAPA Biosystems), 10 µl primer polyA and 5 µl primer Trxc rev pool_out (e.g., SEQ ID NOs: 29, 32-33).
  • TCR T cell receptor
  • a nested PCR was performed with 25 µl Kapa hotstart amplification mix, 10 µl primer polyA and 5 µl primer Trxc rev pool_in (e.g., SEQ ID NOs: 29, 30-31).
  • PCR products were purified by 0.5x-0.8x Ampure XP beads (Beckman Coulter) and libraries were prepared from them using the Chromium Single Cell 3’ Library Kit v3 (10x Genomics).
  • TCR annotation in Figure 10, which shows the TCR annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28).
  • V(D)J variation regions of TCR.
  • TRA T cell Receptor Alpha.
  • TRB T cell Receptor Beta.
  • Contig a set of overlapping DNA segments that together represent a consensus region of DNA.
  • CDR3 the main CDR (complementarity determining regions) responsible for recognizing processed antigen.
  • V-J spanning pair fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair.
  • Clonotype The phenotype of a clone of a cell.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to methods comprising self-circularizing of barcoded nucleic acids in order to ligate an off-barcode region with a barcode, wherein after self-circularizing the off-barcode region becomes closer to the barcode, allowing for sequencing the off-barcode region with the barcode.

Description

CIRCULATION METHOD TO SEQUENCE IMMUNE REPERTOIRES OF INDIVIDUAL CELLS
[001] This application contains a Sequence Listing in a computer readable form, which is incorporated herein by reference.
[002] Technical field
[003] The present invention relates to method/s (e.g., sequencing and/or nucleic acid library construction methods) comprising self-circularization of nucleic acids of interest (e.g., barcoded/labelled nucleic acids) to bring/move an off-barcode region of said nucleic acid closer to a barcode, thereby allowing for sequencing the off-barcode region with the barcode after circularization. The present invention further relates to nucleic acids produced or modified by the methods of the present invention (e.g., including libraries of nucleic acids, e.g., derived from a single cell, e.g., scRNA libraries) as well as uses thereof, e.g., for sequencing and/or nucleic acid library construction and/or screening applications.
[004] Background of the invention
[005] Current barcoding techniques utilize the barcode sequences at either the 3’ or 5’ end of DNA/RNA molecules. Nowadays “short-read length” sequencers are the most commonly chosen devices for sequencing because of their low cost and high accuracy. However, there are two known size limitations affecting sequencing of barcoded DNA/RNA by short-read sequencers: a read length limitation and an insert fragment length limitation. The read length is the longest fragment that can be sequenced by a single-end sequencing run and the insert fragment length is the longest fragment that can be sequenced by a paired-end sequencing run.
When the distance between the target region to be sequenced and the barcode is less than either the read length limitation or the insert fragment length limitation, it is possible to sequence the barcode and the target region together. However, if the target region lies beyond these limitations, it is not possible to sequence it directly together with the barcode. The inventors may refer to the region located at a distance from the barcode that is less than either the read length limitation or the insert fragment length limitation of the sequencer as the “near-barcode region” and to the rest of the nucleic acid (e.g., DNA/RNA) as the “off-barcode region”. The off-barcode region cannot be sequenced together with the barcode by short-read sequencers. “Long-read” sequencers can solve this problem, but the associated sequencing costs are more than 10 times higher, which is a disadvantage. Thus, no cost-effective approach is currently available to identify the sequence of DNA/RNA fragments that are located in the off-barcode region while maintaining the barcode information: short-read sequencers cover too short a read length and insert fragment length, while long-read sequencers are considerably more expensive. Accordingly, there is a need for feasible techniques for sequencing an “off-barcode region” of the DNA/RNA molecule at low cost.
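By way of a non-limiting illustration, the distinction between the “near-barcode region” and the “off-barcode region” may be sketched in Python as follows. The function name, its parameters and the numeric limits used in the example calls are hypothetical and chosen for illustration only; actual read length and insert fragment length limitations depend on the sequencer used.

```python
def classify_region(distance_from_barcode: int,
                    read_length_limit: int,
                    insert_length_limit: int,
                    paired_end: bool = True) -> str:
    """Label a target region as near-barcode or off-barcode.

    distance_from_barcode: distance (in bases) between the barcode and the
        far edge of the target region on the linear molecule.
    read_length_limit / insert_length_limit: hypothetical limits of a
        short-read sequencer for single-end and paired-end runs.
    """
    # A single-end run is bounded by the read length; a paired-end run is
    # bounded by the insert fragment length.
    limit = insert_length_limit if paired_end else read_length_limit
    return "near-barcode" if distance_from_barcode < limit else "off-barcode"


# Illustrative values only (assumptions, not instrument specifications).
print(classify_region(120, read_length_limit=150, insert_length_limit=600, paired_end=False))
print(classify_region(400, read_length_limit=150, insert_length_limit=600))
print(classify_region(1500, read_length_limit=150, insert_length_limit=600))
```

In this sketch, a region that lies 1500 bases from the barcode is classified as off-barcode under either limit, which corresponds to the situation the present invention addresses.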
[006] The technical problem underlying the present application is thus to comply with these needs. The technical problem is solved by providing the embodiments reflected in the claims, described in the description and illustrated in the examples and figures that follow.
[007] Summary of the invention
[008] The present invention relates to a method for producing/modifying a nucleic acid of interest (e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising:
(a) circularizing (e.g., self-circularizing) said nucleic acid of interest into a circular nucleic acid, preferably said circularizing (e.g., self-circularizing) is carried out by the means of an enzymatic ligation (e.g., by the means of: DNA ligase (e.g., T4 DNA Ligase) or RNA ligase, e.g., DNA ligase having EC:6.5.1.1, EC:6.5.1.2, EC:6.5.1.6 or EC:6.5.1.7 enzymatic activity, RNA ligase having EC:6.5.1.3 enzymatic activity, 3'-phosphate/5'-hydroxy nucleic acid ligase having EC:6.5.1.8 enzymatic activity, etc.); optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of: a nuclease/s, e.g., an exonuclease/s and/or endonuclease/s, e.g., as described in the Examples section herein);
(b) enriching (e.g., amplifying) said target nucleic acid from said circularized (e.g., self-circularized) nucleic acid from step (a), e.g., by the means of any suitable nucleic acid enriching (e.g., amplifying) method, preferably said enriching is carried out by the means of: PCR (e.g., polymerase chain reaction), NASBA (e.g., nucleic acid sequence based amplification), RCA (e.g., rolling-circle amplification), SPIA (e.g., single primer isothermal amplification), HDA (e.g., helicase-dependent amplification), SDA (e.g., strand displacement amplification), LAMP (e.g., loop-mediated isothermal amplification), RPA (e.g., recombinase polymerase amplification), TS-PCR (e.g., template-switching polymerase chain reaction, e.g., SMART, e.g., switching mechanism at 5’-end of RNA transcript), ICAN (e.g., isothermal and/or chimeric primer-initiated amplification of nucleic acids), EXPAR (e.g., exponential amplification reaction), NEMA (e.g., nicking endonuclease-mediated isothermal amplification), in-vitro transcription, etc., and/or combination/s thereof, further preferably said PCR is carried out with primer/s (e.g., SEQ ID NOs: 1, 2, 29, 30, 31, 32, 33 or 34) capable of hybridizing to a specific and/or unspecific sequence/s within (and/or adjacent to) said nucleic acid of interest (e.g., to polyA sequence within said nucleic acid of interest and/or constant region within said nucleic acid of interest (e.g., constant (C) region of B-cell receptor)) under suitable conditions (e.g., as described in the Examples section herein);
(c) optionally, circularizing (e.g., self-circularizing) the amplified nucleic acid from step (b) into a circular nucleic acid; optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of a nuclease, e.g., as described in the Examples section herein);
(d) optionally, linearizing the circularized nucleic acid from step (c), preferably said linearizing is carried out by the means of PCR and/or nucleic acid fragmentation (e.g., shearing); further preferably said nucleic acid fragmentation is a random/stochastic nucleic acid fragmentation (e.g., as described in the Examples section herein);
(e) ligating an adapter sequence comprising at least one specific oligonucleotide (e.g., a sequencing primer) to the 3’-end and/or 5’-end of the linearized nucleic acid from step (b), (c) or (d); preferably said adapter sequence further comprising SEQ ID NO: 1 (e.g., Illumina read2 primer) sequence (e.g., as described in the Examples section herein);
(f) optionally, amplifying the ligation product from step (e); preferably said amplifying is carried out by the means of PCR; further preferably said PCR is carried out with a primer hybridizing to said adapter sequence (e.g., as described in the Examples section herein);
(g) optionally, sequencing the barcoded nucleic acid from step (e) or (f), wherein said nucleic acid of interest is sequenced together with said barcode (e.g., in a single- or paired-end sequencing), wherein said method is a 3’-end and/or 5’-end sequencing method (e.g., high-throughput sequencing method) (e.g., as described in the Examples section herein).
[009] The present application satisfies this demand by the provision of methods and modified nucleic acid/s and compositions and kits described herein below, characterized in the claims and illustrated by the appended Examples and Figures.
[0010] Overview of the Sequence Listing
[0011] SEQ ID NO: 1 is the DNA sequence of the exemplary sequencing primer 1 (“read1” sequencing primer).
[0012] SEQ ID NO: 2 is the DNA sequence of the exemplary sequencing primer 2 (“read2” sequencing primer).
[0013] SEQ ID NO: 3 is the amino acid sequence of the Clonotype 1 IGL CDR3 (e.g., Figure 3).
[0014] SEQ ID NO: 4 is the amino acid sequence of the Clonotype 2 IGK CDR3 (e.g., Figure 3).
[0015] SEQ ID NO: 5 is the amino acid sequence of the Clonotype 3 IGK CDR3 (e.g., Figure 3).
[0016] SEQ ID NO: 6 is the amino acid sequence of the Clonotype 4 IGL CDR3 (e.g., Figure 3).
[0017] SEQ ID NO: 7 is the amino acid sequence of the Clonotype 9 IGK CDR3 (e.g., Figure 3).
[0018] SEQ ID NO: 8 is the amino acid sequence of the Clonotype 8 IGK CDR3 (e.g., Figure 3).
[0019] SEQ ID NO: 9 is the amino acid sequence of the Clonotype 5 IGL CDR3 (e.g., Figure 3).
[0020] SEQ ID NO: 10 is the amino acid sequence of the Clonotype 7 IGK CDR3 (e.g., Figure 3).
[0021] SEQ ID NO: 11 is the amino acid sequence of the Clonotype 6 IGK CDR3 (e.g., Figure 3).
[0022] SEQ ID NO: 12 is the amino acid sequence of the Clonotype 10 IGK CDR3 (e.g., Figure 3).
[0023] SEQ ID NO: 13 is the amino acid sequence of the Clonotype 1 TRA CDR3 (e.g., Figure 10).
[0024] SEQ ID NO: 14 is the amino acid sequence of the Clonotype 1 TRB CDR3 (e.g., Figure 10).
[0025] SEQ ID NO: 15 is the amino acid sequence of the Clonotype 2 TRB CDR3 (e.g., Figure 10).
[0026] SEQ ID NO: 16 is the amino acid sequence of the Clonotype 3 TRA CDR3 (e.g., Figure 10).
[0027] SEQ ID NO: 17 is the amino acid sequence of the Clonotype 4 TRA CDR3 (e.g., Figure 10).
[0028] SEQ ID NO: 18 is the amino acid sequence of the Clonotype 4 TRB CDR3 (e.g., Figure 10).
[0029] SEQ ID NO: 19 is the amino acid sequence of the Clonotype 4 TRB CDR3 (e.g., Figure 10).
[0030] SEQ ID NO: 20 is the amino acid sequence of the Clonotype 5 TRA CDR3 (e.g., Figure 10).
[0031] SEQ ID NO: 21 is the amino acid sequence of the Clonotype 5 TRB CDR3 (e.g., Figure 10).
[0032] SEQ ID NO: 22 is the amino acid sequence of the Clonotype 6 TRB CDR3 (e.g., Figure 10).
[0033] SEQ ID NO: 23 is the amino acid sequence of the Clonotype 6 TRB CDR3 (e.g., Figure 10).
[0034] SEQ ID NO: 24 is the amino acid sequence of the Clonotype 12 TRA CDR3 (e.g., Figure 10).
[0035] SEQ ID NO: 25 is the amino acid sequence of the Clonotype 12 TRB CDR3 (e.g., Figure 10).
[0036] SEQ ID NO: 26 is the amino acid sequence of the Clonotype 9 TRA CDR3 (e.g., Figure 10).
[0037] SEQ ID NO: 27 is the amino acid sequence of the Clonotype 8 TRA CDR3 (e.g., Figure 10).
[0038] SEQ ID NO: 28 is the amino acid sequence of the Clonotype 13 TRB CDR3 (e.g., Figure 10).
[0039] SEQ ID NO: 29 is the DNA sequence of the exemplary polyA primer, wherein n=c or g or t.
[0040] SEQ ID NO: 30 is the DNA sequence of the exemplary “Trxc rev pool_in” primer (content of pool: mTRAC_1).
[0041] SEQ ID NO: 31 is the DNA sequence of the exemplary “Trxc rev pool_in” primer (content of pool: mTRBC_1).
[0042] SEQ ID NO: 32 is the DNA sequence of the exemplary “Trxc rev pool_out” primer (content of pool: mTRAC_2).
[0043] SEQ ID NO: 33 is the DNA sequence of the exemplary “Trxc rev pool_out” primer (content of pool: mTRBC_2).
[0044] SEQ ID NO: 34 is the DNA sequence of the exemplary “TSO” primer.
[0045] Brief description of the drawings
[0046] Figure 1 schematically shows a “near-barcode region”, an “off-barcode region” and sequencing library construction of barcoded RNA/DNA. Current barcoding techniques add the barcoding sequences at either the 3’- or 5’-end of DNA/RNA fragments. Because of the short read length and short insert fragment size, short-read sequencers can only sequence the barcode together with the region near the barcode (less than the limitation of either the read length or the insert fragment length of the sequencing library). Read1 (e.g., SEQ ID NO: 1) and Read2 (e.g., SEQ ID NO: 2) are the sequencing primers; both are starting points for the paired-end sequencing. P5 and P7 are the sequences that bind to the sequencing chips of the “Illumina” sequencers. i5 and i7 are indexes to identify libraries. The final library is sequenced by a short-read sequencer, which covers only the barcode and the near-barcode region.
[0047] Figure 2 shows an exemplary embodiment of the method of the present invention together with exemplary molecular constructs/structures of the corresponding method steps (e.g., carried out with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 and/or 34).
[0048] Figure 3 shows the BCR (IGH) annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). The clonotypes shown are those of single cell IGH sequenced in the course of the present invention. V(D)J: variation regions of BCR. IGK: Immunoglobulin light chain kappa. IGL: Immunoglobulin light chain lambda. IGH: Immunoglobulin heavy chain. Contig: a set of overlapping DNA segments that together represent a consensus region of DNA. CDR3: the main CDR (complementarity determining regions) responsible for recognizing processed antigen. V-J spanning pair: fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair. Clonotype: The phenotype of a clone of a cell.
[0049] Figure 4 shows an exemplary embodiment of the invention utilizing circularization of the barcoded DNA/RNA. After circularization, the off-barcode region is ligated to a barcode on the other side; thus the off-barcode region becomes the near-barcode region and can be sequenced together with the barcode by a short-read sequencer (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
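By way of a non-limiting illustration, the effect of circularization on the distance between a barcode and an off-barcode region may be sketched in Python as follows. The molecule length and the positions used in the example are hypothetical assumptions, and the model treats the molecule as a simple coordinate line or ring; it does not represent any specific construct of the Examples.

```python
def linear_distance(barcode_pos: int, target_pos: int) -> int:
    """Distance between barcode and target on a linear molecule."""
    return abs(target_pos - barcode_pos)


def circular_distance(barcode_pos: int, target_pos: int, length: int) -> int:
    """Shortest distance between barcode and target once both ends of a
    molecule of the given length are ligated to each other."""
    direct = abs(target_pos - barcode_pos)
    # On a ring the target can also be reached across the ligation junction.
    return min(direct, length - direct)


# Hypothetical molecule: barcode at one end, target close to the other end.
length = 2000
barcode_pos = 0
target_pos = 1900

print(linear_distance(barcode_pos, target_pos))            # far from the barcode
print(circular_distance(barcode_pos, target_pos, length))  # close to the barcode
```

In this sketch, the target lies 1900 positions from the barcode on the linear molecule but only 100 positions from it across the ligation junction, which is why the former off-barcode region can be read together with the barcode after circularization.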
[0050] Figure 5 shows a further exemplary embodiment of the invention where exemplary method steps are shown together with corresponding exemplary molecular constructs/structures (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
[0051] Figure 6 shows yet another exemplary embodiment of the invention where exemplary method steps are shown together with corresponding exemplary molecular constructs/structures (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
[0052] Figure 7 shows the valid data rate, defined as the percentage of data that maps to BCR after data filtering and UMI adjustment. For the total data, the valid rate is 98.95%; averaged over cells, the valid rate is 98.73%.
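By way of a non-limiting illustration, the two ways of reporting the valid rate (over the total data and as an average over cells) may be sketched in Python as follows. The function names and the per-cell read counts are hypothetical and are not the data underlying Figure 7; the sketch only shows why the two figures can differ.

```python
from typing import Dict, Tuple


def valid_rate(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads that map to the target (e.g., BCR)."""
    return 100.0 * mapped_reads / total_reads


def pooled_and_mean_valid_rate(per_cell: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """Return (rate over all reads pooled, mean of the per-cell rates).

    per_cell maps a cell barcode to (mapped_reads, total_reads).
    """
    total_mapped = sum(mapped for mapped, _ in per_cell.values())
    total_reads = sum(total for _, total in per_cell.values())
    pooled = valid_rate(total_mapped, total_reads)
    mean = sum(valid_rate(m, t) for m, t in per_cell.values()) / len(per_cell)
    return pooled, mean


# Hypothetical counts: (mapped reads, total reads) per cell barcode.
cells = {"cell_A": (990, 1000), "cell_B": (90, 100)}
pooled, mean = pooled_and_mean_valid_rate(cells)
print(round(pooled, 2), round(mean, 2))
```

Cells with many reads dominate the pooled rate, whereas every cell contributes equally to the per-cell average.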
[0053] Figure 8 shows the BCR analysis data, including sequencing statistics, the number of BCR-containing cells and the enrichment rate (Note: enrichment rate is based on total reads).
[0054] Figure 9 shows a schematic view of the single cell TCR sequencing from 3’ single cell cDNA library as used in Example 2 herein (e.g., with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34).
[0055] Figure 10 shows the TCR annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28). V(D)J: variation regions of TCR. TRA: T cell Receptor Alpha. TRB: T cell Receptor Beta. Contig: a set of overlapping DNA segments that together represent a consensus region of DNA. CDR3: the main CDR (complementarity determining regions) responsible for recognizing processed antigen. V-J spanning pair: fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair. Clonotype: The phenotype of a clone of a cell.
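By way of a non-limiting illustration, a “Top N Clonotype frequencies” statistic of the kind shown in Figures 3 and 10 may be computed as sketched below in Python. The sketch assumes that a clonotype is identified by the tuple of CDR3 sequences found for a cell barcode; this grouping rule, the function name and the placeholder labels are hypothetical and are not the annotation procedure used in the Examples.

```python
from collections import Counter
from typing import Dict, List, Tuple

Clonotype = Tuple[str, ...]


def top_clonotypes(cell_to_cdr3: Dict[str, Clonotype],
                   n: int = 10) -> List[Tuple[Clonotype, int, float]]:
    """Return the n most frequent clonotypes as (clonotype, cells, fraction).

    cell_to_cdr3 maps each cell barcode to the CDR3 sequences detected in
    that cell (e.g., one per receptor chain).
    """
    counts = Counter(cell_to_cdr3.values())
    total_cells = sum(counts.values())
    return [(clonotype, cells, cells / total_cells)
            for clonotype, cells in counts.most_common(n)]


# Placeholder labels stand in for real CDR3 amino acid sequences.
cells = {
    "barcode_1": ("TRA_cdr3_x", "TRB_cdr3_x"),
    "barcode_2": ("TRA_cdr3_x", "TRB_cdr3_x"),
    "barcode_3": ("TRA_cdr3_y", "TRB_cdr3_y"),
}
for clonotype, n_cells, fraction in top_clonotypes(cells, n=2):
    print(clonotype, n_cells, round(fraction, 3))
```

Because the cell barcode is sequenced together with the variable region, each CDR3 sequence can be assigned to a cell, which is what makes counting clonotypes per cell possible.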
[0057] Detailed description of the invention
[0058] Definitions
[0059] As described herein references are made to UniProtKB Accession Numbers (http://www.uniprot.org/, e.g., as available in UniProtKB release 2020_03 published June 17, 2020).
[0060] As described herein references are made to EC (Enzyme Commission) numbers relative to the nomenclature of enzymes (e.g., as described by Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 28:304-305 (2000)). See the Enzyme nomenclature database (or ENZYME database), release published June 17, 2020 (e.g., at https://enzyme.expasy.org/).
[0061] As used herein, “barcode” or “barcode sequence” may refer to any unique sequence label that can be coupled to at least one nucleotide sequence for, e.g., later identification of the at least one nucleotide sequence.
[0062] The terms “patient,” “subject,” and “individual” may be used interchangeably and refer to either a human or a non-human animal. These terms include mammals such as humans, primates, livestock animals (e.g., bovines, porcines), companion animals (e.g., canines, felines) and rodents (e.g., mice and rats).
[0063] The term “diagnosis” as used herein may refer to methods by which the skilled artisan can estimate and/or determine whether or not a patient is afflicted with a given disease or condition. The skilled worker often makes a diagnosis based on one or more diagnostic indicators. Exemplary diagnostic indicators may include the manifestation of symptoms or the presence, absence, or change in one or more markers for the disease or condition. A diagnosis may indicate the presence or absence, or severity, of the disease or condition.
[0064] The term “prognosis” as used herein may refer to the likelihood of the progression or regression of a disease or condition, including likelihood of the recurrence of a disease or condition.
[0065] As used herein, “treating” a disease or condition may refer to taking steps to obtain beneficial or desired results, including clinical results. Beneficial or desired clinical results include, but are not limited to, reduction, alleviation or amelioration of one or more symptoms associated with the disease or condition.
[0066] As used herein, “administering” or “administration of” a compound or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art. For example, a compound or an agent can be administered orally, intravenously, arterially, intradermally, intramuscularly, intraperitoneally, subcutaneously, ocularly, sublingually, intranasally, intraspinally, intracerebrally, and transdermally. A compound or agent can appropriately be introduced by rechargeable or biodegradable polymeric devices or other devices, e.g., patches and pumps, or formulations, which provide for the extended, slow, or controlled release of the compound or agent. Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods. Administration of a compound may include both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, a physician who instructs a patient to self-administer a therapeutic agent, or to have the agent administered by another, and/or who provides a patient with a prescription for a drug has administered the drug to the patient.
[0067] The term “nucleic acid” may refer to DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), DNA-RNA hybrids, and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be a nucleotide, oligonucleotide, double-stranded DNA, single-stranded DNA, multi-stranded DNA, complementary DNA, genomic DNA, non-coding DNA, messenger RNA (mRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), small interfering RNA (siRNA), heterogeneous nuclear RNAs (hnRNA), or small hairpin RNA (shRNA).
[0068] As used herein, “circularized” nucleic acid may refer to a nucleic acid that forms a closed loop and has no ends. Examples include plasmids etc. As used herein, “linearized” nucleic acid may refer to a nucleic acid with one or two ends on each side of the nucleic acid molecule. Linearized DNA may refer to the DNA with two ends on each side of the DNA molecule. Linearized RNA may refer to the RNA with one end on each side of the RNA molecule.
[0069] As used herein, the term “enriching” (e.g., amplifying) may refer to increasing the quantity or amount of nucleic acid (e.g., by the means of PCR or any other suitable technique as described herein).
[0070] As used herein, the term “adapter” or “adaptor” may refer to a linker in genetic engineering that is a short, chemically synthesized, single-stranded or double-stranded oligonucleotide that can be ligated to the ends of other DNA or RNA molecules.
[0071] The term “polypeptide” is equally used herein with the term "protein". Proteins (including fragments thereof, preferably biologically active fragments, and peptides, usually having less than 30 amino acids) comprise one or more amino acids coupled to each other via a covalent peptide bond (resulting in a chain of amino acids, e.g., SEQ ID NOs: 3-28). The term "polypeptide" as used herein describes a group of molecules, which, for example, consist of more than 30 amino acids. Polypeptides may further form multimers such as dimers, trimers and higher oligomers, i.e. consisting of more than one polypeptide molecule. Polypeptide molecules forming such dimers, trimers etc. may be identical or non-identical. The corresponding higher order structures of such multimers are, consequently, termed homo- or heterodimers, homo- or heterotrimers etc. An example for a heteromultimer is an antibody molecule, which, in its naturally occurring form, consists of two identical light polypeptide chains and two identical heavy polypeptide chains. The terms "polypeptide" and "protein" may also refer to naturally modified polypeptides/proteins wherein the modification is effected e.g. by post-translational modifications like glycosylation, acetylation, phosphorylation and the like. Such modifications are well known in the art.
[0072] The term "variable" refers to the portions of the immunoglobulin domains that exhibit variability in their sequence and that are involved in determining the specificity and binding affinity of a particular antibody (i.e., the "variable domain(s)"). Variability is not evenly distributed throughout the variable domains of antibodies; it is concentrated in sub-domains of each of the heavy and light chain variable regions. These sub-domains are called "complementarity determining regions" (CDRs).
[0073] CDR3 (e.g., SEQ ID NOs: 3-28) is typically the greatest source of molecular diversity within the antibody-binding site. H3, for example, can be as short as two amino acid residues or greater than 26 amino acids. The subunit structures and three-dimensional configurations of different classes of immunoglobulins are well known in the art. For a review of the antibody structure, see Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, eds. Harlow et al., 1988. One of skill in the art will recognize that each subunit structure, e.g., a CH, VH, CL, VL, CDR, FR structure, comprises active fragments, e.g., the portion of the VH, VL, or CDR subunit that binds to the antigen, i.e., the antigen-binding fragment, or, e.g., the portion of the CH subunit that binds to and/or activates, e.g., an Fc receptor and/or complement. The CDRs typically refer to the Kabat CDRs, as described in Sequences of Proteins of Immunological Interest, US Department of Health and Human Services (1991), eds. Kabat et al. Another standard for characterizing the antigen binding site is to refer to the hypervariable loops as described by Chothia. See, e.g., Chothia et al. (1992) J. Mol. Biol. 227:799-817; and Tomlinson et al. (1995) EMBO J. 14:4628-4638. Still another standard is the AbM definition used by Oxford Molecular's AbM antibody modelling software. See, generally, e.g., Protein Sequence and Structure Analysis of Antibody Variable Domains. In: Antibody Engineering Lab Manual (Ed.: Duebel, S. and Kontermann, R., Springer-Verlag, Heidelberg).
[0074] As used herein, a “profile” of a transcriptome or portion of a transcriptome can refer to any sequencing or gene expression information concerning the transcriptome or portion thereof. This information can be either qualitative (e.g., presence or absence) or quantitative (e.g., levels or mRNA copy numbers). In some embodiments, a profile can indicate a lack of expression of one or more genes.
[0075] The term “cDNA library” may refer to a collection of complementary DNA (cDNA) fragments. A cDNA library may be generated from the transcriptome of a single cell or from a plurality of single cells. cDNA is produced from mRNA found in a cell and therefore reflects those genes that have been transcribed for subsequent protein expression.
[0076] As used herein, a “plurality” of cells may refer to a population of cells and can include any number of cells to be used in the methods described herein. For example, a plurality of cells includes at least 10 cells, at least 25 cells, at least 50 cells, at least 100 cells, at least 200 cells, at least 500 cells, at least 1,000 cells, at least 5,000 cells, or at least 10,000 cells. In some embodiments, a plurality of cells includes from 10 to 100 cells, from 50 to 200 cells, from 100 to 500 cells, from 100 to 1,000 cells, or from 1,000 to 5,000 cells.
[0077] As used herein, a “single cell” may refer to one cell. Single cells useful in the methods described herein can be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. Additionally, cells from specific organs, tissues, tumors, neoplasms, or the like can be obtained and used in the methods described herein. Cells can be cultured cells or cells from a dissociated tissue, and can be fresh or preserved in a preservative buffer such as RNAprotect. Furthermore, in general, cells from any population can be used in the methods, such as a population of prokaryotic or eukaryotic single-celled organisms including bacteria or yeast. In some aspects of the invention, the method of preparing the cDNA library can include the step of obtaining single cells. A single cell suspension can be obtained using standard methods known in the art including, for example, enzymatically using trypsin or papain to digest proteins connecting cells in tissue samples or releasing adherent cells in culture, or mechanically separating cells in a sample. Single cells can be placed in any suitable reaction vessel in which single cells can be treated individually. For example, single cells can be placed in a 96-well plate, such that each single cell is placed in a single well.
[0078] As used herein, an “oligonucleotide” or “polynucleotide” may refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and can perform any function. Exemplary polynucleotides include a gene or gene fragment (e.g., a probe or primer), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA or RNA of any sequence, and nucleic acid probes and primers. A polynucleotide can comprise modified nucleotides, such as isonucleotides, methylated nucleotides, and other nucleotide analogs. The term also refers to both double- and single-stranded molecules. A polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Uracil (U) substitutes for thymine when the polynucleotide is RNA. The sequence can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching.
[0079] As used herein, a “primer” may refer to a polynucleotide that hybridizes to a target or template that may be present in a sample of interest. After hybridization, the primer promotes the polymerization of a polynucleotide complementary to the target, for example in a reverse transcription or amplification reaction (e.g., SEQ ID NOs: 1, 2, 29, 30, 31, 32, 33 or 34).
[0080] As used herein, the term “identity” (e.g., sequence identity) may refer to the relatedness between two amino acid sequences or between two nucleotide sequences and is described by the parameter “sequence identity”. For purposes of the present invention, the sequence identity between two amino acid sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 5.0.0 or later. The parameters used may be gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. The output of Needle labeled “longest identity” (obtained using the no-brief option) is used as the percent identity and is calculated as follows:
(Identical Residues × 100) / (Length of Alignment − Total Number of Gaps in Alignment).
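The formula above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (for example by a Needleman-Wunsch implementation) and are supplied as equal-length strings in which "-" marks a gap; the function name and the gap character are illustrative assumptions and are not part of EMBOSS Needle.

```python
def longest_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity: identical residues * 100 / (alignment length - gap columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for x, y in columns if x == y and x != gap)
    # A column counts as a gap if either sequence has a gap character there.
    gaps = sum(1 for x, y in columns if x == gap or y == gap)
    denominator = len(columns) - gaps
    if denominator == 0:
        raise ValueError("alignment contains no ungapped columns")
    return identical * 100 / denominator


# Toy alignment: 6 columns, 1 gap column, 4 identical residues.
print(longest_identity("ACGT-A", "ACCTGA"))
```

In this toy alignment the result is 4 × 100 / (6 − 1).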
[0081] As used herein, the term “amplification” or “amplifying” may refer to a process by which multiple copies of a particular polynucleotide are formed, and includes methods such as the polymerase chain reaction (PCR), ligation amplification (also known as ligase chain reaction, or LCR), and other amplification methods. In some embodiments, amplification refers specifically to PCR. Amplification methods are widely known in the art. In general, PCR refers to a method of amplification comprising hybridization of primers to specific sequences within a DNA sample and amplification involving multiple rounds of annealing, elongation, and denaturation using a DNA polymerase. The resulting DNA products are then often screened for a band of the correct size. The primers used are oligonucleotides of appropriate length and sequence to provide initiation of polymerization (e.g., SEQ ID NOs: 1, 2, 29, 30, 31, 32, 33 or 34). Reagents and hardware for conducting amplification reactions are widely known and commercially available.
[0082] As used herein, “sequencing” may refer to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid.
Exemplary sequencing techniques include RNA-seq (also known as whole transcriptome sequencing), Illumina™ sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, massively parallel signature sequencing (MPSS), sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, mass spectrometry, and a combination thereof. In some embodiments, sequencing comprises detecting a sequencing product using an instrument, for example but not limited to an ABI PRISM™ 377 DNA Sequencer, an ABI PRISM™ 310, 3100, 3100-Avant, 3730, or 3730xl Genetic Analyzer, an ABI PRISM™ 3700 DNA Analyzer, or an Applied Biosystems SOLiD™ System (all from Applied Biosystems), a Genome Sequencer 20 System (Roche Applied Science), or a mass spectrometer. “High-throughput” or “next-generation” sequencing can sequence massive amounts of DNA fragments in parallel, thus reducing the cost and time of large-scale sequencing. It can be categorized into short read sequencing and long read sequencing. Short read sequencing is currently the most commonly used technique because of its cost effectiveness and high throughput.
[0083] As described above, the invention inter alia is useful in generating gene expression profiles for a plurality of cells. These gene expression profiles can be used in a number of applications related to the diagnosis, prognosis, and treatment of subjects.
[0084] It is noted that as used herein, the singular forms “a”, “an”, and “the”, include plural references unless the context clearly indicates otherwise. Thus, for example, reference to “a reagent” includes one or more of such different reagents and reference to “the method” includes reference to equivalent steps and methods known to those of ordinary skill in the art that could be modified or substituted for the methods described herein.
[0085] The term “about” or “approximately” as used herein means within 20 %, preferably within 10 %, and more preferably within 5 % of a given value or range. It includes, however, also the concrete number, e.g. “about 20” includes 20.
[0086] Unless otherwise indicated, the term “at least” preceding a series of elements is to be understood to refer to every element in the series. The term “at least one” refers, if not particularly defined differently, to one or more such as two, three, four, five, six, seven, eight, nine, ten or more. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the present invention.
[0087] The term “and/or” wherever used herein includes the meaning of “and”, “or” and “all or any other combination of the elements connected by said term”.
[0088] The term “less than” or in turn “more than” does not include the concrete number. For example, less than 20 means less than the number indicated. Similarly, more than or greater than means more than or greater than the indicated number, e.g. more than 80 % means more than or greater than the indicated number of 80%.
[0089] Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. When used herein the term “comprising” can be substituted with the term “containing” or “including” or sometimes when used herein with the term “having”.
[0090] When used herein, “consisting of” excludes any element, step, or ingredient not specified in the claim element. When used herein, “consisting essentially of” does not exclude materials or steps that do not materially affect the basic and novel characteristics of the claim.
[0091] The term “including” means “including but not limited to”. “Including” and “including but not limited to” are used interchangeably.
[0092] The term “about” means plus or minus 10%, preferably plus or minus 5%, more preferably plus or minus 2%, most preferably plus or minus 1%.
[0093] Lymphocytes (T and B cells) recognize antigens through their highly variable antigen receptor (AgR), and each lymphocyte expresses one single variant. The immune repertoire (ImR) denotes the number of different AgR variants an organism's adaptive immune system makes. A typical way to characterize the ImR is to sequence the highly variable V(D)J region of RNA or DNA molecules derived from the immunoglobulin gene of B cells or from the T cell receptor of T cells. Single cell RNA and DNA sequencing can sequence individual genes at single cell resolution. The most efficient and commonly used way of single cell RNA sequencing (scRNA-seq) is to use 3’ barcoding technology to bio-informatically identify cells in the sequencing library. In this 3’ barcoding approach, cell barcodes are connected with a poly-T tail which binds the poly-A tail of the messenger RNA (mRNA); the cell barcode sequences thus appear at the 3’ end of the cDNA after reverse transcription from mRNA. By sequencing barcodes together with the short fragment of the 3’ end of cDNA, the RNA expression profile for each single cell is generated according to the identification of the cell barcodes. The current limitation of barcoding technology is that only the region near the barcode can be sequenced together with the barcode by standard short read next-generation sequencers. Third generation long-read sequencers can circumvent this limitation, but are often considerably more expensive than short-read sequencers and thus economically prohibitive. When attempting to sequence the ImR with 3’ barcoding scRNA-seq, the highly variable V(D)J region is outside of the regions that can be sequenced together with cell barcodes. The inventors denote this as the “off-barcode” region of the cDNA libraries. The ImR can thus currently not be sequenced by 3’ barcoding scRNA-seq. The present invention utilizes self-circularization technology to bring the V(D)J region closer to the cell barcodes. This makes it possible for the first time to sequence the ImR with 3’ barcoding single cell sequencing technology while maintaining the advantages of short-read sequencers. By extension, the present invention can also be modified to sequence the “off-barcode” region for all other sequencing technologies utilizing barcoding. Examples include MARS-seq, CytoSeq, Drop-seq, InDrop, Chromium, sci-RNA-seq, Seq-Well, DroNC-seq, SPLiT-seq, Quartz-Seq, Microwell-seq, 3’ transcriptome profiling with UMI, 5’ transcriptome profiling with UMI, 5’ single cell transcriptome profiling, barcoding genome sequencing, barcoding de novo sequencing, barcoding single cell genome sequencing, barcoding single cell de novo sequencing, barcoding Hi-C, barcoding single cell Hi-C, barcoding exome sequencing, single cell barcoding exome sequencing, barcoding target enrichment sequencing, single cell barcoding target enrichment sequencing, RNA-seq with UMIs, DNA-seq with UMIs, single cell RNA sequencing with barcoding, single cell DNA-seq with barcoding, ATAC-seq with barcoding technologies, single cell ATAC sequencing with barcoding techniques, barcoding cancer panel enrichment sequencing, barcoding cancer DNA panel enrichment sequencing, and barcoding cancer RNA panel enrichment sequencing. By adjusting shearing steps, the technique can be adjusted to sequence either parts of the “off-barcode” region or the full length of barcoded DNA/RNA.
[0094] The Immune repertoire (ImR)
[0095] The ability of the immune system to recognize specific antigens rather than just general patterns conserved among classes of pathogens represents a decisive evolutionary advantage. All higher vertebrates are equipped with such an adaptive immune system that enables a targeted host defense. The recognition of specific antigens is achieved through the highly variable antigen receptor (AgR) of lymphocytes (T and B cells) and each lymphocyte expresses an AgR with a single sequence variant (‘one sequence variant per cell’). Enormous sequence variability (estimated >10^13 different TCR variants in humans) is created during ontogenesis by randomly recombining the highly variable region of the AgR of developing lymphocytes and then subsequently removing or suppressing lymphocytes with non-functional or auto-reactive AgR. The immune repertoire (ImR) denotes the number of different AgR variants (either immunoglobulin or T cell receptor) an organism's adaptive immune system carries. The ImR can be measured by sequencing either mRNA or genomic DNA, which gives a quantitative view of the ImR. For example, over-representation of individual sequence variants indicates expansion of individual lymphocyte clones, and trafficking events can be deduced from clonal information. From 5’ to 3’, AgR sequences contain regions denoted variable (V), in some cases diversity (D), and joining (J), followed by a constant (C) region. The recombination of V, D, and J segments constructs the variable region. Immune repertoire sequencing sequences the variable V(D)J region to identify its sequence, which allows understanding of recombination events, clonal expansion and trafficking of lymphocytes in health and disease.
[0096] For immune repertoire sequencing, the inventors aim to sequence the V, D and J regions. However, when using 3’ barcoding techniques for single cell sequencing, the cell barcode is added at the 3’ end of the mRNA, resulting in 3’ barcoded cDNA. In this case, the barcodes are close to the constant region but far away from the variable region, which leaves barely a chance to sequence the V, D or J region together with the barcode.
[0097] An alternative method would be to utilize 5’ barcoding single cell sequencing. The barcode is added at the 5’ end of the mRNA, which is near the V(D)J region of the immune repertoire. In this case, the V(D)J region lies in the near-barcode region, so it is possible to sequence it together with the barcode. However, so far the most popular single cell sequencing is based on a 3’ barcoding technique, and the results of 5’ barcoding single cell sequencing are not totally comparable with those of 3’ barcoding single cell sequencing. The present invention is based on the 3’ barcoding technique, but the inventors can bring the V, D, J regions into close proximity to the 3’ barcode, and thus make it possible to sequence them together by a short read sequencer.
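As a hedged illustration of how over-representation of a sequence variant can indicate clonal expansion, the following Python sketch counts how many cells share each variant. It assumes that each cell barcode has already been paired with one CDR3 amino acid sequence; the barcodes, the CDR3 strings and the `min_cells` threshold are hypothetical toy values, not data from this specification.

```python
from collections import Counter
from typing import Dict, List


def clonotype_frequencies(cell_to_cdr3: Dict[str, str]) -> Dict[str, float]:
    """Fraction of cells carrying each CDR3 variant."""
    counts = Counter(cell_to_cdr3.values())
    total = sum(counts.values())
    return {cdr3: n / total for cdr3, n in counts.items()}


def expanded_clonotypes(cell_to_cdr3: Dict[str, str], min_cells: int = 2) -> List[str]:
    """Variants found in at least `min_cells` cells (assumed expansion threshold)."""
    counts = Counter(cell_to_cdr3.values())
    return sorted(cdr3 for cdr3, n in counts.items() if n >= min_cells)


# Hypothetical toy input: cell barcode -> CDR3 sequence.
cells = {
    "AAAC": "CASSLGQ",
    "AAAG": "CASSLGQ",
    "AAAT": "CASSPDR",
    "AACA": "CASSLGQ",
}
print(clonotype_frequencies(cells))
print(expanded_clonotypes(cells))
```

The number of distinct keys in the frequency table corresponds to the number of different variants observed in the sample.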
[0098] Next generation sequencing (NGS)
[0099] Next generation sequencing (NGS) technologies can sequence millions to billions of DNA fragments in one single sequencing run. The revolutionary speed and considerably cheaper cost per base have facilitated a wide range of applications. NGS techniques can be categorized into sequencing by ligation or sequencing by synthesis. Both, however, have relatively short read lengths (a common limit is 150-250 bp; for some sequencers the limit can be up to 700 bp), and short read sequencers also limit the insert fragment size. For example, one of the most cost-effective sequencers at present (e.g., “NovaSeq” from “Illumina”) is recommended to have an insert size ranging between 100 bp and 500 bp. There are also long read sequencers, termed third-generation sequencers, such as the Pacific Biosciences RS II, but the sequencing cost is considerably higher (over 10-fold higher cost per bp) than for the most popular short read sequencers.
[00100] Barcoding technology for DNA/RNA
[00101] Barcoding technology in general denotes using a short section of DNA sequence as an identifier of a fragment of DNA/RNA. Barcoding technology can be used in molecular identification, cell identification, tissue/organ identification, species identification, sample identification, group identification, antibody identification, chemical identification, molecular quantification and de-multiplexing. In essence, barcodes are used to identify DNA/RNA molecules bio-informatically through the barcode sequence rather than physically by separating DNA/RNA from different sources.
[00102] The concept of barcoding DNA/RNA previously constituted a major breakthrough in multiple technical applications based on NGS. This is because barcoding in principle allows distinguishing individual DNA/RNA molecules bio-informatically only after sequencing, rather than manually keeping them separated before and during sequencing. This increased experimental efficiency and reduced cost by several orders of magnitude, because a mixture of DNA/RNA molecules can be manually processed together and only be separated ‘virtually’ based on the barcodes after sequencing. Multiple RNA-sequencing applications essentially rely on the barcoding concept.
[00103] The most popular usages of barcoding are unique molecular identifiers (UMIs) and cell barcodes. A UMI is a specific sequencing linker added to the 3’ or 5’ end of RNA primers, DNA primers or oligonucleotides. The unique sequence of a UMI can identify unique mRNA transcripts or DNA fragments, and therefore helps to profile mRNA/DNA free of PCR errors. UMIs are widely used in RNA-sequencing (RNA-seq), ImR sequencing and single cell RNA sequencing (scRNA-seq).
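How UMIs and cell barcodes are used together can be sketched in a few lines of Python. The sketch assumes each read has already been reduced to a (cell barcode, UMI, gene) triple; it collapses reads that share all three fields into one molecule, which is a simplified model that ignores sequencing errors within the UMI itself. All identifiers below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell barcode, UMI, gene)


def count_molecules(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell barcode, gene); PCR duplicates collapse to one."""
    umis_seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, umi, gene in reads:
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Hypothetical reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("CELL1", "UMI_A", "CD3E"),
    ("CELL1", "UMI_A", "CD3E"),
    ("CELL1", "UMI_B", "CD3E"),
    ("CELL2", "UMI_A", "CD3E"),
]
print(count_molecules(reads))
```

Grouping by cell barcode first and by UMI second mirrors the 'virtual' separation of pooled molecules after sequencing.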
[00104] DNA/RNA barcoding is a method used for analyzing short sections of DNA/RNA from one or more specific gene(s). The barcode can be placed on one or both sides of a DNA/RNA fragment, and may be used for the identification of molecular, cell, tissue/organ, species, and samples, as well as molecular quantification and de-multiplexing.
[00105] A typical procedure is to sequence DNA/RNA with barcodes. The in-barcode region is close to the barcode and its size is not larger than the insert size limitation of the sequencer used. The off-barcode region is the rest of the DNA/RNA. In a typical procedure, read 1 is added together with the barcode at one side of the DNA/RNA, the fragments are sheared to a size fitting the sequencer insert size, and the off-barcode region is sheared off from the fragments containing the barcode. A final amplification with primer 1, containing P5 plus read 1, and primer 2, containing P7, the i7 index and read 2, produces the final sequencing libraries. The final library is sequenced by a short read sequencer, but only the barcode and the in-barcode region are sequenced.
[00106] For single cell RNA sequencing, barcodes can be added at the 3’ end of the mRNA. Because only short regions near the barcode can be sequenced together with the barcode, 3’ barcoding of single cells allows only the sequencing of parts near the 3’ end of the mRNA.
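The distinction between the in-barcode and off-barcode regions can be made concrete with a small Python sketch. It assumes a linear molecule whose coordinates are counted from the barcoded end and assumes that the sequencer's insert size limit includes the barcode itself; both assumptions, the function name and the example lengths are illustrative only.

```python
from typing import Tuple

Span = Tuple[int, int]  # half-open coordinate range, counted from the barcoded end


def split_regions(molecule_len: int, barcode_len: int, max_insert: int) -> Tuple[Span, Span]:
    """Return (in_barcode, off_barcode) spans for a barcoded linear molecule."""
    if not 0 < barcode_len <= molecule_len:
        raise ValueError("barcode must fit inside the molecule")
    if max_insert < barcode_len:
        raise ValueError("insert limit is shorter than the barcode")
    # Everything beyond the insert size limit cannot be read together with the barcode.
    in_end = min(molecule_len, max_insert)
    return (barcode_len, in_end), (in_end, molecule_len)


# Hypothetical 1,500 nt molecule, 28 nt barcode, 500 bp insert limit.
in_barcode, off_barcode = split_regions(1500, 28, 500)
print(in_barcode, off_barcode)
```

With these toy numbers, two thirds of the molecule falls into the off-barcode region and is lost when the barcoded fragment is sheared to size.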
[00107] Single cell RNA sequencing (scRNA-seq)
[00108] Single cell RNA sequencing (scRNA-seq) applies next generation sequencing to examine the sequence information of RNA from individual cells. It reveals the heterogeneity of individual cells, which brings research and application to a new level. In several single cell RNA sequencing methods, cell barcodes were also introduced to identify cells. Cell barcodes are specific sequencing linkers added to oligonucleotides that can be used to uniquely identify cells. Cell barcoding techniques are broadly used in single cell RNA-sequencing methods, such as MARS-seq, CytoSeq, Drop-seq, InDrop, Chromium, sci-RNA-seq, Seq-Well, DroNC-seq, SPLiT-seq, Quartz-Seq, Microwell-seq.
[00109] In scRNA-seq, the first step is to reverse transcribe the mRNA transcript using primers containing oligonucleotide dT to match the 3’ poly-A tail of mRNAs. After that, second-strand synthesis is carried out using the triple “C” cap that is added at the 5’ end of the mRNA sequence during the first step of reverse transcription. There are therefore 2 ways to add barcodes to the mRNA sequences: 1) the 3’ barcoding technique, which adds barcodes at the 3’ end of mRNA sequences during the first step of reverse transcription; 2) the 5’ barcoding technique, which adds barcodes at the 5’ end of mRNA sequences during the second step of second-strand synthesis. Since the 3’ barcoding technique adds barcodes one step earlier than 5’ barcoding, it is considered to have a better chance of capturing more mRNA fragments and a lower chance of cell barcodes being wrongly added to mRNAs from other cells. This is why the most popular scRNA-seq methods utilize the 3’ barcoding technique to add cell barcodes and mRNA UMIs at the 3’ end of mRNA during reverse transcription. By sequencing barcodes and the near-barcode region, the transcriptomic profiles can be generated for each individual cell.
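A minimal Python sketch of how a read covering a 3’ barcoded primer might be split into its parts is given below. The layout (cell barcode, then UMI, then poly-T) and the default lengths of 16, 12 and 10 nucleotides are assumptions chosen for illustration; actual layouts and lengths depend on the scRNA-seq chemistry used and are not specified here.

```python
from typing import Tuple


def parse_barcoded_read(
    read: str,
    barcode_len: int = 16,   # assumed cell barcode length
    umi_len: int = 12,       # assumed UMI length
    min_poly_t: int = 10,    # assumed minimum poly-T stretch to accept
) -> Tuple[str, str, bool]:
    """Split a read into (cell barcode, UMI, poly-T found) under the assumed layout."""
    if len(read) < barcode_len + umi_len:
        raise ValueError("read is shorter than barcode plus UMI")
    barcode = read[:barcode_len]
    umi = read[barcode_len:barcode_len + umi_len]
    tail = read[barcode_len + umi_len:]
    # The poly-T stretch reflects the oligo-dT that primed on the mRNA poly-A tail.
    has_poly_t = tail.startswith("T" * min_poly_t)
    return barcode, umi, has_poly_t


# Hypothetical read: 16 nt barcode + 12 nt UMI + 12 T.
example = "ACGTACGTACGTACGT" + "AAAACCCCGGGG" + "T" * 12
print(parse_barcoded_read(example))
```

Reads without the expected poly-T stretch could be flagged for inspection rather than silently assigned to a cell.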
[00110] ImR sequencing in 3’ scRNA-seq
[00111] Two problems impede the sequencing of the ImR in scRNA-seq: 1) the highly variable V(D)J region of interest is located close to the 5’ end of mRNA transcripts and not the 3’ end, 2) the entire V(D)J region is too long to be sequenced in its entirety by short read sequencers. In detail, barcodes at the 3’ end of RNA molecules cause a problem when sequencing the ImR of single cells. When barcodes are added at the 3’ end of the mRNA of a TCR or BCR, the constant C region becomes the near-barcode region since this region is closest to the 3’ end of the mRNA, and most parts of the V(D)J region become the off-barcode region. In this case, the V(D)J region cannot be sequenced by short read sequencers to profile the ImR with 3’ barcoding single cell sequencing methods.
[00112] 5’ scRNA-seq kits are known in the art (e.g., the 5’-kit from 10x Genomics), but this approach has the following limitations: 1) it has been developed only recently, and many available datasets were generated with the 3’-approach, impeding comparability; 2) 5’-barcoding is considered to perform worse than 3’-barcoding in scRNA-seq (see above); 3) when scRNA-seq is combined with other approaches (such as combining it with oligo-barcoded antibodies to do CITE-seq), more compatible reagents are available for 3’ scRNA-seq than for 5’ scRNA-seq. Thus, a method to sequence the ImR at the single cell level with 3’-barcoding would be a better choice than 5’ barcoding.
[00113] The concept and embodiments of the invention
[00114] DNA/RNA barcoding is a method of species identification using a short section of DNA/RNA from a specific gene or genes. The barcode can be placed on one or both sides of a DNA/RNA fragment, and can be used in molecular identification, cell identification, tissue/organ identification, species identification, sample identification, molecular quantification and de-multiplexing. A typical procedure to sequence DNA/RNA with a barcode is as follows. Read 1 and read 2 are the sequencing primers, which are both starting points for paired-end sequencing. P5 and P7 are the sequences that bind to the sequencing chips of Illumina sequencers. The i7 index is the index that identifies a library when a sequencing lane includes more than one library. For each short read sequencer, there is a limitation on the insert size. The in-barcode region is close to the barcode and its size is not larger than the insert size limitation of the sequencer used. The off-barcode region is the rest of the DNA/RNA. In a typical procedure, read 1 is added together with the barcode at one side of the DNA/RNA, the fragments are sheared to a size fitting the sequencer insert size, and the off-barcode region is sheared off from the fragments containing the barcode. A final amplification with primer 1, containing P5 plus read 1, and primer 2, containing P7, the i7 index and read 2, produces the final sequencing libraries. The final library is sequenced by a short read sequencer, but only the barcode and the near-barcode region are sequenced (e.g., Figure 1). For single cell RNA sequencing, 3’-barcoding can be used, in which the barcodes are added at the 3’ end of the mRNA. Because only a short region near the barcode can be sequenced together with the barcode, 3’ barcoding single cell sequencing only allows sequencing of parts/fragments near the 3’-end of the mRNA.
[00115] The immune repertoire is the number of different sub-types an organism's immune system makes, either immunoglobulin or T cell receptor. It can be measured in either mRNA or genomic DNA. Each immunoglobulin or T cell receptor RNA contains 4 regions from 5’ to 3’: V, D, J and C. The recombination of V, D and J forms the variable region, and C is the constant region. For immune repertoire sequencing, the V, D and J regions are preferably sequenced. However, when using the 3’ barcoding technique for single cell sequencing, the cell barcode is added at the 3’ end of the mRNA, resulting in 3’ barcoded cDNA. In this case, the barcode is close to the constant region but far from the variable region, which leaves barely a chance to sequence the V, D or J region together with the barcode by a short read sequencer.
[00116] Molecular structures of the immune repertoire during 3’ barcoding single cell sequencing library preparation. V, D and J are the variable regions of the immune repertoire and C is the constant region. The recombination of different V, D and J segments leads to the differences between immune cell receptors. However, since the C region is generally longer than the insert size limitation of short read sequencers, V, D and J lie in the off-barcode region, so the V, D and J regions are very rarely reached during 3’ barcoding single cell sequencing. Therefore it is not possible to sequence the immune repertoire by 3’ barcoding single cell sequencing technology known from the prior art.
[00117] An alternative method is to utilize 5’-barcoding single cell sequencing. The barcode is added at the 5’-end of the mRNA, which is near the V(D)J region of the immune repertoire. In this case, the V(D)J region becomes the near-barcode region, so it can be sequenced together with the barcode. However, so far the most popular single cell sequencing is based on the 3’ barcoding technique, and the result of 5’ barcoding single cell sequencing is not totally comparable with that of 3’ barcoding single cell sequencing. Thus single cell immune repertoire sequencing based on 3’ barcoding single cell sequencing can be more useful than that based on 5’ barcoding sequencing. The present invention is based on the 3’ barcoding technique, in that the V, D, J regions are moved closer to the 3’ barcode, thus making it possible to sequence them together by a short read sequencer.
[00118] The present invention aims to solve the problem of combining 3’-barcoding based scRNA-seq with ImR sequencing and to make it possible to sequence the “off-barcode” region by a short read sequencer while maintaining the barcode information. The inventors here utilize a self-circularization method to circularize the barcoded DNA/RNA in order to bring barcodes closer to the off-barcode region, thus enabling sequencing of the barcode and the off-barcode region together in a cost-effective way. The inventors demonstrate applicability of the methods of the present invention for 3’ scRNA-seq, but the library preparation method of the present invention can be more generally used for immune repertoire sequencing, single cell full length RNA sequencing or other applications which require breaking the limitation of sequencing barcodes together with the off-barcode region by short read sequencers.
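The geometric idea behind circularization can be sketched in Python as follows. The sketch models a barcoded cDNA as an ordered list of named segments and compares the number of bases separating two segments on the linear molecule with the shortest separation once the two ends are joined. The segment lengths are hypothetical, and the model deliberately omits the enzymatic ligation, removal of linear molecules, enrichment and shearing steps.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int]  # (name, length in nucleotides)


def _spans(segments: List[Segment]) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Map each segment name to its half-open (start, end) span; also return total length."""
    spans, pos = {}, 0
    for name, length in segments:
        spans[name] = (pos, pos + length)
        pos += length
    return spans, pos


def linear_gap(segments: List[Segment], a: str, b: str) -> int:
    """Bases lying between segments a and b on the linear molecule."""
    spans, _ = _spans(segments)
    first, second = sorted((spans[a], spans[b]))
    return second[0] - first[1]


def circular_gap(segments: List[Segment], a: str, b: str) -> int:
    """Shortest separation between a and b after joining the two ends of the molecule."""
    spans, total = _spans(segments)
    first, second = sorted((spans[a], spans[b]))
    inner = second[0] - first[1]            # path through the middle of the molecule
    outer = (total - second[1]) + first[0]  # path across the newly formed junction
    return min(inner, outer)


# Hypothetical segment lengths, listed 5' to 3' in mRNA orientation.
cdna = [("V", 300), ("D", 20), ("J", 50), ("C", 600), ("barcode", 28)]
print(linear_gap(cdna, "V", "barcode"), circular_gap(cdna, "V", "barcode"))
```

In this toy model the V segment and the barcode become direct neighbours across the junction, whereas on the linear molecule they are separated by the D, J and C segments.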
[00119] In some aspects the present invention relates to a method for/of producing (and/or modifying) a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., any unique sequence label) (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising: circularizing (e.g., self-circularizing) of nucleic acids (e.g., barcoded nucleic acids of interest).
[00120] In some aspects the present invention relates to a method for/of producing (and/or modifying) a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., any unique sequence label) (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising: preferably: (i) providing: a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end; or (ii) adding at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at 3’-end and/or 5’-end of said nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.); said method further comprising: (a) circularizing (e.g., self-circularizing) said nucleic acid into a circular nucleic acid, preferably said circularizing (e.g., self-circularizing) is carried out by the means of an enzymatic ligation (e.g., by the means of: DNA ligase (e.g., T4 DNA Ligase) or RNA ligase, e.g., DNA ligase having EC:6.5.1.1, EC:6.5.1.2, EC:6.5.1.6 or EC:6.5.1.7 enzymatic activity, RNA ligase having EC:6.5.1.3 enzymatic activity, 3'-phosphate/5'-hydroxy nucleic acid ligase having EC:6.5.1.8 enzymatic activity, etc.); optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of: a nuclease/s, e.g., an exonuclease/s and/or endonuclease/s, e.g., as described in the Examples section herein); (b) enriching (e.g., amplifying) said target nucleic acid from said circularized (e.g., self-circularized) nucleic acid from step (a), e.g., by the means of any suitable nucleic acid enriching (e.g., amplifying) method, preferably said enriching is carried out by the means of: PCR (e.g., polymerase chain reaction), NASBA (e.g., nucleic acid sequence based amplification), RCA (e.g., rolling-circle amplification), SPIA (e.g., single primer isothermal amplification), HDA (e.g., helicase-dependent amplification), SDA (e.g., strand displacement amplification), LAMP (e.g., loop-mediated isothermal amplification), RPA (e.g., recombinase polymerase amplification), TS-PCR (e.g., template-switching polymerase chain reaction, e.g., SMART, e.g., switching mechanism at 5’-end of RNA transcript), ICAN (e.g., isothermal and/or chimeric primer-initiated amplification of nucleic acids), EXPAR (e.g., exponential amplification reaction), NEMA (e.g., nicking endonuclease-mediated isothermal amplification), in-vitro transcription, etc., and/or combination/s thereof, further preferably said PCR is carried out with primer/s (e.g., SEQ ID NOs: 1, 2, 29, 30, 31, 32, 33 or 34) capable of hybridizing to a specific and/or unspecific sequence/s within (and/or adjacent to) said nucleic acid of interest (e.g., to polyA sequence within said nucleic acid of interest and/or constant region within said nucleic acid of interest (e.g., constant (C) region of B-cell receptor)) under suitable conditions (e.g., as described in the Examples section herein); (c) optionally, circularizing (e.g., self-circularizing) the amplified nucleic acid from step (b) into a circular nucleic acid; optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of a nuclease, e.g., as described in the Examples section herein); (d) optionally, linearizing the circularized nucleic acid from step (c), preferably said linearizing is carried out by the means of PCR and/or nucleic acid fragmentation (e.g., shearing); further preferably said nucleic acid fragmentation is a random/stochastic nucleic acid fragmentation (e.g., as described in the Examples section herein); (e) optionally, ligating an adapter sequence/s comprising at least one specific oligonucleotide (e.g., a sequencing primer) to the 3’-end and/or 5’-end of the linearized nucleic acid from step (b), (c) or (d); preferably said adapter sequence comprising SEQ ID NO: 1 (e.g., Illumina read1 primer) and/or SEQ ID NO: 2 (e.g., Illumina read2 primer) sequence (e.g., as described in the Examples section herein); (f) optionally, amplifying the ligation product from step (e); preferably said amplifying is carried out by the means of PCR; further preferably said PCR is carried out with a primer hybridizing to said adapter sequence (e.g., as described in the Examples section herein); (g) optionally, sequencing the barcoded nucleic acid from step (e) or (f), wherein said nucleic acid of interest is sequenced together with said barcode (e.g., in a single- or paired-end sequencing), wherein said method is a 3’-end and/or 5’-end sequencing method (e.g., high-throughput sequencing method) (e.g., as described in the Examples section herein).
[00121] In some aspects the present invention relates to the method of the present invention, wherein the method steps (a) to (e) or (f) or (g) are carried out consecutively.
[00122] In some aspects the present invention relates to the method of the present invention, wherein said method comprises no amplification and/or molecular modification of said nucleic acid of interest prior to the circularizing of step (a).
[00123] In some aspects the present invention relates to the method of the present invention, wherein said method comprises an amplification and/or molecular modification of said nucleic acid of interest prior to the circularizing of step (a).
[00124] In some aspects the present invention relates to the method of the present invention, wherein said adapter sequence: (i) does not comprise restriction site/s for a restriction endonuclease (e.g., having EC:3.1.21.4 enzymatic activity, e.g., does not comprise restriction site/s for NotI restriction endonuclease (e.g., 5’-GCGGCCGC-3’), e.g., wherein NotI is a restriction endonuclease derived from Nocardia otitidiscaviarum, e.g., having UniProtKB - Q2I6W2); and/or (ii) cannot be recognized and/or cleaved by a restriction endonuclease (e.g., NotI restriction endonuclease, e.g., having UniProtKB - Q2I6W2).
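By way of a hedged, in-silico illustration, the absence of a NotI recognition site in a candidate adapter could be checked as sketched below. The recognition sequence is the one stated above; the two adapter strings are hypothetical examples and are not any of SEQ ID NOs: 1-34.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence (5'->3') as stated in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains_site(adapter: str, site: str = NOTI_SITE) -> bool:
    """True if the site occurs on either strand of the adapter."""
    adapter = adapter.upper()
    return site in adapter or reverse_complement(site) in adapter


# Hypothetical adapter sequences, for illustration only.
print(contains_site("ACGTACGTGCGGCCGCTTAA"))  # True
print(contains_site("ACGTACGTTTAAGGCC"))      # False
```

Because the NotI site is palindromic, checking one strand would suffice here; the reverse-complement check is kept so that the same helper also works for non-palindromic sites.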
[00125] In some aspects the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said sequencing of step (g) is a single- or paired-end sequencing (e.g., as described in the Examples section herein).
[00126] In some aspects the present invention relates to the method of the present invention, wherein said nucleic acid of interest comprising at least 1 specific barcode sequence (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 specific barcode sequences).
[00127] In some aspects the present invention relates to the method of the present invention, wherein said nucleic acid of interest (e.g., a farthest nucleotide of the nucleic acid of interest to be sequenced) and said barcode are at least 100 nucleotides apart, e.g., at least about 500-700 nucleotides apart, e.g., at least about 700 nucleotides apart.
[00128] In some aspects the present invention relates to the method of the present invention, wherein said method is/suitable for a short read sequencing (e.g., a short read high-throughput sequencing), preferably with sequencing read length not longer than 1000 nucleotides.
[00129] In some aspects the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method has the sequencing read accuracy (e.g., single read-based, e.g., not consensus based) of at least 50% (e.g., 60%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8% or 99.9%).
[00130] In some aspects the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method has the time per sequencing run of at least 10 minutes.
[00131] In some aspects the present invention relates to the method of the present invention, wherein: (i) said nucleic acid comprises a plurality of nucleic acids (e.g., cDNA library or sequencing library), and preferably said plurality of nucleic acids is derived from a single cell (e.g., as described in the Examples section herein); and/or (ii) said method comprises/is applied to multiple (e.g., non-identical) nucleic acids modified/processed according to the method steps of the present invention, preferably said method is a method for nucleic acid library construction (e.g., cDNA library or sequencing library).
[00132] In some aspects the present invention relates to the method of the present invention, wherein said nucleic acid is/comprises a plurality of nucleic acids (e.g., cDNA library or sequencing library), wherein said method, comprising step (g), is a method for multiplex sequencing of said plurality of nucleic acids (e.g., as described in the Examples section herein).
[00133] In some aspects the present invention relates to the method of the present invention, wherein said nucleic acid of interest is an amplification and/or reverse transcription product (e.g., as described in the Examples section herein).
[00134] In some aspects the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method is suitable for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein).
[00135] In some aspects the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method is suitable for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein).
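As a minimal sketch of how a clone type frequency might be computed downstream of sequencing, the following assumes that each cell barcode has already been assigned one clonotype label. Representing a clonotype by a CDR3 amino-acid string, as well as the barcodes and labels themselves, are assumptions made only for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def clonotype_frequencies(cells: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of cells per clonotype.

    cells: (cell_barcode, clonotype) pairs; if a barcode occurs more than
    once, only its last clonotype call is kept so each cell counts once.
    """
    per_cell = dict(cells)
    counts = Counter(per_cell.values())
    total = sum(counts.values())
    return {clonotype: n / total for clonotype, n in counts.items()}


# Hypothetical barcode/clonotype calls.
calls = [
    ("AAAC", "CARDYW"),
    ("AAAG", "CARDYW"),
    ("AACT", "CASSLG"),
    ("AAGT", "CARDYW"),
]
print(clonotype_frequencies(calls))  # {'CARDYW': 0.75, 'CASSLG': 0.25}
```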
[00136] In some aspects the present invention relates to the method of the present invention, wherein said method comprising step (g), wherein said method is suitable for a full length RNA sequencing (e.g., full length single cell RNA sequencing), preferably said method is a method for a full length single cell RNA target enrichment sequencing (e.g., as described in the Examples section herein).
[00137] In some aspects the present invention relates to the method of the present invention for sequencing the 3’-end barcoded nucleic acid, e.g., derived from a single cell, wherein the nucleic acid produced/modified according to the method of the present invention comprises from 5’ to 3’-end: a first primer binding site; a first index sequence as library identifier; a binding site for a sequencing primer (e.g., read2, e.g., SEQ ID NO: 2); a sequence of interest with binding sites (e.g., for primers “TSO”, “C” and Poly-A, e.g., SEQ ID NOs: 29, 30, 31, 32, 33 or 34); a cell barcode and/or unique molecular identifier; a binding site for another sequencing primer (e.g., read1, e.g., SEQ ID NO: 1); a second index sequence as library identifier; a second primer binding site, and wherein the method comprises: amplifying the barcoded cDNA with TSO and read1 primers to generate double-stranded cDNA; circularizing the double-stranded barcoded cDNA; linearizing the circularized barcoded cDNA by PCR with Poly-A and “C” primers (e.g., SEQ ID NOs: 29, 30, 31, 32, 33 or 34); circularizing the linearized barcoded cDNA; linearizing the circularized barcoded cDNA by PCR with TSO and read1 primers; adding an adapter sequence for the read2 sequencing primer at the 5’-end of the sequence of interest; and sequencing the barcoded cDNA fragments (e.g., as described in the Examples section herein).
[00138] In some aspects the present invention relates to the method of the present invention, wherein the barcoded cDNA is circularized to enable sequencing of a sequence of interest positioned at a distance from the cell barcode.
[00139] In some aspects the present invention relates to the method of the present invention, wherein the method is used to sequence the variable regions of antigen receptors or antibodies.
[00140] In some aspects the present invention relates to the method of the present invention, wherein the clone type frequencies of antigen receptors or antibodies can be determined.
[00141] In some aspects the present invention relates to the method of the present invention, wherein the constant region of the antigen receptor is on the 5’-end of the sequence of interest in close proximity to the cell barcode.
[00142] In some aspects the present invention relates to the method of the present invention, wherein the sequence of interest is the highly variable V(D)J region, which is positioned at the 3’-end of the sequence of interest, far from the cell barcode.
[00143] In some aspects the present invention relates to the method of the present invention, wherein the sequence of interest comprises before sequencing a reduced constant region at the 5’-end, and the full length highly variable V(D)J region at the 3’-end of the sequence of interest.
[00144] In some aspects the present invention relates to the method of the present invention, wherein the barcode of the present invention is a unique sequence used to identify a specific cell.
[00145] In some aspects the present invention relates to the method of the present invention, wherein the barcode of the present invention (e.g., cell barcode) is selected from the group consisting of: cell identifying barcodes, molecular identifying barcodes, DNA or RNA identifying barcodes, sample identifying barcodes, chemical identifying barcodes, protein identifying barcodes, quantification barcodes.
[00146] In some aspects the present invention relates to the method of the present invention, wherein the barcode of the present invention (e.g., cell barcode) has molecular modifications selected from the group consisting of: fluorophores and dark quenchers labeling, non- fluorescent labeling, fluorescent labeling, biotinylation, avidinylation, attachment chemistry/linkers modifications, adenylation, spacer modifications, phosphorylation, phosphorothioate bonds, click chemistry modifications, and base modifications.
[00147] In some aspects the present invention relates to the method of the present invention, wherein the barcode of the present invention (e.g., cell barcode) is combined with a unique molecular identifier.
[00148] In some aspects the present invention relates to the method of the present invention, wherein one or more cell barcode or one or more unique molecular identifier is added.
[00149] In some aspects the present invention relates to the method of the present invention, wherein a short read sequencer is used for sequencing.
[00150] In some aspects the present invention relates to the method of the present invention, wherein the percentage of valid data is the ratio of cell barcode counts of fragments to total barcode counts and is at least about 80%, preferably at least about 90%, more preferably at least about 95%.
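The valid data percentage described above can be illustrated with a short calculation. This is a sketch under one assumed reading of the definition: the numerator counts sequenced fragments whose barcode belongs to the set of known cell barcodes, and the denominator counts all fragments carrying any barcode. The barcode strings are hypothetical.

```python
from typing import Iterable, Set


def valid_data_percentage(read_barcodes: Iterable[str], cell_barcodes: Set[str]) -> float:
    """Percentage of barcoded fragments whose barcode is a known cell barcode."""
    barcodes = list(read_barcodes)
    if not barcodes:
        raise ValueError("no barcoded fragments supplied")
    valid = sum(1 for bc in barcodes if bc in cell_barcodes)
    return 100.0 * valid / len(barcodes)


# Hypothetical data: 9 fragments carry a known cell barcode, 1 does not.
reads = ["AAAC"] * 9 + ["TTTT"]
print(valid_data_percentage(reads, {"AAAC"}))  # 90.0
```

Under this reading, the example data would meet the "at least about 90%" level but not the "at least about 95%" level.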
[00151] In some aspects the present invention relates to the method of the present invention, wherein said method is the method for profiling variable regions of antigen receptors or antibodies, comprising: (a) isolating mRNA from a plurality of single cells to provide a plurality of individual mRNA samples, wherein each individual mRNA sample is from a single cell; (b) reverse-transcribing the mRNA samples of a cell, producing cDNA incorporating a cell barcode sequence; (c) pooling and purifying the barcoded cDNA produced from the separate cells; (d) amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA; (e) circularizing the barcoded double-stranded cDNA; (f) linearizing the circularized barcoded cDNA by PCR and target enrichment with Poly-A (e.g., SEQ ID NO: 29) and “C” primer/s (e.g., SEQ ID NOs: 30, 31, 32 or 33); or linearizing the circularized barcoded cDNA by PCR and target enrichment with forward primer/s matching sequences at the 3’-end of the targeted region (e.g., with a primer comprising or consisting of a Poly A tail, e.g., SEQ ID NO: 29) and reverse primer/s matching the constant region of TCR/BCR (e.g., SEQ ID NOs: 30, 31, 32 or 33); (g) circularizing the linearized barcoded cDNA; (h) linearizing the circular barcoded cDNA by PCR with read1 and TSO primers; (i) reducing the length of the linearized barcoded cDNA; (j) adding an adapter sequence for the read2 sequencing primer at the 5’-end of the sequence of interest; and (k) sequencing the cDNA fragments (e.g., as described in the Examples section herein).
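The two rounds of circularization and PCR re-linearization in steps (e) to (h) can be followed with a simplified segment-level model. This is only an illustrative sketch: the segment names, their order (taken from the 5’ to 3’ layout described herein) and all lengths are assumptions, and the constant region is split into a hypothetical part retained by the “C” primer and a remainder that is not amplified.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (segment name, length in nucleotides)


def relinearize(circle: List[Segment], first: str, last: str) -> List[Segment]:
    """Model a PCR on a circular template: the product starts at segment
    `first`, runs around the circle and ends at segment `last`. Segments
    lying between `last` and `first` are not amplified and are lost."""
    names = [name for name, _ in circle]
    i, j = names.index(first), names.index(last)
    product = []
    k = i
    while True:
        product.append(circle[k])
        if k == j:
            break
        k = (k + 1) % len(circle)
    return product


def gap(linear: List[Segment], a: str, b: str) -> int:
    """Nucleotides separating segments a and b on a linear molecule."""
    names = [name for name, _ in linear]
    i, j = sorted((names.index(a), names.index(b)))
    return sum(length for _, length in linear[i + 1:j])


# Hypothetical 3'-barcoded cDNA; lengths are illustrative assumptions.
cdna = [("TSO", 30), ("VDJ", 400), ("C_primer_side", 50), ("C_rest_3UTR", 900),
        ("polyA", 30), ("UMI", 12), ("barcode", 16), ("read1", 22)]

print(gap(cdna, "VDJ", "barcode"))                     # 992
round1 = relinearize(cdna, "polyA", "C_primer_side")   # steps (e) and (f)
round2 = relinearize(round1, "TSO", "read1")           # steps (g) and (h)
print([name for name, _ in round2])
print(gap(round2, "VDJ", "barcode"))                   # 92
```

In this model the first PCR drops the stretch between the “C” primer and the Poly-A primer, and the second round restores the original segment order, so that the V(D)J region ends up 92 instead of 992 nucleotides away from the barcode under the assumed lengths.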
[00152] In some aspects the present invention relates to the method of the present invention, further comprising: amplifying the barcoded cDNA to generate a cDNA library comprising double-stranded cDNA, e.g., by using PCR, e.g., with TSO/read1 primers (e.g., SEQ ID NOs: 34 and 1); optionally, end repair and A addition (A-tailing) (e.g., as described in the Examples and Figures herein).
[00153] In some aspects the present invention relates to the method of the present invention, wherein adding/providing a barcode/s to a nucleic acid of the present invention, e.g., RNA or DNA, is carried out by the means of: ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, real-time PCR, digital PCR (dPCR), droplet digital PCR (ddPCR), recombination, biotin capture, transposition, enzyme reaction, exonuclease digestion, endonuclease digestion, digestion and/or second-strand synthesis.
[00154] In some aspects the present invention relates to the method of the present invention, wherein the barcode is added on the 5’-end of the mRNA of a sequence of interest.
[00155] In some aspects the present invention relates to the method of the present invention, wherein the mRNA is isolated from animals, cells, single cell, tissue, biopsies, blood, and cell cultures.
[00156] In some aspects the present invention relates to the method of the present invention, wherein the mRNA is further isolated from viruses, bacteria, microorganisms, and plants.
[00157] In some aspects the present invention relates to the method of the present invention, wherein circularization of the nucleic acid of interest of the present invention (e.g., DNA or RNA) is carried out by the means of: ligation, RNA ligation, T4 DNA Ligase, Cre lox recombination, transposition, and/or use of a DNA circularizing enzyme.
[00158] In some aspects the present invention relates to the method of the present invention, wherein the enrichment method step is carried out by the means of: Hybridization-Based Capture, PCR-Based Capture, biotin capture, exonuclease digestion, endonuclease digestion, digestion, and/or PCR.
[00159] In some aspects the present invention relates to the method of the present invention, wherein after each circularization method step a DNA digestion of left over linear DNA is optionally carried out.
[00160] In some aspects the present invention relates to the method of the present invention, said method further comprising: one or more nucleic acid purification step/s.
[00161] In some aspects the present invention relates to the method of the present invention, wherein said purification is carried out by the means of: DNA or RNA column purification, DNA or RNA combining beads, biotin capture and/or chemical precipitation.
[00162] In some aspects the present invention relates to the method of the present invention, wherein said method is further combined with one or more of the following: MARS-sequencing, Cyto-sequencing, Drop-sequencing, InDrop, Chromium, sciRNA-sequencing, Seq-Well, DroNC-sequencing, SPLiT-sequencing, Quartz-sequencing, Microwell-sequencing, 3’-transcriptome profiling with UMI, 5’ transcriptome profiling with UMI, 5’-single cell transcriptome profiling, barcoding genome sequencing, barcoding de novo sequencing, barcoding single cell genome sequencing, barcoding single cell de novo sequencing, barcoding Hi-C, barcoding single cell Hi-C, barcoding Exome sequencing, single cell barcoding Exome sequencing, barcoding target enrichment sequencing, single cell barcoding target enrichment sequencing, RNA-sequencing with UMIs, DNA sequencing with UMIs, single cell RNA sequencing with barcoding, single cell DNA sequencing with barcoding, ATAC sequencing with barcoding technologies, single cell ATAC sequencing with barcoding techniques, barcoding cancer panel enrichment sequencing, barcoding cancer DNA panel enrichment sequencing, and barcoding cancer RNA panel enrichment sequencing.
[00163] In some aspects the present invention relates to the method of the present invention, wherein said method is suitable for / compatible with a single cell full length RNA sequencing method carried out with existing single cell RNA sequencing kits (both 3’ and 5’ kits).
[00164] In some aspects the present invention relates to the method of the present invention, wherein said method is suitable for / compatible with a single cell immune repertoire sequencing carried out with cDNA samples derived from storage.
[00165] In some aspects the present invention relates to the method of the present invention, wherein said method is suitable for identifying RNA location in a tissue sample.
[00166] In some aspects the present invention relates to the method of the present invention, wherein said method is suitable for sequencing an immune repertoire in a spatial manner.
[00167] In some aspects the present invention relates to the method of the present invention, wherein said nucleic acid is a cDNA produced by any suitable 3’ barcoding single cell RNA library preparation method, e.g., without the need to introduce extra restriction site sequences.
[00168] In some aspects the present invention relates to the method of the present invention, wherein the circularization step is carried out before the enrichment step (e.g., BCR target enrichment), which, for example, allows for the use of standard primers, e.g., polyA and BCR constant region primers to perform a cheap and fast enrichment.
[00169] In some aspects the present invention relates to the method of the present invention, wherein the re-linearization step (e.g., method step (d)) is carried out by the means of a PCR, preferably said re-linearization step does not comprise a restriction enzyme digestion.
[00170] In some aspects the present invention relates to the method of the present invention, wherein said method comprises only one target sequence enrichment/selection (e.g., method step (b)).
[00171] In some aspects the present invention relates to the method of the present invention, wherein said method is compatible with existing methods of single cell RNA 3’-capture.
[00172] In some aspects the present invention relates to the method of the present invention, wherein said method can utilize polyA region and BCR constant region to enrich a nucleic acid of interest (e.g., BCR).
[00173] In some aspects the present invention relates to the method of the present invention, wherein said method utilizes PCR to re-linearize circularized cDNA.
[00174] In some aspects the present invention relates to the method of the present invention, wherein the circularization step of said method is carried out before the enrichment step.
[00175] In some aspects the present invention relates to the method of the present invention, which is capable of utilizing cDNA from the samples that have been already processed by other single cell 3’-capture methods.
[00176] In some aspects the present invention relates to the method of the present invention, wherein said method is an in vitro or ex vivo or in vivo method.
[00177] In some aspects the present invention relates to/provides a nucleic acid (e.g., a nucleic acid of interest, e.g., DNA, RNA or cDNA) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes) at its 3’ and/or 5’-end, e.g., produced (or modified) by the method of the present invention (e.g., as described in the Examples section herein).
[00178] In some aspects the present invention relates to/provides the nucleic acid, e.g., produced (or modified) by the method of the present invention, wherein said nucleic acid is an intermediate product, e.g., in another method.
[00179] In some aspects the present invention relates to/provides the nucleic acid, e.g., produced (or modified) by the method of the present invention, wherein said nucleic acid comprising: a binding site for a sequencing primer; a sequence of interest with binding sites for three primers; a cell barcode and/or unique molecular identifier; and a binding site for another sequencing primer.
[00180] In some aspects the present invention relates to/provides the nucleic acid/s and/or polypeptide/s and/or nucleic acid/s encoding said polypeptides, e.g., SEQ ID NOs: 1-34 and/or nucleic acid/s and/or polypeptide/s and/or nucleic acid/s encoding said polypeptides having at least 80% (e.g., at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99%) sequence identity to any one of SEQ ID NOs: 1-34, e.g., for use in the method/s, composition/s and/or kit/s of the present invention.
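For illustration only, a percent identity value of the kind referred to above can be computed as sketched below. The sketch makes a strong simplifying assumption: the two sequences have equal length and are compared position by position without gaps, whereas sequence identity is in practice usually determined from a pairwise alignment. The sequences shown are hypothetical and are not any of SEQ ID NOs: 1-34.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True if the ungapped identity is at least the given percentage."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 10-nt sequences differing at the last two positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))          # 80.0
print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTTT"))  # True
```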
[00181] In some aspects the present invention relates to/provides the nucleic acid, e.g., produced (or modified) by the method of the present invention, wherein said nucleic acid further comprising: any one of the following sequences: a first primer binding site; a first index sequence as library identifier; a second index sequence as library identifier; or a second primer binding site.
[00182] In some aspects the present invention relates to/provides the nucleic acid of the present invention, wherein functional equivalent sequences are used.
[00183] In some aspects the present invention relates to/provides the methods/nucleic acids/compositions and/or kits of the present invention as depicted in the Examples and/or Figures as described herein (e.g., as depicted in Figure 1-10) carried out, for example, with SEQ ID NOs: 1-2, 29, 30, 31, 32, 33 or 34.
[00184] In some aspects the present invention relates to/provides polypeptides comprising or consisting of: SEQ ID NOs: 3-28, for use in the methods of the present invention.
[00185] In some aspects the present invention relates to/provides a composition or kit comprising the nucleic acid and/or polypeptide/s of the present invention (e.g., as described in the Examples section herein), e.g., for use in the methods of the present invention.
[00186] In some aspects the present invention relates to the nucleic acid/s, polypeptide/s, composition/s or kit/s of the present invention for use as a medicament and/or diagnostic marker.
[00187] In some aspects the present invention relates to the method/s, nucleic acid/s, polypeptide/s, composition/s or kit/s of the present invention for use in a diagnostic and/or screening (e.g., disease susceptibility screening) and/or prognostic and/or prediction (e.g., disease outcome and/or course prognosis/prediction) and/or phenotyping method (e.g., immunodiagnostic method, e.g., for an autoimmune disease, e.g., lupus erythematosus, immune disease, inflammatory disease, neuroinflammatory disease, meningitis, interleukin (IL)-17 producing T helper (Th17)-cells associated disease, cell-dominated meningeal inflammation, infectious disease, genetic disorder, tissue typing/compatibility).
[00188] In some aspects the present invention relates to the method/s, nucleic acid/s, polypeptide/s, composition/s or kit/s of the present invention for use in one or more of the following methods: (i) sequencing method (e.g., as described in the Examples section herein); (ii) library construction (e.g., cDNA library or sequencing library) method (e.g., as described in the Examples section herein); (iii) method for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); (iv) method for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein); (v) diagnostics and/or screening method (e.g., disease diagnostics and/or screening method); and/or (vi) any one of (i)-(v), wherein said method is an in vitro or ex vivo or in vivo method.
[00189] In some aspects the present invention relates to use of the nucleic acid or composition or kit of the present invention for one or more of the following: (i) for sequencing (e.g., as described in the Examples section herein); (ii) for library construction (e.g., cDNA library or sequencing library) (e.g., as described in the Examples section herein); (iii) for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); (iv) for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein); (v) for diagnostics and/or screening (e.g., disease diagnostics and/or screening); and/or (vi) for any one of (i)-(v), wherein said use is an in vitro or ex vivo or in vivo use.
[00190] Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
[00191] It should be understood that this invention is not limited to the particular methodology, protocols, material, reagents, and substances, etc., described herein and as such can vary. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which is defined solely by the claims.
[00192] All publications cited throughout the text of this specification (including all patents, patent applications, scientific publications, instructions, etc.), whether supra or infra, are hereby incorporated by reference in their entirety. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention. To the extent the material incorporated by reference contradicts or is inconsistent with this specification, the specification will supersede any such material.
[00193] The invention is also characterized by the following items:
1. A method for/of producing (and/or modifying) a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., any unique sequence label) (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end, said method comprising: preferably: (i) providing: a nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at its 3’-end and/or 5’-end; or (ii) adding at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes, etc.) at 3’-end and/or 5’-end of said nucleic acid (e.g., nucleic acid of interest, e.g., DNA, RNA or cDNA, etc.); a) circularizing (e.g., self-circularizing) said nucleic acid into a circular nucleic acid, preferably said circularizing (e.g., self-circularizing) is carried out by the means of an enzymatic ligation (e.g., by the means of: DNA ligase (e.g., T4 DNA Ligase) or RNA ligase, e.g., DNA ligase having EC:6.5.1.1, EC:6.5.1.2, EC:6.5.1.6 or EC:6.5.1.7 enzymatic activity, RNA ligase having EC:6.5.1.3 enzymatic activity, 3'-phosphate/5'-hydroxy nucleic acid ligase having EC:6.5.1.8 enzymatic activity, etc.); optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of: a nuclease/s, e.g., an exonuclease/s and/or endonuclease/s, e.g., as described in the Examples section herein); b) enriching (e.g., amplifying) said target nucleic acid from said circularized (e.g., self-circularized) nucleic acid from 
step (a), e.g., by the means of any suitable nucleic acid enriching (e.g., amplifying) method, preferably said enriching is carried out by the means of: PCR (e.g., polymerase chain reaction), NASBA (e.g., nucleic acid sequence based amplification), RCA (e.g., rolling-circle amplification), SPIA (e.g., single primer isothermal amplification), HDA (e.g., helicase-dependent amplification), SDA (e.g., strand displacement amplification), LAMP (e.g., loop-mediated isothermal amplification), RPA (e.g., recombinase polymerase amplification), TS-PCR (e.g., template-switching polymerase chain reaction, e.g., SMART, e.g., switching mechanism at 5’-end of RNA transcript), ICAN (e.g., isothermal and/or chimeric primer-initiated amplification of nucleic acids), SDA (e.g., strand displacement amplification), EXPAR (e.g., exponential amplification reaction), NEMA (e.g., nicking endonuclease-mediated isothermal amplification), in-vitro transcription, etc., and/or combination/s thereof, further preferably said PCR is carried out with primer/s capable of hybridizing to a specific and/or unspecific sequence/s within (and/or adjacent to) said nucleic acid of interest (e.g., to polyA sequence within said nucleic acid of interest and/or constant region within said nucleic acid of interest (e.g., constant (C) region of B-cell receptor)) under suitable conditions (e.g., as described in the Examples section herein); c) optionally, circularizing (e.g., self-circularizing) the amplified nucleic acid from step (b) into a circular nucleic acid; optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of a nuclease, e.g., as described in the Examples section herein); d) optionally, linearizing the circularized nucleic acid from step (c), preferably said linearizing is carried out by the means of PCR and/or nucleic acid fragmentation (e.g., shearing); further 
preferably said nucleic acid fragmentation is a random/stochastic nucleic acid fragmentation (e.g., as described in the Examples section herein); e) ligating an adapter sequence/s comprising at least one specific oligonucleotide (e.g., a sequencing primer) to the 3’-end and/or 5’-end of the linearized nucleic acid from step (b), (c) or (d); preferably said adapter sequence comprising SEQ ID NO: 1 (e.g., “Illumina” read1 primer) and/or SEQ ID NO: 2 (e.g., “Illumina” read2 primer) sequence (e.g., as described in the Examples section herein); f) optionally, amplifying the ligation product from step (e); preferably said amplifying is carried out by the means of PCR; further preferably said PCR is carried out with a primer hybridizing to said adapter sequence (e.g., as described in the Examples section herein); g) optionally, sequencing the barcoded nucleic acid from step (e) or (f), wherein said nucleic acid of interest is sequenced together with said barcode (e.g., in a single- or paired-end sequencing), wherein said method is a 3’-end and/or 5’-end sequencing method (e.g., high-throughput sequencing method) (e.g., as described in the Examples section herein). The method of any one of the preceding items, wherein the method steps (a) to (e) or (f) or (g) are carried out consecutively. The method of any one of the preceding items, wherein said method comprises no amplification and/or molecular modification of said nucleic acid of interest prior to the circularizing of step (a). 
The method of any one of the preceding items, wherein said adapter sequence: (i) does not comprise restriction site/s for a restriction endonuclease (e.g., having EC:3.1.21.4 enzymatic activity, e.g., does not comprise restriction site/s for Not I restriction endonuclease (e.g., 5’-GCGGCCGC-3’), e.g., wherein Not I is a restriction endonuclease derived from Nocardia otitidiscaviarum, e.g., having UniProtKB - Q2I6W2); and/or (ii) cannot be recognized and/or cleaved by a restriction endonuclease (e.g., Not I restriction endonuclease, e.g., having UniProtKB - Q2I6W2). The method of any one of the preceding items, comprising step (g), wherein said sequencing of step (g) is a single- or paired-end sequencing (e.g., as described in the Examples section herein). The method of any one of the preceding items, wherein said nucleic acid of interest comprises at least 1 specific barcode sequence (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 specific barcode sequences). The method of any one of the preceding items, wherein said nucleic acid of interest (e.g., a farthest nucleotide of the nucleic acid of interest to be sequenced) and said barcode are at least 100 nucleotides apart, e.g., at least about 500-700 nucleotides apart, e.g., at least about 700 nucleotides apart. The method of any one of the preceding items, wherein said method is/suitable for a short read sequencing (e.g., a short read high-throughput sequencing), preferably with sequencing read length not longer than 1000 nucleotides. The method of any one of the preceding items comprising step (g), wherein said method has the sequencing read accuracy (e.g., single read-based, e.g., not consensus based) of at least 50% (e.g., 60%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8% or 99.9%). 
The method of any one of the preceding items, comprising step (g), wherein said method has the time per sequencing run of at least 10 minutes. The method of any one of the preceding items, wherein: (i) said nucleic acid is a plurality of nucleic acids (e.g., cDNA library or sequencing library), and preferably said plurality of nucleic acids is derived from a single cell (e.g., as described in the Examples section herein); and/or (ii) said method comprises multiple (e.g., non-identical) nucleic acids modified/processed according to the method steps according to any one of the preceding items, preferably said method is a method for nucleic acid library construction (e.g., cDNA library or sequencing library). The method of any one of the preceding items, wherein said nucleic acid is a plurality of nucleic acids (e.g., cDNA library or sequencing library), wherein said method, comprising step (g), is a method for multiplex sequencing of said plurality of nucleic acids (e.g., as described in the Examples section herein). The method of any one of the preceding items, wherein said nucleic acid of interest is an amplification and/or reverse transcription product (e.g., as described in the Examples section herein). The method of any one of the preceding items, comprising step (g), wherein said method is suitable for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein). The method of any one of the preceding items, comprising step (g), wherein said method is suitable for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein). 
The method of any one of the preceding items, comprising step (g), wherein said method is suitable for a full length RNA sequencing (e.g., full length single cell RNA sequencing), preferably said method is a method for a full length single cell RNA target enrichment sequencing (e.g., as described in the Examples section herein). The method of any one of the preceding items, wherein (i) said method is an in vitro or ex vivo method and/or (ii) the primer is selected from the group consisting of: e.g., SEQ ID NOs: 1-2, 29-34. A nucleic acid (e.g., DNA, RNA or cDNA, e.g., SEQ ID NOs: 1-34, e.g., as described in the Examples section herein) carrying (e.g., comprising) at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes) at its 3’ and/or 5’-end, produced (or modified) by the method according to any one of preceding items (e.g., as described in the Examples section herein). A composition or kit comprising the nucleic acid according to any one of preceding items (e.g., as described in the Examples section herein). 
The nucleic acid or polypeptide (e.g., SEQ ID NOs: 1-34, e.g., as described in the Examples section herein) or composition or kit according to any one of preceding items for use in one or more of the following methods: i) sequencing method (e.g., as described in the Examples section herein); ii) library construction (e.g., cDNA library or sequencing library) method (e.g., as described in the Examples section herein); iii) method for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); iv) method for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein); v) diagnostic and/or screening (e.g., disease susceptibility screening) and/or prognostic and/or prediction (e.g., disease outcome and/or course prognosis/prediction) and/or phenotyping method (e.g., immunodiagnostic method, e.g., for an autoimmune disease, e.g., lupus erythematosus, immune disease, inflammatory disease, neuroinflammatory disease, meningitis, interleukin (IL)-17 producing T helper (Th17)-cells associated disease, cell-dominated meningeal inflammation, infectious disease, genetic disorder, tissue typing/compatibility); vi) in any one of (i)-(v), wherein said method is an in vitro or ex vivo or in vivo method. 
Use of the nucleic acid or polypeptide (e.g., SEQ ID NOs: 1-34, e.g., as described in the Examples section herein) or composition or kit according to any one of preceding items for/in one or more of the following: i) for sequencing (e.g., as described in the Examples section herein); ii) for library construction (e.g., cDNA library or sequencing library) (e.g., as described in the Examples section herein); iii) for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody (e.g., as described in the Examples section herein); iv) for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody (e.g., as described in the Examples section herein); v) for diagnostics and/or screening (e.g., disease susceptibility screening) and/or prognostic and/or prediction (e.g., disease outcome and/or course prognosis/prediction) and/or phenotyping (e.g., immunodiagnostic method, e.g., for an autoimmune disease, e.g., lupus erythematosus, immune disease, inflammatory disease, neuroinflammatory disease, meningitis, interleukin (IL)-17 producing T helper (Th17)-cells associated disease, cell-dominated meningeal inflammation, infectious disease, genetic disorder, tissue typing/compatibility); vi) for any one of (i)-(v), wherein said use is an in vitro or ex vivo or in vivo use.
[00194] A better understanding of the present invention and of its advantages will be gained from the following examples, offered for illustrative purposes only. The examples are not intended to limit the scope of the present invention in any way. Accordingly, the invention is illustrated by the following examples, however, without being limited to the examples or by any specific embodiment of the examples.
[00195] Examples of the invention
[00196] Example 1: Method procedures
[00197] 1. Obtain or produce single-stranded or double-stranded DNA/RNA with a barcode (subsequently called “barcoded DNA/RNA”). It can be synthesized or prepared from viruses, bacteria, microorganisms, plants, animals, cells, a single cell, tissue, biopsies, blood, or cultures. The barcodes include cell identifying barcodes, molecular identifying barcodes, DNA/RNA identifying barcodes, sample identifying barcodes, chemical identifying barcodes, protein identifying barcodes, and quantification barcodes. Barcodes can be added to RNA/DNA by ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, realtime PCR, Digital PCR (dPCR), Droplet Digital PCR (ddPCR), recombination, biotin capture, transposition, enzyme reaction, Exonuclease digestion, Endonuclease digestion, digestion or second-strand synthesis. Barcodes can be located in the region 0-10000 bp from the 5’ and/or 3’ end of the DNA/RNA fragments. The end of the barcoded DNA/RNA can be blunt or sticky. Barcoded DNA/RNA contains 0-100% deoxyribonucleic acid or ribonucleic acid, and can have molecular modifications such as fluorophore and dark quencher labeling, non-fluorescent labeling, fluorescent labeling, biotinylation, avidinylation, attachment chemistry/linker modifications, adenylation, spacer modifications, phosphorylation, phosphorothioate bonds, click chemistry modifications, and base modifications (like 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), 5-Bromo dU, deoxyUridine, Inverted dT, Inverted Dideoxy-T, Dideoxy-C, 5-Methyl dC, deoxyInosine, Super T (5-hydroxybutynl-2’-deoxyuridine), Super G (8-aza-7-deazaguanosine), Locked nucleic acids - Affinity Plus modified bases, 5-Nitroindole, 2'-O-Methyl RNA Bases, Hydroxymethyl dC, Iso-dG, Iso-dC, Fluoro Bases, Fluoro C, Fluoro U, Fluoro A, Fluoro G, 2’-O-methoxy-ethyl Bases (2’-MOE), 2-MethoxyEthoxy A, 2-MethoxyEthoxy MeC, 2-MethoxyEthoxy G, 2-MethoxyEthoxy T).
[00198] 2. (optional) Amplify barcoded DNA/RNA by ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, realtime PCR, dPCR, ddPCR, recombination, biotin capture, transposition, enzyme reaction, transfection, culture, digestion, Exonuclease digestion, Endonuclease digestion, or second-strand synthesis.
[00199] 3. (optional) Molecularly modify the barcoded DNA/RNA by shearing, end repairing, “A” adding, ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, realtime PCR, dPCR, ddPCR, recombination, biotin capture, transposition, end phosphorylation, enzyme reaction, single strand digestion, Exonuclease digestion, Endonuclease digestion, digestion, second-strand synthesis, fluorophore and dark quencher labeling, non-fluorescent labeling, fluorescent labeling, biotinylation, avidinylation, attachment chemistry/linker modifications, adenylation, spacer modifications, phosphorylation, phosphorothioate bonds, click chemistry modifications, or base modifications (like 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), 5-Bromo dU, deoxyUridine, Inverted dT, Inverted Dideoxy-T, Dideoxy-C, 5-Methyl dC, deoxyInosine, Super T (5-hydroxybutynl-2’-deoxyuridine), Super G (8-aza-7-deazaguanosine), Locked nucleic acids - Affinity Plus modified bases, 5-Nitroindole, 2'-O-Methyl RNA Bases, Hydroxymethyl dC, Iso-dG, Iso-dC, Fluoro Bases, Fluoro C, Fluoro U, Fluoro A, Fluoro G, 2’-O-methoxy-ethyl Bases (2’-MOE), 2-MethoxyEthoxy A, 2-MethoxyEthoxy MeC, 2-MethoxyEthoxy G, 2-MethoxyEthoxy T).
[00200] 4. (optional) Target enrichment of a population of DNA/RNA molecules containing immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
[00201] 5. Self-circularize the DNA/RNA from the last step by DNA ligation, RNA ligation, T4 DNA Ligase, Cre-lox recombination, transposition, or a DNA circularizing enzyme.
[00202] 6. (optional) Digest linear DNA/RNA with a combination of 1-100 exonucleases, endonucleases, and/or chemicals.
[00203] 7. (optional) Target enrichment of a population of DNA/RNA molecules containing immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
[00204] 8. (optional) Molecularly modify the barcoded DNA/RNA by shearing, end repairing, “A” adding, ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, realtime PCR, dPCR, ddPCR, recombination, biotin capture, transposition, end phosphorylation, enzyme reaction, single strand digestion, Exonuclease digestion, Endonuclease digestion, digestion, second-strand synthesis, fluorophore and dark quencher labeling, non-fluorescent labeling, fluorescent labeling, biotinylation, avidinylation, attachment chemistry/linker modifications, adenylation, spacer modifications, phosphorylation, phosphorothioate bonds, click chemistry modifications, or base modifications (like 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), 5-Bromo dU, deoxyUridine, Inverted dT, Inverted Dideoxy-T, Dideoxy-C, 5-Methyl dC, deoxyInosine, Super T (5-hydroxybutynl-2’-deoxyuridine), Super G (8-aza-7-deazaguanosine), Locked nucleic acids - Affinity Plus modified bases, 5-Nitroindole, 2'-O-Methyl RNA Bases, Hydroxymethyl dC, Iso-dG, Iso-dC, Fluoro Bases, Fluoro C, Fluoro U, Fluoro A, Fluoro G, 2’-O-methoxy-ethyl Bases (2’-MOE), 2-MethoxyEthoxy A, 2-MethoxyEthoxy MeC, 2-MethoxyEthoxy G, 2-MethoxyEthoxy T).
[00205] 9. (optional) Amplify barcoded DNA/RNA by ligation, extraction, proliferation, transcription, amplification, reverse-transcription, DNA extension, antibody binding, PCR, qPCR, realtime PCR, dPCR, ddPCR, recombination, biotin capture, transposition, enzyme reaction, transfection, culture, digestion, Exonuclease digestion, Endonuclease digestion, or second-strand synthesis.
[00206] 10. (optional) Self-circularize the DNA/RNA from the last step by DNA ligation, RNA ligation, T4 DNA Ligase, Cre-lox recombination, transposition, or a DNA circularizing enzyme.
[00207] 11. (optional) Digest linear DNA/RNA with a combination of 1-100 exonucleases, endonucleases, and/or chemicals.
[00208] 12. (optional) Convert the circularized DNA into linear DNA. This can be done by shearing, digestion, Exonuclease digestion, Endonuclease digestion, or PCR.
[00209] 13. (optional) Target enrichment of a population of DNA/RNA molecules containing immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
[00210] 14. (optional) Shearing, end repairing.
[00211] 15. (optional) Target enrichment of a population including immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
[00212] 16. Add “A” and/or adaptor ligation and/or PCR amplification.
[00213] 17. (optional) Target enrichment of a population including immune repertoire sequences by Hybridization-Based Capture, PCR-Based Capture, biotin capture, Exonuclease digestion, Endonuclease digestion, digestion, or PCR.
[00214] 18. Sequence with a single-end or paired-end method on high-throughput sequencers.
[00215] 19. Analyze the sequencing results by combining barcodes and DNA/RNA sequences.
Note: 1) For single cell immune repertoire sequencing, at least one of the target enrichments in steps 4, 7, 13, 15 and 17 is required. 2) Between each step, 1-10 rounds of purification or fragment collection can be used. The purification can be performed with DNA/RNA binding columns, DNA/RNA binding beads, biotin capture or chemical precipitation.
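The rule in Note 1 can be expressed as a small check on a planned workflow. The following is a minimal sketch only: the function name and the example step selections are hypothetical and are not part of the described method.

```python
# Step numbers of the optional target-enrichment steps named in Note 1.
ENRICHMENT_STEPS = {4, 7, 13, 15, 17}


def has_required_enrichment(selected_steps):
    """Return True if the plan includes at least one target-enrichment step."""
    return bool(ENRICHMENT_STEPS & set(selected_steps))


# Hypothetical step selections for a single cell immune repertoire run.
plan_with_enrichment = [1, 5, 6, 7, 10, 11, 12, 16, 18, 19]
plan_without_enrichment = [1, 5, 6, 10, 11, 12, 16, 18, 19]

print(has_required_enrichment(plan_with_enrichment))
print(has_required_enrichment(plan_without_enrichment))
```

The first plan includes step 7 and passes the check; the second omits every enrichment step and fails it.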
Application examples:
[00216] The inventors use human single cell B cell receptor (BCR) immune repertoire sequences (IGH gene) from cDNA processed by the 10x Genomics 3’ kit as a representative template to demonstrate the applicability of the inventors’ method (the workflow is shown in Figure 3).
[00217] 1. Obtain cDNA or amplified cDNA processed by the 10x Genomics 3’ Kit. (optional) Phosphorylate cDNA
[00218] Prepare Mixture:
Figure imgf000037_0001
[00219] Incubate at 37°C for 30 minutes, then at 65°C for 20 minutes to deactivate the enzyme.
[00220] Add 30 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 15 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00221] 2. Circularize cDNA
[00222] Prepare Mixture:
Figure imgf000037_0002
[00223] Incubate at 16°C for 16 hours.
[00224] 3. Purification:
[00225] Add 30 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 15 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00226] 4. Linear DNA digestion
[00227] Prepare Mixture:
Figure imgf000038_0001
[00228] Incubate at 37°C for 30 minutes. Then add 0.7 µl of 0.25 M EDTA and incubate at 70°C for 30 minutes.
[00229] 5. Purification
[00230] Add 10 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol, resuspend the beads in 15 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00231] 6. Enrichment of V(D)J region of the BCR (IGH gene)
[00232] Prepare Mixture:
Figure imgf000038_0002
Run PCR program in a PCR cycler:
Figure imgf000038_0003
[00233] 7. Size selection and Purification
[00234] Add 20 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Transfer 75 µl of supernatant into a new tube, add 20 µl Ampure XP, pipette 15 times, and incubate at room temperature for 10 min. Place the tube on the magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol, resuspend the beads in 35 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00235] 8. (optional) Nested enrichment of TCR and/or BCR
[00236] Prepare Mixture:
Figure imgf000039_0001
Run PCR program in a PCR cycler:
Figure imgf000039_0002
[00237] Add 20 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Transfer 75 µl of supernatant into a new tube, add 20 µl Ampure XP, pipette 15 times, and incubate at room temperature for 10 min. Place the tube on the magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol, resuspend the beads in 35 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00238] 9. (optional) Shearing
[00239] Prepare Mixture on ice:
Figure imgf000039_0003
[00240] Incubate at 37°C for 2.5 minutes, then at 65°C for 30 minutes to deactivate the enzyme. Add 30 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 15 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00241] 10. (optional) End repairing
[00242] 11. (optional) Phosphorylate cDNA
[00243] Prepare Mixture:
Figure imgf000040_0001
[00244] Incubate at 37°C for 30 minutes, then at 65°C for 20 minutes to deactivate the enzyme.
[00245] Add 30 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 15 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00246] 12. Circularize enriched fragments
[00247] Prepare Mixture:
Figure imgf000040_0002
[00248] Incubate at 16°C for 16 hours.
[00249] 13. Purification:
[00250] Add 40 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 15 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00251] 14. Open circle
[00252] Prepare Mixture:
Figure imgf000041_0001
Run PCR program in a PCR cycler:
Figure imgf000041_0002
[00253] 15. Size selection and Purification:
[00254] Add 20-30 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Transfer 75 µl of supernatant into a new tube, add 10-20 µl Ampure XP, pipette 15 times, and incubate at room temperature for 10 min. Place the tube on the magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol, resuspend the beads in 35 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00255] 16. (optional) Shearing
[00256] Prepare Mixture on ice:
Figure imgf000041_0004
Figure imgf000041_0003
17. (optional) Shearing, end repairing and adding A
Prepare Mixture on ice:
Figure imgf000042_0001
18. (optional) Add A
Prepare Mixture on ice:
Figure imgf000042_0002
[00257] 19. (optional) Purification:
[00258] Add 40 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 25 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00259] 20. (optional) Size selection and Purification
[00260] Add 20-30 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Transfer 75 µl of supernatant into a new tube, add 10-20 µl Ampure XP, pipette 15 times, and incubate at room temperature for 10 min. Place the tube on the magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol, resuspend the beads in 35 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00261] 21. Ligate adaptor
[00262] Prepare Mixture on ice:
Figure imgf000042_0003
[00263] Incubate at 20°C for 15 minutes.
[00264] 22. Purification
[00265] Add 40 µl Ampure XP, pipette 15 times, and incubate at room temperature for 5 min. Place the tube on a magnet until the supernatant becomes clear. Remove and discard the supernatant, then wash twice with 200 µl 80% ethanol. Centrifuge the tube briefly and return it to the magnet. Remove and discard the remaining ethanol and air dry for 1 minute. Resuspend the beads in 25 µl Buffer EB and leave at room temperature for 2 minutes. Place the tube on the magnet until the supernatant becomes clear, then collect the supernatant.
[00266] 23. Library amplification
[00267] Prepare Mixture:
Figure imgf000043_0001
Run PCR program in a PCR cycler:
Figure imgf000043_0002
[00268] 25. Sequencing
[00269] Paired-end sequencing is performed on an Illumina sequencer. Sequencing strategy: PE150.
[00270] 26. Analysis
[00271] After the sequencing data are demultiplexed and FASTQ files are generated, the read2 sequences are reverse-complemented. The resulting FASTQ files are analysed with 10x Genomics cellranger or packages compatible with the 10x Genomics 5’ VDJ kit.
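The reverse-complementing of read2 can be sketched as follows. This is a minimal illustration, not the inventors' script: it assumes plain four-line FASTQ records, and it also reverses the quality string so that each quality value stays aligned with its base, which the text does not state explicitly. The function names are hypothetical.

```python
import io

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_fastq(in_handle, out_handle):
    """Reverse-complement every record of a four-line-per-record FASTQ stream."""
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline().rstrip("\n")
        qual = in_handle.readline().rstrip("\n")
        out_handle.write(header.rstrip("\n") + "\n")
        out_handle.write(reverse_complement(seq) + "\n")
        out_handle.write(plus + "\n")
        # Assumption: qualities are reversed to stay aligned with the bases.
        out_handle.write(qual[::-1] + "\n")


# Toy record standing in for a read2 FASTQ file.
source = io.StringIO("@read2_example\nAACGT\n+\nIIIH#\n")
result = io.StringIO()
revcomp_fastq(source, result)
print(result.getvalue(), end="")
```

For real data the two handles would be opened on the read2 FASTQ file and on an output file, for example with `gzip.open(path, "rt")` for compressed input.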
[00272] 27. Results
[00273] The results are shown as BCR annotation in Figures 3, 7 and 8.
[00274] Figure 3 shows the BCR(IGH) annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) . The clonotype of single cell IGH sequenced in the course of the present invention. V(D)J: variation regions of BCR. IGK: Immunoglobulin light chain kappa. IGL: Immunoglobulin light chain lambda. IGH: Immunoglobulin heavy chain. Contig: a set of overlapping DNA segments that together represent a consensus region of DNA. CDR3: the main CDR(complementarity determining regions) responsible for recognizing processed antigen. V-J spanning pair: fraction of cell- associated barcodes with at least one contig for each chain of the receptor pair. Clonotype: The phenotype of a clone of a cell.
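A clonotype frequency table of the kind summarized in Figure 3 can be sketched as a count of cells per CDR3 sequence. This is a simplified illustration under stated assumptions: a clonotype is approximated here by an identical heavy-chain CDR3 string, each cell barcode carries one CDR3, and the barcodes and CDR3 strings below are invented toy values, not the sequences of SEQ ID NOs: 3-12.

```python
from collections import Counter


def clonotype_frequencies(cell_to_cdr3, top_n=10):
    """Return (cdr3, cell_count, fraction_of_cells) for the most frequent CDR3s."""
    counts = Counter(cell_to_cdr3.values())
    total = sum(counts.values())
    return [(cdr3, n, n / total) for cdr3, n in counts.most_common(top_n)]


# Hypothetical mapping of cell barcode to its heavy-chain CDR3 amino acid sequence.
cells = {
    "AAACCTGA": "CARDYW",
    "AAACGGGT": "CARDYW",
    "AAAGATGC": "CAKGGFDYW",
    "AAATGCCA": "CTRELLW",
}

for cdr3, n_cells, fraction in clonotype_frequencies(cells):
    print(f"{cdr3}\t{n_cells}\t{fraction:.2%}")
```

Dedicated pipelines define clonotypes more strictly (for example using V and J gene calls and paired chains), so this sketch only illustrates the counting step.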
[00275] Figure 7 shows the valid data rate, defined as the percentage of data that maps to BCR after data filtering and UMI adjustment. For the total data, the valid rate is 98.95%; averaged over cells, the valid rate is 98.73%.
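The two figures quoted above (one for the total data, one averaged over cells) correspond to two different ways of summarising the same counts. The sketch below shows how the two can differ. The read counts are hypothetical and are not data from Figure 7, and the filtering and UMI adjustment steps are not modelled.

```python
def pooled_valid_rate(cells):
    """Valid reads over all reads, pooled across cells (percent)."""
    valid = sum(v for v, _ in cells)
    total = sum(t for _, t in cells)
    return 100.0 * valid / total


def mean_per_cell_valid_rate(cells):
    """Mean of each cell's own valid rate (percent); every cell weighs equally."""
    rates = [100.0 * v / t for v, t in cells if t > 0]
    return sum(rates) / len(rates)


# Hypothetical (valid_reads, total_reads) pairs, one per cell barcode.
cells = [(990, 1000), (45, 50), (198, 200)]

print(f"pooled:   {pooled_valid_rate(cells):.2f}%")
print(f"per cell: {mean_per_cell_valid_rate(cells):.2f}%")
```

The pooled rate is dominated by cells with many reads, whereas the per-cell mean gives a low-coverage cell the same weight as a high-coverage one, so the two values generally do not coincide.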
[00276] Figure 8 shows the BCR analysis data, including the sequencing statistics, the number of BCR-containing cells and the enrichment rate (note: the enrichment rate is based on total reads).
[00277] Example 2: Single cell TCR sequencing from a mouse 3’ single cell cDNA library.
[00278] We developed a novel method to reconstruct antigen receptor information from 3’ scRNA-seq by circularizing the cDNA library and re-linearizing it with the cell barcode and UMI information attached at the other end of the cDNA molecule (e.g., TCR and Figure 9). All the primers were synthesized by Eurofins Genomics; for the primer names and sequences see, e.g., SEQ ID NOs: 1-2, 29-34. cDNA libraries were generated with the 10x Genomics 3’ single cell RNA sequencing kit. For circularization, 25 ng of cDNA libraries were end-phosphorylated with T4 Polynucleotide Kinase (New England Biolabs) and purified with 0.6x Ampure XP beads (Beckman Coulter). Then 1,000 units of T4 DNA Ligase (New England Biolabs) were added to self-circularize the phosphorylated cDNA in 25 µl total volume at 16°C for 16 hours. Subsequently, 0.7x Ampure XP beads were used for purification. Remaining linear DNA was digested with 0.9 units/µl RecJf and 0.1 units/µl Lambda Exonuclease (both from New England Biolabs). Circularized cDNA libraries were purified with 0.7x Ampure XP beads. A PCR enrichment of the T cell receptor (TCR) variable region was performed with 25 µl Kapa hotstart amplification mix (KAPA Biosystems), 10 µl primer polyA and 5 µl primer Trxc rev pool_out (e.g., SEQ ID NOs: 29, 32-33). After purification with 0.8x Ampure XP beads, a nested PCR was performed with 25 µl Kapa hotstart amplification mix, 10 µl primer polyA and 5 µl primer Trxc rev pool_in (e.g., SEQ ID NOs: 29, 30-31). A size selection was done with 0.5x - 0.8x Ampure XP beads. Then, 25 ng of the nested PCR product were phosphorylated, circularized and digested to remove linear DNA again as above. A PCR with primers “read1” and “TSO” (e.g., SEQ ID NOs: 1, 34) was used to re-linearize the circularized library (Figure 9).
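The effect of circularizing, enriching and re-linearizing can be illustrated with a toy model in which a cDNA molecule is a list of named segments. The segment names, their order and the primer positions below are simplifying assumptions made for illustration; they are not taken from Figure 9. A circle is represented by the same list, with the last element understood to be ligated to the first.

```python
def inverse_pcr(circle, first, last):
    """Walk a circular molecule from segment `first` to segment `last`.

    `circle` is a list whose last element is joined to its first.
    The walk wraps across that ligation junction; segments lying between
    `last` and `first` are not amplified and therefore drop out.
    """
    start = circle.index(first)
    rotated = circle[start:] + circle[:start]
    return rotated[: rotated.index(last) + 1]


# Assumed layout of a 3' cDNA molecule: barcode and UMI sit at one end,
# the variable (V) region sits near the opposite end.
linear = ["read1", "barcode", "UMI", "polyT", "3'UTR",
          "C", "J", "D", "V", "5'UTR", "TSO"]

# First circularization joins TSO to read1. Enrichment with a constant-region
# primer and a polyA primer amplifies across the junction and drops the 3'UTR.
enriched = inverse_pcr(linear, "C", "polyT")

# Second circularization joins polyT to C. PCR with read1 and TSO primers
# re-linearizes the molecule with the barcode back at one end.
relinearized = inverse_pcr(enriched, "read1", "TSO")

print(enriched)
print(relinearized)
```

In the enriched product only the 5' UTR, the TSO and the read1 adapter lie between the V region and the barcode, which is the sense in which circularization brings the off-barcode region closer to the barcode in this simplified picture.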
[00279] Program for double Annealing PCR:
Step Temperature Time
1 98°C 00:45:00
2 98°C 00:20:00
3 63°C 00:25:00
4 57°C 00:25:00
5 72°C 01:00:00
6 72°C 01:00:00
7 4°C pause
[00280] PCR products were purified with 0.5x - 0.8x Ampure XP beads (Beckman Coulter) and libraries were prepared from them using the Chromium Single Cell 3’ Library Kit v3 (10x Genomics).
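Bead ratios such as 0.6x or 0.5x - 0.8x are multiples of the sample volume. The sketch below converts them to volumes. It assumes that a range such as 0.5x - 0.8x denotes the common two-step (double-sided) bead size selection, in which the second bead addition tops the total up to the higher ratio. The text does not spell this procedure out, and the function names and the 50 µl sample volume are hypothetical.

```python
def bead_volume(sample_ul: float, ratio: float) -> float:
    """Bead volume (µl) for a single-sided clean-up at the given ratio."""
    return sample_ul * ratio


def double_sided_volumes(sample_ul: float, low: float, high: float):
    """Bead volumes (µl) for an assumed two-step size selection.

    Step 1: add `low` x beads; large fragments bind and the supernatant is kept.
    Step 2: add beads up to `high` x of the ORIGINAL sample volume;
            the bound fraction is washed and eluted.
    """
    if not 0 < low < high:
        raise ValueError("ratios must satisfy 0 < low < high")
    first = sample_ul * low
    second = sample_ul * (high - low)
    return first, second


# Hypothetical 50 µl reaction.
print(bead_volume(50, 0.6))
print(double_sided_volumes(50, 0.5, 0.8))
```

Note that the second addition is computed from the original sample volume, not from the volume of the transferred supernatant.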
[00281] Results
[00282] The results are shown as TCR annotation in Figure 10, which shows the TCR annotation result and the statistics of the immune repertoire counting including V(D)J Annotation, Top 10 Clonotype frequencies and Top 10 Clonotype CDR3 sequences (e.g., SEQ ID NOs: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28). V(D)J: variation regions of TCR. TRA: T cell Receptor Alpha. TRB: T cell Receptor Beta. Contig: a set of overlapping DNA segments that together represent a consensus region of DNA. CDR3: the main CDR (complementarity determining regions) responsible for recognizing processed antigen. V-J spanning pair: fraction of cell-associated barcodes with at least one contig for each chain of the receptor pair. Clonotype: The phenotype of a clone of a cell.
[00283] References:
1. Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776-779 (2014).
2. Fan, H. C., Fu, G. K. & Fodor, S. P. A. Expression profiling. Combinatorial labeling of single cells for gene expression cytometry. Science 347, 1258367 (2015).
3. Macosko, E. Z. et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202-1214 (2015).
4. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187-1201 (2015).
5. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049 (2017).
6. Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661-667 (2017).
7. Gierahn, T. M. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods 14, 395-398 (2017).
8. Habib, N. et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat. Methods 14, 955-958 (2017).
9. Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176-182 (2018).
10. Sasagawa, Y. et al. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads. Genome Biol. 19, 29 (2018).
11. Bose, S. et al. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biol. 16, 120 (2015).
12. Han, X. et al. Mapping the Mouse Cell Atlas by Microwell-Seq. Cell 172, 1091-1107.e17 (2018).
13. Tonegawa, S. Somatic generation of antibody diversity. Nature 302, 575-581 (1983).
14. Alt, F. W. et al. Ordered rearrangement of immunoglobulin heavy chain variable region segments. EMBO J. 3, 1209-1219 (1984).
15. Alt, F. W., Reth, M. G., Blackwell, T. K. & Yancopoulos, G. D. Regulation of immunoglobulin variable-region gene assembly. Mt. Sinai J. Med. 53, 166-169 (1986).
16. Schatz, D. G., Oettinger, M. A. & Schlissel, M. S. V(D)J recombination: molecular biology and regulation. Annu. Rev. Immunol. 10, 359-383 (1992).
17. Yancopoulos, G. D., Blackwell, T. K., Suh, H., Hood, L. & Alt, F. W. Introduced T cell receptor variable region gene segments recombine in pre-B cells: evidence that B and T cells use a common recombinase. Cell 44, 251-259 (1986).
18. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333-351 (2016).
19. Miner, B. E., Stoger, R. J., Burden, A. F., Laird, C. D. & Hansen, R. S. Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR. Nucleic Acids Res. 32, e135 (2004).
20. McCloskey, M. L., Stoger, R., Hansen, R. S. & Laird, C. D. Encoding PCR products with batchstamps and barcodes. Biochem. Genet. 45, 761-767 (2007).
21. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72-74 (2011).
22. Shugay, M. et al. Towards error-free profiling of immune repertoires. Nat. Methods 11, 653-655 (2014).
23. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163-166 (2014).

Claims

What is claimed is:
1. A method for producing a nucleic acid of interest (e.g., DNA, RNA or cDNA) carrying at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes) at its 3’-end and/or 5’-end, said method comprising:
a) circularizing (e.g., self-circularizing) said nucleic acid of interest into a circular nucleic acid, preferably said circularizing (e.g., self-circularizing) is carried out by the means of an enzymatic ligation (e.g., by the means of: DNA ligase or RNA ligase, e.g., DNA ligase having EC:6.5.1.1, EC:6.5.1.2, EC:6.5.1.6 or EC:6.5.1.7 enzymatic activity, RNA ligase having EC:6.5.1.3 enzymatic activity, 3'-phosphate/5'-hydroxy nucleic acid ligase having EC:6.5.1.8 enzymatic activity); optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of: a nuclease/s, e.g., an exonuclease/s and/or endonuclease/s);
b) enriching (e.g., amplifying) said target nucleic acid from said circularized (e.g., self-circularized) nucleic acid from step (a), preferably said enriching is carried out by the means of: PCR, NASBA, RCA, SPIA, HDA, SDA, LAMP, RPA, SMART, ICAN, SDA, EXPAR, NEMA or in-vitro transcription, further preferably said PCR is carried out with primer/s capable of hybridizing to constant region of said nucleic acid of interest (e.g., constant (C) region of B-cell receptor) under suitable conditions;
c) optionally, circularizing (e.g., self-circularizing) the amplified nucleic acid from step (b) into a circular nucleic acid; optionally, removing uncircularized (e.g., linear) nucleic acid after said circularizing, preferably said removing is carried out by the means of a nucleic acid digestion (e.g., by the means of a nuclease);
d) optionally, linearizing the circularized nucleic acid from step (c), preferably said linearizing is carried out by the means of PCR and/or nucleic acid fragmentation (e.g., shearing); further preferably said nucleic acid fragmentation is a random/stochastic nucleic acid fragmentation;
e) ligating an adapter sequence comprising at least one specific oligonucleotide (e.g., a sequencing primer) to the 3’-end and/or 5’-end of the linearized nucleic acid from step (b), (c) or (d); preferably said adapter sequence further comprising SEQ ID NO: 1 (e.g., Illumina read2 primer) sequence;
f) optionally, amplifying the ligation product from step (e); preferably said amplifying is carried out by the means of PCR; further preferably said PCR is carried out with a primer hybridizing to said adapter sequence;
g) optionally, sequencing the barcoded nucleic acid from step (e) or (f), wherein said nucleic acid of interest is sequenced together with said barcode (e.g., in a single- or paired-end sequencing), wherein said method is a 3’-end and/or 5’-end sequencing method (e.g., high-throughput sequencing method).
2. The method of any one of the preceding claims, wherein the methods steps (a) to (e) or (f) or (g) are carried out consecutively.
3. The method of any one of the preceding claims, wherein said method comprises no amplification and/or molecular modification of said nucleic acid of interest prior to the circularizing of step (a).
4. The method of any one of the preceding claims, wherein said adapter sequence comprises no restriction sites for a restriction endonuclease (e.g., no Not1 restriction sites).
5. The method of any one of the preceding claims, comprising step (g), wherein said sequencing of step (g) is a single- or paired-end sequencing.
6. The method of any one of the preceding claims, wherein said nucleic acid of interest comprising at least 1 specific barcode sequence (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 specific barcode sequences).
7. The method of any one of the preceding claims, wherein said nucleic acid of interest (e.g., a farthest nucleotide of the nucleic acid of interest to be sequenced) and said barcode are at least 100 nucleotides apart, e.g., at least about 500-700 nucleotides apart, e.g., at least about 700 nucleotides apart.
8. The method of any one of the preceding claims, wherein said method is/suitable for a short read sequencing (e.g., a short read high-throughput sequencing), preferably with sequencing read length not longer than 1000 nucleotides.
9. The method of any one of the preceding claims comprising step (g), wherein said method has the sequencing read accuracy (e.g., single read-based, e.g., not consensus based) of at least 50% (e.g., 60%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8% or 99.9%).
10. The method of any one of the preceding claims, comprising step (g), wherein said method has the time per sequencing run of at least 10 minutes.
11. The method of any one of the preceding claims, wherein said nucleic acid is a plurality of nucleic acids (e.g., cDNA library), preferably said plurality of nucleic acids is derived from a single cell.
12. The method of any one of the preceding claims, wherein said nucleic acid is a plurality of nucleic acids (e.g., cDNA library), wherein said method, comprising step (g), is a method for multiplex sequencing of said plurality of nucleic acids.
13. The method of any one of the preceding claims, wherein said nucleic acid of interest is an amplification and/or reverse transcription product.
14. The method of any one of the preceding claims, comprising step (g), wherein said method is suitable for sequencing a variable region of a nucleic acid encoding an antigen receptor and/or an antibody, preferably said method is a method for sequencing of a nucleic acid encoding a variable region of an antigen receptor and/or an antibody.
15. The method of any one of the preceding claims, comprising step (g), wherein said method is suitable for determining a clone type frequency of an antigen receptor and/or antibody, preferably said method is a method for determining a clone type frequency of an antigen receptor and/or antibody.
16. The method of any one of the preceding claims, comprising step (g), wherein said method is suitable for a full length RNA sequencing (e.g., full length single cell RNA sequencing), preferably said method is a method for a full length single cell RNA target enrichment sequencing.
17. A nucleic acid of interest (e.g., DNA, RNA or cDNA) carrying at least one specific barcode (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or at least 100 barcodes) at its 3’ and/or 5’-end, produced by the method according to any one of preceding claims.
18. A composition or kit comprising the nucleic acid according to any one of preceding claims.
PCT/EP2021/070210 2020-07-20 2021-07-20 Circulation method to sequence immune repertoires of individual cells WO2022018055A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
LU101949 2020-07-20

Publications (1)

Publication Number Publication Date
WO2022018055A1 true WO2022018055A1 (en) 2022-01-27

Family

ID=71944184

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/070210 WO2022018055A1 (en) 2020-07-20 2021-07-20 Circulation method to sequence immune repertoires of individual cells

Country Status (1)

Country Link
WO (1) WO2022018055A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023150277A1 (en) * 2022-02-03 2023-08-10 The Johns Hopkins University Methods for sequencing an immune cell receptor
WO2023184616A1 (en) * 2022-03-31 2023-10-05 立凌生物制药苏州有限公司 Method for detecting cloned tcr sequence and use thereof
WO2023240093A1 (en) * 2022-06-06 2023-12-14 Element Biosciences, Inc. Methods for assembling and reading nucleic acid sequences from mixed populations
US11859171B2 (en) 2013-04-17 2024-01-02 Agency For Science, Technology And Research Method for generating extended sequence reads

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160304954A1 (en) * 2013-12-11 2016-10-20 Accuragen, Inc. Compositions and methods for detecting rare sequence variants
WO2019033062A2 (en) * 2017-08-10 2019-02-14 Metabiotech Corporation Tagging nucleic acid molecules from single cells for phased sequencing
WO2019086531A1 (en) * 2017-11-03 2019-05-09 F. Hoffmann-La Roche Ag Linear consensus sequencing
US20190345488A1 (en) * 2016-10-01 2019-11-14 Berkeley Lights, Inc. Dna barcode compositions and methods of in situ identification in a microfluidic device

Non-Patent Citations (32)

* Cited by examiner, † Cited by third party
Title
"Antibodies: A Laboratory Manual", 1988, COLD SPRING HARBOR LABORATORY
"Antibody Engineering Lab Manual", SPRINGER-VERLAG, article "Protein Sequence and Structure Analysis of Antibody Variable Domains"
"Sequences of Proteins of immunological Interest", 1991, US DEPARTMENT OF HEALTH AND HUMAN SERVICES
ALT, F. W. ET AL.: "Ordered rearrangement of immunoglobulin heavy chain variable region segments", EMBO J, vol. 3, 1984, pages 1209 - 1219, XP055277074
ALT, F. W.RETH, M. G.BLACKWELL, T. K.YANCOPOULOS, G. D.: "Regulation of immunoglobulin variable-region gene assembly", MT. SINAI J. MED., vol. 53, 1986, pages 166 - 169
BAIROCH A: "The ENZYME database in 2000", NUCLEIC ACIDS RES, vol. 28, 2000, pages 304 - 305
BOSE, S ET AL.: "Scalable microfluidics for single-cell RNA printing and sequencing", GENOME BIOL, vol. 16, 2015, pages 120
CAO, J ET AL.: "Comprehensive single-cell transcriptional profiling of a multicellular organism", SCIENCE, vol. 357, 2017, pages 661 - 667, XP055624798, DOI: 10.1126/science.aam8940
CHOTHIA ET AL., J. MOL. BIOL., vol. 227, 1992, pages 799 - 817
FAN, H. C.FU, G. K.FODOR, S. P. A.: "Expression profiling. Combinatorial labeling of single cells for gene expression cytometry", SCIENCE, vol. 347, 2015, pages 1258367
GIERAHN, T. M. ET AL.: "Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput", NAT. METHODS, vol. 14, 2017, pages 395 - 398
GOODWIN, S.MCPHERSON, J. D.MCCOMBIE, W. R.: "Coming of age: ten years of nextgeneration sequencing technologies", NAT. REV. GENET., vol. 17, 2016, pages 333 - 351, XP055544186, DOI: 10.1038/nrg.2016.49
HABIB, N ET AL.: "Massively parallel single-nucleus RNA-seq with DroNc-seq", NAT. METHODS, vol. 14, 2017, pages 955 - 958, XP055651390, DOI: 10.1038/nmeth.4407
HAN, X ET AL.: "Mapping the Mouse Cell Atlas by Microwell-Seq", CELL, vol. 172, 2018, pages 1091 - 1107
ISLAM, S ET AL.: "Quantitative single-cell RNA-seq with unique molecular identifiers", NAT. METHODS, vol. 11, 2014, pages 163 - 166, XP055614140, DOI: 10.1038/nmeth.2772
JAITIN, D. A. ET AL.: "Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types", SCIENCE, vol. 343, 2014, pages 776 - 779, XP055111761, DOI: 10.1126/science.1247651
JULIA BECK ET AL: "Genome Aberrations in Canine Mammary Carcinomas and Their Detection in Cell-Free Plasma DNA", PLOS ONE, vol. 8, no. 9, 30 September 2013 (2013-09-30), pages e75485, XP055704775, DOI: 10.1371/journal.pone.0075485 *
KIVIOJA, T ET AL.: "Counting absolute numbers of molecules using unique molecular identifiers", NAT. METHODS, vol. 9, 2011, pages 72 - 74, XP055401382, DOI: 10.1038/nmeth.1778
KLEIN, A. M. ET AL.: "Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells", CELL, vol. 161, 2015, pages 1187 - 1201, XP055731640, DOI: 10.1016/j.cell.2015.04.044
MACOSKO, E. Z. ET AL.: "Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets", CELL, vol. 161, 2015, pages 1202 - 1214, XP055586617, DOI: 10.1016/j.cell.2015.05.002
MCCLOSKEY, M. L.STAGER, R.HANSEN, R. S.LAIRD, C. D.: "Encoding PCR products with batchstamps and barcodes", BIOCHEM. GENET., vol. 45, 2007, pages 761 - 767, XP019548696, DOI: 10.1007/s10528-007-9114-x
MINER, B. E.STAGER, R. J.BURDEN, A. F.LAIRD, C. D.HANSEN, R. S.: "Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR", NUCLEIC ACIDS RES., vol. 32, 2004, pages e135, XP002726256, DOI: 10.1093/NAR/GNH132
NEEDLEMANWUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
RICE ET AL.: "EMBOSS: The European Molecular Biology Open Software Suite", TRENDS GENET, vol. 16, 2000, pages 276 - 277, XP004200114, DOI: 10.1016/S0168-9525(00)02024-2
ROSENBERG, A ET AL.: "Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding", SCIENCE, vol. 360, 2018, pages 176 - 182, XP055803532, DOI: 10.1126/science.aam8999
SASAGAWA, Y ET AL.: "Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads", GENOME BIOL, vol. 19, 2018, pages 29
SCHATZ, D. G.OETTINGER, M. A.SCHLISSEL, M. S.: "V(D)J recombination: molecular biology and regulation", ANNU. REV. IMMUNOL., vol. 10, 1992, pages 359 - 383
SHUGAY, M ET AL.: "Towards error-free profiling of immune repertoires", NAT. METHODS, vol. 11, 2014, pages 653 - 655, XP055493585, DOI: 10.1038/nmeth.2960
TOMLINSON ET AL., EMBO J, vol. 14, 1995, pages 4628 - 4638
TONEGAWA, S: "Somatic generation of antibody diversity", NATURE, vol. 302, 1983, pages 575 - 581, XP055177607, DOI: 10.1038/302575a0
YANCOPOULOS, G. D.BLACKWELL, T. K.SUH, H.HOOD, L.ALT, F. W.: "Introduced T cell receptor variable region gene segments recombine in pre-B cells: evidence that B and T cells use a common recombinase", CELL, vol. 44, 1986, pages 251 - 259, XP023883645, DOI: 10.1016/0092-8674(86)90759-2
ZHENG, G. X. Y. ET AL.: "Massively parallel digital transcriptional profiling of single cells", NAT COMMUN, vol. 8, 2017, pages 14049, XP055503732, DOI: 10.1038/ncomms14049

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21754924

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21754924

Country of ref document: EP

Kind code of ref document: A1