WO2013109935A1 - Method for analysis of immune variable sequences - Google Patents

Method for analysis of immune variable sequences

Info

Publication number
WO2013109935A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
immune repertoire
immune
sequences
c57bl
Prior art date
Application number
PCT/US2013/022210
Other languages
French (fr)
Inventor
David Scott Johnson
Andrea Loehr
Original Assignee
Gigagen, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gigagen, Inc.
Publication of WO2013109935A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/686 Polymerase chain reaction [PCR]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6881 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • This invention relates generally to immune repertoire specific reference sequences ("words") that unambiguously identify genes and/or subgroups of genes, and the use of these words to analyze multiplexed immune repertoire data.
  • Immune systems are comprised of a huge diversity of immune cells, such as T cells and B cells.
  • Immune cell repertoires are comprised of millions of clones, which produce proteins that enable each cell to specifically recognize a single antigen. When the cells recognize that antigen, they produce an immune response.
  • Genetic analysis of millions of immune cells is useful in medicine and research, in part because components of an individual's immune system are indicative of health. Dysregulation of the immune system is responsible for a variety of disorders including autoimmune diseases such as Crohn's disease, juvenile diabetes (Type 1 diabetes, T1D), multiple sclerosis, rheumatoid arthritis, and systemic lupus erythematosus (SLE). Immune monitoring is useful to better understand cancer, immunotherapy, and immune-competence. In addition, detailed analysis of the immune system can determine appropriate donors for organ transplants and monitor for signs of graft versus host disease (GVHD).
  • GVHD: graft versus host disease
  • Antibodies are produced by recombined genomic immunoglobulin (Ig) sequences in B lineage cells.
  • Immunoglobulin light chains are derived from either κ or λ genes.
  • the λ genes are comprised of four constant (C) genes and approximately thirty variable (V) genes.
  • the κ genes are comprised of one C gene and 250 V genes.
  • the heavy chain gene family is comprised of several hundred V genes, fifteen D genes, and four joining (J) genes. Somatic recombination during B cell differentiation randomly chooses one V-D-J combination in the heavy chain and one V-J combination in either κ or λ light chain. Because there are so many genes, millions of unique combinations are possible.
  • V genes also undergo somatic hypermutation after recombination, generating further diversity. Despite this underlying complexity, it is possible to use dozens of primers targeting conserved sequences to sequence the full heavy and light chain complement in several multiplexed reactions (van Dongen et al, 2003 Leukemia 17: 2257-2317).
  • T cells use T cell receptors (TCR) to recognize antigens and control immune responses.
  • The T cell receptor is composed of two subunits: α and β, or γ and δ.
  • CDR3β: complementarity determining region 3β
  • V: noncontiguous variable
  • D: diversity
  • J: joining
  • a published set of forty-five forward primers and thirteen reverse primers amplifies the ~200 bp recombined genomic CDR3β region for multiplex amplification of the full CDR3β complement of a sample of human peripheral blood mononuclear cells (Robins et al., 2009 Blood 114:4099-4107; Robins et al., 2010 Science Translational Med 2:47ra64).
  • the CDR3 region begins with the second conserved cysteine in the 3' region of the Vβ gene and ends with the conserved phenylalanine encoded by the 5' region of the Jβ gene (Monod et al., 2004 Bioinformatics 20:i379-i385).
  • amplified sequences can be informatically translated to locate the conserved cysteine, obtain the intervening peptide sequence, and tabulate counts of each unique clone in the sample.
  • the data sets produced by multiplexed PCR or next generation sequencing are intrinsically complex and require advanced informatic processing.
  • the current invention discloses a robust, fast, and accurate computational method for processing immune repertoire data.
  • the present invention provides a method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
  • the invention provides a computer-implemented method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
  • the invention provides a system for immune repertoire sequence identification which comprises: an alignment module wherein an unknown sequence is aligned with a plurality of immune repertoire specific reference sequences; and a measurement module that measures and compares a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown.
  • FIG. 1: General schematic / flow chart with boxes/steps of the method of using the immune repertoire specific reference sequences.
  • FIG. 2: Example of immune repertoire specific reference sequences unique to TCR V and J genes used to identify V (TRBV19) and J (TRBJ1-2) genes in an unknown sequence.
  • the V tag is used to identify the start of the CDR3β peptide. (SEQ ID NOS: 1-3)
  • FIG. 3: Another example of immune repertoire specific reference sequences unique to TCR V and J genes used to identify V (TRBV6-6) and J (TRBJ2-7) genes in an unknown sequence.
  • two immune repertoire specific reference sequences are used to identify the TRBV6-6 genes. (SEQ ID NOS: 4-6)
  • B cell refers to a type of lymphocyte that plays a large role in the humoral immune response (as opposed to the cell-mediated immune response, which is governed by T cells).
  • the principal functions of B cells are to make antibodies against antigens, perform the role of antigen-presenting cells (APCs) and eventually develop into memory B cells after activation by antigen interaction.
  • APCs: antigen-presenting cells
  • B cells are an essential component of the adaptive immune system.
  • the term "bulk sequencing" or "next generation sequencing" or "massively parallel sequencing" refers to any high throughput sequencing technology that parallelizes the DNA sequencing process. For example, bulk sequencing methods are typically capable of producing more than one million polynucleic acid amplicons in a single assay.
  • the terms "bulk sequencing," "massively parallel sequencing," and "next generation sequencing" refer only to general methods, not necessarily to the acquisition of greater than 1 million sequences in a single run.
  • Any bulk sequencing method can be implemented in the invention, such as reversible terminator chemistry (e.g., Illumina), pyrosequencing using polony emulsion droplets (e.g., Roche), ion semiconductor sequencing (IonTorrent), single molecule sequencing (e.g., Pacific Biosciences), massively parallel signature sequencing, etc.
  • cell refers to a functional basic unit of living organisms.
  • a cell includes any kind of cell (prokaryotic or eukaryotic) from a living organism. Examples include, but are not limited to, mammalian mononuclear blood cells, yeast cells, or bacterial cells.
  • gene refers to a nucleic acid sequence that can be potentially transcribed and/or translated which may include the regulatory elements in 5' and 3', and the introns, if present. Examples of genes are TRBV10-6, TRBJ2-7. See "gene" at www.imgt.org.
  • group refers to a set of genes which share the same gene type and potentially participate in the synthesis of a polypeptide of the same immunologic chain type.
  • a group includes the related pseudogenes and orphans.
  • a group is independent from the species. Groups are defined for the immunoglobulins (IG), T cell receptors (TR) and major histocompatibility complex (MHC) molecules, e.g., TRBJ, TRBV and TRBD are part of the same group. See "group" at www.imgt.org.
  • ligase chain reaction refers to a type of DNA amplification where two DNA probes are ligated by a DNA ligase, and a DNA polymerase is used to amplify the resulting ligation product.
  • Traditional PCR methods are used to amplify the ligated DNA sequence.
  • mammal as used herein includes both humans and non-humans and includes, but is not limited to, humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.
  • a heat-stable DNA polymerase, such as Taq polymerase, is used.
  • the thermal cycling steps are necessary first to physically separate the two strands in a DNA double helix at a high temperature in a process called DNA melting. At a lower temperature, each strand is then used as the template in DNA synthesis by the DNA polymerase to selectively amplify the target DNA.
  • the selectivity of PCR results from the use of primers that are complementary to the DNA region targeted for amplification under specific thermal cycling conditions.
  • RT-PCR: reverse transcriptase polymerase chain reaction
  • an RNA strand is first reverse transcribed into its DNA complement (complementary DNA or cDNA) using the enzyme reverse transcriptase, and the resulting cDNA is amplified using traditional PCR techniques.
  • subgroup refers to a set of IG or TR genes (C-gene, V-gene, D-gene or J-gene) which belong to the same group, in a given species, and which share at least 75% identity at the nucleotide level (in the germline configuration for V, D, and J), e.g., TRBV6-1 and TRBV6-2 are genes in the TRBV6 subgroup. See "subgroup" at www.imgt.org.
  • T cell refers to a type of cell that plays a central role in cell-mediated immune response.
  • T cells belong to a group of white blood cells known as lymphocytes and can be distinguished from other lymphocytes, such as B cells and natural killer T (NKT) cells by the presence of a T cell receptor (TCR) on the cell surface.
  • T cell responses are antigen specific and are activated by foreign antigens.
  • T cells are activated to proliferate and differentiate into effector cells when the foreign antigen is displayed on the surface of the antigen-presenting cells in peripheral lymphoid organs.
  • T cells recognize fragments of protein antigens that have been partly degraded inside the antigen-presenting cell.
  • There are two main classes of T cells - cytotoxic T cells and helper T cells. Effector cytotoxic T cells directly kill cells that are infected with a virus or some other intracellular pathogen. Effector helper T cells help to stimulate the responses of other cells, mainly macrophages, B cells and cytotoxic T cells.
  • the current invention is a method for analysis of repertoires of immune variable sequences, in groups such as IG and TR.
  • This invention has broad applicability in many areas of biological analysis.
  • the useful applications can include cancer diagnostics, immunology, or infectious disease diagnostics.
  • the present invention provides a method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
  • multiplexed polymerase chain reaction is used to amplify the unknown sequence.
  • massively parallel sequencing is used to sequence the unknown sequence.
  • the immune repertoire specific reference sequences are immunoglobulin IgH, immunoglobulin IgL, T cell receptor β (TCRβ) or T cell receptor α (TCRα) reference sequences.
  • the immune repertoire specific reference sequences are joining (J) gene or variable (V) gene reference sequences.
  • the conserved reference codon is a second conserved cysteine, a phenylalanine or a tryptophan codon.
  • the immune repertoire specific reference sequence or the conserved reference codon are selected from the sequences in Table 1 or the distance to the conserved reference codon is a distance selected from the distances to the reference codons in Table 1.
  • the invention provides a computer-implemented method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
  • the invention provides a system for immune repertoire sequence identification which comprises: an alignment module wherein an unknown sequence is aligned with a plurality of immune repertoire specific reference sequences; and a measurement module that measures and compares a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown.
  • Methods of the invention are applied to post-transplant immune monitoring whether autologous, allogeneic, syngeneic, or xenographic.
  • an allogeneic transplant, i.e., kidney, liver, or stem cells
  • a host's T cell response to transplants is assessed to monitor the health of the host and the graft.
  • Molecular monitoring of blood or urine is helpful to detect acute or chronic rejection before a biopsy would typically be indicated.
  • detection of alloantibodies to human leukocyte antigen (HLA) has been associated with chronic allograft rejection (Terasaki and Ozawa, 2004 American Journal of Transplantation 4:438-43).
  • molecular markers include β2-microglobulin, neopterin, and proinflammatory cytokines in urine and blood (Sabek et al., 2002 Transplantation 74:701-7; Tatapudi et al., 2004 Kidney International 65:2390; Matz et al., 2006 Kidney International 69:1683; Bestard et al., 2010 Current Opinion in Organ Transplantation 15:467-473).
  • none of these methods has become widely adopted in clinical practice, perhaps due to
  • Treg: regulatory T cells
  • Th: helper T cells
  • transplanting hematopoietic stem cells from HLA-mismatched donors into the recipient has resulted in long-term nonimmunosuppressive renal transplant tolerance up to 5 years after transplant (Kawai et al., 2008 NEJM 358:353-61).
  • Latent tuberculosis is a major global epidemic, affecting as many as 2 billion people worldwide. There is currently no reliable test for clinical diagnosis of latent TB. This technology gap has severe clinical consequences, since reactivated TB is the only reliable hallmark of latent TB. Furthermore, clinical trials for vaccines and therapies lack biomarkers for latent TB, and therefore must follow cohorts over many years to prove efficacy.
  • BCG: Bacillus Calmette-Guerin
  • tuberculosis is a facultative intracellular pathogen
  • immunity is almost entirely mediated through T cells.
  • Interferon-γ expressing T helper 1 (Th1) cells elicit primary TB response, with some involvement by T helper 2 cells (Th2).
  • Th2: T helper 2 cells
  • Treg: regulatory T cell
  • Tmem: memory T cells
  • eleven new vaccine candidates have entered clinical trials (Kaufmann, 2005 Trends in Immunology 26:660-67). These vaccines are all "post-exposure" vaccines, i.e., they target T cell responses to latent TB and are intended to prevent disease reactivation. Because of the partial failure of BCG to induce full immunity, rational design and validation of future TB vaccines should include systematic analysis of the specific immune response to both TB and the new vaccines.
  • TST: tuberculin skin test
  • T cell monitoring is used for diagnosis and monitoring of nearly any human disease.
  • diseases include, but are not limited to, systemic lupus erythematosus (SLE), allergy, autoimmune disease, heart transplants, liver transplants, bone marrow transplants, lung transplants, solid tumors, liquid tumors, myelodysplastic syndrome (MDS), chronic infection, acute infection, hepatitis, human papilloma virus (HPV), herpes simplex virus (HSV), cytomegalovirus (CMV), and human immunodeficiency virus (HIV).
  • SLE: systemic lupus erythematosus
  • MDS: myelodysplastic syndrome
  • HPV: human papilloma virus
  • HSV: herpes simplex virus
  • CMV: cytomegalovirus
  • HIV: human immunodeficiency virus
  • T cell monitoring is used for research purposes using any non-human model system, such as zebrafish, mouse, rat, or rabbit. T cell monitoring also is used for research purposes using any human model system, such as primary T cell lines or immortal T cell lines.
  • Antibody therapeutics are increasingly used by pharmaceutical companies to treat intractable diseases such as cancer (Carter 2006 Nature Reviews Immunology 6:343-357).
  • the process of antibody drug discovery is expensive and tedious, requiring the identification of an antigen, and then the isolation and production of monoclonal antibodies with activity against the antigen.
  • Individuals that have been exposed to disease produce antibodies against antigens associated with that disease.
  • Humoral memory B cells help mammalian immune systems retain certain kinds of immunity. After exposure to an antigen and expansion of antibody-producing cells, Bmem cells survive for many years and contribute to the secondary immune response upon re-introduction of an antigen. Such immunity is typically measured in a cellular or antibody-based in vitro assay. In some cases, it is beneficial to detect immunity by amplifying, linking, and detecting IgH and light chain immunoglobulin variable regions in single B cells. Such a method is more specific and sensitive than current methods. Massively parallel B cell repertoire sequencing is used to screen for Bmem cells that contain a certain heavy and light chain pairing which is indicative of immunity.
  • B cell monitoring is used for diagnosis and monitoring of nearly any human disease.
  • diseases include, but are not limited to, systemic lupus erythematosus (SLE), allergy, autoimmune disease, heart transplants, liver transplants, bone marrow transplants, lung transplants, solid tumors, liquid tumors, myelodysplastic syndrome (MDS), chronic infection, acute infection, hepatitis, human papilloma virus (HPV), herpes simplex virus (HSV), cytomegalovirus (CMV), and human immunodeficiency virus (HIV).
  • SLE: systemic lupus erythematosus
  • MDS: myelodysplastic syndrome
  • HPV: human papilloma virus
  • HSV: herpes simplex virus
  • CMV: cytomegalovirus
  • HIV: human immunodeficiency virus
  • Such monitoring could include individual diagnosis and monitoring or population monitoring for epidemiological studies.
  • B cell monitoring is also used for research purposes using any non-human model system, such as zebrafish, mouse, rat, or rabbit.
  • B cell monitoring is used for research purposes using any human model system, such as primary B cell lines or immortal B cell lines.
  • One use of the method is TCR repertoire identification. Because the TCR repertoire contains as many as 5×10⁶ clonotypes, and CDR3 regions often differ by only a few nucleotides, a sophisticated custom analysis platform is necessary just to identify the clones in the library. Turnkey fast-alignment methods such as BLAST (Altschul et al., 1990 J Mol Biol 215:403-410), BLAT (Kent 2002 Genome Research 12:656-64), and SOAP (Li et al., 2008 Bioinformatics 24:713-4) are inadequate for the task at hand, because they result in many spurious matches.
  • TRBV12-3 and TRBV12-4 are identical over 97.7% of 347 bases; TRBV6-2 and TRBV6-3 are 99.7% identical over 344 bases.
  • this invention describes a method significantly faster than any current methods and which has the same accuracy as standard alignment methods.
  • the method starts with a table of nucleotide "words", often 4-23 base words or word pairs, that uniquely identify the V and J genes of mouse or human within the amplified region.
  • the validity of each V gene match is tested by identifying the distance to and the sequence of a conserved codon, e.g., the second conserved cysteine in the case of TCRβ.
  • the match is accepted as correct only if both distance and cysteine sequence confirm the match.
  • Using data from our TCR repertoire sequencing experiments, typically ~99.98% of V-J combinations are identified unambiguously. The remaining reads are discarded.
  • the method identifies the protein coding sequence of the CDR3 region within the known reading frame for that particular gene. Some input sequences in the method may contain errors. To minimize susceptibility to errors, the uniquely identifying words are as short as possible, which reduces the probability of identifying a gene incorrectly. This method ensures speed, accuracy and lowest error rates. The method may be used readily for other variable gene families, such as TCRα, HLA, or IgH.
  • the immune repertoire specific sequences are unique only in the area of and around the CDR3 region that is amplified, but not over the entire V or J genes (which are several hundred bases long).
  • the sequences are amplified with a method similar to Robins et al. Blood 114:4099-4107: "The Vbeta forward primer is anchored at position -43 in the Vbeta segment, relative to the recombination signal sequence.
  • the Jbeta reverse primers were designed to be anchored at their 3' ends on a consensus splice site motif.”
  • the optimized words may be longer than 4-8 bases.
  • the J-words are 4-6 bases long; the V-words come in singles and pairs; the shortest single is 5 bases long, the longest single is 15 bases long; the shortest pair is 19 bases and the longest pair is 23 bases long.
  • Table 1 below provides a complete set of immune specific reference sequences ("words") for human TCRβ and exemplary words for human TCRα and human IgH.
  • V-words range from 6-13 bases with only one pair of 10 bases
  • J-words range from 4-6 bases.
  • The cost of a Smith-Waterman (SW) alignment of two sequences of lengths m and n is O(nm).
  • SW aligns 76 base reads against 13 J-genes of median length of 21 at 20,748 time units per read.
  • the method described herein aligns 6 bases against 13 J-words of length 4 at 312 time units per read.
  • the method described herein aligns 43 bases against 50 V-words of median length 9 at 19,350 time units per read.
  • the J gene alignment is 66.5x faster than SW and the V gene 7.5x faster than SW.
  • the total processing cost is 165,148 time units per read for SW and 19,662 time units per read for the method described herein, making it 8.4x faster than SW.
  • one has to have the shortest possible words, while maintaining the greatest possible difference between them.
  • TABLE 1 lists exemplary sets of (i) gene names, (ii) the immune repertoire specific sequences, (iii) the nucleotide sequence for the conserved codons, and (iv) the positive (+) or negative (-) distances to the conserved codon.
  • two immune repertoire specific sequences, which are separated by a space, are preferred for use in the methods described herein, e.g., TRBV4-2.
  • the deposited sequence listing also provides complete sets of sequences for human and murine TCRβ, TCRα, and IgH J and V genes (SEQ ID NOS: 41-595). See Table 2 below for the mapping of the SEQ ID NO to the sequence names.
  • One of ordinary skill could readily obtain such sequences from databases such as RefSeq (http://www.ncbi.nlm.nih.gov/gene/), the international ImMunoGeneTics information system® (http://www.imgt.org/), EMBL Nucleotide Sequence Database VBASE2 (http://www.vbase2.org/), or MRC Centre for Protein Engineering V BASE (http://vbase.mrc-cpe.cam.ac.uk/).
  • the computer-implemented method or system may be configured in either hardware, software, or both based on the types of applications needed and the hardware available.
  • Hardware examples of implementation include hardware implemented ASIC ("Application Specific Integrated Circuit"), SOC ("System on a Chip”), RISC ("Reduced Instruction Set Computing”) processor, general processor, DSP ("Digital Signal Processor”), etc.
  • software comprises an ordered listing of executable instructions for implementing logical functions, and may selectively be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that may selectively fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
  • a "computer-readable medium” is any means that may contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the computer-readable medium may selectively be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific (yet a non-exhaustive list of) examples of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a RAM (electronic), a read-only memory "ROM” (electronic), an erasable programmable read-only memory (EPROM or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory "CDROM” (optical).
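
The word-matching steps described in the definitions above (find a word in the read, measure the distance to a conserved codon, and accept the match only if both the distance and the codon sequence agree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the gene names, word sequences, codons, and distances below are hypothetical placeholders and are not taken from Table 1, and `Word` and `identify` are names invented for this sketch. Only exact word matches are considered.

```python
from typing import List, NamedTuple, Optional, Tuple


class Word(NamedTuple):
    gene: str      # gene (or subgroup) name the word is meant to identify
    word: str      # short nucleotide "word"
    codon: str     # expected sequence of the conserved reference codon
    distance: int  # signed offset from the word start to the codon start


# Hypothetical entries for illustration only; NOT values from Table 1.
V_WORDS = [
    Word("V_EXAMPLE_A", "GGATCC", "TGT", 9),
    Word("V_EXAMPLE_B", "CTTAAG", "TGC", 12),
]


def identify(read: str, words: List[Word]) -> Optional[Tuple[str, int]]:
    """Return (gene, codon_start) if exactly one gene is confirmed.

    A word occurrence counts only when the conserved codon is found at
    the expected distance. Reads with no confirmed gene, or with more
    than one confirmed gene, return None (i.e. they are discarded).
    """
    hits = []
    for w in words:
        pos = read.find(w.word)
        while pos != -1:
            codon_start = pos + w.distance
            if codon_start >= 0 and read[codon_start:codon_start + 3] == w.codon:
                hits.append((w.gene, codon_start))
            pos = read.find(w.word, pos + 1)
    if len({gene for gene, _ in hits}) != 1:
        return None
    return hits[0]


if __name__ == "__main__":
    # Word at index 2, conserved codon TGT at index 2 + 9 = 11.
    print(identify("AAGGATCCACGTGTGCCAGC", V_WORDS))
    # Same word, but no cysteine codon at the expected distance.
    print(identify("AAGGATCCACGAAAGCCAGC", V_WORDS))
```

In this sketch the returned codon position marks where translation of the CDR3 region would begin; handling of word pairs and of J-gene words would follow the same pattern.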

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Cell Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The immune system responds to disease by inducing cellular responses. Methods for sequencing and analysis of immune repertoires can be used to develop noninvasive diagnostics, high-value diagnostics that inform treatment regimens, and novel therapeutic agents. However, immune repertoires have up to 10⁶ diversity, so computer analysis can be slow and error prone. In this invention, we identify unique immune repertoire reference sequences ("words") that unambiguously identify genes and/or subgroups of genes, and then use these words to analyze multiplexed immune repertoire data.

Description

METHOD FOR ANALYSIS OF IMMUNE VARIABLE SEQUENCES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Appn. No. 61/588,878 filed January 20, 2012, which is hereby incorporated by reference in its entirety.
1. FIELD OF THE INVENTION
[0002] This invention relates generally to immune repertoire specific reference sequences ("words") that unambiguously identify genes and/or subgroups of genes, and the use of these words to analyze multiplexed immune repertoire data.
2. BACKGROUND OF THE INVENTION
2.1. Introduction
[0003] Immune systems are comprised of a huge diversity of immune cells, such as T cells and B cells. Immune cell repertoires are comprised of millions of clones, which produce proteins that enable each cell to specifically recognize a single antigen. When the cells recognize that antigen, they produce an immune response. Genetic analysis of millions of immune cells is useful in medicine and research, in part because components of an individual's immune system are indicative of health. Dysregulation of the immune system is responsible for a variety of disorders including autoimmune diseases such as Crohn's disease, juvenile diabetes (Type 1 diabetes, T1D), multiple sclerosis, rheumatoid arthritis, and systemic lupus erythematosus (SLE). Immune monitoring is useful to better understand cancer, immunotherapy, and immune-competence. In addition, detailed analysis of the immune system can determine appropriate donors for organ transplants and monitor for signs of graft versus host disease (GVHD).
[0004] Antibodies are produced by recombined genomic immunoglobulin (Ig) sequences in B lineage cells. Immunoglobulin light chains are derived from either κ or λ genes. The λ genes are comprised of four constant (C) genes and approximately thirty variable (V) genes. In contrast, the κ genes are comprised of one C gene and 250 V genes. The heavy chain gene family is comprised of several hundred V genes, fifteen D genes, and four joining (J) genes. Somatic recombination during B cell differentiation randomly chooses one V-D-J combination in the heavy chain and one V-J combination in either the κ or λ light chain. Because there are so many genes, millions of unique combinations are possible. The V genes also undergo somatic hypermutation after recombination, generating further diversity. Despite this underlying complexity, it is possible to use dozens of primers targeting conserved sequences to sequence the full heavy and light chain complement in several multiplexed reactions (van Dongen et al., 2003 Leukemia 17:2257-2317).
[0005] T cells use T cell receptors (TCR) to recognize antigens and control immune responses. The T cell receptor is composed of two subunits: α and β or γ and δ. Much of the peptide variability of the TCR is encoded in complementarity determining region 3β (CDR3β), which is formed by recombination between noncontiguous variable (V), diversity (D), and joining (J) genes in the β chain loci (Wang et al., 2010 PNAS 107:1518-23). A published set of forty-five forward primers and thirteen reverse primers amplifies the ~200bp recombined genomic CDR3β region for multiplex amplification of the full CDR3β complement of a sample of human peripheral blood mononuclear cells (Robins et al., 2009 Blood 114:4099-4107; Robins et al., 2010 Science Translational Med 2:47ra64). The CDR3 region begins with the second conserved cysteine in the 3' region of the Vβ gene and ends with the conserved phenylalanine encoded by the 5' region of the Jβ gene (Monod et al., 2004 Bioinformatics 20:i379-i385). Thus, amplified sequences can be informatically translated to locate the conserved cysteine, obtain the intervening peptide sequence, and tabulate counts of each unique clone in the sample.
[0006] Several published patent applications disclose molecular methods for multiplexed immune repertoire analysis by PCR and deep sequencing. Han (WO 2009/137255) describes a protocol and primer system for amplification of immune repertoires. Lim et al. (WO 2005/059176) also describe a very similar multiplexed method. Faham & Willis (WO 2010/053587) describe a molecular system and method for multiplexed molecular analysis of immune repertoires that is similar to Han and Lim. However, none of these disclosures describes a computational method for the rapid and accurate processing of immune repertoire data.
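The final tabulation step described above can be sketched as follows. This is a minimal illustration only: the function name `tabulate_clones` and the example peptides are hypothetical, and the input is assumed to be a list of CDR3 peptide strings that have already been extracted, one per read.

```python
from collections import Counter

def tabulate_clones(cdr3_peptides):
    """Count reads per unique CDR3 peptide and report each clone's frequency.

    Returns a list of (peptide, count, fraction_of_reads), most abundant first.
    """
    counts = Counter(cdr3_peptides)
    total = sum(counts.values())
    return [(pep, n, n / total) for pep, n in counts.most_common()]

# Illustrative input: three reads, two of which carry the same CDR3 peptide.
example_reads = ["CASSF", "CASRF", "CASSF"]
for peptide, count, fraction in tabulate_clones(example_reads):
    print(peptide, count, round(fraction, 3))
```

Counting on the peptide string rather than the nucleotide read is one possible design choice; reads that differ only by synonymous codons are then merged into a single clone.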
[0007] The data sets produced by multiplexed PCR or next generation sequencing are intrinsically complex and require advanced informatic processing. The current invention discloses a robust, fast, and accurate computational method for processing immune repertoire data.
3. SUMMARY OF THE INVENTION
[0008] In particular non-limiting embodiments, the present invention provides a method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
[0009] In addition, the invention provides a computer-implemented method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
[0010] Furthermore, the invention provides a system for immune repertoire sequence identification which comprises: an alignment module wherein an unknown sequence is aligned with a plurality of immune repertoire specific reference sequences; and a measurement module that measures and compares a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown.
4. BRIEF DESCRIPTION OF THE FIGURES
[0011] Figure 1. General schematic / flow chart with boxes/steps of the method of using the immune repertoire specific reference sequences.
[0012] Figure 2. Example of immune repertoire specific reference sequences unique to TCR V and J genes used to identify V (TRBV19) and J (TRBJ1-2) genes in an unknown sequence. The V tag is used to identify the start of the CDR3β peptide. (SEQ ID NOS:1-3)
[0013] Figure 3. Another example of immune repertoire specific reference sequences unique to TCR V and J genes used to identify V (TRBV6-6) and J (TRBJ2-7) genes in an unknown sequence. Here, two immune repertoire specific reference sequences are used to identify the TRBV6-6 genes. (SEQ ID NOS:4-6)
5. DETAILED DESCRIPTION OF THE INVENTION
5.1. Definitions
[0014] Terms used in the claims and specification are defined as set forth below unless otherwise specified.
[0015] The term "B cell" refers to a type of lymphocyte that plays a large role in the humoral immune response (as opposed to the cell-mediated immune response, which is governed by T cells). The principal functions of B cells are to make antibodies against antigens, perform the role of antigen-presenting cells (APCs) and eventually develop into memory B cells after activation by antigen interaction. B cells are an essential component of the adaptive immune system.
[0016] The term "bulk sequencing" or "next generation sequencing" or "massively parallel sequencing" refers to any high throughput sequencing technology that parallelizes the DNA sequencing process. For example, bulk sequencing methods are typically capable of producing more than one million polynucleic acid amplicons in a single assay. The terms "bulk sequencing," "massively parallel sequencing," and "next generation sequencing" refer only to general methods, not necessarily to the acquisition of greater than 1 million sequences in a single run. Any bulk sequencing method can be implemented in the invention, such as reversible terminator chemistry (e.g., Illumina), pyrosequencing using polony emulsion droplets (e.g., Roche), ion semiconductor sequencing (e.g., IonTorrent), single molecule sequencing (e.g., Pacific Biosciences), massively parallel signature sequencing, etc.
[0017] The term "cell" refers to a functional basic unit of living organisms. A cell includes any kind of cell (prokaryotic or eukaryotic) from a living organism. Examples include, but are not limited to, mammalian mononuclear blood cells, yeast cells, or bacterial cells.
[0018] The term "gene" refers to a nucleic acid sequence that can be potentially transcribed and/or translated which may include the regulatory elements in 5' and 3', and the introns, if present. Examples of genes are TRBV10-6, TRBJ2-7. See "gene" at www.imgt.org.
[0019] The term "group" refers to a set of genes which share the same gene type and potentially participate in the synthesis of a polypeptide of the same immunologic chain type. By extension, a group includes the related pseudogenes and orphans. A group is independent from the species. Groups are defined for the immunoglobulins (IG), T cell receptors (TR) and major histocompatibility complex (MHC) molecules, e.g., TRBJ, TRBV and TRBD are part of the same group. See "group" at www.imgt.org.
[0020] The term "ligase chain reaction" or LCR refers to a type of DNA amplification where two DNA probes are ligated by a DNA ligase, and a DNA polymerase is used to amplify the resulting ligation product. Traditional PCR methods are used to amplify the ligated DNA sequence.
[0021] The term "mammal" as used herein includes both humans and non-humans, including but not limited to humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.
[0022] The term "polymerase chain reaction" or PCR refers to a molecular biology technique for amplifying a DNA sequence from a single copy to several orders of magnitude (thousands to millions of copies). PCR relies on thermal cycling, which requires cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA. Primers (short DNA fragments, or oligonucleotides) containing sequences complementary to the target region of the DNA sequence and a DNA polymerase are key components to enable selective and repeated amplification. As PCR progresses, the DNA generated is itself used as a template for replication, setting in motion a chain reaction in which the DNA template is exponentially amplified. A heat-stable DNA polymerase, such as Taq polymerase, is used. The thermal cycling steps are necessary first to physically separate the two strands in a DNA double helix at a high temperature in a process called DNA melting. At a lower temperature, each strand is then used as the template in DNA synthesis by the DNA polymerase to selectively amplify the target DNA. The selectivity of PCR results from the use of primers that are complementary to the DNA region targeted for amplification under specific thermal cycling conditions.
[0023] The term "reverse transcriptase polymerase chain reaction" or RT-PCR refers to a type of PCR reaction used to generate multiple copies of a DNA sequence. In RT-PCR, an RNA strand is first reverse transcribed into its DNA complement (complementary DNA or cDNA) using the enzyme reverse transcriptase, and the resulting cDNA is amplified using traditional PCR techniques.
[0024] The term "subgroup" refers to a set of IG or TR genes (C-gene, V-gene, D-gene or J-gene) which belong to the same group, in a given species, and which share at least 75% identity at the nucleotide level (in the germline configuration for V, D, and J), e.g., TRBV6-1 and TRBV6-2 are genes in the TRBV6 subgroup. See "subgroup" at www.imgt.org.
[0025] The term "T cell" refers to a type of cell that plays a central role in cell-mediated immune response. T cells belong to a group of white blood cells known as lymphocytes and can be distinguished from other lymphocytes, such as B cells and natural killer T (NKT) cells, by the presence of a T cell receptor (TCR) on the cell surface. T cell responses are antigen specific and are activated by foreign antigens. T cells are activated to proliferate and differentiate into effector cells when the foreign antigen is displayed on the surface of the antigen-presenting cells in peripheral lymphoid organs. T cells recognize fragments of protein antigens that have been partly degraded inside the antigen-presenting cell. There are two main classes of T cells - cytotoxic T cells and helper T cells. Effector cytotoxic T cells directly kill cells that are infected with a virus or some other intracellular pathogen. Effector helper T cells help to stimulate the responses of other cells, mainly macrophages, B cells and cytotoxic T cells.
[0026] It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.
5.2. GENERAL METHODS
[0027] In one embodiment, the current invention is a method for analysis of repertoires of immune variable sequences, in groups such as IG and TR. This invention has broad applicability in many areas of biological analysis. The useful applications can include cancer diagnostics, immunology, or infectious disease diagnostics.
[0028] In particular non-limiting embodiments, the present invention provides a method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
[0029] In one embodiment, multiplexed polymerase chain reaction is used to amplify the unknown sequence. In another embodiment, massively parallel sequencing is used to sequence the unknown sequence. In yet another embodiment, the immune repertoire specific reference sequences are immunoglobulin IgH, immunoglobulin IgL, T cell receptor β (TCRβ) or T cell receptor α (TCRα) reference sequences.
[0030] In one embodiment, the immune repertoire specific reference sequences are joining (J) gene or variable (V) gene reference sequences. In some embodiments, the conserved reference codon is a second conserved cysteine, a phenylalanine or a tryptophan codon. In other embodiments, the immune repertoire specific reference sequence or the conserved reference codon are selected from the sequences in Table 1 or the distance to the conserved reference codon is a distance selected from the distances to the reference codons in Table 1.
[0031] In addition, the invention provides a computer-implemented method for immune repertoire sequence identification which comprises: comparing an unknown sequence to a plurality of immune repertoire specific reference sequences; if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
[0032] Furthermore, the invention provides a system for immune repertoire sequence identification which comprises: an alignment module wherein an unknown sequence is aligned with a plurality of immune repertoire specific reference sequences; and a measurement module that measures and compares a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown.
[0033]
5.3. USE OF THE METHODS
[0034] Methods of the invention are applied to post-transplant immune monitoring whether autologous, allogeneic, syngeneic, or xenographic. After an allogeneic transplant (i.e., kidney, liver, or stem cells), a host's T cell responses to the transplant are assessed to monitor the health of the host and the graft. Molecular monitoring of blood or urine is helpful to detect acute or chronic rejection before a biopsy would typically be indicated. For example, detection of alloantibodies to human leukocyte antigen (HLA) has been associated with chronic allograft rejection (Terasaki and Ozawa, 2004 American Journal of Transplantation 4:438-43). Other molecular markers include β2-microglobulin, neopterin, and proinflammatory cytokines in urine and blood (Sabek et al., 2002 Transplantation 74:701-7; Tatapudi et al., 2004 Kidney International 65:2390; Matz et al., 2006 Kidney International 69:1683; Bestard et al., 2010 Current Opinion in Organ Transplantation 15:467-473). However, none of these methods has become widely adopted in clinical practice, perhaps due to low specificity and sensitivity. Prior work has shown that regulatory T cells (Treg) induce graft tolerance by down-regulating helper T cells (Th) (Graca et al., 2002 Journal of Experimental Medicine 195:1641). Additionally, transplanting hematopoietic stem cells from HLA-mismatched donors into the recipient has resulted in long-term nonimmunosuppressive renal transplant tolerance up to 5 years after transplant (Kawai et al., 2008 NEJM 358:353-61).
5.4. T Cell Analysis and Latent Tuberculosis Diagnosis
[0035] Latent tuberculosis (TB) is a major global epidemic, affecting as many as 2 billion people worldwide. There is currently no reliable test for clinical diagnosis of latent TB. This technology gap has severe clinical consequences, since reactivated TB is the only reliable hallmark of latent TB. Furthermore, clinical trials for vaccines and therapies lack biomarkers for latent TB, and therefore must follow cohorts over many years to prove efficacy.
[0036] The major current vaccine for tuberculosis, bacillus Calmette-Guérin (BCG), is an unreliable prophylactic. In a meta-analysis of dozens of epidemiological studies, the overall effect of BCG was 50% against TB infections, 78% against pulmonary TB, 64% against TB meningitis, and 71% against death due to TB infection (Colditz et al., 1994 JAMA 271:698-702). Additionally, the rapid rise in multidrug resistant TB has increased the need for new vaccine and immunotherapy approaches. Up to 90% of infected, immunocompetent individuals never progress to disease, resulting in the huge global latent TB reservoir (Kaufmann, 2005 Trends in Immunology 26:660-67).
[0037] Since tuberculosis is a facultative intracellular pathogen, immunity is almost entirely mediated through T cells. Interferon-γ expressing T helper 1 (Th1) cells elicit primary TB response, with some involvement by T helper 2 cells (Th2). After primary response, the bacteria become latent, controlled by regulatory T cells (Treg) and memory T cells (Tmem). Recently, eleven new vaccine candidates have entered clinical trials (Kaufmann, 2005 Trends in Immunology 26:660-67). These vaccines are all "post-exposure" vaccines, i.e., they target T cell responses to latent TB and are intended to prevent disease reactivation. Because of the partial failure of BCG to induce full immunity, rational design and validation of future TB vaccines should include systematic analysis of the specific immune response to both TB and the new vaccines.
[0038] For decades, the standard of care for diagnosis of latent tuberculosis has been the tuberculin skin test (TST) (Pai et al., 2004 Lancet Infectious Disease 4:761-76). More recently, two commercial in vitro interferon-γ assays have been developed: the QuantiFERON-TB assay and the T-SPOT.TB assay. These assays measure cell-mediated immunity by quantifying interferon-γ released from T cells when challenged with a cocktail of tuberculosis antigens. Unfortunately, neither the TST nor the newer interferon-γ tests is effective at distinguishing latent TB from cleared TB (Diel et al., 2007 American Journal of Respir Crit Care Med 177:1164-70). This is a significant problem because patients without clinical evidence of latent TB (i.e., visualization of granulomas) but with positive TST or interferon-γ test typically receive 6-9 months of isoniazid therapy, even though this empiric intervention is unnecessary in patients who have cleared primary infection and can cause serious complications such as liver failure.
[0039] Prior work has demonstrated that T cell responses are used to distinguish latent from active TB (Schuck et al., 2009 PLoS One 4:e5590). The premise of this prior work is that immune cells directed against TB antigens will be expanded in the memory T cell population if the TB is latent, but expanded in a helper T cell fraction if the TB is active. Functional T cell sequencing is used to distinguish latent TB from cleared TB.
5.5. T Cell Analysis and Diagnosing or Monitoring Disease
[0040] Similarly, functional T cell monitoring is used for diagnosis and monitoring of nearly any human disease. These diseases include, but are not limited to, systemic lupus erythematosus (SLE), allergy, autoimmune disease, heart transplants, liver transplants, bone marrow transplants, lung transplants, solid tumors, liquid tumors, myelodysplastic syndrome (MDS), chronic infection, acute infection, hepatitis, human papilloma virus (HPV), herpes simplex virus (HSV), cytomegalovirus (CMV), and human immunodeficiency virus (HIV). Such monitoring includes individual diagnosis and monitoring or population monitoring for epidemiological studies.
[0041] T cell monitoring is used for research purposes using any non-human model system, such as zebrafish, mouse, rat, or rabbit. T cell monitoring also is used for research purposes using any human model system, such as primary T cell lines or immortal T cell lines.
5.6. B Cell Analysis and Drug Discovery
[0042] Antibody therapeutics are increasingly used by pharmaceutical companies to treat intractable diseases such as cancer (Carter 2006 Nature Reviews Immunology 6:343-357). However, the process of antibody drug discovery is expensive and tedious, requiring the identification of an antigen, and then the isolation and production of monoclonal antibodies with activity against the antigen. Individuals that have been exposed to disease produce antibodies against antigens associated with that disease. Thus, it is possible to mine patient immune repertoires for specific antibodies that could be used for pharmaceutical development.
5.7. B Cell Analysis and Monitoring Immunity
[0043] Humoral memory B cells (Bmem) help mammalian immune systems retain certain kinds of immunity. After exposure to an antigen and expansion of antibody-producing cells, Bmem cells survive for many years and contribute to the secondary immune response upon re-introduction of an antigen. Such immunity is typically measured in a cellular or antibody-based in vitro assay. In some cases, it is beneficial to detect immunity by amplifying, linking, and detecting IgH and light chain immunoglobulin variable regions in single B cells. Such a method is more specific and sensitive than current methods. Massively parallel B cell repertoire sequencing is used to screen for Bmem cells that contain a certain heavy and light chain pairing which is indicative of immunity.
5.8. B Cell Analysis and Diagnosing and Monitoring Disease
[0044] B cell monitoring is used for diagnosis and monitoring of nearly any human disease. These diseases include, but are not limited to, systemic lupus erythematosus (SLE), allergy, autoimmune disease, heart transplants, liver transplants, bone marrow transplants, lung transplants, solid tumors, liquid tumors, myelodysplastic syndrome (MDS), chronic infection, acute infection, hepatitis, human papilloma virus (HPV), herpes simplex virus (HSV), cytomegalovirus (CMV), and human immunodeficiency virus (HIV). Such monitoring could include individual diagnosis and monitoring or population monitoring for epidemiological studies.
[0045] B cell monitoring is also used for research purposes using any non-human model system, such as zebrafish, mouse, rat, or rabbit. B cell monitoring is used for research purposes using any human model system, such as primary B cell lines or immortal B cell lines.
[0046] The following Examples further illustrate the invention and are not intended to limit the scope of the invention.
6. EXAMPLES
[0047] Here is a method for TCR repertoire identification. Because the TCR repertoire contains as many as 5x10^6 clonotypes, and CDR3 regions often differ by only a few nucleotides, a sophisticated custom analysis platform is necessary just to identify the clones in the library. Turnkey fast-alignment methods such as BLAST (Altschul et al., 1990 J Mol Biol 215:403-410), BLAT (Kent 2002 Genome Research 12:656-64), and SOAP (Li et al., 2008, Bioinformatics 24:713-4) are inadequate for the task at hand, because they result in many spurious matches. Moreover, highly accurate turnkey methods such as the Smith-Waterman Algorithm (Smith and Waterman, 1981, Journal of Molecular Biology 147:195-197) are cumbersomely slow for this kind of analysis. Corbett et al. report Germline Query (GQ), a program that uses BLAST and sequential queries for V genes, J genes, and D genes (Corbett et al., 1997, Journal of Molecular Biology 270:587-597). The National Center for Biotechnology Information (NCBI) has a BLAST tool for immune sequences, IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/). Finally, all of these methods would require a huge reference library (10^15 diversity) of all possible CDR3 nucleotide sequences, which is a computational burden.
[0048] Specifically, common sequence aligners such as BLAST/BLAT, bwa (Burrows-Wheeler transform, Li and Durbin 2009 Bioinformatics, 25, 1754-1760), and SOAP allow sequences to contain errors while aligning them to a reference. While this is an advantage when aligning a large number of reads (for example in evolutionary biology), this feature is undesirable for immune repertoire analysis.
[0049] The V and J gene sequences are very similar to each other within their group, often showing as little as one base difference in the CDR3 region. For example, TRBV12-3 and TRBV12-4 are identical over 97.7% of 347 bases; TRBV6-2 and TRBV6-3 are 99.7% identical over 344 bases. For repertoire analysis, one needs exact identification of the genes or subgroups of genes and cannot tolerate errors in the alignment.
[0050] To address these problems, this invention describes a method that is significantly faster than current methods and has the same accuracy as standard alignment methods. The method starts with a table of nucleotide "words", often 4-23 base words or word pairs, that uniquely identify the V and J genes of mouse or human within the amplified region. Next, the validity of each V gene match is tested by identifying the distance to and the sequence of a conserved codon, e.g., the second conserved cysteine in the case of TCRβ. The match is accepted as correct only if both distance and cysteine sequence confirm the match. Using data from our TCR repertoire sequencing experiments, typically ~99.98% of Vβ-Jβ combinations are identified unambiguously. The remaining reads are discarded.
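The word lookup and conserved-codon check can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the inventors: the gene names, words, codons, and distances in `WORDS` are invented placeholders (they are not entries from Table 1), the distance is assumed to be measured from the first base of the word to the first base of the conserved codon, and reads are assumed to be upper-case strings on the same strand as the words.

```python
from typing import NamedTuple, Optional, Tuple

class Word(NamedTuple):
    gene: str      # gene or subgroup the word identifies
    word: str      # short identifying sequence
    codon: str     # expected conserved codon, e.g. TGT or TGC for cysteine
    distance: int  # offset from word start to codon start (may be negative)

# Hypothetical entries for illustration only.
WORDS = [
    Word("V_EXAMPLE_A", "GGACTC", "TGT", 12),
    Word("V_EXAMPLE_B", "CAGTTA", "TGC", 9),
]

def identify_v(read: str, words=WORDS) -> Optional[Tuple[str, int]]:
    """Return (gene, codon_start) if exactly one word is confirmed by its codon.

    A word hit counts only when the expected conserved codon is found at the
    expected distance; reads with zero or several confirmed hits are discarded.
    """
    confirmed = set()
    for w in words:
        pos = read.find(w.word)
        while pos != -1:
            start = pos + w.distance
            # Guard against negative indices, which Python would wrap around.
            if start >= 0 and read[start:start + 3] == w.codon:
                confirmed.add((w.gene, start))
            pos = read.find(w.word, pos + 1)
    return next(iter(confirmed)) if len(confirmed) == 1 else None

print(identify_v("AAGGACTCACGTACTGTGCCAGC"))
```

Because every comparison is an exact string match, a read carrying a sequencing error inside the word or the codon is simply not identified, which is the behavior the text asks for.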
[0051] In non-limiting embodiments there are two optional further quality control steps: (i) requiring that the CDR3 region must not contain any sequencing errors in the form of uncalled bases; or (ii) requiring that the CDR3 region is in frame as defined by the second conserved cysteine. In one embodiment, only if all quality tests are passed, the method identifies the protein coding sequence of the CDR3 region within the known reading frame for that particular gene. Some input sequences in the method may contain errors. To minimize our susceptibility to errors, the uniquely identifying words are as short as possible, therefore reducing the probability of identifying a gene incorrectly. This method ensures speed, accuracy and lowest error rates. The method may be used readily for other variable gene families, such as TCRα, HLA, or IgH.
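The two optional quality control steps and the subsequent translation can be sketched as follows. This is a hedged illustration: the function name `cdr3_peptide` is hypothetical, uncalled bases are assumed to be written as `N`, the positions of the conserved cysteine and phenylalanine codons are assumed to be known already, and the CDR3 is taken to include both bounding codons. The codon table is the standard genetic code.

```python
from typing import Optional

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def cdr3_peptide(read: str, cys_start: int, phe_start: int) -> Optional[str]:
    """Translate the CDR3 from the cysteine codon through the phenylalanine codon.

    Returns None if the region is truncated, contains an uncalled base (N),
    or is out of frame relative to the conserved cysteine.
    """
    if cys_start < 0 or phe_start < cys_start or phe_start + 3 > len(read):
        return None  # region not fully contained in the read
    region = read[cys_start:phe_start + 3]
    if "N" in region:
        return None  # QC (i): no uncalled bases
    if (phe_start - cys_start) % 3 != 0:
        return None  # QC (ii): must be in frame with the cysteine
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, len(region), 3))

print(cdr3_peptide("TGTGCCAGCAGTTTC", 0, 12))
```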
[0052] The immune repertoire specific sequences, or "words", are unique only in the area of and around the CDR3 region that is amplified, but not over the entire V or J genes (which are several hundred bases long). In one embodiment, the sequences are amplified with a method similar to Robins et al. Blood 114:4099-4107: "The Vbeta forward primer is anchored at position -43 in the Vbeta segment, relative to the recombination signal sequence. [...] The Jbeta reverse primers were designed to be anchored at their 3' ends on a consensus splice site motif."
[0053] The optimized words may be longer than 4-8 bases. For human, the J-words are 4-6 bases long; the V-words come in singles and pairs; the shortest single is 5 bases long, the longest single is 15 bases long; the shortest pair is 19 bases and the longest pair is 23 bases long. Table 1 below provides a complete set of immune specific reference sequences ("words") for human TCRβ and exemplary words for human TCRα and human IgH.
[0054] For mouse, the V-words range from 6-13 bases with only one pair of 10 bases; the J-words range from 4-6 bases.
[0055] Words were optimized to minimize misidentification across genes, minimize their length (hence pairs and not one long one), and maximize "Hamming distance" to other words (Hamming, 1950, Bell System Tech. J. 29: 147-160). The human V-words have to be longer since there are large families (e.g. 6, 12) of V genes with very similar sequences. There are 3 pairs of V-genes that cannot be uniquely identified using this method, because their V and J genes are identical within the CDR3 region: TRBV12-3 and TRBV12-4, TRBV6-2 and TRBV6-3, and TRBV6-5 and TRBV6-6. In agreement with standard practice the method attributes the occurrence of either sequence to only TRBV12-4, TRBV6-3, and TRBV6-6.
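The Hamming distance criterion can be illustrated with a short sketch. The helper names and the example words are hypothetical, and comparing only words of equal length is a simplifying assumption made here, since Hamming distance is defined for equal-length strings; the source does not say how words of different lengths were compared.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_hamming(words):
    """Smallest Hamming distance among equal-length word pairs, or None."""
    distances = [
        hamming(a, b) for a, b in combinations(words, 2) if len(a) == len(b)
    ]
    return min(distances) if distances else None

# Illustrative words only; a larger minimum means a single sequencing error
# is less likely to turn one word into another.
print(min_pairwise_hamming(["ACGT", "ACGA", "TTTT"]))
```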
[0056] The computational complexity for Smith-Waterman (SW) alignment of two sequences of length m and n is O(nm). SW aligns 76 base reads against 13 J-genes of median length of 21 at 20,748 time units per read. SW aligns 76 base reads against 50 V-genes of median length of 38 at 144,400 time units per read. The method described herein aligns 6 bases against 13 J-words of length 4 at 312 time units per read. The method described herein aligns 43 bases against 50 V-words of median length 9 at 19,350 time units per read.
[0057] Therefore, the J gene alignment is 66.5x faster than SW and the V gene 7.5x faster than SW.
[0058] The total processing cost is 165,148 time units per read for SW and 19,662 time units per read for the method described herein, making it 8.4x faster than SW. Preferably, one has to have the shortest possible words, while maintaining the greatest possible difference between them.
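The figures in paragraphs [0056] to [0058] can be reproduced by multiplying query length, reference length, and number of references. The sketch below assumes that this simple product is the cost model behind the quoted "time units"; the function name `alignment_cost` is introduced here for illustration.

```python
def alignment_cost(query_len: int, ref_len: int, n_refs: int) -> int:
    """Cost of comparing one query against n_refs references, O(n*m) each."""
    return query_len * ref_len * n_refs

# Smith-Waterman: 76-base reads against full J and V gene segments.
sw_j = alignment_cost(76, 21, 13)
sw_v = alignment_cost(76, 38, 50)

# Word method: a short window of the read against short words.
word_j = alignment_cost(6, 4, 13)
word_v = alignment_cost(43, 9, 50)

sw_total = sw_j + sw_v
word_total = word_j + word_v

print("J:", sw_j, word_j, round(sw_j / word_j, 1))
print("V:", sw_v, word_v, round(sw_v / word_v, 1))
print("total:", sw_total, word_total, round(sw_total / word_total, 1))
```

Under this model most of the saving on the J side comes from searching a 6-base window instead of the whole 76-base read, while the V side is limited by the 43-base window that must be scanned.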
[0059] TABLE 1 lists exemplary sets of (i) gene names, (ii) the immune repertoire specific sequences, (iii) the nucleotide sequence for the conserved codons, and (iv) the positive (+) or negative (-) distances to the conserved codon. For some genes, e.g., TRBV4-2, two immune repertoire specific sequences are preferred for use in the methods described herein; these are listed separated by a space. One of ordinary skill in the art will recognize that one readily could use the complementary sequence in the methods disclosed herein.
TABLE 1
[Table 1 is presented as four page images in the published application: imgf000015_0001, imgf000016_0001, imgf000017_0001, imgf000018_0001.]
[0060] The deposited sequence listing also provides complete sets of sequences for human and murine TCRβ, TCRα, and IgH J and V genes (SEQ ID NOS: 41-595). See Table 2 below for the mapping of the SEQ ID NO to the sequence names. One of ordinary skill could readily obtain such sequences from databases such as RefSeq (http://www.ncbi.nlm.nih.gov/gene/), the international ImMunoGeneTics information system® (http://www.imgt.org/), EMBL Nucleotide Sequence Database VBASE2 (http://www.vbase2.org/), or MRC Centre for Protein Engineering V BASE (http://vbase.mrc-cpe.cam.ac.uk/).
6.1. COMPUTER IMPLEMENTED METHODS
[0061] The computer-implemented method or system may be configured in either hardware, software, or both based on the types of applications needed and the hardware available. Hardware examples of implementation include hardware implemented ASIC ("Application Specific Integrated Circuit"), SOC ("System on a Chip"), RISC ("Reduced Instruction Set Computing") processor, general processor, DSP ("Digital Signal Processor"), etc.
[0062] The various implementations of the subject matter disclosed herein may be implemented in hardware, software, or both. In the present context, software comprises an ordered listing of executable instructions for implementing logical functions, and may selectively be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that may selectively fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a "computer-readable medium" is any means that may contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium may selectively be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific (yet a non-exhaustive list of) examples of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a RAM (electronic), a read-only memory "ROM" (electronic), an erasable programmable read-only memory (EPROM or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory "CDROM" (optical).
[0063] While in the foregoing detailed description this invention has been described in relation to certain preferred embodiments thereof, and many details have been set forth for purposes of illustration, it will be apparent to those skilled in the art that the invention is susceptible to additional embodiments and that certain of the details described herein can be varied considerably without departing from the basic principles of the invention.
[0064] It also is to be understood that, while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention. Other aspects, advantages, and modifications of the invention are within the scope of the claims set forth below. All publications, patents, and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
TABLE 2
Figure imgf000020_0001
77 >IgHV3-66
78 >IgHV3-72
79 >IgHV3-73
80 >IgHV3-74
81 >IgHV4-b
82 >IgHV4-4
83 >IgHV4-28
84 >IgHV4-30-2
85 >IgHV4-30-4
86 >IgHV4-31
87 >IgHV4-34
88 >IgHV4-39
89 >IgHV4-59
90 >IgHV4-61
91 >IgHV5-a
92 >IgHV5-51
93 >IgHV6-1
94 >IgHV7-4-1
95 >gi|224589805:23012122-23012183 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ3
96 >gi|224589805:23011142-23011204 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ4
97 >gi|224589805:23009190-23009249 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ5
98 >gi|224589805:23008016-23008077 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ6
99 >gi|224589805:23006567-23006625 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ7
100 >gi|224589805:23005092-23005151 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ8
101 >gi|224589805:23004502-23004562 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ9
102 >gi|224589805:23002445-23002508 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ10
103 >gi|224589805:23001452-23001511 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ11
104 >gi|224589805:23000889-23000948 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ12
105 >gi|224589805:23000026-23000088 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ13
106 >gi|224589805:22999278-22999329 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ14
107 >gi|224589805:22998580-22998641 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ15
108 >gi|224589805:22997487-22997546 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ16
109 >gi|224589805:22995812-22995874 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ17
110 >gi|224589805:22994620-22994685 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ18
111 >gi|224589805:22993296-22993352 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ20
112 >gi|224589805:22992573-22992627 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ21
113 >gi|224589805:22991016-22991078 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ22
114 >gi|224589805:22989394-22989456 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ23
115 >gi|224589805:22988947-22989009 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ24
116 >gi|224589805:22987424-22987483 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ26
117 >gi|224589805:22985251-22985309 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ27
118 >gi|224589805:22984601-22984666 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ28
119 >gi|224589805:22982921-22982980 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ29
120 >gi|224589805:22981834-22981890 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ30
121 >gi|224589805:22979951-22980007 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ31
122 >gi|224589805:22978325-22978390 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ32
123 >gi|224589805:22977587-22977643 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ33, tryptophane no phe
124 >gi|224589805:22976651-22976708 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ34
125 >gi|224589805:22974096-22974155 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ36
126 >gi|224589805:22972735-22972797 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ37
127 >gi|224589805:22971215-22971276 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ38, tryptophane no phe
128 >gi|224589805:22970585-22970647 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ39
129 >gi|224589805:22968672-22968732 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ40
130 >gi|224589805:22966642-22966703 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ41
131 >gi|224589805:22965872-22965937 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ42
132 >gi|224589805:22964896-22964949 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ43
133 >gi|224589805:22963806-22963868 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ44
134 >gi|224589805:22962911-22962976 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ45
135 >gi|224589805:22962389-22962451 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ46
136 >gi|224589805:22961838-22961894 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ47
137 >gi|224589805:22959479-22959541 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ48
138 >gi|224589805:22958476-22958531 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ49
139 >gi|224589805:22957581-22957640 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ50
140 >gi|224589805:22955216-22955284 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ52
141 >gi|224589805:22951993-22952058 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ53
142 >gi|224589805:22951276-22951335 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ54
143 >gi|224589805:22948510-22948571 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ56
144 >gi|224589805:22947861-22947923 Homo sapiens chromosome 14, GRCh37.p5 Primary Assembly, TRAJ57
145 >TRAV1-1
146 >TRAV1-2
147 >TRAV2
148 >TRAV3
149 >TRAV4
150 >TRAV5
151 >TRAV6
152 >TRAV7
153 >TRAV8-1
154 >TRAV8-2
155 >TRAV8-3
156 >TRAV8-4
157 >TRAV8-6
158 >TRAV9-1
159 >TRAV9-2
160 >TRAV10
161 >TRAV12-1
162 >TRAV12-2
163 >TRAV12-3
164 >TRAV13-1
165 >TRAV13-2
166 >TRAV14/DV4
167 >TRAV16
168 >TRAV17
169 >TRAV18
170 >TRAV19
171 >TRAV20
172 >TRAV21
173 >TRAV22
174 >TRAV23/DV6
175 >TRAV24
176 >TRAV25
177 >TRAV26-1
178 >TRAV26-2
179 >TRAV27
180 >TRAV29/DV5
181 >TRAV30
182 >TRAV34
183 >TRAV35
184 >TRAV36/DV7
185 >TRAV38-1
186 >TRAV38-2/DV8
187 >TRAV39
188 >TRAV40
189 >TRAV41
190 >TRBJ1-1 |gi|89027696:41921990-41922037
191 >TRBJ1-2 |gi|89027696:41922127-41922174
192 >TRBJ1-3 |gi|89027696:41922740-41922789
193 >TRBJ1-4 |gi|89027696:41923335-41923385
194 >TRBJ1-5 |gi|89027696:41923608-41923657
195 >TRBJ1-6 |gi|89027696:41924098-41924150
196 >TRBJ2-1 |gi|224589819:142494049-142494098
197 >TRBJ2-2 |gi|224589819:142494244-142494294
198 >TRBJ2-3 |gi|224589819:142494531-142494579
199 >TRBJ2-4 |gi|224589819:142494682-142494731
200 >TRBJ2-5 |gi|224589819:142494803-142494922
201 >TRBJ2-6 |gi|224589819:142494895-142494975
202 >TRBJ2-7 |gi|224589819:142495140-142495186
203 >TRBV2 |gi|224589819:142000821-142001255
204 >TRBV4-1 |gi|224589819:142013036-142013489
205 >TRBV4-2 |gi|224589819:142045363-142045816
206 >TRBV4-3 |gi|89027696:41436393-41436846
207 >TRBV5-1 |gi|224589819:142020894-142021363
208 >TRBV5-3 |gi|224589819:c142242747-142242281
209 >TRBV5-4 |gi|224589819:c142168844-142168380
210 >TRBV5-5 |gi|224589819:c142149392-142148928
211 >TRBV5-6 |gi|224589819:c142131877-142131412
212 >TRBV5-7 |gi|224589819:c142111859-142111393
213 >TRBV5-8 |gi|89027696:41635991-41636457
214 >TRBV6-1 |gi|224589819:142028178-142028610
215 >TRBV6-2 |gi|89027696:41422953-41423385
216 >TRBV6-3 |gi|89027696:41444634-41445066
217 >TRBV6-4 |gi|224589819:c142251137-142250703
218 >TRBV6-5 |gi|224589819:c142180950-142180515
219 >TRBV6-6 |gi|224589819:c142162363-142161931
220 >TRBV6-7 |gi|224589819:c142144084-142143652
221 >TRBV6-8 |gi|224589819:c142124565-142124137
222 >TRBV6-9 |gi|224589819:c142104553-142104121
223 >TRBV7-1 |gi|224589819:142032039-142032527
224 >TRBV7-2 |gi|89027696:41448267-41448763
225 >TRBV7-3 |gi|224589819:c142247565-142247109
226 >TRBV7-4 |gi|224589819:c142176790-142176329
227 >TRBV7-6 |gi|224589819:c142139770-142139278
228 >TRBV7-7 |gi|224589819:c142120321-142119820
229 >TRBV7-8 |gi|224589819:c142099938-142099455
230 >TRBV7-9 |gi|89027696:41645192-41645664
231 >TRBV9 |gi|224589819:c142240011-142239537
232 >TRBV2 |gi|224589819:142000821-142001255
233 >TRBV4-1 |gi|224589819:142013036-142013489
234 >TRBV4-2 |gi|224589819:142045363-142045816
235 >TRBV4-3 |gi|89027696:41436393-41436846
236 >TRBV5-1 |gi|224589819:142020894-142021363
237 >TRBV5-3 |gi|224589819:c142242747-142242281
238 >TRBV5-4 |gi|224589819:c142168844-142168380
239 >TRBV5-5 |gi|224589819:c142149392-142148928
240 >TRBV5-6 |gi|224589819:c142131877-142131412
241 >TRBV5-7 |gi|224589819:c142111859-142111393
242 >TRBV5-8 |gi|89027696:41635991-41636457
243 >TRBV6-1 |gi|224589819:142028178-142028610
244 >TRBV6-2 |gi|89027696:41422953-41423385
245 >TRBV6-3 |gi|89027696:41444634-41445066
246 >TRBV6-4 |gi|224589819:c142251137-142250703
247 >TRBV6-5 |gi|224589819:c142180950-142180515
248 >TRBV6-6 |gi|224589819:c142162363-142161931
249 >TRBV6-7 |gi|224589819:c142144084-142143652
250 >TRBV6-8 |gi|224589819:c142124565-142124137
251 >TRBV6-9 |gi|224589819:c142104553-142104121
252 >TRBV7-1 |gi|224589819:142032039-142032527
253 >TRBV7-2 |gi|89027696:41448267-41448763
254 >TRBV7-3 |gi|224589819:c142247565-142247109
255 >TRBV7-4 |gi|224589819:c142176790-142176329
256 >TRBV7-6 |gi|224589819:c142139770-142139278
257 >TRBV7-7 |gi|224589819:c142120321-142119820
258 >TRBV7-8 |gi|224589819:c142099938-142099455
259 >TRBV7-9 |gi|89027696:41645192-41645664
260 >TRBV9 |gi|224589819:c142240011-142239537
261 >TRBV10-1 |gi|224589819:c142232022-142231573
262 >TRBV10-2 |gi|224589819:c142206960-142206511
263 >TRBV10-3 |gi|89027696:41660123-41660572
264 >TRBV11-1 |gi|224589819:c142224267-142223790
265 >TRBV11-2 |gi|224589819:c142198008-142197570
266 >TRBV11-3 |gi|89027696:41670771-41671208
267 >TRBV12-3 |gi|89027696:41676373-41676819
268 >TRBV12-4 |gi|89027696:41679696-41680142
269 >TRBV12-5 |gi|89027696:41696887-41697333
270 >TRBV13 |gi|89027696:41651711-41652194
271 >TRBV14 |gi|89027696:41703830-41704262
272 >TRBV15 |gi|89027696:41708905-41709373
273 >TRBV16 |gi|89027696:41713914-41714367
274 >TRBV17 |gi|89027696:41717530-41718264
275 >TRBV18 |gi|89027696:41731601-41732219
276 >TRBV19 |gi|224589819:142326571-142327046
277 >TRBV20-1 |gi|224589819:142334241-142334913
278 >TRBV21-1 |gi|224589819:142344427-142344894
279 >TRBV23-1 |gi|224589819:142353468-142353971
280 >TRBV24-1 |gi|224589819:142364234-142364710
281 >TRBV25-1 |gi|224589819:142378609-142379076
282 >TRBV27 |gi|224589819:142423216-142423688
283 >TRBV28 |gi|224589819:142428504-142428984
284 >TRBV29-1 |gi|224589819:142448120-142448741
285 >TRBV30 |gi|224589819:c142510972-142510271
286 >TRBV13 |gi|89027696:41651711-41652194
287 >TRBV14 |gi|89027696:41703830-41704262
288 >TRBV15 |gi|89027696:41708905-41709373
289 >TRBV16 |gi|89027696:41713914-41714367
290 >TRBV17 |gi|89027696:41717530-41718264
291 >TRBV18 |gi|89027696:41731601-41732219
292 >TRBV19 |gi|224589819:142326571-142327046
293 >TRBV20-1 |gi|224589819:142334241-142334913
294 >TRBV21-1 |gi|224589819:142344427-142344894
295 >TRBV23-1 |gi|224589819:142353468-142353971
296 >TRBV24-1 |gi|224589819:142364234-142364710
297 >TRBV25-1 |gi|224589819:142378609-142379076
298 >TRBV27 |gi|224589819:142423216-142423688
299 >TRBV28 |gi|224589819:142428504-142428984
300 >TRBV29-1 |gi|224589819:142448120-142448741
301 >TRBV30 |gi|224589819:c142510972-142510271
302 >gi|149292731:c114668044-114667992 Mus musculus strain C57BL/6J chromosome 12, MGSCv37 C57BL/6J, IgHJ1
303 >gi|149292731:c114667726-114667679 Mus musculus strain C57BL/6J chromosome 12, MGSCv37 C57BL/6J, IgHJ2
304 >gi|149292731:c114667343-114667296 Mus musculus strain C57BL/6J chromosome 12, MGSCv37 C57BL/6J, IgHJ3
305 >gi|149292731:c114666778-114666725 Mus musculus strain C57BL/6J chromosome 12, MGSCv37 C57BL/6J, IgHJ4
306 >IgHV1-4
307 >IgHV1-5
308 >IgHV1-7
309 >IgHV1-9
310 >IgHV1-11
311 >IgHV1-12
312 >IgHV1-14
313 >IgHV1-15
314 >IgHV1-17-1
315 >IgHV1-18
316 >IgHV1-19
317 >IgHV1-20
318 >IgHV1-22
319 >IgHV1-26
320 >IgHV1-31
321 >IgHV1-34
322 >IgHV1-36
323 >IgHV1-37
324 >IgHV1-39
325 >IgHV1-42
326 >IgHV1-43
327 >IgHV1-47
328 >IgHV1-49
329 >IgHV1-50
330 >IgHV1-52
331 >IgHV1-53
332 >IgHV1-54
333 >IgHV1-55
334 >IgHV1-56
335 >IgHV1-58
336 >IgHV1-59
337 >IgHV1-61
338 >IgHV1-62-1
339 >IgHV1-62-2
340 >IgHV1-62-3
341 >IgHV1-63
342 >IgHV1-64
343 >IgHV1-66
344 >IgHV1-67
345 >IgHV1-69
346 >IgHV1-71
347 >IgHV1-72
348 >IgHV1-74
349 >IgHV1-75
350 >IgHV1-76
351 >IgHV1-77
352 >IgHV1-78
353 >IgHV1-80
354 >IgHV1-81
355 >IgHV1-82
356 >IgHV1-84
357 >IgHV1-85
358 >IgHV2-2
359 >IgHV2-3
360 >IgHV2-4
361 >IgHV2-5
362 >IgHV2-6
363 >IgHV2-6-8
364 >IgHV2-7
365 >IgHV2-9
366 >IgHV2-9-1
367 >IgHV3-1
368 >IgHV3-2
369 >IgHV3-3
370 >IgHV3-4
371 >IgHV3-5
372 >IgHV3-6
373 >IgHV3-8
374 >IgHV4-1
375 >IgHV4-2
376 >IgHV5-2
377 >IgHV5-4
378 >IgHV5-6
379 >IgHV5-9
380 >IgHV5-9-1
381 >IgHV5-12
382 >IgHV5-12-4
383 >IgHV5-15
384 >IgHV5-16
385 >IgHV5-17
386 >IgHV6-3
387 >IgHV6-4
388 >IgHV6-5
389 >IgHV6-6
390 >IgHV6-7
391 >IgHV7-1
392 >IgHV7-2
393 >IgHV7-3
394 >IgHV7-4
395 >IgHV8-2
396 >IgHV8-4
397 >IgHV8-5
398 >IgHV8-6
399 >IgHV8-8
400 >IgHV8-11
401 >IgHV8-12
402 >IgHV9-1
403 >IgHV9-2
404 >IgHV9-3
405 >IgHV9-4
406 >IgHV10-1
407 >IgHV10-3
408 >IgHV11-1
409 >IgHV11-2
410 >IgHV12-3
411 >IgHV13-1
412 >IgHV13-2
413 >IgHV14-1
414 >IgHV14-2
415 >IgHV14-3
416 >IgHV14-4
417 >IgHV15-2
418 >IgHV16-1
419 >gi|149292735:54837511-54837576 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ2
420 >gi|149292735:54833452-54833513 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ5
421 >gi|149292735:54832363-54832424 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ6
422 >gi|149292735:54829068-54829125 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ9
423 >gi|149292735:54826814-54826872 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ11
424 >gi|149292735:54826227-54826285 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ12
425 >gi|149292735:54825416-54825472 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ13
426 >gi|149292735:54824097-54824156 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ15
427 >gi|149292735:54822809-54822869 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ16
428 >gi|149292735:54821450-54821512 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ17
429 >gi|149292735:54820452-54820517 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ18
430 >gi|149292735:54818465-54818521 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ21
431 >gi|149292735:54816923-54816983 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ22
432 >gi|149292735:54815756-54815815 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ23
433 >gi|149292735:54815320-54815375 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ24
434 >gi|149292735:54814165-54814224 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ26
435 >gi|149292735:54811978-54812036 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ27
436 >gi|149292735:54811336-54811400 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ28
437 >gi|149292735:54809541-54809599 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ30
438 >gi|149292735:54807570-54807626 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ31
439 >gi|149292735:54805776-54805841 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ32
440 >gi|149292735:54805033-54805089 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ33, tryptophane no phe
441 >gi|149292735:54804374-54804431 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ34
442 >gi|149292735:54803449-54803513 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ35
443 >gi|149292735:54801193-54801252 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ37
444 >gi|149292735:54799637-54799699 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ39
445 >gi|149292735:54797596-54797656 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ40
446 >gi|149292735:54795448-54796511 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ42
447 >gi|149292735:54794417-54794473 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ43
448 >gi|149292735:54792506-54792565 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ45
449 >gi|149292735:54789483-54789543 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ48
450 >gi|149292735:54788361-54788419 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ49
451 >gi|149292735:54787265-54787327 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ50
452 >gi|149292735:54784991-54785056 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ52
453 >gi|149292735:54782319-54782384 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ53
454 >gi|149292735:54778938-54779000 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ56
455 >gi|149292735:54778182-54778244 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ57
456 >gi|149292735:54776955-54777017 Mus musculus strain C57BL/6J chromosome 14, MGSCv37 C57BL/6J, TRAJ58
457 >TRAV1
458 >TRAV2
459 >TRAV3-1
460 >TRAV3-3
461 >TRAV3-4
462 >TRAV3D-3
463 >TRAV3N-3
464 >TRAV4-2
465 >TRAV4-3
466 >TRAV4-4/DV10
467 >TRAV4D-3
468 >TRAV4D-4
469 >TRAV4N-3
470 >TRAV4N-4
471 >TRAV5-1
472 >TRAV5D-4
473 >TRAV5N-4
474 >TRAV6-1
475 >TRAV6-2
476 >TRAV6-3
477 >TRAV6-4
478 >TRAV6-5
479 >TRAV6-6
480 >TRAV6-7/DV9
481 >TRAV6D-3
482 >TRAV6D-4
483 >TRAV6D-5
484 >TRAV6D-6
485 >TRAV6D-7
486 >TRAV6N-5
487 >TRAV6N-6
488 >TRAV6N-7
489 >TRAV7-1
490 >TRAV7-2
491 >TRAV7-3
492 >TRAV7-4
493 >TRAV7-5
494 >TRAV7-6
495 >TRAV7D-2
496 >TRAV7D-3
497 >TRAV7D-4
498 >TRAV7D-5
499 >TRAV7D-6
500 >TRAV7N-4
501 >TRAV7N-6
502 >TRAV8-1
503 >TRAV8-2
504 >TRAV8D-1
505 >TRAV8D-2
506 >TRAV8N-2
507 >TRAV9-1
508 >TRAV9-2
509 >TRAV9-3
510 >TRAV9-4
511 >TRAV9D-1
512 >TRAV9D-2
513 >TRAV9D-3
514 >TRAV9N-2
515 >TRAV9N-4
516 >TRAV10
517 >TRAV10D
518 >TRAV10N
519 >TRAV11
520 >TRA11-D
521 >TRAV12-1
522 >TRAV12-2
523 >TRAV12-3
524 >TRAV12D-1
525 >TRAV12D-2
526 >TRAV12D-3
527 >TRAV12N-1
528 >TRAV12N-2
529 >TRAV12N-3
530 >TRAV13-1
531 >TRAV13-2
532 >TRAV13-3
533 >TRAV13-4/DV7
534 >TRAV13-5
535 >TRAV13D-1
536 >TRAV13D-2
537 >TRAV13D-3
538 >TRAV13D-4
539 >TRAV13N-1
540 >TRAV13N-2
541 >TRAV13N-3
542 >TRAV13N-4
543 >TRAV14-1
544 >TRAV14-2
545 >TRAV14-3
546 >TRAV14D-1
547 >TRAV14D-2
548 >TRAV14D-3/DV8
549 >TRAV14N-1
550 >TRAV14N-2
551 >TRAV14N-3
552 >TRAV15-1/DV6-1
553 >TRAV15-2/DV6-2
554 >TRAV15D-1/DV6D-1
555 >TRAV15D-2/DV6D-3
556 >TRAV15N-1
557 >TRAV15N-2
558 >TRAV16
559 >TRAV16D/DV11
560 >TRAV16N
561 >TRAV17
562 >TRAV19
563 >TRAV21/DV12
564 >TRBJ1-1
565 >TRBJ1-2
566 >TRBJ1-3
567 >TRBJ1-4
568 >TRBJ1-5
569 >TRBJ2-1
570 >TRBJ2-2
571 >TRBJ2-3
572 >TRBJ2-4
573 >TRBJ2-5
574 >TRBJ2-7
575 >TRBV1
576 >TRBV2
577 >TRBV3
578 >TRBV4
579 >TRBV5
580 >TRBV12-1
581 >TRBV12-2
582 >TRBV13-1
583 >TRBV13-2
584 >TRBV13-3A
585 >TRBV14
586 >TRBV15
587 >TRBV16
588 >TRBV17
589 >TRBV19
590 >TRBV20
591 >TRBV23
592 >TRBV26
593 >TRBV29
594 >TRBV30
595 >TRBV31
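Several Table 2 entries carry a genomic locus in the sequence name. A minimal sketch of how such names could be parsed is given below. It assumes NCBI-style headers of the form `gi|<id>:[c]<start>-<end>`, where a leading `c` on the coordinates marks the complementary strand and coordinates are 1-based and inclusive; the names `Locus`, `parse_locus`, and `locus_length` are illustrative and not part of the disclosure.

```python
import re
from typing import NamedTuple, Optional

class Locus(NamedTuple):
    gi: str           # GenBank GI number of the parent sequence
    start: int        # lower coordinate (assumed 1-based, inclusive)
    end: int          # upper coordinate (assumed 1-based, inclusive)
    complement: bool  # True if the header marks the complementary strand

# Assumed header shape: "gi|<id>:[c]<start>-<end>".
_LOCUS = re.compile(r"gi\|(\d+):(c?)(\d+)-(\d+)")

def parse_locus(header: str) -> Optional[Locus]:
    """Extract the locus from a sequence name, or return None if the
    name carries no coordinates (e.g. '>IgHV3-66')."""
    match = _LOCUS.search(header)
    if match is None:
        return None
    gi, comp, first, second = match.groups()
    a, b = int(first), int(second)
    return Locus(gi, min(a, b), max(a, b), comp == "c")

def locus_length(locus: Locus) -> int:
    """Length in bases under the inclusive-coordinate assumption."""
    return locus.end - locus.start + 1

j_gene = parse_locus(">gi|224589805:23012122-23012183 Homo sapiens "
                     "chromosome 14, GRCh37.p5 Primary Assembly, TRAJ3")
v_gene = parse_locus(">TRBV5-3 |gi|224589819:c142242747-142242281")

print(j_gene, locus_length(j_gene))
print(v_gene, locus_length(v_gene))
print(parse_locus(">IgHV3-66"))
```

Entries listed by gene name alone return `None`, so a caller would fall back to looking the name up in one of the databases mentioned in paragraph [0060].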

Claims

CLAIMS What is claimed is:
1. A method for immune repertoire sequence identification which comprises:
a. comparing an unknown sequence to a plurality of immune repertoire specific reference sequences ("words");
b. if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and
c. using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
2. The method of Claim 1, wherein multiplexed polymerase chain reaction is used to amplify the unknown sequence.
3. The method of Claim 1, wherein massively parallel sequencing is used to sequence the unknown sequence.
4. The method of Claim 1, wherein the immune repertoire specific reference sequences are immunoglobulin IgH or immunoglobulin IgL reference sequences.
5. The method of Claim 1, wherein the immune repertoire specific reference sequences are T cell receptor β (TCRβ) or T cell receptor α (TCRα) reference sequences.
6. The method of Claim 1, wherein the immune repertoire specific reference sequences are joining (J) gene reference sequences.
7. The method of Claim 1, wherein the immune repertoire specific reference sequences are variable (V) gene reference sequences.
8. The method of Claim 1, wherein the conserved reference codon is a second conserved cysteine, a phenylalanine or a tryptophan codon.
9. The method of Claim 1, wherein the immune repertoire specific reference sequences are selected from the sequences in Table 1.
10. The method of Claim 1, wherein the conserved reference codon is a codon selected from the reference codons in Table 1.
11. The method of Claim 1, wherein the distance to the conserved reference codon is a distance selected from the distances to the reference codons in Table 1.
12. A computer-implemented method for immune repertoire sequence identification which comprises:
comparing an unknown sequence to a plurality of immune repertoire specific reference sequences ("words");
if a portion of the unknown sequence matches an immune repertoire reference sequence, measuring a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence; and
using the measured distance and the immune repertoire reference sequence to identify the unknown sequence.
13. A system for immune repertoire sequence identification which comprises:
an alignment module wherein an unknown sequence is aligned with a plurality of immune repertoire specific reference sequences ("words"); and
a measurement module that measures and compares a distance from the immune repertoire reference sequence to a conserved reference codon in the unknown sequence.
PCT/US2013/022210 2012-01-20 2013-01-18 Method for analysis of immune variable sequences WO2013109935A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261588878P 2012-01-20 2012-01-20
US61/588,878 2012-01-20

Publications (1)

Publication Number Publication Date
WO2013109935A1 true WO2013109935A1 (en) 2013-07-25

Family

ID=48799695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/022210 WO2013109935A1 (en) 2012-01-20 2013-01-18 Method for analysis of immune variable sequences

Country Status (1)

Country Link
WO (1) WO2013109935A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003044225A2 (en) * 2001-11-23 2003-05-30 Bayer Healthcare Ag Profiling of the immune gene repertoire

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BOYD ET AL.: "Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing", SCIENCE TRANSLATIONAL MEDICINE, vol. 1, no. 12, 2009, pages 12RA23 *
FREEMAN ET AL.: "Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing", GENOME RESEARCH, vol. 19, no. 10, 2009, pages 1817 - 1824, XP002636496 *
VENTURI ET AL.: "A mechanism for TCR sharing between T cell subsets and individuals revealed by pyrosequencing", THE JOURNAL OF IMMUNOLOGY, vol. 186, no. 7, 2011, pages 4285 - 4294, XP055081487 *
VENTURI ET AL.: "Methods for comparing the diversity of samples of the T cell receptor repertoire", JOURNAL OF IMMUNOLOGICAL METHODS, vol. 321, no. 1, 2007, pages 182 - 195, XP005936362 *
YOHANNES ET AL.: "Computational comparative study of blood TCR repertoire: Celiac disease patients versus controls", AALTO UNIVERSITY SCHOOL OF SCIENCE, MASTER' S THESIS, 2011, pages 1 - 66 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10106789B2 (en) 2010-12-16 2018-10-23 Gigagen, Inc. System and methods for massively parallel analysis of nucleic acids in single cells
US9695474B2 (en) 2010-12-16 2017-07-04 Gigagen, Inc. System and methods for massively parallel analysis of nucleic acids in single cells
US11591652B2 (en) 2010-12-16 2023-02-28 Gigagen, Inc. System and methods for massively parallel analysis of nucleic acids in single cells
US10465243B2 (en) 2010-12-16 2019-11-05 Gigagen, Inc. System and methods for massively parallel analysis of nucleic acids in single cells
US10787706B2 (en) 2010-12-16 2020-09-29 Gigagen, Inc. System and methods for massively parallel analysis of nucleic acids in single cells
US11053543B2 (en) 2010-12-16 2021-07-06 Gigagen, Inc. System and methods for massively parallel analysis of nucleic acids in single cells
US9857328B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US10020300B2 (en) 2014-12-18 2018-07-10 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9618474B2 (en) 2014-12-18 2017-04-11 Edico Genome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10607989B2 (en) 2014-12-18 2020-03-31 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10006910B2 (en) 2014-12-18 2018-06-26 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10429342B2 (en) 2014-12-18 2019-10-01 Edico Genome Corporation Chemically-sensitive field effect transistor
US10429381B2 (en) 2014-12-18 2019-10-01 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US9859394B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10494670B2 (en) 2014-12-18 2019-12-03 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9926554B2 (en) 2015-06-09 2018-03-27 Gigamune, Inc. Recombinant fusion proteins and libraries from immune cell repertoires
US10689641B2 (en) 2015-06-09 2020-06-23 Gigagen, Inc. Recombinant fusion proteins and libraries from immune cell repertoires
US10214740B2 (en) 2015-06-09 2019-02-26 Gigagen, Inc. Recombinant fusion proteins and libraries from immune cell repertoires
US9926555B2 (en) 2015-06-09 2018-03-27 Gigamune, Inc. Recombinant fusion proteins and libraries from immune cell repertoires
US9738699B2 (en) 2015-06-09 2017-08-22 Gigagen, Inc. Recombinant fusion proteins and libraries from immune cell repertoires
US11702765B2 (en) 2015-06-09 2023-07-18 Gigagen, Inc. Recombinant fusion proteins and libraries from immune cell repertoires
US10811539B2 (en) 2016-05-16 2020-10-20 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US11421220B2 (en) 2019-03-21 2022-08-23 Gigamune, Inc. Engineered cells expressing anti-viral T cell receptors and methods of use thereof

Similar Documents

Publication Publication Date Title
WO2013109935A1 (en) Method for analysis of immune variable sequences
US20150031555A1 (en) Method for correction of bias in multiplexed amplification
Spencer et al. Loss of the interleukin-6 receptor causes immunodeficiency, atopy, and abnormal inflammatory responses
Gutierrez et al. Deciphering the TCR repertoire to solve the COVID-19 mystery
Miragaia et al. Single-cell transcriptomics of regulatory T cells reveals trajectories of tissue adaptation
Kelsen et al. Exome sequencing analysis reveals variants in primary immunodeficiency genes in patients with very early onset inflammatory bowel disease
AU2016242967B2 (en) Method of identifying human compatible T cell receptors specific for an antigenic target
Hillen et al. Plasmacytoid DCs from patients with Sjögren's syndrome are transcriptionally primed for enhanced pro-inflammatory cytokine production
Robinson Sequencing the functional antibody repertoire—diagnostic and therapeutic discovery
Cerosaletti et al. Single-cell RNA sequencing reveals expanded clones of islet antigen-reactive CD4+ T cells in peripheral blood of subjects with type 1 diabetes
ES2784343T3 (en) Simultaneous, highly multiplexed detection of nucleic acids encoding paired adaptive immune receptor heterodimers from many samples
Park et al. Interferon signature in the blood in inflammatory common variable immune deficiency
JP6158080B2 (en) Health and disease status monitoring using chronotype profiles
Komech et al. CD8+ T cells with characteristic T cell receptor beta motif are detected in blood and expanded in synovial fluid of ankylosing spondylitis patients
JP2017508457A (en) T cell balance gene expression, composition and method of use thereof
EP3114240A2 (en) Methods using randomer-containing synthetic molecules
Moreno-Torres et al. Immunophenotype and transcriptome profile of patients with multiple sclerosis treated with fingolimod: Setting up a model for prediction of response in a 2-year translational study
Amoriello et al. The TCR repertoire reconstitution in multiple sclerosis: comparing one-shot and continuous immunosuppressive therapies
WO2014011735A1 (en) Methods and kits for integrating genomic sequences with immune monitoring
Massoni-Badosa et al. An atlas of cells in the human tonsil
Lin et al. Deep sequencing of the T cell receptor β repertoire reveals signature patterns and clonal drift in atherosclerotic plaques and patients
van Schaik et al. Discovery of invariant T cells by next-generation sequencing of the human TCR α-chain repertoire
Fang et al. The cell-surface 5′-nucleotidase CD73 defines a functional T memory cell subset that declines with age
JPWO2017159686A1 (en) Monitoring or diagnosis and design of therapeutic agents for immunotherapy
Oakes et al. The T cell response to the contact sensitizer paraphenylenediamine is characterized by a polyclonal diverse repertoire of antigen-specific receptors

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13739062; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 13739062; Country of ref document: EP; Kind code of ref document: A1)