CN114774517A - Method and kit for sequencing human immune repertoire - Google Patents


Info

Publication number
CN114774517A
Authority
CN
China
Prior art keywords
primer
sequence
sequencing
upstream
seq
Legal status
Pending
Application number
CN202210381164.8A
Other languages
Chinese (zh)
Inventor
许明炎
张晓妮
周书雄
Current Assignee
Shenzhen Haplox Medical Science Examination Laboratory Co ltd
Original Assignee
Shenzhen Haplox Medical Science Examination Laboratory Co ltd
Application filed by Shenzhen Haplox Medical Science Examination Laboratory Co ltd
Priority to CN202210381164.8A
Publication of CN114774517A
Legal status: Pending (current)


Classifications

    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6869 Methods for sequencing
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C12Q2600/16 Primer sets for multiplex assays

Abstract

The application discloses a method and a kit for sequencing the human immune repertoire. The method comprises: performing a primary extension of a template nucleic acid with a first primer, which comprises a sequencing-platform upstream primer binding region, a UMI and a target-specific upstream primer sequence, and in which some T bases are replaced by dU; adding to the reaction system a second primer, which comprises a sequencing-platform downstream primer binding region and a target-specific downstream primer sequence, for a primary extension; adding UDG/UNG enzyme to the reaction system to digest the dU; and, after digestion, adding a third primer to the reaction system and performing PCR, the third primer being all or part of the first primer's sequencing-platform upstream primer binding region. The target-specific sequences of the first and second primers are designed for the human TCR genes. The method completely covers the CDR3 of TCRs in the human immune repertoire with high capture efficiency. Because of the primer design and the sequential addition of the three primers to the reaction system, each original template nucleic acid corresponds to only one UMI, so the amplification bias at each target is corrected and the copy number of the target gene can be quantified.

Description

Method and kit for sequencing human immune repertoire
Technical Field
The application relates to the technical field of human immune repertoire sequencing, and in particular to a method and a kit for sequencing the human immune repertoire.
Background
The human immune repertoire refers to the sum of all T-lymphocyte and B-lymphocyte clones with distinct specificities in an individual's circulation at a given time point. T cells recognize antigens and are activated as follows: antigen is phagocytized and processed by antigen-presenting cells (e.g., macrophages) into polypeptide fragments, which are displayed on the cell surface as antigenic peptide-MHC complexes and recognized by T cell receptors (TCRs) on the T cell surface, thereby activating the body's immune response. The richer the immune repertoire, the more effectively the organism can resist invasion by pathogens such as bacteria and viruses; conversely, the poorer the repertoire, the more susceptible the organism is to disease. The TCR is a heterodimer of two different peptide chains: about 95% of TCRs consist of an α chain and a β chain, and the remaining ~5% consist of a γ chain and a δ chain. The V regions of the α and β chains (Vα, Vβ) each contain three hypervariable regions, CDR1, CDR2 and CDR3, of which CDR3 is the most variable and directly determines the antigen-binding specificity of the TCR. The TCR CDR3 is encoded by the V, D and J gene segments (the α chain has no D segment). During lymphocyte maturation, rearrangement of the V, D and J genes generates a wide variety of recombined sequences, which explains how the limited number of genes revealed by human genomics and proteomics can encode a nearly unlimited variety of proteins. The TCR CDR3 is the primary site at which T cells recognize antigen; in the narrow sense, a T cell repertoire study is an analysis of TCR CDR3 sequences.
Human immune repertoire sequencing (IR-seq) generally refers to a method that takes T/B lymphocytes as the research target, uses 5' RACE or multiplex PCR amplification to capture the complementarity determining regions (CDRs) that underlie the diversity of B cell receptors (BCRs) or T cell receptors (TCRs), and combines this with high-throughput sequencing to comprehensively evaluate immune system diversity and to explore the relationship between the immune repertoire and disease.
However, current human immune repertoire sequencing technologies still have shortcomings. 5' RACE is an RNA-based amplification technique: a universal primer in the TCR/BCR constant region is used to amplify toward the variable region, and a second, unbiased PCR amplification is then performed from an introduced adapter sequence. Its drawbacks are: (1) it uses RNA as the amplification template, and RNA extraction is more demanding than DNA extraction, placing higher requirements on the laboratory environment and on technicians; (2) RNA is easily degraded, which imposes strict conditions on the transport and storage of the original tissue samples; (3) the experimental procedure is more complex.
Multiplex PCR amplification uses DNA as the original template, with specific primers designed on the TCR V, D and J regions, and can therefore be applied directly to genomic DNA, but its drawbacks are also evident: (1) multiplex PCR primers are difficult to design, and it is hard for the amplification products to cover all subtypes; (2) in the prior art the workflow is complex: after multiplex PCR amplification and purification, end repair, A-tailing and end modification are still required to build the library; (3) multiplex PCR can amplify V, D and J region fragments whether or not they have rearranged, so the products range from tens to thousands of bp in length; the fragments must then be size-selected to retain the smaller effective fragments, which is usually done by gel electrophoresis and gel excision, a step that takes additional time, loses some of the effective fragments and further complicates the workflow; (4) whether DNA or RNA is used as the template for amplifying the TCR β chain CDR3, preferential amplification cannot be avoided in the multiplex specific-amplification stage, and the large number of amplification cycles accumulates many amplification errors, which hampers diversity analysis and easily leads to misinterpretation.
Therefore, reducing or correcting false positives caused by PCR amplification errors or human factors remains a focus of research on human immune repertoire sequencing.
Disclosure of Invention
It is an object of the present application to provide an improved method and kit for sequencing a human immune repertoire.
To achieve this object, the application adopts the following technical solution:
One aspect of the application discloses a method for sequencing the human immune repertoire, comprising the following steps:
preparing a reaction system and performing a primary extension of the template nucleic acid with a first primer to obtain a complementary strand; the first primer comprises, in order from the 5' end to the 3' end, a sequencing-platform upstream primer binding region, a unique identifier and a target-specific upstream primer sequence; in the first primer, the T bases in the sequencing-platform upstream primer binding region and in the target-specific upstream primer sequence are replaced by deoxyuracil, and the sequencing-platform upstream primer binding region corresponds to the 3' end of the sequencing platform's upstream sequencing primer;
after the first primer has been extended, adding a second primer to the reaction system and performing a primary extension of the complementary strand extended from the first primer with the second primer, to obtain a product consisting of the sequencing-platform upstream primer binding region, the unique identifier, the target sequence and the sequencing-platform downstream primer binding region; the second primer comprises, in order from the 5' end to the 3' end, a sequencing-platform downstream primer binding region and a target-specific downstream primer sequence, and the sequencing-platform downstream primer binding region corresponds to the 3' end of the sequencing platform's downstream sequencing primer;
after the extension of the second primer is complete, adding UDG/UNG enzyme to the reaction system to digest at the deoxyuracil sites, thereby digesting the first primer and the strand extended from it;
after the UDG/UNG digestion is complete, adding a third primer to the reaction system and using the third primer together with the second primer to enrich the second-primer extension product by PCR amplification, so that all amplicons derived from a given template nucleic acid carry the same unique identifier; the third primer is all or part of the first primer's sequencing-platform upstream primer binding region, counted from its 5' end, and the T bases in the third primer are not replaced by deoxyuracil;
using the PCR-enriched products, in which all amplicons derived from the same template nucleic acid carry the same unique identifier, to construct and sequence a sequencing library, thereby completing the sequencing of the human immune repertoire;
wherein the target-specific upstream primer sequence and the target-specific downstream primer sequence are designed for the genes encoding the human T cell receptor and fully cover the sequence encoding the CDR3 region.
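To make the logic of the sequential primer addition concrete, the following Python sketch simulates it as a toy model. It is a simplification under stated assumptions, not the applicant's implementation: strands are abstract tuples, UDG/UNG treatment is modeled as removing any strand that still contains dU (in reality the enzyme cleaves at the dU positions), PCR is assumed to double every surviving molecule per cycle, and the 8 nt UMI length and the function names `random_umi` and `simulate_workflow` are hypothetical.

```python
import random
from collections import defaultdict


def random_umi(rng: random.Random, length: int = 8) -> str:
    """Draw a random UMI; the 8 nt length is a hypothetical choice."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_workflow(n_templates: int, pcr_cycles: int = 5, seed: int = 0) -> dict:
    """Toy model of the three-primer workflow.

    Each strand is a tuple (template_id, umi, has_dU). Returns a mapping
    template_id -> {umi: number of final amplicon copies}.
    """
    rng = random.Random(seed)
    # Step 1: one primary extension with the dU-containing first primer,
    # so every parent strand receives exactly one UMI.
    first = [(t, random_umi(rng), True) for t in range(n_templates)]
    # Step 2: the second primer is extended once on each first-primer product.
    # The copy inherits that UMI and is synthesized with dTTP, so it has no dU.
    second = [(t, umi, False) for t, umi, _ in first]
    # Leftover first primer may re-anneal to the template during step 2 and
    # create strands with a new UMI (model assumption: none get a second-primer copy).
    stray = [(t, random_umi(rng), True) for t in range(n_templates)]
    # Step 3 (simplified): UDG/UNG treatment removes every strand containing dU.
    survivors = [s for s in first + second + stray if not s[2]]
    # Step 4: third + second primer PCR, assuming perfect doubling per cycle.
    families = defaultdict(dict)
    for t, umi, _ in survivors:
        families[t][umi] = families[t].get(umi, 0) + 2 ** pcr_cycles
    return dict(families)


families = simulate_workflow(n_templates=10, pcr_cycles=5)
print(all(len(umis) == 1 for umis in families.values()))  # True
```

Because only the second-primer copies survive digestion in this model, every template ends up with a single UMI family, which is the property the method relies on.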
In the method of the present application, "performing a primary extension" means that the primer anneals and hybridizes to the target sequence and is then extended only once, without further rounds of denaturation and annealing. This ensures that each template nucleic acid parent strand is labeled with a single UMI. After the second primer is added, it is designed to anneal, hybridize and extend, and the first primer may also anneal, hybridize and extend at this stage; however, the second primer anneals, hybridizes and extends only once, so the product consisting of the sequencing-platform upstream primer binding region, the unique identifier, the target sequence and the sequencing-platform downstream primer binding region derives only from the parent strand first tagged with a UMI by first-primer extension. Finally, during exponential PCR enrichment with the third primer and the second primer, only amplicons of the parent strand originally tagged with a UMI by first-primer extension can be exponentially enriched. Moreover, because the first primer is removed by UDG/UNG digestion before this PCR enrichment, it cannot introduce new UMIs in later rounds of PCR amplification. The template nucleic acid of the present application may be DNA or cDNA.
It should be noted that, in the method of the present application, the first, second and third primers are specially designed and added to the reaction system sequentially, so that all amplicon strands derived from one template nucleic acid parent strand carry the same UMI, which is particularly important for mutation detection. For example, the UMI makes it possible to determine directly whether amplicon strands obtained with the same specific primers derive from a mutant or a non-mutant template, so that mutations can be detected quantitatively and an accurate mutation rate obtained.
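A minimal sketch of how shared UMIs could be used for molecule-level mutation calling: reads at one position are grouped by UMI, each family is collapsed to a majority-vote allele, and the mutation rate is counted over families rather than reads. The function names, the majority-vote rule and the read data are illustrative assumptions, not part of the application.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads into UMI families and call each family's allele by majority vote.

    reads: iterable of (umi, allele) pairs observed at one position.
    """
    families = defaultdict(Counter)
    for umi, allele in reads:
        families[umi][allele] += 1
    return {umi: counts.most_common(1)[0][0] for umi, counts in families.items()}


def umi_mutation_rate(reads, ref_allele):
    """Return (mutant families, total families, mutation rate) at the molecule level."""
    consensus = umi_consensus(reads)
    mutant = sum(1 for allele in consensus.values() if allele != ref_allele)
    return mutant, len(consensus), mutant / len(consensus)


# Hypothetical reads: one PCR error ("T" in family AAAC) and one true mutant molecule (GGTA).
reads = ([("AAAC", "C")] * 5 + [("AAAC", "T")] + [("GGTA", "T")] * 4
         + [("CCGT", "C")] * 3 + [("TTAG", "C")] * 2)
print(umi_mutation_rate(reads, ref_allele="C"))  # (1, 4, 0.25)
```

Counting raw reads would instead give 5/15, inflated by the PCR error in family `AAAC` and by the unequal family sizes.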
It should further be noted that the target-specific upstream and downstream primer sequences of the present application are specific primer sequences designed for the genes encoding the human T cell receptor that fully cover the sequences encoding its CDR3 regions. They allow rapid capture and amplification from human genomic DNA or cDNA, cover the CDR3 functional region of TCRs in the human immune repertoire, and have high capture efficiency. In addition, the PCR-enriched products, in which all amplicons from the same template carry the same unique identifier, are used directly to construct and sequence a sequencing library, which reduces the number of experimental steps and DNA loss and improves the reliability of the final result. The method efficiently integrates immune repertoire library construction with the Illumina high-throughput sequencing platform: with the primers designed in the application, a library can be built quickly by two-step PCR (polymerase chain reaction), shortening the whole experiment to within 4 hours. Finally, because the specially designed primers add UMIs during immune repertoire sequencing so that all amplicon strands of a template parent strand carry the same UMI, the amplification bias and amplification errors at each target can be corrected, the copy number of the target gene can be quantified, and the problem of false positives caused by PCR amplification errors or human factors is solved.
It should be noted that the key points of the present application are the design of the primer structures and of the order in which the primers are added, so that the amplicons finally amplified and enriched carry the same UMI; the specific primer sequences can be determined according to the particular target sequence and sequencing platform. For example, the first primer can be assembled by using conventional primer design software to design its target-specific upstream primer sequence for the particular target sequence and its sequencing-platform upstream primer binding region for the intended sequencing platform.
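As an illustration of this modular structure, the sketch below assembles a first primer from its three parts and writes every T outside the UMI as U to mark the dU substitutions. All sequences are hypothetical placeholders rather than entries from the sequence listing, and the `NNNNUNNNN` UMI pattern (N = random base, U = fixed dU) anticipates the embodiment in which dU is inserted into the unique identifier.

```python
def to_dU(seq: str) -> str:
    """Mark dU-substituted positions by writing every T as U."""
    return seq.upper().replace("T", "U")


def build_first_primer(platform_region: str, umi: str, target_specific: str) -> str:
    """Assemble a first primer 5'->3': platform binding region + UMI + target-specific sequence.

    T is replaced by U in the platform binding region and the target-specific
    sequence; the UMI is kept as given (N = random base, U = fixed dU).
    """
    return to_dU(platform_region) + umi.upper() + to_dU(target_specific)


# All three parts are hypothetical placeholders, not sequences from the patent.
primer = build_first_primer("GCATTAGCCTAGGTCA", "NNNNUNNNN", "CTGACTTCAGGTAC")
print(primer)  # GCAUUAGCCUAGGUCANNNNUNNNNCUGACUUCAGGUAC
```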
In one implementation of the present application, the target-specific upstream primer sequence is a specific primer sequence designed for the V gene of the human T cell receptor β chain, and the target-specific downstream primer sequence is a specific primer sequence designed for the J gene of the human T cell receptor β chain; amplification with the target-specific upstream and downstream primer sequences completely covers the gene sequence encoding the CDR3 region of the T cell receptor β chain.
It is understood that the CDR3 of the T cell receptor β chain is encoded by the V, D and J genes; therefore, designing the upstream primer on the V gene and the downstream primer on the J gene better ensures that the sequence encoding the CDR3 region is completely covered.
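The coverage argument can be checked in silico with a toy example: if the upstream primer matches the V segment on the sense strand and the downstream primer is the reverse complement of a J-segment window, the product spans the whole CDR3 between them. The sequences below are made up, exact string matching stands in for real primer hybridization, and the helper names `revcomp` and `in_silico_amplicon` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, fwd: str, rev: str):
    """Return the region spanned by a forward (sense) and reverse (antisense) primer, or None."""
    start = template.find(fwd)
    if start == -1:
        return None
    site = revcomp(rev)  # reverse-primer binding site, written on the sense strand
    end = template.find(site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy rearranged beta-chain fragment: V segment + CDR3 + J segment (made-up sequences).
v_seg, cdr3, j_seg = "GACTGACCGTAGG", "CGCCAGCAGTTCT", "ACGGTTCAGCATGG"
template = "TTTT" + v_seg + cdr3 + j_seg + "AAAA"
amplicon = in_silico_amplicon(template, fwd=v_seg, rev=revcomp(j_seg))
print(cdr3 in amplicon)  # True
```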
In one embodiment of the present application, at least one deoxyuracil is inserted into the unique identifier sequence of the first primer, and the inserted deoxyuracils split the unique identifier so that each run of consecutive bases is shorter than 5 nt.
In the present application, deoxyuracil is inserted into a primer, or T is replaced with deoxyuracil, so that the primer can be digested with UDG/UNG enzyme once it is no longer needed. Deoxyuracil is inserted into the unique identifier to minimize the non-specific amplification that random UMIs may cause in subsequent amplification: one or more fixed deoxyuracils are inserted into the base sequence of the unique identifier so that the runs of consecutive random N bases on either side of each deoxyuracil are shorter than 5 nt, which effectively avoids non-specific amplification. Of course, if the possibility of non-specific amplification is not a concern, deoxyuracil need not be inserted into the unique identifier.
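A small check of the rule that dU insertions keep every run of random UMI bases shorter than 5 nt might look like the following. The pattern notation (N = random base, U = fixed dU) and the function names are assumptions for illustration; the diversity count reflects that each random position contributes four possible bases while fixed dU positions contribute none.

```python
import re


def longest_random_run(umi_pattern: str) -> int:
    """Length of the longest stretch of consecutive random (N) positions in a UMI pattern."""
    runs = re.findall(r"N+", umi_pattern.upper())
    return max((len(run) for run in runs), default=0)


def umi_pattern_ok(umi_pattern: str, max_run: int = 4) -> bool:
    """True if every run of random bases is shorter than 5 nt (i.e. at most max_run)."""
    return longest_random_run(umi_pattern) <= max_run


def umi_diversity(umi_pattern: str) -> int:
    """Number of distinct UMIs the pattern can encode (fixed dU positions add none)."""
    return 4 ** umi_pattern.upper().count("N")


for pattern in ("NNNNUNNNNUNNNN", "NNNUNNNNN", "NNNNNNNN"):
    print(pattern, umi_pattern_ok(pattern), umi_diversity(pattern))
```

Under this rule the 12 random positions of `NNNNUNNNNUNNNN` still encode 4^12 (about 1.7 × 10^7) distinct UMIs while no run exceeds 4 nt.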
In one implementation of the present application, the PCR amplification enrichment uses at least 5 amplification cycles.
It should be noted that the purpose of PCR enrichment with the third and second primers is to exponentially amplify the amplicons of the parent strand tagged with a UMI by first-primer extension, yielding more amplicon strands that derive from the same template nucleic acid and carry the same UMI, which facilitates subsequent library construction and sequencing.
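The effect of exponential enrichment can be estimated with the standard idealized PCR model N = N0 × (1 + E)^n, where E is the per-cycle efficiency and n the number of cycles. The sketch below is an assumption-based estimate, not a protocol parameter from the application: it shows that 5 cycles give at most 32 copies per UMI-tagged molecule, and fewer when efficiency is below 100%.

```python
def expected_copies(start_molecules: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR model: N = N0 * (1 + E) ** n, with E the per-cycle efficiency (0..1)."""
    return start_molecules * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles for one UMI-tagged molecule to reach target_copies."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    copies, cycles = 1.0, 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(expected_copies(1, 5))                 # 32.0 with perfect doubling
print(round(expected_copies(1, 5, 0.9), 2))  # 24.76 at an assumed 90% efficiency
```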
In one embodiment of the present application, the first primer consists of 40 primers having the sequences shown in Seq ID No.1 to Seq ID No. 40.
It should be noted that the 40 primers shown in Seq ID No.1 to Seq ID No.40 are specific upstream primers designed independently for the V gene reference sequence of the human T lymphocyte receptor β chain, according to base complementary pairing and primer design principles and in combination with the required sites. These 40 primers cover 42 V functional regions and are mixed in equimolar amounts before use. It is understood that these 40 primers are only one set of upstream specific primers confirmed to be particularly suitable for this implementation; under the inventive concept of the present application, several bases may be added to or removed from them, or primer sequences may be redesigned according to primer design principles.
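Mixing the primers in equimolar amounts can be turned into pipetting volumes with C1·V1 = C2·V2. In this sketch the primer names, the stock concentrations, the 2 µM per-primer final concentration and the 100 µL pool volume are hypothetical; the application does not specify them.

```python
def equimolar_pool(stock_uM: dict, final_uM_each: float, total_uL: float) -> dict:
    """Volumes (uL) of each primer stock for an equimolar pool, via C1*V1 = C2*V2.

    The remaining volume is made up with water or buffer under the key "water".
    """
    volumes = {name: final_uM_each * total_uL / conc for name, conc in stock_uM.items()}
    used = sum(volumes.values())
    if used > total_uL:
        raise ValueError("stocks too dilute for the requested final concentration")
    volumes["water"] = total_uL - used
    return volumes


# Hypothetical stock concentrations (uM); the patent does not give them.
stocks = {"V_primer_01": 100.0, "V_primer_02": 50.0}
print(equimolar_pool(stocks, final_uM_each=2.0, total_uL=100.0))
# {'V_primer_01': 2.0, 'V_primer_02': 4.0, 'water': 94.0}
```

The same calculation applies unchanged to the 12 downstream J-gene primers.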
In one embodiment of the present application, the second primer consists of 12 primers having the sequences shown in Seq ID No.41 to Seq ID No. 52.
It should be noted that the 12 primers shown in Seq ID No.41 to Seq ID No.52 are specific downstream primers designed independently for the J gene reference sequence of the human T lymphocyte receptor β chain, according to base complementary pairing and primer design principles and in combination with the required sites. These 12 primers cover 6 J1 functional regions and 7 J2 functional regions and are mixed in equimolar amounts before use. It is understood that these 12 primers are only one set of downstream specific primers confirmed to be particularly suitable for this implementation; under the inventive concept of the present application, several bases may be added to or removed from them, or primer sequences may be redesigned according to primer design principles.
In one embodiment of the present application, the third primer is a sequence shown in Seq ID No. 53.
It should be noted that the third primer of the present application is all or part of the first primer's sequencing-platform upstream primer binding region, counted from its 5' end, i.e., a primer designed on the 3' end of the sequencing platform's primer sequence. For example, the primer shown in Seq ID No.53 was designed with reference to the 19 nt at the 3' end of P5 of the Illumina NovaSeq 6000 sequencing platform.
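Because the third primer is the 5'-end portion of the first primer's platform binding region with U restored to T, it can be derived mechanically from the first primer. The sketch below assumes a hypothetical 16 nt platform region and placeholder sequences; it does not reproduce Seq ID No.53, and `derive_third_primer` is an illustrative name.

```python
from typing import Optional


def derive_third_primer(first_primer: str, platform_len: int, length: Optional[int] = None) -> str:
    """Take `length` bases from the 5' end of the first primer's platform binding region
    (its first `platform_len` bases) and restore T wherever the first primer carries U."""
    if length is None:
        length = platform_len
    if not 0 < length <= platform_len:
        raise ValueError("third primer must lie within the platform binding region")
    return first_primer[:length].upper().replace("U", "T")


# Hypothetical first primer: 16 nt platform region + UMI pattern + target-specific part.
first = "GCAUUAGCCUAGGUCA" + "NNNNUNNNN" + "CUGACUUCAGGUAC"
print(derive_third_primer(first, platform_len=16))             # GCATTAGCCTAGGTCA
print(derive_third_primer(first, platform_len=16, length=10))  # GCATTAGCCT
```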
In one implementation of the present application, the sequencing library construction comprises the following steps:
purifying the PCR-enriched products, in which all amplicons derived from the same template carry the same unique identifier, to obtain a purified product; and amplifying the purified product with a fourth primer and a fifth primer to construct a sequencing library; the fourth primer is the sequencing platform's upstream sequencing primer carrying a sequencing adapter and a barcode, and the fifth primer is the sequencing platform's downstream sequencing primer carrying a sequencing adapter and a barcode.
It should be noted that the library construction method of the present application is in fact an amplification-based library construction on the products enriched by PCR with the third and second primers; that is, the target product is amplified and enriched again with the sequencing platform's upstream and downstream sequencing primers.
In one implementation of the present application, the purification is at least one of magnetic bead purification, column chromatography purification, and gel purification.
In one embodiment of the present application, the fourth primer is the sequence of Seq ID No. 54.
In one embodiment of the present application, the fifth primer is the sequence shown in Seq ID No. 55.
In the fourth primer (Seq ID No.54) and the fifth primer (Seq ID No.55), "NNNNNN" denotes the barcode, an index 6 to 10 nt long. For example, in one implementation of the present application, the "NNNNNN" of the fourth primer (Seq ID No.54) is "TGCGTAAT" and the "NNNNNN" of the fifth primer (Seq ID No.55) is "CCTAACCT".
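Dual barcodes like these are typically used to assign reads to samples after sequencing. The sketch below matches the two index reads of a read pair against a sample sheet, tolerating one mismatch per index and rejecting ambiguous hits; the second sample's barcodes, the mismatch tolerance and the function names are assumptions. Depending on the instrument and chemistry, an index read may be reported as the reverse complement of the barcode in the primer, so the orientation should be checked before matching.

```python
def within(read: str, barcode: str, max_mismatch: int) -> bool:
    """True if read and barcode have equal length and differ at no more than max_mismatch positions."""
    return len(read) == len(barcode) and sum(x != y for x, y in zip(read, barcode)) <= max_mismatch


def assign_sample(index_a: str, index_b: str, sample_sheet: dict, max_mismatch: int = 1):
    """Return the sample whose barcode pair matches both index reads, or None if none or ambiguous."""
    hits = [name for (bc_a, bc_b), name in sample_sheet.items()
            if within(index_a, bc_a, max_mismatch) and within(index_b, bc_b, max_mismatch)]
    return hits[0] if len(hits) == 1 else None


sample_sheet = {
    ("TGCGTAAT", "CCTAACCT"): "sample_1",  # fourth / fifth primer barcodes quoted in the text
    ("GATCAGCA", "AGGTCATC"): "sample_2",  # hypothetical second sample
}
print(assign_sample("TGCGTAAA", "CCTAACCT", sample_sheet))  # sample_1 (one mismatch)
```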
Another aspect of the application discloses a kit for sequencing the human immune repertoire, comprising a first primer, a second primer, a third primer and UDG/UNG enzyme. The first primer comprises, in order from the 5' end to the 3' end, a sequencing-platform upstream primer binding region, a unique identifier and a target-specific upstream primer sequence; in the first primer, the T bases in the sequencing-platform upstream primer binding region and in the target-specific upstream primer sequence are replaced by deoxyuracil, and the sequencing-platform upstream primer binding region corresponds to the 3' end of the sequencing platform's upstream sequencing primer. The second primer comprises, in order from the 5' end to the 3' end, a sequencing-platform downstream primer binding region and a target-specific downstream primer sequence, and the sequencing-platform downstream primer binding region corresponds to the 3' end of the sequencing platform's downstream sequencing primer. The third primer is all or part of the first primer's sequencing-platform upstream primer binding region, counted from its 5' end, and the T bases in the third primer are not replaced by deoxyuracil. The target-specific upstream and downstream primer sequences are specific primer sequences designed for the genes encoding the human T cell receptor that completely cover the sequence encoding the CDR3 region.
It should be noted that the human immune repertoire sequencing kit of the present application is in fact an assembly of the first primer, second primer, third primer and UDG/UNG enzyme used in the immune repertoire sequencing method of the present application, provided to facilitate implementation of that method. The definitions of the first, second and third primers in the kit therefore follow those given for the sequencing method. For example, the target-specific upstream primer sequence is a specific primer sequence designed for the V gene of the human T cell receptor β chain, and the target-specific downstream primer sequence is a specific primer sequence designed for the J gene of the human T cell receptor β chain; amplification with these upstream and downstream sequences completely covers the gene encoding the CDR3 region of the T cell receptor β chain. As another example, at least one deoxyuracil is inserted into the unique identifier sequence of the first primer, so that each run of consecutive unique-identifier bases is shorter than 5 nt.
It should also be noted that one of the keys of the present application is the design of the primer structures; the specific primer sequences can be determined according to the particular target sequence and sequencing platform. For example, the first primer can be assembled by using conventional primer design software to design its target-specific upstream primer sequence for the particular target sequence and its sequencing-platform upstream primer binding region for the intended sequencing platform.
In one implementation of the application, the first primer in the kit consists of the 40 primers shown in Seq ID No.1 to Seq ID No.40.
In one implementation, the second primer in the kit consists of the 12 primers shown in Seq ID No.41 to Seq ID No.52.
In one implementation of the application, the third primer in the kit is the sequence shown in Seq ID No.53.
In one implementation of the application, the kit further comprises a fourth primer and a fifth primer; the fourth primer is the sequencing platform's upstream sequencing primer carrying a sequencing adapter and a barcode, and the fifth primer is the sequencing platform's downstream sequencing primer carrying a sequencing adapter and a barcode.
It should be noted that the fourth and fifth primers of the present application are in fact designed on the sequencing platform's upstream and downstream sequencing primers; the platform's own upstream and downstream primers can also be used directly, provided they carry the same sequencing adapters and barcodes as in the present application. The fourth and fifth primers can therefore be included in the kit as needed; for convenience of use, the kit of the present application includes them.
In one embodiment of the present application, the fourth primer in the kit is the sequence shown in Seq ID No. 54.
In one implementation of the present application, the fifth primer in the kit is the sequence shown in Seq ID No. 55.
In one implementation of the present application, the kit of the present application further comprises a PCR amplification reagent.
It is understood that PCR amplification reagents, such as PCR reaction buffer and enzymes, may be included in the kit of the present application as needed, or commonly used PCR amplification reagents may be purchased separately.
Owing to the above technical solution, the application has the following beneficial effects:
With the human immune repertoire sequencing method and kit of the application, specific primers are designed for the human T cell receptor so that the CDR3 region of TCRs in the human immune repertoire is fully covered with high capture efficiency. Moreover, because the first, second and third primers are specially designed and added to the reaction system sequentially, each original template nucleic acid corresponds to only one UMI tag, so that the amplification bias at each target, PCR amplification errors and amplification errors introduced manually during library construction can be corrected. The method and kit label each original template nucleic acid and thereby enable quantitative detection of the copy number of the target gene.
Detailed Description
The human immune repertoire sequencing method of the application is an improvement on the IR-seq multiplex PCR amplification method. DNA or cDNA is used as the amplification template; for example, primers are designed on the V and J regions upstream and downstream of the TCR β chain CDR3 to specifically capture and enrich the CDR3 region of the human TCR, covering 42 V functional regions, 2 D functional regions, 6 J1 functional regions and 7 J2 functional regions of the human TCR β chain CDR3. On this basis, the library construction workflow is highly optimized to reduce the usual laborious experimental operations, and an amplicon-based library construction method is used. In addition, the second-step amplification primers are designed with dual-end index tags to ensure data accuracy. Most importantly, the method uses specially designed first, second and third primers that are added to the reaction system sequentially, inserting a unique UMI tag into the complementary strand of each parent-strand molecule, so that each original template nucleic acid corresponds to only one UMI, the amplification bias at each target is corrected, and the copy number of the target gene can be quantified.
The method for sequencing the human immune repertoire comprises the following steps:
preparing a reaction system and performing a single extension of the template nucleic acid with a first primer to obtain a complementary strand; the first primer comprises, from the 5' end to the 3' end, a sequencing-platform upstream primer binding region, a unique identifier and a target-specific upstream primer sequence; in the first primer, every base T in the sequencing-platform upstream primer binding region and in the target-specific upstream primer sequence is replaced by deoxyuracil, and the sequencing-platform upstream primer binding region corresponds to the 3' end of the sequencing platform's upstream sequencing primer;
after extension with the first primer, adding a second primer to the reaction system and performing a single extension of the complementary strand extended from the first primer, to obtain a product consisting of the sequencing-platform upstream primer binding region, the unique identifier, the target sequence and a sequencing-platform downstream primer binding region; the second primer comprises, from the 5' end to the 3' end, the sequencing-platform downstream primer binding region and a target-specific downstream primer sequence, and the sequencing-platform downstream primer binding region corresponds to the 3' end of the sequencing platform's downstream sequencing primer;
after extension with the second primer is complete, adding UDG/UNG enzyme to the reaction system to digest deoxyuracil, thereby degrading the first primer and the strands extended from it;
after UDG/UNG digestion is complete, adding a third primer to the reaction system and using the third primer together with the second primer to amplify and enrich, by PCR, the product extended from the second primer, so that all amplicons derived from a given template nucleic acid carry the same unique identifier; the third primer consists of all or part of the sequencing-platform upstream primer binding region of the first primer, taken from its 5' end, and the T bases in the third primer are not replaced by deoxyuracil;
using the amplicons obtained by PCR enrichment, in which all amplicons from the same template carry the same unique identifier, to construct a sequencing library and sequencing it, thereby completing the sequencing of the human immune repertoire;
wherein the target-specific upstream primer sequence and the target-specific downstream primer sequence are designed against the genes encoding the human T cell receptor and are specific primer sequences that completely cover the coding sequence of the CDR3 region.
The general principle of the method is as follows. A dedicated UMI-containing primer is designed, for example in combination with the target region sequence of the human T cell receptor β chain, and is used to extend a complementary strand that carries the UMI. Unused UMI-containing primers are then removed by UDG/UNG digestion, so that the extended strand of each template molecule carries a unique UMI. The target region is then amplified and enriched from the complementary strand, corresponding sequencing primers are designed, and a Barcode/Index and sequencing adapters are added to the enriched product to complete library construction.
The method is broadly applicable: it can be used for TCR sequencing from a DNA starting template or for multiplex PCR amplification from cDNA synthesized from RNA, and it offers high specificity and high sensitivity. Because excess UMI-bearing primers are removed enzymatically, all molecules carrying the same UMI are guaranteed to derive from the same template, giving truly unique molecular identifiers. This corrects false positives caused by PCR errors or human handling, corrects the bias arising from uneven amplification efficiency in multiplex PCR, and preserves the fidelity of the data.
The sequencing method of the human immune repertoire has the following advantages:
1. The primers are independently designed and allow rapid capture and amplification from human genomic DNA or cDNA; for example, they cover the functional regions of the TCR β chain CDR3 of the human immune repertoire with high capture efficiency.
2. Some TCR products currently on the market use agarose gel electrophoresis to size-select and purify the capture region; this is cumbersome and error-prone, and gel purification easily causes nucleic acid loss, distorting the experimental results. In one implementation of the present application, the amplicons obtained by PCR enrichment, in which amplicons from the same template carry the same unique identifier, are purified directly and used for library construction and sequencing; for example, DNA purification and fragment size selection are performed in a single step with AMPure magnetic beads at a specific ratio, which reduces the number of steps and DNA loss and improves the fidelity of the final results.
3. The library construction process for the human immune repertoire is efficiently integrated with the Illumina high-throughput sequencing platform; with the independently designed primers, rapid two-step PCR library construction is possible and the experimental time can be reduced to under 4 hours.
4. With the specially designed primers, UMIs are introduced into immune repertoire sequencing, correcting both the amplification bias and the amplification errors at each target and enabling quantitative detection of the copy number of the target gene.
The first to fifth primers used in the method are designed as follows:
The first primer carries the UMI and consists of three parts: the first part is a 15-25 nt fixed sequence corresponding to the 3' end of the sequencing platform's upstream sequencing primer; the second part is a random sequence of 6-8 N bases, i.e., the UMI; and the third part is the target-specific upstream primer sequence. The parts are joined in the order 5'-first part-second part-third part-3'.
In the first and third parts, T (thymine) bases are replaced by dU (deoxyuracil). The 15-25 nt fixed sequence of the first part can be based on the 3'-terminal sequence of the upstream adapter of the sequencing platform used; for example, based on the 3'-terminal 19 nt of the P5 end of the Illumina NovaSeq 6000 platform, it is designed as the sequence shown in Seq ID No.56,
Seq ID No.56: 5’-CACGACGCUCUUCCGAUCU-3’.
Further, to minimize the risk of non-specific amplification caused by the random UMI in later amplification steps, one or more fixed deoxyuracils may be inserted within the random N bases of the second part, so that fewer than 5 consecutive N bases (i.e., at most 4) lie on either side of each deoxyuracil. For example, the second part can be 5'-NNNNUNNNN-3', i.e., eight random bases split by one fixed dU, which is the layout used in Seq ID No.1 (random N at positions 20-23 and 25-28, dU at position 24); other arrangements meeting the same constraint may also be used.
Furthermore, the third part is the target-specific upstream primer sequence. The target gene can be looked up in databases such as NCBI, and upstream primers can be designed independently for the required sites following base-pairing and primer design principles; for multiple targets, several highly specific PCR primers can be designed and used as a mixture. For example, in one implementation of the present application, the V gene reference sequences of the human T lymphocyte receptor β chain are retrieved from standard databases such as NCBI and IMGT, and upstream primers are designed independently for the required sites following base-pairing and primer design principles. A total of 40 target-specific upstream primer sequences are designed, covering 42 V functional regions in total, and all of these primers are mixed in equimolar amounts.
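The three-part layout of the first primer can be checked programmatically. The sketch below is illustrative only: it assumes the layout seen in Seq ID No.1 of the sequence listing (a 19-nt fixed part, then a 9-character UMI segment with N at positions 20-23 and 25-28 and a fixed U at position 24), and the helper names are hypothetical rather than part of the application.

```python
# Sketch under stated assumptions: the 19-nt fixed part and the 9-character UMI
# layout follow Seq ID No.1; other designs in the 15-25 nt / 6-8 N range differ.
FIXED_LEN = 19   # first part: adapter-binding region (Seq ID No.56)
UMI_LEN = 9      # second part: 8 random N split by one fixed dU
MAX_N_RUN = 4    # fewer than 5 consecutive N on either side of a dU


def split_first_primer(seq: str, fixed_len: int = FIXED_LEN, umi_len: int = UMI_LEN):
    """Split a first-primer sequence (5'->3') into fixed part, UMI and target part."""
    s = seq.replace(" ", "").upper()
    return s[:fixed_len], s[fixed_len:fixed_len + umi_len], s[fixed_len + umi_len:]


def umi_is_valid(umi: str, max_run: int = MAX_N_RUN) -> bool:
    """True if the UMI contains only N/U and no run of N is longer than max_run."""
    if not umi or set(umi) - {"N", "U"}:
        return False
    return max(len(run) for run in umi.split("U")) <= max_run


def umi_diversity(umi: str) -> int:
    """Number of distinct tags: each random N position can be A, C, G or T."""
    return 4 ** umi.count("N")


seq1 = "cacgacgcuc uuccgaucun nnnunnnncg gcauacgaac ugucauuaug"  # start of Seq ID No.1
fixed, umi, target = split_first_primer(seq1)
print(fixed)                                        # CACGACGCUCUUCCGAUCU
print(umi, umi_is_valid(umi), umi_diversity(umi))   # NNNNUNNNN True 65536
```

With eight random positions the tag space is 4^8 = 65,536 combinations; an uninterrupted run such as "NNNNNNNN" fails the check because it exceeds four consecutive N.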
The second primer consists of two parts: the first part is a target-specific downstream primer sequence, and the second part is a 15-25 nt fixed sequence corresponding to the complement of the 3' end of the sequencing platform's downstream sequencing primer. The parts are joined in the order 5'-second part-first part-3'.
Furthermore, the first part of the second primer is the target-specific downstream primer sequence. The target gene can be looked up in databases such as NCBI, and downstream primers can be designed independently for the required sites following base-pairing and primer design principles; for multiple targets, several highly specific PCR primers can be designed and used as a mixture. For example, in one implementation of the present application, the J gene reference sequences of the human T lymphocyte receptor β chain are retrieved from standard databases such as NCBI and IMGT, and downstream primers are designed independently for the required sites. A total of 12 target-specific downstream primer sequences are designed, covering 6 J1 and 7 J2 functional regions (13 functional regions in total), and all of these primers are mixed in equimolar amounts.
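Equimolar mixing of a primer pool comes down to taking a volume of each stock inversely proportional to its concentration. The sketch below assumes stocks of differing concentration; the primer names and concentrations are hypothetical, and it relies only on the unit identity 1 μM = 1 pmol/μL.

```python
def equimolar_volumes(stocks_uM: dict[str, float], pmol_each: float) -> dict[str, float]:
    """Volume (uL) of each primer stock that delivers pmol_each picomoles.

    Uses 1 uM = 1 pmol/uL, so volume = amount / concentration.
    """
    if pmol_each <= 0:
        raise ValueError("pmol_each must be positive")
    volumes = {}
    for name, conc in stocks_uM.items():
        if conc <= 0:
            raise ValueError(f"stock concentration of {name} must be positive")
        volumes[name] = pmol_each / conc
    return volumes


def pooled_conc_uM(volumes_uL: dict[str, float], pmol_each: float) -> float:
    """Concentration of each primer in the pool if no diluent is added."""
    return pmol_each / sum(volumes_uL.values())


# Hypothetical stock concentrations for three J-region primers
stocks = {"J_primer_a": 100.0, "J_primer_b": 50.0, "J_primer_c": 100.0}
vols = equimolar_volumes(stocks, pmol_each=500.0)
print(vols)                         # {'J_primer_a': 5.0, 'J_primer_b': 10.0, 'J_primer_c': 5.0}
print(pooled_conc_uM(vols, 500.0))  # 25.0
```

The same helper applies to the 40-primer upstream pool; the pool can then be diluted to the working concentration described in the workflow.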
The 15-25 nt second part of the second primer can be based on the 3'-terminal sequence of the downstream adapter of the sequencing platform used; for example, based on the complement of the 3'-terminal 21 nt of the P7 end of the Illumina NovaSeq 6000 platform, it is designed as the sequence shown in Seq ID No.57,
Seq ID No.57: 5’-AGACGTGTGCTCTTCCGATCT-3’.
The third primer has the same sequence as the 15-25 nt fixed sequence of the first primer, except that its T bases must not be replaced by deoxyuracil. For example, based on the 3'-terminal 19 nt of the P5 end of the Illumina NovaSeq 6000 platform, it is designed as the sequence shown in Seq ID No.53,
Seq ID No.53: 5’-CACGACGCTCTTCCGATCT-3’.
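The relationship between the first primer's fixed part and the third primer can be expressed as a simple substitution: converting every dU back to T in Seq ID No.56 should give Seq ID No.53. The short check below is an illustrative sketch with hypothetical helper names; it also encodes the requirement that the third primer contain no dU, so that it is not degraded by UDG/UNG.

```python
SEQ56 = "CACGACGCUCUUCCGAUCU"  # first-primer fixed part, dU in place of T
SEQ53 = "CACGACGCTCTTCCGATCT"  # third primer, ordinary T


def du_to_t(seq: str) -> str:
    """Replace deoxyuracil (U) with thymine (T) to get the plain-DNA equivalent."""
    return seq.upper().replace("U", "T")


def is_du_free(seq: str) -> bool:
    """A primer that must survive UDG/UNG digestion must contain no U."""
    return "U" not in seq.upper()


print(du_to_t(SEQ56) == SEQ53)                      # True
print(is_du_free(SEQ53), is_du_free(SEQ56))         # True False
```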
The fourth primer is the complete upstream sequencing adapter sequence carrying a Barcode (Index); the Index can be 6-10 nt long. For example, based on the P5 sequence of the Illumina NovaSeq 6000 platform, it is designed as the sequence shown in Seq ID No.54,
Seq ID No.54:
5’-AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’.
the "NNNNNN" sequence of the fourth primer of the sequence shown in Seq ID No.54 is the Index sequence.
The fifth primer is the complement of the downstream sequencing adapter sequence carrying a Barcode (Index); the Index can be 6-10 nt long. For example, based on the P7 sequence of the Illumina NovaSeq 6000 platform, it is designed as the sequence shown in Seq ID No.55,
Seq ID No.55:
5’-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’.
the "NNNNNN" sequence of the fifth primer of the sequence shown in Seq ID No.55 is the Index sequence.
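The fourth and fifth primers only work if their 3' ends match the adapter-binding sequences already added in the earlier steps: the 3' end of Seq ID No.54 equals the third primer (Seq ID No.53), and the 3' end of Seq ID No.55 equals the fixed 5' part of the second primer (Seq ID No.57). The sketch below fills in the Index placeholder and checks both overlaps. The helper name is hypothetical; the index values are those given in the Example.

```python
SEQ54 = "AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT"
SEQ55 = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
SEQ53 = "CACGACGCTCTTCCGATCT"    # third primer
SEQ57 = "AGACGTGTGCTCTTCCGATCT"  # fixed 5' part of the second primer


def insert_index(template: str, index: str, placeholder: str = "NNNNNN") -> str:
    """Replace the single Index placeholder with a concrete 6-10 nt index."""
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one index placeholder")
    if not 6 <= len(index) <= 10 or set(index) - set("ACGT"):
        raise ValueError("index must be 6-10 nt of A/C/G/T")
    return template.replace(placeholder, index)


p4 = insert_index(SEQ54, "TGCGTAAT")   # fourth primer index from the Example
p5 = insert_index(SEQ55, "CCTAAACCT")  # fifth primer index from the Example

# The 3' ends must overlap the sequences added by the third and second primers.
print(p4.endswith(SEQ53), p5.endswith(SEQ57))  # True True
```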
Based on the first to fifth primers, the workflow for rapidly adding UMIs and completing library preparation is as follows:
1. Extend the template nucleic acid with the first primer to obtain a complementary strand and introduce the UMI tag.
2. Add the second primer to fully extend the complementary strand from step 1.
3. Digest deoxyuracil with UDG/UNG enzyme, which removes the first primer.
4. Add the third primer to amplify and enrich the product of step 2, then purify the amplification product to remove reaction components, excess primers and genomic DNA contamination.
5. Perform library amplification with the fourth and fifth primers to add the Barcode and sequencing adapters, completing library construction.
Specifically, the technical process is described in detail as follows:
the first step is as follows: introduction of UMI tag, template complementary strand extension:
The template nucleic acid containing the target region, e.g., DNA or cDNA, is used in a total amount of 1-100 ng; genomic DNA is the preferred template, preferably 100 ng.
Preparing the extension system: a commercially available or in-house PCR amplification kit can be used; its main components may include, but are not limited to, DNA polymerase, Mg2+ ions, dNTPs and a buffer system.
If the experiment is designed as a multiplex PCR reaction, it is preferred to select a commercially available or self-developed multiplex PCR amplification kit.
The first primer is added to the prepared extension system at a working concentration of 50-500 nM, preferably 200 nM, i.e., each primer is at 200 nM.
The prepared template DNA/cDNA is added to the extension system and mixed thoroughly, and the complementary strand of the template is extended. Reaction program parameters should follow the PCR amplification kit instructions. Note that the extension time must be longer than "target region length / extension speed" so that the target region is fully extended, and that the number of PCR cycles is either not set or set to 1, i.e., a single extension is performed without repeated denaturation, annealing and extension.
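The extension-time rule above ("longer than target region length / extension speed") can be turned into a quick check. This is a sketch under assumptions: the 300 bp amplicon length and the 1000 bp/min polymerase speed below are placeholders for illustration, not values from this application, and the 1.2 safety margin is an arbitrary choice; the real speed should be taken from the polymerase or kit documentation.

```python
import math


def min_extension_seconds(target_len_bp: int, rate_bp_per_min: float, margin: float = 1.2) -> int:
    """Shortest extension time (s) covering length / speed, with a safety margin.

    rate_bp_per_min is an input taken from the enzyme/kit documentation; it is
    not a value stated in this application.
    """
    if target_len_bp <= 0 or rate_bp_per_min <= 0 or margin < 1:
        raise ValueError("length and rate must be positive and margin >= 1")
    return math.ceil(target_len_bp * 60 / rate_bp_per_min * margin)


# Hypothetical inputs: a 300 bp amplicon and an assumed speed of 1000 bp/min
t = min_extension_seconds(300, 1000)
print(t, t <= 90)  # 22 True; the Example uses 90 s at 72 C
```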
After the reaction is completed, the obtained product strand is the template complementary strand to which the UMI has been added.
The second step: complementary extension of template complementary strand
The second primer is added to the reaction product of the first step at a working concentration of 50-500 nM, preferably 200 nM, i.e., each primer is at 200 nM.
After thorough mixing, the reaction is run with the same PCR program as in the first step.
After the reaction, the product strand is the UMI-tagged library fragment, whose sequence matches the target fragment on the template.
The third step: digestion of deoxyuracils using the UDG/UNG enzyme
A heat-labile UDG/UNG enzyme is prepared; it may be commercially available or prepared in-house.
The reaction product of the second step is taken and the prepared heat-labile UDG/UNG enzyme is added; the amount is adjusted according to enzyme activity and digestion efficiency, typically 1 μL when the activity exceeds 1 U/μL. After thorough mixing, all deoxyuracil-containing sequences in the system are digested at the enzyme's optimal reaction temperature and conditions.
The purpose of this step is to digest the unused first primer remaining in the system and the strands extended from the first primer in the first step. The resulting product therefore contains only the original DNA template and the extended product strands from the second step, each carrying a unique UMI tag.
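The purpose of the digestion step can be illustrated with a toy model. The sketch below is a simplification written for this explanation, not part of the application: strands are reduced to a type, a template index, a UMI label and a flag for deoxyuracil content, and each template is assumed to be copied exactly once in the single first-primer extension. After the UDG/UNG filter, only the original templates and the dU-free second-primer copies remain, and each template is represented by exactly one UMI; because no dU-containing first primer survives, the third-primer PCR cannot attach new UMIs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    kind: str         # "template", "first_primer", "first_ext" or "second_ext"
    template_id: int  # -1 for free primers that never extended
    umi: str          # tag label; "" for strands without a UMI
    has_du: bool      # True if the strand contains deoxyuracil


def random_umi(rng: random.Random) -> str:
    """Toy UMI with the Seq ID No.1 layout: 4 random bases, a fixed U, 4 random bases."""
    def half() -> str:
        return "".join(rng.choice("ACGT") for _ in range(4))
    return half() + "U" + half()


def simulate_steps_1_to_3(n_templates: int, n_leftover_primers: int, seed: int = 0) -> list[Strand]:
    rng = random.Random(seed)
    templates = [Strand("template", i, "", False) for i in range(n_templates)]
    # Step 1: single extension; each template is copied once by one first primer,
    # so the copy carries the primer's dU-containing 5' end and one UMI.
    first_ext = [Strand("first_ext", i, random_umi(rng), True) for i in range(n_templates)]
    # Unextended first primers left over in the tube (also dU-containing).
    leftovers = [Strand("first_primer", -1, random_umi(rng), True) for _ in range(n_leftover_primers)]
    # Step 2: the second primer copies each first-strand product once. The copy is
    # built from ordinary dNTPs, so it carries (the complement of) the UMI but no dU.
    second_ext = [Strand("second_ext", s.template_id, s.umi, False) for s in first_ext]
    # Step 3: UDG/UNG removes every dU-containing strand.
    pool = templates + first_ext + leftovers + second_ext
    return [s for s in pool if not s.has_du]


survivors = simulate_steps_1_to_3(n_templates=5, n_leftover_primers=100)
umis_per_template: dict[int, set[str]] = {}
for s in survivors:
    if s.kind == "second_ext":
        umis_per_template.setdefault(s.template_id, set()).add(s.umi)
print(len(survivors))                                         # 10: 5 templates + 5 tagged copies
print(all(len(u) == 1 for u in umis_per_template.values()))   # True
```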
The fourth step: enrichment by specific amplification
The third primer is added to the product of the third step at a working concentration of 50-500 nM, preferably 200 nM.
After thorough mixing, PCR amplification is performed. Reaction program parameters should follow the PCR amplification kit instructions. The number of PCR cycles can be set according to project requirements and kit performance, preferably more than 5 cycles.
After amplification, all products are collected and purified to obtain a clean, high-purity product for the subsequent library amplification. Purification can use, but is not limited to, magnetic beads, column chromatography or gel methods.
The fifth step: library construction and amplification
The purified product from the fourth step is mixed thoroughly with the amplification system and the fourth and fifth primers. The working concentration of the fourth and fifth primers can be 200-2000 nM, preferably 1500 nM.
The reaction program parameters should be set with reference to the PCR amplification kit instructions. The PCR cycle number can be set individually according to project requirements and kit performance.
After the reaction, the product is a sequencing-ready library with complete adapter information. A high-purity library is obtained by nucleic acid purification and, after quality control and quantification, can be loaded onto the sequencer.
The present application will be described in further detail with reference to specific examples. The following examples are intended to be illustrative of the present application only and should not be construed as limiting the present application.
Examples
Following the method and design principles above, the first to fifth primers for sequencing the human immune repertoire were designed for this experiment. In this example, UMIs were introduced into human immune repertoire sequencing to correct preferential amplification and amplification errors; using genomic DNA as the template, 40 upstream V-region primers and 12 downstream J-region primers were designed to cover all subtypes in this region. Multiplex PCR amplification of the TCR β chain CDR3 region with the first and second primers designed in this example removes the effects of preferential amplification and accumulated amplification errors, giving results closer to the true repertoire. The specific steps are as follows:
A genomic DNA sample extracted from human peripheral blood was used; it was provided and stored by Shenzhen HaploX Biotechnology Co., Ltd.
The first to fifth primers were designed according to the principles above:
First primer: in this example, the V gene reference sequences of the human T lymphocyte receptor β chain were retrieved from standard databases such as NCBI and IMGT, and upstream primers were designed independently for the required sites following base-pairing and primer design principles. A total of 40 target-specific upstream primer sequences were designed, covering 42 V functional regions, and all primers were mixed in equimolar amounts. The first primer in this example consists of 40 primers with the sequences shown in Seq ID No.1 to Seq ID No.40, as listed in Table 1.
Second primer: in this example, the J gene reference sequences of the human T lymphocyte receptor β chain were retrieved from standard databases such as NCBI and IMGT, and downstream primers were designed independently for the required sites following base-pairing and primer design principles. A total of 12 target-specific downstream primer sequences were designed, covering 6 J1 and 7 J2 functional regions, and all primers were mixed in equimolar amounts. The second primer in this example consists of 12 primers with the sequences shown in Seq ID No.41 to Seq ID No.52, as listed in Table 1.
TABLE 1 first and second primers
The third primer is a sequence shown in Seq ID No.53,
Seq ID No.53: 5’-CACGACGCTCTTCCGATCT-3’.
the fourth primer is a sequence shown as Seq ID No.54,
Seq ID No.54:
5’-AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’;
"NNNNNN" of the fourth primer having the sequence shown in Seq ID No.54 is specifically "TGCGTAAT".
The fifth primer is a sequence shown as Seq ID No.55,
Seq ID No.55:
5’-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’;
the fifth primer having the sequence shown in Seq ID No.55, wherein "NNNNNN" is specifically "CCTAAACCT".
After synthesis, the primers were diluted with TE buffer to 5 μM for the first, second and third primers and to 30 μM for the fourth and fifth primers.
100 ng of the sample genomic DNA was amplified using the QIAGEN Multiplex PCR Kit.
The reaction system was prepared by adding the following to a new 0.2 mL tube in order: 4.5 μL first primer, 100 ng genomic DNA, 25 μL PCR Master Mix and 5 μL Q-Solution, made up to 45 μL with nuclease-free (NF) water.
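The volume of water needed to reach 45 μL depends on the concentration of the DNA extract, which is not stated. A minimal sketch of the volume bookkeeping, assuming a hypothetical extract concentration of 20 ng/μL (so 100 ng is 5 μL); the helper name is also hypothetical.

```python
def water_to_add(total_uL: float, components_uL: dict[str, float]) -> float:
    """Volume of nuclease-free water needed to bring the mix to total_uL."""
    used = sum(components_uL.values())
    if used > total_uL:
        raise ValueError(f"components already use {used} uL (> {total_uL} uL)")
    return total_uL - used


dna_ng_per_uL = 20.0  # hypothetical concentration of the genomic DNA extract
components = {
    "first primer (5 uM)": 4.5,
    "genomic DNA, 100 ng": 100.0 / dna_ng_per_uL,
    "PCR Master Mix": 25.0,
    "Q-Solution": 5.0,
}
water = water_to_add(45.0, components)
print(water)  # 5.5
```

With these inputs 5.5 μL of water is needed; the later addition of 5 μL of second primer brings the reaction to 50 μL.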
The 0.2 mL tube containing the sample and reagents was mixed gently and placed in a BIORAD T100 PCR instrument with the following program: 95 °C for 15 min; one cycle of 94 °C for 30 s, 60 °C for 90 s and 72 °C for 90 s; final extension at 72 °C for 5 min; hold at 4 °C.
After the reaction, the product was taken out, 5 μL of the second primer was added and mixed, and PCR was run on the BIORAD T100 PCR instrument with the following program: 95 °C for 15 min; one cycle of 94 °C for 30 s, 60 °C for 90 s and 72 °C for 90 s; final extension at 72 °C for 5 min; hold at 4 °C.
After the reaction, 1 μL of heat-labile UDG enzyme (Heat-labile UDG, Vazyme) was added to the product and mixed, and the mixture was incubated as follows: digestion at 25 °C for 10 min, inactivation at 55 °C for 5 min and then at 95 °C for 5 min, hold at 4 °C.
After the reaction, 5 μL of the third primer was added to the product and mixed, and PCR was run on the BIORAD T100 PCR instrument with the following program: 95 °C for 15 min; 30 cycles of 94 °C for 30 s, 56 °C for 90 s and 72 °C for 90 s; final extension at 72 °C for 5 min; hold at 4 °C.
After the reaction, the product was purified with magnetic beads as follows:
1. Purification of the multiplex PCR product with 1.2× AMPure XP beads: add 50 μL of the multiplex PCR product and 60 μL of well-mixed AMPure XP beads to a new 1.5 mL tube, vortex to mix, and leave at room temperature for 10 min so that the DNA binds fully to the beads. Place the tube on a magnetic rack until the solution is clear, then carefully remove the supernatant.
2. Add 500 μL of 80% ethanol and rotate the tube by 180° so that the beads pass through the solution and are drawn to the opposite wall; repeat the rotation 2-3 times, let stand for 15 s, then remove the supernatant.
3. Repeat step 2 once.
4. Let the 1.5 mL tube stand until the ethanol has fully evaporated, then add 20 μL of nuclease-free water and mix thoroughly. Place the tube on the magnetic rack until the solution is clear, then carefully transfer the supernatant to a new 0.2 mL tube to obtain the purified product.
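The bead clean-ups in this example differ only in the bead-to-sample ratio (1.2× here, 1× after library amplification), with two 500 μL washes of 80% ethanol each time. A small sketch of that arithmetic follows; the function name is hypothetical. As general background for SPRI-type beads such as AMPure XP, a lower bead ratio tends to exclude more short fragments.

```python
def cleanup_volumes(sample_uL: float, bead_ratio: float,
                    wash_uL: float = 500.0, n_washes: int = 2) -> dict[str, float]:
    """Bead and 80% ethanol volumes for one bead clean-up."""
    if sample_uL <= 0 or bead_ratio <= 0:
        raise ValueError("sample volume and bead ratio must be positive")
    return {
        "beads_uL": round(sample_uL * bead_ratio, 2),
        "ethanol_80pct_uL": wash_uL * n_washes,
    }


print(cleanup_volumes(50, 1.2))  # {'beads_uL': 60.0, 'ethanol_80pct_uL': 1000.0}
print(cleanup_volumes(50, 1.0))  # {'beads_uL': 50.0, 'ethanol_80pct_uL': 1000.0}
```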
Sequencing library construction: the fourth primer, the fifth primer, the amplification reagent and the purified product were mixed for library amplification using KAPA HiFi HotStart ReadyMix in the following proportions: 25 μL 2× KAPA HiFi HotStart ReadyMix, 2.5 μL fourth primer, 2.5 μL fifth primer and 20 μL purified product, made up to 50 μL with NF water.
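Working concentrations in the Example follow from the standard dilution relation C_final = C_stock × V_added / V_total. Applied to the volumes above, each library primer (30 μM stock, 2.5 μL in 50 μL) ends at 1.5 μM, i.e. 1500 nM, which matches the preferred value of 1500 given for the fifth step when read in nanomolar; the first-primer pool (5 μM stock, 4.5 μL in 45 μL) ends at 0.5 μM (500 nM). The function name in this sketch is hypothetical.

```python
def final_conc(stock: float, volume_added: float, total_volume: float) -> float:
    """C_final = C_stock * V_added / V_total (units of the stock are preserved)."""
    if total_volume <= 0 or volume_added < 0 or volume_added > total_volume:
        raise ValueError("invalid volumes")
    return stock * volume_added / total_volume


# Fourth/fifth primer in the library PCR: 30 uM stock, 2.5 uL in 50 uL
c45_nM = final_conc(30.0, 2.5, 50.0) * 1000
# First-primer pool in the first extension: 5 uM stock, 4.5 uL in 45 uL
c1_nM = final_conc(5.0, 4.5, 45.0) * 1000
print(c45_nM, c1_nM)  # 1500.0 500.0
```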
After thorough mixing, the reaction was run with the following program: 98 °C for 45 s; 5 cycles of 98 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s; final extension at 72 °C for 1 min; hold at 4 °C.
When the program finished, 50 μL of amplified PCR product was obtained and purified with 1× AMPure XP beads:
Add 50 μL of the PCR product and 50 μL of well-mixed AMPure XP beads to a new 1.5 mL tube, vortex to mix, and leave at room temperature for 10 min so that the DNA binds fully to the beads. Place the tube on a magnetic rack until the solution is clear, then carefully remove the supernatant.
Add 500 μL of 80% ethanol and rotate the tube by 180° so that the beads pass through the solution and are drawn to the opposite wall; repeat the rotation 2-3 times, let stand for 15 s, then remove the supernatant. Repeat this step once.
Let the 1.5 mL tube stand until the ethanol has fully evaporated, then add 20 μL of Nuclease-Free Water and mix thoroughly. Place the tube on the magnetic rack until the solution is clear, carefully transfer the supernatant, then label and store it; this completes construction of the amplicon library.
High-throughput sequencing: the library was sequenced on an Illumina NovaSeq 6000 system in paired-end mode PE151+8+8+151 (151-cycle read 1, two 8-cycle index reads, 151-cycle read 2).
Results:
The raw sequencing data were subjected to routine quality-control filtering and sequencing-depth threshold filtering, and the UMI sequences were analyzed. Two analyses were performed in this example. In the first, used as comparison data, the UMIs were ignored and not filtered, and the proportion of each functional region in the upstream V region and the downstream J region, diversity, clonality index and related metrics were analyzed directly. In the second, the UMI sequences were filtered: reads sharing a UMI and CDR3 sequence were considered to derive from the same parent template and counted as one copy; all CDR3 sequences with the same UMI were compared to check whether any read carried one or a few sporadic base differences, and such differing sequences were filtered out to correct errors introduced by amplification and handling. The proportion of each functional region in the upstream V region and the downstream J region, diversity, clonality index and related metrics were then analyzed. The results are shown in Table 2.
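The second analysis can be sketched as follows, under simplifying assumptions: each read is reduced to a (UMI, CDR3) pair, the most frequent CDR3 within a UMI group is taken as that template's sequence, and minority variants are discarded as errors. This is a toy illustration, not the pipeline used to produce Table 2; real data would also need quality filtering and handling of sequencing errors in the UMI itself, which the sketch ignores. The UMI and CDR3 strings are invented examples.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Count template molecules per CDR3 from (umi, cdr3) read pairs.

    Reads sharing a UMI are treated as copies of one parent template. Within each
    UMI group the most frequent CDR3 is kept (ties go to the first-seen sequence);
    minority variants with sporadic base differences are discarded as errors.
    """
    groups = defaultdict(Counter)
    for umi, cdr3 in reads:
        groups[umi][cdr3] += 1
    molecules = Counter()
    for variants in groups.values():
        consensus, _ = variants.most_common(1)[0]
        molecules[consensus] += 1
    return molecules


reads = (
    [("AAAACCCC", "CASSLGQGYEQYF")] * 3
    + [("AAAACCCC", "CASSLGQGYEQYS")]   # sporadic error within the same UMI
    + [("GGGGTTTT", "CASSLGQGYEQYF")]
    + [("TTTTAAAA", "CASRPGTEAFF")] * 2
)
unfiltered = Counter(cdr3 for _, cdr3 in reads)  # first analysis: UMIs ignored
filtered = collapse_by_umi(reads)                # second analysis: UMI-aware
print(len(unfiltered), dict(unfiltered))
print(len(filtered), dict(filtered))
```

In this toy input the unfiltered count reports three unique sequences, including the error variant, whereas the UMI-aware count reports two sequences counted in template molecules rather than reads.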
TABLE 2 Comparative analysis of the effect of UMI filtering on human immune repertoire sequencing results
Metric | Unfiltered UMI | Filtered UMI
Total reads | 12503452 | 12503452
Bases with error rate < 0.1% (%) | 98 | 98
Total clonotype reads | 6559378 | 6559378
Captured reads (%) | 98 | 98
Uncaptured reads (%) | 2 | 2
Randomly subsampled raw reads | 1000000 | 1000000
Unique clonotype amino acid sequences | 30456 | 20903
Highest-frequency amino acid sequence (%) | 0.13 | 0.04
Shannon index | 11.2 | 10.2
In Table 2, "Unfiltered UMI" is the first analysis method and "Filtered UMI" the second. "Total reads" is the total number of reads in the raw data; "Bases with error rate < 0.1%" is the proportion of bases whose error rate is below 0.1%; "Captured reads" and "Uncaptured reads" are the proportions of reads that do and do not cover the CDR3 region; "Randomly subsampled raw reads" is the number of raw reads randomly sampled for analysis. The higher the number of unique clonotype amino acid sequences and the higher the Shannon index, the higher the diversity of the immune repertoire.
The results show that without UMI filtering, preferential amplification cannot be removed and the share of the highest-frequency amino acid sequence is falsely high (0.13% versus 0.04%); UMI filtering removes preferential amplification and more faithfully reflects the share of the most abundant CDR3. In addition, both the number of unique clonotype amino acid sequences and the Shannon index are higher in the unfiltered data. This is because many amplification errors accumulate at high cycle numbers, and without UMIs it is difficult to tell whether a sequence comes from the template or is an error introduced during the experiment; these erroneous sequences enter the analysis and inflate the apparent uniqueness and diversity of CDR3, making both metrics falsely high. The UMI-filtered data are closer to the true situation: inaccurate quantification caused by amplification bias and inflated metrics caused by amplification or handling errors are corrected, improving data reliability.
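Two of the Table 2 metrics, the Shannon index and the share of the highest-frequency sequence, can be computed from clonotype counts as sketched below. The application does not state the logarithm base used for the Shannon index, so base 2 here is an assumption, and the counts are toy values chosen to mirror the two effects just described: spurious error-derived clonotypes raise the Shannon index, and preferential amplification inflates the top-clone share in read counts but not in UMI (molecule) counts.

```python
import math


def shannon_index(counts, base: float = 2.0) -> float:
    """H = -sum(p_i * log(p_i)) over clonotypes with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def top_clone_fraction(counts) -> float:
    """Share of the most abundant clonotype."""
    return max(counts) / sum(counts)


true_counts = [10, 10]          # two real clonotypes
with_errors = [10, 10, 1, 1]    # same data plus two error-derived "clonotypes"
print(shannon_index(true_counts))                                # 1.0
print(shannon_index(with_errors) > shannon_index(true_counts))   # True

reads_per_clone = [40, 10]      # clone A amplified preferentially
molecules_per_clone = [2, 2]    # UMI counts: both clones had 2 template molecules
print(top_clone_fraction(reads_per_clone), top_clone_fraction(molecules_per_clone))  # 0.8 0.5
```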
Therefore, the human immune repertoire sequencing method of the present application corrects sequencing errors effectively: it corrects inaccurate quantification caused by amplification bias and corrects false positives caused by amplification errors or errors introduced during the experiment.
The foregoing is a more detailed description of the present application in connection with specific embodiments thereof, and it is not intended that the present application be limited to the specific embodiments thereof. It will be apparent to those skilled in the art from this disclosure that many more simple derivations or substitutions can be made without departing from the spirit of the disclosure.
SEQUENCE LISTING
<110> Shenzhen haipraos medical examination laboratory
<120> method and kit for sequencing human immune repertoire
<130> 22I33618
<160> 57
<170> PatentIn version 3.3
<210> 1
<211> 94
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 1
cacgacgcuc uuccgaucun nnnunnnncg gcauacgaac ugucauuaug cggcauacga 60
acugucauua ugcggcauac gaacugucau uaug 94
<210> 2
<211> 76
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 2
cacgacgcuc uuccgaucun nnnunnnncc cacguuaacc cuagauuaua cacccacguu 60
aacccuagau uauaca 76
<210> 3
<211> 72
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 3
cacgacgcuc uuccgaucun nnnunnnnac cguagauuau acaaugccug accguagauu 60
auacaaugcc ug 72
<210> 4
<211> 74
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 4
cacgacgcuc uuccgaucun nnnunnnnga acguuauauc cauauagaua ugaacguuau 60
auccauauag auau 74
<210> 5
<211> 70
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 5
cacgacgcuc uuccgaucun nnnunnnncg cugccugugc gugaauuggc gcugccugug 60
cgugaauugg 70
<210> 6
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 6
cacgacgcuc uuccgaucun nnnunnnnug ccugugccag aauuggugcc ugugccagaa 60
uugg 64
<210> 7
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 7
cacgacgcuc uuccgaucun nnnunnnngu gccagaauug uugauggugc cagaauuguu 60
gaug 64
<210> 8
<211> 66
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 8
cacgacgcuc uuccgaucun nnnunnnnau gauaaauaaa cgcacuaaug auaaauaaac 60
gcacua 66
<210> 9
<211> 66
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 9
cacgacgcuc uuccgaucun nnnunnnnau gauaaauaaa cccacuaaug auaaauaaac 60
ccacua 66
<210> 10
<211> 68
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 10
cacgacgcuc uuccgaucun nnnunnnnca uuaugugaac gacgugcaca uuaugugaac 60
gacgugca 68
<210> 11
<211> 74
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 11
cacgacgcuc uuccgaucun nnnunnnncg cauuauauuu cauuauguga acgcauuaua 60
uuucauuaug ugaa 74
<210> 12
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 12
cacgacgcuc uuccgaucun nnnunnnnug ccguuaacga gacacaugcc guuaacgaga 60
caca 64
<210> 13
<211> 66
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 13
cacgacgcuc uuccgaucun nnnunnnnua acauaugcag cuaacgauaa cauaugcagc 60
uaacga 66
<210> 14
<211> 72
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 14
cacgacgcuc uuccgaucun nnnunnnnua uaucauaugc cacuaacgag uauaucauau 60
gccacuaacg ag 72
<210> 15
<211> 68
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 15
cacgacgcuc uuccgaucun nnnunnnnca cgcgacgggg gcauacgaca cgcgacgggg 60
gcauacga 68
<210> 16
<211> 70
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 16
cacgacgcuc uuccgaucun nnnunnnnua acuauaacau augcagcuuu aacuauaaca 60
uaugcagcuu 70
<210> 17
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 17
cacgacgcuc uuccgaucun nnnunnnncg gcauaagaag ugucuacggc auaagaagug 60
ucua 64
<210> 18
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 18
cacgacgcuc uuccgaucun nnnunnnnuu ggcgcuaacg agacacuugg cgcuaacgag 60
acac 64
<210> 19
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 19
cacgacgcuc uuccgaucun nnnunnnnuu augugaacga cgagucuuau gugaacgacg 60
aguc 64
<210> 20
<211> 66
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 20
cacgacgcuc uuccgaucun nnnunnnnau gaauaauaaa cgcacuaaug aauaauaaac 60
gcacua 66
<210> 21
<211> 62
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 21
cacgacgcuc uuccgaucun nnnunnnnca aagauaaacg cacuacaaag auaaacgcac 60
ua 62
<210> 22
<211> 72
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 22
cacgacgcuc uuccgaucun nnnunnnngu gucauuaugu gaacuacgug gugucauuau 60
gugaacuacg ug 72
<210> 23
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 23
cacgacgcuc uuccgaucun nnnunnnngg gcauaggaac ugucuagggc auaggaacug 60
ucua 64
<210> 24
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 24
cacgacgcuc uuccgaucun nnnunnnncg cuaacgaaug acccgacgcu aacgaaugac 60
ccga 64
<210> 25
<211> 72
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 25
cacgacgcuc uuccgaucun nnnunnnnca uuaugugaac gacguuucga cauuauguga 60
acgacguuuc ga 72
<210> 26
<211> 68
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 26
cacgacgcuc uuccgaucun nnnunnnngg gcauacgaaa ugucauuagg gcauacgaaa 60
ugucauua 68
<210> 27
<211> 72
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 27
cacgacgcuc uuccgaucun nnnunnnngg cgggcagucu uaucauaugc ggcgggcagu 60
cuuaucauau gc 72
<210> 28
<211> 62
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 28
cacgacgcuc uuccgaucun nnnunnnncu aacggcggaa gccaccuaac ggcggaagcc 60
ac 62
<210> 29
<211> 70
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 29
cacgacgcuc uuccgaucun nnnunnnnga uugcggcuua cgacgugucg auugcggcuu 60
acgacguguc 70
<210> 30
<211> 68
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 30
cacgacgcuc uuccgaucun nnnunnnngg gcauacgaag ugucuauagg gcauacgaag 60
ugucuaua 68
<210> 31
<211> 66
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 31
cacgacgcuc uuccgaucun nnnunnnncg acggucguga gcagcgccga cggucgugag 60
cagcgc 66
<210> 32
<211> 64
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 32
cacgacgcuc uuccgaucun nnnunnnnaa accccgccaa agcacgaaac cccgccaaag 60
cacg 64
<210> 33
<211> 68
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 33
cacgacgcuc uuccgaucun nnnunnnngc acgugcaacg ugaaacuagc acgugcaacg 60
ugaaacua 68
<210> 34
<211> 62
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 34
cacgacgcuc uuccgaucun nnnunnnnga auggaccuaa uguaagaaug gaccuaaugu 60
aa 62
<210> 35
<211> 62
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 35
cacgacgcuc uuccgaucun nnnunnnnua ugaacuaaaa ccaacuauga acuaaaacca 60
ac 62
<210> 36
<211> 66
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 36
cacgacgcuc uuccgaucun nnnunnnnaa uaugugaacg acgugccaau augugaacga 60
cgugcc 66
<210> 37
<211> 72
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 37
cacgacgcuc uuccgaucun nnnunnnnca uuaugugaac gacguuucua cauuauguga 60
acgacguuuc ua 72
<210> 38
<211> 70
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 38
cacgacgcuc uuccgaucun nnnunnnngu caauauguga acgacguuug ucaauaugug 60
aacgacguuu 70
<210> 39
<211> 70
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 39
cacgacgcuc uuccgaucun nnnunnnncc acugcgaaau gccgcacgac cacugcgaaa 60
ugccgcacga 70
<210> 40
<211> 74
<212> RNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (20)..(23)
<223> n is a, c, g, or u
<220>
<221> misc_feature
<222> (25)..(28)
<223> n is a, c, g, or u
<400> 40
cacgacgcuc uuccgaucun nnnunnnnua uacgugcaua uagauuaucu auauacgugc 60
auauagauua ucua 74
<210> 41
<211> 40
<212> DNA
<213> Artificial sequence
<400> 41
agacgtgtgc tcttccgatc ttgcgtatgg tgaattgtaa 40
<210> 42
<211> 37
<212> DNA
<213> Artificial sequence
<400> 42
agacgtgtgc tcttccgatc ttaaaagcca agccggt 37
<210> 43
<211> 42
<212> DNA
<213> Artificial sequence
<400> 43
agacgtgtgc tcttccgatc ttcaccacgt gcgaaccatt aa 42
<210> 44
<211> 40
<212> DNA
<213> Artificial sequence
<400> 44
agacgtgtgc tcttccgatc tcgtaaacta caacccctga 40
<210> 45
<211> 39
<212> DNA
<213> Artificial sequence
<400> 45
agacgtgtgc tcttccgatc ttggtaaact taaacccgt 39
<210> 46
<211> 42
<212> DNA
<213> Artificial sequence
<400> 46
agacgtgtgc tcttccgatc taattattca atcgacaggt gc 42
<210> 47
<211> 41
<212> DNA
<213> Artificial sequence
<400> 47
agacgtgtgc tcttccgatc tgtacgaatc gcgaattata a 41
<210> 48
<211> 37
<212> DNA
<213> Artificial sequence
<400> 48
agacgtgtgc tcttccgatc tggtgaatgg gaacccc 37
<210> 49
<211> 37
<212> DNA
<213> Artificial sequence
<400> 49
agacgtgtgc tcttccgatc tacggcgaac atagaga 37
<210> 50
<211> 38
<212> DNA
<213> Artificial sequence
<400> 50
agacgtgtgc tcttccgatc taaaagcccg tccggcag 38
<210> 51
<211> 41
<212> DNA
<213> Artificial sequence
<400> 51
agacgtgtgc tcttccgatc ttgtgcaagt gcgaatggtg a 41
<210> 52
<211> 41
<212> DNA
<213> Artificial sequence
<400> 52
agacgtgtgc tcttccgatc ttgtgcaagt gcgaatggtg a 41
<210> 53
<211> 19
<212> DNA
<213> Artificial sequence
<400> 53
cacgacgctc ttccgatct 19
<210> 54
<211> 68
<212> DNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (30)..(35)
<223> n is a, c, g, or t
<400> 54
aatgatacgg cgaccaccga gatctacacn nnnnnacact ctttccctac acgacgctct 60
tccgatct 68
<210> 55
<211> 64
<212> DNA
<213> Artificial sequence
<220>
<221> misc_feature
<222> (25)..(30)
<223> n is a, c, g, or t
<400> 55
caagcagaag acggcatacg agatnnnnnn gtgactggag ttcagacgtg tgctcttccg 60
atct 64
<210> 56
<211> 19
<212> RNA
<213> Artificial sequence
<400> 56
cacgacgcuc uuccgaucu 19
<210> 57
<211> 21
<212> DNA
<213> Artificial sequence
<400> 57
agacgtgtgc tcttccgatc t 21
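The primer architecture implied by this listing can be checked with a short Python sketch. It assumes, as the claims describe, that the `u` in the RNA-typed entries SEQ ID No. 1 to 40 marks deoxyuracil positions in a DNA primer. It splits SEQ ID No. 1 (all 40 first primers share the same 28-base prefix) into the 19-base binding region, the 9-base UMI block and the target-specific part, and checks relationships between entries, for example that SEQ ID No. 53 is SEQ ID No. 56 with u replaced by t. The `umi_from_read1` helper is hypothetical and assumes Read 1 starts immediately after the binding region, with the fixed deoxyuracil position read as T.

```python
import re

def clean(seq):
    """Join a sequence-listing entry: drop the 10-base grouping spaces, lower-case it."""
    return seq.replace(" ", "").lower()

# Entries copied from the listing (only those needed for the checks below).
SEQ1 = clean("cacgacgcuc uuccgaucun nnnunnnncg gcauacgaac ugucauuaug cggcauacga "
             "acugucauua ugcggcauac gaacugucau uaug")
SEQ41 = clean("agacgtgtgc tcttccgatc ttgcgtatgg tgaattgtaa")
SEQ53 = clean("cacgacgctc ttccgatct")
SEQ54 = clean("aatgatacgg cgaccaccga gatctacacn nnnnnacact ctttccctac acgacgctct "
              "tccgatct")
SEQ55 = clean("caagcagaag acggcatacg agatnnnnnn gtgactggag ttcagacgtg tgctcttccg "
              "atct")
SEQ56 = clean("cacgacgcuc uuccgaucu")
SEQ57 = clean("agacgtgtgc tcttccgatc t")

BINDING_LEN = 19     # length of SEQ ID No. 56 / 53
UMI_LEN = 9          # positions 20..28 of SEQ ID No. 1 (n at 20-23 and 25-28)
FIXED_U_OFFSET = 4   # the fixed u at position 24, counted within the UMI block

def split_first_primer(seq):
    """Split a first primer into binding region, UMI block and target-specific part."""
    umi_end = BINDING_LEN + UMI_LEN
    return seq[:BINDING_LEN], seq[BINDING_LEN:umi_end], seq[umi_end:]

def n_positions(seq):
    """1-based positions of degenerate 'n' bases, as given in the <222> fields."""
    return [i + 1 for i, base in enumerate(seq) if base == "n"]

def longest_random_run(umi_block):
    """Longest stretch of consecutive random bases in the UMI block."""
    return max(len(run) for run in re.findall(r"n+", umi_block))

def umi_from_read1(read):
    """Hypothetical helper: return the 8 random UMI bases from the start of Read 1,
    or None if the fixed (deoxyuracil-derived) position is not read as T."""
    block = read[:UMI_LEN].upper()
    if len(block) < UMI_LEN or block[FIXED_U_OFFSET] != "T":
        return None
    return block[:FIXED_U_OFFSET] + block[FIXED_U_OFFSET + 1:]

binding, umi_block, target = split_first_primer(SEQ1)
print("binding region:", binding)
print("UMI block:", umi_block, "longest random run:", longest_random_run(umi_block))
print("target-specific part:", target)
print("N positions, SEQ ID No. 54:", n_positions(SEQ54))
print("N positions, SEQ ID No. 55:", n_positions(SEQ55))
```

The same checks also confirm that the fourth and fifth primers end in the two binding regions (SEQ ID No. 53 and 57), which is what lets them extend the enriched amplicons during library construction.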

Claims (10)

1. A method of sequencing a human immune repertoire, comprising the following steps:
preparing a reaction system and performing a single round of extension on template nucleic acid using a first primer to obtain a complementary strand; the first primer comprises, in order from the 5' end to the 3' end, a sequencing platform upstream primer binding region, a unique identifier and a target-specific upstream primer sequence; in the first primer, base T in the sequencing platform upstream primer binding region and in the target-specific upstream primer sequence is replaced by deoxyuracil, and the sequencing platform upstream primer binding region corresponds to the 3' end of the upstream sequencing primer of the sequencing platform;
after extension of the first primer, adding a second primer to the reaction system and performing a single round of extension on the complementary strand extended from the first primer using the second primer, to obtain a product consisting of the sequencing platform upstream primer binding region, the unique identifier, the target sequence and a sequencing platform downstream primer binding region; the second primer comprises, in order from the 5' end to the 3' end, the sequencing platform downstream primer binding region and a target-specific downstream primer sequence, and the sequencing platform downstream primer binding region corresponds to the 3' end of the downstream sequencing primer of the sequencing platform;
after extension of the second primer is complete, adding UDG/UNG enzyme to the reaction system to digest deoxyuracil, thereby digesting the first primer and the strand extended from the first primer;
after UDG/UNG digestion is complete, adding a third primer to the reaction system and performing PCR amplification and enrichment of the product extended from the second primer using the third primer and the second primer, to obtain products in which all amplicons of the template nucleic acid carry the same unique identifier; the third primer is all or part of the sequencing platform upstream primer binding region of the first primer, starting from its 5' end, and base T in the third primer is not replaced by deoxyuracil;
constructing a sequencing library from the PCR-enriched products in which all amplicons carry the same unique identifier, and sequencing the library, thereby completing sequencing of the human immune repertoire;
wherein the target-specific upstream primer sequence and the target-specific downstream primer sequence are specific primer sequences designed against human T cell receptor coding genes and fully cover the coding sequence of the CDR3 region.
2. The method of claim 1, wherein the target-specific upstream primer sequence is a specific primer sequence designed against the V genes of the human T cell receptor beta chain;
the target-specific downstream primer sequence is a specific primer sequence designed against the J genes of the human T cell receptor beta chain;
and the product amplified with the target-specific upstream primer sequence and the target-specific downstream primer sequence fully covers the coding sequence of the CDR3 region of the T cell receptor beta chain.
3. The method of claim 1, wherein, in the first primer, at least one deoxyuracil is inserted into the unique identifier sequence, and the inserted deoxyuracil(s) divide the unique identifier so that it contains fewer than 5 consecutive bases;
preferably, the number of amplification cycles of the PCR amplification and enrichment is greater than or equal to 5.
4. The method of claim 1, wherein the first primer consists of 40 primers having the sequences shown in Seq ID No. 1 to Seq ID No. 40;
preferably, the second primer consists of 12 primers having the sequences shown in Seq ID No. 41 to Seq ID No. 52;
preferably, the third primer has the sequence shown in Seq ID No. 53.
5. The method according to any one of claims 1 to 4, wherein the sequencing library construction comprises the following steps:
purifying the PCR-enriched products in which all amplicons carry the same unique identifier, to obtain a purified product;
performing library construction amplification on the purified product using a fourth primer and a fifth primer to obtain the sequencing library; the fourth primer is an upstream sequencing primer of the sequencing platform carrying a sequencing adapter and a barcode, and the fifth primer is a downstream sequencing primer of the sequencing platform carrying a sequencing adapter and a barcode.
6. The method of claim 5, wherein the fourth primer has the sequence shown in Seq ID No. 54;
preferably, the fifth primer has the sequence shown in Seq ID No. 55.
7. A kit for sequencing a human immune repertoire, characterized in that it comprises a first primer, a second primer, a third primer and a UDG/UNG enzyme;
the first primer comprises, in order from the 5' end to the 3' end, a sequencing platform upstream primer binding region, a unique identifier and a target-specific upstream primer sequence; in the first primer, base T in the sequencing platform upstream primer binding region and in the target-specific upstream primer sequence is replaced by deoxyuracil, and the sequencing platform upstream primer binding region corresponds to the 3' end of the upstream sequencing primer of the sequencing platform;
the second primer comprises, in order from the 5' end to the 3' end, a sequencing platform downstream primer binding region and a target-specific downstream primer sequence, and the sequencing platform downstream primer binding region corresponds to the 3' end of the downstream sequencing primer of the sequencing platform;
the third primer is all or part of the sequencing platform upstream primer binding region of the first primer, starting from its 5' end, and base T in the third primer is not replaced by deoxyuracil;
the target-specific upstream primer sequence and the target-specific downstream primer sequence are specific primer sequences designed against human T cell receptor coding genes and fully cover the coding sequence of the CDR3 region.
8. The kit of claim 7, wherein the target-specific upstream primer sequence is a specific primer sequence designed against the V genes of the human T cell receptor beta chain;
the target-specific downstream primer sequence is a specific primer sequence designed against the J genes of the human T cell receptor beta chain;
the product amplified with the target-specific upstream primer sequence and the target-specific downstream primer sequence fully covers the coding sequence of the CDR3 region of the T cell receptor beta chain;
preferably, in the first primer, at least one deoxyuracil is inserted into the unique identifier sequence, and the inserted deoxyuracil(s) divide the unique identifier so that it contains fewer than 5 consecutive bases.
9. The kit of claim 7, wherein the first primer consists of 40 primers having the sequences shown in Seq ID No. 1 to Seq ID No. 40;
preferably, the second primer consists of 12 primers having the sequences shown in Seq ID No. 41 to Seq ID No. 52;
preferably, the third primer has the sequence shown in Seq ID No. 53.
10. The kit according to any one of claims 7 to 9, characterized in that it further comprises a fourth primer and a fifth primer;
the fourth primer is an upstream sequencing primer of the sequencing platform carrying a sequencing adapter and a barcode, and the fifth primer is a downstream sequencing primer of the sequencing platform carrying a sequencing adapter and a barcode;
preferably, the fourth primer has the sequence shown in Seq ID No. 54;
preferably, the fifth primer has the sequence shown in Seq ID No. 55.
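The order of operations in claim 1 can be traced with a toy string model in Python. This is a sketch under strong simplifying assumptions, not a model of real hybridization: primers anneal only to exact matches of their 3'-terminal bases, the polymerase is assumed to read through deoxyuracil (incorporating A opposite it), and UDG/UNG treatment is modeled as destroying every strand that contains U. The two binding regions are taken from SEQ ID No. 53 and 57; the V, CDR3 and J segments and the UMI are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}

def revcomp(seq):
    """Reverse complement; deoxyuracil (U) templates an A, as T would."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def as_dna(seq):
    """Sequence identity for pairing purposes: U pairs like T."""
    return seq.replace("U", "T")

def extend(primer, template, anneal_len):
    """One extension: the primer's 3'-terminal anneal_len bases pair with the
    template and the polymerase copies the template toward its 5' end.
    Returns None if the primer has no fully matching site on this strand."""
    site = revcomp(as_dna(primer[-anneal_len:]))
    pos = as_dna(template).find(site)
    if pos < 0:
        return None
    return primer + revcomp(template[:pos])

def udg_digest(strands):
    """Simplification: any strand containing deoxyuracil is treated as destroyed."""
    return [s for s in strands if "U" not in s]

def pcr(templates, primers, cycles):
    """Each cycle, every strand is copied by whichever primer can anneal to it."""
    pool = list(templates)
    for _ in range(cycles):
        new = []
        for strand in pool:
            for primer, anneal_len in primers:
                product = extend(primer, strand, anneal_len)
                if product is not None:
                    new.append(product)
        pool += new
    return pool

UP_BINDING = "CACGACGCTCTTCCGATCT"       # SEQ ID No. 53 (third primer)
DOWN_BINDING = "AGACGTGTGCTCTTCCGATCT"   # SEQ ID No. 57
V, MIDDLE, J = "GGACTCAAGCTC", "TGTGCCAGCAGTTTAGGG", "GAGCAGTACTTC"  # hypothetical
target = V + MIDDLE + J                  # sense strand of one rearranged TCR molecule
umi = "ACGG" + "U" + "TCAG"              # hypothetical random bases around the fixed dU

first_primer = UP_BINDING.replace("T", "U") + umi + V.replace("T", "U")
second_primer = DOWN_BINDING + revcomp(J)
third_primer = UP_BINDING

# Step 1: single extension of the first primer on the antisense template.
strand1 = extend(first_primer, revcomp(target), len(V))
# Step 2: single extension of the second primer on strand1.
strand2 = extend(second_primer, strand1, len(J))
# Step 3: UDG/UNG removes the leftover first primer and strand1.
survivors = udg_digest([first_primer, strand1, strand2])
# Step 4: PCR enrichment with the third and second primers.
amplicons = pcr(survivors, [(third_primer, len(UP_BINDING)), (second_primer, len(J))],
                cycles=5)

umi_dna = as_dna(umi)
top = UP_BINDING + umi_dna + target + revcomp(DOWN_BINDING)
print("surviving strands after UDG:", len(survivors))
print("amplicon strands after 5 cycles:", len(amplicons))
print("every strand carries the UMI:",
      all(umi_dna in s or revcomp(umi_dna) in s for s in amplicons))
```

In this model only the second-primer extension product survives digestion among the strands tracked, so every PCR copy descends from that one strand and carries the same UMI in one orientation or the other; the fixed dU position of the UMI appears as T in the enriched top strand.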
CN202210381164.8A 2022-04-12 2022-04-12 Method and kit for sequencing human immune repertoire Pending CN114774517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210381164.8A CN114774517A (en) 2022-04-12 2022-04-12 Method and kit for sequencing human immune repertoire

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210381164.8A CN114774517A (en) 2022-04-12 2022-04-12 Method and kit for sequencing human immune repertoire

Publications (1)

Publication Number Publication Date
CN114774517A true CN114774517A (en) 2022-07-22

Family

ID=82428810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210381164.8A Pending CN114774517A (en) 2022-04-12 2022-04-12 Method and kit for sequencing human immune repertoire

Country Status (1)

Country Link
CN (1) CN114774517A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115976221A (en) * 2023-03-21 2023-04-18 迈杰转化医学研究(苏州)有限公司 Internally doped reference substance for BCR or TCR rearrangement quantitative detection and preparation method and application thereof
CN115976221B (en) * 2023-03-21 2023-05-23 迈杰转化医学研究(苏州)有限公司 Internally-doped reference substance for quantitative detection of BCR or TCR rearrangement as well as preparation method and application thereof

Similar Documents

Publication Publication Date Title
EP3763825B1 (en) High multiplex pcr with molecular barcoding
JP2019523638A (en) Multi-positioning double tag adapter set for detecting gene mutation, and its preparation method and application
US9334532B2 (en) Complexity reduction method
CN111363783B (en) T cell receptor library high-throughput sequencing library construction and sequencing data analysis method based on specific recognition sequence
CN111808854B (en) Balanced joint with molecular bar code and method for quickly constructing transcriptome library
US20160115544A1 (en) Molecular barcoding for multiplex sequencing
CN108070658B (en) Non-diagnostic method for detecting MSI
EP3643789A1 (en) Pcr primer pair and application thereof
CN110863056A (en) Method, reagent and application for accurately typing human DNA
CN113337576A (en) Library preparation method, kit and sequencing method
CN114774517A (en) Method and kit for sequencing human immune repertoire
CN109415768B (en) Variable region sequence library construction method, sequencing method and kit thereof
CN111647953A (en) High-throughput library construction kit and library construction method for detecting thalassemia gene mutation
CN111748637A (en) SNP molecular marker combination, multiplex composite amplification primer set, kit and method for genetic relationship analysis and identification
CN109852668B (en) Simplified genome sequencing library and library construction method thereof
CN108707653B (en) Kit for constructing variable region sequence library and sequencing method of variable region sequence
CN105316320B (en) DNA label, PCR primer and application thereof
CN107267600B (en) Primers, method and kit for enriching BRCA1 and BRCA2 gene target regions and application of primers, method and kit
EP3474168B1 (en) Method for measuring mutation rate
CN107164365B (en) Method for detecting chromosome telomere DNA full length by applying third generation sequencing technology
CN112639127A (en) Method for detecting and quantifying genetic alterations
CN114277114A (en) Method for adding unique identifier in amplicon sequencing and application
CN110957005B (en) Design of primer for amplicon sequencing and construction method of amplicon sequencing library
CN112322704B (en) Method for rapidly detecting DNA sequence mutation in batches
KR102187795B1 (en) Preparing method of library for next generation sequencing using deoxyuridine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination