CN109411011B - Design method and application of primer group - Google Patents

Publication number
CN109411011B
Authority
CN
China
Legal status
Active
Application number
CN201811313752.8A
Other languages
Chinese (zh)
Other versions
CN109411011A (en)
Inventor
吴婷婷
李彦敏
陈淑美
蔡晓辉
杨平
Current Assignee
Synbio Technologies
Original Assignee
Synbio Technologies
Application filed by Synbio Technologies
Priority to CN201811313752.8A
Publication of CN109411011A
Application granted
Publication of CN109411011B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The invention provides a primer set design method and its application. According to the required amplification length, the sequence upstream of the start site of antibody framework region 1 (FR1) is extracted. Fixed-length primer sequences are then cut from this sequence by successive shifting, starting from the first base at the 5' end, to form a candidate primer library. The library is clustered and screened to obtain a primer set for the immune repertoire.

Description

Design method and application of primer group
Technical Field
The invention belongs to the technical field of biology, and relates to a design method and application of a primer group, in particular to a design method and application of a primer group of an immune repertoire.
Background
The immune repertoire (IR) is the sum of all functionally diverse B and T cells, with their polymorphisms, circulating in an individual at a given time. The immune cells that protect the human body mainly include T cells, B cells, macrophages and dendritic cells. T and B cells are the major lymphocytes in humans and are responsible for cellular and humoral immunity, respectively. B cells account for about 20% of peripheral lymphocytes. The B cell receptor (BCR) consists of two heavy chains and two light chains. Each heavy chain comprises a variable region (V region), a constant region (C region), a transmembrane region and a cytoplasmic region, whereas each light chain has only V and C regions. The V region consists of two domains, VH and VL. Each domain contains three complementarity-determining regions (CDR1, CDR2 and CDR3), which together mediate antigen recognition and determine the antigen specificity of the BCR (and, analogously, of the TCR). The amino acid composition and order of the CDRs are highly diverse within an individual, reaching 10^9 to 10^12 variants. This forms a large BCR library that gives the individual great potential to recognize diverse antigens and produce specific antibodies. Current immune repertoire research focuses on the diversity of CDR genes, so the immune repertoire must first be amplified.
At present, the main immune repertoire amplification methods are the 5'-RACE method, the multiplex PCR method and the unique molecular identifier (UID) method.
5'-RACE (rapid amplification of cDNA 5' ends) proceeds in several steps. A gene-specific primer is used for reverse transcription. An adapter is then added to the 5' end of the first-strand cDNA for a second, unbiased round of PCR. Finally, the target region is enriched with avidin magnetic beads. Only gene-specific primers are needed, such as primers in the conserved C region of the BCR/TCR, which reduces multiplex PCR bias.

The method has several limitations:
- It works only with RNA and requires sorting the specific cell types under study.
- It is more laborious than ordinary multiplex PCR.
- It is biased by transcript length and GC content.
- Because primers are designed only at the C-region end, product lengths vary widely.

5'-RACE can amplify different clones roughly equally, with single-clone counts of up to 10^5. It can also rapidly amplify the 5' end of cDNA, including the CDRs, from low-abundance transcripts, which minimizes PCR amplification bias. Its drawbacks are complex operation, loss of part of the sequence when the sample is fragmented, and poor reproducibility.
Multiplex PCR: two or more primer pairs are added to the same PCR reaction so that several nucleic acid fragments are amplified simultaneously. The principle, reagents and procedure are the same as for ordinary PCR. Like ordinary PCR, multiplex PCR does not fragment the sample and yields complete data, but amplification is biased.

Immune gene diversity arises through gene duplication and gene mutation. Target BCR/TCR genes can be obtained with a set of multiplex PCR primers, usually placed in the conserved regions of the V region and the J/C region. Because the primers amplify with different efficiencies, amplification bias is unavoidable: some primers amplify strongly while others barely amplify at all. Removing the bias requires several rounds of optimization to find the best combination of primer concentrations. That optimum does not carry over to new primers, which makes the search increasingly complex.

Variable-region genes are usually cloned as follows. Antibody sequences in the Kabat database are used as a reference. Several sets of universal primers are designed against the conserved parts of the antibody variable region. The variable-region genes are then amplified by RT-PCR from a cDNA library of human B lymphocytes. This approach is simple and practical: the 5' universal primer is usually placed in the first framework region or the leader peptide region, and the 3' universal primer in the constant region or the J region. However, the antibody sequences obtained are relatively short, and the sequence upstream of FR1 cannot be detected.
The UID method attaches a unique identifier (UID) to each molecule before the target molecules are amplified by large-scale PCR. A UID is typically a randomly synthesized 12-16 nt oligonucleotide (random barcode). The number of possible random combinations is large enough to give every molecule in a sample a different label. Even if PCR amplification is uneven, the bias can then be removed computationally, and PCR and sequencing errors can be corrected. However, the method requires very long primers, which lowers amplification efficiency and shortens the target fragments. It also needs extremely high sequencing throughput to cover all UIDs. It is currently used mainly for RNA sequencing of IGH/TCR.
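The claim that random UIDs can label every molecule follows from simple counting. A random barcode of n nucleotides over the four bases A, C, G and T has 4^n possible values. The minimal Python sketch below evaluates this for the 12-16 nt lengths mentioned above; the function name is illustrative and not part of the original method.

```python
def uid_space(n: int) -> int:
    """Number of distinct random DNA barcodes of length n over {A, C, G, T}."""
    if n < 0:
        raise ValueError("barcode length must be non-negative")
    return 4 ** n


if __name__ == "__main__":
    # Barcode space for the 12-16 nt UID lengths discussed in the text.
    for n in range(12, 17):
        print(f"{n} nt: {uid_space(n):,} possible UIDs")
```

For 12 nt this gives 16,777,216 possible barcodes, and for 16 nt about 4.3 × 10^9. This is only an upper bound on the number of distinct labels.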
For immune repertoire primer design, the prior art struggles to produce upstream primers that recognize the 5'-end sequence of BCR genes. Primers designed in the FR1 region cannot cover the information upstream of the antibody, so the sequenced gene information is incomplete. The primers in Document 2 (Wang X, Stollar BD. Human immunoglobulin variable region gene analysis by single cell RT-PCR. J Immunol Methods 2000; 244: 217-25) are few in number but give low coverage. Conversely, immune repertoire sequencing by multiplex primer PCR requires dozens of primer pairs for BCR. This large number of primers degrades the amplification efficiency and specificity of the whole PCR.
Given these shortcomings of existing antibody immune repertoire amplification, a primer set design method that is simple, efficient, high in coverage and low in mismatch rate would have broad application prospects and considerable market value.
Disclosure of Invention
To address the shortcomings of the prior art and practical needs, the invention provides a primer set design method and its application. The sequence upstream of the antibody FR1 start site is extracted and used to design a candidate primer library. The primer set is then obtained by bioinformatic analysis and screening.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for designing a primer set, comprising the following steps:
(1) obtaining germline gene data for the heavy and/or light chain of an antibody of the species and genomic data for the species;
(2) positioning of a primer selection region: aligning germline genes of the heavy chain and/or the light chain of the antibody to the genome data of the species, finding an initial site of the antibody FR1, and extracting a sequence before the initial site;
(3) design of a candidate primer library: slicing the sequence before the initial site extracted in the step (2), and sequentially shifting and cutting primer sequences with fixed lengths from the first base at the 5' end to form a candidate primer library;
(4) performing cluster analysis, and screening to obtain an initial primer group;
(5) and comparing the initial primer group with an antibody database, and screening to obtain a final primer group.
The inventors developed this method through long-term research practice to address shortcomings of prior-art primer design for immune repertoires:
- too many primers reduce efficiency;
- too few primers give low coverage;
- primers often fail to cover the upstream information.

According to the required amplification length, the inventors extract the sequence upstream of the antibody FR1 start site. Fixed-length primer sequences are cut from it by successive shifting, starting from the first base at the 5' end, to form a candidate primer library. The library is screened after cluster analysis to obtain the primer set. Experiments show that primer sets obtained this way markedly improve coverage and experimental efficiency and reduce the mismatch rate. The method is also simple and cost-saving.
Preferably, the species in step (1) is any one of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost fish, cartilaginous fish, Atlantic cod, Clarias fuscus, rainbow trout, zebrafish, platypus, alpaca, cynomolgus monkey, cow, dog, chicken or salmon, or a combination of at least two of them.
Preferably, the length range of the sequence before the initiation site in step (2) is 1-300bp, such as 1bp, 5bp, 10bp, 20bp, 30bp, 40bp, 50bp, 60bp, 70bp, 80bp, 90bp, 100bp, 110bp, 120bp, 130bp, 140bp, 150bp, 160bp, 170bp, 180bp, 190bp, 200bp, 210bp, 220bp, 230bp, 240bp, 250bp, 260bp, 270bp, 280bp, 290bp or 300 bp.
Preferably, the fixed primer length in step (3) is 16-26 bp, for example 16, 17, 18, 19, 20, 21 or 22 bp, and preferably 18-22 bp.
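Step (3) slices the extracted upstream sequence into fixed-length candidate primers. The minimal Python sketch below illustrates this step. It assumes a shift of 1 base between consecutive windows (the text says only that primers are cut "sequentially" from the first 5' base) and that the upstream sequences are held in a plain dict keyed by gene name. Both assumptions and all function names are illustrative, not the authors' implementation.

```python
from typing import Dict, Iterable, List, Tuple


def slice_candidates(upstream: str, k: int, step: int = 1) -> List[Tuple[int, str]]:
    """Cut k-nt windows from an upstream sequence, starting at the first 5' base.

    Returns (offset, primer) pairs; offset is the 0-based distance of the
    window's first base from the 5' end of `upstream`.
    """
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    seq = upstream.upper()
    return [(i, seq[i:i + k]) for i in range(0, len(seq) - k + 1, step)]


def build_candidate_library(
    upstreams: Dict[str, str], lengths: Iterable[int] = range(18, 23)
) -> List[Tuple[str, int, int, str]]:
    """Pool (gene, length, offset, primer) records over all genes and primer lengths."""
    lengths = list(lengths)  # materialize so it can be reused for every gene
    library = []
    for gene, seq in upstreams.items():
        for k in lengths:
            for offset, primer in slice_candidates(seq, k):
                library.append((gene, k, offset, primer))
    return library


if __name__ == "__main__":
    toy = {"IGHV_example": "ATGGACTGGACCTGGAGGATCCTCTTCTTGGTGGCAGCAGC"}  # hypothetical input
    lib = build_candidate_library(toy, lengths=[20])
    print(len(lib), lib[0])
```

With a 1-base shift, an L-bp upstream sequence yields L - k + 1 windows of length k. For example, a 150 bp sequence gives 131 candidate 20-mers.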
Preferably, the cluster analysis in step (4) is performed with CD-HIT. The -c parameter (sequence identity threshold) is set, and the candidate primer library sequences are clustered at that threshold.
Preferably, the -c parameter is set to 0.8-1.0, for example 0.8, 0.9 or 1.0, and preferably 0.9.
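The text names CD-HIT. For nucleotide sequences, the CD-HIT package provides the `cd-hit-est` program. The Python sketch below only builds a command line for it, so nothing is executed. The `-n` word sizes follow the ranges recommended in the CD-HIT user guide, as we understand it. The input and output file names are hypothetical.

```python
from typing import List


def cdhit_est_word_size(c: float) -> int:
    """Word size (-n) for cd-hit-est, following the ranges suggested in the CD-HIT user guide."""
    if not 0.75 <= c <= 1.0:
        raise ValueError("this sketch only handles -c between 0.75 and 1.0")
    if c >= 0.95:
        return 10
    if c >= 0.90:
        return 8
    if c >= 0.88:
        return 7
    if c >= 0.85:
        return 6
    if c >= 0.80:
        return 5
    return 4


def cdhit_est_command(fasta_in: str, out_prefix: str, c: float = 0.9) -> List[str]:
    """Build (but do not run) a cd-hit-est command line for clustering candidate primers."""
    return [
        "cd-hit-est",
        "-i", fasta_in,    # candidate primer library in FASTA format
        "-o", out_prefix,  # representative sequences; clusters go to out_prefix + ".clstr"
        "-c", str(c),      # sequence identity threshold (0.9 in Example 1)
        "-n", str(cdhit_est_word_size(c)),
        "-d", "0",         # keep sequence names up to the first space in the .clstr file
    ]


if __name__ == "__main__":
    print(" ".join(cdhit_est_command("candidates.fa", "candidates_c90")))
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where CD-HIT is installed.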
Preferably, the screening in step (4) is as follows. The clusters are sorted by size and the top 10-90 clusters are selected. The representative sequence of each cluster is extracted, and the representatives are screened according to primer design principles to obtain the initial primer set.
The top 10-90 clusters may be, for example, the top 10, 20, 30, 40, 50, 60, 70, 80 or 90.
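To sort clusters by size and pick their representatives, the CD-HIT `.clstr` output can be parsed directly. In that file, each `>Cluster N` header is followed by member lines, and the representative member is marked with `*`. The minimal Python sketch below assumes the cd-hit-est member-line layout shown in `SAMPLE` (name after `>`, followed by `...`). The candidate names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Member line, e.g. "1\t20nt, >cand_57... at +/95.00%"; the representative ends with "*".
MEMBER_RE = re.compile(r">(\S+?)\.\.\.\s*(\*|at\s+(?:[+-]/)?[\d.]+%)")


def parse_clstr(text: str) -> List[dict]:
    """Parse a CD-HIT .clstr file into clusters with member names and representative."""
    clusters: List[dict] = []
    current: Optional[dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">Cluster"):
            current = {"id": int(line.split()[1]), "members": [], "representative": None}
            clusters.append(current)
            continue
        match = MEMBER_RE.search(line)
        if match is None or current is None:
            continue  # skip blank or unrecognized lines
        name = match.group(1)
        current["members"].append(name)
        if match.group(2) == "*":
            current["representative"] = name
    return clusters


def top_representatives(clusters: List[dict], n: int) -> List[Tuple[str, int]]:
    """Sort clusters by size (largest first) and return (representative, size) for the top n."""
    ranked = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
    return [(c["representative"], len(c["members"])) for c in ranked[:n]]


SAMPLE = (
    ">Cluster 0\n"
    "0\t20nt, >cand_12... *\n"
    "1\t20nt, >cand_57... at +/95.00%\n"
    ">Cluster 1\n"
    "0\t20nt, >cand_3... at +/90.00%\n"
    "1\t20nt, >cand_4... *\n"
    "2\t20nt, >cand_9... at -/95.00%\n"
)

if __name__ == "__main__":
    print(top_representatives(parse_clstr(SAMPLE), 90))
```

The representatives returned here would still need the primer-design screening described above before they form the initial primer set.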
Preferably, the alignment in step (5) is as follows. The initial primer set is aligned against the antibody database of the corresponding species to obtain each primer's coverage of the database. The primers are ranked by coverage, and the top 10-40 primers are selected as the final primer set.
The top 10-40 primers may be, for example, the top 10, 16, 20, 25, 30, 35 or 40.
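The coverage ranking in step (5) can be sketched as follows. Coverage is taken here as the fraction of database sequences that contain a match to the primer on the same strand, within an optional mismatch limit. This definition is our assumption: the text does not specify strand handling or the mismatch tolerance used at this step. The brute-force scan below is for illustration only; a real pipeline would use an aligner such as bowtie, which the document uses later.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def primer_hits(primer: str, target: str, max_mismatches: int = 0) -> bool:
    """True if `primer` matches some window of `target` (same strand) within the mismatch limit."""
    k = len(primer)
    return any(
        hamming(primer, target[i:i + k]) <= max_mismatches
        for i in range(len(target) - k + 1)
    )


def rank_by_coverage(
    primers: Dict[str, str], database: List[str], top_n: int, max_mismatches: int = 0
) -> List[Tuple[str, float]]:
    """Rank primers by the fraction of database sequences they hit and keep the top_n."""
    if not database:
        raise ValueError("antibody database is empty")
    targets = [t.upper() for t in database]
    scored = []
    for name, seq in primers.items():
        hits = sum(primer_hits(seq.upper(), t, max_mismatches) for t in targets)
        scored.append((name, hits / len(targets)))
    scored.sort(key=lambda item: item[1], reverse=True)  # highest coverage first
    return scored[:top_n]
```

This sketch assumes the database sequences include the region upstream of FR1. Otherwise, primers placed there cannot match.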
As a preferred technical scheme, the method for designing the primer group specifically comprises the following steps:
(1) obtaining germline gene data for the species' antibody light and/or heavy chains and genomic data for the species;
wherein the species is any one of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost fish, cartilaginous fish, Atlantic cod, Clarias fuscus, rainbow trout, zebrafish, platypus, alpaca, cynomolgus monkey, cow, dog, chicken or salmon, or a combination of at least two of them;
(2) positioning of the primer selection region: aligning the germline genes of the antibody heavy chain and/or light chain to the chromosomal genome data of the species, locating the start site of antibody FR1, and extracting the 1-300 bp sequence before the start site;
(3) design of a candidate primer library: slicing the sequence before the initial site extracted in the step (2), and sequentially shifting and cutting a primer sequence with a fixed length from the first base at the 5' end, wherein the length range of the primer is 18-22bp, so as to form a candidate primer library;
(4) performing cluster analysis, and screening to obtain an initial primer group;
the cluster analysis is performed as follows: the -c parameter is set in CD-HIT, and the candidate primer library sequences are clustered at that setting;
the size of the-c parameter is set to 0.9;
the screening method comprises the following steps: sorting the cluster according to the size, selecting the top 10-90 clusters, extracting the representative sequence of each cluster, and screening according to the primer design principle to obtain an initial primer group;
(5) aligning the initial primer set against the antibody database of the corresponding species to obtain each primer's coverage of the database, ranking the primers by coverage, and selecting the top 10-40 primers as the final primer set.
In a second aspect, the present invention provides a human B cell immune repertoire heavy chain primer set designed by the method of the first aspect, wherein the sequence of the primer set comprises the sequence shown as SEQ NO.1-SEQ NO. 16;
the detailed sequence is as follows:
SEQ NO.1:ATGGACATACTTTGTTCCA
SEQ NO.2:CCATGGAGTTTGGGCTGAGC
SEQ NO.3:GGCTGAGCTGGGTTTTCCTT
SEQ NO.4:CTGAGCTGGGTTTTCCTTGT
SEQ NO.5:CTCCTGGTGGCAGCTCCCAG
SEQ NO.6:CCTCCTCCTGGTGGCAGCTC
SEQ NO.7:CAGCTCCCAGATGTGAGTGT
SEQ NO.8:GCTGGGTTTTCCTTGTTGCT
SEQ NO.9:ATGAAACACCTGTGGTTCTT
SEQ NO.10:CCTGGAGGATCCTCTTCTTG
SEQ NO.11:GGTTTTCCTTGTTGCTATTT
SEQ NO.12:TGGTGGCAGCTCCCAGATGT
SEQ NO.13:GGACGTGAGTGAGAGAAACA
SEQ NO.14:TCCTCACCATGGACTGGACC
SEQ NO.15:CTTGTTGGTATTTTAAAAGG
SEQ NO.16:GAGGATCCTCTTCTTGGTGG.
Preferably, the primer set comprises any fifteen of the sequences shown in SEQ ID NO.1-SEQ ID NO.16, selected arbitrarily.
Further preferably, the primer set comprises any ten of the sequences shown in SEQ ID NO.1-SEQ ID NO.16, selected arbitrarily.
In a third aspect, the present invention provides the use of a primer set designed by the method of the first aspect, or of the primer set of the second aspect, for immune repertoire amplification.
In a fourth aspect, the present invention provides the use of a primer set designed by the method of the first aspect, or of the primer set of the second aspect, in the preparation of a kit for amplifying an immune repertoire.
Compared with the prior art, the invention has the following beneficial effects:
The method provided by the invention is simple and efficient. It markedly improves the reaction efficiency and coverage of multiplex PCR, reaching 95-100% coverage. It also reduces the mismatch rate, yields more complete antibody sequences and lowers cost.
Drawings
FIG. 1 is the PCR amplification electrophoretogram of the human B cell immune repertoire of the present invention;
FIG. 2 shows the CDR3 length distribution obtained with the primer set of the present invention;
FIG. 3 shows the germline usage profile of IGHV for the CDR3 sequences obtained in the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention, the following further describes the technical solutions of the present invention by way of specific embodiments with reference to the drawings, but the present invention is not limited to the scope of the embodiments.
Example 1 design of primer set for human B cell immune repertoire
(1) Extraction of human germline Gene
The human heavy chain germline genes were downloaded from the IMGT database (http://imgt.org/vquest/refseq.html), giving 361 genes. An example FASTA header is:
>M99641|IGHV1-18*01|Homo sapiens|F|V-REGION|188..483|296nt|1|||||296+24=320|||
(2) Mapping of the primer selection region
The 361 sequences were aligned to chromosome 14 ('hs_ref_GRCh38.p12_chr14.fa'), and the 150 bp sequence upstream of each start site was extracted;
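Extracting the 150 bp "in front of" each start site depends on the strand to which the germline gene aligns. On the reverse strand, the upstream region lies to the right in forward-strand coordinates and must be reverse-complemented. The minimal Python sketch below makes three assumptions: the aligner reports 0-based, half-open forward-strand coordinates; the chromosome is already loaded as a string; and the FR1 start corresponds to the 5' end of the aligned V-REGION. These assumptions and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_of_start(chrom: str, start: int, end: int, strand: str, flank: int = 150) -> str:
    """Return up to `flank` bases 5' of a gene, written 5'->3' on the gene's own strand.

    `start`/`end` are 0-based, half-open coordinates of the aligned germline
    sequence on the chromosome's forward strand.
    """
    if strand == "+":
        return chrom[max(0, start - flank):start]
    if strand == "-":
        # On the reverse strand, "upstream" lies to the right on the forward strand.
        return revcomp(chrom[end:end + flank])
    raise ValueError("strand must be '+' or '-'")
```

Near a chromosome end, the function returns fewer than `flank` bases rather than raising an error.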
(3) design of candidate primer library
Cutting 20 bp primers successively along the extracted sequences produced a starting dataset of 69,430 primers, which formed the candidate primer library.
(4) Cluster analysis
The candidate primer library was clustered with CD-HIT, with the -c parameter set to 0.9. The clusters were sorted by size and the top 90 were selected. The representative sequence of each cluster was extracted, and the representatives were screened according to primer design principles to obtain the initial primer set.
Partial clustering results are shown in Table 1. The number after ">" is the index of the primer in the candidate library, and the percentage is the similarity between that candidate sequence and the cluster's representative sequence:
TABLE 1
(Table 1 is provided as images in the original publication.)
As Table 1 shows, after CD-HIT clustering, the sequences in Cluster 1 share more than 95% similarity with the representative sequence; the first 35 sequences of this cluster are listed.
(5) The initial primer set was aligned against a human antibody database to obtain each primer's coverage of the database, and the primers were ranked by coverage. The top 20 were analyzed for GC content, annealing temperature and related properties. Finally, 16 primers were selected as the heavy chain primer set for experimental validation.
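The final selection mentions GC content and annealing temperature. A crude way to compute both is shown in the Python sketch below, using the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C. The Wallace rule is a rough approximation meant for short oligonucleotides; nearest-neighbour models (for example `Bio.SeqUtils.MeltingTemp.Tm_NN` in Biopython) are more accurate. The screening thresholds in `passes_screen` are hypothetical, because the text does not state the criteria actually used.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = primer.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = primer.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def passes_screen(primer: str, gc_range=(0.40, 0.60), tm_range=(52, 66)) -> bool:
    """Hypothetical screen: keep primers whose GC fraction and Wallace Tm fall in the given ranges."""
    gc = gc_fraction(primer)
    tm = wallace_tm(primer)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


if __name__ == "__main__":
    for p in ["ATGGACATACTTTGTTCCA", "CCATGGAGTTTGGGCTGAGC"]:  # SEQ NO.1 and SEQ NO.2
        print(p, round(gc_fraction(p), 2), wallace_tm(p))
```

For SEQ NO.2 (12 G/C out of 20 bases), this gives a GC fraction of 0.60 and a Wallace Tm of 64 °C.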
The 16 selected primers (SEQ NO.1-SEQ NO.16) form the primer set for the human B cell immune repertoire. Their sequences (5'-3') are:
SEQ NO.1:ATGGACATACTTTGTTCCA
SEQ NO.2:CCATGGAGTTTGGGCTGAGC
SEQ NO.3:GGCTGAGCTGGGTTTTCCTT
SEQ NO.4:CTGAGCTGGGTTTTCCTTGT
SEQ NO.5:CTCCTGGTGGCAGCTCCCAG
SEQ NO.6:CCTCCTCCTGGTGGCAGCTC
SEQ NO.7:CAGCTCCCAGATGTGAGTGT
SEQ NO.8:GCTGGGTTTTCCTTGTTGCT
SEQ NO.9:ATGAAACACCTGTGGTTCTT
SEQ NO.10:CCTGGAGGATCCTCTTCTTG
SEQ NO.11:GGTTTTCCTTGTTGCTATTT
SEQ NO.12:TGGTGGCAGCTCCCAGATGT
SEQ NO.13:GGACGTGAGTGAGAGAAACA
SEQ NO.14:TCCTCACCATGGACTGGACC
SEQ NO.15:CTTGTTGGTATTTTAAAAGG
SEQ NO.16:GAGGATCCTCTTCTTGGTGG
To evaluate the performance of the primer set, the immune repertoire was amplified with SEQ NO.1-SEQ NO.16 as upstream primers and SEQ NO.17 as the downstream primer. The sequence of the downstream primer SEQ NO.17 (5'-3') is:
SEQ NO.17:GGGGAAGACCGATGGGCCCTTGGTGG
Example 2 Evaluation of the performance of the primer set
(1) Sample preparation
Peripheral blood B lymphocytes were isolated with human lymphocyte separation medium, and RNA was extracted and reverse-transcribed into cDNA as follows:
1) Five fresh peripheral blood samples of 10 mL each were collected, and relatively pure peripheral blood mononuclear cells (PBMCs) were obtained according to the LymphoPrep kit instructions;
2) RNA was extracted, and its concentration and purity were measured with a NanoDrop 2000. The RNA was then reverse-transcribed with the TIANScript M-MLV reverse transcriptase kit (cat. no. ER104) to obtain cDNA (samples ST-1, ST-2, ST-3, ST-4 and ST-5), which was kept as the amplification template;
the reverse transcription reaction steps are as follows:
a) system configuration:
the reverse transcription reaction system was configured as shown in Table 2.
TABLE 2
Component | Amount
oligo(dT)12-18 | 2 μL
Total RNA | 1 μg
dNTP | 2 μL
ddH2O | to a final volume of 15 μL
b) Heat at 65 °C for 5 min, chill rapidly on ice for 2 min, centrifuge briefly to collect the reaction solution, then add 5× First-Strand Buffer and RNase inhibitor (40 U/μL);
c) add 1 μL (200 U) of TIANScript M-MLV and mix gently by pipetting;
d) incubate in a 42 °C water bath for 1 h 10 min;
e) stop the reaction by heating at 85 °C for 5 min, then place on ice for subsequent experiments.
(2) Multiplex PCR amplification
1) Reaction setup
The upstream and downstream primers were added to prepare the multiplex PCR system. The upstream primer set (SEQ ID NO:1-SEQ ID NO:16) was an equimolar mixture with a total primer concentration of 20 μM, and the downstream primer (SEQ NO.17) was at 20 μM. cDNA (samples ST-1, ST-2, ST-3, ST-4 and ST-5) served as the amplification template, and the multiplex PCR reaction was prepared as shown in Table 3;
TABLE 3
Component | Volume
10× PCR buffer | 5 μL
Upstream primer set | 2 μL
Downstream primer | 2 μL
dNTP | 1 μL
Taq enzyme | 0.5 μL
cDNA | 5 μL
ddH2O | 34.5 μL
Total | 50 μL
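The primer amounts in Table 3 translate into final concentrations by simple dilution (C1·V1 = C2·V2). The Python sketch below assumes that "20 μM" is the total concentration of each primer stock and that the 2 μL in Table 3 are taken from these stocks. Under that assumption, each of the 16 upstream primers is at 1.25 μM in the stock. In the 50 μL reaction, the upstream mix is at 0.8 μM in total (0.05 μM per primer), and the downstream primer is also at 0.8 μM.

```python
# Table 3 volumes in microlitres.
TABLE_3 = {
    "10x PCR buffer": 5.0,
    "Upstream primer set": 2.0,
    "Downstream primer": 2.0,
    "dNTP": 1.0,
    "Taq enzyme": 0.5,
    "cDNA": 5.0,
    "ddH2O": 34.5,
}


def final_concentrations(n_upstream=16, stock_total_uM=20.0, volume_uL=2.0, reaction_uL=50.0):
    """Per-primer stock, total final and per-primer final concentrations (uM) for an equimolar mix."""
    per_primer_stock = stock_total_uM / n_upstream
    total_final = stock_total_uM * volume_uL / reaction_uL  # dilution: C1*V1 = C2*V2
    per_primer_final = total_final / n_upstream
    return per_primer_stock, total_final, per_primer_final


if __name__ == "__main__":
    print("Total volume:", sum(TABLE_3.values()), "uL")
    print(final_concentrations())
```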
2) PCR reaction
The PCR instrument was programmed with the multiplex PCR conditions, and multiplex PCR was carried out. After PCR, the products were stored at 4 °C and checked by electrophoresis. The fragment of about 500 bp was excised and gel-purified to obtain the purified antibody DNA fragment, using the QIAquick gel purification kit (QIAGEN) according to standard procedures. The DNA concentration was measured with a NanoDrop 2000, and the fragment was subjected to high-throughput sequencing.
The multiplex PCR conditions are shown in Table 4:
TABLE 4
(Table 4 is provided as an image in the original publication.)
The electrophoretogram of the amplified human B cell immune repertoire is shown in FIG. 1. A clear band is visible at about 500 bp. The band range amplified with primer set 1 of the invention (SEQ ID NO:1-SEQ ID NO:16) is essentially consistent with that amplified with primer set 2, taken from Document 1 (Liao HX, Levesque MC, Nagel A, et al. High-throughput isolation of immunoglobulin genes from single human B cells and expression as monoclonal antibodies. J Virol Methods. 2009;158(1-2):171-9). This preliminarily indicates that the primer set designed by the invention (SEQ ID NO:1-SEQ ID NO:16) can amplify heavy chain sequences of the human B cell immune repertoire.
(3) Analysis of primer utilization
Bowtie was used to match the primer sequences (SEQ ID NO:1-SEQ ID NO:16) against the sequences from the 5 samples (ST-1, ST-2, ST-3, ST-4, ST-5). The 5' RACE method served as the control, i.e. the antibody sequences amplified by 5' RACE were taken as the reference. Primer utilization is shown in Table 5:
TABLE 5
(Table 5 is provided as an image in the original publication.)
In the above table:
0 mismatches: no mismatch is allowed between the primer sequence and the antibody sequence;
1 mismatch: up to 1 mismatched base is allowed between the primer sequence and the antibody sequence;
2 mismatches: up to 2 mismatched bases are allowed between the primer sequence and the antibody sequence.
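The mismatch categories above can be illustrated without an aligner. For each read, the Python sketch below finds the smallest number of mismatches between any primer and any window of the read, then tallies reads by that number. This is a plain Hamming-distance stand-in for the bowtie analysis described in the text, with hypothetical function names. It ignores indels and reverse-strand matches.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatches(primers: List[str], read: str) -> Optional[int]:
    """Fewest mismatches between any primer and any same-length window of the read."""
    best = None
    for primer in primers:
        k = len(primer)
        for i in range(len(read) - k + 1):
            mm = hamming(primer, read[i:i + k])
            if best is None or mm < best:
                best = mm
    return best


def mismatch_profile(primers: List[str], reads: List[str], max_mismatches: int = 2) -> Dict[str, int]:
    """Count reads whose best primer match has exactly 0, 1, ..., max_mismatches mismatches."""
    upper_primers = [p.upper() for p in primers]
    profile = {str(m): 0 for m in range(max_mismatches + 1)}
    profile["unmatched"] = 0
    for read in reads:
        best = best_mismatches(upper_primers, read.upper())
        if best is None or best > max_mismatches:
            profile["unmatched"] += 1
        else:
            profile[str(best)] += 1
    return profile
```

Because the table categories mean "up to" n mismatches, the cumulative count for "1 mismatch" is `profile["0"] + profile["1"]`.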
The results of the above table show: the designed primer set (SEQ ID NO:1-SEQ ID NO:16) has good match with the sample.
(4) Coverage analysis
Samples ST-1, ST-2, ST-3, ST-4 and ST-5 were used as amplification templates. PCR amplification was carried out with the primers designed by the invention (SEQ NO.1-SEQ NO.16, primer set 1 in the table) and, separately, with the primers from Document 1 (primer set 2 in the table). Coverage of the amplified antibody sequences was analyzed with the 5' RACE method as the reference (i.e. the antibody sequences amplified by 5' RACE form the reference set). The results are shown in Table 6:
TABLE 6
(Table 6 is provided as an image in the original publication.)
Coverage = ("PCR method" / "5' RACE method") × 100%.
Here, "PCR method" is the number of antibody sequences amplified by multiplex PCR that are identical to antibody sequences amplified by 5' RACE. "5' RACE method" is the total number of antibody sequences amplified by 5' RACE.
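The coverage formula reduces to a set intersection. The minimal Python sketch below treats each method's output as a set of unique antibody sequences; whether the original counts unique sequences or reads is not stated, so this is an assumption.

```python
from typing import Iterable


def coverage_percent(pcr_sequences: Iterable[str], race_sequences: Iterable[str]) -> float:
    """Coverage = |PCR sequences also found by 5' RACE| / |5' RACE sequences| x 100%."""
    race = set(race_sequences)
    if not race:
        raise ValueError("the 5' RACE reference set is empty")
    shared = race & set(pcr_sequences)
    return 100.0 * len(shared) / len(race)
```

For example, if 5' RACE yields four distinct sequences and multiplex PCR recovers three of them (plus one sequence absent from the RACE set), coverage is 75%.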
As Table 6 shows, across the 5 samples tested, the average coverage of antibody sequences amplified with primer set 1 was 95.01% or higher. The average for primer set 2 from the literature was 89.35%. The coverage of primer set 1 is therefore, on average, 5.66 percentage points higher than that of the published primers, showing that the primer set designed by the invention achieves very high coverage.
Example 3 Bioinformatic analysis of the immune repertoire
The human B cell immune repertoire was amplified with the designed primer set (SEQ ID NO:1-SEQ ID NO:16) as upstream primers and SEQ ID NO:17 as the downstream primer. The products were then analyzed bioinformatically, with the following results:
(1) CDR3 region Length analysis
In the CDR3 amino acid sequence, a change of a single amino acid (aa) may alter receptor conformation, so variation in CDR3 length reflects the diversity of the CDR3 junction region. The immune repertoire sequences were analyzed with IgBLAST to obtain the CDR3 length distribution (see FIG. 2) and the germline usage of the CDR3 region (see FIG. 3).
As FIG. 2 shows, CDR3 lengths are mainly concentrated between 9 and 12 amino acids.
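Once CDR3 amino acid sequences have been extracted from the IgBLAST results, a length distribution like FIG. 2 is a simple tally. The Python sketch below assumes the CDR3 strings are already available as a list; parsing of the IgBLAST output is not shown because its column layout depends on the output format chosen. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def cdr3_length_distribution(cdr3_aa: Iterable[str]) -> Counter:
    """Count CDR3 amino-acid sequences by length."""
    return Counter(len(seq) for seq in cdr3_aa)


def fraction_in_range(dist: Dict[int, int], lo: int, hi: int) -> float:
    """Fraction of sequences whose CDR3 length lies in the closed interval [lo, hi]."""
    total = sum(dist.values())
    if total == 0:
        raise ValueError("empty distribution")
    return sum(n for length, n in dist.items() if lo <= length <= hi) / total
```

`fraction_in_range(dist, 9, 12)` quantifies the concentration between 9 and 12 amino acids that FIG. 2 shows graphically.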
(2) Use of germline genes (germline) in the CDR3 region
FIG. 3 shows the germline usage profile for the CDR3 sequences of IGHV. The abscissa lists the germline genes, and the ordinate gives the number of sequences assigned to each.
As FIG. 3 shows, the abundance of the different germline genes varies greatly, and some are clearly more abundant than others. In this sample, VH germline usage is concentrated mainly in IGHV3-7*01.
(3) The kinds of amino acids in the CDR3 region, and the frequency and length of the amino acid sequence are shown in Table 7;
TABLE 7
(Table 7 is provided as images in the original publication.)
As Table 7 shows, CDR3 sequences vary widely, and some, such as "AVDSNYQLI", are more abundant than others. Characteristic antibodies in the immune repertoire can therefore be identified from CDR3 abundance, which provides a starting point for further research and therapy.
In conclusion, the invention provides a primer set design method. According to the required amplification length, the sequence upstream of the antibody FR1 start site is extracted. Fixed-length primer sequences are cut from it by successive shifting, starting from the first base at the 5' end, to form a candidate primer library. The primer set is then obtained by screening after cluster analysis.
The applicant states that the present invention is illustrated in detail by the above examples, but the present invention is not limited to the above detailed methods, i.e. it is not meant that the present invention must rely on the above detailed methods for its implementation. It should be understood by those skilled in the art that any modification of the present invention, equivalent substitutions of the raw materials of the product of the present invention, addition of auxiliary components, selection of specific modes, etc., are within the scope and disclosure of the present invention.
Sequence listing
<110> Suzhou Hongxn Biotechnology Ltd
<120> design method of primer group and application thereof
<130> 2018
<141> 2018-11-06
<160> 17
<170> SIPOSequenceListing 1.0
<210> 1
<211> 19
<212> DNA
<213> Artificial Synthesis ()
<400> 1
atggacatac tttgttcca 19
<210> 2
<211> 20
<212> DNA
<213> Artificial Synthesis ()
<400> 2
ccatggagtt tgggctgagc 20
<210> 3
<211> 20
<212> DNA
<213> Artificial Synthesis ()
<400> 3
ggctgagctg ggttttcctt 20
<210> 4
<211> 20
<212> DNA
<213> Artificial Synthesis ()
<400> 4
ctgagctggg ttttccttgt 20
<210> 5
<211> 20
<212> DNA
<213> Artificial Synthesis ()
<400> 5
ctcctggtgg cagctcccag 20
<210> 6
<211> 20
<212> DNA
<213> Artificial Synthesis ()
<400> 6
cctcctcctg gtggcagctc 20
<210> 7
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 7
cagctcccag atgtgagtgt 20
<210> 8
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 8
gctgggtttt ccttgttgct 20
<210> 9
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 9
atgaaacacc tgtggttctt 20
<210> 10
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 10
cctggaggat cctcttcttg 20
<210> 11
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 11
ggttttcctt gttgctattt 20
<210> 12
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 12
tggtggcagc tcccagatgt 20
<210> 13
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 13
ggacgtgagt gagagaaaca 20
<210> 14
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 14
tcctcaccat ggactggacc 20
<210> 15
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 15
cttgttggta ttttaaaagg 20
<210> 16
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 16
gaggatcctc ttcttggtgg 20
<210> 17
<211> 26
<212> DNA
<213> Artificial Synthesis
<400> 17
ggggaagacc gatgggccct tggtgg 26
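The listing follows the numeric-identifier layout of WIPO Standard ST.25: <210> is the SEQ ID NO, <211> the length, <212> the molecule type, <213> the organism, and <400> introduces the sequence itself, written in groups of ten bases followed by a running base count. The minimal Python parser below is a hypothetical consistency check, not part of the patent. It reads that layout and confirms that each declared <211> length matches the number of bases actually listed. It handles only the fields that appear in this listing.

```python
import re
from typing import Dict, List, Optional

TAG = re.compile(r"<(\d{3})>\s*(.*)")


def parse_listing(text: str) -> Dict[int, dict]:
    """Return {seq_id: {"length": int, "type": str, "sequence": str}}."""
    entries: Dict[int, dict] = {}
    current: Optional[int] = None
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        m = TAG.match(line)
        if m:
            tag, value = m.group(1), m.group(2).strip()
            in_sequence = False
            if tag == "210":
                current = int(value)
                entries[current] = {"sequence": ""}
            elif current is None:
                continue  # header fields such as <110> or <160>
            elif tag == "211":
                entries[current]["length"] = int(value)
            elif tag == "212":
                entries[current]["type"] = value
            elif tag == "400":
                in_sequence = True
        elif in_sequence and line:
            # Keep the base groups and drop the trailing running count.
            bases = "".join(t for t in line.split() if t.isalpha())
            entries[current]["sequence"] += bases.lower()
    return entries


def length_mismatches(entries: Dict[int, dict]) -> List[int]:
    """SEQ ID NOs whose declared <211> length differs from the bases listed."""
    return [sid for sid, e in entries.items() if e.get("length") != len(e["sequence"])]


if __name__ == "__main__":
    sample = "\n".join([
        "<210> 1", "<211> 19", "<212> DNA", "<400> 1",
        "atggacatac tttgttcca 19",
        "<210> 17", "<211> 26", "<212> DNA", "<400> 17",
        "ggggaagacc gatgggccct tggtgg 26",
    ])
    entries = parse_listing(sample)
    print(length_mismatches(entries))  # an empty list means every length checks out
```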

Claims (13)

1. A method for designing a primer set, characterized by comprising the following steps:
(1) obtaining germline gene data for the heavy and/or light chain of an antibody of a species, and genomic data for the species;
(2) locating a primer selection region: aligning the germline genes of the antibody heavy chain and/or light chain to the genomic data of the species, locating the start site of antibody FR1, and extracting the sequence upstream of the start site;
(3) designing a candidate primer library: slicing the upstream sequence extracted in step (2) by cutting out fixed-length primer sequences, starting from the first base at the 5' end and shifting successively along the sequence, to form a candidate primer library;
(4) performing cluster analysis and screening to obtain an initial primer group;
the cluster analysis comprises: setting the -c parameter in CD-HIT software and clustering the candidate primer library sequences according to the -c setting;
the -c parameter is set to 0.8-1.0;
the screening comprises: sorting the clusters by size, selecting the top 10-90 clusters, extracting the representative sequence of each cluster, and screening to obtain an initial primer group;
(5) comparing the initial primer group with an antibody database, and screening to obtain a final primer group.
2. The method according to claim 1, wherein the species of step (1) comprises any one or a combination of at least two of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost fish, cartilaginous fish, platypus, alpaca, cynomolgus monkey, cow, dog or chicken.
3. The method of claim 1, wherein the sequence upstream of the start site in step (2) is 1-300 bp in length.
4. The method of claim 1, wherein the fixed length of the primer sequence of step (3) is in the range of 16-26 bp.
5. The method of claim 4, wherein the fixed length of the primer sequence of step (3) is in the range of 18-22 bp.
6. The method of claim 1, wherein the -c parameter of step (4) is set to 0.9.
7. The method of claim 1, wherein the comparison of step (5) is performed by: comparing the initial primer group with the antibody database data of the corresponding species to obtain the coverage of each sequence in the antibody database, sorting the sequences by coverage, and selecting the top 10-40 primers as the final primer group.
8. The method according to claim 1, characterized in that it comprises in particular the steps of:
(1) obtaining germline gene data for the species' antibody light and/or heavy chains and genomic data for the species;
wherein the species comprises any one or a combination of at least two of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost, cartilaginous fish, platypus, alpaca, cynomolgus monkey, cow, dog or chicken;
(2) locating a primer selection region: aligning the germline genes of the antibody heavy chain and/or light chain to the genomic data of the chromosome on which the species' antibody genes are located, locating the start site of antibody FR1, and extracting the sequence 1-300 bp upstream of the start site;
(3) designing a candidate primer library: slicing the upstream sequence extracted in step (2) by cutting out fixed-length primer sequences, starting from the first base at the 5' end and shifting successively along the sequence, the primer length being 18-22 bp, to form a candidate primer library;
(4) performing cluster analysis and screening to obtain an initial primer group;
the cluster analysis comprises: setting the -c parameter in CD-HIT software and clustering the candidate primer library sequences according to the -c setting;
the -c parameter is set to 0.9;
the screening comprises: sorting the clusters by size, selecting the top 10-90 clusters, extracting the representative sequence of each cluster, and screening to obtain an initial primer group;
(5) comparing the initial primer group with an antibody database of the corresponding species to obtain the coverage of each primer in the antibody database, sorting the primers by coverage, and selecting the top 10-40 primers as the final primer group.
9. A human B cell immune repertoire heavy chain primer set designed by the method of any one of claims 1 to 8, wherein the sequences of the primer set comprise the sequences set forth as SEQ ID No. 1 to SEQ ID No. 16.
10. The human B cell immune repertoire heavy chain primer set of claim 9, wherein the sequences of the primer set comprise any fifteen of the primer sequences set forth as SEQ ID No. 1 to SEQ ID No. 16.
11. The human B cell immune repertoire heavy chain primer set of claim 10, wherein the sequences of the primer set comprise any ten of the primer sequences set forth as SEQ ID No. 1 to SEQ ID No. 16.
12. Use of the method of any one of claims 1 to 8, or of the primer set of any one of claims 9 to 11, in the amplification of an immune repertoire.
13. Use of the method of any one of claims 1 to 8, or of the primer set of any one of claims 9 to 11, in the preparation of a kit for amplifying an immune repertoire.
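Step (5) of claims 1 and 8, detailed in claim 7, ranks the initial primers by their coverage of an antibody database and keeps the top 10-40. The claims do not define coverage precisely. The Python sketch below therefore assumes it is the fraction of database sequences that contain a match to the primer on the forward strand, ungapped, with at most `max_mismatches` mismatches. The function names, the mismatch rule and the toy database are hypothetical. A practical check would usually also weigh mismatch position, because mismatches near a primer's 3' end hinder extension more than those near the 5' end. This sketch treats all positions equally.

```python
from typing import List, Sequence, Tuple


def primer_hits(primer: str, target: str, max_mismatches: int = 0) -> bool:
    """True if `primer` occurs in `target` (forward strand, no gaps)
    with at most `max_mismatches` mismatched bases."""
    p, t = primer.upper(), target.upper()
    k = len(p)
    for i in range(len(t) - k + 1):
        mismatches = 0
        for a, b in zip(p, t[i:i + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window cannot qualify; try the next one
        if mismatches <= max_mismatches:
            return True
    return False


def rank_by_coverage(primers: Sequence[str], database: Sequence[str],
                     max_mismatches: int = 0) -> List[Tuple[str, float]]:
    """(primer, coverage) pairs sorted by coverage, highest first."""
    if not database:
        raise ValueError("antibody database is empty")
    scored = [(p, sum(primer_hits(p, s, max_mismatches) for s in database) / len(database))
              for p in primers]
    # sort() is stable, so primers with equal coverage keep their input order.
    scored.sort(key=lambda ps: ps[1], reverse=True)
    return scored


def select_final(primers: Sequence[str], database: Sequence[str],
                 top_n: int = 10, max_mismatches: int = 0) -> List[str]:
    """Keep the top_n primers by coverage (the claims use 10-40)."""
    return [p for p, _ in rank_by_coverage(primers, database, max_mismatches)[:top_n]]


if __name__ == "__main__":
    # Invented toy "database" of antibody sequences that include the upstream region.
    db = ["GGACGTACGTTT", "CCACGTACGAAA", "TTTTTTTTTTTT"]
    initial = ["ACGTACGT", "TTTTTTTT", "GGGGGGGG"]
    for primer, cov in rank_by_coverage(initial, db, max_mismatches=1):
        print(primer, f"{cov:.2f}")
```

Allowing one mismatch in the toy run lets the first primer match a second database entry. This shows how the mismatch tolerance, which is an assumption here, directly shapes the final ranking.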

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811313752.8A CN109411011B (en) 2018-11-06 2018-11-06 Design method and application of primer group

Publications (2)

Publication Number Publication Date
CN109411011A CN109411011A (en) 2019-03-01
CN109411011B true CN109411011B (en) 2022-05-17

Family

ID=65471732

Country Status (1)

Country Link
CN (1) CN109411011B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428870B (en) * 2019-08-08 2023-03-21 苏州泓迅生物科技股份有限公司 Method for predicting antibody heavy chain and light chain pairing probability and application thereof
CN111326210B (en) * 2020-03-11 2023-07-14 中国科学院生态环境研究中心 Primer design method and system based on k-mer algorithm

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103409544A (en) * 2013-08-19 2013-11-27 苏州吉泰生物科技有限公司 Application of primer to detection of false negative of polymerase chain reaction (PCR) of various organisms
CN103890245A (en) * 2011-05-20 2014-06-25 富鲁达公司 Nucleic acid encoding reaction
CN105925681A (en) * 2016-05-06 2016-09-07 博尔诚(北京)科技有限公司 Composition for lung cancer screening and application of composition
CN107435065A (en) * 2016-05-10 2017-12-05 江苏荃信生物医药有限公司 The method for identifying primate antibody
CN107760672A (en) * 2016-08-17 2018-03-06 苏州泓迅生物科技股份有限公司 A kind of industrialization method for synthesizing gene based on two generation sequencing technologies
CN107964036A (en) * 2017-05-27 2018-04-27 武汉博沃生物科技有限公司 Respiratory Syncytial Virus(RSV) recombinant protein and its preparation method and application

Similar Documents

Publication Publication Date Title
US11913017B2 (en) Efficient genetic screening method
CN106086013B (en) A kind of probe and design method for nucleic acid enriching capture
CN105821075A (en) Establishment method of caffeine synthetase CRISPR/Cas9 genome editing vector
CN111363783B (en) T cell receptor library high-throughput sequencing library construction and sequencing data analysis method based on specific recognition sequence
CN106319639B (en) Build the method and apparatus of sequencing library
CN105063032A (en) Multiple PCR primers and method for constructing leukemia minimal residual disease BCR library based on high-flux sequencing
CN109411011B (en) Design method and application of primer group
CN113463202B (en) Novel RNA high-throughput sequencing method, primer group and kit and application thereof
CN107779495B (en) Construction method and kit of T cell antigen receptor diversity sequencing library
CN105154440A (en) Multi-PCR primer and method for constructing leukemia minimal residual disease TCR library based on high-throughput sequencing
CN107038349B (en) Method and apparatus for determining pre-rearrangement V/J gene sequence
CN114736971A (en) SNP molecular marker related to egg yield of female pigeons, kit and application thereof
CN109706231B (en) High-throughput SNP (single nucleotide polymorphism) typing method for molecular breeding of litopenaeus vannamei
CN111192636A (en) mRNA next-generation sequencing result analysis method suitable for oligodT enrichment
CN114107444A (en) Method for discovering and regulating plant development key regulation factor and application thereof
CN111662970B (en) Three-generation library-building sequencing method for full-length amplification of BCR immune repertoire
Lyu et al. TEAseq-based identification of 35,696 Dissociation insertional mutations facilitates functional genomic studies in maize
CN107058484B (en) Primer combination and kit applied to high-throughput sequencing and simultaneous detection of T cell and B cell immune repertoire
CN107354151B (en) STR molecular marker developed based on sika whole genome and application thereof
US20230220466A1 (en) Immune cell sequencing methods
CN113061609B (en) sgRNA for specifically recognizing porcine IGF2R site and coding DNA and application thereof
CN114107287A (en) Preparation method for comprehensively amplifying humann TCR beta chain library by adopting a small amount of degenerate primers
CN110331216A (en) The specificity amplification primer of sika deer microsatellite locus M027 a kind of and its application
CN106520758A (en) Screening and identifying method of miRNAs (micro Ribonucleic Acids) of fetal fibroblasts of Saanen dairy goats
CN112695133A (en) Method for screening molecular markers associated with pear brown skin by using BSA-seq

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant