CN109411011B

CN109411011B - Design method and application of primer group

Info

Publication number: CN109411011B
Application number: CN201811313752.8A
Authority: CN
Inventors: 吴婷婷; 李彦敏; 陈淑美; 蔡晓辉; 杨平
Original assignee: Synbio Technologies
Current assignee: Synbio Technologies
Priority date: 2018-11-06
Filing date: 2018-11-06
Publication date: 2022-05-17
Anticipated expiration: 2038-11-06
Also published as: CN109411011A

Abstract

The invention provides a design method of a primer group and application thereof. According to the required amplification length, the method extracts the sequence upstream of the FR1 start site of an antibody, sequentially shifts and cuts fixed-length primer sequences from the first base at the 5' end to form a candidate primer library, and, after cluster analysis, screens the candidate primer library to obtain the primer group of an immune repertoire.

Description

Design method and application of primer group

Technical Field

The invention belongs to the technical field of biology, and relates to a design method and application of a primer group, in particular to a design method and application of a primer group of an immune repertoire.

Background

The immune repertoire (IR) refers to the sum of the diversity of all functional B cells and T cells in the circulatory system of an individual at any given time. The immune cells responsible for protecting the human body mainly comprise T cells, B cells, macrophages, dendritic cells and the like. T cells and B cells are the major lymphocytes in humans and are responsible for cellular and humoral immunity, respectively, with B cells accounting for approximately 20% of peripheral lymphocytes. The B cell receptor (BCR) is formed by two heavy chains and two light chains; the heavy chain is divided into a variable region (V region), a constant region (C region), a transmembrane region and a cytoplasmic region, whereas the light chain has only V and C regions. The V region is composed of two domains, VH and VL, each of which contains three complementarity determining regions (CDRs), namely CDR1, CDR2 and CDR3. The three CDRs are jointly involved in antigen recognition and jointly determine the antigen specificity of the BCR and TCR. The amino acid composition and order of the CDRs are highly diverse: within a single individual the diversity can reach 10⁹-10¹², forming a large-capacity BCR library that gives the individual great potential to recognize various antigens and generate specific antibodies. Emerging immune repertoire research currently focuses on the diversity of the CDR genes. It is therefore necessary to amplify the immune repertoire.

At present, immune repertoire amplification methods mainly include the 5'-RACE method, the multiplex PCR method, the unique molecular identifier (UID) method, and the like.

The 5'-RACE method (rapid amplification of the 5' cDNA end) uses a specific primer for reverse transcription, adds an adapter to the 5' end of the first-strand cDNA, performs a second, unbiased PCR amplification, and obtains sequences containing the target region by enrichment with avidin magnetic beads. The method needs only gene-specific primers, such as primers in the conserved part of the BCR/TCR C region, and can therefore reduce multiplex PCR bias; however, it can only amplify RNA and is used to study sorted, specific cell types. The experiment is more complicated than ordinary multiplex PCR and is biased by transcript length and GC content. Because primers are designed at only one end (the C region), the product length range is large. The 5' RACE method can amplify different clones equivalently, with the number of individual clones reaching 10⁵, and can rapidly amplify the 5' end of cDNA from low-abundance transcripts to cover the CDRs, avoiding PCR amplification bias to the greatest extent. Its drawbacks are complex operation, loss of partial sequence because the experimental sample is fragmented, and poor repeatability.

Multiplex PCR: two or more primer pairs are added to the same PCR reaction system so that several nucleic acid fragments are amplified simultaneously; the reaction principle, reagents and procedure are the same as for ordinary PCR. As in ordinary PCR, the sample is not fragmented and the data are complete, but amplification is biased. Immune gene diversity arises through gene duplication and gene mutation, and the target BCR/TCR genes can be obtained by designing multiplex PCR primers. The method usually places primers in conserved regions of the V region and the J/C region. Because the primers differ in PCR efficiency, amplification bias is unavoidable: some templates are amplified heavily while others are hardly amplified at all. The bias can be reduced only by finding an optimal combination of primer concentrations through repeated optimization, and that combination does not carry over to new primers, which makes the search more complex. Variable region genes are usually cloned by referring to the antibody sequences in the Kabat database: several sets of universal primers are designed against conserved regions of the antibody variable region, and the variable region genes are amplified by RT-PCR from a cDNA library of human B lymphocytes. The method is simple and practical. The 5' universal primer is usually designed in the first framework region or the leader peptide region, and the 3' universal primer in the constant region or the J region; however, the antibody sequences obtained this way are relatively short, and the sequence before FR1 cannot be detected.

The UID method adds a unique UID to each molecule before the target molecules are amplified by large-scale PCR. A UID is generally a randomly synthesized oligonucleotide of 12-16 nucleotides (a random barcode); the random combinations yield a huge number of distinct labels, so that each specific molecule in the sample can be tagged differently. Even if PCR amplification is uneven, the bias can then be removed computationally, and PCR and sequencing errors can be corrected at the same time. However, the method requires very long primers, which reduces amplification efficiency and shortens the target gene fragment, and it requires extremely high throughput to cover all UIDs; at present it is mainly applied to RNA sequencing of IGH/TCR.

Regarding primer design for the immune repertoire, it is difficult in the prior art to design an upstream primer that recognizes the 5' end sequence of the BCR gene. Because primers are designed in the FR1 region, the information upstream of the antibody cannot be covered and the sequenced gene information is incomplete. The primers in document 2 (Wang X, Stollar BD. Human immunoglobulin variable region gene analysis by single cell RT-PCR. J Immunol Methods 2000; 244: 217-25) are few in number but give low coverage. Conversely, immune repertoire sequencing based on multiplex-primer PCR requires dozens of primer pairs when sequencing the BCR, and this large number of primers makes the amplification efficiency and specificity of the whole PCR poor.

In view of the defects of the existing antibody immune repertoire amplification, a primer group design method which is simple, high in efficiency, high in coverage rate and low in mismatching rate is developed, and the method has wide application prospect and great market value.

Disclosure of Invention

Aiming at the defects and practical requirements of the prior art, the invention provides a design method and application of a primer group, skillfully extracts a sequence before an initial site of an antibody FR1 and designs a candidate primer library, and obtains the primer group by screening through bioinformatics design analysis.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the present invention provides a method for designing a primer set, comprising the steps of:

(1) obtaining germline gene data for the heavy and/or light chain of an antibody of the species and genomic data for the species;

(2) positioning of a primer selection region: aligning germline genes of the heavy chain and/or the light chain of the antibody to the genome data of the species, finding an initial site of the antibody FR1, and extracting a sequence before the initial site;

(3) design of a candidate primer library: slicing the sequence before the initial site extracted in the step (2), and sequentially shifting and cutting primer sequences with fixed lengths from the first base at the 5' end to form a candidate primer library;

(4) performing cluster analysis, and screening to obtain an initial primer group;

(5) comparing the initial primer group with an antibody database, and screening to obtain a final primer group.
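Step (3) can be pictured as a sliding window over the extracted upstream sequence. The following is a minimal sketch, assuming the window advances one base per step and that each upstream region is available as a plain DNA string; the function name `slice_candidates` and the toy input are illustrative only and are not taken from the invention.

```python
def slice_candidates(upstream: str, primer_len: int = 20) -> list[str]:
    """Cut every window of primer_len bases, starting at the first base of
    the 5' end and shifting by one base each time (assumed step size)."""
    upstream = upstream.upper()
    # A region shorter than primer_len yields no candidates.
    return [upstream[i:i + primer_len]
            for i in range(len(upstream) - primer_len + 1)]


# Toy example: a 10 bp region cut into 8 bp windows gives 10 - 8 + 1 = 3 candidates.
print(slice_candidates("ATGGACTGGA", primer_len=8))
```

Under these assumptions a region of length L yields L - k + 1 candidates of length k, so longer upstream regions enlarge the candidate library roughly linearly.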

Through long-term research practice, and in view of the shortcomings of prior-art primer design for the immune repertoire (reduced efficiency when too many primers are used, low coverage when too few are used, and the inability to cover upstream information), the invention extracts, according to the required amplification length, the sequence upstream of the antibody FR1 start site, sequentially shifts and cuts fixed-length primer sequences from the first base at the 5' end to form a candidate primer library, and obtains the primer group by screening after cluster analysis. Experiments show that the primer group obtained by this method markedly improves coverage and experimental efficiency and reduces the mismatch rate, and that the method is simple and saves cost.

Preferably, the species of step (1) comprises any one of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost fish, cartilaginous fish, atlantic cod, clarias fuscus, rainbow trout, zebrafish, platypus, alpaca, cynomolgus monkey, cow, dog, chicken or salmon, or a combination of at least two thereof.

Preferably, the length range of the sequence before the initiation site in step (2) is 1-300bp, such as 1bp, 5bp, 10bp, 20bp, 30bp, 40bp, 50bp, 60bp, 70bp, 80bp, 90bp, 100bp, 110bp, 120bp, 130bp, 140bp, 150bp, 160bp, 170bp, 180bp, 190bp, 200bp, 210bp, 220bp, 230bp, 240bp, 250bp, 260bp, 270bp, 280bp, 290bp or 300 bp.

Preferably, the fixed length range of the primer sequence in step (3) is 16-26bp, such as 16bp, 17bp, 18bp, 19bp, 20bp, 21bp or 22bp, preferably 18-22 bp.

Preferably, the method of cluster analysis in step (4) is: setting the -c parameter (the sequence identity threshold) in the Cd-hit software, and clustering the candidate primer library sequences according to that setting.

Preferably, the -c parameter is set to 0.8-1.0, for example 0.8, 0.9 or 1.0, preferably 0.9.
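For orientation, the clustering call might be assembled as in the sketch below. It only builds a command line and does not run anything. The choice of the `cd-hit-est` program (the CD-HIT variant for nucleotide sequences) and the file names are assumptions, since the text states only that Cd-hit is run with the -c parameter.

```python
import shlex


def cdhit_command(fasta_in: str, out_prefix: str, identity: float = 0.9,
                  program: str = "cd-hit-est") -> list[str]:
    """Assemble a CD-HIT call: -i is the input FASTA, -o the output name,
    -c the sequence identity threshold."""
    if not 0.8 <= identity <= 1.0:
        raise ValueError("the method uses a -c value between 0.8 and 1.0")
    return [program, "-i", fasta_in, "-o", out_prefix, "-c", str(identity)]


# Hypothetical file names, for illustration only.
cmd = cdhit_command("candidates.fa", "candidates_c90", identity=0.9)
print(shlex.join(cmd))
# To actually run it (requires CD-HIT to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

At thresholds toward the lower end of the range the word size option (`-n`) may also need adjusting; the CD-HIT user guide should be consulted for the valid combinations.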

Preferably, the screening method in step (4) is: sorting the clusters by size, selecting the top 10-90 clusters, extracting the representative sequence of each cluster, and screening according to primer design principles to obtain the initial primer group.

The top 10-90 clusters may be, for example, the top 10, top 20, top 30, top 40, top 50, top 60, top 70, top 80, or top 90.
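The ranking of clusters by size can be sketched as follows, assuming the clusters are read from a CD-HIT `.clstr` file in which each cluster starts with a `>Cluster N` line and the representative member is marked with `*`. The function name and the sample text are illustrative; real output should be checked against this assumed layout.

```python
import re
from collections import defaultdict

# Member line, e.g. "0<TAB>20nt, >17... *" or "1<TAB>20nt, >3... at +/95.00%"
MEMBER = re.compile(r">(?P<name>.+?)\.\.\.\s+(?P<tag>\*|at\s+\S+)")


def top_representatives(clstr_text: str, top_n: int = 90) -> list[tuple[str, int]]:
    """Return (representative name, cluster size) for the top_n largest clusters."""
    sizes: dict[int, int] = defaultdict(int)
    reps: dict[int, str] = {}
    current = -1
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            continue
        m = MEMBER.search(line)
        if m is None:
            continue
        sizes[current] += 1
        if m.group("tag") == "*":
            reps[current] = m.group("name")
    # Largest clusters first; ties keep their original file order.
    ranked = sorted(sizes, key=lambda c: sizes[c], reverse=True)
    return [(reps[c], sizes[c]) for c in ranked[:top_n]]


sample = (
    ">Cluster 0\n"
    "0\t20nt, >17... *\n"
    ">Cluster 1\n"
    "0\t20nt, >3... at +/95.00%\n"
    "1\t20nt, >8... *\n"
    "2\t20nt, >21... at +/90.00%\n"
)
print(top_representatives(sample, top_n=1))
```

The representatives returned here would still need to pass the usual primer design checks before entering the initial primer group.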

Preferably, the method for alignment in step (5) is: comparing the initial primer group with an antibody database of the corresponding species to obtain the coverage of each sequence in the antibody database, ranking the primers by coverage, and selecting the top 10-40 primers as the final primer group.

The top 10-40 primers may be, for example, the top 10, top 16, top 20, top 25, top 30, top 35 or top 40.
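The coverage ranking of step (5) can be illustrated with the sketch below. It assumes, purely for illustration, that a primer "covers" a database sequence when it occurs in it as an exact substring; the text does not define the matching rule, and the toy sequences are not real antibody data.

```python
def primer_coverage(primer: str, database: list[str]) -> float:
    """Fraction of database sequences that contain the primer exactly."""
    if not database:
        return 0.0
    hits = sum(primer in seq for seq in database)
    return hits / len(database)


def select_final_primers(primers: list[str], database: list[str],
                         top_n: int = 20) -> list[tuple[str, float]]:
    """Rank primers by coverage (highest first) and keep the top_n."""
    ranked = sorted(((p, primer_coverage(p, database)) for p in primers),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy database and primers, for illustration only.
db = ["AAACCCGGGTTT", "CCCGGGAAA", "TTTAAACCC"]
print(select_final_primers(["CCCGGG", "AAACCC", "GGGTTT"], db, top_n=2))
```

A mismatch-tolerant comparison could be substituted for the exact substring test without changing the ranking logic.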

As a preferred technical scheme, the method for designing the primer group specifically comprises the following steps:

(1) obtaining germline gene data for the antibody light and/or heavy chains of the species and genomic data for the species;

wherein the species comprises any one or a combination of at least two of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost fish, cartilaginous fish, Atlantic cod, Clarias fuscus, rainbow trout, zebrafish, platypus, alpaca, cynomolgus monkey, cow, dog, chicken or salmon;

(2) positioning of a primer selection region: aligning the germline genes of the heavy chain and/or the light chain of the antibody to the genome data of the chromosome on which the antibody genes of the species are located, finding the start site of the antibody FR1, and extracting the 1-300 bp sequence before the start site;

(3) design of a candidate primer library: slicing the sequence before the start site extracted in step (2) by sequentially shifting and cutting fixed-length primer sequences from the first base at the 5' end, the primer length being 18-22 bp, to form a candidate primer library;

(4) cluster analysis and screening to obtain an initial primer group:

the method of cluster analysis is: setting the -c parameter in the Cd-hit software, and clustering the candidate primer library sequences according to that setting;

the -c parameter is set to 0.9;

the screening method is: sorting the clusters by size, selecting the top 10-90 clusters, extracting the representative sequence of each cluster, and screening according to primer design principles to obtain an initial primer group;

(5) comparing the initial primer group with an antibody database of the corresponding species to obtain the coverage of each primer in the antibody database, ranking the primers by coverage, and selecting the top 10-40 primers as the final primer group.

In a second aspect, the present invention provides a human B cell immune repertoire heavy chain primer set designed by the method of the first aspect, wherein the sequence of the primer set comprises the sequence shown as SEQ NO.1-SEQ NO. 16;

the detailed sequence is as follows:

SEQ NO.1:ATGGACATACTTTGTTCCA

SEQ NO.2:CCATGGAGTTTGGGCTGAGC

SEQ NO.3:GGCTGAGCTGGGTTTTCCTT

SEQ NO.4:CTGAGCTGGGTTTTCCTTGT

SEQ NO.5:CTCCTGGTGGCAGCTCCCAG

SEQ NO.6:CCTCCTCCTGGTGGCAGCTC

SEQ NO.7:CAGCTCCCAGATGTGAGTGT

SEQ NO.8:GCTGGGTTTTCCTTGTTGCT

SEQ NO.9:ATGAAACACCTGTGGTTCTT

SEQ NO.10:CCTGGAGGATCCTCTTCTTG

SEQ NO.11:GGTTTTCCTTGTTGCTATTT

SEQ NO.12:TGGTGGCAGCTCCCAGATGT

SEQ NO.13:GGACGTGAGTGAGAGAAACA

SEQ NO.14:TCCTCACCATGGACTGGACC

SEQ NO.15:CTTGTTGGTATTTTAAAAGG

SEQ NO.16:GAGGATCCTCTTCTTGGTGG.

preferably, the sequences of the primer set comprise any fifteen primer sequences of the sequences shown as SEQ ID NO.1-SEQ ID NO. 16.

The arbitrary fifteen primer sequences are randomly selected from the sequences shown in SEQ ID NO.1-SEQ ID NO. 16.

Further preferably, the sequences of the primer set comprise any ten primer sequences in the sequences shown as SEQ ID NO.1-SEQ ID NO. 16.

The arbitrary ten primer sequences are randomly selected from sequences shown in SEQ ID NO.1-SEQ ID NO. 16.

In a third aspect, the present invention provides use of the method of the first aspect or the primer set of the second aspect in immune repertoire amplification.

In a fourth aspect, the present invention provides use of the method of the first aspect or the primer set of the second aspect in the preparation of a kit for amplifying an immune repertoire.

Compared with the prior art, the invention has the following beneficial effects:

the method provided by the invention is simple and efficient, can obviously improve the reaction efficiency and the coverage of the multiplex PCR, the coverage reaches 95-100%, the mismatch rate is reduced, the integrity of the antibody is higher, and the cost is saved.

Drawings

FIG. 1 is a PCR amplification electrophoretogram of the human B cell immune repertoire of the present invention;

FIG. 2 is a graph showing the length distribution of CDR3 amplified by the primer set of the present invention;

FIG. 3 is a germline distribution plot of the CDR3 of IGHV according to the present invention.

Detailed Description

To further illustrate the technical means and effects of the present invention, the following further describes the technical solutions of the present invention by way of specific embodiments with reference to the drawings, but the present invention is not limited to the scope of the embodiments.

Example 1 design of primer set for human B cell immune repertoire

(1) Extraction of human germline Gene

Human heavy chain germline genes were downloaded from the IMGT database (http://imgt.org/vquest/refseq.html); 361 genes were extracted, and the FASTA header of one of them is as follows:

>M99641|IGHV1-18*01|Homo sapiens|F|V-REGION|188..483|296nt|1|||||296+24＝320|||

(2) primer selection region mapping

The 361 sequences were aligned to chromosome 14 ('hs_ref_GRCh38.p12_chr14.fa'), and the 150 bp of sequence immediately before the start site of each aligned sequence was extracted as the base sequence set;
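Extracting the region before the start site can be sketched as below. The sketch assumes that the alignment step has already produced, for each germline gene, 0-based coordinates on the chromosome and a strand; the function names and the toy chromosome string are hypothetical. For a gene on the minus strand the upstream region lies to the right of the alignment on the forward strand and must be reverse-complemented.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, start: int, end: int, strand: str,
                    flank: int = 150) -> str:
    """Return up to `flank` bases lying 5' of an aligned gene.

    chrom  : chromosome sequence (forward strand)
    start  : 0-based index of the first aligned base
    end    : 0-based index one past the last aligned base
    strand : "+" or "-" orientation of the gene
    """
    if strand == "+":
        return chrom[max(0, start - flank):start]
    if strand == "-":
        # The gene's 5' side is to the right of `end` on the forward strand.
        return reverse_complement(chrom[end:end + flank])
    raise ValueError("strand must be '+' or '-'")


toy_chrom = "AACCGGTTACGT"
print(upstream_region(toy_chrom, 4, 8, "+", flank=3))  # bases 1..3 of the toy string
print(upstream_region(toy_chrom, 4, 8, "-", flank=3))  # reverse complement of bases 8..10
```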

(3) design of candidate primer library

Primer sequences of 20 bp were cut from the extracted sequences, giving a raw primer data set of 69430 sequences that forms the candidate primer library.

(4) Cluster analysis

The candidate primer library was clustered with the Cd-hit software with the -c parameter set to 0.9; the clusters were sorted by size, the first 90 clusters were selected, the representative sequence of each cluster was extracted, and the initial primer group was obtained by screening according to primer design principles;

In the clustering output for the raw primer data set, the number following ">" is the serial number of the primer in the candidate primer library, and the percentage is the similarity between the candidate primer sequence and the representative sequence; part of the clustering results is shown in Table 1 below:

TABLE 1

As can be seen from Table 1, the sequences in cluster 1 are similar to the representative sequence; after Cd-hit screening, the first 35 sequences in the cluster have a similarity of more than 95% to it.

(5) The initial primer group was compared with a human antibody database to obtain the coverage of each sequence in the antibody database; the primers were ranked by coverage, the top 20 were selected and analyzed for GC content, annealing temperature and the like, and 16 primers were finally selected as the final heavy chain primer group for subsequent experimental verification.
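The GC content and temperature check mentioned in step (5) can be approximated as in the sketch below. The text does not say how annealing temperature was estimated; the Wallace rule used here (2 °C per A/T plus 4 °C per G/C) is a swapped-in rule of thumb for short oligonucleotides, and the function names are illustrative.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in °C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


seq_no_2 = "CCATGGAGTTTGGGCTGAGC"  # SEQ NO.2 of this document
print(gc_content(seq_no_2), wallace_tm(seq_no_2))
```

Nearest-neighbor models give more realistic melting temperatures; the simple rule is shown only because it makes the arithmetic easy to follow.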

The 16 selected primer sequences (SEQ NO.1-SEQ NO.16) constitute the primer group of the human B cell immune repertoire; the detailed sequences are as follows (5'-3'):

SEQ NO.1:ATGGACATACTTTGTTCCA

SEQ NO.2:CCATGGAGTTTGGGCTGAGC

SEQ NO.3:GGCTGAGCTGGGTTTTCCTT

SEQ NO.4:CTGAGCTGGGTTTTCCTTGT

SEQ NO.5:CTCCTGGTGGCAGCTCCCAG

SEQ NO.6:CCTCCTCCTGGTGGCAGCTC

SEQ NO.7:CAGCTCCCAGATGTGAGTGT

SEQ NO.8:GCTGGGTTTTCCTTGTTGCT

SEQ NO.9:ATGAAACACCTGTGGTTCTT

SEQ NO.10:CCTGGAGGATCCTCTTCTTG

SEQ NO.11:GGTTTTCCTTGTTGCTATTT

SEQ NO.12:TGGTGGCAGCTCCCAGATGT

SEQ NO.13:GGACGTGAGTGAGAGAAACA

SEQ NO.14:TCCTCACCATGGACTGGACC

SEQ NO.15:CTTGTTGGTATTTTAAAAGG

SEQ NO.16:GAGGATCCTCTTCTTGGTGG

In the subsequent evaluation of primer group performance, the immune repertoire is amplified with SEQ NO.1-SEQ NO.16 as upstream primers and SEQ NO.17 as the downstream primer; the sequence of the downstream primer SEQ NO.17 is (5'-3'):

SEQ NO.17:GGGGAAGACCGATGGGCCCTTGGTGG

Example 2 Evaluation of the performance of the primer set

(1) Sample preparation

Peripheral blood B lymphocytes were separated with human lymphocyte separation medium, and RNA was extracted and reverse transcribed into cDNA, as follows:

1) five fresh peripheral blood samples of 10 mL each were collected, and relatively pure peripheral blood mononuclear cells (PBMC) were obtained according to the LymphoPrep kit instructions;

2) RNA was extracted and its concentration and purity were measured with a Nanodrop 2000; reverse transcription was then performed with the TIANScript M-MLV reverse transcriptase kit (cat# ER104) to obtain cDNA (i.e., samples ST-1, ST-2, ST-3, ST-4 and ST-5), which was kept as amplification template;

the reverse transcription reaction steps are as follows:

a) system configuration:

the reverse transcription reaction system was configured as shown in Table 2.

TABLE 2

Component	Amount
oligo(dT)12-18	2 μL
Total RNA	1 μg
dNTP	2 μL
ddH₂O	to a final volume of 15 μL

b) heating at 65 °C for 5 min, cooling rapidly on ice for 2 min, centrifuging briefly to collect the reaction solution, and adding 5× First-Strand Buffer and RNase inhibitor (40 U/μL);

c) adding 1 μ L (200U) of TIANCcript M-MLV and gently mixing with a pipette;

d) bathing at 42 deg.C for 1h10 min;

e) the reaction was terminated by heating at 85 ℃ for 5min and placed on ice for subsequent experiments.

(2) Multiplex PCR amplification

1) System configuration

The upstream and downstream primers were added to prepare a multiplex PCR system, in which the upstream primer set (SEQ ID NO:1-SEQ ID NO:16) is an equimolar mixture with a total primer concentration of 20 μM, and the concentration of the downstream primer (SEQ NO.17) is 20 μM; cDNA (i.e., samples ST-1, ST-2, ST-3, ST-4 and ST-5) was used as the amplification template, and the multiplex PCR reaction system was prepared as shown in Table 3;

TABLE 3

Component	Volume
10× PCR buffer	5 μL
Upstream primer set	2 μL
Downstream primer	2 μL
dNTP	1 μL
Taq enzyme	0.5 μL
cDNA	5 μL
ddH₂O	34.5 μL
Total	50 μL
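As a worked check of the recipe in Table 3, the sketch below computes the final concentration of each upstream primer in the reaction. It assumes the stated total primer concentration is 20 μM for the equimolar mix of 16 primers and that 2 μL of this mix goes into a 50 μL reaction; the function name is illustrative.

```python
def per_primer_final_um(stock_total_um: float, n_primers: int,
                        primer_volume_ul: float, reaction_volume_ul: float) -> float:
    """Final concentration (μM) of each primer of an equimolar mix in the reaction."""
    stock_each = stock_total_um / n_primers            # 20 / 16 = 1.25 μM per primer in the mix
    return stock_each * primer_volume_ul / reaction_volume_ul


# Component volumes from Table 3 (μL); they should add up to the 50 μL total.
volumes_ul = [5, 2, 2, 1, 0.5, 5, 34.5]
print(sum(volumes_ul))
print(per_primer_final_um(20, 16, 2, 50))
```

Under these assumptions each upstream primer ends up at about 0.05 μM, while the single downstream primer is at 0.8 μM.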

2) PCR reaction

The PCR instrument was programmed according to the multiplex PCR conditions and multiplex PCR was carried out. After PCR, the product was stored at 4 °C and checked by electrophoresis; the fragment of about 500 bp was selected, and the band was excised and recovered from the gel to obtain purified antibody DNA fragments. Gel recovery used the QIAquick gel purification kit from QIAGEN according to routine procedures. The DNA concentration was measured with a Nanodrop 2000, and high-throughput sequencing was performed.

Multiplex PCR conditions are as follows in Table 4:

TABLE 4

The electrophoretogram of the PCR-amplified human B cell immune repertoire is shown in FIG. 1. A clear band is visible at about 500 bp, and the band range amplified by primer set 1 of the present invention (i.e., SEQ ID NO:1-SEQ ID NO:16) is substantially consistent with that amplified by primer set 2 (document 1: Liao HX, Levesque MC, Nagel A, et al. High-throughput isolation of immunoglobulin genes from single human B cells and expression as monoclonal antibodies. J Virol Methods. 2009; 158(1-2): 171-9). It can be preliminarily concluded that the primer set designed by the invention (SEQ ID NO:1-SEQ ID NO:16) can amplify the heavy chain sequences of the human B cell immune repertoire.

(3) Analysis of primer utilization

Using the bioinformatics software Bowtie, the match between the primers (SEQ ID NO:1-SEQ ID NO:16) and the sequences of the 5 samples (ST-1, ST-2, ST-3, ST-4, ST-5) was analyzed, with the 5' RACE method as the control (i.e., the antibody sequences amplified by that method serve as the benchmark); the primer utilization analysis is shown in Table 5 below;

TABLE 5

In the above table:

0 mismatches means that no mismatch between the primer sequence and the antibody sequence is allowed;

1 mismatch means that 1 base of the primer sequence is allowed to mismatch the antibody sequence;

2 mismatches means that 2 bases of the primer sequence are allowed to mismatch the antibody sequence.

The results in the above table show that the designed primer set (SEQ ID NO:1-SEQ ID NO:16) matches the samples well.
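The three mismatch levels can be illustrated with the small sketch below. It is a simplified stand-in for the Bowtie analysis, not a reproduction of it: a read counts as matched when any primer fits somewhere in the read with at most the allowed number of substitutions. The definition of "matched fraction" and the toy reads are assumptions made for illustration.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatch(primer: str, read: str) -> Optional[int]:
    """Smallest number of mismatches over all placements of the primer in the read."""
    k = len(primer)
    if len(read) < k:
        return None
    return min(hamming(primer, read[i:i + k]) for i in range(len(read) - k + 1))


def matched_fraction(primers: list[str], reads: list[str], max_mismatches: int) -> float:
    """Fraction of reads matched by at least one primer within the allowance."""
    if not reads:
        return 0.0
    hits = 0
    for read in reads:
        distances = (best_mismatch(p, read) for p in primers)
        if any(d is not None and d <= max_mismatches for d in distances):
            hits += 1
    return hits / len(reads)


toy_primers = ["ACGTAC"]
toy_reads = ["TTACGTACTT", "TTACGAACTT", "TTAGGAACTT", "GGGGGGGGGG"]
for allowed in (0, 1, 2):
    print(allowed, matched_fraction(toy_primers, toy_reads, allowed))
```

Raising the allowance from 0 to 2 can only keep or increase the matched fraction, which is why the three levels are reported side by side.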

(4) Coverage analysis

Samples ST-1, ST-2, ST-3, ST-4 and ST-5 were used as amplification templates and PCR-amplified separately with the primers designed by the invention (SEQ NO.1-SEQ NO.16, primer set 1 in the table) and the primers of document 1 (primer set 2 in the table). Taking the 5' RACE method as the reference (i.e., the antibody sequences amplified by that method serve as the benchmark), the coverage of the antibody sequences amplified by each primer set was analyzed; the results are shown in Table 6 below:

TABLE 6

Coverage is equal to "PCR method" divided by "5' RACE method" multiplied by one hundred percent.

Wherein: "PCR method" refers to the number of amplified antibody sequences by multiplex PCR method that are identical to the sequence of the antibody amplified by 5' RACE method; the "5 'RACE method" refers to the total number of antibody sequences amplified by the 5' RACE method.
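The coverage definition above can be written out as a short sketch. It assumes each method's result is handled as a set of unique antibody sequences; the function name and toy data are illustrative and do not reproduce the values of Table 6.

```python
def coverage_percent(pcr_sequences: list[str], race_sequences: list[str]) -> float:
    """Coverage = (PCR sequences also found by 5' RACE) / (all 5' RACE sequences) x 100."""
    race = set(race_sequences)
    if not race:
        raise ValueError("the 5' RACE reference set is empty")
    shared = race & set(pcr_sequences)
    return 100.0 * len(shared) / len(race)


# Toy data: 3 of the 4 reference sequences are recovered by multiplex PCR.
race_toy = ["SEQ_A", "SEQ_B", "SEQ_C", "SEQ_D"]
pcr_toy = ["SEQ_A", "SEQ_B", "SEQ_D", "SEQ_X"]
print(coverage_percent(pcr_toy, race_toy))
```

Sequences found only by multiplex PCR (such as `SEQ_X` here) do not enter the numerator, because the 5' RACE result is the benchmark.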

In the above table, for the 5 samples tested, the antibody sequences amplified by primer set 1 reach an average coverage of 95.01%, versus an average of 89.35% for primer set 2 from the literature; the coverage of primer set 1 designed by the present invention is thus 5.66 percentage points higher on average than that of the published primers.

Example 3 Bioinformatics analysis of the immune repertoire

The human B cell immune repertoire was amplified with the designed primer set (SEQ ID NO:1-SEQ ID NO:16) as upstream primers and SEQ ID NO:17 as the downstream primer, and subjected to subsequent bioinformatics analysis; the results are as follows:

(1) CDR3 region Length analysis

In the CDR3 amino acid sequence, a change of a single amino acid (aa) may change the receptor conformation, so variation in CDR3 amino acid sequence length reflects the diversity of the CDR3 junction region. After the sequence information of the immune repertoire was analyzed with the IgBLAST software, the CDR3 length distribution of the immune repertoire (see FIG. 2) and the usage of germline genes in the CDR3 region (see FIG. 3) were obtained.

As can be seen from FIG. 2, the CDR3 length is mainly concentrated between 9 and 12 amino acids.
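A length distribution of this kind can be tallied as in the sketch below, assuming the CDR3 amino acid sequences have already been extracted from the annotation output into a list. Apart from "AVDSNYQLI", which appears in this document, the sequences are made-up placeholders.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_list: list[str]) -> dict[int, int]:
    """Count CDR3 sequences per amino acid length, sorted by length."""
    counts = Counter(len(seq) for seq in cdr3_list if seq)
    return dict(sorted(counts.items()))


# Toy input: one sequence from this document plus hypothetical placeholders.
toy_cdr3 = ["AVDSNYQLI", "ARGGYFDY", "ARDLGYYFDY", "AKDSSGWYV", ""]
print(cdr3_length_distribution(toy_cdr3))
```

The same counting approach applies to germline gene usage if gene names are tallied instead of lengths.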

(2) Use of germline genes (germline) in the CDR3 region

FIG. 3 shows the germline distribution and usage of the CDR3 of IGHV; the abscissa is the different germlines and the ordinate is the count of each germline.

As can be seen from FIG. 3, the abundance of different germlines varies greatly, some being significantly more abundant than others; the VH germlines of this sample are mainly concentrated in IGHV3-7*01.

(3) The kinds of amino acids in the CDR3 region, and the frequency and length of the amino acid sequence are shown in Table 7;

TABLE 7

As can be seen from Table 7, the sequences of CDR3 vary greatly, and the abundance of some fragments, such as "AVDSNYQLI", is higher than that of other fragments, so that the characteristic antibodies in the immune repertoire can be judged according to the abundance of CDR3, and an idea is provided for subsequent research and treatment.

In conclusion, the invention provides a method for designing a primer group, according to the requirement of amplification length, a sequence in front of an FR1 initiation site is skillfully extracted at the upstream of the initiation site of an antibody FR1, a primer sequence with a fixed length is sequentially cut by shifting from the first base at the 5' end to form a candidate primer library, and a primer group is obtained by screening after cluster analysis.

The applicant states that the present invention is illustrated in detail by the above examples, but the present invention is not limited to the above detailed methods, i.e. it is not meant that the present invention must rely on the above detailed methods for its implementation. It should be understood by those skilled in the art that any modification of the present invention, equivalent substitutions of the raw materials of the product of the present invention, addition of auxiliary components, selection of specific modes, etc., are within the scope and disclosure of the present invention.

Sequence listing

<110> Suzhou Hongxn Biotechnology Ltd

<120> design method of primer group and application thereof

<130> 2018

<141> 2018-11-06

<160> 17

<170> SIPOSequenceListing 1.0

<210> 1

<211> 19

<212> DNA

<213> Artificial Synthesis ()

<400> 1

atggacatac tttgttcca 19

<210> 2

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 2

ccatggagtt tgggctgagc 20

<210> 3

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 3

ggctgagctg ggttttcctt 20

<210> 4

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 4

ctgagctggg ttttccttgt 20

<210> 5

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 5

ctcctggtgg cagctcccag 20

<210> 6

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 6

cctcctcctg gtggcagctc 20

<210> 7

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 7

cagctcccag atgtgagtgt 20

<210> 8

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 8

gctgggtttt ccttgttgct 20

<210> 9

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 9

atgaaacacc tgtggttctt 20

<210> 10

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 10

cctggaggat cctcttcttg 20

<210> 11

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 11

ggttttcctt gttgctattt 20

<210> 12

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 12

tggtggcagc tcccagatgt 20

<210> 13

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 13

ggacgtgagt gagagaaaca 20

<210> 14

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 14

tcctcaccat ggactggacc 20

<210> 15

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 15

cttgttggta ttttaaaagg 20

<210> 16

<211> 20

<212> DNA

<213> Artificial Synthesis ()

<400> 16

gaggatcctc ttcttggtgg 20

<210> 17

<211> 26

<212> DNA

<213> Artificial Synthesis ()

<400> 17

ggggaagacc gatgggccct tggtgg 26

Claims

1. A method for designing a primer set is characterized by comprising the following steps:

the -c parameter is set to 0.8-1.0;

the screening method comprises the following steps: sorting the clusters by size, selecting the top 10-90 clusters, extracting the representative sequence of each cluster, and screening to obtain an initial primer group;

2. The method according to claim 1, wherein the species of step (1) comprises any one or a combination of at least two of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost fish, cartilaginous fish, platypus, alpaca, cynomolgus monkey, cow, dog or chicken.

3. The method of claim 1, wherein the sequence preceding the start site in step (2) is in the range of 1-300bp in length.

4. The method of claim 1, wherein the fixed length of the primer sequence of step (3) is in the range of 16-26 bp.

5. The method of claim 4, wherein the fixed length of the primer sequence of step (3) is in the range of 18-22 bp.

6. The method of claim 1, wherein the -c parameter of step (4) is set to 0.9.

7. The method of claim 1, wherein the alignment of step (5) is performed by: comparing the initial primer group with the antibody database of the corresponding species to obtain the coverage of each sequence in the antibody database, ranking the primers by coverage, and selecting the top 10-40 primers as the final primer group.

8. The method according to claim 1, characterized in that it comprises in particular the steps of:

wherein the species comprises any one or a combination of at least two of human, mouse, rat, rabbit, rhesus monkey, sheep, pig, teleost, cartilaginous fish, platypus, alpaca, cynomolgus monkey, cow, dog or chicken;

(2) positioning a primer selection region: comparing germline genes of the heavy chain and/or the light chain of the antibody to genome data of a chromosome where the species antibody is located, finding an initial site of the antibody FR1, and extracting a sequence 1-300bp before the initial site;

the -c parameter is set to 0.9;

(5) comparing the initial primer group with an antibody database of the corresponding species to obtain the coverage of each primer in the antibody database, ranking the primers by coverage, and selecting the top 10-40 primers as the final primer group.

9. A human B cell immune repertoire heavy chain primer set designed by the method of any one of claims 1 to 8, wherein the sequences of the primer set comprise sequences set forth as SEQ No.1 to SEQ No. 16.

10. The human B-cell repertoire heavy chain primer set of claim 9, wherein the sequences of the primer set comprise any fifteen primer sequences of the sequences shown as SEQ ID No.1-SEQ ID No. 16.

11. The human B-cell repertoire heavy chain primer set of claim 10, wherein the sequence of the primer set comprises any ten primer sequences of the sequences set forth as SEQ ID No.1-SEQ ID No. 16.

12. Use of the method of any one of claims 1 to 8 or the primer set of any one of claims 9 to 11 in amplification of an immune repertoire.

13. Use of the method of any one of claims 1 to 8 or the primer set of any one of claims 9 to 11 in preparing a kit for amplifying an immune repertoire.