Summary of the Invention
In view of the shortcomings of the prior art, the object of the present invention is to provide a Primer3-based multiplex PCR primer design method that reduces non-specific amplification and yields multiplex PCR primers with stronger specificity.
To achieve this object, the present invention provides a Primer3-based multiplex PCR primer design method, which includes:
S1: Obtain the original sequences of the target DNA sequences;
S2: Use Primer3 to design PCR primers for the target sequences and generate candidate primers;
S3: Evaluate the multiplex PCR primers with the PE model or the SE model, and select the qualified multiplex PCR primers;
S4: Change the primer screening parameters, redesign and rescreen primers for the target DNA sequences for which no primer could be designed, and finally obtain multiplex PCR primers for all target DNA sequences.
By combining the PE model and the SE model, the present invention solves the non-specific amplification problem caused by insufficient specificity and similar issues in multiplex PCR primer design, and can quickly design multiplex PCR primers with high specificity and high sensitivity.
According to another embodiment of the present invention, the multiplex PCR primer design method further includes the design of degenerate primers, which includes:
If the regions of two target sequences are close to each other, the primers of the two target sequences may interfere with each other and cause non-specific amplification, so that amplification fails; the two target regions therefore need to be merged and a degenerate primer designed for the merged region.
In this solution, non-specific amplification arises because two nearby upstream primers or downstream primers bind to the template DNA strand and form a new upstream-downstream primer pairing; sometimes even the downstream primer of the previous target sequence and the upstream primer of the following target sequence form a new upstream-downstream primer pairing.
According to another embodiment of the present invention, in step S1, obtaining the original sequences of the target DNA sequences includes: building a coordinate file of the target sequences, in which every line contains one gene sequence coordinate.
According to another embodiment of the present invention, the gene sequence coordinates are in BED file format, which is: chromosome, tab, start coordinate, tab, end coordinate.
In this solution, the reference genome from which the target DNA sequences are obtained is hg19, the human (Homo sapiens) standard reference genome (GRCh37/hg19), which serves as the internationally used reference gene sequence for screening mutation sites or SNP sites in the target sequences, i.e. NCBI GenBank assembly accession: GCA_000001405.1.
According to another embodiment of the present invention, in step S3, the screening parameters that the PE model applies to the multiplex PCR candidate primers include:
GC content: the content of G and C in the primer sequence, an important feature of a DNA sequence that directly affects annealing strength; set to 20%~80%.
Tm value: the melting temperature of the oligonucleotide, i.e. the temperature at which 50% of the oligonucleotide duplexes are dissociated at a given salt concentration; set to 59 °C~61 °C.
Primer size: the primer length; the specificity of the reaction, the temperature and the annealing time all depend in part on the primer length; set to 18~27.
Amplicon size: the amplicon length, i.e. the length of the nucleotide segment obtained after DNA or RNA amplification; set to 200~260.
Target_side: the length of the region on each side of the target sequence within which primers are designed; set to 100.
buffer: the buffer distance from the 3' end of the primer to the target region; set to 5.
According to another embodiment of the present invention, in step S3, the PE model and the SE model used to evaluate the multiplex PCR primers and select the qualified ones are prediction models: they simulate the PCR experiment process by code and algorithm and predict the specificity of the multiplex PCR primers, so as to prevent non-specific primer amplification in the multiplex PCR experiment; the prediction process is implemented in code.
According to another embodiment of the present invention, in step S3, evaluating the multiplex PCR primers with the PE model or the SE model and selecting the qualified ones involves evaluating and screening the candidate primers, including:
Evaluating the specificity of the candidate primers;
Evaluating whether the candidate primers can form structures such as primer dimers;
Evaluating the high-frequency single-nucleotide polymorphic sites of the candidate primers;
Evaluating the annealing temperature, melting temperature and GC value of the candidate primers;
Discarding candidate primers whose melting temperature is outside the set range;
Discarding candidate primers whose GC value is outside the set range;
Discarding candidate primers whose high-frequency single-nucleotide polymorphic sites exceed the threshold;
Discarding candidate primers that lack specificity;
Discarding candidate primers that can form structures such as primer dimers;
Discarding candidate primers that carry a risk.
In this solution, evaluating the annealing temperature, melting temperature and GC value of the candidate primers includes calculating the Tm value of each primer by the formula $T_m = \Delta H^\circ / (\Delta S^\circ + R \ln C_T)$, where $\Delta H^\circ$ and $\Delta S^\circ$ are the standard enthalpy change and entropy change of the hybridization reaction, $R$ is the gas constant, 1.987 cal/(K·mol), and $C_T$ is the molar concentration of the DNA molecules (when the DNA molecule is an asymmetric, i.e. non-self-complementary, sequence, its molar concentration is taken as $C_T/4$).
Evaluating the high-frequency single-nucleotide polymorphic sites of the candidate primers includes extracting, for the target DNA sequences, the high-frequency polymorphic sites from DNA polymorphism data.
Extracting the high-frequency polymorphic sites from the DNA polymorphism data includes: extracting from the DNA polymorphism data the high-frequency polymorphic sites that correspond to the coordinate file of the target DNA sequences.
According to another embodiment of the present invention, in step S4, the method for obtaining the multiplex PCR primers of all target DNA sequences includes:
For target DNA sequences for which no primer could be designed because of GC value, Tm value or lack of specificity: without affecting the primers already designed, increase the number of candidate primers by changing the amplicon length and the length of the primer design region on each side of the target sequence, and select suitable candidate primers; if suitable primers still cannot be selected, reasonably widen the Tm and GC ranges to select suitable candidate primers;
For target sequences for which no suitable candidate primer could be designed because high-frequency single-nucleotide polymorphic sites exceeded the threshold: without affecting the primers already designed, select suitable primers by raising the threshold;
For target DNA sequences for which no suitable candidate primer could be designed because of structures such as primer dimers: without affecting the primers already designed, find suitable candidate primers by changing the evaluation system.
In this solution, the SE model can be used instead to evaluate the primers; the SE model introduces fewer parameters than the PE model, but can also select candidate primers with high specificity.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention can identify target sequences that lie close to each other and design degenerate primers for them, thereby reducing non-specific amplification caused by mutual interference between the primers of nearby target sequences; it systematically evaluates the specificity of the designed primers, reduces amplification failures caused by non-specific amplification, primer dimers, hairpin structures and the like, and designs multiplex PCR primers with stronger specificity for multiplex PCR experiments.
The present invention is described in further detail below with reference to the accompanying drawings.
Embodiment 1
This embodiment provides a Primer3-based multiplex PCR primer design method which, as shown in Fig. 1, includes:
S1: Obtain the original sequences of the target DNA sequences;
S2: Use Primer3 to design PCR primers for the target sequences and generate candidate primers;
S3: Evaluate the multiplex PCR primers with the PE model or the SE model, and select the qualified multiplex PCR primers;
S4: Change the primer screening parameters, redesign and rescreen primers for the target DNA sequences for which no primer could be designed, and finally obtain multiplex PCR primers for all target DNA sequences.
The Primer3-based multiplex PCR primer design method provided by this embodiment addresses mismatching between primers and the DNA template, non-specific amplification, and amplification failure caused by primer dimers, hairpin structures and the like; it systematically evaluates the specificity of the designed primers to help ensure the success of the multiplex PCR experiment.
This embodiment involves 58 SNP sites (Table 1) of 46 genes used to guide individualized tumor chemotherapy medication. A coordinate file of the target sequences (Table 2) is built for these SNP sites, and multiplex PCR primers are designed. The 58 SNP sites of the 46 genes are as follows:
Table 1 SNP sites and genes related to individualized tumor chemotherapy medication guidance
The coordinate file of the target sequences built from the SNP sites is a BED file, whose format is: chromosome, tab, start coordinate, tab, end coordinate. The coordinate file of this embodiment is as follows:
Table 2 Coordinate file of the target sequences built from the SNP sites
The multiplex PCR primer design method further includes the design of degenerate primers, which includes:
If the regions of two target sequences are close to each other, the primers of the two target sequences may interfere with each other and cause non-specific amplification, so that amplification fails; the two target regions therefore need to be merged and a degenerate primer designed for the merged region.
In this solution, non-specific amplification arises because two nearby upstream primers or downstream primers bind to the template DNA strand and form a new upstream-downstream primer pairing; sometimes even the downstream primer of the previous target sequence and the upstream primer of the following target sequence form a new upstream-downstream primer pairing.
As shown in Fig. 2, the regions of target sequence a and target sequence b are (A, B) and (C, D), respectively. There are two cases in which two target sequences lie close together: in case one, the two target sequences are close but share no common region; in case two, they are close and share a common region. In either case, the two target sequences can be merged into a single merged sequence (A, D); this merging is implemented in code.
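The merging of nearby target regions described above can be sketched as follows. This is a minimal illustration rather than the code of the invention: the function name `merge_close_regions` and the `max_gap` distance threshold are assumptions, since the text does not state how close two regions must be before they are merged.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def merge_close_regions(regions: List[Region], max_gap: int) -> List[Region]:
    """Merge regions on the same chromosome that overlap (case two)
    or lie within max_gap bases of each other (case one).

    max_gap is a hypothetical parameter; the source gives no value.
    """
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Extend the previous region so that (A, B) and (C, D) become (A, D).
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

A degenerate primer would then be designed against each merged region instead of against the two original regions separately.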
As shown in Fig. 3, two nearby upstream primers or downstream primers bind to the template DNA strand and form new upstream-downstream primer pairings, causing amplification to fail: primer A and primer B amplify target sequence a, and primer C and primer D amplify target sequence b; however, primer A and primer C form a new upstream-downstream primer pairing and amplify the non-specific band AC, and primer B and primer D form a new upstream-downstream primer pairing and amplify the non-specific band BD; in the normal case, only the two specific bands AB and CD are amplified.
As shown in Fig. 4, the downstream primer of the previous target sequence and the upstream primer of the following target sequence form a new upstream-downstream primer pairing, causing amplification to fail: primer A and primer B amplify target sequence a, and primer C and primer D amplify target sequence b; however, primer C and primer B form a new upstream-downstream primer pairing and amplify the non-specific band BC, and primer A and primer D form a new upstream-downstream pairing and amplify the non-specific band AD; in the normal case, only the two specific bands AB and CD are amplified.
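The cross-pairing between primers of neighbouring targets can be checked with a small enumeration. The sketch below is illustrative only: it assumes each target is described by the genomic position of its upstream primer and of its downstream primer on a common coordinate axis, and the names `unintended_products` and `max_len` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def unintended_products(
    pairs: Dict[str, Tuple[int, int]], max_len: int
) -> List[Tuple[str, str, int]]:
    """List products formed by the upstream primer of one target and the
    downstream primer of a different target.

    pairs maps a target name to (upstream primer position, downstream
    primer position). A cross product is reported when the downstream
    primer lies after the upstream primer and the product is at most
    max_len bases long (an assumed amplifiable size limit).
    """
    hits = []
    for (up_name, (up_pos, _)), (down_name, (_, down_pos)) in product(
        pairs.items(), repeat=2
    ):
        if up_name == down_name:
            continue  # the intended amplicon of a single target
        length = down_pos - up_pos + 1
        if 0 < length <= max_len:
            hits.append((up_name, down_name, length))
    return hits
```

For two overlapping targets, both the long product (upstream primer of a with downstream primer of b) and the short product (upstream primer of b with downstream primer of a) are reported, which corresponds to the bands AD and BC in the text.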
In step S1, obtaining the original sequences of the target DNA sequences includes: building a coordinate file of the target sequences, in which every line contains one gene sequence coordinate.
The gene sequence coordinates are in BED file format, which is: chromosome, tab, start coordinate, tab, end coordinate.
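To make the coordinate file format concrete, the following sketch reads tab-separated lines of the form chromosome, start coordinate, end coordinate. The `BedRecord` type and `parse_bed` function are names chosen for illustration, and the coordinates in the usage example are made up, not taken from Table 2.

```python
from typing import List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_bed(text: str) -> List[BedRecord]:
    """Parse BED-style text with one target region per line."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"line {line_no}: start must be smaller than end")
        records.append(BedRecord(chrom, start, end))
    return records


# Hypothetical coordinates, for illustration only.
example = "chr1\t100\t200\nchr7\t5000\t5100\n"
targets = parse_bed(example)
```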
In this solution, the reference genome from which the target DNA sequences are obtained is hg19, the human (Homo sapiens) standard reference genome (GRCh37/hg19), which serves as the internationally used reference gene sequence for screening mutation sites or SNP sites in the target sequences, i.e. NCBI GenBank assembly accession: GCA_000001405.1.
In step S3, the screening parameters that the PE model applies to the multiplex PCR candidate primers include:
GC content: the content of G and C in the primer sequence, an important feature of a DNA sequence that directly affects annealing strength; set to 20%~80%.
Tm value: the melting temperature of the oligonucleotide, i.e. the temperature at which 50% of the oligonucleotide duplexes are dissociated at a given salt concentration; set to 59 °C~61 °C.
Primer size: the primer length; the specificity of the reaction, the temperature and the annealing time all depend in part on the primer length; set to 18~27.
Amplicon size: the amplicon length, i.e. the length of the nucleotide segment obtained after DNA or RNA amplification; set to 200~260.
Target_side: the length of the region on each side of the target sequence within which primers are designed; set to 100.
buffer: the buffer distance from the 3' end of the primer to the target region; set to 5.
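The parameter list above can be written down as a small configuration together with a range check. This is only a sketch: the dictionary keys and the function names are chosen here for illustration, the Tm value is assumed to be supplied by an upstream calculation, and the PE model itself is not reproduced.

```python
# Initial screening ranges as listed in the text; key names are illustrative.
PE_PARAMS = {
    "gc": (20.0, 80.0),           # percent
    "tm": (59.0, 61.0),           # degrees Celsius
    "primer_size": (18, 27),
    "amplicon_size": (200, 260),
    "target_side": 100,
    "buffer": 5,
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_filters(seq: str, tm: float, params: dict = PE_PARAMS) -> bool:
    """Check primer length, GC content and a precomputed Tm against the ranges."""
    size_lo, size_hi = params["primer_size"]
    gc_lo, gc_hi = params["gc"]
    tm_lo, tm_hi = params["tm"]
    return (
        size_lo <= len(seq) <= size_hi
        and gc_lo <= gc_percent(seq) <= gc_hi
        and tm_lo <= tm <= tm_hi
    )
```

Keeping the ranges in one dictionary makes it straightforward to pass a relaxed copy when parameters are changed in step S4.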
In step S3, the PE model and the SE model used to evaluate the multiplex PCR primers and select the qualified ones are prediction models: they simulate the PCR experiment process by code and algorithm and predict the specificity of the multiplex PCR primers, so as to prevent non-specific primer amplification in the multiplex PCR experiment; the prediction process is implemented in code.
In step S3, evaluating the multiplex PCR primers with the PE model or the SE model and selecting the qualified ones involves evaluating and screening the candidate primers, including:
Evaluating the specificity of the candidate primers;
Evaluating whether the candidate primers can form structures such as primer dimers;
Evaluating the high-frequency single-nucleotide polymorphic sites of the candidate primers;
Evaluating the annealing temperature, melting temperature and GC value of the candidate primers;
Discarding candidate primers whose melting temperature is outside the set range;
Discarding candidate primers whose GC value is outside the set range;
Discarding candidate primers whose high-frequency single-nucleotide polymorphic sites exceed the threshold;
Discarding candidate primers that lack specificity;
Discarding candidate primers that can form structures such as primer dimers;
Discarding candidate primers that carry a risk.
In this solution, evaluating the annealing temperature, melting temperature and GC value of the candidate primers includes calculating the Tm value of each primer by the formula $T_m = \Delta H^\circ / (\Delta S^\circ + R \ln C_T)$, where $\Delta H^\circ$ and $\Delta S^\circ$ are the standard enthalpy change and entropy change of the hybridization reaction, $R$ is the gas constant, 1.987 cal/(K·mol), and $C_T$ is the molar concentration of the DNA molecules (when the DNA molecule is an asymmetric, i.e. non-self-complementary, sequence, its molar concentration is taken as $C_T/4$).
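The formula can be evaluated directly once the enthalpy and entropy changes are known. The sketch below assumes that the enthalpy is given in kcal/mol and the entropy in cal/(K·mol), that the formula yields a temperature in kelvin, and that the result is converted to degrees Celsius for comparison with the screening range; these unit conventions and the function name are assumptions, and no salt correction is applied. The thermodynamic values in the usage line are arbitrary illustrative numbers, not measured data.

```python
import math

R = 1.987  # gas constant in cal/(K*mol)


def melting_temperature_celsius(
    delta_h_kcal: float,
    delta_s_cal: float,
    conc_molar: float,
    self_complementary: bool = False,
) -> float:
    """Tm = dH / (dS + R * ln(C)), converted from kelvin to Celsius.

    For a non-self-complementary (asymmetric) sequence the concentration
    term is C_T / 4, as stated in the text.
    """
    c = conc_molar if self_complementary else conc_molar / 4.0
    tm_kelvin = (delta_h_kcal * 1000.0) / (delta_s_cal + R * math.log(c))
    return tm_kelvin - 273.15


# Arbitrary illustrative inputs: dH = -150 kcal/mol, dS = -400 cal/(K*mol), 250 nM.
tm_example = melting_temperature_celsius(-150.0, -400.0, 2.5e-7)
```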
Evaluating the high-frequency single-nucleotide polymorphic sites of the candidate primers includes extracting, for the target DNA sequences, the high-frequency polymorphic sites from DNA polymorphism data.
Extracting the high-frequency polymorphic sites from the DNA polymorphism data includes: extracting from the DNA polymorphism data the high-frequency polymorphic sites that correspond to the coordinate file of the target DNA sequences.
In step S4, the method for obtaining the multiplex PCR primers of all target DNA sequences includes:
For target DNA sequences for which no primer could be designed because of GC value, Tm value or lack of specificity: without affecting the primers already designed, increase the number of candidate primers by changing the amplicon length and the length of the primer design region on each side of the target sequence, and select suitable candidate primers; if suitable primers still cannot be selected, reasonably widen the Tm and GC ranges to select suitable candidate primers;
For target sequences for which no suitable candidate primer could be designed because high-frequency single-nucleotide polymorphic sites exceeded the threshold: without affecting the primers already designed, select suitable primers by raising the threshold;
For target DNA sequences for which no suitable candidate primer could be designed because of structures such as primer dimers: without affecting the primers already designed, find suitable candidate primers by changing the evaluation system.
In this solution, the SE model can be used instead to evaluate the primers; the SE model introduces fewer parameters than the PE model, but can also select candidate primers with high specificity.
The following points are mainly considered when screening the candidate primers: discard candidate primers whose Tm value is outside the set range; discard candidate primers whose GC value is outside the set range; discard candidate primers whose length is outside the set range; discard candidate primers whose high-frequency single-nucleotide polymorphic sites exceed the threshold; discard candidate primers that lack specificity; discard candidate primers that can form structures such as primer dimers; discard candidate primers that carry a risk.
The Tm value is the melting temperature of the oligonucleotide, i.e. the temperature at which 50% of the oligonucleotide duplexes are dissociated under given salt concentration conditions, and is therefore an important reference for the annealing temperature of the PCR reaction. For primers, the optimal melting temperature range is 52 °C~58 °C, and the melting temperature is generally set no higher than 65 °C to avoid secondary annealing. Candidate primers whose Tm value is outside the user-set range are therefore discarded during primer design.
The GC value of a primer is an important feature of a DNA sequence and directly affects annealing strength. If the GC content of a primer is too low, the primer sequence should be extended appropriately; candidate primers whose GC content is outside the set range are therefore discarded.
Because the reaction temperature and annealing time depend in part on primer length, the setting of this parameter is very important. Each additional nucleotide increases primer specificity four-fold, so the minimum primer length for most applications is 18 nucleotides, which minimizes the chance of the primer hybridizing to secondary sites on the vector or insert. Candidate primers whose length is outside the set range are therefore discarded during primer design.
If a designed candidate primer contains SNP or INDEL polymorphic sites, amplification efficiency may be reduced in some samples in practical use, or no product may be obtained at all. Therefore, for primers that contain polymorphic sites, candidate primers whose SNPs exceed the set threshold are discarded in order to improve amplification efficiency.
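The polymorphic-site screen can be illustrated as a lookup of known variant sites inside a primer's footprint. In this sketch the threshold is interpreted as a minor allele frequency cutoff, matching the parameter name maf; that interpretation, the inclusive 1-based coordinates, the function names and the sample sites are all assumptions made for illustration.

```python
from typing import List, Tuple

Site = Tuple[str, int, float]  # (chromosome, position, minor allele frequency)


def sites_over_threshold(
    chrom: str, start: int, end: int, sites: List[Site], maf_threshold: float
) -> List[Site]:
    """Return variant sites inside [start, end] whose frequency exceeds the threshold."""
    return [
        s
        for s in sites
        if s[0] == chrom and start <= s[1] <= end and s[2] > maf_threshold
    ]


def primer_passes_snp_filter(
    chrom: str, start: int, end: int, sites: List[Site], maf_threshold: float
) -> bool:
    """A primer passes when no site above the threshold falls within it."""
    return not sites_over_threshold(chrom, start, end, sites, maf_threshold)


# Hypothetical polymorphism records, not real variant data.
known_sites = [("chr1", 105, 0.10), ("chr1", 300, 0.40), ("chr2", 105, 0.50)]
```

With these sample records, a primer covering chr1:100-120 is rejected at a threshold of 0.005 and accepted once the threshold is raised to 0.27, which mirrors the effect of raising the threshold described for step S4.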
The PE model is used to predict whether a candidate primer will undergo non-specific amplification, so that candidate primers prone to non-specific amplification can be discarded.
Table 3 lists the screening parameters for the candidate primers. The parameters include: Tm value; GC content; primer length (primer_size); amplicon length (amplicon_size); length of the primer design region on each side of the target sequence (Target_side); buffer distance from the 3' end of the primer to the target region (buffer); number of perfect matches allowed between the 3'-end sequence of the forward primer and the 3' end of the template strand (forward_perfect); number of mismatches allowed between the 3'-end sequence of the forward primer and the 3' end of the template strand (forward_mismatch); number of perfect matches allowed between the 3'-end sequence of the reverse primer and the 3' end of the template strand (reverse_perfect); number of mismatches allowed between the 3'-end sequence of the reverse primer and the 3' end of the template strand (reverse_mismatch); and the high-frequency single-nucleotide polymorphic site threshold (maf).
Table 3 Candidate primer screening parameters
Through preliminary primer design and candidate primer screening, a degenerate primer was designed for two closely located sites among the 58 SNP sites of the 46 genes used to guide individualized tumor chemotherapy medication, and primers were also designed for the other sites. However, only 39 suitable primers were designed; suitable primers could not be designed for the remaining 18 sites, as shown in Fig. 5.
The sites for which no suitable primer was designed were compiled (Table 4); the primer screening parameters were then changed, and these sites were redesigned and rescreened.
Table 4 summarizes the reasons for failure at the sites for which no candidate primer could be designed: "undesigned" covers sites where no primer could be designed because of Tm value, GC content, primer length and similar reasons; "uniqueness" means that no primer could be designed because the primers lacked specificity; "SNP" means that no primer could be designed because high-frequency single-nucleotide polymorphic sites exceeded the threshold.
Table 4 Statistics of sites for which no primer could be designed
Step one: change the Tm value, the amplicon length (amplicon_size), the length of the primer design region on each side of the target sequence (Target_side) and the primer length (primer_size) among the primer screening parameters. The specific changes are shown in Table 5.
Table 5 Modified primer screening parameters
Item | Accepted range
amplicon_size | 195-249
primer_size | 18-30
GC | 20%~80%
Tm | 58~65
Target_side | 130
buffer | 5
forward_perfect | <9
forward_mismatch | <11
reverse_perfect | <3
reverse_mismatch | <7
maf | 0.005
By changing the above parameters, suitable candidate primers were designed for 14 of the 18 sites for which no primer had been designed; suitable candidate primers still could not be designed for 4 sites. Of these, two failed because of specificity problems and two failed because high-frequency single-nucleotide polymorphic sites exceeded the threshold. Details are shown in Table 6.
Table 6 Statistics of sites for which no primer could be designed after step one
Step two: raise the threshold for high-frequency single-nucleotide polymorphic sites, so that candidate primers can be designed for the sites at which primer design failed because these sites exceeded the threshold. The maf value is raised to 0.27.
By raising the maf value, candidate primers were designed for the sites at which primer design had failed because high-frequency single-nucleotide polymorphic sites exceeded the threshold.
Step three: for the sites at which candidate primer design failed because of specificity, change the screening system and use the SE model to screen candidate primers for these sites, finally obtaining multiplex PCR primers for all sites.
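The three steps above amount to retrying the failed sites under progressively relaxed settings. The sketch below shows that control flow only: `design` stands in for the real Primer3 design and PE/SE screening, which are not reproduced here, and the function and key names are hypothetical. The values in `STAGES` follow Table 5 and the maf value of 0.27 given in the text; keeping maf at 0.27 in the third stage is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

STAGE_ONE = {
    "model": "PE",
    "tm": (58, 65),
    "amplicon_size": (195, 249),
    "primer_size": (18, 30),
    "target_side": 130,
    "maf": 0.005,
}
STAGE_TWO = {**STAGE_ONE, "maf": 0.27}      # raise the polymorphic-site threshold
STAGE_THREE = {**STAGE_TWO, "model": "SE"}  # switch the screening system
STAGES = [STAGE_ONE, STAGE_TWO, STAGE_THREE]


def staged_redesign(
    sites: List[str],
    stages: List[dict],
    design: Callable[[str, dict], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Retry failed sites stage by stage.

    design(site, params) is a placeholder that returns a primer identifier,
    or None when no suitable primer is found. Sites that already have a
    primer are not revisited, so earlier designs are left untouched.
    """
    designed: Dict[str, str] = {}
    remaining = list(sites)
    for params in stages:
        still_failed = []
        for site in remaining:
            primer = design(site, params)
            if primer is None:
                still_failed.append(site)
            else:
                designed[site] = primer
        remaining = still_failed
        if not remaining:
            break
    return designed, remaining
```

Only the sites that failed in one stage are passed to the next, which reflects the requirement that parameter changes must not affect primers already designed.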
Through the above steps, a total of 57 primer pairs were designed for the 58 SNP sites of the 46 genes used to guide individualized tumor chemotherapy medication, one of which is a degenerate primer pair, ensuring that every SNP site is covered.
After PCR amplification with the 57 primer pairs, electrophoresis showed bands of the correct size, meeting the criteria for a qualified PCR result. According to the experiment, the amplification product of the degenerate primer XL170915034510 shows two main bands below 1000 bp, and the size of the main product band is close to the expected result. This shows that the degenerate primer provided by this embodiment can solve the problem that suitable primers cannot be designed for two target sequences because they lie close together.
The multiplex PCR primer design method provided by this embodiment reduces mismatching between primers and the DNA template, reduces amplification failures caused by non-specific amplification, primer dimers, hairpin structures and the like, systematically evaluates the specificity of the designed primers, and can identify two target sequences that lie close together and design a degenerate primer for them, thereby designing multiplex PCR primers with stronger specificity for multiplex PCR experiments.
To show that the multiplex PCR primer design method provided by this embodiment can be applied to high-throughput sequencing, the 57 primer pairs designed in this embodiment for the 58 SNP sites of the 46 genes used to guide individualized tumor chemotherapy medication were used for multiplex PCR amplification and library construction, followed by sequencing on the Illumina MiSeq high-throughput sequencing platform. Bioinformatics analysis then yielded the depth distribution (shown in Fig. 6): all 57 amplicons were successfully amplified, with a lowest depth of 503 and a highest depth of 793, a difference of less than 200 bp; the mean amplicon depth was between 650 and 700, and all amplicons were within 5 times the mean depth. The sequencing depth of the amplicons can therefore be considered uniform, and the sequencing quality meets expectations.
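A uniformity check of this kind can be expressed in a few lines. The sketch below is an illustration under assumptions: "within 5 times the mean depth" is read as each depth lying between one fifth of the mean and five times the mean, the function name is hypothetical, and the depth list in the usage line is a placeholder in which only the minimum (503) and maximum (793) come from the text.

```python
from statistics import mean
from typing import Dict, List, Union


def depth_uniformity(depths: List[int], fold: float = 5.0) -> Dict[str, Union[float, bool]]:
    """Summarize amplicon depths and test whether all lie within fold of the mean."""
    avg = mean(depths)
    within = all(avg / fold <= d <= avg * fold for d in depths)
    return {
        "min": min(depths),
        "max": max(depths),
        "mean": avg,
        "all_within_fold": within,
    }


# Placeholder depths; the intermediate values are invented for illustration.
summary = depth_uniformity([503, 650, 700, 793])
```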
Although the present invention has been disclosed above by way of preferred embodiments, these do not limit the scope in which the invention may be implemented. Any person of ordinary skill in the art may make minor improvements without departing from the scope of the invention; all equivalent improvements made according to the present invention shall be covered by the scope of the invention.