A protein structure prediction method based on secondary structure similarity
Technical field
The present invention relates to the fields of bioinformatics, molecular dynamics simulation, statistical learning and combinatorial optimization, and computer applications, and in particular to a protein structure prediction method based on secondary structure similarity.
Background art
Since the end of the twentieth century, with the rapid development of the life sciences, more and more researchers and research institutions have taken part in life science research. A protein is built from 20 different amino acids, which are joined by dehydration condensation with mRNA as the template to form an amino acid sequence; the sequence then folds into a three-dimensional structure with a specific function. Secondary structure refers to the regularly repeating conformations within the polypeptide chain of a protein, such as α-helices and β-sheets. Understanding the three-dimensional structure of a protein is the basis for studying its biological function and mechanism of activity. The protein structure prediction problem is one of the current research hotspots in bioinformatics and computer applications, and it provides important guidance for the design of novel proteins and of drug target proteins. At present, the main experimental methods for determining the three-dimensional structure of a protein are X-ray crystal diffraction and nuclear magnetic resonance (NMR). X-ray crystal diffraction can yield high-precision protein structures, but it is not applicable to proteins from which crystals suitable for analysis cannot be prepared. NMR does not require protein crystals, but it can only measure small proteins of fewer than 300 amino acids, and it is time-consuming and costly.
Because protein structures are determined far more slowly than sequences are obtained, only 0.2% of protein sequences have an experimentally determined structure; predicting protein structure by computational methods is therefore highly significant work. Anfinsen's experiments showed that the structural information of a protein resides in its sequence, which indicates that predicting structure starting from the sequence is feasible. The mainstream protein structure prediction methods today are homology modeling, threading, and ab initio prediction. When sequence similarity is high (>50%), homology modeling and threading are the first choice; when sequence similarity is low (<30%), these two methods are no longer applicable and only ab initio prediction can be used.
Ab initio protein structure prediction currently faces two main bottlenecks. The first is the deceptiveness of the energy landscape: because the energy function is inaccurate, the low-energy conformations obtained are not native-state conformations, and good conformations cannot be picked out. The second is the insufficient ability of existing techniques to sample the conformational space, which shows up as a lack of conformational diversity.
Therefore, existing protein structure prediction methods are deficient in prediction accuracy and sampling ability and need to be improved.
Summary of the invention
To overcome the insufficient sampling ability and prediction accuracy of existing protein structure prediction methods, the present invention proposes a population-based protein structure prediction method based on secondary structure similarity that has better sampling ability and higher prediction accuracy. A secondary structure similarity index is designed, and individuals that are better in both energy and structure are selected under the dual constraint of this index and the energy function, which effectively alleviates the low prediction accuracy caused by an inaccurate energy function.
The technical solution adopted by the present invention to solve the technical problems is:
A protein structure prediction method based on secondary structure similarity, the method comprising the following steps:
1) Parameter setting, as follows:
Read in the sequence information of the target protein, the fragment library information, and the energy function. Set the population of protein conformations, population $= \{x_1, x_2, \ldots, x_i, \ldots, x_{NP}\}$, where $NP$ is the population size and $x_i$ denotes the $i$-th individual of the population; the iteration counter is $Gen$, the maximum number of iterations is $G_{max}$, the crossover probability is $p_1$, the mutation probability is $p_2$, and the sequence length is $L$;
2) Calculation of the secondary structure similarity, as follows:
For a protein conformation $x_i$ of sequence length $L$, the predicted secondary structure $S_k^{\mathrm{pred}}$, $k \in [1, L]$, is obtained from the PSIPRED online server, and the secondary structure of the conformation, $S_k^{\mathrm{dssp}}$, $k \in [1, L]$, is obtained with the DSSP algorithm. The secondary structure similarity of the conformation is calculated according to the formula
$$SS_i = \frac{1}{L}\sum_{k=1}^{L}\delta_k, \qquad \delta_k = \begin{cases} 1, & S_k^{\mathrm{pred}} = S_k^{\mathrm{dssp}} \\ 0, & \text{otherwise} \end{cases}$$
that is, a score of 1 is recorded when the two secondary structures agree at position $k$ and 0 otherwise; the total score over the whole sequence, divided by the sequence length, is the secondary structure similarity $SS_i$;
3) Population initialization, as follows:
Fragment assembly is performed on each individual of the population until the residues at all positions have been replaced once, which completes the initialization;
4) Population crossover, as follows:
4.1) The $NP$ individuals are paired off to form $NP/2$ pairs, numbered $a_1, a_2, \ldots, a_j, \ldots, a_{NP/2}$, $j \in [1, NP/2]$;
4.2) A pair $a_j$ is selected at random, and whether to perform crossover is decided according to the probability $p_1$. If crossover is performed, a position in the individuals is chosen at random, a crossover length crosslength $\in [3, 10]$ is chosen at random, and the segments of that length at that position are exchanged between the two individuals, giving two new individuals;
5) Population mutation, as follows:
5.1) For each individual $x_i$, $i \in [1, NP]$, fragment assembly is performed according to the mutation probability $p_2$: if random(0,1) $> p_2$, fragment assembly with 3-residue fragments is performed; if random(0,1) $< p_2$, fragment assembly with 9-residue fragments is performed. The individual obtained after fragment assembly is $x_i'$;
5.2) The energy function is used to calculate the energies $E_i$ and $E_i'$ of the individuals before and after assembly, $x_i$ and $x_i'$. If $E_i' < E_i$, the individual $x_i'$ is retained. If $E_i' > E_i$, whether to accept the assembled individual is decided according to the Boltzmann probability $p = \exp\{-(E_i' - E_i)/KT\}$: if random(0,1) $< p$, the individual $x_i'$ is retained; otherwise, the individual $x_i$ is retained;
6) Population selection using the secondary structure similarity, as follows:
First, the initial population and the mutated population are merged into a new population of size $2 \cdot NP$. Then the secondary structure similarity of each individual in the new population is calculated, the merged population is sorted by secondary structure similarity, and the $NP$ individuals with the highest secondary structure similarity are chosen as the population after selection. Finally, set $Gen = Gen + 1$;
7) Determine whether the maximum number of iterations $G_{max}$ has been reached. If so, stop iterating and output the individual information of the last-generation population; otherwise, return to step 4).
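The similarity score of step 2) can be illustrated with a minimal Python sketch. It assumes that the PSIPRED prediction and the DSSP assignment have already been obtained as strings of equal length over the same alphabet (for example H, E and C); how the DSSP states are reduced to that alphabet is not stated in this description and is left to the caller. The function name is hypothetical.

```python
def secondary_structure_similarity(predicted: str, observed: str) -> float:
    """Return the fraction of positions at which two secondary-structure strings agree.

    predicted: per-residue states from the prediction (assumed: PSIPRED output).
    observed:  per-residue states of the conformation (assumed: reduced DSSP output).
    """
    if len(predicted) != len(observed):
        raise ValueError("both strings must cover the same L residues")
    if not predicted:
        raise ValueError("sequence length L must be positive")
    # Score 1 for every position k where the two states are identical, else 0.
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    # Divide the total score by the sequence length L.
    return matches / len(predicted)


if __name__ == "__main__":
    # Five of the six positions agree in this toy example.
    print(secondary_structure_similarity("HHHHCC", "HHHCCC"))
```

A value of 1.0 means the conformation reproduces the predicted secondary structure at every residue; 0.0 means it agrees nowhere.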
The technical concept of the present invention is as follows. Within the framework of a population-based algorithm, the present invention proposes a protein structure prediction method based on secondary structure similarity. First, the crossover probability in the population algorithm not only controls the convergence speed of the population and avoids premature convergence, but also enables information exchange within the population. Then, the mutation operation increases the diversity of conformations so that better conformations are obtained. Finally, during selection, the population is screened by secondary structure similarity: individuals with lower secondary structure similarity are eliminated and better individuals are retained, which avoids the problem of the inaccurate energy function.
The beneficial effects of the present invention are as follows. On the one hand, the population algorithm exchanges information within the population, which broadens the search of the conformational space. On the other hand, selecting the population by secondary structure similarity greatly increases the probability that high-quality individuals are retained, reduces the error introduced by the inaccurate energy function, and improves prediction accuracy.
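The selection idea described above, merging the two populations and keeping the individuals with the highest secondary structure similarity, might look as follows in Python. This is only a sketch: the function name and the `similarity` callback are hypothetical, and individuals are represented here by plain strings purely for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_by_similarity(
    parents: Sequence[T],
    offspring: Sequence[T],
    similarity: Callable[[T], float],
) -> List[T]:
    """Merge parents and offspring and keep the NP most similar individuals."""
    np_size = len(parents)                    # NP, the population size
    merged = list(parents) + list(offspring)  # merged population of size 2*NP
    # Sort from highest to lowest secondary structure similarity.
    merged.sort(key=similarity, reverse=True)
    return merged[:np_size]


if __name__ == "__main__":
    target = "HHHH"  # toy stand-in for the predicted secondary structure

    def toy_similarity(states: str) -> float:
        return sum(a == b for a, b in zip(states, target)) / len(target)

    print(select_by_similarity(["HHHC", "CCCC"], ["HHHH", "HCCC"], toy_similarity))
```

Because selection ranks by similarity alone, the energy function influences the outcome only through the acceptance test applied during mutation.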
Description of the drawings
Fig. 1 is a flow chart of the protein structure prediction method based on secondary structure similarity.
Fig. 2 is the conformation distribution obtained when the protein structure prediction method based on secondary structure similarity is used to predict the structure of protein 1AIL.
Fig. 3 is the three-dimensional structure obtained when the protein structure prediction method based on secondary structure similarity is used to predict the structure of protein 1AIL.
The present invention is further described below with reference to the accompanying drawings.
With reference to Figs. 1 to 3, a protein structure prediction method based on secondary structure similarity comprises the following steps:
1) Parameter setting, as follows:
Read in the sequence information of the target protein, the fragment library information, and the energy function. Set the population of protein conformations, population $= \{x_1, x_2, \ldots, x_i, \ldots, x_{NP}\}$, where $NP$ is the population size and $x_i$ denotes the $i$-th individual of the population; the iteration counter is $Gen$, the maximum number of iterations is $G_{max}$, the crossover probability is $p_1$, the mutation probability is $p_2$, and the sequence length is $L$;
2) Calculation of the secondary structure similarity, as follows:
For a protein conformation $x_i$ of sequence length $L$, the predicted secondary structure $S_k^{\mathrm{pred}}$, $k \in [1, L]$, is obtained from the PSIPRED online server, and the secondary structure of the conformation, $S_k^{\mathrm{dssp}}$, $k \in [1, L]$, is obtained with the DSSP algorithm. The secondary structure similarity of the conformation is calculated according to the formula
$$SS_i = \frac{1}{L}\sum_{k=1}^{L}\delta_k, \qquad \delta_k = \begin{cases} 1, & S_k^{\mathrm{pred}} = S_k^{\mathrm{dssp}} \\ 0, & \text{otherwise} \end{cases}$$
that is, a score of 1 is recorded when the two secondary structures agree at position $k$ and 0 otherwise; the total score over the whole sequence, divided by the sequence length, is the secondary structure similarity $SS_i$;
3) Population initialization, as follows:
Fragment assembly is performed on each individual of the population until the residues at all positions have been replaced once, which completes the initialization;
4) Population crossover, as follows:
4.1) The $NP$ individuals are paired off to form $NP/2$ pairs, numbered $a_1, a_2, \ldots, a_j, \ldots, a_{NP/2}$, $j \in [1, NP/2]$;
4.2) A pair $a_j$ is selected at random, and whether to perform crossover is decided according to the probability $p_1$. If crossover is performed, a position in the individuals is chosen at random, a crossover length crosslength $\in [3, 10]$ is chosen at random, and the segments of that length at that position are exchanged between the two individuals, giving two new individuals;
5) Population mutation, as follows:
5.1) For each individual $x_i$, $i \in [1, NP]$, fragment assembly is performed according to the mutation probability $p_2$: if random(0,1) $> p_2$, fragment assembly with 3-residue fragments is performed; if random(0,1) $< p_2$, fragment assembly with 9-residue fragments is performed. The individual obtained after fragment assembly is $x_i'$;
5.2) The energy function is used to calculate the energies $E_i$ and $E_i'$ of the individuals before and after assembly, $x_i$ and $x_i'$. If $E_i' < E_i$, the individual $x_i'$ is retained. If $E_i' > E_i$, whether to accept the assembled individual is decided according to the Boltzmann probability $p = \exp\{-(E_i' - E_i)/KT\}$: if random(0,1) $< p$, the individual $x_i'$ is retained; otherwise, the individual $x_i$ is retained;
6) Population selection using the secondary structure similarity, as follows:
First, the initial population and the mutated population are merged into a new population of size $2 \cdot NP$. Then the secondary structure similarity of each individual in the new population is calculated, the merged population is sorted by secondary structure similarity, and the $NP$ individuals with the highest secondary structure similarity are chosen as the population after selection. Finally, set $Gen = Gen + 1$;
7) Determine whether the maximum number of iterations $G_{max}$ has been reached. If so, stop iterating and output the individual information of the last-generation population; otherwise, return to step 4).
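Steps 4.2), 5.1) and 5.2) can be sketched in Python as below. Several points are assumptions rather than part of the method as described: a conformation is represented as a list of (phi, psi) torsion pairs, crossover is performed when a uniform random number falls below $p_1$, and `insert_fragment` and `energy` are hypothetical callbacks standing in for fragment assembly and the energy function.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Torsion = Tuple[float, float]      # assumed representation: (phi, psi) per residue
Conformation = List[Torsion]


def crossover(a: Sequence[Torsion], b: Sequence[Torsion], p1: float,
              rng: random.Random) -> Tuple[Conformation, Conformation]:
    """Step 4.2: with probability p1, swap one random segment of length 3..10."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("both individuals need the same length, at least 3")
    child_a, child_b = list(a), list(b)
    if rng.random() >= p1:                       # no crossover for this pair
        return child_a, child_b
    length = rng.randint(3, min(10, len(a)))     # crosslength in [3, 10]
    start = rng.randint(0, len(a) - length)      # random crossover position
    child_a[start:start + length] = b[start:start + length]
    child_b[start:start + length] = a[start:start + length]
    return child_a, child_b


def metropolis_accept(e_old: float, e_new: float, kt: float,
                      rng: random.Random) -> bool:
    """Step 5.2: accept lower energy; otherwise accept with Boltzmann probability."""
    if e_new < e_old:
        return True
    p = math.exp(-(e_new - e_old) / kt)
    return rng.random() < p


def mutate(x: Conformation, p2: float,
           insert_fragment: Callable[[Conformation, int], Conformation],
           energy: Callable[[Conformation], float],
           kt: float, rng: random.Random) -> Conformation:
    """Steps 5.1 and 5.2: 3- or 9-residue fragment insertion, then acceptance test."""
    fragment_length = 3 if rng.random() > p2 else 9
    candidate = insert_fragment(x, fragment_length)
    if metropolis_accept(energy(x), energy(candidate), kt, rng):
        return candidate
    return x
```

Passing an explicit `random.Random` instance keeps a run reproducible, which helps when comparing different values of $p_1$ and $p_2$.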
This embodiment takes the α-fold protein 1AIL, with a sequence length of 73, as an example. A protein structure prediction method based on secondary structure similarity comprises the following steps:
1) Parameter setting, as follows:
Read in the sequence information of the target protein, the fragment library information, and the energy function "score3". Set the population of protein conformations, population $= \{x_1, x_2, \ldots, x_i, \ldots, x_{NP}\}$, where $NP = 100$ is the population size and $x_i$ denotes the $i$-th individual of the population; the iteration counter is $Gen$, the maximum number of iterations is $G_{max} = 100$, the crossover probability is $p_1 = 0.1$, the mutation probability is $p_2 = 0.5$, and the sequence length is $L = 73$;
2) Calculation of the secondary structure similarity, as follows:
For a protein conformation $x_i$ of sequence length $L$, the predicted secondary structure $S_k^{\mathrm{pred}}$, $k \in [1, L]$, is obtained from the PSIPRED online server, and the secondary structure of the conformation, $S_k^{\mathrm{dssp}}$, $k \in [1, L]$, is obtained with the DSSP algorithm. The secondary structure similarity of the conformation is calculated according to the formula
$$SS_i = \frac{1}{L}\sum_{k=1}^{L}\delta_k, \qquad \delta_k = \begin{cases} 1, & S_k^{\mathrm{pred}} = S_k^{\mathrm{dssp}} \\ 0, & \text{otherwise} \end{cases}$$
that is, a score of 1 is recorded when the two secondary structures agree at position $k$ and 0 otherwise; the total score over the whole sequence, divided by the sequence length, is the secondary structure similarity $SS_i$;
3) Population initialization, as follows:
Fragment assembly is performed on each individual of the population until the residues at all positions have been replaced once, which completes the initialization;
4) Population crossover, as follows:
4.1) The $NP$ individuals are paired off to form $NP/2$ pairs, numbered $a_1, a_2, \ldots, a_j, \ldots, a_{NP/2}$, $j \in [1, NP/2]$;
4.2) A pair $a_j$ is selected at random, and whether to perform crossover is decided according to the probability $p_1$. If crossover is performed, a position in the individuals is chosen at random, a crossover length crosslength $\in [3, 10]$ is chosen at random, and the segments of that length at that position are exchanged between the two individuals, giving two new individuals;
5) Population mutation, as follows:
5.1) For each individual $x_i$, $i \in [1, NP]$, fragment assembly is performed according to the mutation probability $p_2$: if random(0,1) $> p_2$, fragment assembly with 3-residue fragments is performed; if random(0,1) $< p_2$, fragment assembly with 9-residue fragments is performed. The individual obtained after fragment assembly is $x_i'$;
5.2) The energy function is used to calculate the energies $E_i$ and $E_i'$ of the individuals before and after assembly, $x_i$ and $x_i'$. If $E_i' < E_i$, the individual $x_i'$ is retained. If $E_i' > E_i$, whether to accept the assembled individual is decided according to the Boltzmann probability $p = \exp\{-(E_i' - E_i)/KT\}$: if random(0,1) $< p$, the individual $x_i'$ is retained; otherwise, the individual $x_i$ is retained;
6) Population selection using the secondary structure similarity, as follows:
First, the initial population and the mutated population are merged into a new population of size $2 \cdot NP$. Then the secondary structure similarity of each individual in the new population is calculated, the merged population is sorted by secondary structure similarity, and the $NP$ individuals with the highest secondary structure similarity are chosen as the population after selection. Finally, set $Gen = Gen + 1$;
7) Determine whether the maximum number of iterations $G_{max}$ has been reached. If so, stop iterating and output the individual information of the last-generation population; otherwise, return to step 4).
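The overall loop of this embodiment might be organized as in the following Python sketch. The default values in `Params` are taken from step 1) of the embodiment; everything else is an assumption made for illustration: the names `Params` and `evolve`, the use of a random shuffle to form the pairs of step 4.1), the visiting of every pair once per generation, and the four callbacks, which stand in for fragment-assembly initialization, crossover, mutation with energy-based acceptance, and the secondary structure similarity.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")  # type of one individual (one conformation)


@dataclass(frozen=True)
class Params:
    np_size: int = 100   # NP, population size
    g_max: int = 100     # Gmax, maximum number of iterations
    p1: float = 0.1      # crossover probability
    p2: float = 0.5      # mutation probability
    seq_len: int = 73    # L, sequence length of 1AIL


def evolve(
    params: Params,
    init: Callable[[int, random.Random], T],
    cross: Callable[[T, T, float, random.Random], Tuple[T, T]],
    mutate: Callable[[T, float, random.Random], T],
    similarity: Callable[[T], float],
    rng: random.Random,
) -> List[T]:
    """Run steps 3) to 7) and return the last-generation population."""
    # Step 3: initialize NP individuals.
    population = [init(params.seq_len, rng) for _ in range(params.np_size)]
    for _gen in range(params.g_max):             # step 7: stop after Gmax generations
        # Step 4.1: pair the individuals (random pairing is an assumption).
        shuffled = population[:]
        rng.shuffle(shuffled)
        crossed: List[T] = []
        for j in range(0, len(shuffled) - 1, 2):
            # Step 4.2: the callback decides, using p1, whether to exchange segments.
            crossed.extend(cross(shuffled[j], shuffled[j + 1], params.p1, rng))
        if len(shuffled) % 2:                    # odd NP: carry the unpaired one over
            crossed.append(shuffled[-1])
        # Step 5: mutation with energy-based acceptance inside the callback.
        mutated = [mutate(x, params.p2, rng) for x in crossed]
        # Step 6: merge to size 2*NP and keep the NP most similar individuals.
        merged = population + mutated
        merged.sort(key=similarity, reverse=True)
        population = merged[:params.np_size]
    return population
```

Keeping the operators as callbacks separates the control flow of the population algorithm from the structure-specific parts, so the loop can be exercised with toy operators before real fragment assembly and energy evaluation are plugged in.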
Taking the α-fold protein 1AIL, with a sequence length of 73, as an example, the above method yields a near-native conformation of the protein. The minimum root-mean-square deviation is … and the average root-mean-square deviation is …; the predicted structure is shown in Fig. 3.
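For reference, the minimum and average root-mean-square deviation over a set of predicted conformations can be computed as in the following Python sketch. It assumes that each conformation is given as a list of Cα coordinates already superimposed on the native structure; the superposition step is not described here and is therefore omitted. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # assumed: one Cα position per residue


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for m, n in zip(model, native)   # residue by residue
        for a, b in zip(m, n)            # x, y and z components
    )
    return math.sqrt(squared / len(model))


def rmsd_summary(models: Sequence[Sequence[Coord]],
                 native: Sequence[Coord]) -> Tuple[float, float]:
    """Return (minimum RMSD, average RMSD) over all predicted conformations."""
    values: List[float] = [rmsd(m, native) for m in models]
    return min(values), sum(values) / len(values)
```

The minimum reflects the best conformation in the final population, while the average indicates how close the population as a whole lies to the native state.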
The above describes the optimization results obtained by the present invention with the protein 1AIL as an example and does not limit the scope of implementation of the invention. Various modifications and improvements made without departing from the scope of the essential content of the invention shall not be excluded from its scope of protection.