CN108334746A

CN108334746A - A protein structure prediction method based on secondary structure similarity

Info

Publication number: CN108334746A
Application number: CN201810034686.4A
Authority: CN
Inventors: 李章维; 孙科; 余宝昆; 马来发; 周晓根; 张贵军
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Guangzhou Zhaoji Biotechnology Co ltd; Shenzhen Xinrui Gene Technology Co ltd
Priority date: 2018-01-15
Filing date: 2018-01-15
Publication date: 2018-07-27
Anticipated expiration: 2038-01-15
Also published as: CN108334746B

Abstract

A protein conformational space search method based on secondary structure similarity. First, within a population-based algorithm, the crossover probability both controls the convergence speed of the population, avoiding premature convergence, and enables information exchange among individuals. Then, the mutation operation increases conformational diversity so that better conformations can be obtained. Finally, secondary structure similarity is used in the selection step: individuals with lower secondary structure similarity are eliminated and better individuals are retained, which avoids the problem of an inaccurate energy function. The invention has good sampling ability and high prediction accuracy.

Description

A protein structure prediction method based on secondary structure similarity

Technical field

The present invention relates to the fields of bioinformatics, molecular dynamics simulation, statistical learning, combinatorial optimization, and computer applications, and in particular to a protein structure prediction method based on secondary structure similarity.

Background art

Since the end of the twentieth century, with the rapid development of the life sciences, more and more researchers and research institutions have taken part in life science research. A protein is built from 20 different amino acids, which are joined by dehydration condensation, using mRNA as a template, into an amino acid sequence; this sequence then folds into a three-dimensional structure with a specific function. Secondary structure refers to the regularly repeating conformations in the polypeptide chain of a protein, such as α-helices and β-sheets. Understanding the three-dimensional structure of a protein is the basis for studying its biological function and mechanism of activity. The protein structure prediction problem is a current research focus in bioinformatics and computer applications, and it has important guiding significance for the design of novel proteins and of drug target proteins. At present, the main experimental methods for determining the three-dimensional structure of a protein are X-ray crystallography and nuclear magnetic resonance (NMR). X-ray crystallography can yield high-precision protein structures, but it is not applicable to proteins that cannot be crystallized. NMR does not require protein crystals, but it can only measure small proteins of fewer than 300 amino acids, and it is time-consuming and costly.

Because protein structures are determined far more slowly than sequences are obtained (in fact, only 0.2% of protein sequences have an experimentally determined structure), using computational methods to predict structure from the protein sequence is highly meaningful work. Anfinsen's experiment showed that the structural information of a protein resides in its sequence, which indicates that predicting structure from sequence is feasible. The mainstream protein structure prediction methods today are homology modeling, threading, and ab initio prediction. When sequence similarity is high (>50%), homology modeling and threading are the first choice; when sequence similarity is low (<30%), these two methods are no longer applicable and only ab initio prediction can be used.

In ab initio protein structure prediction there are currently two main bottlenecks. One is the deceptiveness of the energy landscape: the low-energy conformations obtained are not native-state conformations, which reflects the inaccuracy of the energy function and its inability to pick out good conformations. The other is the insufficient ability of existing techniques to sample the conformational space, which manifests as a lack of conformational diversity.

Therefore, current protein structure prediction methods have shortcomings in prediction accuracy and sampling ability and need to be improved.

Summary of the invention

To overcome the insufficient sampling ability and prediction accuracy of existing protein structure prediction methods, the present invention proposes a population-based protein structure prediction method based on secondary structure similarity, with good sampling ability and high prediction accuracy. A secondary structure similarity index is designed, and individuals that are better in both energy and structure are selected under the double constraint of this index and the energy function, which effectively mitigates the low prediction accuracy caused by an inaccurate energy function.

The technical solution adopted by the present invention to solve the technical problem is:

A protein structure prediction method based on secondary structure similarity, the method comprising the following steps:

1) Parameter setting; the process is as follows:

Read in the sequence information and fragment library information of the target protein and the energy function, and set the population of protein conformations Population={x₁,x₂,...,x_i,...,x_NP}, where NP is the population size and x_i denotes the i-th individual of the population. The iteration counter is Gen, the maximum number of iterations is G_max, the crossover probability is p₁, the mutation probability is p₂, and the sequence length is L.

2) Calculate the secondary structure similarity; the process is as follows:

For a protein conformation x_i of sequence length L, the predicted secondary structure at each position k∈[1,L] is obtained from the PSIPRED online server, and the secondary structure of the conformation at each position k∈[1,L] is obtained with the DSSP algorithm. The secondary structure similarity of the conformation is then calculated as follows: a score of 1 is recorded when the two secondary structures at the k-th position are identical, and 0 otherwise; after the entire sequence has been traversed, the total score divided by the sequence length is the secondary structure similarity SS_i.
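The calculation in step 2) amounts to SS_i = (number of positions k at which the predicted and observed states agree) / L. A minimal Python sketch follows. The function names `reduce_dssp` and `secondary_structure_similarity` are illustrative and do not come from the patent. Because PSIPRED predicts three states (H, E, C) while DSSP assigns a finer-grained alphabet, the sketch assumes the common reduction of H/G/I to H, E/B to E, and everything else to C; the patent text does not state which mapping is used.

```python
# Assumed reduction of DSSP states to the three PSIPRED states (H, E, C).
# This mapping is a common convention, not something the patent specifies.
_DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_states):
    """Map a DSSP state string to three states; other symbols become coil."""
    return "".join(_DSSP_TO_3STATE.get(s, "C") for s in dssp_states)


def secondary_structure_similarity(predicted, observed):
    """Fraction of positions whose secondary structure states are identical."""
    if len(predicted) != len(observed):
        raise ValueError("both strings must cover the same L positions")
    if not predicted:
        raise ValueError("sequence length L must be positive")
    # Score 1 for each matching position, 0 otherwise, then divide by L.
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)


predicted = "HHHHCCEE"               # toy PSIPRED-style prediction
observed = reduce_dssp("HHHTS-EE")   # toy DSSP-style assignment
ss_i = secondary_structure_similarity(predicted, observed)
```

In this toy example the two strings differ only at the fourth position, so the similarity is 7/8.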

3) Population initialization; the process is as follows:

Fragment assembly is performed on each individual of the population until the residue at every position has been replaced once, which completes the initialization.
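The initialization in step 3) can be sketched in Python as follows. This is only an illustration under stated assumptions: a conformation is represented as a list of (phi, psi) backbone torsion pairs, the fragment library is a dictionary from window start position to candidate fragments, and the name `initialize_individual` is hypothetical. The patent itself does not specify these data structures.

```python
import random


def initialize_individual(seq_len, fragment_library, frag_len, rng):
    """Insert random fragments until every residue position has been replaced."""
    # Assumed starting point: an extended chain, one (phi, psi) pair per residue.
    conformation = [(-150.0, 150.0)] * seq_len
    replaced = [False] * seq_len
    while not all(replaced):
        start = rng.randrange(seq_len - frag_len + 1)   # window start position
        fragment = rng.choice(fragment_library[start])  # candidate for window
        conformation[start:start + frag_len] = fragment
        for k in range(start, start + frag_len):
            replaced[k] = True
    return conformation


# Toy fragment library (hypothetical): one helical fragment per window start.
SEQ_LEN, FRAG_LEN = 20, 9
helix = [(-60.0, -45.0)] * FRAG_LEN
toy_library = {s: [list(helix)] for s in range(SEQ_LEN - FRAG_LEN + 1)}
individual = initialize_individual(SEQ_LEN, toy_library, FRAG_LEN, random.Random(0))
```

Because the loop only stops once every position has been covered by at least one inserted fragment, no residue keeps its starting torsion angles.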

4) Population crossover; the process is as follows:

4.1) The NP individuals are paired two by two, forming NP/2 pairs, which are numbered a₁,a₂,...,a_j,...,a_NP/2, j∈[1,NP/2].

4.2) A pair a_j is selected at random, and whether to perform crossover is decided according to the probability p₁. If crossover is performed, a position in the individuals is selected at random together with a random crossover length crosslength∈[3,10], and the corresponding fragments of the two individuals are exchanged, yielding two new individuals.
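A hedged Python sketch of the crossover in step 4) is given below. The function names are illustrative. Two points are assumptions rather than statements of the patent: crossover is triggered when a uniform random number falls below p₁, and the operator is applied to every pair in turn after random pairing.

```python
import random


def crossover_pair(parent_a, parent_b, p_cross, rng):
    """Exchange one random window of 3 to 10 residues between two individuals."""
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < p_cross:
        length = min(rng.randint(3, 10), len(child_a))   # crosslength in [3, 10]
        start = rng.randrange(len(child_a) - length + 1)
        end = start + length
        # The right-hand side is evaluated first, so this swaps the two slices.
        child_a[start:end], child_b[start:end] = child_b[start:end], child_a[start:end]
    return child_a, child_b


def crossover_population(population, p_cross, rng):
    """Pair individuals at random (NP/2 pairs) and cross each pair."""
    order = list(range(len(population)))
    rng.shuffle(order)
    offspring = []
    for j in range(0, len(order) - 1, 2):
        a, b = crossover_pair(population[order[j]], population[order[j + 1]], p_cross, rng)
        offspring.extend([a, b])
    return offspring


rng = random.Random(1)
crossed_a, crossed_b = crossover_pair(["A"] * 20, ["B"] * 20, 1.0, rng)
```

With a crossover probability of 1.0 the toy call always exchanges a window, so `crossed_a` contains between 3 and 10 entries taken from the second parent.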

5) Population mutation; the process is as follows:

5.1) For each individual x_i, i∈[1,NP], fragment assembly is performed according to the mutation probability p₂: if random(0,1)>p₂, fragment assembly with 3-residue fragments is performed; if random(0,1)<p₂, fragment assembly with 9-residue fragments is performed. The individual obtained after fragment assembly is x_i′.

5.2) The energy function is used to compute the energies E_i and E_i′ of the individuals x_i and x_i′ before and after assembly. If E_i′<E_i, the individual x_i′ is retained. If E_i′>E_i, whether to accept the assembled individual is decided according to the Boltzmann probability p=exp{-(E_i′-E_i)/KT}: if random(0,1)<p, the individual x_i′ is retained; otherwise, the individual x_i is retained.
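The mutation and acceptance rule of step 5) can be sketched in Python as below. The callables `insert_fragment` and `energy` are hypothetical placeholders for a fragment-insertion move and an energy function, and the value of KT must be supplied by the caller because the text does not give one.

```python
import math
import random


def metropolis_accept(e_old, e_new, kt, rng):
    """Accept a lower energy outright, otherwise with Boltzmann probability."""
    if e_new < e_old:
        return True
    p = math.exp(-(e_new - e_old) / kt)   # p = exp{-(E' - E) / KT}
    return rng.random() < p


def mutate(individual, p_mut, insert_fragment, energy, kt, rng):
    """Try one fragment insertion and keep it only if the criterion accepts."""
    # random(0,1) > p2 selects 3-residue fragments, otherwise 9-residue ones.
    frag_len = 3 if rng.random() > p_mut else 9
    trial = insert_fragment(individual, frag_len)
    if metropolis_accept(energy(individual), energy(trial), kt, rng):
        return trial
    return individual


# Toy stand-ins (assumptions): the energy is the sum of the entries, and the
# move overwrites the first frag_len entries with zeros, lowering the energy.
toy_energy = sum
toy_move = lambda ind, n: [0] * n + list(ind[n:])
mutated = mutate([1] * 12, 0.5, toy_move, toy_energy, 1.0, random.Random(0))
```

In the toy call the trial always has lower energy than the original, so it is always accepted; a real energy function would also exercise the probabilistic branch.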

6) Population selection using secondary structure similarity; the process is as follows:

First, the initial population and the mutated population are merged into a new population of size 2*NP. Then, the secondary structure similarity of each individual in the new population is calculated, the merged population is sorted by secondary structure similarity, and the NP individuals with the highest secondary structure similarity are chosen as the population after selection. Finally, Gen=Gen+1 is set.
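A minimal Python sketch of the selection in step 6) follows. The function name `select_by_similarity` and the toy scoring function are illustrative; in the method described here the score would be the secondary structure similarity SS_i of each conformation.

```python
def select_by_similarity(parents, offspring, similarity, np_size):
    """Merge both populations (2*NP) and keep the NP most similar individuals."""
    merged = list(parents) + list(offspring)
    merged.sort(key=similarity, reverse=True)   # highest similarity first
    return merged[:np_size]


# Toy example: individuals are secondary structure strings scored against a
# hypothetical predicted string.
target = "HHHH"
score = lambda s: sum(a == b for a, b in zip(s, target)) / len(target)
survivors = select_by_similarity(["HHCC", "CCCC"], ["HHHH", "HHHC"], score, 2)
```

The toy scores are 0.5, 0.0, 1.0 and 0.75, so the two survivors are "HHHH" and "HHHC". Note that energy plays no part in this step; it only enters through the acceptance rule during mutation.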

7) Determine whether the maximum number of iterations G_max has been reached; if so, stop iterating and output the information of the last-generation population individuals; otherwise, return to step 4).
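The control flow of steps 4) to 7) can be summarized by the following Python skeleton. The operators are passed in as callables so that the sketch stays self-contained; their names are hypothetical. It assumes that the "initial population" merged in step 6) is the population at the start of the current generation, which the text does not state explicitly.

```python
def run_search(population, g_max, crossover, mutate, select):
    """Iterate crossover, mutation and selection until Gen reaches G_max."""
    gen = 0
    while gen < g_max:
        crossed = crossover(population)            # step 4
        mutated = [mutate(x) for x in crossed]     # step 5
        population = select(population, mutated)   # step 6
        gen += 1                                   # Gen = Gen + 1
    return population                              # last-generation individuals


# Toy operators on integers, only to show the loop running end to end.
toy_result = run_search(
    population=[0, 1, 2],
    g_max=5,
    crossover=lambda pop: list(pop),
    mutate=lambda x: x + 1,
    select=lambda old, new: sorted(old + new, reverse=True)[:len(old)],
)
```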

The technical concept of the present invention is as follows. Within the framework of a population-based algorithm, the invention proposes a protein structure prediction method based on secondary structure similarity. First, the crossover probability in the population-based algorithm both controls the convergence speed of the population, avoiding premature convergence, and enables information exchange among individuals. Then, the mutation operation increases conformational diversity so that better conformations can be obtained. Finally, secondary structure similarity is used in the selection step to choose the best of the population: individuals with lower secondary structure similarity are eliminated and better individuals are retained, which avoids the problem of an inaccurate energy function.

The beneficial effects of the present invention are as follows. On the one hand, the population-based algorithm exchanges information among individuals, which broadens the search of the conformational space; on the other hand, selecting the population by secondary structure similarity greatly increases the probability that high-quality individuals are retained, reduces the error caused by an inaccurate energy function, and improves prediction accuracy.

Description of the drawings

Fig. 1 is a flowchart of the protein structure prediction method based on secondary structure similarity.

Fig. 2 is the conformation distribution obtained when the protein structure prediction method based on secondary structure similarity is applied to protein 1AIL.

Fig. 3 is the three-dimensional structure obtained when the protein structure prediction method based on secondary structure similarity is applied to protein 1AIL.

The present invention is further described below with reference to the accompanying drawings.

Referring to Fig. 1 to Fig. 3, a protein structure prediction method based on secondary structure similarity comprises the following steps:

1) Parameter setting; the process is as follows:

2) Calculate the secondary structure similarity; the process is as follows:

3) Population initialization; the process is as follows:

4) Population crossover; the process is as follows:

4.1) The NP individuals are paired two by two, forming NP/2 pairs, which are numbered a₁,a₂,...,a_j,...,a_NP/2, j∈[1,NP/2];

5) Population mutation; the process is as follows:

This embodiment takes the α-helical protein 1AIL, with a sequence length of 73, as an example. A protein structure prediction method based on secondary structure similarity comprises the following steps:

1) Parameter setting; the process is as follows:

Read in the sequence information and fragment library information of the target protein and the energy function "score3", and set the population of protein conformations population={x₁,x₂,...,x_i,...,x_NP}, where NP=100 is the population size and x_i denotes the i-th individual of the population. The iteration counter is Gen, the maximum number of iterations is G_max=100, the crossover probability is p₁=0.1, the mutation probability is p₂=0.5, and the sequence length is L=73.
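For illustration, the parameter values of this embodiment can be collected in a small Python configuration object. The class and field names are hypothetical; only the values (NP=100, G_max=100, p₁=0.1, p₂=0.5, L=73, energy function "score3") come from the text. The check that NP is even is an added assumption so that the individuals can be paired for crossover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParameters:
    """Run parameters; field names are illustrative, defaults are from the text."""
    population_size: int = 100        # NP
    max_generations: int = 100        # G_max
    crossover_prob: float = 0.1       # p1
    mutation_prob: float = 0.5        # p2
    sequence_length: int = 73         # L, protein 1AIL
    energy_function: str = "score3"   # name as given in the text

    def __post_init__(self):
        # Assumed sanity checks, not part of the patent text.
        if self.population_size < 2 or self.population_size % 2:
            raise ValueError("NP must be even so individuals can be paired")
        for p in (self.crossover_prob, self.mutation_prob):
            if not 0.0 <= p <= 1.0:
                raise ValueError("probabilities must lie in [0, 1]")


params = SearchParameters()
n_pairs = params.population_size // 2   # NP/2 crossover pairs
```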

2) Calculate the secondary structure similarity; the process is as follows:

3) Population initialization; the process is as follows:

4) Population crossover; the process is as follows:

5) Population mutation; the process is as follows:

Taking the α-helical protein 1AIL, with a sequence length of 73, as an example, the above method yields a near-native conformation of the protein. The minimum root-mean-square deviation is … and the average root-mean-square deviation is …; the predicted structure is shown in Fig. 3.

The above describes the optimization results obtained by the present invention with the 1AIL protein as an example and does not limit the scope of implementation of the invention. Various modifications and improvements made without departing from the substance of the present invention shall not be excluded from its scope of protection.

Claims

1. A protein structure prediction method based on secondary structure similarity, characterized in that the protein structure prediction method comprises the following steps:

1) Parameter setting; the process is as follows:

2) Calculate the secondary structure similarity; the process is as follows:

For a protein conformation x_i of sequence length L, the predicted secondary structure at each position k∈[1,L] is obtained from the PSIPRED online server, the secondary structure of the conformation at each position k∈[1,L] is obtained with the DSSP algorithm, and the secondary structure similarity is calculated according to the formula

3) Population initialization; the process is as follows:

Fragment assembly is performed on each individual of the population until the residue at every position has been replaced once, which completes the initialization;

4) Population crossover; the process is as follows:

4.1) The NP individuals are paired two by two, forming NP/2 pairs, which are numbered a₁,a₂,...,a_j,...,a_NP/2, j∈[1,NP/2];

4.2) A pair a_j is selected at random, and whether to perform crossover is decided according to the probability p₁; if crossover is performed, a position in the individuals is selected at random together with a random crossover length crosslength∈[3,10], and the corresponding fragments of the two individuals are exchanged, yielding two new individuals;

5) Population mutation; the process is as follows:

5.1) For each individual x_i, i∈[1,NP], fragment assembly is performed according to the mutation probability p₂: if random(0,1)>p₂, fragment assembly with 3-residue fragments is performed; if random(0,1)<p₂, fragment assembly with 9-residue fragments is performed; the individual obtained after fragment assembly is x_i′;

5.2) The energy function is used to compute the energies E_i and E_i′ of the individuals x_i and x_i′ before and after assembly; if E_i′<E_i, the individual x_i′ is retained; if E_i′>E_i, whether to accept the assembled individual is decided according to the Boltzmann probability p=exp{-(E_i′-E_i)/KT}: if random(0,1)<p, the individual x_i′ is retained; otherwise, the individual x_i is retained;

6) Population selection using secondary structure similarity; the process is as follows:

First, the initial population and the mutated population are merged into a new population of size 2*NP; then, the secondary structure similarity of each individual in the new population is calculated, the merged population is sorted by secondary structure similarity, and the NP individuals with the highest secondary structure similarity are chosen as the population after selection; finally, Gen=Gen+1 is set;

7) Determine whether the maximum number of iterations G_max has been reached; if so, stop iterating and output the information of the last-generation population individuals; otherwise, return to step 4).