CN109063413B

CN109063413B - Method for optimizing space of protein conformation by population hill climbing iteration

Info

Publication number: CN109063413B
Application number: CN201810579338.5A
Authority: CN
Inventors: 张贵军; 刘俊; 彭春祥; 周晓根; 胡俊; 余宝昆
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-06-07
Filing date: 2018-06-07
Publication date: 2021-04-06
Anticipated expiration: 2038-06-07
Also published as: CN109063413A

Abstract

A population hill-climbing iterative protein conformation space optimization method first uses the Rosetta protocol to perform a large-scale conformation search, and then applies an iterative hill-climbing search to explore the conformation space further. This improves the efficiency of the conformation space search while effectively avoiding entrapment in local optima, yields three-dimensional structures closer to the native protein, and thereby improves the accuracy of protein structure prediction. The invention provides a population hill-climbing iterative protein conformation space optimization method with high prediction accuracy.

Description

Method for optimizing space of protein conformation by population hill climbing iteration

Technical Field

The invention relates to the fields of bioinformatics and computer applications, and in particular to a population hill-climbing iterative protein conformation space optimization method.

Background

Protein molecules play a crucial role in the biochemical reactions of living cells. Proteins are estimated to be the most abundant organic substances in the cells of living organisms, accounting for 15% to 20% of cell content. Proteins have diverse functions and play an important role in the normal operation of an organism. The three-dimensional structure of a protein determines its function, and a protein can perform its specific biological function only when it folds correctly into a specific three-dimensional structure. Diseases such as mad cow disease and senile dementia are caused by protein misfolding. Therefore, obtaining the three-dimensional structure of a protein is necessary for understanding its function and for treating the various diseases related to proteins.

Different proteins have different amino acid sequences, and knowing the three-dimensional structure of a protein is the basis for studying its biological function. The mainstream experimental methods for determining protein tertiary structure include X-ray crystallography and nuclear magnetic resonance (NMR). X-ray crystallography can yield highly accurate protein structures, but many proteins are difficult to crystallize for structural analysis, whereas NMR is generally limited to small proteins of no more than 300 amino acids. Cryo-electron microscopy has developed rapidly in recent years; its major advantage is the ability to determine the structures of large proteins. Because experimental structure determination lags far behind sequence determination, it is important to predict protein three-dimensional structure computationally, combining computer technology with bioinformatics methods to simulate how a protein folds from its amino acid sequence into a specific spatial structure. Anfinsen et al. demonstrated that, in general, proteins can spontaneously fold into a particular structural conformation; that is, the structural information of a protein is contained in its amino acid sequence. It is therefore feasible to predict the three-dimensional structure of a protein from its amino acid sequence.

Protein structure prediction methods fall mainly into homology modeling, threading (fold recognition), and de novo prediction. De novo prediction does not rely on a database of known structures and can therefore discover new structure types. Successful existing de novo methods include the Rosetta method designed by David Baker and his team and the QUARK method developed by Yang Zhang and his team. However, no fully satisfactory method for predicting the three-dimensional structure of proteins exists so far. Existing conformation space optimization methods suffer from low search efficiency and slow convergence, and they may become trapped in local optima and converge prematurely, which limits prediction accuracy.

Therefore, the current conformational space optimization methods are deficient in search efficiency and prediction accuracy, and need to be improved.

Disclosure of Invention

To overcome the shortcomings of conventional conformation space optimization methods in search efficiency and prediction accuracy, the invention provides a population hill-climbing iterative protein conformation space optimization method with higher prediction accuracy. First, the population is initialized using the first, second, third and fourth stages of the Rosetta protocol; then the conformation space is explored further with an iterative hill-climbing search, which improves the search efficiency and effectively prevents the search from being trapped in local optima.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A population hill-climbing iterative protein conformation space optimization method comprises the following steps:

1) inputting sequence information of a target protein;

2) set parameters: population size NP, maximum number of iterations G_max, number of crossover iterations HC, and number of mutation iterations HM;

3) population initialization: iterate the first, second, third and fourth stages of the Rosetta protocol to generate a population P = {P₁, P₂, ..., P_NP} of NP individuals;

4) iterative hill-climbing search, as follows:

4.1) set g = 1, where g ∈ {1, 2, ..., G_max};

4.2) randomly select two individuals P¹ and P² from population P, and take whichever of P¹ and P² has the lower Rosetta score3 energy as the best individual of the crossover phase, P_c^best;

4.3) iterative crossover phase, as follows:

4.3.1) set hc = 1, where hc ∈ {1, 2, ..., HC};

4.3.2) generate a uniform random integer rand1 ∈ [1, L], where L is the sequence length of the target protein;

4.3.3) using residue rand1 as the crossover point, exchange the structures of P¹ and P² before and after the crossover point to generate the crossed individuals P_c¹ and P_c², and select whichever of P_c¹ and P_c² has the lower Rosetta score3 energy as the trial individual P_c^trial;

4.3.4) determine, according to the Metropolis criterion, whether individual P_c^trial replaces P_c^best, as follows:

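4.3.4.1) calculate the energies E(P_c^best) and E(P_c^trial) of P_c^best and P_c^trial with the Rosetta score3 energy function, and let ΔE = E(P_c^trial) − E(P_c^best);

4.3.4.2) calculate the replacement probability p as follows,

$$p = \min\left\{1, \exp\left(-\frac{\Delta E}{KT}\right)\right\}$$

where KT is the temperature parameter, set to 2 by default;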

4.3.4.3) generate a uniform random number rand2 ∈ [0, 1];

4.3.4.4) if rand2 ≤ p, replace P_c^best with P_c^trial; otherwise, keep P_c^best unchanged;
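The acceptance test of steps 4.3.4.1) to 4.3.4.4) can be sketched in Python as follows. This is a minimal sketch, assuming the standard Metropolis form p = min{1, exp(−ΔE/KT)} with ΔE = E(trial) − E(best); the energies are plain floats here, whereas the method obtains them from Rosetta score3. The optional `rng` argument is a hypothetical hook so the random draw can be seeded.

```python
import math
import random
from typing import Optional


def metropolis_accept(e_best: float, e_trial: float, kt: float = 2.0,
                      rng: Optional[random.Random] = None) -> bool:
    """Return True if the trial individual should replace the current best one.

    Assumed form: p = min(1, exp(-(e_trial - e_best) / kt)); accept if rand2 <= p.
    """
    if rng is None:
        rng = random.Random()
    delta_e = e_trial - e_best          # ΔE (step 4.3.4.1)
    if delta_e <= 0:
        return True                     # p = 1, so rand2 <= p always holds
    p = math.exp(-delta_e / kt)         # replacement probability (step 4.3.4.2)
    rand2 = rng.random()                # uniform draw (step 4.3.4.3), in [0, 1)
    return rand2 <= p                   # step 4.3.4.4


# Usage: with KT = 2 and ΔE = 2, the trial is accepted with probability exp(-1).
accepted = metropolis_accept(-120.0, -118.0, kt=2.0, rng=random.Random(42))
```

Returning early when ΔE ≤ 0 avoids evaluating exp on a large positive argument, which could overflow, and matches the min{1, ·} cap.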

4.3.5) hc = hc + 1;

4.3.6) if hc ≤ HC, go to step 4.3.2); otherwise, end the iterative crossover phase and enter the iterative mutation phase;
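Steps 4.2) and 4.3) can be sketched as below, with the random choice of P¹ and P² left to the caller. The representation is an assumption: each individual is a list of per-residue (phi, psi) torsion pairs, so that exchanging the structures before and after the crossover point becomes a one-point crossover on that list, with residue rand1 kept on the head side (the text does not say which side it falls on). `toy_energy` is a hypothetical stand-in for Rosetta score3, included only so the sketch runs.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Conformation = List[Tuple[float, float]]    # assumed: per-residue (phi, psi) in degrees
EnergyFn = Callable[[Conformation], float]


def toy_energy(conf: Conformation) -> float:
    """Hypothetical stand-in for Rosetta score3: distance from ideal helix torsions."""
    return sum(abs(phi + 57.0) + abs(psi + 47.0) for phi, psi in conf)


def one_point_crossover(p1: Conformation, p2: Conformation,
                        rand1: int) -> Tuple[Conformation, Conformation]:
    """Step 4.3.3: swap the segments after residue rand1 (1-based)."""
    c1 = p1[:rand1] + p2[rand1:]
    c2 = p2[:rand1] + p1[rand1:]
    return c1, c2


def crossover_phase(p1: Conformation, p2: Conformation, energy: EnergyFn,
                    hc_max: int = 20, kt: float = 2.0,
                    rng: Optional[random.Random] = None) -> Conformation:
    """Steps 4.2-4.3.6: return P_c^best after HC crossover/Metropolis rounds."""
    if len(p1) != len(p2):
        raise ValueError("parents must cover the same sequence")
    rng = rng if rng is not None else random.Random()
    best = list(p1) if energy(p1) <= energy(p2) else list(p2)   # step 4.2
    e_best = energy(best)
    seq_len = len(p1)
    for _hc in range(hc_max):                                   # hc = 1 .. HC
        rand1 = rng.randint(1, seq_len)                         # step 4.3.2, inclusive bounds
        c1, c2 = one_point_crossover(p1, p2, rand1)             # step 4.3.3
        e1, e2 = energy(c1), energy(c2)
        trial, e_trial = (c1, e1) if e1 <= e2 else (c2, e2)     # P_c^trial
        delta_e = e_trial - e_best                              # step 4.3.4 (assumed Metropolis form)
        if delta_e <= 0 or rng.random() <= math.exp(-delta_e / kt):
            best, e_best = trial, e_trial
    return best
```

The parents P¹ and P² stay fixed across rounds, as the steps describe; only P_c^best changes. When rand1 = L the crossover returns copies of the parents, so that round cannot improve on them.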

4.4) iterative mutation phase, as follows:

4.4.1) let …, where … and … denote the best individual and the target individual of the mutation phase, respectively;

4.4.2) set hm = 1, where hm ∈ {1, 2, ..., HM};

4.4.3) perform the mutation operation on each fragment window of … and select the best mutant individual, as follows:

4.4.3.1) set the fragment window index hw = 1, where hw ∈ {1, 2, ..., L−2} and L is the sequence length of the target protein;

4.4.3.2) randomly select a fragment from the fragment library of window hw and use it to replace the fragment in window hw, generating a mutant individual;

4.4.3.3) determine, according to the Metropolis criterion, whether to replace … with the mutant individual;

4.4.3.4) hw = hw + 1;

4.4.3.5) if hw ≤ L − 2, go to step 4.4.3.2); otherwise, go to step 4.4.4);
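One sweep of the fragment-window loop in steps 4.4.3.1) to 4.4.3.5) might look like the sketch below. Several details are assumptions rather than statements from the text: fragments are three residues long (consistent with the L − 2 windows), the library is a hypothetical list `frag_lib` whose entry hw − 1 holds the candidate fragments for window hw, and each mutant is tested with the Metropolis rule against the individual it was derived from, because the symbol for the replaced individual in step 4.4.3.3) did not survive extraction.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Torsions = Tuple[float, float]
Conformation = List[Torsions]
FRAG_LEN = 3  # assumed 3-residue fragments, giving L - 2 windows


def mutate_windows(individual: Conformation,
                   frag_lib: Sequence[Sequence[Sequence[Torsions]]],
                   energy: Callable[[Conformation], float],
                   kt: float, rng: random.Random) -> Tuple[Conformation, bool]:
    """One sweep over windows hw = 1 .. L-2; returns (result, any mutant accepted)."""
    current = list(individual)
    e_current = energy(current)
    replaced = False
    for hw in range(1, len(current) - FRAG_LEN + 2):      # hw = 1 .. L-2
        fragment = rng.choice(frag_lib[hw - 1])            # step 4.4.3.2: random fragment
        if len(fragment) != FRAG_LEN:
            raise ValueError("unexpected fragment length")
        start = hw - 1                                     # 0-based first residue of window hw
        mutant = current[:start] + list(fragment) + current[start + FRAG_LEN:]
        e_mutant = energy(mutant)
        delta_e = e_mutant - e_current                     # assumed Metropolis test (step 4.4.3.3)
        if delta_e <= 0 or rng.random() <= math.exp(-delta_e / kt):
            current, e_current = mutant, e_mutant
            replaced = True
    return current, replaced
```

Caching `e_current` means each window costs one energy evaluation, so a sweep costs L − 2 evaluations plus one; the returned flag corresponds to the "successfully replaced" test of step 4.4.4).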

4.4.4) if … was successfully replaced in step 4.4.3), let …;

4.4.5) hm = hm + 1;

4.4.6) if hm ≤ HM, go to step 4.4.3); otherwise, end the iterative mutation phase and enter the selection phase;

4.5) selection phase, as follows:

4.5.1) according to the Rosetta score3 energy function, select the two individuals with the highest energy from population P;

4.5.2) replace these two individuals with … and …, respectively;
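The selection phase of steps 4.5.1) and 4.5.2) amounts to overwriting the two highest-energy members of the population, which keeps the population size at NP. In this sketch the two incoming individuals are passed in as `new_a` and `new_b` because their symbols did not survive extraction; which incoming individual takes which slot is also an assumption (here `new_a` replaces the highest-energy member).

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def replace_two_worst(population: List[T], new_a: T, new_b: T,
                      energy: Callable[[T], float]) -> List[T]:
    """Steps 4.5.1-4.5.2: overwrite the two highest-energy members in place."""
    if len(population) < 2:
        raise ValueError("population must hold at least two individuals")
    ranked = sorted(range(len(population)),
                    key=lambda i: energy(population[i]), reverse=True)
    population[ranked[0]] = new_a    # slot of the highest-energy individual
    population[ranked[1]] = new_b    # slot of the second-highest-energy individual
    return population
```

Sorting the whole population costs O(NP log NP) per generation; `heapq.nlargest(2, ...)` would find the same two indices with less work if NP were large.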

4.6) g = g + 1;

4.7) if g ≤ G_max, go to step 4.2); otherwise, end the iterative hill-climbing search;
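The outer loop of step 4) can be wired together as below. This is a sketch only: the per-generation work of steps 4.2) to 4.4) is passed in as a hypothetical callable `run_phases(p1, p2)` that returns the two individuals inserted in step 4.5.2), which keeps the loop independent of how the crossover and mutation phases are implemented. In the usage example plain floats stand in for individuals, with the value itself as the energy.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def hill_climbing_search(population: List[T],
                         energy: Callable[[T], float],
                         run_phases: Callable[[T, T], Tuple[T, T]],
                         g_max: int,
                         rng: random.Random) -> List[T]:
    """Step 4: repeat pair selection, crossover/mutation and replacement G_max times."""
    for _g in range(g_max):                                  # g = 1 .. G_max
        p1, p2 = rng.sample(population, 2)                   # step 4.2: two random individuals
        new_a, new_b = run_phases(p1, p2)                    # steps 4.2-4.4 (hypothetical callable)
        ranked = sorted(range(len(population)),
                        key=lambda i: energy(population[i]), reverse=True)
        population[ranked[0]] = new_a                        # step 4.5: replace the two
        population[ranked[1]] = new_b                        # highest-energy individuals
    return population


# Toy usage: each generation inserts two copies of min(p1, p2) - 1.
pop = [5.0, 4.0, 3.0, 2.0, 1.0]
result = hill_climbing_search(pop, lambda x: x,
                              lambda a, b: (min(a, b) - 1.0, min(a, b) - 1.0),
                              g_max=10, rng=random.Random(0))
```

Because only the two worst members are ever overwritten, the best individual found so far always survives to the next generation.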

5) cluster the individuals in population P with the Rosetta clustering algorithm, and select the cluster-center conformation of the largest cluster as the final prediction result.
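Step 5) relies on Rosetta's own clustering, which is not reproduced here. As a simplified stand-in in the same spirit, the sketch below takes a precomputed pairwise distance matrix (for example, RMSD between conformations, computed elsewhere) and returns the member with the most neighbours within a cutoff radius; in greedy neighbour-count clustering that member is the center of the largest cluster. The radius is a hypothetical parameter, not a value from the text.

```python
from typing import List, Sequence


def largest_cluster_center(dist: Sequence[Sequence[float]], radius: float) -> int:
    """Index of the conformation with the most neighbours within `radius` (itself included)."""
    n = len(dist)
    if n == 0:
        raise ValueError("empty population")
    counts: List[int] = [sum(1 for j in range(n) if dist[i][j] <= radius)
                         for i in range(n)]
    return max(range(n), key=counts.__getitem__)   # the first index wins ties


# Usage with 1-D toy "conformations" and |a - b| as the distance.
points = [0.0, 0.5, 1.0, 5.0, 5.2]
matrix = [[abs(a - b) for b in points] for a in points]
center = largest_cluster_center(matrix, radius=0.6)
```

Any later cluster in a greedy scheme can contain at most as many members as its center's original neighbour count, so the first center found also heads the largest cluster.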

Beneficial effects of the invention: the Rosetta protocol is first used to search the conformation space broadly, and an iterative hill-climbing search then explores it further. This improves the search efficiency, effectively prevents the search from being trapped in local optima, and yields three-dimensional structures closer to the native protein, thereby improving the accuracy of protein structure prediction.

Drawings

Fig. 1 is a schematic diagram of conformation updating when the population hill-climbing iterative protein conformation space optimization method is used to predict the structure of protein 1HZ6.

Fig. 2 shows the three-dimensional structure obtained by predicting the structure of protein 1HZ6 with the population hill-climbing iterative protein conformation space optimization method.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to Figs. 1 and 2, the population hill-climbing iterative protein conformation space optimization method comprises steps 1) to 5) as described above.

In this embodiment, protein 1HZ6, with a sequence length of 72, is taken as an example; the population hill-climbing iterative protein conformation space optimization method comprises the following steps:

1) input the sequence information of the target protein 1HZ6;

2) set parameters: population size NP = 200, maximum number of iterations G_max = 1000, number of crossover iterations HC = 20, and number of mutation iterations HM = 20;

3) to 5) are carried out as described above, using the parameters set in step 2);
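The parameter settings of step 2) in this embodiment can be collected in a small configuration object, sketched below in Python; the class and method names are hypothetical. The two helper methods work out how many trial structures the iterative stage generates, assuming one trial individual per crossover round (step 4.3) and one mutant per fragment window per mutation round (step 4.4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HillClimbParams:
    """Settings of the 1HZ6 embodiment (names are illustrative)."""
    np_size: int = 200      # population size NP
    g_max: int = 1000       # maximum number of iterations G_max
    hc: int = 20            # crossover iterations HC per generation
    hm: int = 20            # mutation iterations HM per generation
    kt: float = 2.0         # Metropolis temperature KT (default from step 4.3.4.2)

    def crossover_trials(self) -> int:
        """Trial individuals produced by step 4.3) over the whole search."""
        return self.g_max * self.hc

    def mutation_trials(self, seq_len: int) -> int:
        """Mutants produced by step 4.4): one per window (L - 2 windows) per round."""
        return self.g_max * self.hm * (seq_len - 2)


params = HillClimbParams()
print(params.crossover_trials())       # 20000
print(params.mutation_trials(72))      # 1400000 for the 72-residue 1HZ6
```

Under these assumptions the fragment-window mutation phase dominates the cost: 1000 × 20 × 70 = 1,400,000 mutants against 1000 × 20 = 20,000 crossover trials.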

Taking protein 1HZ6, whose amino acid sequence is 72 residues long, as an example, the above method obtains a near-native conformation of the protein. The conformation update is shown in Fig. 1, the minimum root-mean-square deviation (RMSD) is …, and the predicted structure is shown in Fig. 2.

The foregoing describes the invention as embodied in one embodiment. The invention is clearly not limited to this embodiment and may be implemented with various modifications without departing from the basic inventive concept and spirit of the invention.

Claims

1. A population hill-climbing iterative protein conformation space optimization method, characterized in that the conformation space optimization method comprises the following steps:

1) inputting sequence information of a target protein;

2) set parameters: population size NP, maximum number of iterations G_max, number of crossover iterations HC, and number of mutation iterations HM;

4) iterative hill-climbing search, as follows:

4.1) set g = 1, where g ∈ {1, 2, ..., G_max};

4.2) randomly select two individuals P¹ and P² from population P, and take whichever of P¹ and P² has the lower Rosetta score3 energy as the best individual of the crossover phase, P_c^best;

4.3) iterative crossover phase, as follows:

4.3.1) set hc = 1, where hc ∈ {1, 2, ..., HC};

4.3.3) using residue rand1 as the crossover point, exchange the structures of P¹ and P² before and after the crossover point to generate the crossed individuals P_c¹ and P_c², and select whichever of P_c¹ and P_c² has the lower Rosetta score3 energy as the trial individual P_c^trial;

4.3.4) determine, according to the Metropolis criterion, whether individual P_c^trial replaces P_c^best, as follows:

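4.3.4.1) calculate the energies E(P_c^best) and E(P_c^trial) of P_c^best and P_c^trial with the Rosetta score3 energy function, and let ΔE = E(P_c^trial) − E(P_c^best);

4.3.4.2) calculate the replacement probability p as follows,

$$p = \min\left\{1, \exp\left(-\frac{\Delta E}{KT}\right)\right\}$$

where KT is the temperature parameter, set to 2 by default;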

4.3.4.3) generate a uniform random number rand2 ∈ [0, 1];

4.3.4.4) if rand2 ≤ p, replace P_c^best with P_c^trial; otherwise, keep P_c^best unchanged;

4.3.5) hc = hc + 1;

4.4) iterative mutation phase, as follows:

4.4.1) let …, where … and … denote the best individual and the target individual of the mutation phase, respectively;

4.4.2) set hm = 1, where hm ∈ {1, 2, ..., HM};

4.4.3) perform the mutation operation on each fragment window of … and select the best mutant individual, as follows:

4.4.3.1) set the fragment window index hw = 1, where hw ∈ {1, 2, ..., L−2} and L is the sequence length of the target protein;