CN109063413B - Method for optimizing space of protein conformation by population hill climbing iteration - Google Patents
- Publication number
- CN109063413B
- Authority
- CN
- China
- Prior art keywords
- iterative
- individual
- population
- protein
- follows
- Prior art date
- Legal status
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A population hill-climbing iterative protein conformation space optimization method first uses the Rosetta protocol to perform a large-scale conformational search, and then uses an iterative hill-climbing search to explore the conformation space further. This improves the efficiency of the conformation space search while effectively avoiding entrapment in local optima, and yields a three-dimensional structure closer to the native protein, so that the accuracy of protein structure prediction is improved. The invention provides a population hill-climbing iterative protein conformation space optimization method with high prediction accuracy.
Description
Technical Field
The invention relates to the fields of bioinformatics and computer applications, and in particular to a population hill-climbing iterative protein conformation space optimization method.
Background
Protein molecules play a crucial role in the biochemical reactions of living cells. Protein is estimated to be the most abundant organic substance in the cells of living organisms, at 15% to 20%. Proteins have a rich variety of functions and play an important role in the normal operation of the organism. The three-dimensional structure of a protein determines its function: only when it is correctly folded into a specific three-dimensional structure can a protein perform its specific biological function. Diseases such as mad cow disease and senile dementia are caused by protein misfolding. Therefore, obtaining the three-dimensional structure of a protein is necessary for understanding its function and for curing the diseases related to it.
Different proteins have different amino acid sequences, and understanding the three-dimensional structure of a protein is the basis for studying its biological function. The mainstream experimental methods for determining protein tertiary structure include X-ray crystallography and nuclear magnetic resonance (NMR). X-ray crystallography yields highly accurate protein structures, but for many proteins it is difficult to prepare crystals suitable for structure analysis, while NMR can generally only measure small proteins of no more than 300 amino acids. Cryo-electron microscopy has developed rapidly in recent years; its major advantage is that it can determine the structure of large proteins. Because experimental structure determination lags far behind sequence determination, it is important to predict the three-dimensional structure of a protein by combining computer technology with bioinformatics methods to simulate how the amino acid sequence folds into a specific spatial structure. Anfinsen et al. demonstrated that, in general, a protein can fold spontaneously into a specific structural conformation; that is, the structural information of a protein is contained in its amino acid sequence. It is therefore feasible to predict the three-dimensional structure of a protein from its amino acid sequence.
Protein structure prediction methods are mainly classified into homology modeling, threading, and de novo prediction. De novo prediction does not rely on a database of known structures, and so can discover new structure types. Successful existing de novo methods include Rosetta, designed by David Baker and his team, and QUARK, developed by Yang Zhang and his team. However, no fully satisfactory method for predicting the three-dimensional structure of a protein exists so far. Existing conformation space optimization methods suffer from low search efficiency and slow convergence, and can even become trapped in local optima and converge prematurely, which degrades prediction accuracy.
Therefore, the current conformational space optimization methods are deficient in search efficiency and prediction accuracy, and need to be improved.
Disclosure of Invention
To overcome the shortcomings of existing conformation space optimization methods in search efficiency and prediction accuracy, the invention provides a population hill-climbing iterative protein conformation space optimization method with higher prediction accuracy. First, a population is initialized using the first, second, third and fourth stages of the Rosetta protocol; then the conformation space is explored further with an iterative hill-climbing search, which improves the search efficiency of the conformation space and effectively avoids entrapment in local optima.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a population hill-climbing iterative protein conformation space optimization method, comprising the steps of:
1) inputting sequence information of a target protein;
2) setting parameters: population size $NP$, number of iterations $G_{max}$, number of crossover iterations $HC$, and number of variation iterations $HM$;
3) population initialization: iterate the first, second, third and fourth stages of the Rosetta protocol to generate a population of $NP$ individuals, $P = \{P_1, P_2, \ldots, P_{NP}\}$;
4) iterative hill-climbing search, the process is as follows:
4.1) set $g = 1$, where $g \in \{1, 2, \ldots, G_{max}\}$;
4.2) randomly select two individuals $P_1$, $P_2$ from the population $P$, and take whichever of $P_1$, $P_2$ has the lower Rosetta score3 energy as the optimal individual of the crossover stage, $P^c_{best}$;
4.3) iterative crossover stage, the process is as follows:
4.3.1) set $hc = 1$, where $hc \in \{1, 2, \ldots, HC\}$;
4.3.2) generate a uniformly distributed random integer $rand1 \in [1, L]$, where $L$ is the sequence length of the target protein;
4.3.3) taking residue $rand1$ as the crossover point, exchange the structures of $P_1$ and $P_2$ before and after the crossover point to generate the crossover individuals $P^c_1$, $P^c_2$, and take whichever of $P^c_1$, $P^c_2$ has the lower Rosetta score3 energy as the trial individual $P^c_{trial}$;
4.3.4) determine according to the Metropolis criterion whether $P^c_{trial}$ replaces $P^c_{best}$, the process is as follows:
4.3.4.1) calculate the energies $E(P^c_{best})$ and $E(P^c_{trial})$ with the Rosetta score3 energy function, and let $\Delta E = E(P^c_{trial}) - E(P^c_{best})$;
4.3.4.2) calculate the replacement probability $p = \min\{1, e^{-\Delta E / KT}\}$, where $KT$ is a temperature parameter, set to 2 by default;
4.3.4.3) generate a uniformly distributed random number $rand2 \in [0, 1]$;
4.3.4.4) if $rand2 \le p$, replace $P^c_{best}$ with $P^c_{trial}$; otherwise keep $P^c_{best}$ unchanged;
4.3.5) $hc = hc + 1$;
4.3.6) if $hc \le HC$, go to step 4.3.2); otherwise end the iterative crossover stage and enter the iterative variation stage;
4.4) iterative variation stage, the process is as follows:
4.4.1) set the optimal individual and the target individual of the variation stage;
4.4.2) set $hm = 1$, where $hm \in \{1, 2, \ldots, HM\}$;
4.4.3) perform a mutation operation on each fragment window of the target individual and select the optimal variant individual, the process is as follows:
4.4.3.1) set the fragment window index $hw = 1$, where $hw \in \{1, 2, \ldots, L-2\}$ and $L$ is the sequence length of the target protein;
4.4.3.2) randomly select a fragment from the fragment library corresponding to window $hw$ and substitute it into that window to generate a variant individual;
4.4.3.4) $hw = hw + 1$;
4.4.3.5) if $hw \le L - 2$, go to step 4.4.3.2); otherwise go to step 4.4.4);
4.4.5) $hm = hm + 1$;
4.4.6) if $hm \le HM$, go to step 4.4.3); otherwise end the iterative variation stage and enter the selection stage;
4.5) selection stage, the process is as follows:
4.5.1) select the two individuals with the highest energy from the population $P$ according to the Rosetta score3 energy function;
4.6) $g = g + 1$;
4.7) if $g \le G_{max}$, go to step 4.2); otherwise end the iterative hill-climbing search;
5) cluster the individuals in the population $P$ with the Rosetta clustering algorithm, and select the centroid conformation of the largest cluster as the final prediction result.
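The iterative crossover stage of steps 4.2) to 4.3.6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a conformation is represented as a plain list with one entry per residue, `energy` is a hypothetical callable standing in for the Rosetta score3 energy function, and the acceptance probability is assumed to take the standard Metropolis form $\min\{1, e^{-\Delta E / KT}\}$. All function names are illustrative.

```python
import math
import random


def one_point_crossover(p1, p2, point):
    """Swap the tails of two conformations after residue `point` (1-based)."""
    # Residues 1..point stay with their parent; the rest are exchanged.
    c1 = p1[:point] + p2[point:]
    c2 = p2[:point] + p1[point:]
    return c1, c2


def metropolis_accept(e_best, e_trial, kt=2.0, rng=random):
    """Decide whether the trial individual replaces the current best.

    ASSUMPTION: standard Metropolis form, p = min(1, exp(-dE / KT)).
    """
    delta = e_trial - e_best
    p = 1.0 if delta <= 0 else math.exp(-delta / kt)
    return rng.random() <= p


def crossover_phase(p1, p2, energy, hc_max, kt=2.0, rng=random):
    """Run hc_max crossover iterations and return the surviving best individual.

    `energy` is a hypothetical stand-in for the Rosetta score3 energy function.
    """
    best = min(p1, p2, key=energy)            # step 4.2
    length = len(p1)
    for _ in range(hc_max):                   # steps 4.3.1, 4.3.5, 4.3.6
        point = rng.randint(1, length)        # step 4.3.2, rand1 in [1, L]
        c1, c2 = one_point_crossover(p1, p2, point)   # step 4.3.3
        trial = min(c1, c2, key=energy)
        if metropolis_accept(energy(best), energy(trial), kt, rng):  # step 4.3.4
            best = trial
    return best
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing parameter settings such as `kt`.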
The beneficial effects of the invention are as follows: the Rosetta protocol is first used to search the conformation space broadly, and an iterative hill-climbing search is then used to explore it further. This improves the search efficiency of the conformation space, effectively avoids entrapment in local optima, yields a three-dimensional structure closer to the native protein, and improves the accuracy of protein structure prediction.
Drawings
FIG. 1 is a schematic diagram of the conformation update when the population hill-climbing iterative protein conformation space optimization method is used to predict the structure of protein 1HZ6.
FIG. 2 is the three-dimensional structure obtained when the population hill-climbing iterative protein conformation space optimization method is used to predict the structure of protein 1HZ6.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, a population hill-climbing iterative protein conformation space optimization method comprises the following steps:
1) inputting sequence information of a target protein;
2) setting parameters: population size $NP$, number of iterations $G_{max}$, number of crossover iterations $HC$, and number of variation iterations $HM$;
3) population initialization: iterate the first, second, third and fourth stages of the Rosetta protocol to generate a population of $NP$ individuals, $P = \{P_1, P_2, \ldots, P_{NP}\}$;
4) iterative hill-climbing search, the process is as follows:
4.1) set $g = 1$, where $g \in \{1, 2, \ldots, G_{max}\}$;
4.2) randomly select two individuals $P_1$, $P_2$ from the population $P$, and take whichever of $P_1$, $P_2$ has the lower Rosetta score3 energy as the optimal individual of the crossover stage, $P^c_{best}$;
4.3) iterative crossover stage, the process is as follows:
4.3.1) set $hc = 1$, where $hc \in \{1, 2, \ldots, HC\}$;
4.3.2) generate a uniformly distributed random integer $rand1 \in [1, L]$, where $L$ is the sequence length of the target protein;
4.3.3) taking residue $rand1$ as the crossover point, exchange the structures of $P_1$ and $P_2$ before and after the crossover point to generate the crossover individuals $P^c_1$, $P^c_2$, and take whichever of $P^c_1$, $P^c_2$ has the lower Rosetta score3 energy as the trial individual $P^c_{trial}$;
4.3.4) determine according to the Metropolis criterion whether $P^c_{trial}$ replaces $P^c_{best}$, the process is as follows:
4.3.4.1) calculate the energies $E(P^c_{best})$ and $E(P^c_{trial})$ with the Rosetta score3 energy function, and let $\Delta E = E(P^c_{trial}) - E(P^c_{best})$;
4.3.4.2) calculate the replacement probability $p = \min\{1, e^{-\Delta E / KT}\}$, where $KT$ is a temperature parameter, set to 2 by default;
4.3.4.3) generate a uniformly distributed random number $rand2 \in [0, 1]$;
4.3.4.4) if $rand2 \le p$, replace $P^c_{best}$ with $P^c_{trial}$; otherwise keep $P^c_{best}$ unchanged;
4.3.5) $hc = hc + 1$;
4.3.6) if $hc \le HC$, go to step 4.3.2); otherwise end the iterative crossover stage and enter the iterative variation stage;
4.4) iterative variation stage, the process is as follows:
4.4.1) set the optimal individual and the target individual of the variation stage;
4.4.2) set $hm = 1$, where $hm \in \{1, 2, \ldots, HM\}$;
4.4.3) perform a mutation operation on each fragment window of the target individual and select the optimal variant individual, the process is as follows:
4.4.3.1) set the fragment window index $hw = 1$, where $hw \in \{1, 2, \ldots, L-2\}$ and $L$ is the sequence length of the target protein;
4.4.3.2) randomly select a fragment from the fragment library corresponding to window $hw$ and substitute it into that window to generate a variant individual;
4.4.3.4) $hw = hw + 1$;
4.4.3.5) if $hw \le L - 2$, go to step 4.4.3.2); otherwise go to step 4.4.4);
4.4.5) $hm = hm + 1$;
4.4.6) if $hm \le HM$, go to step 4.4.3); otherwise end the iterative variation stage and enter the selection stage;
4.5) selection stage, the process is as follows:
4.5.1) select the two individuals with the highest energy from the population $P$ according to the Rosetta score3 energy function;
4.6) $g = g + 1$;
4.7) if $g \le G_{max}$, go to step 4.2); otherwise end the iterative hill-climbing search;
5) cluster the individuals in the population $P$ with the Rosetta clustering algorithm, and select the centroid conformation of the largest cluster as the final prediction result.
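The sweep over fragment windows in the iterative variation stage (steps 4.4.2 to 4.4.6) can be sketched as below. Several points are assumptions rather than statements of the patent: fragments are taken to be three residues long (inferred from there being $L-2$ windows), the "optimal variant individual" is chosen greedily as the lowest-energy variant of a sweep, `energy` is a hypothetical stand-in for Rosetta score3, and `fragment_library` is a hypothetical list holding the candidate fragments for each window.

```python
import random


def mutate_window(conf, window, fragment):
    """Return a copy of `conf` with `fragment` written at 0-based index `window`."""
    return conf[:window] + list(fragment) + conf[window + len(fragment):]


def variation_phase(target, fragment_library, energy, hm_max, rng=random):
    """Run hm_max sweeps of fragment replacement and return the best individual.

    fragment_library[hw] holds the candidate fragments for window hw
    (hypothetical layout; a length-L chain has L-2 windows of 3 residues).
    """
    best = list(target)
    for _ in range(hm_max):                        # steps 4.4.2, 4.4.5, 4.4.6
        candidate = best
        for hw in range(len(fragment_library)):    # steps 4.4.3.1 to 4.4.3.5
            fragment = rng.choice(fragment_library[hw])     # step 4.4.3.2
            variant = mutate_window(best, hw, fragment)
            # ASSUMPTION: greedy rule, keep the lowest-energy variant of the sweep.
            if energy(variant) < energy(candidate):
                candidate = variant
        best = candidate
    return best
```

Because a variant is kept only when its energy is strictly lower, the energy of the returned individual never exceeds that of the input under this assumed rule.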
In this embodiment, taking protein 1HZ6, with a sequence length of 72, as an example, the population hill-climbing iterative protein conformation space optimization method comprises the following steps:
1) inputting sequence information of the target protein 1HZ6;
2) setting parameters: population size $NP = 200$, number of iterations $G_{max} = 1000$, number of crossover iterations $HC = 20$, and number of variation iterations $HM = 20$;
3) population initialization: iterate the first, second, third and fourth stages of the Rosetta protocol to generate a population of $NP$ individuals, $P = \{P_1, P_2, \ldots, P_{NP}\}$;
4) iterative hill-climbing search, the process is as follows:
4.1) set $g = 1$, where $g \in \{1, 2, \ldots, G_{max}\}$;
4.2) randomly select two individuals $P_1$, $P_2$ from the population $P$, and take whichever of $P_1$, $P_2$ has the lower Rosetta score3 energy as the optimal individual of the crossover stage, $P^c_{best}$;
4.3) iterative crossover stage, the process is as follows:
4.3.1) set $hc = 1$, where $hc \in \{1, 2, \ldots, HC\}$;
4.3.2) generate a uniformly distributed random integer $rand1 \in [1, L]$, where $L$ is the sequence length of the target protein;
4.3.3) taking residue $rand1$ as the crossover point, exchange the structures of $P_1$ and $P_2$ before and after the crossover point to generate the crossover individuals $P^c_1$, $P^c_2$, and take whichever of $P^c_1$, $P^c_2$ has the lower Rosetta score3 energy as the trial individual $P^c_{trial}$;
4.3.4) determine according to the Metropolis criterion whether $P^c_{trial}$ replaces $P^c_{best}$, the process is as follows:
4.3.4.1) calculate the energies $E(P^c_{best})$ and $E(P^c_{trial})$ with the Rosetta score3 energy function, and let $\Delta E = E(P^c_{trial}) - E(P^c_{best})$;
4.3.4.2) calculate the replacement probability $p = \min\{1, e^{-\Delta E / KT}\}$, where $KT$ is a temperature parameter, set to 2 by default;
4.3.4.3) generate a uniformly distributed random number $rand2 \in [0, 1]$;
4.3.4.4) if $rand2 \le p$, replace $P^c_{best}$ with $P^c_{trial}$; otherwise keep $P^c_{best}$ unchanged;
4.3.5) $hc = hc + 1$;
4.3.6) if $hc \le HC$, go to step 4.3.2); otherwise end the iterative crossover stage and enter the iterative variation stage;
4.4) iterative variation stage, the process is as follows:
4.4.1) set the optimal individual and the target individual of the variation stage;
4.4.2) set $hm = 1$, where $hm \in \{1, 2, \ldots, HM\}$;
4.4.3) perform a mutation operation on each fragment window of the target individual and select the optimal variant individual, the process is as follows:
4.4.3.1) set the fragment window index $hw = 1$, where $hw \in \{1, 2, \ldots, L-2\}$ and $L$ is the sequence length of the target protein;
4.4.3.2) randomly select a fragment from the fragment library corresponding to window $hw$ and substitute it into that window to generate a variant individual;
4.4.3.4) $hw = hw + 1$;
4.4.3.5) if $hw \le L - 2$, go to step 4.4.3.2); otherwise go to step 4.4.4);
4.4.5) $hm = hm + 1$;
4.4.6) if $hm \le HM$, go to step 4.4.3); otherwise end the iterative variation stage and enter the selection stage;
4.5) selection stage, the process is as follows:
4.5.1) select the two individuals with the highest energy from the population $P$ according to the Rosetta score3 energy function;
4.6) $g = g + 1$;
4.7) if $g \le G_{max}$, go to step 4.2); otherwise end the iterative hill-climbing search;
5) cluster the individuals in the population $P$ with the Rosetta clustering algorithm, and select the centroid conformation of the largest cluster as the final prediction result.
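The outer loop of steps 4.1) to 4.7), with the parameter values of this embodiment as defaults, might be organised as in the following sketch. It is illustrative only. The two stage functions are passed in as hypothetical callables, `energy` stands in for Rosetta score3, and the replacement rule in the selection stage is an assumption: each of the two highest-energy individuals is overwritten by one stage result only if that result has lower energy.

```python
import random
from dataclasses import dataclass


@dataclass
class Params:
    np_size: int = 200   # population size NP
    g_max: int = 1000    # number of iterations G_max
    hc: int = 20         # number of crossover iterations HC
    hm: int = 20         # number of variation iterations HM


def hill_climbing_search(population, energy, crossover_phase, variation_phase,
                         params, rng=random):
    """Iterate crossover, variation and selection for params.g_max generations.

    crossover_phase(p1, p2, hc) and variation_phase(individual, hm) are
    hypothetical callables that each return one individual.
    """
    pop = list(population)
    for _ in range(params.g_max):                        # steps 4.1, 4.6, 4.7
        i, j = rng.sample(range(len(pop)), 2)            # step 4.2
        c_best = crossover_phase(pop[i], pop[j], params.hc)   # step 4.3
        m_best = variation_phase(c_best, params.hm)           # step 4.4
        # Step 4.5.1: indices of the two highest-energy individuals.
        worst_two = sorted(range(len(pop)),
                           key=lambda k: energy(pop[k]), reverse=True)[:2]
        # ASSUMPTION: replace each of them only by a lower-energy stage result.
        for idx, cand in zip(worst_two, (c_best, m_best)):
            if energy(cand) < energy(pop[idx]):
                pop[idx] = cand
    return pop
```

Keeping the stage functions as parameters lets the loop be exercised with cheap stand-ins before an expensive energy function is plugged in.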
Taking protein 1HZ6, with an amino acid sequence length of 72, as an example, the above method yields a near-native conformation of the protein. The conformation update is shown in FIG. 1, and the minimum root mean square deviation is [...]. The predicted structure is shown in FIG. 2.
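For reference, the root mean square deviation between a predicted and a native conformation is conventionally computed over matched atom coordinates. The sketch below assumes that both structures are given as equal-length lists of (x, y, z) tuples, for example Cα atoms, and that they have already been superimposed; the optimal superposition step is omitted, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```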
The foregoing describes the prediction results of one embodiment of the invention. The invention is evidently not limited to this embodiment, and may be implemented with various modifications without departing from its basic concept or spirit.
Claims (1)
1. A population hill-climbing iterative protein conformation space optimization method, characterized in that the method comprises the following steps:
1) inputting sequence information of a target protein;
2) setting parameters: population size $NP$, number of iterations $G_{max}$, number of crossover iterations $HC$, and number of variation iterations $HM$;
3) population initialization: iterate the first, second, third and fourth stages of the Rosetta protocol to generate a population of $NP$ individuals, $P = \{P_1, P_2, \ldots, P_{NP}\}$;
4) iterative hill-climbing search, the process is as follows:
4.1) set $g = 1$, where $g \in \{1, 2, \ldots, G_{max}\}$;
4.2) randomly select two individuals $P_1$, $P_2$ from the population $P$, and take whichever of $P_1$, $P_2$ has the lower Rosetta score3 energy as the optimal individual of the crossover stage, $P^c_{best}$;
4.3) iterative crossover stage, the process is as follows:
4.3.1) set $hc = 1$, where $hc \in \{1, 2, \ldots, HC\}$;
4.3.2) generate a uniformly distributed random integer $rand1 \in [1, L]$, where $L$ is the sequence length of the target protein;
4.3.3) taking residue $rand1$ as the crossover point, exchange the structures of $P_1$ and $P_2$ before and after the crossover point to generate the crossover individuals $P^c_1$, $P^c_2$, and take whichever of $P^c_1$, $P^c_2$ has the lower Rosetta score3 energy as the trial individual $P^c_{trial}$;
4.3.4) determine according to the Metropolis criterion whether $P^c_{trial}$ replaces $P^c_{best}$, the process is as follows:
4.3.4.1) calculate the energies $E(P^c_{best})$ and $E(P^c_{trial})$ with the Rosetta score3 energy function, and let $\Delta E = E(P^c_{trial}) - E(P^c_{best})$;
4.3.4.2) calculate the replacement probability $p = \min\{1, e^{-\Delta E / KT}\}$, where $KT$ is a temperature parameter, set to 2 by default;
4.3.4.3) generate a uniformly distributed random number $rand2 \in [0, 1]$;
4.3.4.4) if $rand2 \le p$, replace $P^c_{best}$ with $P^c_{trial}$; otherwise keep $P^c_{best}$ unchanged;
4.3.5) $hc = hc + 1$;
4.3.6) if $hc \le HC$, go to step 4.3.2); otherwise end the iterative crossover stage and enter the iterative variation stage;
4.4) iterative variation stage, the process is as follows:
4.4.1) set the optimal individual and the target individual of the variation stage;
4.4.2) set $hm = 1$, where $hm \in \{1, 2, \ldots, HM\}$;
4.4.3) perform a mutation operation on each fragment window of the target individual and select the optimal variant individual, the process is as follows:
4.4.3.1) set the fragment window index $hw = 1$, where $hw \in \{1, 2, \ldots, L-2\}$ and $L$ is the sequence length of the target protein;
4.4.3.2) randomly select a fragment from the fragment library corresponding to window $hw$ and substitute it into that window to generate a variant individual;
4.4.3.4) $hw = hw + 1$;
4.4.3.5) if $hw \le L - 2$, go to step 4.4.3.2); otherwise go to step 4.4.4);
4.4.5) $hm = hm + 1$;
4.4.6) if $hm \le HM$, go to step 4.4.3); otherwise end the iterative variation stage and enter the selection stage;
4.5) selection stage, the process is as follows:
4.5.1) select the two individuals with the highest energy from the population $P$ according to the Rosetta score3 energy function;
4.6) $g = g + 1$;
4.7) if $g \le G_{max}$, go to step 4.2); otherwise end the iterative hill-climbing search;
5) cluster the individuals in the population $P$ with the Rosetta clustering algorithm, and select the centroid conformation of the largest cluster as the final prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810579338.5A CN109063413B (en) | 2018-06-07 | 2018-06-07 | Method for optimizing space of protein conformation by population hill climbing iteration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810579338.5A CN109063413B (en) | 2018-06-07 | 2018-06-07 | Method for optimizing space of protein conformation by population hill climbing iteration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109063413A (en) | 2018-12-21 |
CN109063413B (en) | 2021-04-06 |
Family
ID=64820510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810579338.5A Active CN109063413B (en) | 2018-06-07 | 2018-06-07 | Method for optimizing space of protein conformation by population hill climbing iteration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109063413B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077226A (en) * | 2012-12-31 | 2013-05-01 | 浙江工业大学 | Spatial search method for multi-modal protein conformations |
CN103455610A (en) * | 2013-09-01 | 2013-12-18 | 西安电子科技大学 | Network community detecting method based on multi-objective memetic computation |
CN103473482A (en) * | 2013-07-15 | 2013-12-25 | 浙江工业大学 | Protein three-dimensional structure prediction method based on differential evolution and conformation space annealing |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6490532B1 (en) * | 1999-01-25 | 2002-12-03 | Mount Sinai Hospital | Method to construct protein structures |
EP3123150A1 (en) * | 2014-03-25 | 2017-02-01 | Malvern Instruments Ltd | Raman spectroscopic structure investigation of proteins dispersed in a liquid phase |
Non-Patent Citations (1)
Title |
---|
Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction; Camelia Chira et al.; BioData Mining, 2011; 2011-07-30; pp. 1-17 * |
Also Published As
Publication number | Publication date |
---|---|
CN109063413A (en) | 2018-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107609342B (en) | Protein conformation search method based on secondary structure space distance constraint | |
Deng et al. | Protein structure prediction | |
Lin et al. | An efficient deep reinforcement learning model for urban traffic control | |
Zheng et al. | Detecting distant-homology protein structures by aligning deep neural-network based contact maps | |
CN108334746B (en) | Protein structure prediction method based on secondary structure similarity | |
CN107633159B (en) | Protein conformation space search method based on distance similarity | |
CN109360599B (en) | Protein structure prediction method based on residue contact information cross strategy | |
CN109086566B (en) | Group protein structure prediction method based on fragment resampling | |
CN108846256B (en) | Group protein structure prediction method based on residue contact information | |
CN108647486B (en) | Protein three-dimensional structure prediction method based on conformation diversity strategy | |
CN109360601B (en) | Multi-modal protein structure prediction method based on displacement strategy | |
CN109215733B (en) | Protein structure prediction method based on residue contact information auxiliary evaluation | |
CN109063413B (en) | Method for optimizing space of protein conformation by population hill climbing iteration | |
CN109378034B (en) | Protein prediction method based on distance distribution estimation | |
CN109360597B (en) | Group protein structure prediction method based on global and local strategy cooperation | |
CN109360598B (en) | Protein structure prediction method based on two-stage sampling | |
CN108595910B (en) | Group protein conformation space optimization method based on diversity index | |
CN109346128B (en) | Protein structure prediction method based on residue information dynamic selection strategy | |
CN109300505B (en) | Protein structure prediction method based on biased sampling | |
CN109448786B (en) | Method for predicting protein structure by lower bound estimation dynamic strategy | |
CN109326318B (en) | Group protein structure prediction method based on Loop region Gaussian disturbance | |
CN111951885B (en) | Protein structure prediction method based on local bias | |
CN109326319B (en) | Protein conformation space optimization method based on secondary structure knowledge | |
CN109243526B (en) | Protein structure prediction method based on specific fragment crossing | |
CN109147867B (en) | Group protein structure prediction method based on dynamic segment length |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||