CN109509510B - Protein structure prediction method based on multi-population ensemble variation strategy


Info

Publication number: CN109509510B
Application number: CN201810762915.4A
Other versions: CN109509510A (en)
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Active (an assumption, not a legal conclusion)
Inventors: 张贵军, 彭春祥, 周晓根, 刘俊, 王柳静, 胡俊
Current assignee: Zhejiang University of Technology ZJUT
Original assignee: Zhejiang University of Technology ZJUT
Prior art keywords: sub, conformation, population, distance, populations

Abstract

A protein structure prediction method based on a multi-population ensemble mutation strategy. Under the framework of an evolutionary algorithm, the population is first divided equally into four sub-populations, and a different mutation strategy is designed for each sub-population through cooperation among the conformations of the sub-populations. Second, conformations are selected according to the Rosetta score3 energy function, a distance error coefficient and a Monte Carlo probabilistic acceptance criterion, which together guide the conformation update process. This alleviates the inaccuracy of the energy function and steers sampling towards conformations with lower energy and a more reasonable structure, which improves sampling efficiency. The invention provides a protein structure prediction method based on a multi-population ensemble mutation strategy with high sampling efficiency and high prediction accuracy.

Description

Protein structure prediction method based on multi-population ensemble variation strategy
Technical Field
The invention relates to the fields of bioinformatics and computer application, in particular to a protein structure prediction method based on a multi-population ensemble variation strategy.
Background
The rapid development of computer hardware and software technologies provides a solid platform for the development of de novo prediction methods. Progress and breakthroughs in de novo protein structure prediction have in turn drawn in researchers from computer science and evolutionary computation, making it one of the most active multidisciplinary research topics in protein structure prediction in recent years. In a review article published in Science in 2012, Professor Dill, a member of the U.S. National Academy of Sciences, reviewed 50 years of progress in de novo prediction and pointed out that the search for answers to this problem has greatly promoted the development of supercomputers, new materials and drug discovery, and has helped people understand the basic processes of life. Nevertheless, de novo prediction methods still face many difficulties and challenges.
The de novo prediction method is based directly on a physics-based or knowledge-based protein energy model and uses an optimization algorithm to search the conformational space for the global minimum-energy conformation. The conformational space optimization method is currently one of the most critical factors limiting the accuracy of de novo protein structure prediction. Applying an optimization algorithm to de novo sampling requires addressing three problems first: (1) the complexity of the energy model; (2) the high dimensionality of the energy model; (3) the inaccuracy of the energy model. At present we are far from being able to construct a force field that is accurate enough to guide the target sequence to fold in the correct direction, so the mathematically optimal solution does not necessarily correspond to the native structure of the target protein; furthermore, model inaccuracy also makes it impossible to assess the performance of the optimization algorithm objectively.
The inherent complexity of optimizing over the protein conformational space makes it a very challenging research topic in de novo protein structure prediction. To find the unique native structure of a protein in a huge sampling space by computer, an efficient conformational space optimization algorithm must be designed so that the problem becomes computationally tractable.
The differential evolution (DE) algorithm has been applied successfully to protein structure prediction thanks to its simple structure, ease of implementation, robustness and fast convergence. However, as the amino acid sequence length increases, the number of degrees of freedom of the protein molecular system grows, and obtaining the global optimum of a large-scale protein conformational space by sampling with a traditional population-based algorithm becomes challenging. In addition, a coarse-grained model reduces the conformational search space but also loses information about the interaction forces, which directly affects the prediction accuracy.
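For readers unfamiliar with DE, the following is a minimal sketch of the textbook DE/rand/1/bin scheme on a real-valued toy function. It is not the method of this patent, which works on protein conformations and fragments; the function names, the scale factor `f_scale`, the default parameter values and the `sphere` test function are all illustrative assumptions.

```python
import random

def differential_evolution(f, bounds, np_=20, f_scale=0.5, cr=0.9,
                           generations=200, seed=0):
    """Textbook DE/rand/1/bin minimizer on a box-bounded real vector (toy example)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(np_):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([k for k in range(np_) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f_scale * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the box
                else:
                    v = pop[i][j]
                trial.append(v)
            t_cost = f(trial)
            if t_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, t_cost
    best = min(range(np_), key=cost.__getitem__)
    return pop[best], cost[best]

def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)
```

Calling `differential_evolution(sphere, [(-5, 5)] * 3)` returns the best vector found and its cost. The mutation, crossover and selection stages shown here are the ones that the multi-population method replaces with fragment-based operators.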
Therefore, conventional protein structure prediction methods fall short in sampling efficiency and prediction accuracy and need to be improved.
Disclosure of Invention
To overcome the low sampling efficiency, poor population diversity and low prediction accuracy of conventional protein structure prediction methods, the invention introduces a multi-population mutation strategy to guide conformational space optimization within the framework of the basic differential evolution algorithm, and provides a protein structure prediction method based on a multi-population ensemble mutation strategy with high sampling efficiency and high prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A protein structure prediction method based on a multi-population ensemble mutation strategy, the prediction method comprising the following steps:
1) provide the sequence information of the target protein;
2) obtain the fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, namely a 3-residue fragment library file and a 9-residue fragment library file;
3) obtain a distance spectrum file from the QUARK server (https://zhanglab.ccmb.med.umich.edu/QUARK/) according to the sequence information;
4) set the parameters: the population size NP, the maximum number of generations G, the crossover factor CR and the temperature factor β, and set the current generation g = 0;
5) population initialization: generate NP initial conformations $C_i$, $i \in \{1, 2, \dots, NP\}$, by random fragment assembly, and divide the NP individuals equally into four sub-populations whose members are indexed by j, k, m and n respectively, where $j \in \{1, 2, \dots, NP/4\}$, $k \in \{NP/4+1, \dots, NP/2\}$, $m \in \{NP/2+1, \dots, 3NP/4\}$ and $n \in \{3NP/4+1, \dots, NP\}$;
6) for each individual in the first sub-population (index j), perform the following operations:
6.1) set the j-th individual as the target conformation; randomly select one conformation from the first sub-population as the base conformation; randomly select two of the remaining three sub-populations and randomly take one individual from each, denoted $C_a$ and $C_b$; from $C_a$ and $C_b$ respectively, randomly select one 9-residue fragment, the two fragments being at different positions, and use them to replace the fragments at the corresponding positions of the base conformation, generating a mutated conformation; then perform one fragment assembly on the mutated conformation to generate the mutant conformation;
6.2) generate a uniformly distributed random number R between 0 and 1; if R > CR, randomly select one 9-residue fragment from the target conformation and substitute it at the corresponding position of the mutant conformation; otherwise keep the mutant conformation unchanged; the conformation resulting from this operation is recorded as the trial (test) conformation;
6.3) compute the energies of the trial conformation and of the target conformation with the Rosetta score3 energy function;
6.4) if the energy of the trial conformation is lower than that of the target conformation, the trial conformation replaces the target conformation, the acceptance counter count1 is incremented by 1, and the procedure goes to step 6.8); otherwise continue with step 6.5);
6.5) for each residue pair in the distance spectrum, compute the inter-residue distance in the trial conformation and in the target conformation, then compute the distance error coefficients $D_{trial}$ and $D_{target}$ of the trial and target conformations according to formulas (1) and (2), where T is the number of residue pairs in the distance spectrum; the distances entering the formulas are those between the Cα atoms of the t-th residue pair in the trial and target conformations respectively; $d_N$ is the mean of the distance spectrum within its N-th distance interval; and $PD_N$ is the number of distance spectrum entries within interval N. The distance range of the distance spectrum is (0, 9) and the interval width is 0.5, i.e. the distance intervals are (0, 0.5], (0.5, 1], …, (8.5, 9);
[Formula (1): image not reproduced]
[Formula (2): image not reproduced]
6.6) if $D_{trial} < D_{target}$, the trial conformation replaces the target conformation and the acceptance counter count1 is incremented by 1; otherwise perform step 6.7);
6.7) compute the difference between the distance error coefficients of the target and trial conformations, and accept the trial conformation according to the Monte Carlo criterion with an acceptance probability determined by this difference and the temperature factor β [difference and probability formulas: images not reproduced];
6.8) set j = j + 1 and repeat steps 6.1) to 6.8) until j = NP/4;
7) for each conformation in the second sub-population (index k), perform the following operations:
7.1) record the k-th conformation as the target conformation; select the lowest-energy conformation of the second sub-population as the base conformation; randomly select two of the other three sub-populations and randomly select one conformation from each, denoted $C_c$ and $C_d$; from $C_c$ and $C_d$ respectively, randomly select one 9-residue fragment, the two fragments being at different positions, and use them to replace the corresponding positions of the base conformation, generating a mutated conformation; then perform one fragment assembly on the mutated conformation to generate the mutant conformation;
7.2) process the target and mutant conformations according to steps 6.2) to 6.7), recording the number of accepted trial conformations as count2;
7.3) set k = k + 1 and repeat steps 7.1) to 7.2) until k = NP/2;
8) for each conformation in the third sub-population (index m), perform the following operations:
8.1) record the m-th conformation as the target conformation; sort the third sub-population by energy in ascending order and randomly select one individual from the lower-energy half as the base conformation; randomly select two of the other three sub-populations and randomly select one conformation from each, denoted $C_e$ and $C_f$; from $C_e$ and $C_f$ respectively, randomly select one 9-residue fragment, the two fragments being at different positions, and use them to replace the corresponding positions of the base conformation, generating a mutated conformation; then perform one fragment assembly on the mutated conformation to generate the mutant conformation;
8.2) process the target and mutant conformations according to steps 6.2) to 6.7), recording the number of accepted trial conformations as count3;
8.3) set m = m + 1 and repeat steps 8.1) to 8.2) until m = 3NP/4;
9) perform Rosetta fragment assembly on every conformation in the fourth sub-population (index n);
10) iterate steps 6) to 9); every 20 generations, compare count1, count2 and count3, select the sub-population mutation strategy corresponding to the largest of the three to mutate the fourth sub-population, process the result according to steps 6.2) to 6.8), and then reset count1, count2 and count3 to zero;
11) set g = g + 1 and repeat steps 6) to 10) until g > G;
12) output the result.
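The three-stage selection of steps 6.3) to 6.7) can be sketched as below. This is a hedged illustration, not the patented implementation: the energies and distance error coefficients are passed in as plain numbers, and because the acceptance probability of step 6.7) is not recoverable from this text, the Boltzmann-style form `exp(-delta_d / beta)` is an assumption. The function name `select` and the stage labels are hypothetical.

```python
import math
import random

def select(e_trial, e_target, d_trial, d_target, beta, rng=random):
    """Return (accepted, stage) for one trial/target comparison.

    stage is "energy", "distance", "monte_carlo" or "rejected".
    """
    if e_trial < e_target:            # step 6.4): lower score3 energy wins
        return True, "energy"
    if d_trial < d_target:            # step 6.6): smaller distance error wins
        return True, "distance"
    delta_d = d_trial - d_target      # step 6.7): non-negative at this point
    # ASSUMPTION: acceptance probability exp(-delta_d / beta); the patent's
    # own formula is an image that is not reproduced in this text.
    if rng.random() < math.exp(-delta_d / beta):
        return True, "monte_carlo"
    return False, "rejected"
```

Returning the stage lets the caller increment count1, count2 or count3 only for the stages where the text says the counter is incremented, namely the energy and distance acceptances.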
The technical conception of the invention is as follows: under the framework of an evolutionary algorithm, the population is first divided equally into four sub-populations, and a different mutation strategy is designed for each sub-population through cooperation among the conformations of the sub-populations. Second, conformations are selected according to the Rosetta score3 energy function, a distance error coefficient and a Monte Carlo probabilistic acceptance criterion, which together guide the conformation update process. This alleviates the inaccuracy of the energy function and steers sampling towards conformations with lower energy and a more reasonable structure, which improves sampling efficiency. The invention provides a protein structure prediction method based on a multi-population ensemble mutation strategy with high sampling efficiency and high prediction accuracy.
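The equal split into four sub-populations and the three ways of choosing a base conformation (steps 5), 6.1), 7.1) and 8.1)) can be sketched as follows. The sketch uses zero-based indices instead of the 1-based indices of the text, represents individuals only by their index and an energy value, and all function names are hypothetical.

```python
import random

def split_indices(np_):
    """Split indices 0..np_-1 into four equal, contiguous sub-populations."""
    if np_ % 4 != 0:
        raise ValueError("NP must be divisible by 4 for an equal split")
    q = np_ // 4
    return [list(range(s * q, (s + 1) * q)) for s in range(4)]

def pick_base(strategy, members, energy, rng=random):
    """Pick the base individual inside one sub-population.

    strategy 1: random member (step 6.1);
    strategy 2: lowest-energy member (step 7.1);
    strategy 3: random member of the lower-energy half (step 8.1).
    """
    if strategy == 1:
        return rng.choice(members)
    ranked = sorted(members, key=lambda i: energy[i])  # ascending energy
    if strategy == 2:
        return ranked[0]
    if strategy == 3:
        return rng.choice(ranked[: max(1, len(ranked) // 2)])
    raise ValueError("unknown strategy")
```

With NP = 100 this yields four index lists of 25 members each. How the "first half" is rounded for an odd sub-population size is not specified in the text; the floor division used here is an assumption.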
The beneficial effects of the invention are as follows: guiding mutation through cooperation among multiple populations improves sampling efficiency while maintaining population diversity; using the distance spectrum to assist conformation selection retains conformations with a reasonable structure even when their energy is high, which mitigates the prediction errors caused by the inaccuracy of the energy function and improves prediction accuracy.
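As an illustration of the distance spectrum bookkeeping described in step 6.5), the sketch below assigns distances to the intervals (0, 0.5], (0.5, 1], …, (8.5, 9) and reports, for each interval N, a mean (the role of $d_N$) and a count (the role of $PD_N$). It does not implement formulas (1) and (2), which are not recoverable from this text; the function names and the flat list input are assumptions.

```python
import math
from collections import defaultdict

WIDTH = 0.5   # interval width stated in step 6.5)
UPPER = 9.0   # distance range (0, 9) stated in step 6.5)

def interval_index(d, width=WIDTH, upper=UPPER):
    """1-based index N of the interval containing d, or None outside (0, upper)."""
    if not (0.0 < d < upper):
        return None
    return math.ceil(d / width)   # (0, 0.5] -> 1, (0.5, 1] -> 2, ...

def spectrum_stats(distances):
    """Map N -> (mean distance, count) for the distances falling in interval N."""
    bins = defaultdict(list)
    for d in distances:
        n = interval_index(d)
        if n is not None:
            bins[n].append(d)
    return {n: (sum(v) / len(v), len(v)) for n, v in sorted(bins.items())}
```

Using `math.ceil` makes each interval closed on the right, which matches the half-open intervals written in the text.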
Drawings
FIG. 1 is a conformational profile of protein 2EZK sampled by a protein structure prediction method based on a multi-population ensemble mutation strategy.
FIG. 2 is a schematic diagram of the conformational update when protein 2EZK is sampled by a protein structure prediction method based on a multi-population ensemble mutation strategy.
FIG. 3 is the three-dimensional structure predicted for protein 2EZK by a protein structure prediction method based on a multi-population ensemble mutation strategy.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, a protein structure prediction method based on a multi-population ensemble mutation strategy comprises steps 1) to 12) exactly as set out in the Disclosure of Invention above.
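Step 10), the periodic choice of a mutation strategy for the fourth sub-population, can be sketched as a small helper. The name `choose_strategy`, the dictionary representation of the counters and the tie-breaking rule (lowest strategy number wins) are assumptions; the text only states that the strategy with the largest counter is chosen every 20 generations and that the counters are then reset.

```python
def choose_strategy(counts, generation, period=20):
    """Every `period` generations, pick the strategy with the most acceptances.

    counts: dict {1: count1, 2: count2, 3: count3}; reset in place to zero
    whenever a decision is made. Returns the winning strategy number, or
    None for generations in which no decision is due.
    """
    if generation == 0 or generation % period != 0:
        return None
    # ASSUMPTION: ties are broken in favour of the lowest strategy number.
    best = max(sorted(counts), key=lambda s: counts[s])
    for s in counts:
        counts[s] = 0
    return best
```

For example, with `counts = {1: 3, 2: 7, 3: 5}` the call at generation 20 selects strategy 2, so the fourth sub-population would be mutated with the lowest-energy base of step 7.1).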
Taking the α protein 2EZK, with a sequence length of 99, as an example, the protein structure prediction method based on a multi-population ensemble mutation strategy is carried out through steps 1) to 12) as set out in the Disclosure of Invention above, with the parameters of step 4) set as follows: population size NP = 100, maximum number of generations G = 1000, crossover factor CR = 0.3, temperature factor β = 2, and current generation g = 0.
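The fragment-based mutation and crossover of steps 6.1) and 6.2) can be sketched on a toy representation in which a conformation is simply a list with one entry per residue. This is an assumption-laden illustration: the real method operates on Rosetta conformations and follows the mutation with a fragment assembly step, which is omitted here; the function names are hypothetical, "different positions" is interpreted as different start positions, and the crossover direction (target fragment copied into the mutant) is inferred from context.

```python
import random

FRAG_LEN = 9  # 9-residue fragments, as in steps 6.1) and 6.2)

def copy_fragment(donor, recipient, start, length=FRAG_LEN):
    """Return a copy of recipient with donor[start:start+length] written in."""
    out = list(recipient)
    out[start:start + length] = donor[start:start + length]
    return out

def mutate(base, c_a, c_b, rng=random):
    """Step 6.1) sketch: copy one fragment from c_a and one from c_b into base."""
    n = len(base)
    # Two different start positions; the fragments may still overlap.
    pos_a, pos_b = rng.sample(range(n - FRAG_LEN + 1), 2)
    return copy_fragment(c_b, copy_fragment(c_a, base, pos_a), pos_b)

def crossover(target, mutant, cr, rng=random):
    """Step 6.2) sketch: if R > CR, copy one random target fragment into mutant."""
    if rng.random() > cr:
        start = rng.randrange(len(mutant) - FRAG_LEN + 1)
        return copy_fragment(target, mutant, start)
    return list(mutant)
```

With the embodiment's CR = 0.3, the condition R > CR holds for about 70% of the draws, so most trial conformations would carry one fragment of the target.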
Taking the α protein 2EZK, with a sequence length of 99, as an example, the near-native conformation of the protein was obtained with the above method. After running 1000 generations, the mean root-mean-square deviation (RMSD) between the obtained structures and the native structure is the value given in Figure GDA0002947061680000102, and the minimum RMSD is the value given in Figure GDA0002947061680000103. The predicted three-dimensional structure is shown in fig. 3.
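For reference, the root-mean-square deviation used to report these results can be computed as in the sketch below. It assumes that the two structures have the same number of atoms (for instance one Cα atom per residue) and have already been superimposed; the superposition step itself is not shown, and the function names are chosen for this example only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumption: the structures are already optimally superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def summarize(rmsd_values):
    """Return (mean, minimum) of a collection of RMSD values."""
    values = list(rmsd_values)
    return sum(values) / len(values), min(values)
```

The mean and the minimum reported in the example correspond to `summarize` applied to the RMSD values of the conformations in the final population.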
The foregoing describes one embodiment of the invention. The invention is evidently not limited to this embodiment and may be practiced with various modifications that do not depart from its essential spirit.

Claims (1)

1. A protein structure prediction method based on a multi-population ensemble mutation strategy, characterized in that the method comprises the following steps:
1) giving the sequence information of the target protein;
2) obtaining the fragment library files from the ROBETTA server according to the target protein sequence, the fragment library files comprising a 3-residue fragment library file and a 9-residue fragment library file;
3) obtaining a distance spectrum file from the QUARK server according to the sequence information;
4) setting parameters: the population size NP, the maximum number of generations G of the algorithm, the crossover factor CR and the temperature factor β, and setting the generation counter g = 0;
5) population initialization: generating NP initial conformations {C_i | i = 1, 2, …, NP} by random fragment assembly, and dividing the NP individuals equally into four sub-populations, indexed respectively by j ∈ {1, 2, …, NP/4}, k ∈ {NP/4+1, …, NP/2}, m ∈ {NP/2+1, …, 3NP/4} and n ∈ {3NP/4+1, …, NP};
6) performing the following operations on each individual in the first sub-population:
6.1) setting the current individual as the target individual; randomly selecting one conformation from the first sub-population; randomly selecting two of the remaining three sub-populations and randomly taking one individual from each, C_a and C_b; randomly selecting one 9-residue fragment from each of C_a and C_b, at different positions, and substituting them for the fragments at the corresponding positions of the selected conformation to generate a mutated conformation; performing one fragment assembly on the mutated conformation to generate a new conformation;
6.2) randomly generating a uniformly distributed random number R between 0 and 1; if R > CR, randomly selecting one 9-residue fragment from the conformation denoted in Figure FDA00029470616700000112 and substituting it into the corresponding position of the conformation denoted in Figure FDA00029470616700000113; otherwise keeping the conformation denoted in Figure FDA00029470616700000114 unchanged; the conformation resulting from this operation is recorded as the test conformation;
6.3) computing, with the Rosetta score3 energy function, the energies of the target conformation and of the test conformation;
6.4) if the energy condition given in Figure FDA00029470616700000120 is satisfied, the test conformation replaces the target conformation, the acceptance count count1 is increased by 1, and the method goes to step 6.8); otherwise, continuing with step 6.5);
6.5) according to the residue pairs in the distance spectrum, computing the corresponding inter-residue distances in the test conformation and in the target conformation, and then computing the distance error coefficients D_trial and D_target of the two conformations according to formulas (1) and (2), where T denotes the number of residue pairs in the distance spectrum; the per-pair distance terms denote the distance between the Cα atoms of the t-th residue pair in the respective conformation; d_N denotes the mean distance of the distance spectrum in the N-th distance interval; PD_N denotes the number of distance-spectrum entries within interval N; the distance range of the distance spectrum is (0, 9) and the interval width is 0.5, i.e. the distance intervals are (0, 0.5], (0.5, 1], …, (8.5, 9);
Formula (1): Figure FDA0002947061670000027
Formula (2): Figure FDA0002947061670000028
6.6) if D_trial < D_target, the test conformation replaces the target conformation and the acceptance count count1 is increased by 1; otherwise, performing step 6.7);
6.7) computing the difference between the distance error coefficients of the target conformation and the test conformation (Figure FDA00029470616700000212), and accepting the test conformation by the Monte Carlo criterion with the probability given in Figure FDA00029470616700000213, where β is the temperature factor;
6.8) j = j + 1; iteratively executing steps 6.1) to 6.8) until j = NP/4;
7) performing the following operations on each conformation in the second sub-population:
7.1) recording the current conformation as the target individual; selecting the lowest-energy conformation from the second sub-population; randomly selecting two of the other three sub-populations and randomly selecting one conformation from each, C_c and C_d; randomly selecting one 9-residue fragment from each of C_c and C_d, at different positions, and substituting them into the corresponding positions of the lowest-energy conformation to generate a mutated conformation; performing one fragment assembly on the mutated conformation to generate a new conformation;
7.2) applying the operations of steps 6.2) to 6.7) to the target individual and the conformation generated in step 7.1), wherein the number of times the test conformation is accepted is recorded as count2;
7.3) k = k + 1; iteratively executing steps 7.1) to 7.2) until k = NP/2;
8) performing the following operations on each conformation in the third sub-population:
8.1) recording the current conformation as the target individual; sorting the third sub-population by energy in ascending order and randomly selecting one individual from the lower-energy half of the conformations; then randomly selecting two of the other three sub-populations and randomly selecting one conformation from each, C_e and C_f; randomly selecting one 9-residue fragment from each of C_e and C_f, at different positions, and substituting them into the corresponding positions of the selected individual to generate a mutated conformation; performing one fragment assembly on the mutated conformation to generate a new conformation;
8.2) applying the operations of steps 6.2) to 6.7) to the target individual and the conformation generated in step 8.1), wherein the number of times the test conformation is accepted is recorded as count3;
8.3) m = m + 1; iteratively executing steps 8.1) to 8.2) until m = 3NP/4;
9) performing Rosetta fragment assembly on all conformations in the fourth sub-population;
10) iterating steps 6) to 9); every 20 generations, comparing count1, count2 and count3, selecting the mutation strategy of the sub-population with the largest count to mutate the fourth sub-population, operating according to steps 6.2) to 6.8), and then resetting count1, count2 and count3 to zero;
11) g = g + 1; iteratively executing steps 6) to 10) until g > G;
12) outputting the result.
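The three-stage selection of steps 6.3) to 6.7) and the distance intervals of step 6.5) can be illustrated with the sketch below. It rests on stated assumptions, because the corresponding formulas are only available as images: the energy test is taken to be "test energy lower than target energy", and the Monte Carlo acceptance probability is taken to be the common Metropolis form exp(-ΔD/β), with ΔD = D_trial - D_target. Both may differ from the formulas of the claim. The function names and the injectable random generator are likewise hypothetical.

```python
import math
import random


def distance_bin(d, width=0.5, upper=9.0):
    """0-based index of the interval (0, 0.5], (0.5, 1], ..., (8.5, 9).

    Returns None when d lies outside the open range (0, upper).
    """
    if d <= 0.0 or d >= upper:
        return None
    return math.ceil(d / width) - 1


def accept_trial(e_trial, e_target, d_trial, d_target, beta, rng=random):
    """Return the stage that accepts the test conformation, or None.

    e_*: Rosetta score3 energies (plain numbers here)
    d_*: distance error coefficients D_trial and D_target
    """
    # Step 6.4 (assumed condition): a lower energy is accepted directly.
    if e_trial < e_target:
        return "energy"
    # Step 6.6: a smaller distance error coefficient is accepted.
    if d_trial < d_target:
        return "distance"
    # Step 6.7 (assumed Metropolis form): accept with probability exp(-dD / beta).
    delta = d_trial - d_target  # non-negative at this point
    if rng.random() < math.exp(-delta / beta):
        return "monte_carlo"
    return None
```

Returning the accepting stage rather than a boolean leaves it to the caller to decide which acceptances increment count1, count2 or count3.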
CN201810762915.4A 2018-07-12 2018-07-12 Protein structure prediction method based on multi-population ensemble variation strategy Active CN109509510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810762915.4A CN109509510B (en) 2018-07-12 2018-07-12 Protein structure prediction method based on multi-population ensemble variation strategy

Publications (2)

Publication Number Publication Date
CN109509510A CN109509510A (en) 2019-03-22
CN109509510B true CN109509510B (en) 2021-06-18

Family

ID=65745470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810762915.4A Active CN109509510B (en) 2018-07-12 2018-07-12 Protein structure prediction method based on multi-population ensemble variation strategy

Country Status (1)

Country Link
CN (1) CN109509510B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190322

Assignee: ZHEJIANG ORIENT GENE BIOTECH CO.,LTD.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2023980053610

Denomination of invention: A protein structure prediction method based on multiple ensemble mutation strategies

Granted publication date: 20210618

License type: Common License

Record date: 20231222
