CN109509510B - Protein structure prediction method based on multi-population ensemble variation strategy - Google Patents
- Publication number
- CN109509510B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
A protein structure prediction method based on a multi-population ensemble variation strategy. Within the framework of an evolutionary algorithm, the population is divided equally into four sub-populations, and a different mutation strategy is designed for each sub-population through cooperation among the conformations in each sub-population. Conformations are then selected according to the Rosetta energy function score3, a distance error coefficient and a Monte Carlo probabilistic acceptance criterion, which together guide the conformation update process. This alleviates the inaccuracy of the energy function, guides the algorithm to sample conformations with lower energy and a more reasonable structure, and improves sampling efficiency. The invention thus provides a protein structure prediction method based on a multi-population ensemble mutation strategy with high sampling efficiency and prediction accuracy.
Description
Technical Field
The invention relates to the fields of bioinformatics and computer applications, and in particular to a protein structure prediction method based on a multi-population ensemble variation strategy.
Background
The rapid development of computer hardware and software provides a robust platform for the development of de novo prediction methods. Progress and breakthroughs in de novo protein structure prediction have in turn drawn wide participation from researchers in computer science and evolutionary computation. The topic has become one of the most active multidisciplinary research subjects in protein structure prediction in recent years. In a review published in Science in 2012, Professor Dill, a member of the US National Academy of Sciences, reviewed 50 years of progress in the field. He pointed out that the search for an answer to this problem has greatly promoted the development of supercomputers, new materials and drug discovery, and has helped people understand basic processes of life. De novo prediction methods nevertheless still face a number of difficulties and challenges.
De novo prediction works directly from a physics-based or knowledge-based energy model of the protein. It uses an optimization algorithm to search the conformational space for the global minimum-energy conformation. The conformational space optimization method is currently one of the most critical factors limiting the accuracy of de novo protein structure prediction. Applying an optimization algorithm to de novo prediction sampling first requires addressing three problems: (1) the complexity of the energy function; (2) the high dimensionality of the energy model; (3) the inaccuracy of the energy model. We are still far from constructing a force field accurate enough to guide the target sequence to fold in the correct direction, so the mathematical optimum does not necessarily correspond to the native structure of the target protein. Moreover, model inaccuracy also makes it impossible to evaluate the performance of the optimization algorithm objectively.
The inherent complexity of protein conformational space optimization makes it a very challenging research topic in de novo protein structure prediction. To find the unique native protein structure in a huge sampling space by computer, an efficient conformational space optimization algorithm must be designed that turns the search into a tractable computational problem.
The differential evolution (DE) algorithm has been applied successfully to protein structure prediction owing to its simple structure, ease of implementation, strong robustness and fast convergence. However, as the amino acid sequence grows longer, the degrees of freedom of the protein molecular system increase. Sampling with a traditional population algorithm to reach the global optimum of such a large conformational space then becomes challenging. In addition, a coarse-grained model reduces the conformational search space but also loses information about interactions, which directly affects prediction accuracy.
Therefore, conventional protein structure prediction methods are deficient in sampling efficiency and prediction accuracy and need to be improved.
Disclosure of Invention
Conventional protein structure prediction methods suffer from low sampling efficiency, poor population diversity and low prediction accuracy. To overcome these defects, the invention introduces a multi-population mutation strategy that guides conformational space optimization within the framework of the basic differential evolution algorithm. The result is a protein structure prediction method based on a multi-population ensemble mutation strategy with high sampling efficiency and high prediction accuracy.
The technical solution adopted by the invention to solve the above technical problems is as follows:
A protein structure prediction method based on a multi-population ensemble mutation strategy comprises the following steps:
1) Input the sequence information of the target protein;
2) According to the target protein sequence, obtain fragment library files from the ROBETTA server (http://www.robetta.org/); the fragment library files comprise a 3-residue fragment library file and a 9-residue fragment library file;
3) According to the sequence information, obtain a distance spectrum file from the QUARK server (https://zhanglab.ccmb.med.umich.edu/QUARK/);
4) Set the parameters: population size NP, maximum number of generations G, crossover factor CR and temperature factor β; set the generation counter g = 0;
5) Population initialization: generate NP initial conformations C_i, i = 1, 2, …, NP, by random fragment assembly, and divide the NP individuals equally into four sub-populations indexed by j ∈ {1, 2, …, NP/4}, k ∈ {NP/4+1, …, NP/2}, m ∈ {NP/2+1, …, 3NP/4} and n ∈ {3NP/4+1, …, NP}, respectively;
6.1) Take the j-th conformation of the first sub-population as the target individual and randomly select a conformation in the first sub-population as the base individual. Randomly select two of the remaining three sub-populations and randomly take one individual from each, C_a and C_b. Randomly select one 9-residue fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of the base individual, which generates a mutant conformation. Perform one fragment assembly on the mutant conformation to generate a new mutant conformation;
6.2) Randomly generate a uniformly distributed decimal R between 0 and 1. If R > CR, copy a randomly selected 9-residue fragment of the target conformation into the corresponding position of the new mutant conformation; otherwise, keep the new mutant conformation unchanged. The conformation resulting from this operation is recorded as the trial conformation;
6.4) If the score3 energy of the trial conformation is lower than that of the target conformation, the trial conformation replaces the target conformation, the acceptance counter count1 is increased by 1, and the procedure goes to step 6.8); otherwise, it continues with step 6.5);
6.5) Using the residue pairs in the distance spectrum, compute the inter-residue distances of the trial conformation and of the target conformation. Then compute their distance error coefficients D_trial and D_target according to formulas (1) and (2). Here T is the number of residue pairs in the distance spectrum; d_t^trial and d_t^target are the distances between the Cα atoms of the t-th residue pair in the trial and target conformations, respectively; d_N is the mean of the distance spectrum in its N-th distance interval; and PD_N is the number of distance spectrum entries in interval N. The distance range of the distance spectrum is (0, 9) and the interval width is 0.5, i.e., the intervals are (0, 0.5], (0.5, 1], …, (8.5, 9);
6.6) If D_trial < D_target, the trial conformation replaces the target conformation and count1 is increased by 1; otherwise, go to step 6.7);
6.7) Calculate the difference between the distance error coefficients of the target and trial conformations, and accept the trial conformation by the Monte Carlo criterion with a probability determined by this difference and the temperature factor β;
6.8) Set j = j + 1 and repeat steps 6.1) to 6.8) until j = NP/4;
7.1) Take the k-th conformation of the second sub-population as the target individual and select the lowest-energy conformation in the second sub-population as the base individual. Randomly select two of the other three sub-populations and randomly take one conformation from each, C_c and C_d. Randomly select one 9-residue fragment from each of C_c and C_d, at different positions, and use them to replace the fragments at the corresponding positions of the base individual, which generates a mutant conformation. Perform one fragment assembly on it to generate a new mutant conformation;
7.2) Apply the operations of steps 6.2) to 6.7) to the target conformation and the new mutant conformation, recording the number of times the trial conformation is accepted as count2;
7.3) Set k = k + 1 and repeat steps 7.1) and 7.2) until k = NP/2;
8) For each conformation in the third sub-population, perform the following operations:
8.1) Take the m-th conformation of the third sub-population as the target individual. Sort the third sub-population by energy in ascending order and randomly select one individual from the lower-energy half as the base individual. Randomly select two of the other three sub-populations and randomly take one conformation from each, C_e and C_f. Randomly select one 9-residue fragment from each of C_e and C_f, at different positions, and use them to replace the fragments at the corresponding positions of the base individual, which generates a mutant conformation. Perform one fragment assembly on it to generate a new mutant conformation;
8.2) Apply the operations of steps 6.2) to 6.7) to the target conformation and the new mutant conformation, recording the number of times the trial conformation is accepted as count3;
8.3) Set m = m + 1 and repeat steps 8.1) and 8.2) until m = 3NP/4;
10) Repeat steps 6) to 9). Every 20 generations, compare count1, count2 and count3, and mutate the fourth sub-population using the sub-population mutation strategy that corresponds to the largest of the three counts. Process it according to steps 6.2) to 6.8), and then reset count1, count2 and count3 to zero;
11) Set g = g + 1 and repeat steps 6) to 10) until g > G;
12) Output the result.
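Several steps above can be sketched in Python: the partition in step 5), the fragment-based mutation of steps 6.1), 7.1) and 8.1), and the crossover of step 6.2). This is a minimal illustration under stated assumptions, not the patented implementation. A conformation is modelled as a plain list with one entry per residue, standing in for the backbone torsion angles that fragment replacement actually edits. Energies are supplied by the caller in place of Rosetta score3. The fragment-assembly step that follows each mutation is omitted because it needs the ROBETTA fragment libraries. All function names are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical representation: one value per residue (stand-in for torsion angles).
Conformation = List[float]
FRAG_LEN = 9  # the method splices 9-residue fragments


def split_into_subpopulations(population: Sequence[Conformation]) -> List[List[Conformation]]:
    """Step 5): divide NP individuals equally into four sub-populations."""
    if len(population) % 4 != 0:
        raise ValueError("NP must be divisible by 4 in this sketch")
    q = len(population) // 4
    return [list(population[i * q:(i + 1) * q]) for i in range(4)]


def pick_base(subpop: Sequence[Conformation], energies: Sequence[float],
              strategy: int, rng: random.Random) -> Conformation:
    """Base individual for strategy 1 (step 6.1), 2 (step 7.1) or 3 (step 8.1)."""
    if strategy == 1:  # any member of the sub-population, chosen at random
        idx = rng.randrange(len(subpop))
    elif strategy == 2:  # the lowest-energy member
        idx = min(range(len(subpop)), key=lambda i: energies[i])
    else:  # a random member of the lower-energy half after sorting by energy
        order = sorted(range(len(subpop)), key=lambda i: energies[i])
        idx = rng.choice(order[:max(1, len(order) // 2)])
    return subpop[idx]


def mutate(base: Conformation, subpops: Sequence[Sequence[Conformation]],
           own: int, rng: random.Random) -> Conformation:
    """Copy one 9-residue fragment from each of two donors taken from two other sub-populations."""
    others = [i for i in range(len(subpops)) if i != own]
    first, second = rng.sample(others, 2)
    donor_a, donor_b = rng.choice(subpops[first]), rng.choice(subpops[second])
    # Two distinct fragment start positions (the fragments may still overlap).
    pos_a, pos_b = rng.sample(range(len(base) - FRAG_LEN + 1), 2)
    mutant = list(base)
    mutant[pos_a:pos_a + FRAG_LEN] = donor_a[pos_a:pos_a + FRAG_LEN]
    mutant[pos_b:pos_b + FRAG_LEN] = donor_b[pos_b:pos_b + FRAG_LEN]
    return mutant


def crossover(target: Conformation, mutant: Conformation, cr: float,
              rng: random.Random) -> Conformation:
    """Step 6.2): if R > CR, copy one random 9-residue fragment of the target into the mutant."""
    trial = list(mutant)
    if rng.random() > cr:
        pos = rng.randrange(len(target) - FRAG_LEN + 1)
        trial[pos:pos + FRAG_LEN] = target[pos:pos + FRAG_LEN]
    return trial
```

The base selection is kept separate from the donor step. This mirrors how steps 6.1), 7.1) and 8.1) differ only in how the base individual is chosen, while the two-donor fragment splice is shared.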
The technical concept of the invention is as follows. Within an evolutionary algorithm framework, the population is first divided equally into four sub-populations, and a different mutation strategy is designed for each sub-population through cooperation among the conformations. Second, conformations are selected according to the Rosetta energy function score3, the distance error coefficient and the Monte Carlo probabilistic acceptance criterion, which together guide the conformation update process. This alleviates the inaccuracy of the energy function, guides the algorithm toward conformations with lower energy and a more reasonable structure, and improves sampling efficiency. The invention thus provides a protein structure prediction method based on a multi-population ensemble mutation strategy with high sampling efficiency and prediction accuracy.
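The three-stage selection of steps 6.4) to 6.7) checks score3 energy first, then the distance error coefficient, then Monte Carlo acceptance. It can be sketched as follows, assuming the caller has already computed both energies and both distance error coefficients. The acceptance probability is an assumption: the text does not reproduce the formula, so the sketch uses the common Metropolis form exp(-ΔD/β) with ΔD = D_trial - D_target. The strict "<" in the energy test is also assumed. The function name `select` is hypothetical.

```python
import math
import random
from typing import Tuple, TypeVar

T = TypeVar("T")


def select(target: T, trial: T, e_target: float, e_trial: float,
           d_target: float, d_trial: float, beta: float,
           rng: random.Random) -> Tuple[T, bool]:
    """Return (survivor, counted) for one target/trial pair.

    counted is True when the trial wins in step 6.4) or 6.6), i.e. when the
    sub-population's acceptance counter should be incremented.
    """
    # Step 6.4): the trial wins outright if its score3 energy is lower.
    if e_trial < e_target:
        return trial, True
    # Step 6.6): otherwise it wins if its distance error coefficient is smaller.
    if d_trial < d_target:
        return trial, True
    # Step 6.7): Monte Carlo acceptance on the distance error difference.
    # ASSUMPTION: Metropolis form exp(-delta / beta); delta >= 0 at this point.
    delta = d_trial - d_target
    if rng.random() < math.exp(-delta / beta):
        return trial, False  # accepted, but not counted in this sketch
    return target, False
```

Only acceptances in the first two stages are counted toward count1 (or count2, count3), matching steps 6.4) and 6.6). The text does not say whether a Monte Carlo acceptance also counts.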
The beneficial effects of the invention are as follows. Guiding mutation through cooperation among multiple populations improves sampling efficiency while maintaining population diversity. Using the distance spectrum to assist conformation selection retains conformations with a reasonable structure even when their energy is high. This mitigates prediction errors caused by the inaccuracy of the energy function and improves prediction accuracy.
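Step 6.5) bins the distance spectrum into 0.5-wide intervals over (0, 9). For each interval N, it uses the mean distance d_N and the count PD_N. Formulas (1) and (2) are not reproduced in the text, so the sketch below covers only this bookkeeping and the Cα-Cα distance. It assumes that the spectrum is a list of candidate distances and that the last interval is closed at 9. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BIN_WIDTH = 0.5  # interval width used by the distance spectrum
MAX_DIST = 9.0   # upper end of the distance range


def interval_index(d: float) -> int:
    """1-based index N of the interval ((N-1)*0.5, N*0.5] that contains d."""
    if not 0.0 < d <= MAX_DIST:
        raise ValueError("distance outside (0, 9]")
    return math.ceil(d / BIN_WIDTH)


def interval_stats(spectrum: Sequence[float]) -> Dict[int, Tuple[float, int]]:
    """Map N -> (d_N, PD_N): mean and count of spectrum distances in interval N."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for d in spectrum:
        if 0.0 < d <= MAX_DIST:  # entries outside the range are ignored
            groups[interval_index(d)].append(d)
    return {n: (sum(v) / len(v), len(v)) for n, v in sorted(groups.items())}


def ca_distance(ca_1: Sequence[float], ca_2: Sequence[float]) -> float:
    """Euclidean distance between two C-alpha coordinates (x, y, z)."""
    return math.dist(ca_1, ca_2)
```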
Drawings
FIG. 1 shows the conformation distribution of protein 2EZK sampled by the protein structure prediction method based on a multi-population ensemble mutation strategy.
FIG. 2 is a schematic diagram of conformation updating when protein 2EZK is sampled by the protein structure prediction method based on a multi-population ensemble mutation strategy.
FIG. 3 shows the three-dimensional structure of protein 2EZK predicted by the protein structure prediction method based on a multi-population ensemble mutation strategy.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, the protein structure prediction method based on a multi-population ensemble mutation strategy comprises steps 1) to 12) as set out above under Disclosure of Invention.
Taking the alpha protein 2EZK, with a sequence length of 99, as an example, the protein structure prediction method based on a multi-population ensemble variation strategy is carried out according to steps 1) to 12) above. The parameters in step 4) are set as follows: population size NP = 100, maximum number of generations G = 1000, crossover factor CR = 0.3, temperature factor β = 2, and generation counter g = 0.
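The generation loop of steps 6) to 11) can be sketched as below, with the every-20-generation strategy choice of step 10) and the 2EZK parameter values. The work on each sub-population is delegated to a hypothetical callback `evolve(strategy, subpop)`, which returns how many trials that sub-population accepted. Two points are assumptions because the text leaves them open: the fourth sub-population is mutated once per 20-generation window, and ties go to the lower-numbered strategy. Generations are counted from 1 to G.

```python
from typing import Callable, Dict, List

# Parameter values from the 2EZK example; NP, CR and beta would be consumed
# inside evolve() in a full implementation.
PARAMS = {"NP": 100, "G": 1000, "CR": 0.3, "beta": 2.0}
WINDOW = 20  # generations between strategy comparisons (step 10)


def choose_strategy(counts: Dict[int, int]) -> int:
    """Strategy (1, 2 or 3) whose sub-population accepted the most trials; ties go to the lowest number."""
    return max(sorted(counts), key=lambda s: counts[s])


def run(evolve: Callable[[int, int], int], generations: int = PARAMS["G"],
        window: int = WINDOW) -> List[int]:
    """Run the generation loop and return the strategy chosen for sub-population 4 in each window."""
    counts = {1: 0, 2: 0, 3: 0}
    chosen: List[int] = []
    for g in range(1, generations + 1):
        for s in (1, 2, 3):  # steps 6)-8): strategy s evolves sub-population s
            counts[s] += evolve(s, s)
        if g % window == 0:  # step 10)
            best = choose_strategy(counts)
            evolve(best, 4)  # the fourth sub-population borrows the winning strategy
            chosen.append(best)
            counts = {1: 0, 2: 0, 3: 0}  # reset count1, count2, count3
    return chosen
```

With a toy callback in which strategy 2 always accepts the most trials, `run(toy, 100)` picks strategy 2 in each of the five windows.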
For the alpha protein 2EZK (sequence length 99), the above method obtains near-native conformations of the protein. Over runs of 1000 generations, the mean root-mean-square deviation (RMSD) between the obtained structures and the native structure is …, and the minimum RMSD is …; the predicted three-dimensional structure is shown in FIG. 3.
The foregoing describes one embodiment of the invention. The invention is not limited to this embodiment and may be practiced with various modifications without departing from its essential spirit.
Claims (1)
1. A protein structure prediction method based on a multi-population ensemble mutation strategy, characterized in that the method comprises the following steps:
1) Input the sequence information of the target protein;
2) According to the target protein sequence, obtain fragment library files from a ROBETTA server; the fragment library files comprise a 3-residue fragment library file and a 9-residue fragment library file;
3) According to the sequence information, obtain a distance spectrum file from a QUARK server;
4) Set the parameters: population size NP, maximum number of generations G, crossover factor CR and temperature factor β; set the generation counter g = 0;
5) Population initialization: generate NP initial conformations C_i, i = 1, 2, …, NP, by random fragment assembly, and divide the NP individuals equally into four sub-populations indexed by j ∈ {1, 2, …, NP/4}, k ∈ {NP/4+1, …, NP/2}, m ∈ {NP/2+1, …, 3NP/4} and n ∈ {3NP/4+1, …, NP}, respectively;
6.1) Take the j-th conformation of the first sub-population as the target individual and randomly select a conformation in the first sub-population as the base individual. Randomly select two of the remaining three sub-populations and randomly take one individual from each, C_a and C_b. Randomly select one 9-residue fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of the base individual, which generates a mutant conformation. Perform one fragment assembly on the mutant conformation to generate a new mutant conformation;
6.2) Randomly generate a uniformly distributed decimal R between 0 and 1. If R > CR, copy a randomly selected 9-residue fragment of the target conformation into the corresponding position of the new mutant conformation; otherwise, keep the new mutant conformation unchanged. The conformation resulting from this operation is recorded as the trial conformation;
6.4) If the score3 energy of the trial conformation is lower than that of the target conformation, the trial conformation replaces the target conformation, the acceptance counter count1 is increased by 1, and the procedure goes to step 6.8); otherwise, it continues with step 6.5);
6.5) Using the residue pairs in the distance spectrum, compute the inter-residue distances of the trial conformation and of the target conformation. Then compute their distance error coefficients D_trial and D_target according to formulas (1) and (2). Here T is the number of residue pairs in the distance spectrum; d_t^trial and d_t^target are the distances between the Cα atoms of the t-th residue pair in the trial and target conformations, respectively; d_N is the mean of the distance spectrum in its N-th distance interval; and PD_N is the number of distance spectrum entries in interval N. The distance range of the distance spectrum is (0, 9) and the interval width is 0.5, i.e., the intervals are (0, 0.5], (0.5, 1], …, (8.5, 9);
6.6) If D_trial < D_target, the trial conformation replaces the target conformation and count1 is increased by 1; otherwise, go to step 6.7);
6.7) Calculate the difference between the distance error coefficients of the target and trial conformations, and accept the trial conformation by the Monte Carlo criterion with a probability determined by this difference and the temperature factor β;
6.8) Set j = j + 1 and repeat steps 6.1) to 6.8) until j = NP/4;
7.1) Take the k-th conformation of the second sub-population as the target individual and select the lowest-energy conformation in the second sub-population as the base individual. Randomly select two of the other three sub-populations and randomly take one conformation from each, C_c and C_d. Randomly select one 9-residue fragment from each of C_c and C_d, at different positions, and use them to replace the fragments at the corresponding positions of the base individual, which generates a mutant conformation. Perform one fragment assembly on it to generate a new mutant conformation;
7.2) Apply the operations of steps 6.2) to 6.7) to the target conformation and the new mutant conformation, recording the number of times the trial conformation is accepted as count2;
7.3) Set k = k + 1 and repeat steps 7.1) and 7.2) until k = NP/2;
8) For each conformation in the third sub-population, perform the following operations:
8.1) Take the m-th conformation of the third sub-population as the target individual. Sort the third sub-population by energy in ascending order and randomly select one individual from the lower-energy half as the base individual. Randomly select two of the other three sub-populations and randomly take one conformation from each, C_e and C_f. Randomly select one 9-residue fragment from each of C_e and C_f, at different positions, and use them to replace the fragments at the corresponding positions of the base individual, which generates a mutant conformation. Perform one fragment assembly on it to generate a new mutant conformation;
8.2) Apply the operations of steps 6.2) to 6.7) to the target conformation and the new mutant conformation, recording the number of times the trial conformation is accepted as count3;
8.3) Set m = m + 1 and repeat steps 8.1) and 8.2) until m = 3NP/4;
10) Repeat steps 6) to 9). Every 20 generations, compare count1, count2 and count3, and mutate the fourth sub-population using the sub-population mutation strategy that corresponds to the largest of the three counts. Process it according to steps 6.2) to 6.8), and then reset count1, count2 and count3 to zero;
11) Set g = g + 1 and repeat steps 6) to 10) until g > G;
12) Output the result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810762915.4A CN109509510B (en) | 2018-07-12 | 2018-07-12 | Protein structure prediction method based on multi-population ensemble variation strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109509510A CN109509510A (en) | 2019-03-22 |
CN109509510B true CN109509510B (en) | 2021-06-18 |
Family
ID=65745470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810762915.4A Active CN109509510B (en) | 2018-07-12 | 2018-07-12 | Protein structure prediction method based on multi-population ensemble variation strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109509510B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110148437B (en) * | 2019-04-16 | 2021-01-01 | Zhejiang University of Technology | Residue contact auxiliary strategy self-adaptive protein structure prediction method |
CN110162739B (en) * | 2019-04-30 | 2023-05-02 | Harbin Institute of Technology | RFFKBMS algorithm weight updating and optimizing method based on forgetting factor |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104951670A (en) * | 2015-06-08 | 2015-09-30 | Zhejiang University of Technology | Group conformation space optimization method based on distance spectrum |
CN105205348A (en) * | 2015-09-22 | 2015-12-30 | Zhejiang University of Technology | Method for colony conformation space optimization based on distance constraint selection strategy |
CN105808973A (en) * | 2016-03-03 | 2016-07-27 | Zhejiang University of Technology | Staged multi-strategy-based group conformation space sampling method |
CN106096326A (en) * | 2016-06-02 | 2016-11-09 | Zhejiang University of Technology | A differential evolution protein structure prediction method based on a centroid mutation strategy |
CN106778059A (en) * | 2016-12-19 | 2017-05-31 | Zhejiang University of Technology | A population-based protein structure prediction method based on Rosetta local enhancement |
CN107506613A (en) * | 2017-08-29 | 2017-12-22 | Zhejiang University of Technology | A multimodal protein conformation space optimization method based on multiple structural features |
Non-Patent Citations (2)
Title |
---|
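A Population-based Conformational Optimal Algorithm Using Replica-exchange in Ab-initio Protein Structure Prediction; Guijun Zhang et al.; 2016 Chinese Control and Decision Conference (CCDC); 2016-05-30; pp. 701-706 * |
Research on protein conformational space optimization methods based on energy models (基于能量模型的蛋白质构象空间优化方法研究); Yu Xufeng (俞旭锋); China Master's Theses Full-text Database, Basic Sciences; 2018-04-15 (No. 04); full text * |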
Also Published As
Publication number | Publication date |
---|---|
CN109509510A (en) | 2019-03-22 |
Similar Documents
Publication | Title |
---|---|
CN108846256B (en) | Group protein structure prediction method based on residue contact information |
Yang et al. | Molecular phylogenetics: principles and practice |
CN108334746B (en) | Protein structure prediction method based on secondary structure similarity |
CN109509510B (en) | Protein structure prediction method based on multi-population ensemble variation strategy |
CN110148437B (en) | Residue contact auxiliary strategy self-adaptive protein structure prediction method |
CN105808973B (en) | Staged multi-strategy-based group conformation space sampling method |
CN109872770B (en) | Variable strategy protein structure prediction method combined with displacement degree evaluation |
CN111180004B (en) | Multi-contact information sub-population strategy protein structure prediction method |
Liu et al. | De novo protein structure prediction by incremental inter-residue geometries prediction and model quality assessment using deep learning |
CN109346126B (en) | Adaptive protein structure prediction method of lower bound estimation strategy |
CN109448786B (en) | Method for predicting protein structure by lower bound estimation dynamic strategy |
CN109360597B (en) | Group protein structure prediction method based on global and local strategy cooperation |
CN109378034B (en) | Protein prediction method based on distance distribution estimation |
CN109346128B (en) | Protein structure prediction method based on residue information dynamic selection strategy |
CN108595910B (en) | Group protein conformation space optimization method based on diversity index |
CN109461471B (en) | Adaptive protein structure prediction method based on championship mechanism |
CN113393900B (en) | RNA state inference research method based on improved Transformer model |
CN109147867B (en) | Group protein structure prediction method based on dynamic segment length |
CN111815036B (en) | Protein structure prediction method based on multi-residue contact map cooperative constraint |
CN109411013B (en) | Group protein structure prediction method based on individual specific variation strategy |
CN109360600B (en) | Protein structure prediction method based on residue characteristic distance |
CN110634531B (en) | Protein structure prediction method based on double-layer bias search |
Aggour et al. | A highly parallel next-generation DNA sequencing data analysis pipeline in Hadoop |
CN111161791A (en) | Experimental data-assisted adaptive strategy protein structure prediction method |
CN109243526B (en) | Protein structure prediction method based on specific fragment crossing |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20190322; Assignee: ZHEJIANG ORIENT GENE BIOTECH CO.,LTD.; Assignor: ZHEJIANG University OF TECHNOLOGY; Contract record no.: X2023980053610; Denomination of invention: A protein structure prediction method based on multiple ensemble mutation strategies; Granted publication date: 20210618; License type: Common License; Record date: 20231222 |