CN109448786B - Method for predicting protein structure by lower bound estimation dynamic strategy - Google Patents


Info

Publication number
CN109448786B
Authority
CN
China
Prior art keywords
conformation
population
randomly selecting
segment
trial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810994693.9A
Other languages
Chinese (zh)
Other versions
CN109448786A (en)
Inventor
张贵军
彭春祥
王柳静
周晓根
刘俊
胡俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201810994693.9A
Publication of CN109448786A
Application granted
Publication of CN109448786B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A protein structure prediction method based on a lower-bound estimation dynamic strategy. Within the framework of an evolutionary algorithm, two strategy pools are first established, each containing three different mutation strategies, and the pool whose strategies are applied is chosen according to the current generation number. The mutated conformations are then screened with a lower-bound estimation function, and the final conformation is selected using the Rosetta energy function score3 and the Monte Carlo Boltzmann acceptance criterion. The invention provides a lower-bound estimation dynamic strategy protein structure prediction method with higher sampling efficiency and prediction accuracy.

Description

Method for predicting protein structure by lower bound estimation dynamic strategy
Technical Field
The invention relates to the fields of bioinformatics and computer application, in particular to a method for predicting a protein structure by using a lower-bound estimation dynamic strategy.
Background
Protein molecules play a crucial role in the biochemical reactions of living cells. Their structural models and biological activity states are of great importance for understanding and treating various diseases. A protein can perform its characteristic biological function only after folding into a specific three-dimensional structure. Therefore, to understand the function of a protein, it is necessary to obtain its three-dimensional structure.
Protein tertiary structure prediction is an important task in bioinformatics. The most challenging problem in protein conformation optimization is searching the complex surface of the protein energy model function: the finer the model, the more detailed the knowledge it can provide, and the more computing resources it requires.
The rapid development of computer hardware and software provides a robust foundation for protein tertiary structure prediction. Progress and breakthroughs in de novo protein structure prediction have further encouraged researchers in computer science and evolutionary computation to take part. De novo prediction is based directly on a physics-based or knowledge-based protein energy model and uses an optimization algorithm to search the conformational space for the global minimum-energy conformation. The conformational space optimization method is currently one of the most critical factors limiting the accuracy of de novo protein structure prediction, and many optimization methods have been applied to this problem. Applying an optimization algorithm to the de novo sampling process must first address three problems: (1) the complexity of the energy model; (2) the high dimensionality of the energy model; (3) the inaccuracy of the energy model. To date, we are far from constructing a force field accurate enough to guide the target sequence to fold in the correct direction, so the mathematically optimal solution does not necessarily correspond to the native structure of the target protein. Model inaccuracy also makes it impossible to analyze the performance of the optimization algorithm objectively.
The differential evolution (DE) algorithm has been successfully applied to protein structure prediction because of its simple structure, ease of implementation, strong robustness, and fast convergence. However, as the amino acid sequence grows longer, the number of degrees of freedom of the protein molecular system increases, and obtaining the global optimum of a large-scale protein conformational space by sampling with a traditional population-based algorithm becomes challenging. In addition, a coarse-grained model reduces the conformational search space but also loses information about the interaction forces, which directly affects the prediction accuracy.
Therefore, conventional protein structure prediction methods fall short in sampling efficiency and prediction accuracy and need to be improved.
Disclosure of Invention
To overcome the low sampling efficiency, poor population diversity, and low prediction accuracy of conventional protein structure prediction methods, the invention introduces a dynamic mutation strategy to guide conformational space optimization within the framework of the basic differential evolution algorithm, and provides a lower-bound estimation dynamic strategy protein structure prediction method with high sampling efficiency and high prediction accuracy.
The technical solution adopted by the invention to solve the technical problem is as follows:
A method for predicting a protein structure by a lower-bound estimation dynamic strategy, comprising the following steps:
1) input the sequence information of the target protein;
2) obtain the fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, comprising a 3-residue fragment library file and a 9-residue fragment library file;
3) set the parameters: population size NP, maximum number of generations G, crossover factor CR, temperature factor β, and slope control factor M; initialize the generation counter g = 0;
4) population initialization: generate NP initial conformations C_i, i ∈ {1, 2, …, NP}, by random fragment assembly;
5) for each conformation C_i, i ∈ {1, 2, …, NP}, combine the three-dimensional coordinates of its Cα atoms into the position coordinates of the conformation [Figure BDA0001781583230000021], where [Figure BDA0001781583230000022] denotes the j-th dimension element of the position coordinates of the i-th conformation and len is the length of the protein sequence;
6) for each individual C_i in the population, perform the following operations:
6.1) set C_i as the target individual [Figure BDA0001781583230000023]; if g = 0 or g is even, perform steps 6.2) to 6.4); otherwise perform steps 6.5) to 6.7); either branch generates C_trial1, C_trial2, and C_trial3;
6.2) randomly select three mutually different individuals C_a, C_b, and C_c from the population, subject to [Figure BDA0001781583230000024]; randomly select one 3-residue fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_c, generating the mutated conformation C_trial1;
6.3) randomly select four mutually different individuals C_a, C_b, C_c, and C_d from the population, subject to [Figure BDA0001781583230000025]; randomly select one 3-residue fragment from each of C_a, C_b, and C_c, at different positions, and use them to replace the fragments at the corresponding positions of C_d, generating the mutated conformation C_trial2;
6.4) randomly select two mutually different individuals C_a and C_b from the population, subject to [Figure BDA0001781583230000026]; randomly select one 3-residue fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of [Figure BDA0001781583230000027], generating the mutated conformation C_trial3;
6.5) randomly select from the population a conformation C_SL whose energy is lower than that of [Figure BDA0001781583230000031]; if [Figure BDA0001781583230000032] is the lowest-energy conformation in the population, randomly select C_SL from the whole population; then randomly select two mutually different conformations C_a and C_b from the whole population, subject to [Figure BDA0001781583230000033]; randomly select one 3-residue fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_SL, generating the mutated conformation C_trial1;
6.6) randomly select from the population a conformation C_SL whose energy is lower than that of [Figure BDA0001781583230000034]; if [Figure BDA0001781583230000035] is the lowest-energy conformation in the population, randomly select C_SL from the whole population; then randomly select a conformation C_a from the whole population, subject to [Figure BDA0001781583230000036]; randomly select one 3-residue fragment from each of C_a and C_SL, at different positions, and use them to replace the fragments at the corresponding positions of [Figure BDA0001781583230000037], generating the mutated conformation C_trial2;
6.7) randomly select from the population a conformation C_SL whose energy is lower than that of [Figure BDA0001781583230000038]; if [Figure BDA0001781583230000039] is the lowest-energy conformation in the population, randomly select C_SL from the whole population; then randomly select two mutually different conformations C_a and C_b from the whole population, subject to [Figure BDA00017815832300000310]; randomly select one 3-residue fragment from each of C_SL and C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_a, generating the mutated conformation C_trial3;
6.8) find in the population the individuals C_near1, C_near2, and C_near3 nearest to C_trial1, C_trial2, and C_trial3, respectively; with the three-dimensional coordinates of the Cα atoms of each conformation combined into its position coordinates, the position coordinates of C_trial1, C_trial2, C_trial3 and C_near1, C_near2, C_near3 are, respectively, [Figure BDA00017815832300000311], [Figure BDA00017815832300000312], [Figure BDA00017815832300000313];
6.9) if g = 0, compute the energies score3(C_trial1), score3(C_trial2), and score3(C_trial3) of C_trial1, C_trial2, and C_trial3 with the Rosetta score3 energy function; select the conformation with the lowest energy as C_trial and record its position coordinates as [Figure BDA00017815832300000314]; compute the Euclidean distance between C_trial and each conformation in the population, find the nearest conformation C_near, and record its position coordinates as [Figure BDA00017815832300000315];
6.10) if [Figure BDA0001781583230000041], C_trial replaces [Figure BDA0001781583230000042]; otherwise C_trial is accepted with probability [Figure BDA0001781583230000043] according to the Monte Carlo criterion;
6.11) if g > 0, compute the lower-bound estimates UE_trial1, UE_trial2, and UE_trial3 of C_trial1, C_trial2, and C_trial3 by equation (1):
[equation (1): Figure BDA0001781583230000044]
select the conformation with the smallest lower-bound estimate as C_trial, denote its lower-bound estimate as UE_trial, and record its position coordinates as [Figure BDA0001781583230000045]; compute the Euclidean distance between C_trial and each conformation in the population, find the nearest conformation C_near, and record its position coordinates as [Figure BDA0001781583230000046];
6.12) if [Figure BDA0001781583230000047], C_trial is rejected; otherwise compute the energy score3(C_trial) of C_trial; if [Figure BDA0001781583230000048], C_trial replaces [Figure BDA0001781583230000049]; otherwise C_trial is accepted with probability [Figure BDA00017815832300000410] according to the Monte Carlo criterion;
7) set g = g + 1 and repeat steps 6) to 7) until g > G;
8) output the lowest-energy conformation as the final result.
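The alternation between the two strategy pools in steps 6.1) to 6.7) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: a conformation is modelled as a plain list with one entry per residue, the helper names `swap_fragments`, `pick_lower`, and `generate_trials` are hypothetical, and the index constraints that the original gives as formula images are assumed here to mean only that the donors differ from the target individual.

```python
import random

FRAG_LEN = 3  # 3-residue fragments, as in steps 6.2) to 6.7)


def swap_fragments(base, donors, rng):
    """Return a copy of `base` with one 3-residue window copied from each donor.

    Windows start at distinct positions (they may still overlap).
    """
    trial = list(base)
    starts = rng.sample(range(len(base) - FRAG_LEN + 1), len(donors))
    for donor, s in zip(donors, starts):
        trial[s:s + FRAG_LEN] = donor[s:s + FRAG_LEN]
    return trial


def pick_lower(population, energies, i, rng):
    """Pick a conformation with lower energy than individual i.

    Falls back to the whole population when i is already the lowest.
    """
    lower = [k for k, e in enumerate(energies) if e < energies[i]]
    pool = lower if lower else list(range(len(population)))
    return population[rng.choice(pool)]


def generate_trials(population, energies, i, g, rng):
    """Generate C_trial1..3 from pool A (g = 0 or even) or pool B (g odd)."""
    target = population[i]
    # Assumption: donors are drawn from individuals other than the target.
    others = [c for k, c in enumerate(population) if k != i]
    if g % 2 == 0:  # covers g = 0 as well
        a, b, c = rng.sample(others, 3)
        t1 = swap_fragments(c, [a, b], rng)            # step 6.2)
        a, b, c, d = rng.sample(others, 4)
        t2 = swap_fragments(d, [a, b, c], rng)         # step 6.3)
        a, b = rng.sample(others, 2)
        t3 = swap_fragments(target, [a, b], rng)       # step 6.4)
    else:
        sl = pick_lower(population, energies, i, rng)
        a, b = rng.sample(others, 2)
        t1 = swap_fragments(sl, [a, b], rng)           # step 6.5)
        sl = pick_lower(population, energies, i, rng)
        a = rng.choice(others)
        t2 = swap_fragments(target, [a, sl], rng)      # step 6.6)
        sl = pick_lower(population, energies, i, rng)
        a, b = rng.sample(others, 2)
        t3 = swap_fragments(a, [sl, b], rng)           # step 6.7)
    return t1, t2, t3
```

Pool A draws its donors and bases uniformly at random, while pool B biases the exchange toward a lower-energy conformation C_SL, which is one way to read the stated aim of balancing diversity and sampling efficiency.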
The technical concept of the invention is as follows: within the framework of an evolutionary algorithm, two strategy pools are first established, each containing three different mutation strategies, and the pool whose strategies are applied is chosen according to the current generation number; the mutated conformations are then screened with a lower-bound estimation function; finally, the conformation is selected using the Rosetta energy function score3 and the Monte Carlo Boltzmann acceptance criterion. The invention thus provides a method for predicting a protein structure by a lower-bound estimation dynamic strategy.
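The Monte Carlo Boltzmann acceptance step mentioned here can be illustrated with the standard Metropolis rule. The exact acceptance probability used by the method is given only as a formula image, so the expression exp(-(E_trial - E_target)/β) below is an assumption, as is the function name `metropolis_accept`; the energies would come from the Rosetta score3 function, which is not called in this sketch.

```python
import math
import random


def metropolis_accept(e_trial, e_target, beta, rng):
    """Standard Metropolis rule; treating beta as a temperature is an assumption."""
    if e_trial <= e_target:
        return True  # a trial that is no worse always replaces the target
    # A worse trial is accepted with Boltzmann probability exp(-dE / beta).
    return rng.random() < math.exp(-(e_trial - e_target) / beta)
```

With this form, a larger β makes uphill moves more likely to be accepted, which helps the population escape local minima at the cost of slower convergence.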
The beneficial effects of the invention are as follows: selecting the mutation strategies of different strategy pools according to the generation number of the population improves population diversity and also addresses the low sampling efficiency of traditional evolutionary algorithms; using the lower-bound estimation function to assist conformation selection improves selection efficiency, mitigates prediction errors caused by an inaccurate energy function, and improves prediction accuracy.
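To make the idea of lower-bound pre-screening concrete, the sketch below estimates a bound on a trial conformation's energy from its nearest neighbour in the population, so that the full energy function need only be evaluated for promising trials. Equation (1) of the method is not reproduced in this text, so the Lipschitz-style bound E(C_near) - M·‖X_trial - X_near‖ and the rejection test against the target energy are assumptions, and all function names are hypothetical.

```python
import math


def flatten(ca_coords):
    """Concatenate per-residue (x, y, z) Cα coordinates into one 3*len vector."""
    return [x for atom in ca_coords for x in atom]


def euclidean(x, y):
    """Euclidean distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest(x_trial, population_coords):
    """Index of the population member closest to the trial."""
    return min(range(len(population_coords)),
               key=lambda k: euclidean(x_trial, population_coords[k]))


def lower_bound(e_near, x_trial, x_near, slope_m):
    """Assumed Lipschitz-style underestimate of the trial energy."""
    return e_near - slope_m * euclidean(x_trial, x_near)


def worth_scoring(x_trial, population_coords, energies, e_target, slope_m):
    """Assumed screening rule: skip the trial if even its lower bound exceeds the target energy."""
    k = nearest(x_trial, population_coords)
    ue = lower_bound(energies[k], x_trial, population_coords[k], slope_m)
    return ue <= e_target
```

Under this assumed form, a larger slope control factor M gives a looser (lower) bound, so fewer trials are rejected before scoring.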
Drawings
FIG. 1 is the conformation distribution obtained when the lower-bound estimation dynamic strategy protein structure prediction method samples protein 1GB1.
FIG. 2 is a schematic diagram of conformation updating when the lower-bound estimation dynamic strategy protein structure prediction method samples protein 1GB1.
FIG. 3 is the three-dimensional structure of protein 1GB1 predicted by the lower-bound estimation dynamic strategy protein structure prediction method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, a method for predicting a protein structure by a lower-bound estimation dynamic strategy comprises steps 1) to 8) exactly as set out in the Disclosure of Invention above.
Taking the α/β protein 1GB1, with a sequence length of 56, as an example, the method proceeds through the same steps 1) to 8), with the parameters in step 3) set as follows: population size NP = 100, maximum number of generations G = 1000, crossover factor CR = 0.5, temperature factor β = 2, slope control factor M = 10000, and generation counter g = 0.
Taking the α/β protein 1GB1, with a sequence length of 56, as an example, the above method was used to obtain a near-native conformation of the protein. After 1000 generations, the mean root-mean-square deviation between the predicted structures and the native structure is [value: Figure BDA0001781583230000101], and the minimum root-mean-square deviation is [value: Figure BDA0001781583230000102]. The predicted three-dimensional structure is shown in FIG. 3.
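For readers unfamiliar with the metric, the root-mean-square deviation reported above can be computed from paired Cα coordinates as in the sketch below. It assumes the predicted and native structures have already been optimally superimposed (the superposition step is omitted), and the function name `rmsd` is illustrative rather than taken from the source.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD over paired (x, y, z) Cα coordinates of two pre-superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum((p - q) ** 2
                  for u, v in zip(coords_a, coords_b)
                  for p, q in zip(u, v))
    return math.sqrt(squared / len(coords_a))
```

For a 56-residue protein such as 1GB1, each argument would hold 56 coordinate triples.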
The foregoing illustrates one example of the invention. The invention is evidently not limited to the above embodiment and may be practiced with various modifications without departing from its essential spirit.

Claims (1)

1. A method for predicting a protein structure by a lower bound estimation dynamic strategy is characterized by comprising the following steps: the method comprises the following steps:
1) sequence information for a given protein of interest;
2) obtaining fragment library files from a ROBETTA server according to a target protein sequence, wherein the fragment library files comprise 3 fragment library files and 9 fragment library files;
3) setting parameters: the method comprises the following steps of (1) setting a population size NP, a maximum iteration algebra G of an algorithm, a temperature factor beta and a slope control factor M to be an iteration algebra G equal to 0;
4) population initialization: random fragment assembly to generate NP initial conformations Ci,i={1,2,…,NP};
5) Each conformation CiAre combined into position coordinates of the conformation
Figure FDA0002836553220000011
Figure FDA0002836553220000012
A j-th dimension element representing a spatial position coordinate of an i-th conformation, j being 1, 2.., 3len, len being a length of a protein sequence;
6) performing the following operations for each individual C_i in the population:
6.1) setting C_i as the target individual
Figure FDA0002836553220000013
and, if g is 0 or even, performing steps 6.2) to 6.4), otherwise performing steps 6.5) to 6.7), to generate C_trial1, C_trial2 and C_trial3;
6.2) randomly selecting three mutually different individuals C_a1, C_b1 and C_c1 from the population,
Figure FDA0002836553220000014
randomly selecting one 3-fragment at a different position from each of C_a1 and C_b1, and using them to replace the fragments at the corresponding positions of C_c1, generating the mutant conformation C_trial1;
6.3) randomly selecting four mutually different individuals C_a2, C_b2, C_c2 and C_d2 from the population,
Figure FDA0002836553220000015
randomly selecting one 3-fragment at a different position from each of C_a2, C_b2 and C_c2, and using them to replace the fragments at the corresponding positions of C_d2, generating the mutant conformation C_trial2;
6.4) randomly selecting two mutually different individuals C_a3 and C_b3 from the population,
Figure FDA0002836553220000016
randomly selecting one 3-fragment at a different position from each of C_a3 and C_b3, and using them to replace the fragments at the corresponding positions of
Figure FDA0002836553220000017
generating the mutant conformation C_trial3;
6.5) randomly selecting from the population a conformation C_SL whose energy is lower than that of
Figure FDA0002836553220000018
and, if
Figure FDA0002836553220000019
is the lowest-energy conformation in the population, randomly selecting C_SL from the whole population instead; then randomly selecting two mutually different conformations C_a4 and C_b4 from the whole population, such that
Figure FDA00028365532200000110
and randomly selecting one 3-fragment at a different position from each of C_a4 and C_b4 to replace the fragments at the corresponding positions of C_SL, generating the mutant conformation C_trial1;
6.6) randomly selecting from the population a conformation C_SL whose energy is lower than that of
Figure FDA0002836553220000021
and, if
Figure FDA0002836553220000022
is the lowest-energy conformation in the population, randomly selecting C_SL from the whole population instead; then randomly selecting a conformation C_a5 from the whole population, such that
Figure FDA0002836553220000023
and randomly selecting one 3-fragment at a different position from each of C_a5 and C_SL to replace the fragments at the corresponding positions of
Figure FDA0002836553220000024
generating the mutant conformation C_trial2;
6.7) randomly selecting from the population a conformation C_SL whose energy is lower than that of
Figure FDA0002836553220000025
and, if
Figure FDA0002836553220000026
is the lowest-energy conformation in the population, randomly selecting C_SL from the whole population instead; then randomly selecting two mutually different conformations C_a6 and C_b6 from the whole population, such that
Figure FDA0002836553220000027
and randomly selecting one 3-fragment at a different position from each of C_SL and C_b6 to replace the fragments at the corresponding positions of C_a6, generating the mutant conformation C_trial3;
6.8) finding in the population the individuals C_near1, C_near2 and C_near3 that are closest to C_trial1, C_trial2 and C_trial3 respectively, and combining the three-dimensional coordinates of the Cα atoms of each of these conformations into its position coordinates, so that the position coordinates of C_trial1, C_trial2, C_trial3 and C_near1, C_near2, C_near3 are respectively
Figure FDA0002836553220000028
Figure FDA0002836553220000029
Figure FDA00028365532200000210
6.9) if g = 0, calculating the energies score3(C_trial1), score3(C_trial2) and score3(C_trial3) of C_trial1, C_trial2 and C_trial3 with the Rosetta score3 energy function, selecting the conformation with the lowest energy as C_trial, and recording its spatial position coordinates as
Figure FDA00028365532200000211
then calculating the Euclidean distance between C_trial and each conformation in the population, finding the nearest conformation C_near, and recording its spatial position coordinates as
Figure FDA00028365532200000212
6.10) if
Figure FDA00028365532200000213
then C_trial replaces
Figure FDA00028365532200000214
otherwise, the conformation is accepted with probability
Figure FDA00028365532200000215
according to the Monte Carlo criterion;
6.11) if g > 0, calculating the lower bound estimates UE_trial1, UE_trial2 and UE_trial3 of C_trial1, C_trial2 and C_trial3 by formula (1):
Figure FDA0002836553220000031
selecting the conformation with the smallest lower bound estimate as C_trial, denoting its lower bound estimate as UE_trial, and recording its spatial position coordinates as
Figure FDA0002836553220000032
then calculating the Euclidean distance between C_trial and each conformation in the population, finding the nearest conformation C_near, and recording its spatial position coordinates as
Figure FDA0002836553220000033
6.12) if
Figure FDA0002836553220000034
then C_trial is rejected; otherwise, calculating the energy score3(C_trial) of C_trial, and if
Figure FDA0002836553220000035
then C_trial replaces
Figure FDA0002836553220000036
otherwise, the conformation is accepted with probability
Figure FDA0002836553220000037
according to the Monte Carlo criterion;
7) setting g = g + 1 and iterating steps 6) to 7) until g > G;
8) outputting the conformation with the lowest energy as the final result.
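
The fragment-replacement mutations of steps 6.2) to 6.4) can be illustrated with a minimal Python sketch. It assumes a toy representation in which a conformation is a list of per-residue (phi, psi, omega) tuples and a 3-fragment is a window of three consecutive residues. The names `fragment_swap` and `mutate_rand` are hypothetical, and the sketch assumes that "different positions" means distinct window start indices and that the selected individuals also differ from the target individual i. It is not the Rosetta fragment machinery that an actual implementation would rely on.

```python
import random


def fragment_swap(donors, receiver, frag_len=3, rng=random):
    """Copy one frag_len window from each donor into a copy of receiver.

    Each conformation is a list of per-residue (phi, psi, omega) tuples.
    Assumption: windows taken from different donors have distinct start
    indices; they may still overlap, in which case the later donor wins.
    """
    n = len(receiver)
    starts = rng.sample(range(n - frag_len + 1), k=len(donors))
    trial = list(receiver)  # the receiver itself is left unchanged
    for donor, s in zip(donors, starts):
        trial[s:s + frag_len] = donor[s:s + frag_len]
    return trial


def mutate_rand(population, i, rng=random):
    """Step 6.2)-style mutation: two random donors, one random receiver.

    Assumption: all three individuals differ from each other and from i.
    """
    others = [k for k in range(len(population)) if k != i]
    a, b, c = rng.sample(others, 3)
    return fragment_swap([population[a], population[b]], population[c], rng=rng)


# Toy population: individual k has every torsion angle equal to float(k),
# so the origin of each residue in a trial conformation is easy to read off.
population = [[(float(k),) * 3 for _ in range(10)] for k in range(5)]
trial = mutate_rand(population, i=0, rng=random.Random(0))
```

Steps 6.3) and 6.4) follow the same pattern with three donors, or with the target individual as the receiver, so only the arguments passed to `fragment_swap` change.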
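
The two-stage selection of steps 6.9) to 6.12), in which a cheap lower bound estimate screens a trial conformation before the energy function is evaluated, can be sketched as follows. The formulas of the claim survive here only as figure placeholders, so the sketch makes three labeled assumptions: the lower bound is a Lipschitz-type underestimate E(C_near) − M·‖C_trial − C_near‖, the trial is rejected when that estimate exceeds the target energy, and the Monte Carlo criterion is the standard Metropolis rule exp(−β·ΔE). All function names are hypothetical, and `energy_fn` stands in for the Rosetta score3 energy.

```python
import math
import random


def position_vector(ca_coords):
    """Flatten [(x, y, z), ...] C-alpha coordinates into a vector of length 3*len."""
    return [v for atom in ca_coords for v in atom]


def nearest(trial_vec, population_vecs):
    """Index of the population member closest to trial_vec (Euclidean distance)."""
    return min(range(len(population_vecs)),
               key=lambda k: math.dist(trial_vec, population_vecs[k]))


def lower_bound(trial_vec, near_vec, near_energy, M):
    """ASSUMED Lipschitz-type underestimate: E(near) - M * distance."""
    return near_energy - M * math.dist(trial_vec, near_vec)


def metropolis_accept(e_trial, e_target, beta, rng=random):
    """ASSUMED standard Metropolis rule with probability exp(-beta * dE)."""
    if e_trial < e_target:
        return True
    return rng.random() < math.exp(-beta * (e_trial - e_target))


def select(trial_vec, target_energy, population_vecs, population_energies,
           M, beta, energy_fn, rng=random):
    """Return (accepted, trial_energy); trial_energy is None if screened out."""
    k = nearest(trial_vec, population_vecs)
    ue = lower_bound(trial_vec, population_vecs[k], population_energies[k], M)
    if ue > target_energy:
        # ASSUMED rejection test: even the underestimate is worse than the
        # target, so the expensive energy evaluation is skipped.
        return False, None
    e_trial = energy_fn(trial_vec)
    return metropolis_accept(e_trial, target_energy, beta, rng), e_trial
```

The point of the screening step is that `energy_fn` is only called for trial conformations whose lower bound does not already rule them out, which saves energy evaluations in later generations.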
CN201810994693.9A 2018-08-29 2018-08-29 Method for predicting protein structure by lower bound estimation dynamic strategy Active CN109448786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810994693.9A CN109448786B (en) 2018-08-29 2018-08-29 Method for predicting protein structure by lower bound estimation dynamic strategy

Publications (2)

Publication Number Publication Date
CN109448786A CN109448786A (en) 2019-03-08
CN109448786B true CN109448786B (en) 2021-04-06

Family

ID=65530202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810994693.9A Active CN109448786B (en) 2018-08-29 2018-08-29 Method for predicting protein structure by lower bound estimation dynamic strategy

Country Status (1)

Country Link
CN (1) CN109448786B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161791B (en) * 2019-11-28 2021-06-18 Zhejiang University of Technology Experimental data-assisted adaptive strategy protein structure prediction method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413067A (en) * 2013-07-30 2013-11-27 Zhejiang University of Technology Abstract convex lower-bound estimation based protein structure prediction method
CN104732115A (en) * 2014-11-25 2015-06-24 Zhejiang University of Technology Protein conformation optimization method based on simple space abstract convexity lower bound estimation
CN105224987A (en) * 2015-09-22 2016-01-06 Zhejiang University of Technology A kind of change strategy colony global optimization method based on dynamic Lipschitz Lower Bound Estimation
CN106096328A (en) * 2016-04-26 2016-11-09 Zhejiang University of Technology A kind of double-deck differential evolution Advances in protein structure prediction based on locally Lipschitz function supporting surface
CN106503484A (en) * 2016-09-23 2017-03-15 Zhejiang University of Technology A kind of multistage differential evolution Advances in protein structure prediction that is estimated based on abstract convex

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004065363A2 (en) * 2003-01-21 2004-08-05 The Trustees Of The University Of Pennsylvania Computational design of a water-soluble analog of a protein, such as phospholamban and potassium channel kcsa

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413067A (en) * 2013-07-30 2013-11-27 浙江工业大学 Abstract convex lower-bound estimation based protein structure prediction method
CN104732115A (en) * 2014-11-25 2015-06-24 浙江工业大学 Protein conformation optimization method based on simple space abstract convexity lower bound estimation
CN105224987A (en) * 2015-09-22 2016-01-06 浙江工业大学 A kind of change strategy colony global optimization method based on dynamic Lipschitz Lower Bound Estimation
CN106096328A (en) * 2016-04-26 2016-11-09 浙江工业大学 A kind of double-deck differential evolution Advances in protein structure prediction based on locally Lipschitz function supporting surface
CN106503484A (en) * 2016-09-23 2017-03-15 浙江工业大学 A kind of multistage differential evolution Advances in protein structure prediction that is estimated based on abstract convex

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space; Xiao-Hu Hao et al.; IEEE/ACM Transactions on Computational Biology and Bioinformatics; 20160930; full text *
A differential evolution algorithm based on local Lipschitz lower-bound estimation supporting surfaces; Zhou Xiaogen et al.; Chinese Journal of Computers; 20161231; Vol. 39 (No. 12); pp. 2631-2651 *

Also Published As

Publication number Publication date
CN109448786A (en) 2019-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant