CN109448786B - Method for predicting protein structure by lower bound estimation dynamic strategy

Info

Publication number: CN109448786B
Application number: CN201810994693.9A
Authority: CN
Inventors: 张贵军; 彭春祥; 王柳静; 周晓根; 刘俊; 胡俊
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2021-04-06
Anticipated expiration: 2038-08-29
Also published as: CN109448786A

Abstract

A method for predicting protein structure by a lower bound estimation dynamic strategy. Within the framework of an evolutionary algorithm, two strategy pools are first established, each containing three different mutation strategies, and the pool used is chosen according to the current generation number. The mutant conformations are then screened with a lower bound estimation function, and finally the conformation is selected using the Rosetta energy function score3 and the Monte Carlo Boltzmann acceptance criterion. The invention provides a lower bound estimation dynamic strategy protein structure prediction method with higher sampling efficiency and prediction accuracy.

Description

Method for predicting protein structure by lower bound estimation dynamic strategy

Technical Field

The invention relates to the fields of bioinformatics and computer application, in particular to a method for predicting a protein structure by using a lower-bound estimation dynamic strategy.

Background

Protein molecules play a crucial role in the biochemical reactions of living cells. Their structural models and biologically active states are of great importance for understanding and treating various diseases. A protein can perform its characteristic biological function only after folding into a specific three-dimensional structure. Therefore, to understand the function of a protein, it is necessary to obtain its three-dimensional structure.

Protein tertiary structure prediction is an important task in bioinformatics. The most challenging problem in protein conformation optimization is searching the complex energy surface of the protein energy model: the finer the model, the more detailed the knowledge it can provide, and the more computing resources it requires.

The rapid development of computer hardware and software provides a solid platform for protein tertiary structure prediction. Advances and breakthroughs in de novo protein structure prediction have in turn drawn researchers from computer science and evolutionary computation into the field. De novo prediction is based directly on a physics-based or knowledge-based protein energy model and uses an optimization algorithm to search the conformational space for the global minimum-energy conformation. The conformational space optimization method is currently one of the most critical factors limiting the accuracy of de novo protein structure prediction, and many optimization methods have been applied to this problem. Applying an optimization algorithm to the de novo sampling process requires addressing three problems: (1) the complexity of the energy surface; (2) the high dimensionality of the energy model; (3) the inaccuracy of the energy model. To date, no force field is accurate enough to guide the target sequence to fold in the correct direction, so the mathematically optimal solution does not necessarily correspond to the native structure of the target protein; moreover, model inaccuracy also makes it impossible to assess the performance of an optimization algorithm objectively.

The differential evolution (DE) algorithm has been successfully applied to protein structure prediction because of its simple structure, ease of implementation, strong robustness and fast convergence. However, as the amino acid sequence grows longer, the degrees of freedom of the protein molecular system increase, and obtaining the global optimum of a large-scale protein conformational space by sampling with a traditional population-based algorithm becomes challenging; in addition, a coarse-grained model reduces the conformational search space but loses information about interaction forces, which directly affects prediction accuracy.

Therefore, the conventional protein structure prediction method has disadvantages in sampling efficiency and prediction accuracy, and needs to be improved.

Disclosure of Invention

To overcome the low sampling efficiency, poor population diversity and low prediction accuracy of conventional protein structure prediction methods, the invention introduces a dynamic mutation strategy to guide conformational space optimization within the framework of the basic differential evolution algorithm, and provides a lower bound estimation dynamic strategy protein structure prediction method with high sampling efficiency and high prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A method for predicting protein structure by a lower bound estimation dynamic strategy, comprising the following steps:

1) provide the sequence information of the target protein;

2) obtain fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, the fragment library files comprising a 3-fragment library file and a 9-fragment library file;

3) set parameters: the population size NP, the maximum number of generations G, the crossover factor CR, the temperature factor β and the slope control factor M; initialize the generation counter g = 0;

4) population initialization: generate NP initial conformations C_i, i ∈ {1, 2, …, NP}, by random fragment assembly;

5) for each conformation C_i, i ∈ {1, 2, …, NP}, combine the three-dimensional coordinates of every Cα atom into the position coordinates of the conformation, a vector of 3·len elements whose j-th element is the j-th dimension of the spatial position coordinates of the i-th conformation, len being the length of the protein sequence;

6) for each individual C_i in the population, perform the following operations:

6.1) set C_i as the target individual; if g is 0 or even, perform steps 6.2) to 6.4), otherwise perform steps 6.5) to 6.7), generating the mutant conformations C_trial1, C_trial2 and C_trial3;

6.2) randomly select three mutually different individuals C_a, C_b and C_c from the population; randomly select one 3-fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_c, generating the mutant conformation C_trial1;

6.3) randomly select four mutually different individuals C_a, C_b, C_c and C_d from the population; randomly select one 3-fragment from each of C_a, C_b and C_c, at different positions, and use them to replace the fragments at the corresponding positions of C_d, generating the mutant conformation C_trial2;

6.4) randomly select two mutually different individuals C_a and C_b from the population; randomly select one 3-fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of the target individual, generating the mutant conformation C_trial3;

6.5) randomly select from the population a conformation C_SL whose energy is lower than that of the target individual; if the target individual is the lowest-energy conformation in the population, randomly select a conformation from the whole population as C_SL; then randomly select two mutually different conformations C_a and C_b from the whole population, subject to [condition missing]; randomly select one 3-fragment from each of C_a and C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_SL, generating the mutant conformation C_trial1;

6.6) randomly select from the population a conformation C_SL whose energy is lower than that of the target individual; if the target individual is the lowest-energy conformation in the population, randomly select a conformation from the whole population as C_SL; then randomly select a conformation C_a from the whole population, subject to [condition missing]; randomly select one 3-fragment from each of C_a and C_SL, at different positions, and use them to replace the fragments at the corresponding positions of the target individual, generating the mutant conformation C_trial2;

6.7) randomly select from the population a conformation C_SL whose energy is lower than that of the target individual; if the target individual is the lowest-energy conformation in the population, randomly select a conformation from the whole population as C_SL; then randomly select two mutually different conformations C_a and C_b from the whole population, subject to [condition missing]; randomly select one 3-fragment from each of C_SL and C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_a, generating the mutant conformation C_trial3;

6.8) find in the population the individuals C_near1, C_near2 and C_near3 closest to C_trial1, C_trial2 and C_trial3 respectively; combine the three-dimensional coordinates of every Cα atom of each of these conformations into its position coordinates, so that the position coordinates of C_trial1, C_trial2, C_trial3 and of C_near1, C_near2, C_near3 are respectively [symbols missing];

6.9) if g = 0, compute the energies score3(C_trial1), score3(C_trial2) and score3(C_trial3) of C_trial1, C_trial2 and C_trial3 with the Rosetta score3 energy function, select the conformation with the lowest energy as C_trial, and record its spatial position coordinates as [symbol missing]; compute the Euclidean distance between C_trial and every conformation in the population, find the closest conformation C_near, and record its spatial position coordinates as [symbol missing];

6.10) if [condition missing], C_trial replaces the target individual; otherwise the conformation is accepted with probability [formula missing] according to the Monte Carlo criterion;

6.11) if g > 0, compute the lower bound estimates UE_trial1, UE_trial2 and UE_trial3 of C_trial1, C_trial2 and C_trial3 by equation (1):

[equation (1) missing]

select the conformation with the smallest lower bound estimate as C_trial, denote its lower bound estimate as UE_trial, and record its spatial position coordinates as [symbol missing];

6.12) if [condition missing], C_trial is rejected; otherwise compute the energy score3(C_trial) of C_trial; if [condition missing], C_trial replaces the target individual; otherwise the conformation is accepted with probability [formula missing] according to the Monte Carlo criterion;

7) set g = g + 1 and repeat steps 6) to 7) until g > G;

8) output the conformation with the lowest energy as the final result.
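Steps 6.10) and 6.12) accept a trial conformation that does not improve on the target only with a certain probability under the Monte Carlo criterion. The acceptance formula itself is not reproduced in this text, so the sketch below assumes the standard Metropolis form, in which a trial with higher energy is accepted with probability exp(-(E_trial - E_target)/β). The function name `metropolis_accept` and its arguments are hypothetical and are not taken from the patent or from Rosetta.

```python
import math
import random


def metropolis_accept(e_trial, e_target, beta, rng=random):
    """Decide whether a trial conformation replaces the target.

    Assumption: standard Metropolis rule. A trial whose energy is not
    higher than the target's is always accepted; otherwise it is accepted
    with probability exp(-(e_trial - e_target) / beta), where beta plays
    the role of the temperature factor.
    """
    delta = e_trial - e_target
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / beta)


# Usage: a lower-energy trial is always kept.
accepted = metropolis_accept(e_trial=-35.2, e_target=-30.0, beta=2.0)
```

With β = 2, a trial that is 2 energy units worse than the target would be accepted roughly 37% of the time under this assumed rule, which lets the search escape shallow local minima.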

The technical concept of the invention is as follows: within the framework of an evolutionary algorithm, two strategy pools are first established, each containing three different mutation strategies, and the pool used is chosen according to the current generation number; next, the mutant conformations are screened with a lower bound estimation function; finally, the conformation is selected using the Rosetta energy function score3 and the Monte Carlo Boltzmann acceptance criterion. The invention thus provides a method for predicting protein structure by a lower bound estimation dynamic strategy.
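The two strategy pools of steps 6.2) to 6.7) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: a conformation is modelled as a list of per-residue entries, a "3-fragment" is read as a window of three consecutive residues, and a replacement copies the donor's window into the same position of the base conformation. The helper names (`swap_fragments`, `pick_lower_energy`, `make_trials`) are hypothetical, and the sketch does not enforce any distinctness constraint between the randomly drawn individuals and the target or C_SL.

```python
import random


def swap_fragments(base, donors, rng, frag_len=3):
    """Copy one frag_len-residue window from each donor into a copy of base.

    Each donor uses a different start position (windows may still overlap).
    """
    trial = list(base)
    starts = rng.sample(range(len(base) - frag_len + 1), len(donors))
    for donor, start in zip(donors, starts):
        trial[start:start + frag_len] = donor[start:start + frag_len]
    return trial


def pick_lower_energy(pop, energies, i, rng):
    """Pick C_SL: a random member with lower energy than target i, else any member."""
    better = [k for k, e in enumerate(energies) if e < energies[i]]
    return pop[rng.choice(better)] if better else rng.choice(pop)


def make_trials(pop, energies, i, g, rng):
    """Return (C_trial1, C_trial2, C_trial3) for target pop[i] at generation g."""
    target = pop[i]
    if g % 2 == 0:  # first pool: g is 0 or even (steps 6.2 to 6.4)
        a, b, c = rng.sample(pop, 3)
        trial1 = swap_fragments(c, [a, b], rng)
        a, b, c, d = rng.sample(pop, 4)
        trial2 = swap_fragments(d, [a, b, c], rng)
        a, b = rng.sample(pop, 2)
        trial3 = swap_fragments(target, [a, b], rng)
    else:  # second pool: g is odd (steps 6.5 to 6.7)
        sl = pick_lower_energy(pop, energies, i, rng)
        a, b = rng.sample(pop, 2)
        trial1 = swap_fragments(sl, [a, b], rng)
        sl = pick_lower_energy(pop, energies, i, rng)
        a = rng.choice(pop)
        trial2 = swap_fragments(target, [a, sl], rng)
        sl = pick_lower_energy(pop, energies, i, rng)
        a, b = rng.sample(pop, 2)
        trial3 = swap_fragments(a, [sl, b], rng)
    return trial1, trial2, trial3


# Toy usage: each entry is (member id, residue index) so swaps are traceable.
rng = random.Random(0)
pop = [[(k, p) for p in range(10)] for k in range(6)]
energies = [5.0, 3.0, 4.0, 1.0, 2.0, 6.0]
even_trials = make_trials(pop, energies, i=0, g=0, rng=rng)
odd_trials = make_trials(pop, energies, i=0, g=1, rng=rng)
```

In the second pool the lower-energy conformation C_SL acts either as the base or as a fragment donor, which biases mutation toward better regions of the conformational space while the first pool keeps the sampling purely random.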

The beneficial effects of the invention are as follows: mutation strategies from different strategy pools are selected according to the generation number of the population to guide mutation, which improves population diversity and alleviates the low sampling efficiency of traditional evolutionary algorithms; and the lower bound estimation function assists conformation selection, which improves selection efficiency, mitigates prediction errors caused by an inaccurate energy function, and improves prediction accuracy.
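Conformation selection relies on the position coordinates of step 5) and on the nearest-neighbour search of steps 6.8) and 6.9). A minimal Python sketch of those two operations follows, assuming that Cα coordinates are given as a sequence of (x, y, z) tuples and that the Euclidean distance is taken directly on the flattened vectors without prior superposition. The function names are hypothetical.

```python
import math


def position_vector(ca_coords):
    """Flatten len C-alpha (x, y, z) triples into one vector of 3*len elements."""
    return [value for atom in ca_coords for value in atom]


def nearest_in_population(trial_vec, population_vecs):
    """Return (index, distance) of the population member closest to trial_vec."""
    dists = [math.dist(trial_vec, vec) for vec in population_vecs]
    k = min(range(len(dists)), key=dists.__getitem__)
    return k, dists[k]


# Toy usage with two-residue "conformations".
trial = position_vector([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
population = [
    position_vector([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]),
    position_vector([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]),
]
index, distance = nearest_in_population(trial, population)  # (1, 1.0)
```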

Drawings

FIG. 1 shows the distribution of conformations obtained when the lower bound estimation dynamic strategy protein structure prediction method samples protein 1GB1.

FIG. 2 is a schematic diagram of the conformation updates when the lower bound estimation dynamic strategy protein structure prediction method samples protein 1GB1.

FIG. 3 shows the three-dimensional structure of protein 1GB1 predicted by the lower bound estimation dynamic strategy protein structure prediction method.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIGS. 1 to 3, the method for predicting protein structure by a lower bound estimation dynamic strategy comprises steps 1) to 8) as set out in the Disclosure of Invention above.

Taking the α/β protein 1GB1, with a sequence length of 56, as an example, the method for predicting protein structure by a lower bound estimation dynamic strategy comprises steps 1) to 8) as set out above, with the parameters of step 3) set as follows: the population size NP = 100, the maximum number of generations G = 1000, the crossover factor CR = 0.5, the temperature factor β = 2, the slope control factor M = 10000, and the initial generation counter g = 0.
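The parameter values given in step 3) of this embodiment (NP = 100, G = 1000, CR = 0.5, β = 2, M = 10000, g = 0) can be collected in a small configuration object. The Python sketch below is only an illustration: the class name, field names and the `generations` helper are hypothetical, and the loop bound follows the termination rule of step 7), which stops once g exceeds G.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunParameters:
    """Parameter values of the 1GB1 embodiment (field names are illustrative)."""
    population_size: int = 100            # NP
    max_generations: int = 1000           # G
    crossover_factor: float = 0.5         # CR
    temperature_factor: float = 2.0       # beta
    slope_control_factor: float = 10000.0  # M


def generations(params):
    """Yield g = 0, 1, ..., G, i.e. iterate until g > G as in step 7)."""
    g = 0
    while g <= params.max_generations:
        yield g
        g += 1


params = RunParameters()
n_generations = sum(1 for _ in generations(params))
```

Freezing the dataclass keeps the settings immutable during a run, so every generation sees the same NP, β and M.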

Taking the α/β protein 1GB1, with a sequence length of 56, as an example, the above method was used to obtain a near-native conformation of the protein. After running 1000 generations, the average root-mean-square deviation between the obtained structures and the native structure is [value missing], and the minimum root-mean-square deviation is [value missing]; the predicted three-dimensional structure is shown in FIG. 3.
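For reference, the root-mean-square deviation between a predicted structure and the native structure is conventionally defined over matched atoms after optimal superposition. The expression below is that conventional definition, written under the assumption that the deviation is measured on the Cα atoms; the text does not state which atoms or which superposition procedure were used, and the symbols r_k^pred and r_k^native are introduced here only for illustration.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{len}\sum_{k=1}^{len}
  \left\lVert r_k^{\mathrm{pred}} - r_k^{\mathrm{native}} \right\rVert^{2}}
```

Here len is the length of the protein sequence (56 for 1GB1), and r_k^pred and r_k^native are the coordinates of the k-th Cα atom in the superposed predicted structure and in the native structure.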

The foregoing describes one embodiment of the invention. The invention is evidently not limited to this embodiment and may be practiced with various modifications without departing from its basic spirit or going beyond its substantive content.

Claims

1. A method for predicting protein structure by a lower bound estimation dynamic strategy, characterized in that the method comprises the following steps:

1) provide the sequence information of the target protein;

2) obtain fragment library files from the ROBETTA server according to the target protein sequence, the fragment library files comprising a 3-fragment library file and a 9-fragment library file;

3) set parameters: the population size NP, the maximum number of generations G, the temperature factor β and the slope control factor M; initialize the generation counter g = 0;

5) for each conformation C_i, combine the three-dimensional coordinates of every Cα atom into the position coordinates of the conformation, whose j-th element is the j-th dimension of the spatial position coordinates of the i-th conformation, j = 1, 2, …, 3len, len being the length of the protein sequence;

6.1) set C_i as the target individual;

6.2) randomly select three mutually different individuals C_a1, C_b1 and C_c1 from the population; randomly select one 3-fragment from each of C_a1 and C_b1, at different positions, and use them to replace the fragments at the corresponding positions of C_c1, generating the mutant conformation C_trial1;

6.3) randomly select four mutually different individuals C_a2, C_b2, C_c2 and C_d2 from the population; randomly select one 3-fragment from each of C_a2, C_b2 and C_c2, at different positions, and use them to replace the fragments at the corresponding positions of C_d2, generating the mutant conformation C_trial2;

6.4) randomly select two mutually different individuals C_a3 and C_b3 from the population; randomly select one 3-fragment from each of C_a3 and C_b3, at different positions, and use them to replace the fragments at the corresponding positions of the target individual, generating the mutant conformation C_trial3;

6.5) randomly select from the population a conformation C_SL whose energy is lower than that of the target individual; if the target individual is the lowest-energy conformation in the population, randomly select a conformation from the whole population as C_SL; then randomly select two mutually different conformations C_a4 and C_b4 from the whole population, subject to [condition missing]; randomly select one 3-fragment from each of C_a4 and C_b4, at different positions, and use them to replace the fragments at the corresponding positions of C_SL, generating the mutant conformation C_trial1;

6.6) randomly select from the population a conformation C_SL whose energy is lower than that of the target individual; if the target individual is the lowest-energy conformation in the population, randomly select a conformation from the whole population as C_SL; then randomly select a conformation C_a5 from the whole population, subject to [condition missing]; randomly select one 3-fragment from each of C_a5 and C_SL, at different positions, and use them to replace the fragments at the corresponding positions of the target individual, generating the mutant conformation C_trial2;

6.7) randomly select from the population a conformation C_SL whose energy is lower than that of the target individual; if the target individual is the lowest-energy conformation in the population, randomly select a conformation from the whole population as C_SL; then randomly select two mutually different conformations C_a6 and C_b6 from the whole population, subject to [condition missing]; randomly select one 3-fragment from each of C_SL and C_b6, at different positions, and use them to replace the fragments at the corresponding positions of C_a6, generating the mutant conformation C_trial3;