CN111951885B - Protein structure prediction method based on local bias

Info

Publication number: CN111951885B
Application number: CN202010803348.XA
Authority: CN
Inventors: 彭绍亮; 陈健; 王小奇; 陈东; 李肯立
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2020-08-11
Filing date: 2020-08-11
Publication date: 2022-05-03
Anticipated expiration: 2040-08-11
Also published as: CN111951885A

Abstract

The invention belongs to the fields of bioinformatics, intelligent optimization and computer applications, and discloses a protein structure prediction method based on local bias. The method comprises the following steps: calculating the hydrophobic scale difference between the mutation window fragment of a target individual and each fragment in the fragment library, together with the secondary structure score of each fragment in the fragment library; scoring and ranking the fragments in the fragment library; selecting one of the best fragments for fragment assembly and judging by a Monte Carlo mechanism whether it is accepted, so as to determine a variant individual; calculating the secondary structure scores of the crossover fragments of the variant individual and of a random individual to determine a crossover individual; and using a random number to decide whether the target individual and the crossover individual are compared by energy value or by secondary structure score, so as to select the next-generation target individual. The invention avoids the shortcomings of traditional conformational space optimization methods, namely low sampling efficiency and low prediction accuracy, and realizes an improved structural model scoring method based on the hydrophobic characteristics and the local structural characteristics of amino acids.

Description

Protein structure prediction method based on local bias

The technical field is as follows:

The invention relates to the fields of bioinformatics, intelligent optimization and computer applications, and in particular to a protein structure prediction method based on local bias.

Background art:

Protein tertiary structure prediction is one of the major research problems in structural biology. Proteins are long sequences built from 20 different amino acid residues that fold into unique three-dimensional structures under specific conditions and thereby perform their biological functions. Computational prediction of protein structure has become the mainstream approach in this field. De novo prediction is one of the methods for accurately predicting the three-dimensional structure of a protein from its one-dimensional amino acid sequence, but the complexity and high dimensionality of the conformational search space are its main bottleneck.

Protein folding is a very complicated process. Among the factors that influence it, the hydrophobic interaction of amino acids plays one of the main roles, so taking the hydrophilicity and hydrophobicity of amino acids into account can help improve the sampling efficiency of de novo prediction. The basic determinant of a protein's structure is its one-dimensional amino acid sequence, which coils and folds into a protein molecule with a definite spatial structure. Jointly considering the primary structure of the protein, i.e. the one-dimensional amino acid sequence, and secondary structure information therefore helps to further improve the efficiency and accuracy of structure prediction.

However, existing conformational space optimization methods are deficient in prediction accuracy and sampling efficiency. Constructing a model that combines the above factors can therefore improve on the existing methods.

The invention content is as follows:

In order to overcome the low sampling efficiency and low prediction accuracy of conventional protein conformation optimization methods, the invention provides a locally biased protein structure prediction method with high sampling efficiency and high prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A method for predicting protein structure based on local bias, the method comprising the following steps:

1) providing the input sequence information;

2) predicting the secondary structure information of the target protein by utilizing a PSIPRED platform, and constructing a fragment library by utilizing a ROSETTA platform;

3) initializing parameters: setting the population size Ps, the iteration counter G, the maximum number of generations G_max, the initial population search trajectory length N, the crossover fragment length c, the segment length l, the mutation counter T, and the maximum count value T_max;

4) initializing the population: starting Ps Monte Carlo trajectories and searching each trajectory N times to generate Ps initial individuals;

5) for each target individual x_i, i ∈ {1, 2, ..., Ps}, proceeding as follows:

5.1) performing a mutation operation on individual x_i:

5.1.1) randomly generating an integer d′ ∈ [1, l−m], and then determining the fragment insertion window [d′, d′+m] of individual x_i, where m is the window size;

5.1.2) calculating, according to the corresponding formula, the hydrophobic scale difference ΔR^h between the window fragment and each fragment in the fragment library, the formula being expressed in terms of the hydrophobicity value of the i-th residue in the window fragment of individual x_i and the hydrophobicity value of the i-th residue of the fragment in the fragment library;

5.1.3) determining, from the predicted secondary structure, the secondary structure S_pre = {sec_k | d′ ≤ k ≤ d′+m} of the target protein in the corresponding window region, where sec_k ∈ {H, E, L} is the predicted secondary structure type of the k-th residue in the target protein window region, and H, E and L denote α-helix, β-sheet and loop region, respectively;

5.1.4) calculating, according to the corresponding formula, the secondary structure scores of the fragments in the fragment library one by one, the formula being expressed in terms of the secondary structure type corresponding to the k-th residue of fragment h in the fragment library;

5.1.5) calculating, according to the corresponding formula, the score of every fragment in the fragment library and ranking the fragments from high to low, where w₁ and w₂ are the weights of the hydrophobic scale difference and of the secondary structure score, respectively, ΔR^h denotes the hydrophobic scale difference between the h-th fragment in the fragment library and the target window, and the remaining term denotes the secondary structure score of the h-th fragment in the fragment library;

5.1.6) randomly selecting one fragment from the n highest-scoring fragments and performing fragment assembly on individual x_i, then judging by a Monte Carlo mechanism whether the fragment insertion is accepted; if it is accepted, obtaining the variant individual x′_i and proceeding to step 5.2); otherwise proceeding to step 5.1.7);

5.1.7) updating the counter T; if T < T_max, returning to step 5.1.6); otherwise performing fragment assembly directly to generate the variant individual x′_i and resetting T = 0;

5.2) randomly selecting an individual x_j, j ∈ {1, 2, ..., Ps} and j ≠ i, and performing the following crossover operation:

5.2.1) generating a random integer d′ ∈ [1, l−c] and determining the crossover region [d′, d′+c];

5.2.2) determining, from the predicted secondary structure, the secondary structure S′_pre = {sec′_k | d′ ≤ k ≤ d′+c} of the target protein in the crossover region, where sec′_k ∈ {H, E, L} is the predicted secondary structure type of the k-th residue in the target protein crossover region;

5.2.3) determining the secondary structure of individual x′_i using DSSP, thereby determining the secondary structure sequence corresponding to the crossover region, each element of which is the secondary structure type corresponding to the k-th residue in x′_i;

5.2.4) calculating, according to the corresponding formula, the secondary structure score of the crossover fragment in individual x′_i, the formula being expressed in terms of the secondary structure type corresponding to the k-th residue in x′_i;

5.2.5) determining, using DSSP, the secondary structure sequence corresponding to the crossover region in individual x_j;

5.2.6) calculating, according to the corresponding formula, the secondary structure score of the crossover fragment in individual x_j;

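5.2.7) comparing the secondary structure score of the crossover fragment in x′_i with that of the crossover fragment in x_j; if the comparison condition holds, setting x″_i = x′_i and proceeding to step 5.3); otherwise performing step 5.2.8);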

5.2.8) replacing the corresponding fragment in individual x′_i with the crossover fragment of individual x_j to generate the crossover individual x″_i;

5.3) performing the following selection operation on the target individual and the crossover individual:

5.3.1) generating a random value rn ∈ [0, 1]; if rn > 0.5, proceeding to step 5.3.2); otherwise proceeding to step 5.3.3);

5.3.2) calculating the energies E_i and E″_i of the target individual x_i and the crossover individual x″_i, respectively; if E″_i < E_i, x″_i replaces x_i as the next-generation target individual; otherwise no replacement is performed and x_i is kept as the next-generation target individual; then proceeding to step 6);

5.3.3) calculating, according to the corresponding formula, the secondary structure scores S_T and S″_T of the target individual x_i and the crossover individual x″_i, respectively; if S″_T > S_T, x″_i replaces x_i as the next-generation target individual; otherwise x_i is kept as the next-generation target individual; then proceeding to step 6);

6) after step 5) has been executed for every individual in the population, updating the iteration count G = G + 1 and judging whether G is greater than G_max; if G > G_max, stopping the iteration and exiting; otherwise returning to step 5).
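The selection rule of step 5.3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: `energy` and `ss_score` are hypothetical callables standing in for the energy function and the secondary structure score, and `select_next_generation` is an illustrative name.

```python
import random
from typing import Callable, TypeVar

Conf = TypeVar("Conf")  # any representation of a conformation


def select_next_generation(
    target: Conf,
    crossed: Conf,
    energy: Callable[[Conf], float],    # hypothetical: lower is better
    ss_score: Callable[[Conf], float],  # hypothetical: higher is better
    rng: random.Random,
) -> Conf:
    """Return the individual that survives into the next generation."""
    rn = rng.random()  # step 5.3.1): random value in [0, 1)
    if rn > 0.5:
        # step 5.3.2): compare by energy, keep the lower-energy individual
        return crossed if energy(crossed) < energy(target) else target
    # step 5.3.3): compare by secondary structure score, keep the higher score
    return crossed if ss_score(crossed) > ss_score(target) else target
```

Choosing the criterion at random for each individual means that neither the energy nor the secondary structure score alone drives the population update.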

The technical concept of the invention is as follows: within the basic framework of an evolutionary algorithm, mutation and crossover based on the amino acid hydrophobic scale and secondary structure similarity are performed on each target individual; a Monte Carlo mechanism and an energy function then guide the population update, so that promising conformations are selected into the next-generation population.
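The mutation bias toward hydrophobically and structurally similar fragments can be illustrated with the sketch below. Several choices in it are assumptions and are not taken from the text: the Kyte-Doolittle hydropathy scale, the difference taken as a sum of absolute per-residue differences, the secondary structure score taken as the fraction of matching positions, and the sign convention that subtracts the hydrophobic difference so that closer fragments rank higher. All function names are illustrative.

```python
import random

# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_difference(window_seq, fragment_seq):
    """Assumed form: sum of absolute per-residue hydropathy differences."""
    return sum(abs(KD[a] - KD[b]) for a, b in zip(window_seq, fragment_seq))


def ss_agreement(predicted_ss, fragment_ss):
    """Assumed form: fraction of positions whose H/E/L type matches."""
    matches = sum(p == f for p, f in zip(predicted_ss, fragment_ss))
    return matches / len(predicted_ss)


def rank_fragments(window_seq, predicted_ss, library, w1=1.0, w2=1.0):
    """Sort (sequence, secondary_structure) pairs from highest to lowest score.

    Assumed sign convention: a smaller hydrophobic difference is better,
    so it is subtracted from the weighted secondary structure agreement.
    """
    def score(entry):
        seq, ss = entry
        return (w2 * ss_agreement(predicted_ss, ss)
                - w1 * hydrophobic_difference(window_seq, seq))
    return sorted(library, key=score, reverse=True)


def pick_from_top(ranked, n, rng):
    """Randomly choose one of the n best-ranked fragments."""
    return rng.choice(ranked[:n])
```

Picking at random among the top n fragments, rather than always taking the single best one, keeps some diversity in the sampled conformations.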

The beneficial effects of the invention are as follows: on the one hand, a conformational space sampling strategy is designed from the hydrophobic properties of amino acids and secondary structure knowledge, which improves search efficiency; on the other hand, the Monte Carlo mechanism and the energy function jointly guide the population update, which greatly improves prediction accuracy.
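The text does not spell out the acceptance rule behind the Monte Carlo mechanism. A common choice in fragment-assembly methods is the Metropolis criterion, sketched below together with the retry counter of steps 5.1.6) and 5.1.7). The temperature parameter `kT` and the callables `propose` and `energy` are hypothetical placeholders, and the use of the Metropolis rule itself is an assumption.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng):
    """Metropolis criterion (assumed rule): always accept moves that do not
    raise the energy; accept uphill moves with probability
    exp(-(e_new - e_old) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def mutate_with_retries(current, propose, energy, kT, t_max, rng):
    """Retry fragment insertion until one is accepted or the counter
    reaches t_max, then insert a fragment unconditionally."""
    e_old = energy(current)
    t = 0  # mutation counter T
    while True:
        candidate = propose(current)       # fragment assembly attempt
        if metropolis_accept(e_old, energy(candidate), kT, rng):
            return candidate               # accepted variant individual
        t += 1                             # rejected: update the counter
        if t >= t_max:
            return propose(current)        # direct assembly; counter restarts at 0 next call
```

The unconditional insertion after `t_max` rejections guarantees that every target individual yields a variant individual, at the cost of occasionally accepting an unfavourable move.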

Description of the drawings:

FIG. 1 is a flow chart of the protein structure prediction method based on local bias;

FIG. 2 is a schematic diagram of the conformation update when the protein structure prediction method based on local bias is used to predict the structure of protein 1ail;

FIG. 3 is a three-dimensional structure diagram obtained when the protein structure prediction method based on local bias is used to predict the structure of protein 1ail.

The specific implementation mode is as follows:

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Referring to FIGS. 1 to 3, a method for predicting protein structure based on local bias comprises the following steps:

1) providing the input sequence information;

2) predicting the secondary structure information of the target protein by using a PSIPRED platform, and constructing a fragment library by using a ROSETTA platform;

3) initializing parameters: setting the population size Ps = 100, the iteration counter G = 0, the maximum number of generations G_max = 200, the initial population search trajectory length N = 2500, the crossover fragment length c = 6, the segment length l = 6, the mutation counter T = 0, and the maximum count value T_max = 150;

5.1) performing a mutation operation on individual x_i:

5.1.1) randomly generating an integer d′ ∈ [1, l−m], and then determining the fragment insertion window [d′, d′+m] of individual x_i, where m is the window size;

5.1.2) calculating the hydrophobic scale difference according to the corresponding formula, in which the library term is the hydrophobicity value of the i-th residue of the fragment in the fragment library;

5.1.4) calculating the secondary structure score of each fragment in the fragment library according to the corresponding formula;

5.1.5) calculating, according to the corresponding formula, the score of every fragment in the fragment library and ranking the fragments from high to low, where w₁ and w₂ are the weights of the hydrophobic scale difference and of the secondary structure score, respectively, ΔR^h denotes the hydrophobic scale difference between the h-th fragment in the fragment library and the target window, and the remaining term denotes the secondary structure score of the h-th fragment in the fragment library;

5.1.6) randomly selecting one fragment from the n highest-scoring fragments and performing fragment assembly on individual x_i, then judging by a Monte Carlo mechanism whether the fragment insertion is accepted; if it is accepted, obtaining the variant individual x′_i and proceeding to step 5.2); otherwise proceeding to step 5.1.7);

5.2.1) generating a random integer d′ ∈ [1, l−c] and determining the crossover region [d′, d′+c];

where each element of the secondary structure sequence is the secondary structure type corresponding to the k-th residue in x′_i;

5.2.4) calculating, according to the corresponding formula, the secondary structure score of the crossover fragment in individual x′_i;

5.2.5) determining, using DSSP, the secondary structure sequence corresponding to the crossover region in individual x_j;

5.2.6) calculating, according to the corresponding formula, the secondary structure score of the crossover fragment in individual x_j;

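5.2.7) comparing the secondary structure score of the crossover fragment in x′_i with that of the crossover fragment in x_j; if the comparison condition holds, setting x″_i = x′_i and proceeding to step 5.3); otherwise performing step 5.2.8);

5.2.8) replacing the corresponding fragment in individual x′_i with the crossover fragment of individual x_j to generate the crossover individual x″_i;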

5.3.3) calculating, according to the corresponding formula, the secondary structure scores S_T and S″_T of the target individual x_i and the crossover individual x″_i, respectively; if S″_T > S_T, x″_i replaces x_i as the next-generation target individual; otherwise x_i is kept as the next-generation target individual; then proceeding to step 6);
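A secondary structure score of the kind compared in the steps above can be sketched as follows. The scoring formula is not reproduced in the text, so this sketch assumes a simple count of positions where the DSSP-derived state agrees with the predicted state; the reduction of DSSP's eight codes to H, E and L uses a common convention and is likewise an assumption. The function takes DSSP codes as a plain string and does not call DSSP itself; it also uses 0-based, half-open indexing rather than the window notation of the steps.

```python
# Common 8-to-3 state reduction of DSSP codes (assumed convention):
# H, G, I -> helix (H); E, B -> strand (E); everything else -> loop (L).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_codes):
    """Map a string of DSSP codes to the three states H, E and L."""
    return "".join(DSSP_TO_3.get(code, "L") for code in dssp_codes)


def secondary_structure_score(dssp_codes, predicted_ss, start=0, length=None):
    """Count residues in [start, start + length) whose reduced DSSP state
    matches the predicted state (assumed scoring rule)."""
    if len(dssp_codes) != len(predicted_ss):
        raise ValueError("both strings must cover the same residues")
    end = len(predicted_ss) if length is None else start + length
    observed = reduce_dssp(dssp_codes)
    return sum(o == p for o, p in zip(observed[start:end], predicted_ss[start:end]))
```

With `length=None` the function scores a whole individual, as in step 5.3.3); passing `start` and `length` restricts it to a crossover region, as in steps 5.2.4) and 5.2.6).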

Taking the α protein 1ail, with a sequence length of 70, as an example, the method described above was used to obtain a near-native conformation of the protein. The minimum root mean square deviation and the mean root mean square deviation were evaluated over the 200 individuals in the final generation of the population. The predicted structure is shown in FIG. 3.
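For reference, the root mean square deviation between a predicted conformation and a reference structure can be computed as in the sketch below. It assumes the two coordinate sets (for example Cα atoms) are already optimally superimposed and listed in the same residue order; the superposition step is omitted, and the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates that
    are assumed to be superimposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def population_rmsd_summary(population, reference):
    """Return (minimum, mean) RMSD of a population against a reference."""
    values = [rmsd(individual, reference) for individual in population]
    return min(values), sum(values) / len(values)
```

The minimum value identifies the conformation closest to the reference, while the mean indicates how well the final population as a whole has converged.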

The above describes the prediction results of the invention using protein 1ail as an example and is not intended to limit the scope of the invention; various modifications and improvements may be made without departing from that scope.

Claims

1. A method for predicting protein structure based on local bias, comprising the following steps:

1) providing the input sequence information;

5) for each target individual x_i, i ∈ {1, 2, ..., Ps}, proceeding as follows:

5.1) performing a mutation operation on individual x_i:

5.1.1) randomly generating an integer d′ ∈ [1, l−m], and then determining the fragment insertion window [d′, d′+m] of individual x_i, where m is the window size;

5.1.2) calculating the hydrophobic scale difference according to the corresponding formula, in which the library term is the hydrophobicity value of the i-th residue of the fragment in the fragment library;

5.1.4) calculating the secondary structure score of each fragment in the fragment library according to the corresponding formula, the formula being expressed in terms of the secondary structure type corresponding to the k-th residue of fragment h in the fragment library;

5.1.5) calculating the score of each fragment in the fragment library according to the corresponding formula;

5.1.7) updating the counter T; if T < T_max, returning to step 5.1.6); otherwise performing fragment assembly directly to generate the variant individual x′_i and resetting T = 0;

5.2.2) determining, from the predicted secondary structure, the secondary structure S′_pre = {sec′_k | d′ ≤ k ≤ d′+c} of the target protein in the crossover region, where sec′_k ∈ {H, E, L} is the predicted secondary structure type of the k-th residue in the target protein crossover region;