CN107180164B

CN107180164B - Template-based multi-domain protein structure assembly method

Info

Publication number: CN107180164B
Application number: CN201710256156.XA
Authority: CN
Inventors: Zhang Guijun (张贵军); Zhou Xiaogen (周晓根); Hao Xiaohu (郝小虎); Wang Liujing (王柳静)
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2017-04-19
Filing date: 2017-04-19
Publication date: 2020-02-21
Anticipated expiration: 2037-04-19
Also published as: CN107180164A

Abstract

First, each domain protein is aligned once with the protein structure alignment tool TM-align according to its structure, and the optimal template is found in a multi-domain protein database. Then, a rotation-translation matrix is obtained by the Kabsch method, each domain protein is superimposed on the template, and each domain protein is translated and rotated so that the distance between adjacent domain proteins equals the minimum allowable distance. Next, the assembly structure is adjusted by random translations and rotations, and its quality is measured by the clash factor between the domain proteins, the number of interacting atoms, and the displacement of the assembly structure relative to the template. During assembly, adjacent domain proteins are assembled in sequence and the already assembled structures are kept fixed; after all domains have been assembled, the final assembly result is output. The invention provides a template-based multi-domain protein structure assembly method with high prediction accuracy.

Description

Template-based multi-domain protein structure assembly method

Technical Field

The invention relates to the fields of bioinformatics, intelligent optimization and computer applications, and in particular to a template-based multi-domain protein structure assembly method.

Background

Large proteins are usually composed of multiple independently folding domains, and determining the structures of multi-domain proteins can significantly advance biological research. Domains generally have compact three-dimensional structures and specific biological functions, and the same domain may occur in combination with different domains. In addition, the three-dimensional structures of the individual domains of many multi-domain proteins have already been determined by X-ray diffraction, nuclear magnetic resonance, or computational prediction. Obtaining the structure of a multi-domain protein from the structures of its single domains is therefore an important step toward determining the structure of the full-length protein and understanding its biological function.

At present, two types of methods are commonly used to predict multi-domain protein structures from single-domain structures. The first type keeps the structure of each single domain fixed and assembles the domains by alignment. The second type assembles the entire multi-domain protein by enumerating the conformations of the linkers that connect the domains. The first type can be regarded as a protein-protein docking problem, so some docking methods can also be used to assemble multi-domain protein structures. The second type can instead be regarded as de novo structure prediction of the relatively short linker sequences between domains; its sampling space is small because only the linker conformations change. However, both approaches lack template guidance, so the relative orientation of the domains cannot be determined during assembly, which results in low prediction accuracy.

Existing multi-domain protein structure assembly methods therefore have shortcomings in prediction accuracy that need to be addressed.

Disclosure of Invention

To overcome the insufficient prediction accuracy of existing multi-domain protein structure assembly methods, the invention provides a template-based multi-domain protein structure assembly method with higher prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A template-based multi-domain protein structure assembly method comprises the following steps:

1) inputting the three-dimensional structure of each single-domain protein and the sequence information of the corresponding multi-domain protein;

2) setting the maximum number of iterations $I_{max}$, the clash distance threshold $d_{clash}$, the interaction threshold $d_{contact}$, and the interacting-atom-number constant $n_0$;

3) performing the following for each multi-domain protein in the PDB (Protein Data Bank) library to determine the assembly template:

3.1) finding the optimal alignment position of the first domain protein with the protein structure alignment tool TM-align, and recording the template alignment score $\text{TM-score}_1$;

3.2) finding the optimal alignment position of the second domain protein with TM-align, starting from the last residue position of the first domain protein's alignment, and recording $\text{TM-score}_2$;

3.3) repeating step 3.2) to find the optimal alignment positions of the remaining domain proteins in turn, and recording $\text{TM-score}_3, \text{TM-score}_4, \ldots, \text{TM-score}_N$, where $N$ is the total number of domain proteins;

3.4) calculating the template score $score_t$ from the per-domain alignment scores, where $score_t$ represents the score of the $t$-th template, $\text{TM-score}_i$ denotes the alignment score of the $i$-th domain protein, and $L_i$ is the sequence length of the $i$-th domain protein;

3.5) calculating the score of every template through steps 3.1) to 3.4), and selecting the protein with the highest score as the template;
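The template search of steps 3.1) to 3.5) can be sketched in Python. This text does not reproduce the formula for $score_t$, so the sketch assumes, purely for illustration, a length-weighted average of the per-domain TM-scores; the TM-align runs are replaced by hypothetical precomputed scores, and `template_score` and `pick_template` are illustrative names.

```python
from typing import Dict, List, Tuple

def template_score(tm_scores: List[float], lengths: List[int]) -> float:
    """Step 3.4), ASSUMED form: length-weighted mean of per-domain TM-scores,
    sum(TM-score_i * L_i) / sum(L_i). The exact formula is not given here."""
    if len(tm_scores) != len(lengths) or not lengths:
        raise ValueError("need exactly one TM-score per domain")
    return sum(tm * n for tm, n in zip(tm_scores, lengths)) / sum(lengths)

def pick_template(candidates: Dict[str, List[float]],
                  lengths: List[int]) -> Tuple[str, float]:
    """Step 3.5): score every candidate template and keep the highest."""
    scores = {name: template_score(tms, lengths) for name, tms in candidates.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]

# Hypothetical TM-align results for a two-domain query (lengths 120 and 124).
candidates = {"templateA": [0.80, 0.60], "templateB": [0.70, 0.75]}
best, score = pick_template(candidates, [120, 124])
print(best)  # templateB
```

With length weighting, a template that aligns the longer domain well is favoured over one that aligns only the shorter domain well.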

4) superimposing each domain protein onto the template as follows:

4.1) pairing the Cα atoms of the query protein one by one with the Cα atoms of the template, and then obtaining by the Kabsch method the rotation matrix $(u_{st})$ and the translation vector $(t_1, t_2, t_3)$, where $u_{st}$ ($s = 1,2,3$; $t = 1,2,3$) denotes the $t$-th element of the $s$-th row of the rotation matrix and $t_s$ denotes the $s$-th component of the translation vector;

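4.2) applying the rotation and translation to each Cα atom of the query protein:

$$\hat{x}^{n,m}_s = \sum_{t=1}^{3} u_{st}\, x^{n,m}_t + t_s, \quad s = 1,2,3,$$

where $x^{n,m}_s$ represents the $s$-th coordinate of the $m$-th Cα atom of the $n$-th domain protein and $\hat{x}^{n,m}_s$ is the transformed coordinate;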
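Steps 4.1) and 4.2) follow the standard Kabsch superposition. The NumPy sketch below assumes the Cα atoms of the domain and the template have already been paired one to one (for example from the TM-align alignment) and stored as $(M, 3)$ arrays; `kabsch` and `superpose` are illustrative names, not functions of TM-align or any other tool.

```python
import numpy as np

def kabsch(query: np.ndarray, template: np.ndarray):
    """Return (u, t) minimising sum_m ||u @ q_m + t - p_m||^2.

    `query` and `template` are (M, 3) arrays of C-alpha coordinates whose
    rows are already paired one to one, as in step 4.1).
    """
    q_mean = query.mean(axis=0)
    p_mean = template.mean(axis=0)
    h = (query - q_mean).T @ (template - p_mean)  # 3x3 covariance matrix
    v, _, wt = np.linalg.svd(h)                   # h = v @ diag(s) @ wt
    d = np.sign(np.linalg.det(wt.T @ v.T))        # -1 would mean a reflection
    u = wt.T @ np.diag([1.0, 1.0, d]) @ v.T       # rotation matrix (u_st)
    t = p_mean - u @ q_mean                       # translation (t_1, t_2, t_3)
    return u, t

def superpose(coords: np.ndarray, u: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Step 4.2): x_hat_s = sum_t u_st * x_t + t_s for every row of `coords`."""
    return coords @ u.T + t

# Self-check on synthetic data: a known rigid motion must be recovered.
rng = np.random.default_rng(0)
query = rng.normal(size=(10, 3))
angle = 0.7
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
template = query @ rot.T + np.array([1.0, -2.0, 0.5])
u, t = kabsch(query, template)
print(np.allclose(superpose(query, u, t), template))  # True
```

The sign correction `d` keeps `u` a proper rotation (determinant +1) rather than a reflection.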

5) fixing the position of the $n$-th domain protein and translating the $(n+1)$-th domain protein so that the distance between the connection points, namely the last Cα atom of the $n$-th domain protein and the first Cα atom of the $(n+1)$-th domain protein, equals the minimum allowable distance. In this translation, $l_n$ is the length of the $n$-th domain protein, $x^{n,l_n}_s$ is the $s$-th coordinate of the last Cα atom of the $n$-th domain protein, $x^{n+1,1}_s$ is the $s$-th coordinate of the first Cα atom of the $(n+1)$-th domain protein, and $d_{n,n+1}$ is the Euclidean distance between these two atoms;
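Step 5) is a rigid translation of the $(n+1)$-th domain along the line joining the two connection atoms. The translation formula itself is not preserved in this text, so the sketch below is one possible form, and the target distance `d0 = 3.8` (a typical distance between consecutive Cα atoms, in Å) is an assumed stand-in for the minimum allowable distance; `place_next_domain` is an illustrative name.

```python
import math
from typing import List, Sequence

Vec = List[float]

def place_next_domain(prev_domain: Sequence[Vec], next_domain: Sequence[Vec],
                      d0: float = 3.8) -> List[Vec]:
    """Translate domain n+1 rigidly so that its first C-alpha atom lies at
    distance d0 from the last C-alpha atom of domain n (domain n stays fixed).

    ASSUMPTION: d0 = 3.8 stands in for the 'minimum allowable distance'.
    """
    last = prev_domain[-1]           # x^{n, l_n}
    first = next_domain[0]           # x^{n+1, 1}
    d = math.dist(last, first)       # d_{n,n+1}
    if d == 0:
        raise ValueError("connection atoms coincide; direction undefined")
    # Move 'first' along the line towards 'last' until the gap equals d0.
    scale = 1.0 - d0 / d
    shift = [(l - f) * scale for l, f in zip(last, first)]
    return [[x + s for x, s in zip(atom, shift)] for atom in next_domain]

prev = [[0.0, 0.0, 0.0]]
nxt = [[10.0, 0.0, 0.0], [11.0, 0.0, 0.0]]
moved = place_next_domain(prev, nxt)
```

Because every atom receives the same shift, the internal geometry of the $(n+1)$-th domain is unchanged; only the gap at the connection point is adjusted.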

6) calculating the Cα root-mean-square deviation $E_{RMSD}$ between the current protein and the template;
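Step 6) is a plain Cα RMSD. A minimal sketch, assuming the current assembly and the template are already in the same frame after step 4) and that their Cα atoms are paired row by row; `ca_rmsd` is an illustrative name.

```python
import math
from typing import Sequence

def ca_rmsd(model: Sequence[Sequence[float]],
            template: Sequence[Sequence[float]]) -> float:
    """E_RMSD of step 6): root-mean-square deviation over paired C-alpha atoms.

    ASSUMPTION: both structures share the template frame, so no extra
    superposition is performed here.
    """
    if len(model) != len(template) or not model:
        raise ValueError("need equally sized, non-empty atom lists")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(model, template))
    return math.sqrt(sq / len(model))

print(ca_rmsd([[0, 0, 0], [1, 0, 0]], [[0, 0, 1], [1, 0, 1]]))  # 1.0
```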

7) calculating the Euclidean distances between the Cα atoms of the $n$-th domain protein and the Cα atoms of the $(n+1)$-th domain protein, counting the number $n_{clash}$ of distances smaller than $d_{clash}$, recording the corresponding distances, and computing from them the inter-domain clash score $E_{clash}$;

8) counting the number $n_{contact}$ of the distances from step 7) that are smaller than $d_{contact}$, and calculating the interaction score $E_{contact}$;
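Steps 7) and 8) only count inter-domain Cα pairs below two thresholds, which the following sketch does with a double loop (adequate for a few hundred residues). The thresholds in the usage line are the embodiment values $d_{clash} = 3.75$ and $d_{contact} = 8$; the function name is illustrative.

```python
import math
from typing import List, Sequence, Tuple

def clash_and_contacts(dom_a: Sequence[Sequence[float]],
                       dom_b: Sequence[Sequence[float]],
                       d_clash: float,
                       d_contact: float) -> Tuple[int, List[float], int]:
    """Return (n_clash, clash distances, n_contact) for two domains' C-alpha atoms."""
    clash_distances: List[float] = []
    n_contact = 0
    for a in dom_a:
        for b in dom_b:
            d = math.dist(a, b)
            if d < d_clash:          # step 7): clashing pair, keep its distance
                clash_distances.append(d)
            if d < d_contact:        # step 8): interacting pair
                n_contact += 1
    return len(clash_distances), clash_distances, n_contact

print(clash_and_contacts([[0, 0, 0]], [[3, 0, 0], [5, 0, 0], [9, 0, 0]], 3.75, 8.0))
# (1, [3.0], 2)
```

As in step 8), every recorded distance is tested against $d_{contact}$, so a clashing pair also counts as a contact.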

9) calculating the energy of the current protein, $E = w_1 E_{RMSD} + w_2 E_{clash} + w_3 E_{contact}$, where $w_1$, $w_2$, $w_3$ are the respective weights;
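The energy of step 9) is a weighted sum. Because the formulas for $E_{clash}$ and $E_{contact}$ are not preserved in this text, the two term functions below are hypothetical placeholders (a relative-overlap penalty and a contact reward that saturates at $n_0$); only the weighted sum and the embodiment weights $w_1 = w_2 = 1$, $w_3 = 0.35$ come from the document.

```python
from typing import Sequence

def clash_term(clash_distances: Sequence[float], d_clash: float) -> float:
    """HYPOTHETICAL E_clash: each clashing pair adds (d_clash - d) / d_clash."""
    return sum((d_clash - d) / d_clash for d in clash_distances)

def contact_term(n_contact: int, n0: int) -> float:
    """HYPOTHETICAL E_contact: rewards interacting atoms, saturating at n0."""
    return -min(n_contact, n0) / n0

def energy(e_rmsd: float, e_clash: float, e_contact: float,
           w1: float = 1.0, w2: float = 1.0, w3: float = 0.35) -> float:
    """Step 9): E = w1*E_RMSD + w2*E_clash + w3*E_contact.
    Default weights are the embodiment values (w1 = w2 = 1, w3 = 0.35)."""
    return w1 * e_rmsd + w2 * e_clash + w3 * e_contact

print(round(energy(2.0, 0.5, -1.0), 2))  # 2.15
```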

10) iteratively determining the lowest-energy assembly structure as follows:

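10.1) determining the rotation axis $(X_1, X_2, X_3)$:

$$X_1 = \sqrt{1-\theta^2}\cos\varphi, \quad X_2 = \sqrt{1-\theta^2}\sin\varphi, \quad X_3 = \theta,$$

where $\theta = 1 - 2\,\mathrm{rand}[0,1]$, $\varphi = 2\pi\,\mathrm{rand}[0,1]$, and $\mathrm{rand}[0,1]$ is a random number between 0 and 1;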

10.2) randomly generating the rotation angle $\gamma = 2\,\mathrm{rand}[0,1] - 1$ and an assembly translation vector $(T_1, T_2, T_3)$, where $T_s = 0.3\,(2\,\mathrm{rand}[0,1] - 1)$, $s = 1,2,3$;

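10.3) determining the assembly rotation matrix $(U_{st})$ for the rotation by angle $\gamma$ about the axis $(X_1, X_2, X_3)$, where $\alpha = \cos\gamma$, $\beta = \sin\gamma$, and $U_{st}$ denotes the $t$-th element of the $s$-th row of the assembly rotation matrix, $s = 1,2,3$, $t = 1,2,3$;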

10.4) rotating and translating each Cα atom of the $(n+1)$-th domain protein, where $x^{n+1,1}_s$ denotes the $s$-th coordinate of the first Cα atom of the $(n+1)$-th domain protein and $x^{n+1,m}_s$ denotes the $s$-th coordinate of the $m$-th Cα atom of the $(n+1)$-th domain protein, $s = 1,2,3$;

10.5) calculating the energy of the current assembly structure according to steps 6) to 9), and accepting the current assembly structure if the energy decreases;
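One trial move of steps 10.1) to 10.4) can be sketched as below. Several details are assumptions because the corresponding formulas are not preserved in this text: $\gamma$ is taken in radians, the assembly rotation matrix is taken to be the standard axis-angle (Rodrigues) matrix built from $\alpha = \cos\gamma$ and $\beta = \sin\gamma$, and the rotation is taken to act about the first Cα atom of the $(n+1)$-th domain before the shift $T$ is added. `random_rigid_move` is an illustrative name.

```python
import math
import random
from typing import List, Sequence

Coords = List[List[float]]

def random_rigid_move(domain: Sequence[Sequence[float]], rng: random.Random) -> Coords:
    """One trial move of steps 10.1)-10.4) applied to domain n+1 (see ASSUMPTIONS above)."""
    # 10.1) random unit axis: X3 = theta = 1 - 2*rand, phi = 2*pi*rand
    theta = 1.0 - 2.0 * rng.random()
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - theta * theta))
    x1, x2, x3 = r * math.cos(phi), r * math.sin(phi), theta
    # 10.2) rotation angle gamma = 2*rand - 1 and shift T_s = 0.3*(2*rand - 1)
    gamma = 2.0 * rng.random() - 1.0
    shift = [0.3 * (2.0 * rng.random() - 1.0) for _ in range(3)]
    # 10.3) ASSUMED Rodrigues matrix with alpha = cos(gamma), beta = sin(gamma)
    a, b = math.cos(gamma), math.sin(gamma)
    c = 1.0 - a
    u = [[a + c * x1 * x1, c * x1 * x2 - b * x3, c * x1 * x3 + b * x2],
         [c * x1 * x2 + b * x3, a + c * x2 * x2, c * x2 * x3 - b * x1],
         [c * x1 * x3 - b * x2, c * x2 * x3 + b * x1, a + c * x3 * x3]]
    # 10.4) ASSUMED: rotate about the first C-alpha atom, then translate by T
    pivot = domain[0]
    moved: Coords = []
    for atom in domain:
        rel = [atom[k] - pivot[k] for k in range(3)]
        moved.append([sum(u[s][t] * rel[t] for t in range(3)) + pivot[s] + shift[s]
                      for s in range(3)])
    return moved

rng = random.Random(42)
domain = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 2.0, 1.0]]
moved = random_rigid_move(domain, rng)
```

Since the move is a rotation followed by a translation, all intra-domain distances are preserved; only the position and orientation of the domain relative to the fixed part change.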

11) repeating step 10) $I_{max}$ times; the structure retained after the last iteration is the assembled structure of the $n$-th and $(n+1)$-th domain proteins;
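Steps 10.5) and 11) amount to a greedy random search that accepts only energy-lowering moves. The sketch keeps the move and the energy as callbacks, so any move generator for steps 10.1) to 10.4) and any energy function for steps 6) to 9) can be plugged in; `refine`, `toy_energy` and `toy_move` are hypothetical names, and the toy energy and move exist only for the usage example.

```python
import math
import random
from typing import Callable, List

Coords = List[List[float]]

def refine(domain: Coords,
           energy_fn: Callable[[Coords], float],
           perturb_fn: Callable[[Coords, random.Random], Coords],
           i_max: int,
           rng: random.Random) -> Coords:
    """Steps 10) and 11): try I_max random moves of domain n+1 and keep a
    trial structure only when it lowers the energy (step 10.5)."""
    current, current_e = domain, energy_fn(domain)
    for _ in range(i_max):
        trial = perturb_fn(current, rng)       # 10.1)-10.4): one random move
        trial_e = energy_fn(trial)             # 10.5): energy of steps 6)-9)
        if trial_e < current_e:                # accept only if energy drops
            current, current_e = trial, trial_e
    return current                             # step 11): retained structure

# Toy usage: energy = distance of the first atom to a target point,
# move = random shift of up to 0.3 per axis (both hypothetical).
target = [5.0, 0.0, 0.0]

def toy_energy(coords: Coords) -> float:
    return math.dist(coords[0], target)

def toy_move(coords: Coords, rng: random.Random) -> Coords:
    shift = [0.3 * (2.0 * rng.random() - 1.0) for _ in range(3)]
    return [[x + dx for x, dx in zip(atom, shift)] for atom in coords]

start = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]
result = refine(start, toy_energy, toy_move, 2000, random.Random(1))
print(toy_energy(result) < toy_energy(start))  # True
```

Because only improvements are accepted, the energy never increases; with the embodiment setting $I_{max} = 30000$ the loop simply runs longer.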

12) after the $(n+1)$-th domain protein has been assembled, fixing the structure of the first $n+1$ domain proteins and assembling the $(n+2)$-th domain protein according to steps 5) to 11); this is repeated until all $N$ domain proteins have been assembled, and the final assembly structure is output.
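The sequential order of step 12) can be written as a short driver loop. `place` and `refine` are hypothetical callbacks standing for step 5) and steps 6) to 11); passing all previously assembled atoms to `refine` is an assumption made here so that the fixed part is available to the energy (a callback following step 7) literally may use only the last domain).

```python
from typing import Callable, List, Sequence

Coords = List[List[float]]

def assemble(domains: Sequence[Coords],
             place: Callable[[Coords, Coords], Coords],
             refine: Callable[[Coords, Coords], Coords]) -> List[Coords]:
    """Step 12): add domains one at a time in sequence order.

    place(prev_domain, next_domain) stands for step 5) and
    refine(fixed_atoms, next_domain) for steps 6)-11); both are
    hypothetical callbacks. Already assembled domains are never moved.
    """
    assembled: List[Coords] = [domains[0]]
    for nxt in domains[1:]:
        fixed = [atom for dom in assembled for atom in dom]  # frozen atoms
        placed = place(assembled[-1], nxt)
        assembled.append(refine(fixed, placed))
    return assembled

# Toy callbacks: put each new domain 1.0 along x after the previous one.
def toy_place(prev: Coords, nxt: Coords) -> Coords:
    shift = [prev[-1][k] + (1.0 if k == 0 else 0.0) - nxt[0][k] for k in range(3)]
    return [[atom[k] + shift[k] for k in range(3)] for atom in nxt]

def toy_refine(fixed: Coords, moved: Coords) -> Coords:
    return moved

print(assemble([[[0, 0, 0]], [[5, 5, 5]], [[9, 9, 9]]], toy_place, toy_refine))
# [[[0, 0, 0]], [[1.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]]
```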

The technical conception of the invention is as follows. First, each domain protein is aligned once with the protein structure alignment tool TM-align according to its structure, and the optimal template is found in a multi-domain protein database. Then, a rotation-translation matrix is obtained by the Kabsch method, each domain protein is superimposed on the template, and each domain protein is translated and rotated so that the distance between adjacent domain proteins equals the minimum allowable distance. Next, the assembly structure is adjusted by random translations and rotations, and its quality is measured by the clash factor between the domain proteins, the number of interacting atoms, and the displacement of the assembly structure relative to the template. During assembly, adjacent domain proteins are assembled in sequence and the already assembled structures are kept fixed; after all domains have been assembled, the final assembly result is output.

The beneficial effects of the invention are as follows: the template guides the assembly and supplies the orientation information of the assembled structure, and the quality of the assembled structure is measured by the clash and interaction factors between the domain proteins and by the deviation of the assembled structure from the template. This guides the assembly effectively and improves the prediction accuracy for the full-length protein.

Drawings

FIG. 1 is a flow diagram of a template-based multi-domain protein structure assembly method.

FIG. 2 shows the result of assembling the multi-domain protein 3nd1A with the template-based multi-domain protein structure assembly method.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, the template-based multi-domain protein structure assembly method includes steps 1) to 12) described above.


Taking the multi-domain protein 3nd1A, with a sequence length of 244, as an embodiment, the template-based multi-domain protein structure assembly method is carried out according to steps 1) to 12) above, with the following settings:

2) setting the maximum number of iterations $I_{max} = 30000$, the clash distance threshold $d_{clash} = 3.75$, the interaction threshold $d_{contact} = 8$, and the interacting-atom-number constant $n_0 = 87$;


9) calculating the energy of the current protein, $E = w_1 E_{RMSD} + w_2 E_{clash} + w_3 E_{contact}$, with weights $w_1 = w_2 = 1$ and $w_3 = 0.35$;


Taking the two-domain protein 3nd1A (sequence length 244) as an example, the above method assembled a near-native conformation of the multi-domain protein with a root-mean-square deviation of … and a TM-score of 0.997; the predicted structure is shown in FIG. 2.

The above describes the optimization effect of the invention using the 3nd1A protein as an example; it is not intended to limit the scope of the invention, and various modifications and improvements can be made without departing from that scope.

Claims

1. A template-based multi-domain protein structure assembly method, characterized in that the multi-domain protein structure assembly comprises the following steps:

3) The following operations were performed for each multi-domain protein in the PDB library, thereby determining an assembly template:

3.4) calculating the template score $score_t$, where $score_t$ represents the score of the $t$-th template, $\text{TM-score}_i$ denotes the alignment score of the $i$-th domain protein, and $L_i$ is the sequence length of the $i$-th domain protein;

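4.1) obtaining by the Kabsch method the rotation matrix $(u_{st})$ and the translation vector $(t_1, t_2, t_3)$, where $u_{st}$ ($s = 1,2,3$; $t = 1,2,3$) denotes the $t$-th element of the $s$-th row of the rotation matrix and $t_s$ denotes the $s$-th component of the translation vector;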

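4.2) applying the rotation and translation to each Cα atom of the query protein, $\hat{x}^{n,m}_s = \sum_{t=1}^{3} u_{st}\, x^{n,m}_t + t_s$ ($s = 1,2,3$), where $x^{n,m}_s$ is the $s$-th coordinate of the $m$-th Cα atom of the $n$-th domain protein;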

where $l_n$ is the length of the $n$-th domain protein;

7) calculating the Euclidean distances between the Cα atoms of the $n$-th domain protein and the Cα atoms of the $(n+1)$-th domain protein, counting the number $n_{clash}$ of distances smaller than $d_{clash}$, recording the corresponding distances, and computing the inter-domain clash score $E_{clash}$;

10) iteratively determining the lowest-energy assembly structure as follows:

10.1) determining the rotation axis:

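$X_1 = \sqrt{1-\theta^2}\cos\varphi$, $X_2 = \sqrt{1-\theta^2}\sin\varphi$, $X_3 = \theta$, where $\theta = 1 - 2\,\mathrm{rand}[0,1]$, $\varphi = 2\pi\,\mathrm{rand}[0,1]$, and $\mathrm{rand}[0,1]$ is a random number between 0 and 1;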

10.3) determining the assembly rotation matrix $(U_{st})$, where $\alpha = \cos\gamma$, $\beta = \sin\gamma$, and $U_{st}$ denotes the $t$-th element of the $s$-th row of the assembly rotation matrix, $s = 1,2,3$, $t = 1,2,3$;