CN107704725B

CN107704725B - Discontinuous multi-domain protein structure assembly method

Info

Publication number: CN107704725B
Application number: CN201710684511.3A
Authority: CN
Inventors: 张贵军; 周晓根; 郝小虎; 王柳静
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2017-08-11
Filing date: 2017-08-11
Publication date: 2020-12-01
Anticipated expiration: 2037-08-11
Also published as: CN107704725A

Abstract

A discontinuous multi-domain protein structure assembly method comprises the following steps. First, a protein sequence threading alignment tool is used to search a protein library quickly, and several of the highest-scoring templates are selected. Next, each single domain structure is superposed onto the template according to the threading alignment, so that the orientation information in the template is extracted. The superposed structure is then subjected to random rotation and translation operations; the quality of the current structure is measured by the inter-domain interaction and clash distances, a template control factor is added to keep the assembled structure from deviating from the template orientation, and a boundary distance factor is added to promote full connection of the boundaries between continuous and discontinuous domains. Finally, the optimal structure among those obtained from the several templates is selected according to energy. The invention provides a discontinuous multi-domain protein structure assembly method with high prediction accuracy.

Description

Discontinuous multi-domain protein structure assembly method

Technical Field

The invention relates to the fields of bioinformatics, intelligent optimization and computer applications, and in particular to a structure assembly method for discontinuous multi-domain proteins.

Background

More than 70% of the proteins in a protein library are multi-domain proteins, i.e. a single protein comprises several substructures. More than 40% of these multi-domain proteins contain one or more discontinuous domains; in the CATH protein database, for example, more than 15% of the proteins are multi-domain proteins, of which more than 18% contain at least one discontinuous domain. The sequence of a discontinuous domain is split into several segments, yet in the three-dimensional structure all of these segments belong to a single domain. Discontinuous multi-domain proteins therefore make up a large share of the whole protein library, and predicting their three-dimensional structures by assembly is extremely important.

At present, the most common way to assemble multi-domain protein structures is to sample the linker region between the continuous domain and the discontinuous domain: the parts of the discontinuous domain are held fixed, and the linker regions between each part and the continuous domain are sampled to obtain the optimal linker structure. The energy function used for this linker sampling is one designed for predicting single-domain protein structures. Although it is effective for single-domain proteins, energy terms such as interaction forces and solvent accessibility differ between the interior of a single domain and the interface between domains, so the function samples the connection between discontinuous and continuous domains poorly, and the accuracy of the predicted discontinuous protein structures is low.

Therefore, the existing discontinuous multi-domain protein structure assembly method has defects in prediction precision, and needs to be improved.

Disclosure of Invention

To overcome the low prediction accuracy of existing discontinuous multi-domain protein structure assembly methods, the invention provides a discontinuous multi-domain protein structure assembly method with high prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a method of discontinuous multi-domain protein structure assembly, the method comprising the steps of:

1) inputting sequence information of the protein to be assembled and the three-dimensional structure of each single domain;

2) setting the maximum number of iterations G_max, the number of assembly templates T, the clash distance threshold d_clash, and the interaction threshold d_contact;

3) searching a protein library with the protein sequence threading alignment tool FFAS-3D for the T highest-scoring template proteins;

4) performing the following operations for each template protein:

4.1) superposing each single domain of the protein to be assembled onto the template according to the FFAS-3D sequence alignment information;

4.2) fixing the discontinuous domain and translating the continuous domain inserted within it by a translation formula chosen such that the distance between the attachment point of the first part of the discontinuous domain and the attachment point of the continuous domain equals the clash distance threshold d_clash;

in that formula, x^L_{m,s} denotes the s-th coordinate of the m-th Cα atom of the discontinuous domain; the superscripts L and N are only labels that distinguish the discontinuous domain from the continuous domain; l is the sequence length of the discontinuous domain; s = 1, 2, 3; the formula also uses the s-th coordinate of the last Cα atom of the first part of the discontinuous domain and the s-th coordinate of the first Cα atom of the continuous domain; and d is the Euclidean distance between the first Cα atom of the continuous domain and the last Cα atom of the first part of the discontinuous domain;

4.3) calculating the energy of the current protein as follows:

4.3.1) calculating the pairwise distances between the Cα atoms of the continuous domain and those of the discontinuous domain;

4.3.2) collecting all distances d_c, c = 1, 2, ..., n_clash, that are smaller than the clash distance threshold d_clash, where n_clash is the total number of clash distances, and calculating the clash distance energy value E_clash;

4.3.3) counting the number n_contact of distances smaller than the interaction threshold d_contact and calculating the interaction distance energy value E_contact, where n₀ is the interacting-atom-number constant, taken as 0.306(l + q), and l and q are the sequence lengths of the discontinuous domain and the continuous domain, respectively;

4.3.4) calculating the distance between the last Cα atom of the first part of the discontinuous domain and the first Cα atom of the continuous domain, and the distance between the first Cα atom of the second part of the discontinuous domain and the last Cα atom of the continuous domain; the sum of these two distances is the boundary distance energy E_boundary;

4.3.5) calculating the root mean square deviation E_RMSD of the Cα atoms between the current protein and the template;

4.3.6) calculating the energy value of the current protein, E_old = E_clash + E_contact + E_boundary + E_RMSD;

4.4) determining the rotation and translation as follows:

4.4.1) randomly generating a 6-dimensional vector v = (v₁, v₂, ..., v₆), where v_j, j = 1, 2, ..., 6, is the j-th element of v and is a random number between 0 and 1;

4.4.2) determining the rotation axis z = (z₁, z₂, z₃), where z₁ and z₂ are functions of θ and φ, z₃ = θ, θ = 1 − 2v₁, and φ = 2πv₂;

4.4.3) determining the rotation matrix u from the rotation angle γ = 2v₃ − 1, where u_st, s = 1, 2, 3, t = 1, 2, 3, denotes the t-th element of the s-th row of the rotation matrix;

4.4.4) determining the translation vector p = (p₁, p₂, p₃), where p_s = 0.3(2v_{s+3} − 1), s = 1, 2, 3, is the s-th element of the translation vector;

4.4.5) rotating and translating all Cα atoms of the continuous domain, with the first Cα atom of the continuous domain as the rotation point, where x^N_{m,s}, s = 1, 2, 3, is the s-th coordinate of the m-th Cα atom of the continuous domain;

4.4.6) calculating the energy function value E_new of the current protein according to step 4.3);

4.4.7) if E_new < E_old, accepting the current assembled structure and setting E_old = E_new;

4.4.8) if E_new ≥ E_old, calculating the acceptance probability P, in which e is the base of the natural logarithm; if a random number rand(0,1) between 0 and 1 is less than P, accepting the current structure and setting E_old = E_new;

4.5) repeating step 4.4) until the number of iterations reaches G_max, and taking the lowest-energy structure as the assembly result for the current template;

5) comparing the energies of the structures assembled from the T templates and selecting the lowest-energy structure as the final assembled structure.
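The geometric operations of steps 4.2) and 4.4.1) to 4.4.5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The formulas themselves are not reproduced in this text, so the sketch assumes that step 4.2) shifts the continuous domain along the line joining the two junction atoms, that z₁ = √(1 − θ²)·cos φ and z₂ = √(1 − θ²)·sin φ, that the rotation matrix u has the standard axis-angle (Rodrigues) form, and that γ is in radians. All function names are hypothetical, and coordinates are plain lists of [x, y, z] Cα positions.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def translate_to_junction_distance(cont, anchor, d_clash):
    """Step 4.2 (assumed form): shift the continuous domain `cont` so that
    its first C-alpha atom lies at distance d_clash from `anchor`, the last
    C-alpha atom of the first part of the discontinuous domain."""
    first = cont[0]
    d = dist(anchor, first)
    if d == 0.0:
        raise ValueError("junction atoms coincide; direction is undefined")
    scale = 1.0 - d_clash / d
    shift = [scale * (a - f) for a, f in zip(anchor, first)]
    return [[x + t for x, t in zip(atom, shift)] for atom in cont]


def random_move_parameters(rng):
    """Steps 4.4.1 to 4.4.4: draw v and derive axis, angle and translation."""
    v = [rng.random() for _ in range(6)]
    theta = 1.0 - 2.0 * v[0]                     # z3 = theta = 1 - 2*v1
    phi = 2.0 * math.pi * v[1]                   # phi = 2*pi*v2
    r = math.sqrt(max(0.0, 1.0 - theta * theta))
    axis = (r * math.cos(phi), r * math.sin(phi), theta)  # z1, z2 assumed
    gamma = 2.0 * v[2] - 1.0                     # gamma = 2*v3 - 1
    shift = [0.3 * (2.0 * v[s + 3] - 1.0) for s in range(3)]  # p_s
    return axis, gamma, shift


def rotation_matrix(axis, gamma):
    """Assumed Rodrigues matrix for a rotation by gamma about a unit axis."""
    x, y, z = axis
    c, s = math.cos(gamma), math.sin(gamma)
    k = 1.0 - c
    return [
        [c + x * x * k, x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]


def rigid_move(cont, u, shift):
    """Step 4.4.5 (assumed form): rotate every C-alpha atom about the first
    C-alpha atom of the continuous domain, then add the translation."""
    pivot = cont[0]
    moved = []
    for atom in cont:
        rel = [a - p for a, p in zip(atom, pivot)]
        rot = [sum(u[s][t] * rel[t] for t in range(3)) for s in range(3)]
        moved.append([rot[s] + pivot[s] + shift[s] for s in range(3)])
    return moved
```

Because the rotation point is the first Cα atom of the continuous domain, that atom moves only by the translation vector p, which keeps each trial move local to the domain junction.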

The technical conception of the invention is as follows. First, a protein sequence threading alignment tool is used to search a protein library quickly, and several of the highest-scoring templates are selected. Next, each single domain structure is superposed onto the template according to the threading alignment, so that the orientation information in the template is extracted. The superposed structure is then subjected to random rotation and translation operations; the quality of the current structure is measured by the inter-domain interaction and clash distances, a template control factor is added to keep the assembled structure from deviating from the template orientation, and a boundary distance factor is added to promote full connection of the boundaries between continuous and discontinuous domains. Finally, the optimal structure among those obtained from the several templates is selected according to energy.
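The energy evaluation of step 4.3) can be illustrated with the following Python sketch. It is a sketch under stated assumptions rather than the patented implementation: the formulas for E_clash and E_contact are not reproduced in this text, so they are passed in as the hypothetical callables `clash_term` and `contact_term`. The sketch also assumes one Cα atom per residue, a discontinuous domain made of exactly two parts (the first of length `first_part_len`), and a template coordinate list ordered as the discontinuous domain followed by the continuous domain, compared without re-superposition. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interdomain_statistics(discont, cont, d_clash, d_contact):
    """Steps 4.3.1 to 4.3.3: clash distances d_c, contact count n_contact,
    and the constant n0 = 0.306 * (l + q)."""
    clash_distances = []
    n_contact = 0
    for a in discont:
        for b in cont:
            d = dist(a, b)
            if d < d_clash:
                clash_distances.append(d)
            if d < d_contact:
                n_contact += 1
    n0 = 0.306 * (len(discont) + len(cont))
    return clash_distances, n_contact, n0


def boundary_energy(discont, cont, first_part_len):
    """Step 4.3.4: sum of the two junction distances (E_boundary)."""
    last_of_first_part = discont[first_part_len - 1]
    first_of_second_part = discont[first_part_len]
    return (dist(last_of_first_part, cont[0])
            + dist(first_of_second_part, cont[-1]))


def rmsd(coords, template):
    """Step 4.3.5: C-alpha RMSD over equal-length, pre-aligned lists."""
    if not coords or len(coords) != len(template):
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(dist(a, b) ** 2 for a, b in zip(coords, template))
    return math.sqrt(total / len(coords))


def total_energy(discont, cont, template, first_part_len,
                 d_clash, d_contact, clash_term, contact_term):
    """Step 4.3.6: E = E_clash + E_contact + E_boundary + E_RMSD.
    clash_term and contact_term are caller-supplied stand-ins (assumptions)
    for the two formulas that this text does not reproduce."""
    clash_distances, n_contact, n0 = interdomain_statistics(
        discont, cont, d_clash, d_contact)
    e_clash = clash_term(clash_distances, d_clash)
    e_contact = contact_term(n_contact, n0)
    e_boundary = boundary_energy(discont, cont, first_part_len)
    e_rmsd = rmsd(discont + cont, template)
    return e_clash + e_contact + e_boundary + e_rmsd
```

Keeping the two unknown terms as parameters means the bookkeeping described in steps 4.3.1) to 4.3.6) can be exercised without committing to a particular functional form for E_clash or E_contact.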

The beneficial effects of the invention are as follows: templates are retrieved with a sequence threading alignment tool, which improves search efficiency; the templates then guide the assembly, and several energy terms are designed to evaluate the assembled structure, which improves prediction accuracy.

Drawings

FIG. 1 is a schematic of the assembly process of a discontinuous multi-domain protein structure.

FIG. 2 shows the structure obtained by applying the discontinuous multi-domain protein structure assembly method to the discontinuous multi-domain protein 3kc2A.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, a discontinuous multi-domain protein structure assembly method comprises steps 1) to 5) as set out in the Disclosure of Invention, with the boundary atoms of step 4.3.4) labelled as in FIG. 1:

4.3.4) calculating the distance between the last Cα atom A of the first part of the discontinuous domain and the first Cα atom B of the continuous domain, and the distance between the first Cα atom D of the second part of the discontinuous domain and the last Cα atom C of the continuous domain, as shown in FIG. 1; the sum of these two distances is the boundary distance energy E_boundary;

Taking the discontinuous multi-domain protein 3kc2A, with a sequence length of 324, as an example, the discontinuous multi-domain protein structure assembly method is carried out with the steps set out in the Disclosure of Invention and the following parameter settings:

2) setting the maximum number of iterations G_max = 30000, the number of assembly templates T = 5, the clash distance threshold d_clash = 3.8, and the interaction threshold d_contact = 8.0;

Taking the discontinuous multi-domain protein 3kc2A, with a sequence length of 324, as an example, the above assembly method yielded a near-native conformation of the multi-domain protein with a TM-score of 0.976 against the native structure; the predicted structure is shown in FIG. 2.
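The accept/reject loop of steps 4.4.6) to 4.5) and the final selection of step 5) can be sketched in Python as follows, using the example's setting of G_max = 30000 iterations per template as the kind of value passed for `g_max`. This is a sketch under stated assumptions, not the patented implementation: the acceptance probability formula is not reproduced in this text, so the common Metropolis form P = e^(−(E_new − E_old)/kT) is assumed, and the temperature-like parameter `kt` is hypothetical. The `energy` and `propose` callables and all function names are also hypothetical placeholders.

```python
import math
import random


def metropolis_assemble(initial, energy, propose, g_max, kt=1.0, rng=None):
    """Iterate random moves for g_max steps and return the lowest-energy
    structure seen, together with its energy.

    energy(structure) -> float and propose(structure, rng) -> structure
    are caller-supplied; kt is an assumed temperature-like constant."""
    rng = rng or random.Random()
    current = initial
    e_old = energy(current)
    best, e_best = current, e_old
    for _ in range(g_max):
        candidate = propose(current, rng)
        e_new = energy(candidate)
        # Step 4.4.7: always accept a lower energy.
        # Step 4.4.8: otherwise accept with the assumed probability P.
        if e_new < e_old or rng.random() < math.exp(-(e_new - e_old) / kt):
            current, e_old = candidate, e_new
            if e_old < e_best:
                best, e_best = current, e_old
    return best, e_best


def best_over_templates(results):
    """Step 5: pick the (structure, energy) pair with the lowest energy
    among the results obtained from the T templates."""
    return min(results, key=lambda pair: pair[1])
```

In use, `metropolis_assemble` would be called once per template (T = 5 in the example) and the resulting pairs passed to `best_over_templates`.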

The above describes the assembly result obtained by the present invention with the protein 3kc2A as an example. It is not intended to limit the scope of the invention, and various modifications and improvements can be made without departing from that scope.

Claims

1. A discontinuous multi-domain protein structure assembly method, characterized in that the method comprises steps 1) to 5) as set out in the Disclosure of Invention, including the translation of step 4.2), the energy calculation of step 4.3), the random rotation and translation of step 4.4), the iteration of step 4.5), and the selection of the lowest-energy structure in step 5).