CN107180164A

CN107180164A - A template-based multi-domain protein structure assembly method

Info

Publication number: CN107180164A
Application number: CN201710256156.XA
Authority: CN
Inventors: 张贵军; 周晓根; 郝小虎; 王柳静
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2017-04-19
Filing date: 2017-04-19
Publication date: 2017-09-19
Anticipated expiration: 2037-04-19
Also published as: CN107180164B

Abstract

A template-based multi-domain protein structure assembly method. First, given the structure of each domain, the protein structure alignment tool TM-align is used to align each domain in turn, and the best template is selected from a multi-domain protein database. Next, a rotation matrix and translation vector are obtained with the Kabsch method, each domain is superimposed onto the template, and each domain is then translated and rotated so that the distance between adjacent domains equals the minimum allowed distance. The assembled structure is then adjusted by random translations and rotations, and its quality is measured by the inter-domain clash factor, the number of interacting atoms, and the displacement of the assembled structure relative to the template. During assembly, adjacent domains are assembled in sequence and each assembled structure is held fixed; once all domains have been assembled, the final assembly is output. The invention provides a template-based multi-domain protein structure assembly method with higher prediction accuracy.

Description

A template-based multi-domain protein structure assembly method

Technical field

The present invention relates to the fields of bioinformatics, intelligent optimization, and computer applications, and in particular to a template-based multi-domain protein structure assembly method.

Background

Large proteins are usually composed of multiple independently folded domains, and determining the structure of multi-domain proteins can strongly advance biological research. A domain usually has a compact three-dimensional structure and a specific biological function, and the same domain may appear in different structural arrangements. In addition, the single-domain three-dimensional structures of many multi-domain proteins have been determined by X-ray diffraction, nuclear magnetic resonance, and computational prediction. Obtaining the structure of a multi-domain protein from the structures of its individual domains is therefore an important step, and a necessary one for determining the full-length protein structure and understanding its biological function.

At present, there are two common classes of methods for predicting the structure of a multi-domain protein from its single domains. The first class keeps the structure of each domain fixed and then assembles the domains by docking. The second class assembles the whole multi-domain structure by enumerating the conformations of the linkers that connect the domains. The first class treats assembly as a protein-protein docking problem, so some docking algorithms can be used for multi-domain assembly. The second class, in contrast, treats assembly as an ab initio structure prediction problem for the relatively short amino acid sequences between domains; because only the linker conformations change, its sampling space is small. However, both classes lack template guidance, so the assembly orientation of the domains cannot be determined during assembly, which leads to low prediction accuracy.

Existing multi-domain protein structure assembly methods are therefore deficient in prediction accuracy and need to be improved.

Summary of the invention

To overcome the low prediction accuracy of existing multi-domain protein structure assembly methods, the present invention provides a template-based multi-domain protein structure assembly method with higher prediction accuracy.

The technical solution adopted by the present invention to solve this technical problem is:

A template-based multi-domain protein structure assembly method, the method comprising the following steps:

1) Input the three-dimensional structure of each single domain and the sequence information of the corresponding multi-domain protein;

2) Set the maximum number of iterations $I_{\max}$, the clash distance threshold $d_{\text{clash}}$, the interaction threshold $d_{\text{contact}}$, and the interacting-atom count constant $n_0$;

3) For each multi-domain protein in the PDB (Protein Data Bank), perform the following operations to determine the assembly template:

3.1) Use the protein structure alignment tool TM-align to find the best alignment position of the first domain, and record its template alignment score TM-score₁;

3.2) Starting from the last residue position aligned by the first domain, use TM-align to find the best alignment position of the second domain, and record TM-score₂;

3.3) Repeat step 3.2) to find the best alignment positions of the remaining domains in sequence, and record TM-score₃, TM-score₄, …, TM-score_N, where N is the total number of domains;

3.4) Compute the score of the template, $\mathrm{score}_t$ [formula not reproduced], where $\mathrm{score}_t$ denotes the score of the $t$-th template, TM-score_i the alignment score of the $i$-th domain, and $L_i$ the sequence length of the $i$-th domain;

3.5) After the score of every template has been computed by steps 3.1)-3.4), select the highest-scoring protein as the template;
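
The template search in step 3) can be sketched as follows. Because the scoring formula of step 3.4) is not reproduced in this text, the sketch assumes a length-weighted mean of the per-domain TM-scores. That rule, the function names `template_score` and `pick_template`, and the example numbers are illustrative assumptions; the TM-scores are treated as precomputed inputs rather than obtained by running TM-align.

```python
from typing import Dict, List, Tuple


def template_score(tm_scores: List[float], lengths: List[int]) -> float:
    """ASSUMED scoring rule: length-weighted mean of per-domain TM-scores."""
    if len(tm_scores) != len(lengths) or not lengths:
        raise ValueError("need one TM-score and one length per domain")
    return sum(s * l for s, l in zip(tm_scores, lengths)) / sum(lengths)


def pick_template(candidates: Dict[str, List[float]],
                  lengths: List[int]) -> Tuple[str, float]:
    """Score every candidate template and return the best one (step 3.5)."""
    scored = {name: template_score(tm, lengths) for name, tm in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical TM-scores of a two-domain query against three candidate templates.
candidates = {"tplA": [0.80, 0.40], "tplB": [0.70, 0.65], "tplC": [0.30, 0.90]}
best_name, best_score = pick_template(candidates, lengths=[100, 144])
```

Weighting by domain length keeps a short, well-aligned domain from dominating the choice; any other aggregation rule could be swapped into `template_score` without changing `pick_template`.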

4) Superimpose each domain onto the template as follows:

4.1) Match the Cα atoms of the query protein one-to-one with the Cα atoms of the template, then use the Kabsch method to obtain the rotation matrix

$$U = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ u_{21} & u_{22} & u_{23} \\ u_{31} & u_{32} & u_{33} \end{pmatrix}$$

and the translation vector $(t_1, t_2, t_3)$, where $u_{st}$ ($s = 1,2,3$; $t = 1,2,3$) is the element in row $s$, column $t$ of the rotation matrix and $t_s$ is the $s$-th component of the translation vector;

4.2) Apply the rotation and translation to each Cα atom $(x_1^{n,m}, x_2^{n,m}, x_3^{n,m})$ of the query protein:

$$x_s^{n,m} \leftarrow t_s + \sum_{t=1}^{3} u_{st}\, x_t^{n,m}, \quad s = 1,2,3,$$

where $x_s^{n,m}$ is the $s$-th coordinate of the $m$-th Cα atom of the $n$-th domain;
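
Step 4.2) applies the Kabsch rotation and translation to every Cα atom. The following is a minimal sketch, assuming the usual convention that the rotation is applied first and the translation second (x' = Ux + t). The rotation matrix and translation vector are taken as given inputs (no Kabsch fit is performed here), and the name `apply_rigid_transform` is hypothetical.

```python
from typing import List, Sequence

Vec3 = List[float]


def apply_rigid_transform(coords: Sequence[Sequence[float]],
                          rot: Sequence[Sequence[float]],
                          trans: Sequence[float]) -> List[Vec3]:
    """Return new coordinates x' = U x + t for every atom (ASSUMED order)."""
    moved = []
    for x in coords:
        moved.append([trans[s] + sum(rot[s][t] * x[t] for t in range(3))
                      for s in range(3)])
    return moved


# Example: rotate 90 degrees about the z axis, then shift by (1, 0, 0).
rot_z90 = [[0.0, -1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
moved = apply_rigid_transform([[1.0, 0.0, 0.0], [0.0, 2.0, 5.0]],
                              rot_z90, [1.0, 0.0, 0.0])
```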

5) Fix the position of the $n$-th domain and translate the $(n+1)$-th domain according to the following formula, so that the root-mean-square deviation (RMSD) between their connection points is [formula and target value not reproduced];

where $l_n$ is the length of the $n$-th domain, $x_s^{n,l_n}$ is the $s$-th coordinate of the last Cα atom of the $n$-th domain, $x_s^{n+1,1}$ is the $s$-th coordinate of the first Cα atom of the $(n+1)$-th domain, and $d_{n,n+1}$ is the Euclidean distance between the last Cα atom of the $n$-th domain and the first Cα atom of the $(n+1)$-th domain;
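
One way to realize the translation in step 5) is to slide the $(n+1)$-th domain along the line joining the two connection atoms until they are a chosen distance apart. The sketch below assumes that reading; the target value is not given in this text, so the default `target=3.8` (a typical spacing of consecutive Cα atoms, in Å) and the name `translate_to_linker_distance` are assumptions.

```python
import math
from typing import List, Sequence


def translate_to_linker_distance(domain_next: Sequence[Sequence[float]],
                                 last_ca_prev: Sequence[float],
                                 target: float = 3.8) -> List[List[float]]:
    """Rigidly translate domain n+1 so its first C-alpha lies `target`
    away from the last C-alpha of the fixed domain n (ASSUMED target)."""
    first = domain_next[0]
    delta = [first[s] - last_ca_prev[s] for s in range(3)]
    d = math.sqrt(sum(c * c for c in delta))  # current distance d_{n,n+1}
    if d == 0.0:
        raise ValueError("connection atoms coincide; direction is undefined")
    # Shift along the connecting direction so the new distance equals target.
    shift = [(target / d - 1.0) * c for c in delta]
    return [[x[s] + shift[s] for s in range(3)] for x in domain_next]


# Domain n ends at the origin; domain n+1 starts 10 units away on the x axis.
shifted = translate_to_linker_distance([[10.0, 0.0, 0.0], [11.0, 1.0, 0.0]],
                                       [0.0, 0.0, 0.0])
```

Because every atom receives the same shift, the internal geometry of the moved domain is unchanged.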

6) Compute the root-mean-square deviation $E_{\text{RMSD}}$ of the Cα atoms between the current protein and the template;

7) Compute the Euclidean distance between every Cα atom of the $n$-th domain and every Cα atom of the $(n+1)$-th domain, count the number $n_{\text{clash}}$ of distances below $d_{\text{clash}}$, record the corresponding distances, and compute the inter-domain clash score $E_{\text{clash}}$ [formula not reproduced];

8) Count the number $n_{\text{contact}}$ of distances from step 7) that are below $d_{\text{contact}}$, and compute the interaction score $E_{\text{contact}}$ [formula not reproduced];

9) Compute the energy of the current protein, $E = w_1 E_{\text{RMSD}} + w_2 E_{\text{clash}} + w_3 E_{\text{contact}}$, where $w_1, w_2, w_3$ are the respective weights;
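
The quantities used in steps 6) to 9) can be sketched as below. The RMSD, the pair counting, and the weighted sum follow the text; the formulas that turn $n_{\text{clash}}$ and $n_{\text{contact}}$ into $E_{\text{clash}}$ and $E_{\text{contact}}$ are not reproduced here, so the sketch stops at the counts and takes the three energy terms as inputs. The thresholds 3.75 and 8 and the weights (1, 1, 0.35) are the values quoted in the embodiment; all function names and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Sequence[float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Root-mean-square deviation between two matched coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equally long")
    sq = sum((p[s] - q[s]) ** 2 for p, q in zip(a, b) for s in range(3))
    return math.sqrt(sq / len(a))


def pair_distances(dom_a: Coords, dom_b: Coords) -> List[float]:
    """All C-alpha to C-alpha distances between two domains (step 7)."""
    return [math.dist(p, q) for p in dom_a for q in dom_b]


def count_below(distances: Sequence[float], cutoff: float) -> int:
    """Number of distances strictly below the cutoff."""
    return sum(1 for d in distances if d < cutoff)


def total_energy(e_rmsd: float, e_clash: float, e_contact: float,
                 w: Tuple[float, float, float] = (1.0, 1.0, 0.35)) -> float:
    """Weighted sum E = w1*E_RMSD + w2*E_clash + w3*E_contact (step 9)."""
    return w[0] * e_rmsd + w[1] * e_clash + w[2] * e_contact


# Toy coordinates, not real protein data.
dists = pair_distances([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]],
                       [[3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
n_clash = count_below(dists, 3.75)   # d_clash from the embodiment
n_contact = count_below(dists, 8.0)  # d_contact from the embodiment
energy = total_energy(2.0, 1.0, 2.0)  # hypothetical term values
```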

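10) Determine the lowest-energy assembled structure iteratively as follows:

10.1) Determine the rotation axis $(X_1, X_2, X_3)$ with $X_3 = \theta$, where $\theta = 1 - 2\,\mathrm{rand}[0,1]$, $\varphi = 2\pi\,\mathrm{rand}[0,1]$, and $\mathrm{rand}[0,1]$ is a random decimal between 0 and 1 [the expressions for $X_1$ and $X_2$ are not reproduced];

10.2) Randomly generate the rotation angle $\gamma = 2\,\mathrm{rand}[0,1] - 1$ and the assembly translation vector $(T_1, T_2, T_3)$, where $T_s = 0.3\,(2\,\mathrm{rand}[0,1] - 1)$, $s = 1,2,3$;

10.3) Determine the assembly rotation matrix:

$$\begin{cases}
U_{11} = X_1^2 (1-\alpha) + \alpha \\
U_{12} = X_1 X_2 (1-\alpha) - \beta X_3 \\
U_{13} = X_1 X_3 (1-\alpha) + \beta X_2 \\
U_{21} = X_1 X_2 (1-\alpha) + \beta X_3 \\
U_{22} = X_2^2 (1-\alpha) + \alpha \\
U_{23} = X_2 X_3 (1-\alpha) - \beta X_1 \\
U_{31} = X_1 X_3 (1-\alpha) - \beta X_2 \\
U_{32} = X_2 X_3 (1-\alpha) + \beta X_1 \\
U_{33} = X_3^2 (1-\alpha) + \alpha
\end{cases}$$

where $\alpha = \cos\gamma$, $\beta = \sin\gamma$, and $U_{st}$ ($s = 1,2,3$; $t = 1,2,3$) is the element in row $s$, column $t$ of the assembly rotation matrix;

10.4) Apply the rotation and translation to each Cα atom of the $(n+1)$-th domain [formula not reproduced],

where $x_s^{n+1,1}$ is the $s$-th coordinate of the first Cα atom of the $(n+1)$-th domain and $x_s^{n+1,m}$ is the $s$-th coordinate of the $m$-th Cα atom of the $(n+1)$-th domain, $s = 1,2,3$;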
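
A single random move from steps 10.1) to 10.4) might look like the sketch below. Two points are assumptions rather than statements of the method: the first two axis components use the standard uniform-sphere sampling, $X_1 = \sqrt{1-\theta^2}\cos\varphi$ and $X_2 = \sqrt{1-\theta^2}\sin\varphi$, and the rotation is taken about the first Cα atom of the moving domain before the translation is added. The rotation matrix is the standard axis-angle (Rodrigues) form with $\alpha = \cos\gamma$ and $\beta = \sin\gamma$. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def random_axis(rng: random.Random) -> List[float]:
    """Random unit axis; X3 = theta as in step 10.1, X1 and X2 ASSUMED."""
    theta = 1.0 - 2.0 * rng.random()
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - theta * theta))
    return [r * math.cos(phi), r * math.sin(phi), theta]


def axis_angle_matrix(axis: Sequence[float], gamma: float) -> List[List[float]]:
    """Standard axis-angle (Rodrigues) rotation matrix for a unit axis."""
    x1, x2, x3 = axis
    a, b = math.cos(gamma), math.sin(gamma)
    c = 1.0 - a
    return [[x1 * x1 * c + a, x1 * x2 * c - b * x3, x1 * x3 * c + b * x2],
            [x1 * x2 * c + b * x3, x2 * x2 * c + a, x2 * x3 * c - b * x1],
            [x1 * x3 * c - b * x2, x2 * x3 * c + b * x1, x3 * x3 * c + a]]


def random_move(domain: Sequence[Sequence[float]], rng: random.Random,
                step: float = 0.3) -> List[List[float]]:
    """Rotate about the first C-alpha (ASSUMED pivot), then translate."""
    rot = axis_angle_matrix(random_axis(rng), 2.0 * rng.random() - 1.0)
    shift = [step * (2.0 * rng.random() - 1.0) for _ in range(3)]
    pivot = domain[0]
    moved = []
    for x in domain:
        rel = [x[t] - pivot[t] for t in range(3)]
        moved.append([pivot[s] + sum(rot[s][t] * rel[t] for t in range(3)) + shift[s]
                      for s in range(3)])
    return moved


rng = random.Random(0)
domain = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
moved = random_move(domain, rng)
```

Rotating about the first Cα keeps the connection point from swinging far from the fixed domain, so the translation vector alone controls how much the linker distance changes in one move.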

10.5) Compute the energy of the current assembled structure according to steps 6)-9); if the energy decreases, accept the current assembled structure;

11) Repeat step 10) $I_{\max}$ times; the final structure is then the assembled structure of the $n$-th and $(n+1)$-th domains;

12) Once the $(n+1)$-th domain has been assembled, fix the structure of the first $n+1$ domains and assemble the $(n+2)$-th domain according to steps 5)-11); after all $N$ domains have been assembled, output the final assembled structure.
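
The accept-if-lower iteration of steps 10.5) and 11) is a greedy random search. The sketch below shows that loop on a toy one-dimensional problem; the energy and move functions are placeholders, the name `greedy_search` is hypothetical, and it is an assumption that a rejected move leaves the previously accepted structure in place, since the text does not say how rejections are handled.

```python
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def greedy_search(start: S,
                  energy: Callable[[S], float],
                  propose: Callable[[S, random.Random], S],
                  n_iter: int,
                  rng: random.Random) -> Tuple[S, float]:
    """Repeat random moves n_iter times, keeping a move only if it
    lowers the energy (ASSUMED: rejected moves are discarded)."""
    current, e_cur = start, energy(start)
    for _ in range(n_iter):
        trial = propose(current, rng)
        e_trial = energy(trial)
        if e_trial < e_cur:
            current, e_cur = trial, e_trial
    return current, e_cur


# Toy problem: the state is a position on a line and the energy is x^2.
rng = random.Random(1)
final, e_final = greedy_search(
    5.0,
    energy=lambda x: x * x,
    propose=lambda x, r: x + 0.3 * (2.0 * r.random() - 1.0),
    n_iter=2000,
    rng=rng,
)
```

In the assembly setting, the state would be the coordinates of the $(n+1)$-th domain, `propose` a random rotation and translation, `energy` the weighted sum from step 9), and `n_iter` the value of $I_{\max}$ (30000 in the embodiment).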

The technical concept of the present invention is as follows. First, given the structure of each domain, the protein structure alignment tool TM-align is used to align each domain in turn, and the best template is selected from a multi-domain protein database. Next, a rotation matrix and translation vector are obtained with the Kabsch method, each domain is superimposed onto the template, and each domain is translated and rotated so that the distance between adjacent domains equals the minimum allowed distance. The assembled structure is then adjusted by random translations and rotations, and its quality is measured by the inter-domain clash factor, the number of interacting atoms, and the displacement of the assembled structure relative to the template. During assembly, adjacent domains are assembled in sequence and each assembled structure is held fixed; once all domains have been assembled, the final assembly is output.

The beneficial effects of the present invention are as follows: the template guides the assembly and supplies orientation information for the assembled structure, and the quality of the assembled structure is measured by the inter-domain clash factor, the interaction factor, and the deviation between the assembled structure and the template. This guides the assembly and improves the prediction accuracy for the whole protein.

Brief description of the drawings

Fig. 1 is a flowchart of the template-based multi-domain protein structure assembly method.

Fig. 2 shows the result of assembling the multi-domain protein 3nd1A with the template-based multi-domain protein structure assembly method.

Embodiment

The invention is further described below with reference to the accompanying drawings.

Referring to Figures 1 and 2, a template-based multi-domain protein structure assembly method comprises steps 1) to 12) as set out above.

This embodiment takes the multi-domain protein 3nd1A, which has a sequence length of 244, as an example; the template-based multi-domain protein structure assembly method comprises the following steps:

2) Set the maximum number of iterations $I_{\max} = 30000$, the clash distance threshold $d_{\text{clash}} = 3.75$, the interaction threshold $d_{\text{contact}} = 8$, and the interacting-atom count constant $n_0 = 87$;

3) For each multi-domain protein in the PDB (Protein Data Bank), perform the following operations to determine the assembly template:

3.1) Use the protein structure alignment tool TM-align to find the best alignment position of the first domain, and record its template alignment score TM-score₁;

3.2) Starting from the last residue position aligned by the first domain, use TM-align to find the best alignment position of the second domain, and record TM-score₂;

3.3) Repeat step 3.2) to find the best alignment positions of the remaining domains in sequence, and record TM-score₃, TM-score₄, …, TM-score_N, where N is the total number of domains;

3.4) Compute the score of the template, $\mathrm{score}_t$ [formula not reproduced], where $\mathrm{score}_t$ denotes the score of the $t$-th template, TM-score_i the alignment score of the $i$-th domain, and $L_i$ the sequence length of the $i$-th domain;

3.5) After the score of every template has been computed by steps 3.1)-3.4), select the highest-scoring protein as the template;

4.1) Match the Cα atoms of the query protein one-to-one with the Cα atoms of the template, then use the Kabsch method to obtain the rotation matrix $U = (u_{st})_{3 \times 3}$ and the translation vector $(t_1, t_2, t_3)$, where $u_{st}$ ($s = 1,2,3$; $t = 1,2,3$) is the element in row $s$, column $t$ of the rotation matrix and $t_s$ is the $s$-th component of the translation vector;

4.2) Apply this rotation and translation to each Cα atom of the query protein;

9) Compute the energy of the current protein, $E = w_1 E_{\text{RMSD}} + w_2 E_{\text{clash}} + w_3 E_{\text{contact}}$, where $w_1 = w_2 = 1$ and $w_3 = 0.35$ are the respective weights;

10.3) Determine the assembly rotation matrix $U$ from the rotation axis $(X_1, X_2, X_3)$ and the rotation angle $\gamma$;

10.4) Apply the rotation and translation to each Cα atom of the $(n+1)$-th domain [formula not reproduced],

where $x_s^{n+1,1}$ is the $s$-th coordinate of the first Cα atom of the $(n+1)$-th domain and $x_s^{n+1,m}$ is the $s$-th coordinate of the $m$-th Cα atom of the $(n+1)$-th domain;

10.5) Compute the energy of the current assembled structure according to steps 6)-9); if the energy decreases, accept the current assembled structure;

Taking the two-domain protein 3nd1A, which has a sequence length of 244, as an example, the method above yields a near-native conformation of the multi-domain protein with a root-mean-square deviation of [value not reproduced] and a TM-score of 0.997; the predicted structure is shown in Fig. 2.

The above describes the results obtained by the present invention using the 3nd1A protein as an example and does not limit the scope of the invention. Various modifications and improvements made without departing from the substance of the invention shall not be excluded from its scope of protection.

Claims

1. A template-based multi-domain protein structure assembly method, characterized in that the multi-domain protein structure assembly comprises the following steps:

2) Set the maximum number of iterations $I_{\max}$, the clash distance threshold $d_{\text{clash}}$, the interaction threshold $d_{\text{contact}}$, and the interacting-atom count constant $n_0$;

3) For each multi-domain protein in the PDB, perform the following operations to determine the assembly template:

3.1) Use the protein structure alignment tool TM-align to find the best alignment position of the first domain, and record its template alignment score TM-score₁;

3.2) Starting from the last residue position aligned by the first domain, use TM-align to find the best alignment position of the second domain, and record TM-score₂;

3.4) Compute the score of the template, $\mathrm{score}_t$ [formula not reproduced], where $\mathrm{score}_t$ denotes the score of the $t$-th template, TM-score_i the alignment score of the $i$-th domain, and $L_i$ the sequence length of the $i$-th domain;

3.5) After the score of every template has been computed by steps 3.1)-3.4), select the highest-scoring protein as the template;

4.1) Match the Cα atoms of the query protein one-to-one with the Cα atoms of the template, then use the Kabsch method to obtain the rotation matrix $U = (u_{st})_{3 \times 3}$ and the translation vector $(t_1, t_2, t_3)$, where $u_{st}$ ($s = 1,2,3$; $t = 1,2,3$) is the element in row $s$, column $t$ of the rotation matrix and $t_s$ is the $s$-th component of the translation vector;

4.2) Apply this rotation and translation to each Cα atom of the query protein;

5) position of n-th of domain albumen is fixed, (n+1)th domain albumen is translated according to equation below, made between its tie point Root-mean-square-deviation RMSD is

Wherein, l_nThe length of n-th of domain albumen is tieed up,For the s dimension coordinates of last C alpha atom of n-th of domain albumen, For the s dimension coordinates of first C alpha atom of (n+1)th domain albumen, d_n,n+1For n-th of domain albumen last C alpha atom and Euclidean distance between first C alpha atom of (n+1)th domain albumen；

7) Compute the Euclidean distance between every pair formed by a Cα atom of the $n$-th domain and a Cα atom of the $(n+1)$-th domain, count the number $n_{\text{clash}}$ of distances below $d_{\text{clash}}$, record the corresponding distances, and compute the inter-domain clash score $E_{\text{clash}}$ [formula not reproduced];

9) Compute the energy of the current protein, $E = w_1 E_{\text{RMSD}} + w_2 E_{\text{clash}} + w_3 E_{\text{contact}}$, where $w_1, w_2, w_3$ are the respective weights;

10.2) Randomly generate the rotation angle $\gamma = 2\,\mathrm{rand}[0,1] - 1$ and the assembly translation vector $(T_1, T_2, T_3)$, where $T_s = 0.3\,(2\,\mathrm{rand}[0,1] - 1)$, $s = 1,2,3$;

10.3) Determine the assembly rotation matrix:

$$\begin{cases}
U_{11} = X_1^2 (1-\alpha) + \alpha \\
U_{12} = X_1 X_2 (1-\alpha) - \beta X_3 \\
U_{13} = X_1 X_3 (1-\alpha) + \beta X_2 \\
U_{21} = X_1 X_2 (1-\alpha) + \beta X_3 \\
U_{22} = X_2^2 (1-\alpha) + \alpha \\
U_{23} = X_2 X_3 (1-\alpha) - \beta X_1 \\
U_{31} = X_1 X_3 (1-\alpha) - \beta X_2 \\
U_{32} = X_2 X_3 (1-\alpha) + \beta X_1 \\
U_{33} = X_3^2 (1-\alpha) + \alpha
\end{cases}$$

where $\alpha = \cos\gamma$, $\beta = \sin\gamma$, and $U_{st}$ ($s = 1,2,3$; $t = 1,2,3$) is the element in row $s$, column $t$ of the assembly rotation matrix;

where $x_s^{n+1,1}$ is the $s$-th coordinate of the first Cα atom of the $(n+1)$-th domain and $x_s^{n+1,m}$ is the $s$-th coordinate of the $m$-th Cα atom of the $(n+1)$-th domain, $s = 1,2,3$;

10.5) Compute the energy of the current assembled structure according to steps 6)-9); if the energy decreases, accept the current assembled structure;

11) Repeat step 10) $I_{\max}$ times; the final structure is then the assembled structure of the $n$-th and $(n+1)$-th domains;

12) Once the $(n+1)$-th domain has been assembled, fix the structure of the first $n+1$ domains and assemble the $(n+2)$-th domain according to steps 5)-11); after all $N$ domains have been assembled, output the final assembled structure.