CN107704725B - Discontinuous multi-domain protein structure assembly method - Google Patents


Info

Publication number
CN107704725B
Authority
CN
China
Prior art keywords
protein
domain protein
discontinuous
domain
atom
Legal status
Active
Application number
CN201710684511.3A
Other languages
Chinese (zh)
Other versions
CN107704725A (en)
Inventor
张贵军
周晓根
郝小虎
王柳静
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
2017-08-11
Filing date
2017-08-11
Publication date
2020-12-01
Application filed by Zhejiang University of Technology ZJUT (2017-08-11)
Priority to CN201710684511.3A (2017-08-11)
Publication of CN107704725A (2018-02-16)
Application granted
Publication of CN107704725B (2020-12-01)
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Abstract

A discontinuous multi-domain protein structure assembly method comprises the following steps. First, a protein sequence threading alignment tool rapidly searches a protein library, and the highest-scoring templates are selected. Next, each single-domain structure is accurately superimposed onto the template according to the threading alignment, which extracts the orientation information contained in the template. Then, random rotations and translations are applied to the superimposed structure, and the quality of the current structure is measured by the inter-domain interaction and clash distances; a template control term keeps the assembled structure from deviating from the template orientation, and a boundary distance term promotes full connection at the boundaries between the continuous and discontinuous domains. Finally, the optimal structure is selected by energy from the results obtained with the multiple templates. The invention provides a discontinuous multi-domain protein structure assembly method with high prediction accuracy.

Description

Discontinuous multi-domain protein structure assembly method
Technical Field
The invention relates to the fields of bioinformatics, intelligent optimization, and computer applications, and in particular to a structure assembly method for discontinuous multi-domain proteins.
Background
More than 70% of the proteins in protein libraries are multi-domain proteins, i.e., a single protein comprises multiple substructures (domains). More than 40% of these multi-domain proteins contain one or more discontinuous single domains; for example, more than 15% of the proteins in the CATH protein database are multi-domain proteins, and more than 18% of those contain at least one discontinuous domain. The sequence of a discontinuous domain is split into several segments, yet in the three-dimensional structure all of these segments belong to a single domain. Discontinuous multi-domain proteins therefore make up a large fraction of protein libraries, which makes predicting their three-dimensional structures by assembly methods extremely important.
Currently, the most commonly used way to assemble multi-domain protein structures is to sample the junction regions between the continuous single domain and the discontinuous single domain: the parts of the discontinuous domain are fixed, and the junction regions between those parts and the continuous domain are sampled to obtain the optimal junction structure. The energy function used to sample the junction regions is one designed for single-domain structure prediction. Although it is effective for single domains, energy terms such as inter-domain forces and solvent accessibility differ between the interior of a single domain and the interface between domains, so it samples the connection between discontinuous and continuous domains poorly, and the structure prediction accuracy for discontinuous proteins is low.
Existing discontinuous multi-domain protein structure assembly methods are therefore deficient in prediction accuracy and need improvement.
Disclosure of Invention
To overcome the low prediction accuracy of existing discontinuous multi-domain protein structure assembly methods, the invention provides a discontinuous multi-domain protein structure assembly method with high prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A discontinuous multi-domain protein structure assembly method comprises the following steps:
1) Input the sequence of the protein to be assembled and the three-dimensional structure of each single domain;
2) Set the maximum number of iterations $G_{max}$, the number of assembly templates $T$, the clash distance threshold $d_{clash}$, and the interaction distance threshold $d_{contact}$;
3) Use the protein sequence threading alignment tool FFAS 3D to search a protein library and select the $T$ highest-scoring template proteins;
4) Perform the following operations for each template protein:
4.1) Superimpose each single domain of the protein to be assembled onto the template according to the FFAS 3D sequence alignment;
4.2) Keep the discontinuous single domain fixed and translate the continuous domain inserted between its parts according to the following formula, so that the distance between the attachment point of the first part of the discontinuous domain and the attachment point of the continuous domain equals the clash distance threshold $d_{clash}$:
[Formula image not reproduced in the source text]
where the symbols (shown only as images in the source) denote the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the discontinuous domain, of the last Cα atom of the first part of the discontinuous domain, and of the first Cα atom of the continuous domain; the superscripts L and N are labels that distinguish the discontinuous (L) and continuous (N) domains; l is the sequence length of the discontinuous domain; and d is the Euclidean distance between the first Cα atom of the continuous domain and the last Cα atom of the first part of the discontinuous domain;
4.3) Calculate the energy of the current protein as follows:
4.3.1) Calculate the pairwise distances between the Cα atoms of the continuous domain and those of the discontinuous domain;
4.3.2) Collect all distances smaller than the clash distance threshold $d_{clash}$, denoted $d_c$, $c = 1, 2, \ldots, n_{clash}$, where $n_{clash}$ is the total number of clash distances, and calculate the clash distance energy $E_{clash}$:
[Formula image not reproduced in the source text]
4.3.3) Count the number $n_{contact}$ of distances smaller than the interaction threshold $d_{contact}$ and calculate the interaction distance energy $E_{contact}$:
[Formula image not reproduced in the source text]
where $n_0$ is a constant for the number of interacting atoms, set to $n_0 = 0.306(l + q)$, with l and q the sequence lengths of the discontinuous and continuous domains, respectively;
4.3.4) Calculate the distance between the last Cα atom of the first part of the discontinuous domain and the first Cα atom of the continuous domain, and the distance between the first Cα atom of the second part of the discontinuous domain and the last Cα atom of the continuous domain; their sum is the boundary distance energy $E_{boundary}$;
4.3.5) Calculate the Cα root mean square deviation $E_{RMSD}$ between the current protein and the template;
4.3.6) Calculate the energy of the current protein, $E_{old} = E_{clash} + E_{contact} + E_{boundary} + E_{RMSD}$;
4.4) Determine the rotation and translation as follows:
4.4.1) Randomly generate a 6-dimensional vector $v = (v_1, v_2, \ldots, v_6)$, where each element $v_j$, $j = 1, 2, \ldots, 6$, is a random number between 0 and 1;
4.4.2) Determine the rotation axis $z = (z_1, z_2, z_3)$, where $z_3 = \theta$, $\theta = 1 - 2v_1$, and $\phi = 2\pi v_2$ (the remaining relations appear only as formula images in the source and are not reproduced here);
4.4.3) Determine the rotation matrix u from the rotation angle $\gamma = 2v_3 - 1$:
[Formula image not reproduced in the source text]
where $u_{st}$, s = 1, 2, 3, t = 1, 2, 3, is the element in row s, column t of the rotation matrix;
4.4.4) Determine the translation vector $p = (p_1, p_2, p_3)$, where $p_s = 0.3(2v_{s+3} - 1)$, s = 1, 2, 3, is the s-th element of the translation vector;
4.4.5) Taking the first Cα atom of the continuous domain as the rotation point, rotate and translate all Cα atoms:
[Formula image not reproduced in the source text]
where the symbol (shown only as an image in the source) denotes the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the continuous domain;
4.4.6) Calculate the energy $E_{new}$ of the current protein according to step 4.3);
4.4.7) If $E_{new} < E_{old}$, accept the current assembled structure and set $E_{old} = E_{new}$;
4.4.8) If $E_{new} \ge E_{old}$, calculate the acceptance probability
[Formula image not reproduced in the source text]
where e is the base of the natural logarithm; if a random number rand(0,1) drawn between 0 and 1 is less than P, accept the current structure and set $E_{old} = E_{new}$;
4.5) Repeat step 4.4) until the number of iterations reaches $G_{max}$, and take the lowest-energy structure as the assembly result for the current template;
5) Compare the energies of the structures assembled from the $T$ templates and select the lowest-energy structure as the final assembled structure.
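Steps 4.4.1) to 4.4.5) above describe a random rigid-body move drawn from six uniform numbers. The sketch below shows one way to implement it, under assumptions the source text does not confirm: $z_1$ and $z_2$ are taken as $\sqrt{1-\theta^2}\cos\phi$ and $\sqrt{1-\theta^2}\sin\phi$ so that z is a unit axis, $\gamma$ is used directly as an angle in radians, the matrix u is built with the Rodrigues axis-angle formula, and the move is applied to the continuous domain only, since the discontinuous domain is held fixed. The function names are hypothetical.

```python
import math
import random


def random_rigid_move(rng):
    """Sample a rotation matrix u and a translation vector p (steps 4.4.1-4.4.4).

    Assumptions (not confirmed by the source): z1, z2 make z a uniformly
    distributed unit axis, gamma = 2*v3 - 1 is used in radians, and u is built
    with the Rodrigues axis-angle formula.
    """
    v = [rng.random() for _ in range(6)]          # 4.4.1: v_j in [0, 1)
    theta = 1.0 - 2.0 * v[0]                       # z3 = theta = 1 - 2*v1
    phi = 2.0 * math.pi * v[1]                     # phi = 2*pi*v2
    r = math.sqrt(max(0.0, 1.0 - theta * theta))  # assumed radius for z1, z2
    x, y, w = r * math.cos(phi), r * math.sin(phi), theta
    gamma = 2.0 * v[2] - 1.0                       # 4.4.3: rotation angle
    c, s = math.cos(gamma), math.sin(gamma)
    t = 1.0 - c
    u = [                                          # Rodrigues rotation about (x, y, w)
        [c + x * x * t, x * y * t - w * s, x * w * t + y * s],
        [y * x * t + w * s, c + y * y * t, y * w * t - x * s],
        [w * x * t - y * s, w * y * t + x * s, c + w * w * t],
    ]
    p = [0.3 * (2.0 * v[k + 3] - 1.0) for k in range(3)]  # 4.4.4: p_s in [-0.3, 0.3)
    return u, p


def apply_move(coords, u, p, pivot):
    """Rotate coords about pivot with u, then translate by p (step 4.4.5)."""
    moved = []
    for atom in coords:
        rel = [atom[k] - pivot[k] for k in range(3)]
        moved.append(tuple(
            sum(u[s][t] * rel[t] for t in range(3)) + pivot[s] + p[s]
            for s in range(3)
        ))
    return moved


# Usage: perturb a toy continuous domain about its first C-alpha atom.
rng = random.Random(0)
continuous = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
u, p = random_rigid_move(rng)
trial = apply_move(continuous, u, p, pivot=continuous[0])
```

Because u is orthogonal, the move preserves all distances inside the continuous domain; only its placement relative to the fixed discontinuous domain changes.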
The technical concept of the invention is as follows. First, a protein sequence threading alignment tool rapidly searches a protein library, and the highest-scoring templates are selected. Next, each single-domain structure is accurately superimposed onto the template according to the threading alignment, which extracts the orientation information contained in the template. Then, random rotations and translations are applied to the superimposed structure, and the quality of the current structure is measured by the inter-domain interaction and clash distances; a template control term keeps the assembled structure from deviating from the template orientation, and a boundary distance term promotes full connection at the boundaries between the continuous and discontinuous domains. Finally, the optimal structure is selected by energy from the results obtained with the multiple templates.
The beneficial effects of the invention are as follows: templates are searched with a sequence threading alignment tool, which improves search efficiency; the templates then guide the assembly, and several energy terms are designed to evaluate the assembled structure, which improves prediction accuracy.
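The concept above, together with steps 4.4.6) to 5), amounts to a Metropolis Monte Carlo search on each template followed by a minimum-energy choice across templates. The sketch below shows that control flow with the energy and the random move passed in as callables. It assumes the acceptance probability is $P = e^{-(E_{new} - E_{old})}$ (unit temperature), since the source gives P only as an image; `assemble_on_template` and `assemble` are hypothetical names.

```python
import math
import random


def assemble_on_template(start, energy, propose, g_max, rng):
    """One template (steps 4.3-4.5): Metropolis Monte Carlo that keeps the best structure.

    Assumption: P = exp(-(E_new - E_old)), i.e. a Metropolis criterion at unit
    temperature; the patent's exact formula is not reproduced in the text.
    """
    current, e_old = start, energy(start)      # 4.3.6: E_old of the starting structure
    best, e_best = current, e_old
    for _ in range(g_max):                     # 4.5: iterate G_max times
        trial = propose(current, rng)          # 4.4.1-4.4.5: random rigid-body move
        e_new = energy(trial)                  # 4.4.6
        # 4.4.7: accept downhill moves; 4.4.8: accept uphill moves with probability P
        if e_new < e_old or rng.random() < math.exp(-(e_new - e_old)):
            current, e_old = trial, e_new
            if e_old < e_best:
                best, e_best = current, e_old
    return best, e_best


def assemble(starts, energy, propose, g_max, seed=0):
    """Step 5: assemble on every template start and keep the lowest-energy result."""
    rng = random.Random(seed)
    results = [assemble_on_template(s, energy, propose, g_max, rng) for s in starts]
    return min(results, key=lambda result: result[1])


# Toy usage: a one-dimensional "structure" with a quadratic energy and two template starts.
best, e_best = assemble([10.0, -5.0], lambda x: (x - 2.0) ** 2,
                        lambda x, rng: x + rng.uniform(-0.5, 0.5), g_max=2000)
```

Tracking the best structure separately from the current one matters because Metropolis sampling can accept uphill moves, so the final state of the chain is not necessarily its lowest-energy state.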
Drawings
FIG. 1 is a schematic of the assembly process of a discontinuous multi-domain protein structure.
FIG. 2 shows the result of assembling the discontinuous multi-domain protein 3kc2A with the discontinuous multi-domain protein structure assembly method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, a discontinuous multi-domain protein structure assembly method comprises the following steps:
1) Input the sequence of the protein to be assembled and the three-dimensional structure of each single domain;
2) Set the maximum number of iterations $G_{max}$, the number of assembly templates $T$, the clash distance threshold $d_{clash}$, and the interaction distance threshold $d_{contact}$;
3) Use the protein sequence threading alignment tool FFAS 3D to search a protein library and select the $T$ highest-scoring template proteins;
4) Perform the following operations for each template protein:
4.1) Superimpose each single domain of the protein to be assembled onto the template according to the FFAS 3D sequence alignment;
4.2) Keep the discontinuous single domain fixed and translate the continuous domain inserted between its parts according to the following formula, so that the distance between the attachment point of the first part of the discontinuous domain and the attachment point of the continuous domain equals the clash distance threshold $d_{clash}$:
[Formula image not reproduced in the source text]
where the symbols (shown only as images in the source) denote the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the discontinuous domain, of the last Cα atom of the first part of the discontinuous domain, and of the first Cα atom of the continuous domain; the superscripts L and N are labels that distinguish the discontinuous (L) and continuous (N) domains; l is the sequence length of the discontinuous domain; and d is the Euclidean distance between the first Cα atom of the continuous domain and the last Cα atom of the first part of the discontinuous domain;
4.3) Calculate the energy of the current protein as follows:
4.3.1) Calculate the pairwise distances between the Cα atoms of the continuous domain and those of the discontinuous domain;
4.3.2) Collect all distances smaller than the clash distance threshold $d_{clash}$, denoted $d_c$, $c = 1, 2, \ldots, n_{clash}$, where $n_{clash}$ is the total number of clash distances, and calculate the clash distance energy $E_{clash}$:
[Formula image not reproduced in the source text]
4.3.3) Count the number $n_{contact}$ of distances smaller than the interaction threshold $d_{contact}$ and calculate the interaction distance energy $E_{contact}$:
[Formula image not reproduced in the source text]
where $n_0$ is a constant for the number of interacting atoms, set to $n_0 = 0.306(l + q)$, with l and q the sequence lengths of the discontinuous and continuous domains, respectively;
4.3.4) Calculate the distance between the last Cα atom A of the first part of the discontinuous domain and the first Cα atom B of the continuous domain, and the distance between the first Cα atom D of the second part of the discontinuous domain and the last Cα atom C of the continuous domain, as shown in FIG. 1; their sum is the boundary distance energy $E_{boundary}$;
4.3.5) Calculate the Cα root mean square deviation $E_{RMSD}$ between the current protein and the template;
4.3.6) Calculate the energy of the current protein, $E_{old} = E_{clash} + E_{contact} + E_{boundary} + E_{RMSD}$;
4.4) Determine the rotation and translation as follows:
4.4.1) Randomly generate a 6-dimensional vector $v = (v_1, v_2, \ldots, v_6)$, where each element $v_j$, $j = 1, 2, \ldots, 6$, is a random number between 0 and 1;
4.4.2) Determine the rotation axis $z = (z_1, z_2, z_3)$, where $z_3 = \theta$, $\theta = 1 - 2v_1$, and $\phi = 2\pi v_2$ (the remaining relations appear only as formula images in the source and are not reproduced here);
4.4.3) Determine the rotation matrix u from the rotation angle $\gamma = 2v_3 - 1$:
[Formula image not reproduced in the source text]
where $u_{st}$, s = 1, 2, 3, t = 1, 2, 3, is the element in row s, column t of the rotation matrix;
4.4.4) Determine the translation vector $p = (p_1, p_2, p_3)$, where $p_s = 0.3(2v_{s+3} - 1)$, s = 1, 2, 3, is the s-th element of the translation vector;
4.4.5) Taking the first Cα atom of the continuous domain as the rotation point, rotate and translate all Cα atoms:
[Formula image not reproduced in the source text]
where the symbol (shown only as an image in the source) denotes the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the continuous domain;
4.4.6) Calculate the energy $E_{new}$ of the current protein according to step 4.3);
4.4.7) If $E_{new} < E_{old}$, accept the current assembled structure and set $E_{old} = E_{new}$;
4.4.8) If $E_{new} \ge E_{old}$, calculate the acceptance probability
[Formula image not reproduced in the source text]
where e is the base of the natural logarithm; if a random number rand(0,1) drawn between 0 and 1 is less than P, accept the current structure and set $E_{old} = E_{new}$;
4.5) Repeat step 4.4) until the number of iterations reaches $G_{max}$, and take the lowest-energy structure as the assembly result for the current template;
5) Compare the energies of the structures assembled from the $T$ templates and select the lowest-energy structure as the final assembled structure.
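Step 4.2) places the continuous domain so that atom A (last Cα of the first part of the discontinuous domain) and atom B (first Cα of the continuous domain) are exactly $d_{clash}$ apart. Because the patent's formula survives only as an image, the sketch below assumes the simplest reading: shift the whole continuous domain along the line from B to A by $d - d_{clash}$, i.e. $B' = B + (A - B)(d - d_{clash})/d$, which gives $|A - B'| = d_{clash}$. The function name is hypothetical.

```python
import math


def place_continuous_domain(first_part, continuous, d_clash):
    """Translate the continuous domain so that |A - B| equals d_clash (step 4.2).

    A = last C-alpha of the first part of the discontinuous domain (held fixed);
    B = first C-alpha of the continuous domain. Assumption: the shift is taken
    along the line from B to A; the source formula is not reproduced.
    """
    a = first_part[-1]
    b = continuous[0]
    d = math.dist(a, b)                        # current Euclidean distance d
    if d == 0.0:
        raise ValueError("attachment points coincide; translation direction undefined")
    scale = (d - d_clash) / d                  # fraction of (A - B) to move B by
    shift = [(a[s] - b[s]) * scale for s in range(3)]
    # The same shift is applied to every atom, so the domain moves rigidly.
    return [tuple(atom[s] + shift[s] for s in range(3)) for atom in continuous]


# Usage with d_clash = 3.8, the value used in the 3kc2A example.
first_part = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
continuous = [(20.0, 5.0, 1.0), (23.0, 6.0, 2.0)]
placed = place_continuous_domain(first_part, continuous, 3.8)
```

When the domains start closer than $d_{clash}$, `scale` is negative and the same expression pushes the continuous domain away from A, so a single formula covers both cases.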
Taking the discontinuous multi-domain protein 3kc2A, with a sequence length of 324, as an example, the discontinuous multi-domain protein structure assembly method comprises the following steps:
1) Input the sequence of the protein to be assembled and the three-dimensional structure of each single domain;
2) Set the maximum number of iterations $G_{max} = 30000$, the number of assembly templates $T = 5$, the clash distance threshold $d_{clash} = 3.8$, and the interaction distance threshold $d_{contact} = 8.0$;
3) Use the protein sequence threading alignment tool FFAS 3D to search a protein library and select the $T$ highest-scoring template proteins;
4) Perform the following operations for each template protein:
4.1) Superimpose each single domain of the protein to be assembled onto the template according to the FFAS 3D sequence alignment;
4.2) Keep the discontinuous single domain fixed and translate the continuous domain inserted between its parts according to the following formula, so that the distance between the attachment point of the first part of the discontinuous domain and the attachment point of the continuous domain equals the clash distance threshold $d_{clash}$:
[Formula image not reproduced in the source text]
where the symbols (shown only as images in the source) denote the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the discontinuous domain, of the last Cα atom of the first part of the discontinuous domain, and of the first Cα atom of the continuous domain; the superscripts L and N are labels that distinguish the discontinuous (L) and continuous (N) domains; l is the sequence length of the discontinuous domain; and d is the Euclidean distance between the first Cα atom of the continuous domain and the last Cα atom of the first part of the discontinuous domain;
4.3) Calculate the energy of the current protein as follows:
4.3.1) Calculate the pairwise distances between the Cα atoms of the continuous domain and those of the discontinuous domain;
4.3.2) Collect all distances smaller than the clash distance threshold $d_{clash}$, denoted $d_c$, $c = 1, 2, \ldots, n_{clash}$, where $n_{clash}$ is the total number of clash distances, and calculate the clash distance energy $E_{clash}$:
[Formula image not reproduced in the source text]
4.3.3) Count the number $n_{contact}$ of distances smaller than the interaction threshold $d_{contact}$ and calculate the interaction distance energy $E_{contact}$:
[Formula image not reproduced in the source text]
where $n_0$ is a constant for the number of interacting atoms, set to $n_0 = 0.306(l + q)$, with l and q the sequence lengths of the discontinuous and continuous domains, respectively;
4.3.4) Calculate the distance between the last Cα atom of the first part of the discontinuous domain and the first Cα atom of the continuous domain, and the distance between the first Cα atom of the second part of the discontinuous domain and the last Cα atom of the continuous domain; their sum is the boundary distance energy $E_{boundary}$;
4.3.5) Calculate the Cα root mean square deviation $E_{RMSD}$ between the current protein and the template;
4.3.6) Calculate the energy of the current protein, $E_{old} = E_{clash} + E_{contact} + E_{boundary} + E_{RMSD}$;
4.4) Determine the rotation and translation as follows:
4.4.1) Randomly generate a 6-dimensional vector $v = (v_1, v_2, \ldots, v_6)$, where each element $v_j$, $j = 1, 2, \ldots, 6$, is a random number between 0 and 1;
4.4.2) Determine the rotation axis $z = (z_1, z_2, z_3)$, where $z_3 = \theta$, $\theta = 1 - 2v_1$, and $\phi = 2\pi v_2$ (the remaining relations appear only as formula images in the source and are not reproduced here);
4.4.3) Determine the rotation matrix u from the rotation angle $\gamma = 2v_3 - 1$:
[Formula image not reproduced in the source text]
where $u_{st}$, s = 1, 2, 3, t = 1, 2, 3, is the element in row s, column t of the rotation matrix;
4.4.4) Determine the translation vector $p = (p_1, p_2, p_3)$, where $p_s = 0.3(2v_{s+3} - 1)$, s = 1, 2, 3, is the s-th element of the translation vector;
4.4.5) Taking the first Cα atom of the continuous domain as the rotation point, rotate and translate all Cα atoms:
[Formula image not reproduced in the source text]
where the symbol (shown only as an image in the source) denotes the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the continuous domain;
4.4.6) Calculate the energy $E_{new}$ of the current protein according to step 4.3);
4.4.7) If $E_{new} < E_{old}$, accept the current assembled structure and set $E_{old} = E_{new}$;
4.4.8) If $E_{new} \ge E_{old}$, calculate the acceptance probability
[Formula image not reproduced in the source text]
where e is the base of the natural logarithm; if a random number rand(0,1) drawn between 0 and 1 is less than P, accept the current structure and set $E_{old} = E_{new}$;
4.5) Repeat step 4.4) until the number of iterations reaches $G_{max}$, and take the lowest-energy structure as the assembly result for the current template;
5) Compare the energies of the structures assembled from the $T$ templates and select the lowest-energy structure as the final assembled structure.
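The energy of step 4.3) with the example thresholds $d_{clash} = 3.8$ and $d_{contact} = 8.0$ can be sketched as follows. The clash and contact counts, $n_0$, the boundary term and the Cα RMSD follow the text directly; the functional forms of $E_{clash}$ and $E_{contact}$ are HYPOTHETICAL stand-ins (a summed overlap $\sum_c (d_{clash} - d_c)$ and $-n_{contact}/n_0$), because the source gives them only as images. The sketch also assumes the template Cα coordinates are already matched one-to-one with the model in sequence order (first part, continuous domain, second part) and computes the RMSD without re-superposition. `assembly_energy` is a hypothetical name.

```python
import math


def assembly_energy(part1, part2, continuous, template, d_clash=3.8, d_contact=8.0):
    """Energy terms of step 4.3 for lists of (x, y, z) C-alpha coordinates.

    HYPOTHETICAL: the forms of E_clash and E_contact are stand-ins; the patent
    gives those formulas only as images. The template is assumed to be matched
    one-to-one with the model order part1 + continuous + part2.
    """
    disc = part1 + part2
    dists = [math.dist(x, y) for x in disc for y in continuous]   # 4.3.1
    clashes = [d for d in dists if d < d_clash]                     # 4.3.2
    n_contact = sum(1 for d in dists if d < d_contact)              # 4.3.3
    n0 = 0.306 * (len(disc) + len(continuous))                      # n0 = 0.306(l + q)
    e_clash = sum(d_clash - d for d in clashes)                     # hypothetical form
    e_contact = -n_contact / n0                                     # hypothetical form
    e_boundary = (math.dist(part1[-1], continuous[0])               # 4.3.4: |AB|
                  + math.dist(part2[0], continuous[-1]))            #        + |DC|
    model = part1 + continuous + part2
    sq = sum(math.dist(x, y) ** 2 for x, y in zip(model, template))
    e_rmsd = math.sqrt(sq / len(model))                             # 4.3.5, no refit
    terms = {"n_clash": len(clashes), "n_contact": n_contact, "clash": e_clash,
             "contact": e_contact, "boundary": e_boundary, "rmsd": e_rmsd}
    terms["total"] = e_clash + e_contact + e_boundary + e_rmsd      # 4.3.6
    return terms


# Usage on a four-atom toy model whose template is the model itself (RMSD term is zero).
p1, p2 = [(0.0, 0.0, 0.0)], [(0.0, 10.0, 0.0)]
cont = [(3.0, 0.0, 0.0), (3.0, 10.0, 0.0)]
terms = assembly_energy(p1, p2, cont, p1 + cont + p2)
```

For 3kc2A, if the two domains together cover all 324 residues (an assumption), $n_0 = 0.306 \times 324 \approx 99.1$.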
Using the discontinuous multi-domain protein 3kc2A, with a sequence length of 324, as an example, the above assembly method produced a near-native conformation of the multi-domain protein with a TM-score of 0.976 against the native structure; the predicted structure is shown in FIG. 2.
The above describes the assembly result of the present invention using the protein 3kc2A as an example; it is not intended to limit the scope of the invention, and various modifications and improvements can be made without departing from that scope.
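The reported accuracy is a TM-score. As commonly defined, TM-score averages $1/(1 + (d_i/d_0)^2)$ over the $L$ residues of the native structure, with $d_0 = 1.24\sqrt[3]{L - 15} - 1.8$, maximized over all superpositions of model onto native. The sketch below evaluates the sum for one fixed, already-given superposition of residue-matched Cα coordinates and skips the superposition search, so it is a simplified illustration (a lower bound on the true score), not the reference TM-score program. `tm_score_fixed` is a hypothetical name.

```python
import math


def tm_score_fixed(model, native):
    """TM-score sum for one fixed superposition of matched C-alpha lists.

    Simplification: the real TM-score maximises this value over superpositions;
    no search is done here, so the result is a lower bound.
    """
    n = len(native)
    if n != len(model) or n < 19:              # d0 is not positive for n < 19
        raise ValueError("need equal-length lists with at least 19 residues")
    d0 = 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8  # length-dependent distance scale
    total = sum(1.0 / (1.0 + (math.dist(x, y) / d0) ** 2)
                for x, y in zip(model, native))
    return total / n


# Usage: a 324-residue toy chain (the length of 3kc2A) compared with itself.
native = [(3.8 * i, 0.0, 0.0) for i in range(324)]
score = tm_score_fixed(native, native)
```

For a 324-residue target, $d_0 \approx 6.58$, so a model whose every Cα sits about 6.6 units from its native position would contribute 0.5 per residue; a score of 0.976 therefore implies nearly every residue lies well within that distance.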

Claims (1)

1. A method of assembling a discontinuous multi-domain protein structure, comprising the following steps:
1) Input the sequence of the protein to be assembled and the three-dimensional structure of each single domain;
2) Set the maximum number of iterations $G_{max}$, the number of assembly templates $T$, the clash distance threshold $d_{clash}$, and the interaction distance threshold $d_{contact}$;
3) Use the protein sequence threading alignment tool FFAS 3D to search a protein library and select the $T$ highest-scoring template proteins;
4) Perform the following operations for each template protein:
4.1) Superimpose each single domain of the protein to be assembled onto the template according to the FFAS 3D sequence alignment;
4.2) Keep the discontinuous single domain fixed and translate the continuous domain inserted between its parts according to the following formula, so that the distance between the attachment point of the first part of the discontinuous domain and the attachment point of the continuous domain equals the clash distance threshold $d_{clash}$:
[Formula image not reproduced in the source text]
where the symbols (shown only as images in the source) denote the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the discontinuous domain, of the last Cα atom of the first part of the discontinuous domain, and of the first Cα atom of the continuous domain; the superscripts L and N are labels that distinguish the discontinuous (L) and continuous (N) domains; l is the sequence length of the discontinuous domain; and d is the Euclidean distance between the first Cα atom of the continuous domain and the last Cα atom of the first part of the discontinuous domain;
4.3) Calculate the energy of the current protein as follows:
4.3.1) Calculate the pairwise distances between the Cα atoms of the continuous domain and those of the discontinuous domain;
4.3.2) Collect all distances smaller than the clash distance threshold $d_{clash}$, denoted $d_c$, $c = 1, 2, \ldots, n_{clash}$, where $n_{clash}$ is the total number of clash distances, and calculate the clash distance energy $E_{clash}$:
[Formula image not reproduced in the source text]
4.3.3) Count the number $n_{contact}$ of distances smaller than the interaction threshold $d_{contact}$ and calculate the interaction distance energy $E_{contact}$:
[Formula image not reproduced in the source text]
where $n_0$ is a constant for the number of interacting atoms, set to $n_0 = 0.306(l + q)$, with l and q the sequence lengths of the discontinuous and continuous domains, respectively;
4.3.4) Calculate the distance between the last Cα atom of the first part of the discontinuous domain and the first Cα atom of the continuous domain, and the distance between the first Cα atom of the second part of the discontinuous domain and the last Cα atom of the continuous domain; their sum is the boundary distance energy $E_{boundary}$;
4.3.5) Calculate the Cα root mean square deviation $E_{RMSD}$ between the current protein and the template;
4.3.6) Calculate the energy of the current protein, $E_{old} = E_{clash} + E_{contact} + E_{boundary} + E_{RMSD}$;
4.4) Determine the rotation and translation as follows:
4.4.1) Randomly generate a 6-dimensional vector $v = (v_1, v_2, \ldots, v_6)$, where each element $v_j$, $j = 1, 2, \ldots, 6$, is a random number between 0 and 1;
4.4.2) Determine the rotation axis $z = (z_1, z_2, z_3)$, where $z_3 = \theta$, $\theta = 1 - 2v_1$, and $\phi = 2\pi v_2$ (the remaining relations appear only as formula images in the source and are not reproduced here);
4.4.3) Determine the rotation matrix u from the rotation angle $\gamma = 2v_3 - 1$:
[Formula image not reproduced in the source text]
where $u_{st}$, s = 1, 2, 3, t = 1, 2, 3, is the element in row s, column t of the rotation matrix;
4.4.4) Determine the translation vector $p = (p_1, p_2, p_3)$, where $p_s = 0.3(2v_{s+3} - 1)$, s = 1, 2, 3, is the s-th element of the translation vector;
4.4.5) Taking the first Cα atom of the continuous domain as the rotation point, rotate and translate all Cα atoms:
[Formula image not reproduced in the source text]
where the symbol (shown only as an image in the source) denotes the s-th coordinate, s = 1, 2, 3, of the m-th Cα atom of the continuous domain;
4.4.6) Calculate the energy $E_{new}$ of the current protein according to step 4.3);
4.4.7) If $E_{new} < E_{old}$, accept the current assembled structure and set $E_{old} = E_{new}$;
4.4.8) If $E_{new} \ge E_{old}$, calculate the acceptance probability
[Formula image not reproduced in the source text]
where e is the base of the natural logarithm; if a random number rand(0,1) drawn between 0 and 1 is less than P, accept the current structure and set $E_{old} = E_{new}$;
4.5) Repeat step 4.4) until the number of iterations reaches $G_{max}$, and take the lowest-energy structure as the assembly result for the current template;
5) Compare the energies of the structures assembled from the $T$ templates and select the lowest-energy structure as the final assembled structure.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710684511.3A CN107704725B (en) 2017-08-11 2017-08-11 Discontinuous multi-domain protein structure assembly method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710684511.3A CN107704725B (en) 2017-08-11 2017-08-11 Discontinuous multi-domain protein structure assembly method

Publications (2)

Publication Number Publication Date
CN107704725A (en) 2018-02-16
CN107704725B (en) 2020-12-01

Family

ID=61170905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710684511.3A Active CN107704725B (en) 2017-08-11 2017-08-11 Discontinuous multi-domain protein structure assembly method

Country Status (1)

Country Link
CN (1) CN107704725B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001075436A1 (en) * 2000-04-03 2001-10-11 Board Of Trustees Of The Leland Stanford Junior University A method for protein structure alignment
CN102682223A (en) * 2010-11-30 2012-09-19 Computer Network Information Center, Chinese Academy of Sciences Structure detection method of protein cryoelectron microscopy density map
CN103235900A (en) * 2013-03-28 2013-08-07 Sun Yat-sen University Weight assembly clustering method for excavating protein complex
CN103714265A (en) * 2013-12-23 2014-04-09 Zhejiang University of Technology Method for predicting protein three-dimensional structure based on Monte Carlo local shaking and fragment assembly
CN105121661A (en) * 2013-02-01 2015-12-02 The Regents of the University of California Methods for genome assembly and haplotype phasing

Non-Patent Citations (2)

Title
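A Memetic Algorithm for 3D Protein Structure Prediction Problem; Correa L et al.; IEEE; 2016-12-02; Vol. 15, No. 3; full text *
Research on the Self-Assembly Mechanism of Transmembrane Proteins; Li Chengdong; China Master's Theses Full-text Database, Engineering Science and Technology I; 2016-02-15 (2016, No. 02); full text *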

Also Published As

Publication number Publication date
CN107704725A (en) 2018-02-16

Similar Documents

Publication Publication Date Title
Pomyen et al. Deep metabolome: Applications of deep learning in metabolomics
Sandin et al. Data processing methods and quality control strategies for label-free LC–MS protein quantification
US10169305B2 (en) Marking comparison for similar documents
CN107609342B (en) Protein conformation search method based on secondary structure space distance constraint
CN107491664B (en) Protein structure de novo prediction method based on information entropy
CN109215732B (en) Protein structure prediction method based on residue contact information self-learning
CN109346125B (en) Rapid and accurate protein binding pocket structure alignment method
Ciobanu et al. Ab initio: Automatic Latin proto-word reconstruction
CN109215733B (en) Protein structure prediction method based on residue contact information auxiliary evaluation
CN107180164B (en) Template-based multi-domain protein structure assembly method
CN107704725B (en) Discontinuous multi-domain protein structure assembly method
Wang et al. Graph-based peak alignment algorithms for multiple liquid chromatography-mass spectrometry datasets
Zhang et al. WeLayout: WeChat Layout Analysis System for the ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
Botev et al. Word importance-based similarity of documents metric (WISDM) Fast and scalable document similarity metric for analysis of scientific documents
Biswas et al. A dynamic programming algorithm for finding the optimal placement of a secondary structure topology in Cryo-EM data
CN109033753B (en) Group protein structure prediction method based on secondary structure fragment assembly
CN1342291A (en) Matching engine
CN109346128B (en) Protein structure prediction method based on residue information dynamic selection strategy
CN107609345B (en) Multi-domain protein structure assembly method based on template self-adaptive selection
CN103778182A (en) Method for rapidly judging graph similarity
CN107273713B (en) Multi-domain protein template searching method based on TM-align
He et al. Gappy pattern matching on GPUs for on-demand extraction of hierarchical translation grammars
Atasever et al. 3-State Protein Secondary Structure Prediction based on SCOPe Classes
CN107609340B (en) Multi-domain protein distance spectrum construction method
Nasien et al. New feature vector from freeman chain code for handwritten roman character recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant