CN110729023B - Protein structure prediction method based on contact assistance of secondary structure elements - Google Patents

Publication number
CN110729023B
CN110729023B CN201910805005.4A CN201910805005A CN110729023B CN 110729023 B CN110729023 B CN 110729023B CN 201910805005 A CN201910805005 A CN 201910805005A CN 110729023 B CN110729023 B CN 110729023B
Authority
CN
China
Prior art keywords
secondary structure
sampling
contact
assembly
elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910805005.4A
Other languages
Chinese (zh)
Other versions
CN110729023A (en)
Inventor
张贵军
刘俊
彭春祥
饶亮
周晓根
胡俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-29
Filing date
2019-08-29
Publication date
2021-04-06
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201910805005.4A
Publication of CN110729023A
Application granted
Publication of CN110729023B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Landscapes

  • Spectroscopy & Molecular Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A protein structure prediction method based on secondary structure element contact assistance comprises the following steps. First, contact information between secondary structure elements is extracted from a predicted secondary structure and a residue contact map. Then, a sequential sampling strategy samples the secondary structure elements and the loop regions connecting them in turn, quickly yielding a conformation with a more accurate secondary structure, an essentially correct spatial arrangement of the secondary structure elements, and higher topological accuracy. Finally, the structure is further optimized with an energy function to improve the overall prediction accuracy. The invention provides a high-accuracy protein structure prediction method based on secondary structure element contact assistance.

Description

Protein structure prediction method based on contact assistance of secondary structure elements
Technical Field
The invention relates to the fields of bioinformatics and computer application, in particular to a protein structure prediction method based on secondary structure element contact assistance.
Background
With the announced completion of the Human Genome Project, the process by which DNA is transcribed and translated into amino acid sequences (the first genetic code) has been deciphered. However, how a protein folds from an amino acid sequence into a specific three-dimensional structure (the second genetic code) remains an unsolved puzzle. The structure of a protein determines its specific biological function, so obtaining protein structures efficiently is important for understanding biological function, drug design and disease treatment.
At present, the three-dimensional structure of a protein is obtained mainly by experimental determination, chiefly X-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy. These methods are complex and demand a great deal of time and money, and the three-dimensional structures of most drug-target proteins are difficult to determine experimentally.
The amino acid sequence of a protein encodes its three-dimensional structure. With the rapid development of artificial intelligence, it has become a trend to mine prior knowledge from databases of known proteins on the basis of amino acid sequence information and to predict the three-dimensional structure directly from the sequence by simulating the folding process on a computer. Many research institutions worldwide work on predicting protein three-dimensional structure using biological data, artificial intelligence and system optimization techniques, and the results are gradually being applied to disease diagnosis and drug design; representative teams include David Baker's laboratory at the University of Washington and the Zhang laboratory at the University of Michigan. A growing number of universities and research institutions in China have also joined this research.
With the rapid development of inter-residue contact prediction, most protein structure prediction methods use inter-residue contact information to improve prediction accuracy. Secondary structure elements have pronounced local characteristics, and the positional relationship among them directly determines the accuracy of the protein topology. However, current methods consider only the contacts between residues and not the spatial constraints between secondary structure elements.
Therefore, current protein structure prediction methods do not consider the spatial constraints between secondary structure elements and need to be improved.
Disclosure of Invention
To overcome the low topological accuracy of existing protein structure prediction methods, the invention provides a protein structure prediction method assisted by contacts between secondary structure elements. First, contact information between secondary structure elements is extracted from a predicted secondary structure and a residue contact map. Then, a sequential sampling strategy rapidly samples the protein conformation space to generate a conformation with higher topological accuracy. Finally, the structure is further optimized with an energy function to improve the overall prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A protein structure prediction method based on contact assistance of secondary structure elements comprises the following steps:
1) input the target sequence, fragment library, predicted secondary structure and residue contact map of the protein to be predicted;
2) extract the contact information between secondary structure elements, as follows:
2.1) according to the predicted secondary structure, label each secondary structure element as SS_i, where i ∈ {1, 2, …, m} and m is the number of secondary structure elements; denote the loop region between two adjacent secondary structure elements SS_k and SS_{k+1} as L_k, where k ∈ {1, 2, …, m-1};
2.2) for any residue pair in the residue contact map, if the two residues belong to two different secondary structure elements SS_i and SS_j, assign the pair to the contact set CSS_{i,j} of SS_i and SS_j, where i, j ∈ {1, 2, …, m} and i < j;
3) set the parameters: the maximum number of assemblies G_S for secondary structure sampling, the maximum number of assemblies G_L for loop region sampling, and the maximum number of iterations G for energy-function-guided fragment assembly;
4) sequential sampling based on the degree of secondary structure matching, as follows:
4.1) sample secondary structure element SS_1 by fragment assembly with 9-residue fragments so that SS_1 satisfies the secondary structure constraint as far as possible; for example, if SS_1 is predicted to be an alpha helix, sample until all residues in SS_1 are alpha-helical or the maximum number of assemblies G_S is reached;
4.2) sample SS_2 to SS_m in turn in the manner of step 4.1) to obtain a conformation with a more accurate secondary structure;
5) sequential sampling based on the contact constraints between secondary structure elements, as follows:
5.1) sample the loop region L_1 between SS_1 and SS_2 by fragment assembly with 9-residue fragments until the spatial relationship between SS_2 and SS_1 satisfies the contact set CSS_{1,2} or the maximum number of assemblies G_L is reached;
5.2) sample L_2 until the spatial relationship between SS_3 and both SS_1 and SS_2 satisfies the contact sets CSS_{1,3} and CSS_{2,3} or the maximum number of assemblies G_L is reached;
5.3) and so on: sample L_k until the spatial relationship between SS_{k+1} and SS_1 to SS_k satisfies the contact sets CSS_{1,k+1} to CSS_{k,k+1} or the maximum number of assemblies G_L is reached;
5.4) after all L_k, k ∈ {1, 2, …, m-1}, have been sampled, a conformation is obtained with an accurate secondary structure, an essentially correct spatial arrangement of the secondary structure elements, and high topological accuracy;
6) further optimize the structure with the Rosetta score3 energy function, as follows:
6.1) set the sampling probabilities: 0.3 for every residue in the secondary structure elements SS_i and 1 for every residue in the loop regions;
6.2) randomly sample the whole protein sequence according to the sampling probabilities by fragment assembly with 3-residue fragments; compute the energy of the conformation before and after each fragment assembly with the Rosetta score3 energy function, and decide whether to accept the assembly according to the Boltzmann criterion;
6.3) iterate step 6.2) until the maximum number of iterations G is reached;
7) output the final conformation as the prediction result.
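Step 2 can be illustrated with a short Python sketch. It is only one plausible reading of the text, under stated assumptions: the predicted secondary structure is taken to be a three-state string (H, E, C), residues are indexed from 0, and the function names `segment_secondary_structure`, `element_of` and `contact_sets` are hypothetical helpers introduced here, not names from the patent.

```python
from collections import defaultdict
from itertools import groupby


def segment_secondary_structure(ss):
    """Split a per-residue string (H = helix, E = strand, C = coil) into
    secondary structure elements SS_1..SS_m and the loops L_1..L_{m-1}.
    Ranges are 0-based and end-exclusive (an assumption of this sketch)."""
    elements = []
    pos = 0
    for state, run in groupby(ss):
        length = len(list(run))
        if state in "HE":
            elements.append((pos, pos + length))
        pos += length
    # L_k is the stretch between SS_k and SS_{k+1}
    loops = [(elements[k][1], elements[k + 1][0])
             for k in range(len(elements) - 1)]
    return elements, loops


def element_of(residue, elements):
    """Return the 1-based index i of the element containing `residue`,
    or None if the residue lies in a loop or terminal coil."""
    for i, (start, end) in enumerate(elements, start=1):
        if start <= residue < end:
            return i
    return None


def contact_sets(contacts, elements):
    """Group residue contacts into CSS_{i,j} with i < j (step 2.2).
    Contacts inside one element or touching a loop residue are skipped."""
    css = defaultdict(list)
    for a, b in contacts:
        i, j = element_of(a, elements), element_of(b, elements)
        if i is None or j is None or i == j:
            continue
        css[(min(i, j), max(i, j))].append((a, b))
    return dict(css)


# Toy input, not data from the patent.
elements, loops = segment_secondary_structure("CHHHHCCEEEECCHHHHC")
css = contact_sets([(2, 9), (3, 15), (8, 14), (1, 4), (0, 9)], elements)
for (i, j), pairs in sorted(css.items()):
    print(f"CSS_{i},{j}: {pairs}")
```

Keying the sets by the ordered pair (i, j) makes the lookups in step 5 direct: when loop L_k is sampled, the relevant sets are those whose second index is k+1.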
The beneficial effects of the invention are as follows. First, contact information between secondary structure elements is extracted from the predicted secondary structure and the residue contact map. Then, a sequential sampling strategy samples the secondary structure elements and the loop regions connecting them in turn, quickly yielding a conformation with a more accurate secondary structure, an essentially correct spatial arrangement of the secondary structure elements, and higher topological accuracy. Finally, the structure is further optimized with an energy function to improve the overall prediction accuracy.
Drawings
FIG. 1 is a schematic diagram of the extraction of contacts between secondary structure elements in the protein structure prediction method based on contact assistance of secondary structure elements, in which a curve connecting two secondary structure elements indicates that residues in the two elements are in contact.
FIG. 2 is the RMSD distribution of the conformations sampled when the method is used to predict the structure of protein 2BL7.
FIG. 3 is the three-dimensional structure of protein 2BL7 predicted by the method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1, 2 and 3, a protein structure prediction method based on contact assistance of secondary structure elements comprises the following steps:
1) input the target sequence, fragment library, predicted secondary structure and residue contact map of the protein to be predicted;
2) extract the contact information between secondary structure elements, as follows:
2.1) according to the predicted secondary structure, label each secondary structure element as SS_i, where i ∈ {1, 2, …, m} and m is the number of secondary structure elements; denote the loop region between two adjacent secondary structure elements SS_k and SS_{k+1} as L_k, where k ∈ {1, 2, …, m-1};
2.2) for any residue pair in the residue contact map, if the two residues belong to two different secondary structure elements SS_i and SS_j, assign the pair to the contact set CSS_{i,j} of SS_i and SS_j, where i, j ∈ {1, 2, …, m} and i < j;
3) set the parameters: the maximum number of assemblies G_S for secondary structure sampling, the maximum number of assemblies G_L for loop region sampling, and the maximum number of iterations G for energy-function-guided fragment assembly;
4) sequential sampling based on the degree of secondary structure matching, as follows:
4.1) sample secondary structure element SS_1 by fragment assembly with 9-residue fragments so that SS_1 satisfies the secondary structure constraint as far as possible; for example, if SS_1 is predicted to be an alpha helix, sample until all residues in SS_1 are alpha-helical or the maximum number of assemblies G_S is reached;
4.2) sample SS_2 to SS_m in turn in the manner of step 4.1) to obtain a conformation with a more accurate secondary structure;
5) sequential sampling based on the contact constraints between secondary structure elements, as follows:
5.1) sample the loop region L_1 between SS_1 and SS_2 by fragment assembly with 9-residue fragments until the spatial relationship between SS_2 and SS_1 satisfies the contact set CSS_{1,2} or the maximum number of assemblies G_L is reached;
5.2) sample L_2 until the spatial relationship between SS_3 and both SS_1 and SS_2 satisfies the contact sets CSS_{1,3} and CSS_{2,3} or the maximum number of assemblies G_L is reached;
5.3) and so on: sample L_k until the spatial relationship between SS_{k+1} and SS_1 to SS_k satisfies the contact sets CSS_{1,k+1} to CSS_{k,k+1} or the maximum number of assemblies G_L is reached;
5.4) after all L_k, k ∈ {1, 2, …, m-1}, have been sampled, a conformation is obtained with an accurate secondary structure, an essentially correct spatial arrangement of the secondary structure elements, and high topological accuracy;
6) further optimize the structure with the Rosetta score3 energy function, as follows:
6.1) set the sampling probabilities: 0.3 for every residue in the secondary structure elements SS_i and 1 for every residue in the loop regions;
6.2) randomly sample the whole protein sequence according to the sampling probabilities by fragment assembly with 3-residue fragments; compute the energy of the conformation before and after each fragment assembly with the Rosetta score3 energy function, and decide whether to accept the assembly according to the Boltzmann criterion;
6.3) iterate step 6.2) until the maximum number of iterations G is reached;
7) output the final conformation as the prediction result.
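The control flow shared by steps 4 and 5 (sample one unit until its criterion holds or its budget is spent, then move on to the next unit) might be sketched in Python as below. This is a toy model, not the patent's implementation: a conformation is reduced to a list of per-residue states, a "fragment" is a 9-character state string from a made-up library `TOY_FRAGMENTS`, every insertion is accepted unconditionally, and all function names are hypothetical. Step 5 would reuse `sample_until` with loop windows and a predicate that checks the contact sets against 3D coordinates, which this sketch does not model.

```python
import random

FRAGMENT_LENGTH = 9
# Hypothetical toy library: each "fragment" is just a 9-state string.
TOY_FRAGMENTS = ["HHHHHHHHH", "CCHHHHHHH", "EEEEECCCC", "CCCCCCCCC"]


def insert_fragment(states, element, rng):
    """Overwrite a random 9-residue window that overlaps `element`
    (0-based, end-exclusive) with a random toy fragment."""
    start, end = element
    lo = max(0, start - FRAGMENT_LENGTH + 1)
    hi = min(len(states) - FRAGMENT_LENGTH, end - 1)
    window = rng.randint(lo, hi)
    fragment = rng.choice(TOY_FRAGMENTS)
    return states[:window] + list(fragment) + states[window + FRAGMENT_LENGTH:]


def sample_until(conformation, propose, satisfied, max_assemblies):
    """Apply `propose` until `satisfied` holds or the budget is spent.
    Returns the conformation and the number of assemblies used."""
    for used in range(max_assemblies):
        if satisfied(conformation):
            return conformation, used
        conformation = propose(conformation)
    return conformation, max_assemblies


def sample_elements(states, elements, predicted, g_s, rng):
    """Steps 4.1-4.2: visit SS_1 ... SS_m in order, each with budget G_S."""
    for (start, end), target in zip(elements, predicted):
        states, _ = sample_until(
            states,
            propose=lambda s, e=(start, end): insert_fragment(s, e, rng),
            satisfied=lambda s, a=start, b=end, t=target:
                all(x == t for x in s[a:b]),
            max_assemblies=g_s,
        )
    return states


# Toy run: a 20-residue chain with a predicted helix and a predicted strand.
rng = random.Random(0)
final = sample_elements(list("C" * 20), [(2, 8), (12, 18)], ["H", "E"],
                        g_s=500, rng=rng)
print("".join(final))
```

Because fragment windows extend past element boundaries, sampling a later element can disturb an earlier one; the sketch makes no attempt to prevent this, and the text does not say how the method handles it.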
In this embodiment, protein 2BL7, with a sequence length of 79, is taken as an example. The protein structure prediction method based on contact assistance of secondary structure elements comprises the following steps:
1) input the target sequence, fragment library, predicted secondary structure and residue contact map of the protein 2BL7 to be predicted;
2) extract the contact information between secondary structure elements, as follows:
2.1) according to the predicted secondary structure, label each secondary structure element as SS_i, where i ∈ {1, 2, …, m} and m is the number of secondary structure elements; denote the loop region between two adjacent secondary structure elements SS_k and SS_{k+1} as L_k, where k ∈ {1, 2, …, m-1};
2.2) for any residue pair in the residue contact map, if the two residues belong to two different secondary structure elements SS_i and SS_j, assign the pair to the contact set CSS_{i,j} of SS_i and SS_j, where i, j ∈ {1, 2, …, m} and i < j;
3) set the parameters: the maximum number of assemblies for secondary structure sampling G_S = 500, the maximum number of assemblies for loop region sampling G_L = 500, and the maximum number of iterations for energy-function-guided fragment assembly G = 2000;
4) sequential sampling based on the degree of secondary structure matching, as follows:
4.1) sample secondary structure element SS_1 by fragment assembly with 9-residue fragments so that SS_1 satisfies the secondary structure constraint as far as possible; for example, if SS_1 is predicted to be an alpha helix, sample until all residues in SS_1 are alpha-helical or the maximum number of assemblies G_S is reached;
4.2) sample SS_2 to SS_m in turn in the manner of step 4.1) to obtain a conformation with a more accurate secondary structure;
5) sequential sampling based on the contact constraints between secondary structure elements, as follows:
5.1) sample the loop region L_1 between SS_1 and SS_2 by fragment assembly with 9-residue fragments until the spatial relationship between SS_2 and SS_1 satisfies the contact set CSS_{1,2} or the maximum number of assemblies G_L is reached;
5.2) sample L_2 until the spatial relationship between SS_3 and both SS_1 and SS_2 satisfies the contact sets CSS_{1,3} and CSS_{2,3} or the maximum number of assemblies G_L is reached;
5.3) and so on: sample L_k until the spatial relationship between SS_{k+1} and SS_1 to SS_k satisfies the contact sets CSS_{1,k+1} to CSS_{k,k+1} or the maximum number of assemblies G_L is reached;
5.4) after all L_k, k ∈ {1, 2, …, m-1}, have been sampled, a conformation is obtained with an accurate secondary structure, an essentially correct spatial arrangement of the secondary structure elements, and high topological accuracy;
6) further optimize the structure with the Rosetta score3 energy function, as follows:
6.1) set the sampling probabilities: 0.3 for every residue in the secondary structure elements SS_i and 1 for every residue in the loop regions;
6.2) randomly sample the whole protein sequence according to the sampling probabilities by fragment assembly with 3-residue fragments; compute the energy of the conformation before and after each fragment assembly with the Rosetta score3 energy function, and decide whether to accept the assembly according to the Boltzmann criterion;
6.3) iterate step 6.2) until the maximum number of iterations G is reached;
7) output the final conformation as the prediction result.
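The refinement loop of step 6 can be sketched as follows, using the embodiment's G = 2000 and sequence length 79. Several things here are assumptions rather than statements from the patent: the "Boltzmann criterion" is read as the usual Metropolis test (accept downhill moves, accept uphill moves with probability exp(-ΔE/kT)); a window's sampling weight is taken from its first residue; `toy_energy` is a placeholder for the Rosetta score3 function, which is not called here; a conformation is reduced to one angle per residue; and the value of kT and the element layout are arbitrary.

```python
import math
import random


def boltzmann_accept(e_old, e_new, kt, rng):
    """Metropolis test: always accept downhill moves; accept uphill moves
    with probability exp(-(E_new - E_old) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kt)


def toy_energy(angles, target):
    """Placeholder energy (NOT Rosetta score3): squared deviation from
    a made-up target angle per residue."""
    return sum((a - t) ** 2 for a, t in zip(angles, target))


def refine(angles, target, in_element, g, kt, rng):
    """Steps 6.1-6.3 on a toy conformation of one angle per residue."""
    starts = list(range(len(angles) - 2))          # 3-residue windows
    # Step 6.1: weight 0.3 inside secondary structure elements, 1 in loops.
    weights = [0.3 if in_element[p] else 1.0 for p in starts]
    energy = toy_energy(angles, target)
    for _ in range(g):
        start = rng.choices(starts, weights=weights)[0]
        trial = list(angles)
        # Stand-in for inserting a 3-residue fragment from the library.
        trial[start:start + 3] = [rng.uniform(-180.0, 180.0) for _ in range(3)]
        trial_energy = toy_energy(trial, target)
        if boltzmann_accept(energy, trial_energy, kt, rng):
            angles, energy = trial, trial_energy
    return angles, energy


# Toy run with the embodiment's G = 2000 on a 79-residue chain.
rng = random.Random(42)
n = 79
in_element = [10 <= p < 30 or 45 <= p < 70 for p in range(n)]  # made-up layout
start_angles = [rng.uniform(-180.0, 180.0) for _ in range(n)]
target = [-60.0] * n
refined, final_energy = refine(start_angles, target, in_element,
                               g=2000, kt=1.0, rng=rng)
print(f"energy before: {toy_energy(start_angles, target):.1f}, "
      f"after: {final_energy:.1f}")
```

Giving loop residues a weight of 1 and element residues 0.3 concentrates moves on the loops, so the secondary structure built in step 4 and the element packing built in step 5 are perturbed less often during refinement.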
Taking protein 2BL7, whose amino acid sequence is 79 residues long, as an example, the above method yields a near-native conformation of the protein. The conformation update schematic is shown in FIG. 1, and the root-mean-square deviation (RMSD) of the prediction is [value given only as an image in the original: Figure BDA0002183383060000061]. The predicted structure is shown in FIG. 2.
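For reference, the root-mean-square deviation between a predicted and a reference structure over N matched atoms is RMSD = sqrt((1/N) Σ ||x_i - y_i||^2). The Python sketch below computes only this formula; it assumes the two coordinate sets have already been optimally superimposed (for example by a Kabsch fit, which is not shown), and the function name `rmsd` and the coordinates are illustrative, not data from the patent.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in angstroms.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 1.0)]
print(f"RMSD = {rmsd(predicted, reference):.3f}")
```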
The foregoing shows the prediction results of one embodiment of the invention. The invention is not limited to the embodiment described above and may be implemented with various modifications that do not depart from its basic idea or go beyond its gist.

Claims (1)

1. A protein structure prediction method based on contact assistance of secondary structure elements, characterized by comprising the following steps:
1) input the target sequence, fragment library, predicted secondary structure and residue contact map of the protein to be predicted;
2) extract the contact information between secondary structure elements, as follows:
2.1) according to the predicted secondary structure, label each secondary structure element as SS_i, where i ∈ {1, 2, …, m} and m is the number of secondary structure elements; denote the loop region between two adjacent secondary structure elements SS_k and SS_{k+1} as L_k, where k ∈ {1, 2, …, m-1};
2.2) for any residue pair in the residue contact map, if the two residues belong to two different secondary structure elements SS_i and SS_j, assign the pair to the contact set CSS_{i,j} of SS_i and SS_j, where i, j ∈ {1, 2, …, m} and i < j;
3) set the parameters: the maximum number of assemblies G_S for secondary structure sampling, the maximum number of assemblies G_L for loop region sampling, and the maximum number of iterations G for energy-function-guided fragment assembly;
4) sequential sampling based on the degree of secondary structure matching, as follows:
4.1) sample secondary structure element SS_1 by fragment assembly with 9-residue fragments so that SS_1 satisfies the secondary structure constraint as far as possible; if SS_1 is predicted to be an alpha helix, sample until all residues in SS_1 are alpha-helical or the maximum number of assemblies G_S is reached;
4.2) sample SS_2 to SS_m in turn in the manner of step 4.1) to obtain a conformation with a more accurate secondary structure;
5) sequential sampling based on the contact constraints between secondary structure elements, as follows:
5.1) sample the loop region L_1 between SS_1 and SS_2 by fragment assembly with 9-residue fragments until the spatial relationship between SS_2 and SS_1 satisfies the contact set CSS_{1,2} or the maximum number of assemblies G_L is reached;
5.2) sample L_2 until the spatial relationship between SS_3 and both SS_1 and SS_2 satisfies the contact sets CSS_{1,3} and CSS_{2,3} or the maximum number of assemblies G_L is reached;
5.3) and so on: sample L_k until the spatial relationship between SS_{k+1} and SS_1 to SS_k satisfies the contact sets CSS_{1,k+1} to CSS_{k,k+1} or the maximum number of assemblies G_L is reached;
5.4) after all L_k, k ∈ {1, 2, …, m-1}, have been sampled, a conformation is obtained with an accurate secondary structure, an essentially correct spatial arrangement of the secondary structure elements, and high topological accuracy;
6) further optimize the structure with the Rosetta score3 energy function, as follows:
6.1) set the sampling probabilities: 0.3 for every residue in the secondary structure elements SS_i and 1 for every residue in the loop regions;
6.2) randomly sample the whole protein sequence according to the sampling probabilities by fragment assembly with 3-residue fragments; compute the energy of the conformation before and after each fragment assembly with the Rosetta score3 energy function, and decide whether to accept the assembly according to the Boltzmann criterion;
6.3) iterate step 6.2) until the maximum number of iterations G is reached;
7) output the final conformation as the prediction result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910805005.4A CN110729023B (en) 2019-08-29 2019-08-29 Protein structure prediction method based on contact assistance of secondary structure elements

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910805005.4A CN110729023B (en) 2019-08-29 2019-08-29 Protein structure prediction method based on contact assistance of secondary structure elements

Publications (2)

Publication Number Publication Date
CN110729023A (en) 2020-01-24
CN110729023B (en) 2021-04-06

Family

ID=69217787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910805005.4A Active CN110729023B (en) 2019-08-29 2019-08-29 Protein structure prediction method based on contact assistance of secondary structure elements

Country Status (1)

Country Link
CN (1) CN110729023B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334746A (en) * 2018-01-15 2018-07-27 Zhejiang University of Technology Protein structure prediction method based on secondary structure similarity
CN109033753A (en) * 2018-06-07 2018-12-18 Zhejiang University of Technology Group protein structure prediction method based on secondary structure fragment assembly
CN109101785A (en) * 2018-07-12 2018-12-28 Zhejiang University of Technology Protein structure prediction method based on secondary structure similarity selection strategy
CN109378035A (en) * 2018-08-29 2019-02-22 Zhejiang University of Technology Protein structure prediction method based on secondary structure dynamic selection strategy

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120004185A1 (en) * 2009-02-27 2012-01-05 Atyr Pharma, Inc. Polypeptide structural motifs associated with cell signaling activity
US11031094B2 (en) * 2015-07-16 2021-06-08 Dnastar, Inc. Protein structure prediction system
CN110148437B (en) * 2019-04-16 2021-01-01 浙江工业大学 Residue contact auxiliary strategy self-adaptive protein structure prediction method

Also Published As

Publication number Publication date
CN110729023A (en) 2020-01-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant