CN113257336A - Residue-to-distance constrained loop geometric optimization protein structure prediction method - Google Patents

Publication number
CN113257336A
Authority
CN
China
Prior art keywords
residue
conformation
distance
loop
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110422467.5A
Other languages
Chinese (zh)
Inventor
张贵军
刘俊
杨子豪
彭春祥
赵凯龙
周晓根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110422467.5A
Publication of CN113257336A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding

Landscapes

  • Spectroscopy & Molecular Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Peptides Or Proteins (AREA)

Abstract

A protein structure prediction method based on residue-pair distance constrained loop geometric optimization. First, random fragment assembly is used to generate an initial conformation. Local distance constraint information is then selected to construct an objective function for sampling the loop regions, and a differential evolution algorithm, guided by this objective function, samples the residue dihedral angles of the loop regions so that the topology can be adjusted substantially. Combining this loop-specific geometric optimization strategy with global fragment assembly effectively improves the accuracy and efficiency of whole-structure prediction. The invention thus provides a protein structure prediction method with residue-pair distance constrained loop geometric optimization that achieves higher prediction accuracy.

Description

Residue-to-distance constrained loop geometric optimization protein structure prediction method
Technical Field
The invention relates to the fields of bioinformatics and computer applications, and in particular to a protein structure prediction method based on residue-pair distance constrained loop geometric optimization.
Background
Proteins are the principal carriers of life activities; processes ranging from metabolism to disease immunity all depend on protein function. The function of a protein is determined by its specific three-dimensional structure. Changes to that three-dimensional structure are likely to cause the protein to lose its biological function and so give rise to disease. Diseases such as mad cow disease, senile dementia, and Parkinson's disease are caused by misfolding of protein structures. Obtaining protein three-dimensional structures efficiently is therefore critical for understanding the biological functions of proteins and for developing drugs against related diseases.
At present, the three-dimensional structure of a protein is determined mainly by experimental techniques such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy. These techniques are extremely costly: a single cryo-electron microscope can cost tens of millions and must be operated by highly trained specialists. Experimental structure determination is also time-consuming and is not applicable to every type of protein. Experimental determination therefore cannot supply protein three-dimensional structures efficiently on a large scale.
According to Anfinsen's principle, the amino acid sequence of a protein contains the information that determines its three-dimensional structure; predicting the three-dimensional structure of a protein from its amino acid sequence is therefore an important research direction in bioinformatics. Protein structure databases hold a large amount of structural information, and the rapid development of gene sequencing technology has made the number of determined amino acid sequences grow quickly, providing a large data foundation. With the rapid development of artificial intelligence, more and more information can be learned from these data, so that the protein folding process can be simulated on a computer and the three-dimensional structure of a protein can be predicted; in recent years in particular, the accuracy of protein structure prediction has repeatedly reached new highs. Representative research teams in this field include the David Baker laboratory at the University of Washington, the Yang Zhang laboratory at the University of Michigan, and the Jinbo Xu laboratory at the Toyota Technological Institute at Chicago. A growing number of universities and research institutions in China have also joined the study of protein structure prediction.
The local structure of a protein is mainly divided into α-helices, β-sheets, and irregular loop regions. α-helices and β-sheets have distinct structural characteristics and are relatively easy to predict, whereas loop regions are flexible and usually hard to determine. Because loop regions connect the α-helices and β-sheets, however, they strongly influence the protein topology. Current methods rarely treat loop regions specially, which leaves them insufficiently sampled.
In short, current protein structure prediction methods sample loop regions inadequately and ignore their structural specificity, and therefore need improvement.
Disclosure of Invention
To overcome the shortcomings of the prior art, strengthen sampling of protein loop regions, and improve the prediction accuracy of the overall protein structure, the invention provides a protein structure prediction method based on residue-pair distance constrained loop geometric optimization. Using predicted inter-residue distance constraint information, the invention designs a geometric optimization strategy specific to loop regions: within the framework of a differential evolution algorithm, it constructs an objective function from local residue-pair distance constraints and uses it to guide sampling of the loop-region dihedral angles. On this basis, global sampling is performed with a fragment assembly strategy, with the ultimate aim of improving the prediction accuracy of the overall structure.
The technical solution adopted by the invention to solve the technical problem is as follows:
A protein structure prediction method based on residue-pair distance constrained loop geometric optimization, the method comprising the following steps:
1) input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distributions;
2) set the parameters: the number of iterations G and the number of fragment assembly attempts T;
3) conformation initialization: repeatedly select a random sliding window on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, giving the initial conformation P;
4) set g = 1, where g ∈ {1, 2, ..., G};
5) residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) compute the secondary structure SS of conformation P with the DSSP algorithm and, according to SS, mark all loop regions in sequence order, recording for the m-th loop region its starting residue number and its terminating residue number;
5.2) set m = 1, where m ∈ {1, 2, ..., M} and M is the number of loop regions;
5.3) extract the dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region, together with the corresponding rotation axes Ω, and define Θ as the rotation angles about the respective axes;
5.4) from the predicted residue-pair distance distributions, select the distance constraints whose two residues lie at the two ends of the m-th loop region, and use them to construct the objective function of the rotation angles Θ:
f(Θ) = [equation image in the original publication]
where K is the number of selected residue-pair distance constraints; for the k-th residue pair, the equation uses the peak of its predicted distance distribution and the probability of that peak in the distribution; i and j are the residue numbers of the two residues of the k-th pair; and d_{i,j} is the actual distance between residues i and j in conformation P;
5.5) solve for Θ with a differential evolution algorithm, taking Ω as the rotation axes, Θ as the rotation variables, and f(Θ) as the objective;
5.6) add the values in Θ to the corresponding dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region in conformation P, generating a new conformation P';
5.7) set P = P' to update the conformation;
5.8) set m = m + 1; if m ≤ M, return to step 5.3); otherwise go to step 6);
6) global random fragment assembly, as follows:
6.1) set t = 1, where t ∈ {1, 2, ..., T};
6.2) randomly select a sliding window W of width f on conformation P;
6.3) randomly select a fragment from the fragment library entries corresponding to W and use it to replace the fragment in W, generating a conformation P'';
6.4) compute the energies of conformations P and P'' with the Rosetta score3 energy function, and decide by the Boltzmann criterion whether the fragment assembly is accepted;
6.5) if the fragment assembly is accepted, set P = P'' to update the conformation;
6.6) set t = t + 1; if t ≤ T, return to step 6.2) for the next fragment assembly;
7) set g = g + 1; if g ≤ G, return to step 5); otherwise output the final conformation as the prediction result.
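As an illustration of step 5.4, the following minimal Python sketch evaluates a distance-constraint objective on a set of residue coordinates. The exact form of f(Θ) appears in the source only as an equation image, so the probability-weighted squared deviation used here is an assumption, not the patented formula. The names `DistanceConstraint` and `loop_objective`, and the representation of a conformation as a dictionary of one 3D point per residue, are likewise hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class DistanceConstraint:
    i: int        # residue number of the first residue in the pair
    j: int        # residue number of the second residue in the pair
    peak: float   # peak of the predicted distance distribution for (i, j)
    prob: float   # probability of that peak in the distribution


def loop_objective(coords, constraints):
    """Score a conformation against K selected residue-pair constraints.

    ASSUMED form: sum over k of prob_k * (d_ij - peak_k)^2, where d_ij is
    the actual distance between residues i and j in the conformation.
    `coords` maps residue number -> (x, y, z).
    """
    total = 0.0
    for c in constraints:
        d_ij = dist(coords[c.i], coords[c.j])
        total += c.prob * (d_ij - c.peak) ** 2
    return total
```

In a full implementation the coordinates would be recomputed from the rotated loop dihedral angles before each evaluation, so that the score depends on Θ; that geometry step is omitted here.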
The beneficial effects of the invention are as follows. Random fragment assembly first generates an initial conformation. Local distance constraint information is then selected to construct an objective function for sampling the loop regions, and a differential evolution algorithm, guided by this objective function, samples the residue dihedral angles of the loop regions so that the topology can be adjusted substantially. Combining this loop-specific geometric optimization strategy with global fragment assembly effectively improves the accuracy and efficiency of whole-structure prediction.
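The differential evolution search over the rotation angles can be sketched as follows. The source does not state which differential evolution variant, population size, scale factor, or crossover rate is used, so the DE/rand/1/bin scheme and every default value below are assumptions chosen only for illustration. The function name `differential_evolution` and its parameters are hypothetical.

```python
import random


def differential_evolution(objective, dim, bounds=(-180.0, 180.0),
                           pop_size=20, generations=100,
                           f_scale=0.5, cross_rate=0.9, seed=0):
    """Minimise `objective` over `dim` rotation angles (degrees).

    Assumed variant: DE/rand/1/bin with greedy one-to-one selection.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of candidate rotation-angle vectors.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for n in range(pop_size):
            # Three distinct individuals, all different from the target n.
            a, b, c = rng.sample([k for k in range(pop_size) if k != n], 3)
            forced = rng.randrange(dim)  # at least one mutated dimension
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cross_rate:
                    v = pop[a][d] + f_scale * (pop[b][d] - pop[c][d])
                    v = min(hi, max(lo, v))  # clamp into the angle bounds
                else:
                    v = pop[n][d]
                trial.append(v)
            s = objective(trial)
            if s <= scores[n]:  # keep the trial only if it is no worse
                pop[n], scores[n] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

Because selection is greedy, the best score in the population never gets worse from one generation to the next, which is the property that lets the distance-constraint objective steer the loop angles.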
Drawings
FIG. 1 is the RMSD distribution of the conformations sampled when the structure of protein 1DP7 was predicted with the residue-pair distance constrained loop geometric optimization protein structure prediction method.
FIG. 2 is the three-dimensional structure of protein 1DP7 predicted with the residue-pair distance constrained loop geometric optimization protein structure prediction method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 and 2, a protein structure prediction method based on residue-pair distance constrained loop geometric optimization comprises the following steps:
1) input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distributions;
2) set the parameters: the number of iterations G and the number of fragment assembly attempts T;
3) conformation initialization: repeatedly select a random sliding window on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, giving the initial conformation P;
4) set g = 1, where g ∈ {1, 2, ..., G};
5) residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) compute the secondary structure SS of conformation P with the DSSP algorithm and, according to SS, mark all loop regions in sequence order, recording for the m-th loop region its starting residue number and its terminating residue number;
5.2) set m = 1, where m ∈ {1, 2, ..., M} and M is the number of loop regions;
5.3) extract the dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region, together with the corresponding rotation axes Ω, and define Θ as the rotation angles about the respective axes;
5.4) from the predicted residue-pair distance distributions, select the distance constraints whose two residues lie at the two ends of the m-th loop region, and use them to construct the objective function of the rotation angles Θ:
f(Θ) = [equation image in the original publication]
where K is the number of selected residue-pair distance constraints; for the k-th residue pair, the equation uses the peak of its predicted distance distribution and the probability of that peak in the distribution; i and j are the residue numbers of the two residues of the k-th pair; and d_{i,j} is the actual distance between residues i and j in conformation P;
5.5) solve for Θ with a differential evolution algorithm, taking Ω as the rotation axes, Θ as the rotation variables, and f(Θ) as the objective;
5.6) add the values in Θ to the corresponding dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region in conformation P, generating a new conformation P';
5.7) set P = P' to update the conformation;
5.8) set m = m + 1; if m ≤ M, return to step 5.3); otherwise go to step 6);
6) global random fragment assembly, as follows:
6.1) set t = 1, where t ∈ {1, 2, ..., T};
6.2) randomly select a sliding window W of width f on conformation P;
6.3) randomly select a fragment from the fragment library entries corresponding to W and use it to replace the fragment in W, generating a conformation P'';
6.4) compute the energies of conformations P and P'' with the Rosetta score3 energy function, and decide by the Boltzmann criterion whether the fragment assembly is accepted;
6.5) if the fragment assembly is accepted, set P = P'' to update the conformation;
6.6) set t = t + 1; if t ≤ T, return to step 6.2) for the next fragment assembly;
7) set g = g + 1; if g ≤ G, return to step 5); otherwise output the final conformation as the prediction result.
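Step 5.1 marks loop regions from a DSSP secondary-structure assignment. A minimal sketch of that bookkeeping is shown below. The source does not say which DSSP codes count as loop, so treating every code other than helix (H, G, I) and strand (E, B) as loop is an assumption, and the function name `loop_regions` is hypothetical. Residues are numbered from 1.

```python
def loop_regions(ss, non_loop="HGIEB"):
    """Return (start, end) residue numbers of each loop region, in order.

    `ss` is a per-residue secondary-structure string such as DSSP output.
    ASSUMPTION: any code outside `non_loop` (for example 'T', 'S', '-')
    is treated as part of a loop.
    """
    regions = []
    start = None
    for idx, code in enumerate(ss, start=1):
        if code not in non_loop:
            if start is None:
                start = idx              # a new loop region begins here
        elif start is not None:
            regions.append((start, idx - 1))  # loop ended at previous residue
            start = None
    if start is not None:
        regions.append((start, len(ss)))  # loop runs to the chain terminus
    return regions
```

For example, `loop_regions("--HHHH-TT-EEEE-")` yields three regions, and the number of returned tuples plays the role of M in step 5.2.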
In this embodiment, protein 1DP7, whose sequence length is 76, is taken as an example. The protein structure prediction method based on residue-pair distance constrained loop geometric optimization comprises the following steps:
1) input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distributions;
2) set the parameters: the number of iterations G = 100 and the number of fragment assembly attempts T = 200;
3) conformation initialization: repeatedly select a random sliding window on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, giving the initial conformation P;
4) set g = 1, where g ∈ {1, 2, ..., G};
5) residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) compute the secondary structure SS of conformation P with the DSSP algorithm and, according to SS, mark all loop regions in sequence order, recording for the m-th loop region its starting residue number and its terminating residue number;
5.2) set m = 1, where m ∈ {1, 2, ..., M} and M is the number of loop regions;
5.3) extract the dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region, together with the corresponding rotation axes Ω, and define Θ as the rotation angles about the respective axes;
5.4) from the predicted residue-pair distance distributions, select the distance constraints whose two residues lie at the two ends of the m-th loop region, and use them to construct the objective function of the rotation angles Θ:
f(Θ) = [equation image in the original publication]
where K is the number of selected residue-pair distance constraints; for the k-th residue pair, the equation uses the peak of its predicted distance distribution and the probability of that peak in the distribution; i and j are the residue numbers of the two residues of the k-th pair; and d_{i,j} is the actual distance between residues i and j in conformation P;
5.5) solve for Θ with a differential evolution algorithm, taking Ω as the rotation axes, Θ as the rotation variables, and f(Θ) as the objective;
5.6) add the values in Θ to the corresponding dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region in conformation P, generating a new conformation P';
5.7) set P = P' to update the conformation;
5.8) set m = m + 1; if m ≤ M, return to step 5.3); otherwise go to step 6);
6) global random fragment assembly, as follows:
6.1) set t = 1, where t ∈ {1, 2, ..., T};
6.2) randomly select a sliding window W of width f on conformation P;
6.3) randomly select a fragment from the fragment library entries corresponding to W and use it to replace the fragment in W, generating a conformation P'';
6.4) compute the energies of conformations P and P'' with the Rosetta score3 energy function, and decide by the Boltzmann criterion whether the fragment assembly is accepted;
6.5) if the fragment assembly is accepted, set P = P'' to update the conformation;
6.6) set t = t + 1; if t ≤ T, return to step 6.2) for the next fragment assembly;
7) set g = g + 1; if g ≤ G, return to step 5); otherwise output the final conformation as the prediction result.
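The global fragment assembly of step 6 can be sketched as a simple accept/reject loop. The source names only "a Boltzmann criterion" and the Rosetta score3 energy function. The Metropolis-style rule below, the temperature parameter `kT`, and the callables `propose` and `energy` (stand-ins for fragment insertion and score3 scoring) are assumptions for illustration and do not call any real Rosetta API.

```python
import math
import random


def boltzmann_accept(e_old, e_new, kT=1.0, rng=random):
    """ASSUMED Metropolis-style rule: always accept a move that does not
    raise the energy; otherwise accept with probability exp(-dE / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def fragment_assembly(conformation, propose, energy, attempts, kT=1.0, seed=0):
    """Run `attempts` fragment-replacement trials (T in the text).

    `propose(conformation, rng)` returns a candidate conformation (P'')
    and `energy(conformation)` returns its score; both are hypothetical
    callables supplied by the caller.
    """
    rng = random.Random(seed)
    current, e_cur = conformation, energy(conformation)
    for _ in range(attempts):
        candidate = propose(current, rng)
        e_new = energy(candidate)
        if boltzmann_accept(e_cur, e_new, kT, rng):
            current, e_cur = candidate, e_new  # P = P''
    return current
```

With the embodiment's settings this inner loop would run with `attempts=200` inside an outer loop of 100 iterations, each outer iteration first performing the loop-region optimization of step 5.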
Taking protein 1DP7, whose amino acid sequence length is 76, as an example, the above method was used to predict the near-native conformation of the protein. The RMSD distribution of the sampled conformations is shown in FIG. 1; the root-mean-square deviation of the predicted structure is [value given as an image in the original publication], and the predicted structure is shown in FIG. 2.
The foregoing describes the prediction result of one embodiment of the invention. The invention is not limited to this embodiment and may be practiced with various modifications that do not depart from its basic idea or exceed its gist.

Claims (1)

1. A protein structure prediction method based on residue-pair distance constrained loop geometric optimization, characterized in that the method comprises the following steps:
1) input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distributions;
2) set the parameters: the number of iterations G and the number of fragment assembly attempts T;
3) conformation initialization: repeatedly select a random sliding window on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, giving the initial conformation P;
4) set g = 1, where g ∈ {1, 2, ..., G};
5) residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) compute the secondary structure SS of conformation P with the DSSP algorithm and, according to SS, mark all loop regions in sequence order, recording for the m-th loop region its starting residue number and its terminating residue number;
5.2) set m = 1, where m ∈ {1, 2, ..., M} and M is the number of loop regions;
5.3) extract the dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region, together with the corresponding rotation axes Ω, and define Θ as the rotation angles about the respective axes;
5.4) from the predicted residue-pair distance distributions, select the distance constraints whose two residues lie at the two ends of the m-th loop region, and use them to construct the objective function of the rotation angles Θ:
f(Θ) = [equation image in the original publication]
where K is the number of selected residue-pair distance constraints; for the k-th residue pair, the equation uses the peak of its predicted distance distribution and the probability of that peak in the distribution; i and j are the residue numbers of the two residues of the k-th pair; and d_{i,j} is the actual distance between residues i and j in conformation P;
5.5) solve for Θ with a differential evolution algorithm, taking Ω as the rotation axes, Θ as the rotation variables, and f(Θ) as the objective;
5.6) add the values in Θ to the corresponding dihedral angles φ and ψ of the residues from the starting residue to the terminating residue of the m-th loop region in conformation P, generating a new conformation P';
5.7) set P = P' to update the conformation;
5.8) set m = m + 1; if m ≤ M, return to step 5.3); otherwise go to step 6);
6) global random fragment assembly, as follows:
6.1) set t = 1, where t ∈ {1, 2, ..., T};
6.2) randomly select a sliding window W of width f on conformation P;
6.3) randomly select a fragment from the fragment library entries corresponding to W and use it to replace the fragment in W, generating a conformation P'';
6.4) compute the energies of conformations P and P'' with the Rosetta score3 energy function, and decide by the Boltzmann criterion whether the fragment assembly is accepted;
6.5) if the fragment assembly is accepted, set P = P'' to update the conformation;
6.6) set t = t + 1; if t ≤ T, return to step 6.2) for the next fragment assembly;
7) set g = g + 1; if g ≤ G, return to step 5); otherwise output the final conformation as the prediction result.
CN202110422467.5A 2021-04-20 2021-04-20 Residue-to-distance constrained loop geometric optimization protein structure prediction method Withdrawn CN113257336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110422467.5A CN113257336A (en) 2021-04-20 2021-04-20 Residue-to-distance constrained loop geometric optimization protein structure prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110422467.5A CN113257336A (en) 2021-04-20 2021-04-20 Residue-to-distance constrained loop geometric optimization protein structure prediction method

Publications (1)

Publication Number Publication Date
CN113257336A true CN113257336A (en) 2021-08-13

Family

ID=77221141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110422467.5A Withdrawn CN113257336A (en) 2021-04-20 2021-04-20 Residue-to-distance constrained loop geometric optimization protein structure prediction method

Country Status (1)

Country Link
CN (1) CN113257336A (en)

Similar Documents

Publication Publication Date Title
CN112585686B (en) Determination of protein distance maps by combining distance map clipping
CN107609342B (en) Protein conformation search method based on secondary structure space distance constraint
CN108334746B (en) Protein structure prediction method based on secondary structure similarity
CN108846256B (en) Group protein structure prediction method based on residue contact information
CN110148437B (en) Residue contact auxiliary strategy self-adaptive protein structure prediction method
CN109448784B (en) Protein structure prediction method based on dihedral angle information auxiliary energy function selection
CN109033744B (en) Protein structure prediction method based on residue distance and contact information
CN107491664B (en) Protein structure de novo prediction method based on information entropy
CN109360599B (en) Protein structure prediction method based on residue contact information cross strategy
CN109360596B (en) Protein conformation space optimization method based on differential evolution local disturbance
CN109086566B (en) Group protein structure prediction method based on fragment resampling
CN109872770B (en) Variable strategy protein structure prediction method combined with displacement degree evaluation
CN109360601B (en) Multi-modal protein structure prediction method based on displacement strategy
CN109378034B (en) Protein prediction method based on distance distribution estimation
CN109346128B (en) Protein structure prediction method based on residue information dynamic selection strategy
CN109360597B (en) Group protein structure prediction method based on global and local strategy cooperation
CN109033753B (en) Group protein structure prediction method based on secondary structure fragment assembly
CN110189794B (en) Residue contact guided loop perturbation population protein structure prediction method
Zhang et al. Two-stage distance feature-based optimization algorithm for de novo protein structure prediction
CN108920894B (en) Protein conformation space optimization method based on brief abstract convex estimation
CN113257336A (en) Residue-to-distance constrained loop geometric optimization protein structure prediction method
CN108763860B (en) Loop information sampling-based group protein conformation space optimization method
CN110729023B (en) Protein structure prediction method based on contact assistance of secondary structure elements
CN109411013B (en) Group protein structure prediction method based on individual specific variation strategy
CN111951885B (en) Protein structure prediction method based on local bias

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210813