CN113257336A - Residue-to-distance constrained loop geometric optimization protein structure prediction method
- Publication number
- CN113257336A (application number CN202110422467.5A)
- Authority
- CN
- China
- Prior art keywords
- residue
- conformation
- distance
- loop
- fragment
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
Abstract
A protein structure prediction method with loop geometric optimization constrained by residue-pair distances. The method first generates an initial conformation by random fragment assembly. It then selects local distance constraints to construct an objective function for sampling the loop regions, and uses a differential evolution algorithm, guided by that objective function, to sample the residue dihedral angles of the loop regions, which allows large adjustments of the topology. Combining this loop-specific geometric optimization strategy with global fragment assembly can effectively improve the accuracy and efficiency of whole-structure prediction. The invention thus provides a residue-pair distance constrained loop geometric optimization method for protein structure prediction with higher prediction accuracy.
Description
Technical Field
The invention relates to the fields of bioinformatics and computer applications, and in particular to a protein structure prediction method with residue-pair distance constrained loop geometric optimization.
Background
Proteins are the main carriers of life activities, and processes ranging from metabolism to immunity against disease depend on protein function. The function of a protein is determined by its specific three-dimensional structure. Changes to that structure are likely to cause the protein to lose its biological function and lead to disease; diseases such as mad cow disease, senile dementia, and Parkinson's disease are caused by protein misfolding. Efficiently obtaining the three-dimensional structure of proteins is therefore critical for understanding their biological functions and for developing drugs against the related diseases.
At present, protein three-dimensional structures are determined mainly by experimental techniques such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy. These techniques are extremely costly: a single cryo-electron microscope costs tens of millions, and operating it requires staff with doctoral-level training in biology. Experimental structure determination also takes a great deal of time and is not applicable to every type of protein. Experimental determination therefore cannot deliver protein three-dimensional structures at large scale.
According to Anfinsen's principle, the amino acid sequence of a protein contains the information that determines its three-dimensional structure, so predicting three-dimensional structure from the amino acid sequence is an important research direction in bioinformatics. Protein structure databases hold a large amount of structural information, and the rapid development of gene sequencing technology is quickly increasing the number of determined amino acid sequences, providing a large data foundation. With the rapid development of artificial intelligence, more and more information can be learned from these data, so that the protein folding process can be simulated on a computer and the three-dimensional structure predicted; in recent years in particular, the accuracy of protein structure prediction has improved markedly. Representative research teams in this field include the David Baker laboratory at the University of Washington, the Yang Zhang laboratory at the University of Michigan, and the Jinbo Xu laboratory at the Toyota Technological Institute at Chicago. A growing number of universities and research institutions in China have also joined the research on protein structure prediction.
The local structure of a protein is divided mainly into alpha helices, beta sheets, and irregular loop regions. Alpha helices and beta sheets have distinctive structural features and are relatively easy to predict, whereas loop regions are flexible and usually difficult to determine. Because loop regions connect the alpha helices and beta sheets, they have a significant influence on the protein topology. Current methods nevertheless rarely treat loop regions specially, which leads to insufficient sampling of those regions.
In short, current protein structure prediction methods under-sample the loop regions and ignore their structural specificity, and therefore need improvement.
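To make the notion of a loop region concrete, the following is a minimal sketch of how loop regions might be read off a per-residue secondary-structure string. It assumes a reduced alphabet in which `H` marks helix and `E` marks strand, and it treats every other character as loop; the function name `loop_regions` and this reduction are illustrative assumptions, not part of the patent (DSSP itself emits a richer set of states).

```python
from typing import List, Tuple


def loop_regions(ss: str, structured: str = "HE") -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges of loop regions.

    Any state not listed in `structured` is treated as loop (an assumption).
    """
    regions: List[Tuple[int, int]] = []
    start = None  # start index of the loop currently being scanned, if any
    for idx, state in enumerate(ss, start=1):
        in_loop = state not in structured
        if in_loop and start is None:
            start = idx                      # a new loop region begins here
        elif not in_loop and start is not None:
            regions.append((start, idx - 1))  # loop ended at previous residue
            start = None
    if start is not None:                    # sequence ends inside a loop
        regions.append((start, len(ss)))
    return regions


if __name__ == "__main__":
    # Hypothetical 17-residue secondary-structure string.
    print(loop_regions("CCHHHHHCCCEEEEECC"))
```

Each returned pair corresponds to the starting and terminating residue numbers of one loop region.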
Disclosure of Invention
To overcome the shortcomings of the prior art, strengthen the sampling of protein loop regions, and improve the prediction accuracy of the whole protein structure, the invention provides a protein structure prediction method with residue-pair distance constrained loop geometric optimization. Using predicted distance constraints between protein residues, the invention designs a geometric optimization strategy specific to the loop regions: within the framework of a differential evolution algorithm, it constructs an objective function from local residue-pair distance constraints and uses it to guide dihedral angle sampling in the loop regions. On this basis, global sampling is performed with a fragment assembly strategy, so that the prediction accuracy of the whole structure is improved.
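Sampling a backbone dihedral angle amounts to rotating the atoms on one side of a bond about that bond's axis. The sketch below shows this geometric primitive with Rodrigues' rotation formula. It is an illustration only: the function name `rotate_about_axis` and the tuple-based coordinates are assumptions, and the patent does not specify how its rotations are implemented.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def rotate_about_axis(point: Vec, origin: Vec, axis: Vec, angle: float) -> Vec:
    """Rotate `point` by `angle` radians about the line through `origin`
    with direction `axis`, using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    ux, uy, uz = (c / norm for c in axis)               # unit axis
    px, py, pz = (p - o for p, o in zip(point, origin))  # point relative to origin
    c, s = math.cos(angle), math.sin(angle)
    dot = ux * px + uy * py + uz * pz                   # u . p
    cx = uy * pz - uz * py                              # (u x p), x component
    cy = uz * px - ux * pz                              # (u x p), y component
    cz = ux * py - uy * px                              # (u x p), z component
    rx = px * c + cx * s + ux * dot * (1.0 - c)
    ry = py * c + cy * s + uy * dot * (1.0 - c)
    rz = pz * c + cz * s + uz * dot * (1.0 - c)
    return (rx + origin[0], ry + origin[1], rz + origin[2])


if __name__ == "__main__":
    # Quarter turn of a point on the x axis about the z axis.
    print(rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2))
```

Applying such a rotation to every atom downstream of a loop residue changes the relative placement of the secondary-structure elements on either side of the loop, which is why loop dihedral angles have a large effect on topology.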
The technical solution adopted by the invention to solve the technical problem is as follows:
A protein structure prediction method with residue-pair distance constrained loop geometric optimization, the method comprising the following steps:
1) Input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distribution;
2) Set the parameters: the number of iterations $G$ and the number of fragment assembly attempts $T$;
3) Conformation initialization: randomly select sliding windows on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, yielding the initial conformation $P$;
4) Set $g = 1$, where $g \in \{1, 2, \ldots, G\}$;
5) Residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) Compute the secondary structure SS of conformation $P$ with the DSSP algorithm and, according to SS, label all loop regions in sequence order, where $L_b^m$ denotes the number of the starting residue of the $m$-th loop region and $L_e^m$ denotes the number of its terminating residue;
5.2) Set $m = 1$, where $m \in \{1, 2, \ldots, M\}$ and $M$ is the number of loop regions;
5.3) Extract the rotation axes $\Omega$ corresponding to the dihedral angles $\phi$ and $\psi$ of residues $L_b^m$ through $L_e^m$, and define $\Theta$ as the rotation angles about these rotation axes;
5.4) From the predicted residue-pair distance distribution, select the distance constraints of residue pairs whose two residues lie on opposite sides of the $m$-th loop region, and construct from them the objective function $f(\Theta)$ of the rotation angles $\Theta$ (the equation itself is missing from this text),
where $K$ is the number of selected residue-pair distance constraints; the $k$-th constraint contributes the peak of the $k$-th residue-pair distance distribution and the probability of that peak in the distribution; $i$ and $j$ are the numbers of the two residues of the $k$-th residue pair; and $d_{i,j}$ is the actual distance between residues $i$ and $j$ in conformation $P$;
5.5) With $\Omega$ as the rotation axes, $\Theta$ as the rotation variables, and $f(\Theta)$ as the objective, solve for $\Theta$ with a differential evolution algorithm;
5.6) Add the values in $\Theta$ to the corresponding dihedral angles $\phi$ and $\psi$ of residues $L_b^m$ through $L_e^m$ in conformation $P$, generating a new conformation $P'$;
5.7) Set $P = P'$ to update the conformation;
5.8) Set $m = m + 1$; if $m \le M$, return to step 5.3); otherwise go to step 6);
6) Global random fragment assembly, as follows:
6.1) Set $t = 1$, where $t \in \{1, 2, \ldots, T\}$;
6.2) Randomly select a sliding window $W$ of window width $f$ on conformation $P$;
6.3) Randomly select a fragment from the fragment library entries corresponding to sliding window $W$ and use it to replace the fragment inside $W$, generating a conformation $P''$;
6.4) Compute the energies of conformations $P$ and $P''$ with the Rosetta score3 energy function, and decide according to the Boltzmann criterion whether the fragment assembly succeeds;
6.5) If the fragment assembly succeeds, set $P = P''$ to update the conformation;
6.6) Set $t = t + 1$; if $t \le T$, return to step 6.2) for the next fragment assembly;
7) Set $g = g + 1$; if $g \le G$, return to step 5); otherwise output the final conformation as the prediction result.
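The control flow of steps 4) through 7) can be sketched as three nested loops. The skeleton below is a hedged illustration: `run_prediction` and its callback parameters (`loops`, `optimize_loop`, `propose`, `accept`) are hypothetical names, and the toy callbacks in the usage section stand in for DSSP, the loop optimizer, the fragment library, and the energy-based acceptance test, none of which are reproduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Conformation = List[float]   # toy stand-in: a flat list of angles
Loop = Tuple[int, int]       # toy stand-in: half-open index range


def run_prediction(
    initial: Sequence[float],
    loops: Callable[[Conformation], List[Loop]],
    optimize_loop: Callable[[Conformation, Loop], Conformation],
    propose: Callable[[Conformation, random.Random], Conformation],
    accept: Callable[[Conformation, Conformation, random.Random], bool],
    G: int,
    T: int,
    seed: int = 0,
) -> Conformation:
    rng = random.Random(seed)
    P = list(initial)
    for _g in range(1, G + 1):           # steps 4) and 7): outer iterations
        for loop in loops(P):            # step 5): optimize each loop region in turn
            P = optimize_loop(P, loop)
        for _t in range(1, T + 1):       # step 6): T fragment-assembly attempts
            candidate = propose(P, rng)
            if accept(P, candidate, rng):
                P = candidate
    return P


# Toy callbacks (assumptions for illustration only).
def energy(P: Conformation) -> float:
    return sum(x * x for x in P)


def toy_loops(P: Conformation) -> List[Loop]:
    return [(0, 2)]


def toy_optimize(P: Conformation, loop: Loop) -> Conformation:
    b, e = loop
    return P[:b] + [0.5 * x for x in P[b:e]] + P[e:]


def toy_propose(P: Conformation, rng: random.Random) -> Conformation:
    Q = list(P)
    Q[rng.randrange(len(Q))] += rng.uniform(-0.5, 0.5)
    return Q


def toy_accept(P: Conformation, Q: Conformation, rng: random.Random) -> bool:
    return energy(Q) <= energy(P)        # greedy; the patent uses a Boltzmann criterion


if __name__ == "__main__":
    start = [1.0, -2.0, 0.5, 3.0]
    final = run_prediction(start, toy_loops, toy_optimize, toy_propose, toy_accept, G=5, T=20)
    print(energy(start), energy(final))
```

Passing the stages in as callbacks keeps the loop structure visible while leaving the loop optimizer and the acceptance rule replaceable.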
The beneficial effects of the invention are as follows. An initial conformation is first generated by random fragment assembly. Local distance constraints are then selected to construct an objective function for sampling the loop regions, and a differential evolution algorithm guided by this objective function samples the residue dihedral angles of the loop regions, so that the topology can be adjusted substantially. Combining the loop-specific geometric optimization strategy with global fragment assembly can effectively improve the accuracy and efficiency of whole-structure prediction.
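As an illustration of how differential evolution can drive rotation angles toward a distance constraint, the sketch below runs a basic DE/rand/1/bin loop on a toy problem. Everything specific here is an assumption: the objective is a probability-weighted absolute deviation between a model distance and a peak distance (the patent's actual formula is not available in this text), the "structure" is a planar chain of three unit-length links, and the DE settings (`pop_size`, `F`, `CR`) are arbitrary.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    f: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    F: float = 0.5,
    CR: float = 0.9,
    generations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = f(trial)
            if f_trial <= fit[i]:            # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def end_to_end(theta: Sequence[float]) -> float:
    """End-to-end distance of a planar chain: one fixed unit link plus one
    unit link per joint angle in `theta` (toy geometry)."""
    x, y, heading = 1.0, 0.0, 0.0
    for t in theta:
        heading += t
        x += math.cos(heading)
        y += math.sin(heading)
    return math.hypot(x, y)


# Hypothetical constraints: (peak distance, probability of that peak).
CONSTRAINTS = [(2.0, 0.8)]


def objective(theta: Sequence[float]) -> float:
    d = end_to_end(theta)
    return sum(p * abs(d - peak) for peak, p in CONSTRAINTS)


if __name__ == "__main__":
    best_theta, best_f = differential_evolution(objective, [(-math.pi, math.pi)] * 2)
    print(best_theta, best_f)
```

Weighting each deviation by the peak probability lets confident distance predictions dominate the search; other weightings are equally plausible.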
Drawings
FIG. 1 is the RMSD distribution of the conformations sampled when the structure of protein 1DP7 was predicted with the residue-pair distance constrained loop geometric optimization protein structure prediction method.
FIG. 2 is the three-dimensional structure of protein 1DP7 predicted with the residue-pair distance constrained loop geometric optimization protein structure prediction method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
In this embodiment, protein 1DP7, with a sequence length of 76, is taken as an example. The residue-pair distance constrained loop geometric optimization protein structure prediction method comprises the following steps:
1) Input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distribution;
2) Set the parameters: the number of iterations $G = 100$ and the number of fragment assembly attempts $T = 200$;
3) Conformation initialization: randomly select sliding windows on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, yielding the initial conformation $P$;
4) Set $g = 1$, where $g \in \{1, 2, \ldots, G\}$;
5) Residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) Compute the secondary structure SS of conformation $P$ with the DSSP algorithm and, according to SS, label all loop regions in sequence order, where $L_b^m$ denotes the number of the starting residue of the $m$-th loop region and $L_e^m$ denotes the number of its terminating residue;
5.2) Set $m = 1$, where $m \in \{1, 2, \ldots, M\}$ and $M$ is the number of loop regions;
5.3) Extract the rotation axes $\Omega$ corresponding to the dihedral angles $\phi$ and $\psi$ of residues $L_b^m$ through $L_e^m$, and define $\Theta$ as the rotation angles about these rotation axes;
5.4) From the predicted residue-pair distance distribution, select the distance constraints of residue pairs whose two residues lie on opposite sides of the $m$-th loop region, and construct from them the objective function $f(\Theta)$ of the rotation angles $\Theta$ (the equation itself is missing from this text),
where $K$ is the number of selected residue-pair distance constraints; the $k$-th constraint contributes the peak of the $k$-th residue-pair distance distribution and the probability of that peak in the distribution; $i$ and $j$ are the numbers of the two residues of the $k$-th residue pair; and $d_{i,j}$ is the actual distance between residues $i$ and $j$ in conformation $P$;
5.5) With $\Omega$ as the rotation axes, $\Theta$ as the rotation variables, and $f(\Theta)$ as the objective, solve for $\Theta$ with a differential evolution algorithm;
5.6) Add the values in $\Theta$ to the corresponding dihedral angles $\phi$ and $\psi$ of residues $L_b^m$ through $L_e^m$ in conformation $P$, generating a new conformation $P'$;
5.7) Set $P = P'$ to update the conformation;
5.8) Set $m = m + 1$; if $m \le M$, return to step 5.3); otherwise go to step 6);
6) Global random fragment assembly, as follows:
6.1) Set $t = 1$, where $t \in \{1, 2, \ldots, T\}$;
6.2) Randomly select a sliding window $W$ of window width $f$ on conformation $P$;
6.3) Randomly select a fragment from the fragment library entries corresponding to sliding window $W$ and use it to replace the fragment inside $W$, generating a conformation $P''$;
6.4) Compute the energies of conformations $P$ and $P''$ with the Rosetta score3 energy function, and decide according to the Boltzmann criterion whether the fragment assembly succeeds;
6.5) If the fragment assembly succeeds, set $P = P''$ to update the conformation;
6.6) Set $t = t + 1$; if $t \le T$, return to step 6.2) for the next fragment assembly;
7) Set $g = g + 1$; if $g \le G$, return to step 5); otherwise output the final conformation as the prediction result.
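The Boltzmann criterion in step 6.4) is commonly realized as a Metropolis test: a move that lowers the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(-ΔE / kT). The sketch below assumes this standard form; the temperature factor `kT` is not given in the source, and the energies passed in are plain numbers rather than actual Rosetta score3 values.

```python
import math
import random


def boltzmann_accept(e_old: float, e_new: float, kT: float = 1.0, rng=random) -> bool:
    """Metropolis acceptance test (kT is an assumed parameter)."""
    if e_new <= e_old:
        return True                      # downhill moves are always accepted
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def total_fragment_attempts(G: int, T: int) -> int:
    """Each of the G outer iterations performs T fragment-assembly attempts."""
    return G * T


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10_000
    accepted = sum(boltzmann_accept(0.0, 1.0, kT=1.0, rng=rng) for _ in range(trials))
    print(accepted / trials)             # should be close to exp(-1)
    print(total_fragment_attempts(100, 200))
```

With the embodiment's settings G = 100 and T = 200, the global stage makes 100 × 200 = 20,000 fragment-assembly attempts in total, in addition to the initialization in step 3).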
Taking protein 1DP7, whose amino acid sequence has length 76, as an example, the above method was used to predict the near-native conformation of the protein. The RMSD distribution of the sampled conformations is shown in FIG. 1, the root-mean-square deviation of the predicted structure is [value missing in the source text], and the predicted structure is shown in FIG. 2.
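For reference, the root-mean-square deviation between two sets of matched atom coordinates can be computed as below. This is a minimal sketch that assumes the two structures have already been superimposed; it performs no optimal alignment, and the function name `rmsd` and the sample coordinates are illustrative only.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equally long, pre-superimposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    print(rmsd(model, native))
```

In practice the value is usually reported over C-alpha atoms after optimal superposition, so a full evaluation would add an alignment step before this calculation.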
The above describes the prediction results of one embodiment of the invention. The invention is not limited to this embodiment and may be practiced with various modifications that do not depart from its basic idea or exceed its essential content.
Claims (1)
1. A protein structure prediction method with residue-pair distance constrained loop geometric optimization, characterized in that the method comprises the following steps:
1) Input the target sequence of the protein to be predicted, the fragment library, and the predicted residue-pair distance distribution;
2) Set the parameters: the number of iterations $G$ and the number of fragment assembly attempts $T$;
3) Conformation initialization: randomly select sliding windows on the target sequence and perform fragment assembly until the dihedral angles of every residue have been replaced at least once, yielding the initial conformation $P$;
4) Set $g = 1$, where $g \in \{1, 2, \ldots, G\}$;
5) Residue-pair distance constrained geometric optimization of the loop regions, as follows:
5.1) Compute the secondary structure SS of conformation $P$ with the DSSP algorithm and, according to SS, label all loop regions in sequence order, where $L_b^m$ denotes the number of the starting residue of the $m$-th loop region and $L_e^m$ denotes the number of its terminating residue;
5.2) Set $m = 1$, where $m \in \{1, 2, \ldots, M\}$ and $M$ is the number of loop regions;
5.3) Extract the rotation axes $\Omega$ corresponding to the dihedral angles $\phi$ and $\psi$ of residues $L_b^m$ through $L_e^m$, and define $\Theta$ as the rotation angles about these rotation axes;
5.4) From the predicted residue-pair distance distribution, select the distance constraints of residue pairs whose two residues lie on opposite sides of the $m$-th loop region, and construct from them the objective function $f(\Theta)$ of the rotation angles $\Theta$ (the equation itself is missing from this text),
where $K$ is the number of selected residue-pair distance constraints; the $k$-th constraint contributes the peak of the $k$-th residue-pair distance distribution and the probability of that peak in the distribution; $i$ and $j$ are the numbers of the two residues of the $k$-th residue pair; and $d_{i,j}$ is the actual distance between residues $i$ and $j$ in conformation $P$;
5.5) With $\Omega$ as the rotation axes, $\Theta$ as the rotation variables, and $f(\Theta)$ as the objective, solve for $\Theta$ with a differential evolution algorithm;
5.6) Add the values in $\Theta$ to the corresponding dihedral angles $\phi$ and $\psi$ of residues $L_b^m$ through $L_e^m$ in conformation $P$, generating a new conformation $P'$;
5.7) Set $P = P'$ to update the conformation;
5.8) Set $m = m + 1$; if $m \le M$, return to step 5.3); otherwise go to step 6);
6) Global random fragment assembly, as follows:
6.1) Set $t = 1$, where $t \in \{1, 2, \ldots, T\}$;
6.2) Randomly select a sliding window $W$ of window width $f$ on conformation $P$;
6.3) Randomly select a fragment from the fragment library entries corresponding to sliding window $W$ and use it to replace the fragment inside $W$, generating a conformation $P''$;
6.4) Compute the energies of conformations $P$ and $P''$ with the Rosetta score3 energy function, and decide according to the Boltzmann criterion whether the fragment assembly succeeds;
6.5) If the fragment assembly succeeds, set $P = P''$ to update the conformation;
6.6) Set $t = t + 1$; if $t \le T$, return to step 6.2) for the next fragment assembly;
7) Set $g = g + 1$; if $g \le G$, return to step 5); otherwise output the final conformation as the prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110422467.5A CN113257336A (en) | 2021-04-20 | 2021-04-20 | Residue-to-distance constrained loop geometric optimization protein structure prediction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110422467.5A CN113257336A (en) | 2021-04-20 | 2021-04-20 | Residue-to-distance constrained loop geometric optimization protein structure prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113257336A (en) | 2021-08-13 |
Family
ID=77221141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110422467.5A Withdrawn CN113257336A (en) | 2021-04-20 | 2021-04-20 | Residue-to-distance constrained loop geometric optimization protein structure prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113257336A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112585686B (en) | Determination of protein distance maps by combining distance map clipping | |
CN107609342B (en) | Protein conformation search method based on secondary structure space distance constraint | |
CN108334746B (en) | Protein structure prediction method based on secondary structure similarity | |
CN108846256B (en) | Group protein structure prediction method based on residue contact information | |
CN110148437B (en) | Residue contact auxiliary strategy self-adaptive protein structure prediction method | |
CN109448784B (en) | Protein structure prediction method based on dihedral angle information auxiliary energy function selection | |
CN109033744B (en) | Protein structure prediction method based on residue distance and contact information | |
CN107491664B (en) | Protein structure de novo prediction method based on information entropy | |
CN109360599B (en) | Protein structure prediction method based on residue contact information cross strategy | |
CN109360596B (en) | Protein conformation space optimization method based on differential evolution local disturbance | |
CN109086566B (en) | Group protein structure prediction method based on fragment resampling | |
CN109872770B (en) | Variable strategy protein structure prediction method combined with displacement degree evaluation | |
CN109360601B (en) | Multi-modal protein structure prediction method based on displacement strategy | |
CN109378034B (en) | Protein prediction method based on distance distribution estimation | |
CN109346128B (en) | Protein structure prediction method based on residue information dynamic selection strategy | |
CN109360597B (en) | Group protein structure prediction method based on global and local strategy cooperation | |
CN109033753B (en) | Group protein structure prediction method based on secondary structure fragment assembly | |
CN110189794B (en) | Residue contact guided loop perturbation population protein structure prediction method | |
Zhang et al. | Two-stage distance feature-based optimization algorithm for de novo protein structure prediction | |
CN108920894B (en) | Protein conformation space optimization method based on brief abstract convex estimation | |
CN113257336A (en) | Residue-to-distance constrained loop geometric optimization protein structure prediction method | |
CN108763860B (en) | Loop information sampling-based group protein conformation space optimization method | |
CN110729023B (en) | Protein structure prediction method based on contact assistance of secondary structure elements | |
CN109411013B (en) | Group protein structure prediction method based on individual specific variation strategy | |
CN111951885B (en) | Protein structure prediction method based on local bias |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20210813 |