CN113205855B - Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method - Google Patents
- Publication number
- CN113205855B (application number CN202110636292.8A)
- Authority
- CN
- China
- Prior art keywords
- residue
- prediction
- dimensional structure
- energy function
- knowledge
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
- G16B5/20—Probabilistic models
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Chemical & Material Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Molecular Biology (AREA)
- Physiology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A membrane protein three-dimensional structure prediction method based on knowledge energy function optimization, characterized in that: constraints on residue distances are obtained from the multiple sequence alignment (MSA) result of the input sequence combined with statistical knowledge; a structural fragment query library is constructed from the secondary structure prediction result of the input sequence and known structures in the Protein Data Bank (PDB); and a knowledge-based energy function is computed from the residue contact prediction result of the input sequence. Fragment replacement is then applied iteratively to an initial structure, subject to the energy function and the residue distance constraints, to obtain a number of candidate structures. Finally, the candidate structures are screened to obtain the final predicted three-dimensional structure of the membrane protein. The invention is a de novo prediction method that combines multiple sequence alignment, secondary structure prediction and residue contact prediction, and is easy to operate and highly accurate.
Description
Technical Field
The invention relates to the field of bioengineering, and in particular to a method for predicting the three-dimensional structure of a membrane protein based on knowledge energy function optimization.
Background
Accurate protein structure information is obtained through experimental determination. The most common experimental methods are X-ray diffraction, nuclear magnetic resonance (NMR) and cryo-electron microscopy, and the structures they produce are deposited in the Protein Data Bank (PDB). Only 1267 high-resolution membrane protein structures have been solved and deposited in the PDB, about 2% of all structures in the database, which makes computational prediction of membrane protein three-dimensional structure very important. Current computational methods follow two main directions: template-based modeling and de novo prediction. Because few membrane protein structures are available in the PDB, a suitable template usually cannot be found, so most membrane protein structure prediction methods rely on de novo calculation. At the same time, membrane proteins tend to have long amino acid sequences, which greatly reduces the time efficiency of most de novo methods; some methods cannot complete the prediction at all.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a membrane protein three-dimensional structure prediction method based on knowledge energy function optimization. It is a de novo prediction method that combines multiple sequence alignment (MSA), secondary structure prediction and residue contact prediction, and is easy to operate and highly accurate.
The invention is realized by the following technical scheme:
The invention relates to a membrane protein three-dimensional structure prediction method based on knowledge energy function optimization, comprising: combining statistical knowledge with the MSA result of the input sequence to obtain constraints on residue distances; combining known structures in the PDB with the secondary structure prediction result of the input sequence to construct a structural fragment query library; and computing a knowledge-based energy function from the residue contact prediction result of the input sequence. Fragment replacement is then applied iteratively to an initial structure, subject to the energy function and the residue distance constraints, to obtain a number of candidate structures; finally, the candidate structures are screened to obtain the final predicted three-dimensional structure of the membrane protein.
The input sequence is the amino acid sequence of a membrane protein. Its length is not limited; it contains multiple transmembrane helices, and the transmembrane portion makes up the bulk of the protein.
The MSA result is obtained by aligning the input sequence against a sequence database and selecting multiple sequences with high homology.
The residue distance constraint is as follows: based on statistical rules derived from protein structures in the PDB, the minimum and maximum of the Cβ-Cβ distance between two residue types that are in contact are bounded.
The statistical rule is obtained by analyzing a large number of real protein structures, computing the distances between pairs of residue types that are in contact, and recording the range of values those distances take.
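The distance statistics described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 8.0 Å contact cutoff (`CONTACT_CUTOFF`), the three-letter residue codes and the helper names `learn_distance_bounds` and `within_bounds` are assumptions introduced here, since the text states only that a minimum and maximum Cβ-Cβ distance is recorded for each pair of residue types in contact.

```python
# Hypothetical contact cutoff: the text does not state how "in contact" is
# defined, so 8.0 Å on the Cβ-Cβ distance is assumed purely for illustration.
CONTACT_CUTOFF = 8.0


def pair_key(res_a, res_b):
    """Order-independent key for a residue-type pair, e.g. ('ALA', 'LEU')."""
    return tuple(sorted((res_a, res_b)))


def learn_distance_bounds(observations):
    """Record the min and max Cβ-Cβ distance of each residue-type pair in contact.

    observations: iterable of (res_a, res_b, cb_distance) taken from known
    structures; pairs farther apart than CONTACT_CUTOFF are ignored.
    """
    bounds = {}
    for res_a, res_b, dist in observations:
        if dist > CONTACT_CUTOFF:
            continue  # only contacting pairs contribute to the statistics
        key = pair_key(res_a, res_b)
        lo, hi = bounds.get(key, (dist, dist))
        bounds[key] = (min(lo, dist), max(hi, dist))
    return bounds


def within_bounds(res_a, res_b, dist, bounds):
    """True if dist lies inside the learned range for this residue-type pair.

    Pairs with no statistics are treated as unconstrained (an assumption).
    """
    key = pair_key(res_a, res_b)
    if key not in bounds:
        return True
    lo, hi = bounds[key]
    return lo <= dist <= hi
```

How the learned bounds are applied to a model during fragment replacement (for example, which residue pairs are checked) is not specified in the text; treating pairs without statistics as unconstrained is likewise a choice made for this sketch.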
The secondary structure prediction result is obtained with Membrain, a multi-scale deep-learning model for predicting membrane protein transmembrane helices (TMH). The steps are: input the protein sequence; obtain a large number of similar sequences through multiple sequence alignment; and, combining co-evolution information, predict the transmembrane regions and their orientation across the membrane using a deep learning model and a support vector machine classifier.
The transmembrane helix prediction model comprises a transmembrane region prediction module and an orientation prediction module. The transmembrane region prediction module consists of a multi-scale deep learning model and a binarization module: the deep learning model combines a small-scale, residue-level residual neural network with a large-scale, whole-sequence residual neural network, and the binarization module converts the raw prediction scores into binary labels using a dynamic threshold, which addresses under-segmentation. The orientation prediction module uses a support vector machine (SVM) classifier.
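A dynamic-threshold binarization of per-residue transmembrane scores might look like the sketch below. The text does not describe how the threshold is computed, so the rule used here (a fraction `rel_threshold` of the sequence's own maximum score) and the minimum segment length `min_len` are hypothetical; the sketch only shows the general idea of turning raw scores into transmembrane segments with a threshold that adapts to each sequence, and it does not reproduce the patent's handling of under-segmentation.

```python
def binarize_scores(scores, rel_threshold=0.5, min_len=3):
    """Turn per-residue transmembrane scores into helix segments.

    Hypothetical "dynamic" threshold: rel_threshold times the highest score
    in this sequence, so the cut-off follows each protein's score scale.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    if not scores:
        return []
    threshold = rel_threshold * max(scores)
    labels = [s >= threshold for s in scores]
    segments, start = [], None
    for i, is_tm in enumerate(labels + [False]):  # sentinel closes a trailing run
        if is_tm and start is None:
            start = i
        elif not is_tm and start is not None:
            if i - start >= min_len:  # drop runs too short to be a helix
                segments.append((start, i))
            start = None
    return segments
```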
The structural fragment query library comprises three libraries: a secondary structure fragment library built from fragments with a specific secondary structure (α-helix or β-sheet), with a minimum fragment length of 5 residues; a fixed-length fragment library built from protein fragments of fixed length, from a minimum of 9 to a maximum of 16 residues; and a short fragment library built from fragments 3 residues long.
Each fragment in the fixed-length fragment library has the same secondary structure as the query-sequence segment of the same length at the corresponding position.
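The secondary-structure matching rule for the fixed-length library can be illustrated with a small indexing function. The one-letter secondary structure alphabet, the `(fragment_id, ss_string)` database format and the function name are assumptions for this sketch; real fragment libraries also store backbone coordinates or torsions, which are omitted here.

```python
def build_fixed_length_library(query_ss, database, frag_len=9):
    """Index database fragments by query start position, keeping exact SS matches.

    query_ss: predicted secondary structure of the query, one letter per residue
              (e.g. 'H' helix, 'E' strand, 'C' coil; this alphabet is assumed).
    database: iterable of (fragment_id, ss_string) taken from known structures.
    Returns {start_position: [fragment_id, ...]} for windows of length frag_len.
    """
    # Group database fragments of the requested length by their SS string.
    by_ss = {}
    for frag_id, ss in database:
        if len(ss) == frag_len:
            by_ss.setdefault(ss, []).append(frag_id)
    library = {}
    for start in range(len(query_ss) - frag_len + 1):
        window = query_ss[start:start + frag_len]
        library[start] = list(by_ss.get(window, []))  # empty if nothing matches
    return library
```

Because fixed-length fragments range from 9 to 16 residues, the function would be called once for each length in that range.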
The residue contact prediction result is obtained with shen-CDeep, a deep-learning-based protein residue contact prediction model: the Cβ-Cβ distance between residues is divided into 10 intervals, namely […], and the model predicts the probability that each residue pair falls in each distance interval.
The residue contact prediction model comprises 29 improved ResNet residual blocks organized into five groups of 3, 4, 6, 8 and 8 blocks; the first three groups introduce dilated convolution, and the last two groups introduce a channel-based attention mechanism.
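The block layout described above can be written down as a configuration list. Only the group sizes (3, 4, 6, 8, 8), the placement of dilated convolution in the first three groups and of channel attention in the last two come from the text; the dilation rates in `DILATION_CYCLE` are hypothetical, and the sketch describes the layout only, not a trainable network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSpec:
    group: int               # 1-based group index
    dilation: int            # dilation rate of the block's convolutions (1 = ordinary)
    channel_attention: bool  # whether the block applies channel-based attention


GROUP_SIZES = (3, 4, 6, 8, 8)  # from the text: 29 blocks in five groups
DILATION_CYCLE = (2, 4, 8)     # hypothetical: the text does not give the rates


def layout_blocks():
    """List the 29 residual blocks with their (partly assumed) settings."""
    blocks = []
    for group, size in enumerate(GROUP_SIZES, start=1):
        for k in range(size):
            if group <= 3:  # first three groups: dilated convolution
                rate = DILATION_CYCLE[k % len(DILATION_CYCLE)]
                blocks.append(BlockSpec(group, rate, False))
            else:           # last two groups: channel-based attention
                blocks.append(BlockSpec(group, 1, True))
    return blocks
```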
The knowledge-based energy function is as follows: a score is computed for each residue pair from its group of score-d functions, and the sum of all scores is taken as the energy value of the whole structure. The score-d function group is obtained as follows. For a protein sequence of L residues, select the top L residue pairs ranked by the predicted probability that their Cβ-Cβ distance lies in […], and the top L/5 residue pairs ranked by the predicted probability that their distance lies in […]. After removing pairs that appear in both sets, compute for each residue pair a score for each probability interval, score_i = […], where n = 9, i = 1, 2, …, 9 is the interval index, d_i is the midpoint of the i-th interval, p_i is the probability of the i-th interval, and α = 1.57 is a normalization constant. Cubic spline interpolation is then applied to each group to obtain a set of score-d functions over […].
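The pair selection, spline interpolation and summation steps can be sketched in plain Python. The per-interval score formula is not reproduced in this text, so the scores are passed in by the caller rather than computed; the function names, the choice of a natural cubic spline (the text says only "cubic spline interpolation") and the integer `L // 5` for "L/5" are assumptions of this sketch.

```python
import bisect


def select_pairs(p_first, p_second, seq_len):
    """Top-L pairs by p_first plus top-(L//5) pairs by p_second, repeats dropped.

    p_first / p_second map (i, j) residue-index pairs to the predicted
    probability that their Cβ-Cβ distance lies in the first / second distance
    range named in the text (the ranges themselves are not reproduced here).
    """
    top_first = sorted(p_first, key=p_first.get, reverse=True)[:seq_len]
    top_second = sorted(p_second, key=p_second.get, reverse=True)[:seq_len // 5]
    return list(dict.fromkeys(top_first + top_second))  # keeps first-seen order


def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs, ys); xs strictly increasing, len >= 2."""
    n = len(xs) - 1
    h = [xs[k + 1] - xs[k] for k in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends: m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        sub = [h[k - 1] for k in range(1, n)]
        diag = [2.0 * (h[k - 1] + h[k]) for k in range(1, n)]
        sup = [h[k] for k in range(1, n)]
        rhs = [6.0 * ((ys[k + 1] - ys[k]) / h[k] - (ys[k] - ys[k - 1]) / h[k - 1])
               for k in range(1, n)]
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(x):
        # Pick the interval containing x; the end pieces extrapolate outside.
        k = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        t, hk = x - xs[k], h[k]
        slope = (ys[k + 1] - ys[k]) / hk - hk * (2.0 * m[k] + m[k + 1]) / 6.0
        return (ys[k] + slope * t + m[k] / 2.0 * t * t
                + (m[k + 1] - m[k]) / (6.0 * hk) * t ** 3)

    return evaluate


def build_score_functions(pair_scores, midpoints):
    """One spline per pair through (d_i, score_i); scores come from the caller."""
    return {pair: natural_cubic_spline(midpoints, scores)
            for pair, scores in pair_scores.items()}


def knowledge_energy(cb_dist, score_functions):
    """Energy of a model: sum of each selected pair's score at its current distance."""
    return sum(f(cb_dist[pair]) for pair, f in score_functions.items())
```

Building one spline per pair up front, as `build_score_functions` does, avoids refitting inside the fragment-replacement loop, where the energy is evaluated at every step.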
The initial structure is a fully extended straight chain in which the peptide bond between consecutive residues is parallel to the backbone; it is the starting point for iterative fragment replacement.
Fragment replacement comprises the following steps:
i. generate a random number R1 to choose among the three types of fragment replacement (secondary structure fragment replacement, fixed-length fragment replacement, short fragment replacement);
ii. generate a random number R2 to choose the start position of the replacement;
iii. generate a random number R3 to choose a specific fragment of the selected type;
iv. apply a coordinate transformation to complete one round of fragment replacement;
v. check whether the resulting structure satisfies the constraints; if it does, keep it, otherwise discard it.
The candidate structures are produced by repeatedly iterating from the initial structure with a simulated annealing algorithm; whenever a structure with a lower energy is generated, it replaces a candidate structure with a higher energy.
The number of iterations is preferably 20 million or more, and the number of candidate structures is preferably 100 or more.
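The fragment replacement steps i to v and the simulated annealing loop can be combined into the following toy sketch. It works on a list of per-residue (phi, psi) torsion pairs rather than Cartesian coordinates, starts from an extended chain with all torsions at 180° (an assumed representation of the straight initial chain), and uses a Metropolis acceptance rule with a geometric cooling schedule; the schedule, temperatures and all function names are assumptions, as the text gives none of them. `libraries` maps a library name to `{start_position: [fragment, ...]}`, where each fragment is a list of (phi, psi) pairs.

```python
import heapq
import math
import random


def fragment_move(structure, libraries, rng):
    """One fragment replacement on a torsion-angle model (steps i to iv)."""
    kind = rng.choice(sorted(libraries))                  # i. R1: library type
    positions = [p for p, frags in libraries[kind].items() if frags]
    if not positions:
        return list(structure)
    start = rng.choice(positions)                         # ii. R2: start position
    fragment = rng.choice(libraries[kind][start])         # iii. R3: fragment
    if start + len(fragment) > len(structure):            # fragment would overhang
        return list(structure)
    # iv. splice the fragment's (phi, psi) pairs in; rebuilding Cartesian
    #     coordinates from the torsions is omitted in this sketch.
    new = list(structure)
    new[start:start + len(fragment)] = fragment
    return new


def anneal(n_res, libraries, energy, satisfies_constraints, n_iter=10_000,
           pool_size=100, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over fragment moves, keeping a lowest-energy pool."""
    rng = random.Random(seed)
    current = [(180.0, 180.0)] * n_res  # extended chain (assumed torsions)
    e_cur = energy(current)
    pool = []  # max-heap on energy via (-energy, step, structure)
    for step in range(n_iter):
        # Geometric cooling schedule (an assumption; the text gives none).
        t = t_start * (t_end / t_start) ** (step / max(n_iter - 1, 1))
        trial = fragment_move(current, libraries, rng)
        if not satisfies_constraints(trial):              # v. discard violators
            continue
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= e_cur or rng.random() < math.exp((e_cur - e_trial) / t):
            current, e_cur = trial, e_trial
            if len(pool) < pool_size:
                heapq.heappush(pool, (-e_cur, step, current))
            elif e_cur < -pool[0][0]:
                # A lower-energy structure replaces the highest-energy candidate.
                heapq.heapreplace(pool, (-e_cur, step, current))
    return sorted(((-neg_e, s) for neg_e, _, s in pool), key=lambda item: item[0])
```

In a full pipeline, `energy` would be the knowledge-based energy function and `satisfies_constraints` the residue distance check, both evaluated on coordinates rebuilt from the torsions; here any callables with those signatures can be passed.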
The screening is as follows. Another energy function based on statistical knowledge, E = […], is evaluated for every residue pair whose predicted probability of a contact distance within […] is greater than 0.3, and the sum of these energy values is taken as the final energy value. This energy value and the knowledge-based energy function are then used together to evaluate the candidate structures: the candidates are sorted in ascending order by each of the two energy functions, the two rank numbers of each candidate are added, and the structure with the smallest rank sum is selected. Side-chain optimization is then applied to this structure: the side-chain rotamers of each amino acid type observed in nature are tallied and used to replace the side chains, eliminating possible overlaps between side-chain atoms and bringing the side-chain conformation closer to the real structure. In the energy function, p is the probability that the contact distance of the residue pair lies within […], and d_max is the theoretical maximum Cβ-Cβ distance at which the two residue types can be in contact.
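The rank-sum selection and the probability filter can be sketched as two small functions. The per-pair energy formula is not reproduced in this text, so `pair_energy` is a caller-supplied, hypothetical callable; ranks start at 1, and ties (in either energy or in the rank sum) are broken by candidate order, which is an assumption since the text does not say.

```python
def screening_energy(pair_probs, pair_energy, p_min=0.3):
    """Sum a per-pair energy over pairs whose contact probability exceeds p_min.

    pair_probs: {(i, j): predicted contact probability}
    pair_energy: hypothetical callable (pair, probability) -> energy term
    """
    return sum(pair_energy(pair, p) for pair, p in pair_probs.items() if p > p_min)


def rank_sum_select(energy_a, energy_b):
    """Index of the candidate whose ascending ranks under two energies sum lowest.

    energy_a, energy_b: energies of the same candidates, in the same order.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for position, idx in enumerate(order, start=1):
            r[idx] = position
        return r

    totals = [x + y for x, y in zip(ranks(energy_a), ranks(energy_b))]
    return min(range(len(totals)), key=totals.__getitem__)
```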
The invention further relates to a system for implementing the method, comprising a multiple sequence alignment module, a transmembrane region prediction module, a residue contact prediction module and a three-dimensional structure prediction module. The input is passed to the MSA module, which obtains homologous sequences. The transmembrane region prediction module is connected to the MSA module and the three-dimensional structure prediction module; it predicts transmembrane regions from the MSA result and passes them to the three-dimensional structure prediction module. The residue contact prediction module is likewise connected to the MSA module and the three-dimensional structure prediction module; it predicts residue contacts from the MSA result and passes them to the three-dimensional structure prediction module. The three-dimensional structure prediction module combines the information from the MSA, transmembrane region prediction and residue contact prediction modules to complete the three-dimensional structure prediction.
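The data flow between the four modules can be expressed as a small wiring function. The module implementations are passed in as callables; their names and signatures are hypothetical, and the sketch shows only which module consumes which output.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Prediction:
    msa: Any          # homologous sequences from the MSA module
    tm_regions: Any   # transmembrane region prediction
    contacts: Any     # residue contact prediction
    structure: Any    # final three-dimensional structure


def run_pipeline(sequence: str,
                 msa_module: Callable[[str], Any],
                 tm_module: Callable[[str, Any], Any],
                 contact_module: Callable[[str, Any], Any],
                 structure_module: Callable[[str, Any, Any, Any], Any]) -> Prediction:
    msa = msa_module(sequence)                 # input -> MSA module
    tm_regions = tm_module(sequence, msa)      # MSA -> transmembrane region module
    contacts = contact_module(sequence, msa)   # MSA -> residue contact module
    # The 3D module consumes the outputs of all three upstream modules.
    structure = structure_module(sequence, msa, tm_regions, contacts)
    return Prediction(msa, tm_regions, contacts, structure)
```

Returning the intermediate results alongside the structure mirrors the text's point that the transmembrane and contact predictions are also available as outputs.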
Technical effects
Overall, the invention addresses the lack of specificity, insufficient accuracy and insufficient speed of the prior art.
Compared with the prior art, the method also outputs the transmembrane region predictions and residue contacts generated during the prediction process; it improves prediction accuracy by 10% to 20% over current membrane protein structure prediction methods; it takes only tens of minutes to a few hours; and it is simple to operate and convenient to use. On some proteins the RMSD of the predicted structure relative to the real structure reaches […], and for most proteins it is below […], with particularly high accuracy in the transmembrane portion.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
As shown in FIG. 1, this embodiment relates to a method for predicting the three-dimensional structure of a membrane protein based on knowledge energy function optimization. The input is a membrane protein sequence, for example chain A of PDB entry 2D57, shown in SEQ ID NO: 1.
In this embodiment, the number of iterations is set to 20 million. The algorithm runs in three stages:
S1: preprocessing;
S2: iterative optimization;
S3: post-processing.
The preprocessing stage S1 comprises the following steps:
S11: obtain the multiple sequence alignment result, the secondary structure prediction result and the residue contact prediction result;
S12: use the MSA result together with statistical knowledge to constrain the Cβ-Cβ distances between residues; use the secondary structure prediction result together with known structures in the PDB to construct the structural fragment query library; and use the residue contact prediction result to derive the knowledge-based energy function.
The iterative optimization stage S2 comprises the following steps:
S21: randomly select fragments from the structural fragment query library and use them to replace parts of the initial structure, subject to the energy function and the residue distance constraints;
S22: repeat S21 20 million times, and take the last 100 structures as candidate structures.
The post-processing stage S3 comprises the following steps:
S31: evaluate the 100 candidate structures comprehensively using a second knowledge-based energy function, and select the best structure;
S32: perform side-chain optimization on that structure and output the final result.
The final output of the algorithm is a file in PDB format.
The evaluation index used in this example is the TM-score:
$$\text{TM-score} = \max\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1+\left(d_i/d_0\right)^2}\right]$$
where L_N is the length of the template structure (generally the real protein structure), L_T is the number of residues aligned to the template structure, d_i is the distance between the i-th pair of aligned residues, and d_0 is a normalizing distance scale with a fixed value.
The TM-score ranges from 0 to 1; the larger the value, the more similar the two structures. The TM-score between the predicted and real structures therefore serves as an index of prediction quality: a larger value means the predicted structure is closer to the real structure, and a smaller value means a larger difference. GDT-TS is interpreted similarly and ranges from 0 to 100. RMSD is the root-mean-square deviation between the predicted and real atomic coordinates.
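For a single, fixed superposition, the TM-score sum and the RMSD can be computed as below. This is a simplified sketch: the real TM-score takes the maximum over superpositions (that search is omitted), `d0` is passed in as the fixed scale mentioned above, and both functions assume the residue correspondence and superposition are already given. `math.dist` requires Python 3.8 or later.

```python
import math


def tm_score_fixed(distances, l_native, d0):
    """TM-score term for one fixed superposition.

    distances: d_i for the L_T aligned residue pairs after superposition.
    l_native: L_N, the length of the reference (real) structure.
    d0: distance scale; the maximum over superpositions is not searched here.
    """
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_native


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinates,
    assuming the two sets are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))
```

Dividing by L_N rather than L_T means that unaligned residues of the real structure pull the score down, which is why a short but perfect partial alignment cannot reach 1.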
In this example, experiments were performed on a set of membrane proteins, with the results shown in Table 2. Compared with the existing membrane protein structure prediction method FILM3, the method improves every index to varying degrees, by more than 20% on some proteins.
TABLE 2 Prediction results and comparison with FILM3
Compared with the prior art, the method greatly improves the accuracy of membrane protein three-dimensional structure prediction; in particular, the deviation from the real structure in the transmembrane helical regions is very low. Prediction is fast: the three-dimensional structure of a membrane protein several hundred residues long can be predicted within a few hours. The transmembrane region predictions and residue contacts generated during the prediction process can also be output.
Those skilled in the art may modify the foregoing embodiments in many different ways without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
Sequence listing
<110> Shanghai Jiao Tong University
<120> knowledge energy function optimization-based membrane protein three-dimensional structure prediction method
<130> fnc482e
<141> 2021-06-08
<160> 1
<170> SIPOSequenceListing 1.0
<210> 1
<211> 224
<212> PRT
<213> Artificial Sequence
<400> 1
Thr Gln Ala Phe Trp Lys Ala Val Thr Ala Glu Phe Leu Ala Met Leu
1 5 10 15
Ile Phe Val Leu Leu Ser Val Gly Ser Thr Ile Asn Trp Gly Gly Ser
20 25 30
Glu Asn Pro Leu Pro Val Asp Met Val Leu Ile Ser Leu Cys Phe Gly
35 40 45
Leu Ser Ile Ala Thr Met Val Gln Cys Phe Gly His Ile Ser Gly Gly
50 55 60
His Ile Asn Pro Ala Val Thr Val Ala Met Val Cys Thr Arg Lys Ile
65 70 75 80
Ser Ile Ala Lys Ser Val Phe Tyr Ile Thr Ala Gln Cys Leu Gly Ala
85 90 95
Ile Ile Gly Ala Gly Ile Leu Tyr Leu Val Thr Pro Pro Ser Val Val
100 105 110
Gly Gly Leu Gly Val Thr Thr Val His Gly Asn Leu Thr Ala Gly His
115 120 125
Gly Leu Leu Val Glu Leu Ile Ile Thr Phe Gln Leu Val Phe Thr Ile
130 135 140
Phe Ala Ser Cys Asp Ser Lys Arg Thr Asp Val Thr Gly Ser Val Ala
145 150 155 160
Leu Ala Ile Gly Phe Ser Val Ala Ile Gly His Leu Phe Ala Ile Asn
165 170 175
Tyr Thr Gly Ala Ser Met Asn Pro Ala Arg Ser Phe Gly Pro Ala Val
180 185 190
Ile Met Gly Asn Trp Glu Asn His Trp Ile Tyr Trp Val Gly Pro Ile
195 200 205
Ile Gly Ala Val Leu Ala Gly Ala Leu Tyr Glu Tyr Val Phe Cys Pro
210 215 220
Claims (7)
1. A membrane protein three-dimensional structure prediction method based on knowledge energy function optimization, characterized in that: constraints on residue distances are obtained from the multiple sequence alignment result of the input sequence and statistical knowledge; a structural fragment query library is constructed from the secondary structure prediction result of the input sequence and known structures in the protein structure database PDB; a knowledge-based energy function is computed from the residue contact prediction result of the input sequence; fragment replacement is then applied iteratively to an initial structure under the energy function and the residue distance constraints to obtain a plurality of candidate structures; and finally, the candidate structures are screened to obtain the final predicted three-dimensional structure of the membrane protein;
the knowledge-based energy function is as follows: a score is computed for each residue pair from its group of score-d functions, and the sum of all scores is taken as the energy value of the whole structure, wherein the score-d function group is obtained as follows: for a protein sequence of L residues, the top L residue pairs ranked by the predicted probability that the Cβ-Cβ distance lies in […] and the top L/5 residue pairs ranked by the predicted probability that the distance lies in […] are selected; after removing pairs that appear in both sets, a score is computed for each residue pair in each probability interval, score_i = […], where n = 9, i = 1, 2, …, 9 is the interval index, d_i is the midpoint of the i-th interval, p_i is the probability of the i-th interval, and α = 1.57 is a normalization constant; cubic spline interpolation is then applied to each group to obtain a set of score-d functions over […];
the initial structure is a fully extended straight chain in which the peptide bond between consecutive residues is parallel to the backbone, and it is the starting point for iterative fragment replacement;
the fragment replacement comprises the following steps:
i. generating a random number R1 to determine whether to perform a secondary structure fragment replacement, a fixed-length fragment replacement or a short fragment replacement;
ii. generating a random number R2 to determine the start position of the fragment replacement;
iii. generating a random number R3 to select a specific fragment of the selected type;
iv. applying a coordinate transformation to complete one round of fragment replacement;
v. judging whether the resulting structure satisfies the constraints, and keeping it if it does, otherwise discarding it;
and the candidate structures are obtained by repeatedly iterating from the initial structure with a simulated annealing algorithm, each newly generated structure with a lower energy value replacing a candidate structure with a higher energy value.
2. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the residue distance constraint is: based on statistical rules derived from protein structures in the PDB, bounding the minimum and maximum of the Cβ-Cβ distance between two residue types that are in contact;
the statistical rule is obtained by analyzing a large number of real protein structures, computing the distances between pairs of residue types that are in contact, and recording the range of values those distances take.
3. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the secondary structure prediction result is obtained with a membrane protein transmembrane helix prediction model based on multi-scale deep learning, by the following steps: inputting a protein sequence, obtaining a large number of similar sequences through multiple sequence alignment, and, combining co-evolution information, predicting the transmembrane regions and their orientation using a deep learning model and a support vector machine classifier;
the membrane protein transmembrane helix prediction model comprises a transmembrane region prediction module and an orientation prediction module, wherein the transmembrane region prediction module comprises a multi-scale deep learning model and a binarization module, the deep learning model consisting of a small-scale, residue-level residual neural network and a large-scale, whole-sequence residual neural network, and the binarization module binarizing the raw prediction scores according to a dynamic threshold to address under-segmentation; and the orientation prediction module uses a support vector machine classifier.
4. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the structural fragment query library comprises: a secondary structure fragment query library built from fragments with a specific secondary structure, including α-helices and β-sheets, with a minimum fragment length of 5 residues; a fixed-length fragment query library built from protein fragments of fixed length, with a minimum length of 9 residues and a maximum length of 16 residues; and a short fragment query library built from fragments 3 residues long;
each fragment in the fixed-length fragment query library has the same secondary structure as the query-sequence segment of the same length at the corresponding position.
5. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the residue contact prediction result is obtained with a deep-learning-based protein residue contact prediction model, specifically: the Cβ-Cβ distance between residues is divided into 10 intervals, namely […], and the probability that each residue pair lies in each distance interval is predicted;
the protein residue contact prediction model comprises 29 improved ResNet residual blocks organized into five groups of 3, 4, 6, 8 and 8 blocks, wherein the first three groups introduce dilated convolution and the last two groups introduce a channel-based attention mechanism.
6. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the screening is: using another energy function based on statistical knowledge, E = […], computing an energy value for every residue pair whose predicted probability of a contact distance within […] is greater than 0.3, and taking the sum of these energy values as the final energy value; then evaluating the candidate structures comprehensively with this energy value and the knowledge-based energy function, by sorting the candidate structures in ascending order under each of the two energy functions, adding the two rank numbers of each candidate, and selecting the structure with the smallest rank sum; and then performing side-chain optimization on the selected structure, in which the side-chain rotamers of each amino acid type observed in nature are tallied and used to replace the side chains, eliminating possible overlaps between side-chain atoms and bringing the side-chain conformation closer to the real structure; wherein p is the probability that the contact distance of the residue pair lies within […], and d_max is the theoretical maximum Cβ-Cβ distance at which the two residue types can be in contact.
7. A system for implementing the membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to any one of claims 1 to 6, comprising a multiple sequence alignment module, a transmembrane region prediction module, a residue contact prediction module and a three-dimensional structure prediction module, wherein: the input is passed to the multiple sequence alignment module to obtain homologous sequences; the transmembrane region prediction module is connected to the multiple sequence alignment module and the three-dimensional structure prediction module, predicts transmembrane regions from the alignment result and passes them to the three-dimensional structure prediction module; the residue contact prediction module is likewise connected to the multiple sequence alignment module and the three-dimensional structure prediction module, predicts residue contacts from the alignment result and passes them to the three-dimensional structure prediction module; and the three-dimensional structure prediction module combines the information from the multiple sequence alignment, transmembrane region prediction and residue contact prediction modules to complete the three-dimensional structure prediction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110636292.8A CN113205855B (en) | 2021-06-08 | 2021-06-08 | Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113205855A (en) | 2021-08-03 |
CN113205855B (en) | 2022-08-05 |
Family ID: 77024177
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110636292.8A Active CN113205855B (en) | 2021-06-08 | 2021-06-08 | Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113205855B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008157729A2 (en) * | 2007-06-21 | 2008-12-24 | California Institute Of Technology | Methods for predicting three-dimensional structures for alpha helical membrane proteins and their use in design of selective ligands |
CN104504299A (en) * | 2014-12-29 | 2015-04-08 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Method for predicting interaction relationships between residues of membrane proteins |
CN104615910A (en) * | 2014-12-30 | 2015-05-13 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Method for predicting helix interaction relationships of α-transmembrane proteins based on random forest |
CN110390995A (en) * | 2019-07-01 | 2019-10-29 | Shanghai Jiao Tong University | α-helical transmembrane protein topology prediction method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130303383A1 (en) * | 2012-05-09 | 2013-11-14 | Sloan-Kettering Institute For Cancer Reseach | Methods and apparatus for predicting protein structure |
US20130304432A1 (en) * | 2012-05-09 | 2013-11-14 | Memorial Sloan-Kettering Cancer Center | Methods and apparatus for predicting protein structure |
CN109033744B (en) * | 2018-06-19 | 2021-08-03 | Zhejiang University of Technology | Protein structure prediction method based on residue distance and contact information |
CN112233723B (en) * | 2020-10-26 | 2022-10-25 | Shanghai Tianrang Intelligent Technology Co., Ltd. | Protein structure prediction method and system based on deep learning |
2021-06-08: Application CN202110636292.8A filed in CN; CN113205855B status: Active
Non-Patent Citations (1)
Title |
---|
Shi-Hao Feng et al., "Topology Prediction Improvement of α-helical Transmembrane Proteins Through Helix-tail Modeling and Multiscale Deep Learning Fusion", Journal of Molecular Biology, 2019-12-21, pp. 1279-1296 * |
Also Published As
Publication number | Publication date |
---|---|
CN113205855A (en) | 2021-08-03 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Camproux et al. | A hidden markov model derived structural alphabet for proteins | |
CN113990384B (en) | Deep learning-based method, system and application for constructing atomic model structure of frozen electron microscope | |
Kuang et al. | Protein backbone angle prediction with machine learning approaches | |
KR20180053731A (en) | How to find K extreme values within a certain processing time | |
Li et al. | Protein loop modeling using deep generative adversarial network | |
CN113257357B (en) | Protein residue contact map prediction method | |
CN109360599B (en) | Protein structure prediction method based on residue contact information cross strategy | |
WO2023129955A1 (en) | Inter-model prediction score recalibration | |
Kandathil et al. | Deep learning-based prediction of protein structure using learned representations of multiple sequence alignments | |
CN113205855B (en) | Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method | |
CN110853702B (en) | Protein interaction prediction method based on spatial structure | |
Liang et al. | Prediction of protein structural class based on different autocorrelation descriptors of position-specific scoring matrix | |
Zhang et al. | Two-stage distance feature-based optimization algorithm for de novo protein structure prediction | |
Behera et al. | Higher accuracy protein multiple sequence alignments by genetic algorithm | |
Yao et al. | A two-stage multi-fidelity design optimization for K-mer-based pattern recognition (KPR) in image processing | |
Bhattacharya et al. | Progress: Simultaneous searching of protein databases by sequence and structure | |
CN116189776A (en) | Antibody structure generation method based on deep learning | |
Yang et al. | Localnet: a simple recurrent neural network model for protein secondary structure prediction using local amino acid sequences only | |
Andersen et al. | Topology of protein metastructure and β-sheet topology | |
Zhao et al. | Prediction of Protein Secondary Structure Based on Lightweight Convolutional Neural Network | |
JP7260934B2 (en) | Negative sequence pattern similarity analysis method based on biological sequences, its implementation system and medium | |
Guo et al. | Using Artificial Neural Networks to Model Errors in Biochemical Manipulation of DNA Molecules | |
Chen et al. | Contactlib-att: a structure-based search engine for homologous proteins | |
Li et al. | Prediction of splice site using support vector machine with feature selection | |
Sree et al. | An extensive report on cellular automata based artificial immune system for strengthening automated protein prediction | |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |