CN113205855B - Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method - Google Patents

Publication number
CN113205855B
Authority
CN
China
Prior art keywords
residue
prediction
dimensional structure
energy function
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110636292.8A
Other languages
Chinese (zh)
Other versions
CN113205855A (en)
Inventor
柳源
沈红斌
冯世豪
张沛东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN202110636292.8A
Publication of CN113205855A
Application granted
Publication of CN113205855B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/20 Probabilistic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A membrane protein three-dimensional structure prediction method based on knowledge energy function optimization, characterized in that: constraints on residue distances are obtained from the multiple sequence alignment (MSA) result of an input sequence combined with statistical knowledge; a structure fragment query library is constructed from the secondary structure prediction result of the input sequence and known structures in the protein structure database PDB; and a knowledge-based energy function is calculated from the residue contact prediction result of the input sequence. Fragment replacement is then carried out iteratively on an initial structure under the energy function and the residue distance constraints to obtain a plurality of candidate structures. Finally, the candidate structures are screened to obtain the final predicted three-dimensional structure of the membrane protein. The invention is a de novo prediction method that combines multiple sequence alignment, secondary structure prediction, residue contact prediction and other techniques, and has the advantages of convenient operation and high accuracy.

Description

Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method
Technical Field
The invention relates to a technology in the field of bioengineering, in particular to a membrane protein three-dimensional structure prediction method based on knowledge energy function optimization.
Background
Accurate protein structure information is obtained by experimental determination. The most common experimental methods at present are X-ray diffraction, nuclear magnetic resonance and cryo-electron microscopy, and the protein structures obtained by these methods are stored in the biological database PDB (Protein Data Bank). Only 1267 high-resolution membrane protein structures have been solved and deposited in the PDB, about 2% of all protein structures in the PDB, so computational methods for predicting the three-dimensional structure of membrane proteins are very important. Current computational methods predict protein structure in two main ways: template-based modeling and de novo prediction. For membrane proteins, few structures are available in the PDB and a suitable template generally cannot be found, so most membrane protein structure prediction methods are de novo. On the other hand, membrane proteins have long amino acid sequences, which sharply reduces the time efficiency of most de novo prediction methods, and some methods cannot complete the prediction task at all.
Disclosure of Invention
In view of the above defects of the prior art, the invention provides a membrane protein three-dimensional structure prediction method based on knowledge energy function optimization. It is a de novo prediction method that combines multiple sequence alignment (MSA), secondary structure prediction, residue contact prediction and other techniques, and has the advantages of convenient operation and high accuracy.
The invention is realized by the following technical scheme:
The invention relates to a membrane protein three-dimensional structure prediction method based on knowledge energy function optimization, comprising: combining statistical knowledge with the multiple sequence alignment result of an input sequence to obtain constraints on residue distances; combining known structures in the PDB with the secondary structure prediction result of the input sequence to construct a structure fragment query library; and calculating a knowledge-based energy function from the residue contact prediction result of the input sequence. Fragment replacement is then carried out iteratively on an initial structure under the energy function and the residue distance constraints to obtain a plurality of candidate structures. Finally, the candidate structures are screened to obtain the final predicted three-dimensional structure of the membrane protein.
The input sequence is the amino acid sequence of a membrane protein. The sequence length is not limited; the sequence contains a plurality of transmembrane helices, and the transmembrane part makes up most of it.
The multiple sequence alignment result is obtained by aligning the input sequence against the sequences in a sequence library and selecting a plurality of sequences with high homology.
The constraint on residue distance is as follows: according to statistics over the protein structures in the PDB, the Cβ-Cβ distance between two types of residues that are in contact is restricted to lie between a minimum and a maximum value.
The statistical rule is obtained as follows: a large number of real protein structures are analyzed, the distances between residues of two given types that are in contact are calculated, and the range of values taken by these distances is recorded.
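By way of illustration only, the statistical rule above can be sketched in Python as follows. The function names, the choice to leave pair types without statistics unconstrained, and the toy observations are assumptions made for this sketch and are not part of the disclosure; real statistics would be collected from PDB structures.

```python
def contact_distance_ranges(observations):
    """Return {(type_a, type_b): (min_d, max_d)} from contacting residue pairs.

    observations: iterable of (residue_type_a, residue_type_b, cb_cb_distance)
    collected from known structures. The pair key is order-independent.
    """
    ranges = {}
    for a, b, d in observations:
        key = tuple(sorted((a, b)))
        lo, hi = ranges.get(key, (d, d))
        ranges[key] = (min(lo, d), max(hi, d))
    return ranges


def satisfies_constraint(ranges, a, b, d):
    """True if distance d lies inside the observed range for this pair type."""
    key = tuple(sorted((a, b)))
    if key not in ranges:
        return True  # assumption: pair types without statistics are unconstrained
    lo, hi = ranges[key]
    return lo <= d <= hi


# Toy, made-up observations (not real PDB statistics).
toy = [("LEU", "ALA", 5.1), ("ALA", "LEU", 6.8), ("GLY", "GLY", 4.2)]
ranges = contact_distance_ranges(toy)
```

A candidate structure would then be checked by calling `satisfies_constraint` for each predicted contacting pair.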
The secondary structure prediction result is obtained by a membrane protein transmembrane helix (TMH) prediction model (Membrain) based on multi-scale deep learning. The specific operation steps are: a protein sequence is input, a large number of similar sequences are obtained through multiple sequence alignment, and, in combination with co-evolution information, the transmembrane regions and the transmembrane direction are predicted using a deep learning model and a support vector machine classifier.
The membrane protein transmembrane helix prediction model comprises a transmembrane region prediction module and a direction prediction module, wherein: the transmembrane region prediction module comprises a multi-scale deep learning model and a binarization processing module; the deep learning model consists of a small-scale residue-based residual neural network and a large-scale full-sequence-based residual neural network; the binarization processing module binarizes the raw prediction scores according to a dynamic threshold and resolves under-segmentation; and the direction prediction module uses a support vector machine (SVM) classifier.
The structure fragment query library comprises: a secondary structure fragment query library constructed from fragments with a specific secondary structure, including α-helices and β-sheets, with a minimum fragment length of 5 residues; a fixed-length fragment query library constructed from protein fragments of fixed length, with a minimum fragment length of 9 residues and a maximum fragment length of 16 residues; and a short fragment query library constructed from short fragments 3 residues long.
Each fixed-length fragment in the fixed-length fragment query library has the same secondary structure as the corresponding positions of some segment of the same length in the query sequence.
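The matching rule for the fixed-length fragment query library can be illustrated with the following minimal Python sketch. It assumes, purely for illustration, that secondary structure is encoded as a string with one letter per residue (`H` helix, `E` strand, `C` coil) and that a template fragment is accepted when its string equals the predicted string of the query window; the function name and the tuple layout are hypothetical.

```python
def fixed_length_fragments(template_ss, template_id, query_ss,
                           min_len=9, max_len=16):
    """Collect (length, query_start, template_id, template_start) matches.

    A template window is accepted for a query position when its secondary
    structure string equals the query's predicted string over the same length.
    """
    library = []
    for length in range(min_len, max_len + 1):
        for q in range(len(query_ss) - length + 1):
            q_window = query_ss[q:q + length]
            for t in range(len(template_ss) - length + 1):
                if template_ss[t:t + length] == q_window:
                    library.append((length, q, template_id, t))
    return library
```

In practice the library would also store the backbone geometry of each accepted fragment so that it can be copied into the model during fragment replacement.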
The residue contact prediction result is obtained by a deep learning-based protein residue contact prediction model (shen-CDeep), specifically: the Cβ-Cβ distance between residues is divided into 10 intervals, namely
[formula images BDA0003105354820000021 and BDA0003105354820000022: the 10 distance intervals, not reproduced in this text]
and the probability that each residue pair falls in each distance interval is predicted.
The protein residue contact prediction model comprises 29 improved ResNet residual modules arranged in five groups of 3, 4, 6, 8 and 8 modules respectively, wherein a dilated convolution mechanism is introduced into the first three groups of modules and a channel-based attention mechanism is introduced into the last two groups.
The knowledge-based energy function is as follows: a score is calculated for each residue pair from its score-d functional relation, and the sum of all scores is taken as the energy value of the whole structure. The set of score-d functional relations is obtained as follows: for a protein sequence of L residues, select the top L residue pairs ranked by the predicted probability that the Cβ-Cβ distance lies within
[formula image BDA0003105354820000023: distance range, not reproduced in this text]
and the top L/5 residue pairs ranked by the predicted probability that the distance lies within
[formula image BDA0003105354820000024: distance range, not reproduced in this text]
After duplicate residue pairs are removed, the score of each remaining residue pair in each probability interval is calculated as
[formula image BDA0003105354820000025: score formula, not reproduced in this text]
where n = 9; i is the interval index, i = 1, 2, …, 9; d_i is the midpoint of the i-th interval; p_i is the probability value corresponding to the i-th interval; and α is a normalization term with the constant value α = 1.57. Cubic spline interpolation is then applied to each group to obtain the set of score-d functional relations within the range
[formula image BDA0003105354820000026: distance range, not reproduced in this text]
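The pair selection and energy accumulation described above can be sketched in Python as follows. This is only an illustration: the two probability tables stand for the predicted probabilities in the two distance ranges given in the formula images, the per-pair score functions stand for the spline-interpolated score-d relations, and all names are hypothetical.

```python
def select_residue_pairs(prob_a, prob_b, seq_len):
    """Select residue pairs for the energy function.

    prob_a / prob_b: dicts mapping (i, j) residue-index pairs to the predicted
    probability that the C-beta distance falls in the first / second range.
    Returns the top L pairs by prob_a plus the top L/5 pairs by prob_b,
    with duplicate pairs removed.
    """
    top_a = sorted(prob_a, key=prob_a.get, reverse=True)[:seq_len]
    top_b = sorted(prob_b, key=prob_b.get, reverse=True)[:seq_len // 5]
    selected = list(top_a)
    seen = set(top_a)
    for pair in top_b:
        if pair not in seen:
            selected.append(pair)
            seen.add(pair)
    return selected


def total_energy(selected, score_fn, distances):
    """Sum the score of every selected pair at its current distance.

    score_fn: dict mapping a pair to a callable score(d) (for example a spline).
    distances: dict mapping a pair to its current C-beta distance.
    """
    return sum(score_fn[pair](distances[pair]) for pair in selected)
```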
The initial structure is a fully extended straight chain in which the peptide bond between each two residues is parallel to the residue backbone; it is the starting point for the iterative fragment replacement.
The fragment replacement comprises the following steps:
i. generating a random number R1 to determine which of the three fragment replacements (secondary structure fragment replacement, fixed-length fragment replacement, short fragment replacement) is to be performed;
ii. generating a random number R2 to determine the starting position of the fragment replacement;
iii. generating a random number R3 to select a specific fragment of the chosen type;
iv. carrying out coordinate transformation to complete one round of fragment replacement;
v. judging whether the replaced structure satisfies the constraint conditions; if so, the structure is retained, and otherwise it is discarded.
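Steps i to v can be illustrated with the following minimal Python sketch. It assumes, for illustration only, that a structure is represented as a list of backbone (phi, psi) torsion pairs and that each library is a list of fragments in the same representation; the coordinate transformation of step iv is reduced here to overwriting the torsions, and the function and parameter names are hypothetical.

```python
import random


def fragment_replacement_step(torsions, libraries, is_valid, rng=random):
    """Perform one fragment replacement move and return the resulting structure.

    torsions: list of (phi, psi) tuples, one per residue.
    libraries: dict mapping a library name to a list of fragments.
    is_valid: callable that checks the distance constraints on a trial structure.
    """
    names = sorted(libraries)
    r1 = rng.randrange(len(names))              # step i: choose the library type
    fragments = libraries[names[r1]]
    r2 = rng.randrange(len(torsions))           # step ii: choose the start position
    fitting = [f for f in fragments if r2 + len(f) <= len(torsions)]
    if not fitting:
        return torsions                         # no fragment fits at this position
    r3 = rng.randrange(len(fitting))            # step iii: choose the fragment
    fragment = fitting[r3]
    # step iv: splice the fragment in (stands in for the coordinate transformation)
    trial = torsions[:r2] + list(fragment) + torsions[r2 + len(fragment):]
    # step v: keep the trial only if it satisfies the constraints
    return trial if is_valid(trial) else torsions
```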
The candidate structures are obtained by repeatedly iterating from the initial structure with a simulated annealing algorithm; each time a structure with a lower energy value is generated, it replaces a candidate structure with a higher energy value.
The number of iterations is preferably 20 million or more, and the corresponding number of candidate structures is preferably 100 or more.
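The iteration with a candidate pool can be sketched in Python as follows. The text names simulated annealing but does not give the acceptance rule or the cooling schedule, so the Metropolis criterion and the geometric temperature schedule below are assumptions of this sketch, as are all function and parameter names.

```python
import math
import random


def anneal(initial, propose, energy, n_steps, pool_size=100,
           t_start=1.0, t_end=0.01, rng=random):
    """Simulated annealing that keeps the pool_size lowest-energy structures seen.

    propose(structure, rng) returns a trial structure;
    energy(structure) returns its energy value.
    Returns a list of (energy, structure) sorted by ascending energy.
    """
    current, e_current = initial, energy(initial)
    pool = [(e_current, initial)]
    for step in range(n_steps):
        # assumed geometric cooling from t_start to t_end
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        trial = propose(current, rng)
        e_trial = energy(trial)
        # assumed Metropolis acceptance rule
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / t):
            current, e_current = trial, e_trial
            if len(pool) < pool_size:
                pool.append((e_trial, trial))
            else:
                worst = max(range(len(pool)), key=lambda k: pool[k][0])
                if e_trial < pool[worst][0]:
                    pool[worst] = (e_trial, trial)  # replace a higher-energy candidate
    return sorted(pool, key=lambda item: item[0])
```

With the preferred settings of the text, `n_steps` would be 20 million or more and `pool_size` 100 or more.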
The screening is as follows: using another energy function based on statistical knowledge
Figure BDA0003105354820000031
Figure BDA0003105354820000032
For each contact distance in
Figure BDA0003105354820000033
Calculating the energy value of the residue pair with the probability of being more than 0.3, and taking the accumulated result of all the energy values as the final energy value; then the energy value and an energy function based on knowledge base are used for carrying out comprehensive evaluation on the candidate structure, the two energy functions are respectively used for carrying out sorting from small to large on the candidate structure, the serial numbers are added, and the results are obtainedThe structure is optimized by side chain after the structure with the minimum sequence number is screened out, the side chain isomers of each type of amino acid in the nature are counted, the side chain is replaced, so that the possible position overlapping between side chain atoms is eliminated, and the side chain conformation is improved to be more consistent with the real structure, wherein: p is residue to contact distance
Figure BDA0003105354820000034
Probability of d between d max Is the theoretical maximum of the C.beta. -C.beta.distance at which a contact relationship exists between these two types of residues.
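The rank-sum selection described above can be sketched in Python as follows. The function name is hypothetical, and the two input lists stand for the energy values assigned to the same candidate structures by the two energy functions.

```python
def rank_sum_select(energies_a, energies_b):
    """Return the index of the candidate with the smallest summed rank.

    Each list holds one energy value per candidate; rank 1 is the lowest energy.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda k: values[k])
        rank = [0] * len(values)
        for position, idx in enumerate(order, start=1):
            rank[idx] = position
        return rank

    totals = [x + y for x, y in zip(ranks(energies_a), ranks(energies_b))]
    return min(range(len(totals)), key=lambda k: totals[k])
```

For example, with energies `[-10.0, -12.0, -8.0, -11.0]` and `[3.0, 2.5, 1.0, 0.5]` the ranks are `[3, 1, 4, 2]` and `[4, 3, 2, 1]`, the rank sums are `[7, 4, 6, 3]`, and the fourth candidate (index 3) is selected.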
The invention further relates to a system for implementing the method, comprising a multiple sequence alignment module, a transmembrane region prediction module, a residue contact prediction module and a three-dimensional structure prediction module, wherein: the input is connected to the multiple sequence alignment module to obtain homologous sequences; the transmembrane region prediction module is connected to the multiple sequence alignment module and the three-dimensional structure prediction module, performs transmembrane region prediction using the multiple sequence alignment result, and passes its result to the three-dimensional structure prediction module; the residue contact prediction module is likewise connected to the multiple sequence alignment module and the three-dimensional structure prediction module, performs residue contact prediction using the multiple sequence alignment result, and passes its result to the three-dimensional structure prediction module; and the three-dimensional structure prediction module combines the information provided by the multiple sequence alignment module, the transmembrane region prediction module and the residue contact prediction module to complete the three-dimensional structure prediction.
Technical effects
The invention as a whole addresses the insufficient specificity, accuracy and speed of the prior art.
Compared with the prior art, the method can also output the transmembrane region and residue contact predictions generated during the prediction process; its prediction accuracy is 10% to 20% higher than that of current membrane protein structure prediction methods; it requires only tens of minutes to several hours; and it is simple to operate and convenient to use. On certain proteins the RMSD of the predicted structure relative to the real structure reaches
[formula image BDA0003105354820000041: RMSD value, not reproduced in this text]
and for most proteins it is below
[formula image BDA0003105354820000042: RMSD value, not reproduced in this text]
The accuracy of the intramembrane part is higher.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
As shown in FIG. 1, this example relates to a membrane protein three-dimensional structure prediction method based on knowledge energy function optimization. The input is a membrane protein sequence, for example chain A of PDB entry 2D57, shown in Seq ID No. 1.
In this embodiment the number of iterations is set to 20 million, and the algorithm is executed in three stages:
S1, preprocessing;
S2, iterative optimization;
S3, post-processing.
Further, the preprocessing stage S1 comprises the following steps:
S11, obtaining the multiple sequence alignment result, the secondary structure prediction result and the residue contact prediction result;
S12, deriving the constraint on the Cβ-Cβ distance between residues from the multiple sequence alignment result combined with statistical knowledge; constructing the structure fragment query library from the secondary structure prediction result combined with known structures in the protein structure database PDB; and deriving the knowledge-based energy function from the residue contact prediction result.
Further, the iterative optimization stage S2 comprises the following steps:
S21, randomly selecting fragments from the structure fragment query library and using them to replace parts of the initial structure, under the constraints of the energy function and the residue distances;
S22, repeating step S21 20 million times, and taking the last 100 structures as the candidate structures.
Further, the post-processing stage S3 comprises the following steps:
S31, comprehensively evaluating the 100 structures using another knowledge-based energy function, and selecting the best structure;
S32, performing side-chain optimization on this structure and outputting the final result.
The final output of the algorithm is a file in PDB format.
The evaluation index used in this example is the TM-score, whose standard form is
$$\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1+\left(d_i/d_0\right)^2}\right]$$
where $L_N$ is the length of the template structure (generally the real protein structure), $L_T$ is the number of residues aligned with the template structure, $d_i$ is the distance between the i-th pair of aligned residues, and $d_0$ is a normalization scale term with a fixed value.
The TM-score takes values between 0 and 1; the larger the value, the more similar the two structures. The TM-score calculated between the predicted structure and the real structure can therefore be used to evaluate a prediction: a larger TM-score means the predicted structure is closer to the real structure, and a smaller TM-score means the two differ more. GDT-TS is interpreted in the same way and takes values between 0 and 100. RMSD is the root-mean-square deviation between the predicted atomic coordinates and the true atomic coordinates.
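As an illustration of these indices, the following minimal Python sketch evaluates the TM-score sum and the RMSD for one fixed alignment, given the distances between aligned residue pairs. It omits the structural superposition and the maximization over superpositions that a full TM-score calculation performs, and the function names are hypothetical.

```python
import math


def tm_score(distances, native_length, d0):
    """TM-score term for one fixed alignment: (1/L_N) * sum 1 / (1 + (d_i/d0)^2)."""
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / native_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

A perfect match (all distances zero over the full length) gives a TM-score of 1 and an RMSD of 0; residues that are not aligned contribute nothing to the sum, which lowers the TM-score.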
Experiments were carried out on several membrane proteins in this example, giving the results shown in Table 2. Compared with the existing membrane protein structure prediction method FILM3, the present method improves all indices to varying degrees, and the improvement on some proteins exceeds 20%.
TABLE 2 Prediction results and comparison with FILM3
[table image BDA0003105354820000051 not reproduced in this text]
Compared with the prior art, the method greatly improves the prediction accuracy of the three-dimensional structure of membrane proteins; in particular, the error of the transmembrane helical regions relative to the real structure is very low. The prediction time is short: the three-dimensional structure of a membrane protein several hundred residues long can be predicted within a few hours. The transmembrane region and residue contact predictions generated during the prediction process can also be output.
Those skilled in the art may modify the foregoing embodiments in many different ways without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
Sequence listing
<110> Shanghai Jiaotong University
<120> knowledge energy function optimization-based membrane protein three-dimensional structure prediction method
<130> fnc482e
<141> 2021-06-08
<160> 1
<170> SIPOSequenceListing 1.0
<210> 1
<211> 224
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 1
Thr Gln Ala Phe Trp Lys Ala Val Thr Ala Glu Phe Leu Ala Met Leu
1 5 10 15
Ile Phe Val Leu Leu Ser Val Gly Ser Thr Ile Asn Trp Gly Gly Ser
20 25 30
Glu Asn Pro Leu Pro Val Asp Met Val Leu Ile Ser Leu Cys Phe Gly
35 40 45
Leu Ser Ile Ala Thr Met Val Gln Cys Phe Gly His Ile Ser Gly Gly
50 55 60
His Ile Asn Pro Ala Val Thr Val Ala Met Val Cys Thr Arg Lys Ile
65 70 75 80
Ser Ile Ala Lys Ser Val Phe Tyr Ile Thr Ala Gln Cys Leu Gly Ala
85 90 95
Ile Ile Gly Ala Gly Ile Leu Tyr Leu Val Thr Pro Pro Ser Val Val
100 105 110
Gly Gly Leu Gly Val Thr Thr Val His Gly Asn Leu Thr Ala Gly His
115 120 125
Gly Leu Leu Val Glu Leu Ile Ile Thr Phe Gln Leu Val Phe Thr Ile
130 135 140
Phe Ala Ser Cys Asp Ser Lys Arg Thr Asp Val Thr Gly Ser Val Ala
145 150 155 160
Leu Ala Ile Gly Phe Ser Val Ala Ile Gly His Leu Phe Ala Ile Asn
165 170 175
Tyr Thr Gly Ala Ser Met Asn Pro Ala Arg Ser Phe Gly Pro Ala Val
180 185 190
Ile Met Gly Asn Trp Glu Asn His Trp Ile Tyr Trp Val Gly Pro Ile
195 200 205
Ile Gly Ala Val Leu Ala Gly Ala Leu Tyr Glu Tyr Val Phe Cys Pro
210 215 220

Claims (7)

1. A membrane protein three-dimensional structure prediction method based on knowledge energy function optimization, characterized in that constraints on residue distances are obtained from the multiple sequence alignment result of an input sequence combined with statistical knowledge, a structure fragment query library is constructed from the secondary structure prediction result of the input sequence and known structures in the protein structure database PDB, and a knowledge-based energy function is calculated from the residue contact prediction result of the input sequence; fragment replacement is then carried out iteratively on an initial structure under the energy function and the residue distance constraints to obtain a plurality of candidate structures; finally, the candidate structures are screened to obtain the final predicted three-dimensional structure of the membrane protein;
the knowledge-based energy function is as follows: a score is calculated for each residue pair from its score-d functional relation, and the sum of all scores is taken as the energy value of the whole structure, wherein the set of score-d functional relations is obtained as follows: for a protein sequence of L residues, select the top L residue pairs ranked by the predicted probability that the Cβ-Cβ distance lies within
[formula image FDA0003660693940000011: distance range, not reproduced in this text]
and the top L/5 residue pairs ranked by the predicted probability that the distance lies within
[formula image FDA0003660693940000012: distance range, not reproduced in this text]
after duplicate residue pairs are removed, the score of each remaining residue pair in each probability interval is calculated as
[formula image FDA0003660693940000013: score formula, not reproduced in this text]
where n = 9; i is the interval index, i = 1, 2, …, 9; d_i is the midpoint of the i-th interval; p_i is the probability value corresponding to the i-th interval; and α is a normalization term with the constant value α = 1.57; cubic spline interpolation is then applied to each group to obtain the set of score-d functional relations within the range
[formula image FDA0003660693940000014: distance range, not reproduced in this text];
the initial structure is a fully extended straight chain in which the peptide bond between each two residues is parallel to the residue backbone, and is the starting point for the iterative fragment replacement;
the fragment replacement comprises the following steps:
i. generating a random number R1 to determine which of secondary structure fragment replacement, fixed-length fragment replacement and short fragment replacement is to be performed;
ii. generating a random number R2 to determine the starting position of the fragment replacement;
iii. generating a random number R3 to select a specific fragment of the chosen type;
iv. carrying out coordinate transformation to complete one round of fragment replacement;
v. judging whether the replaced structure satisfies the constraint conditions; if so, the structure is retained, and otherwise it is discarded;
the candidate structures are obtained by repeatedly iterating from the initial structure with a simulated annealing algorithm, and each time a structure with a lower energy value is generated, it replaces a candidate structure with a higher energy value.
2. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the constraint on residue distance is as follows: according to statistics over the protein structures in the PDB, the Cβ-Cβ distance between two types of residues that are in contact is restricted to lie between a minimum and a maximum value;
the statistical rule is obtained as follows: a large number of real protein structures are analyzed, the distances between residues of two given types that are in contact are calculated, and the range of values taken by these distances is recorded.
3. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the secondary structure prediction result is obtained by a membrane protein transmembrane helix prediction model based on multi-scale deep learning, the specific operation steps being: a protein sequence is input, a large number of similar sequences are obtained through multiple sequence alignment, and, in combination with co-evolution information, the transmembrane regions and the transmembrane direction are predicted using a deep learning model and a support vector machine classifier;
the membrane protein transmembrane helix prediction model comprises a transmembrane region prediction module and a direction prediction module, wherein: the transmembrane region prediction module comprises a multi-scale deep learning model and a binarization processing module; the deep learning model consists of a small-scale residue-based residual neural network and a large-scale full-sequence-based residual neural network; the binarization processing module binarizes the raw prediction scores according to a dynamic threshold and resolves under-segmentation; and the direction prediction module uses a support vector machine classifier.
4. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the structure fragment query library comprises: a secondary structure fragment query library constructed from fragments with a specific secondary structure, including α-helices and β-sheets, with a minimum fragment length of 5 residues; a fixed-length fragment query library constructed from protein fragments of fixed length, with a minimum fragment length of 9 residues and a maximum fragment length of 16 residues; and a short fragment query library constructed from short fragments 3 residues long;
each fixed-length fragment in the fixed-length fragment query library has the same secondary structure as the corresponding positions of some segment of the same length in the query sequence.
5. The membrane protein three-dimensional structure prediction method based on knowledge energy function optimization according to claim 1, wherein the residue contact prediction result is obtained by a deep learning-based protein residue contact prediction model, specifically: the Cβ-Cβ distance between residues is divided into 10 intervals, namely
[formula images FDA0003660693940000021 and FDA0003660693940000022: the 10 distance intervals, not reproduced in this text]
and the probability that each residue pair falls in each distance interval is predicted;
the protein residue contact prediction model comprises 29 improved ResNet residual modules arranged in five groups of 3, 4, 6, 8 and 8 modules respectively, wherein a dilated convolution mechanism is introduced into the first three groups of modules and a channel-based attention mechanism is introduced into the last two groups.
6. The method for predicting the three-dimensional structure of the membrane protein based on the knowledge-energy function optimization of claim 1, wherein the screening is: using another energy function based on statistical knowledge
Figure FDA0003660693940000023
for each residue pair whose probability of having a contact distance within
Figure FDA0003660693940000024
is greater than 0.3, an energy value is calculated, and the sum of all these energy values is taken as the final energy value; this energy value and the knowledge-based energy function are then used together to evaluate the candidate structures: the candidate structures are ranked from smallest to largest under each of the two energy functions, the two rank numbers of each structure are added, and the structure with the smallest rank sum is selected; side-chain optimization is then performed on the selected structure, in which the side-chain rotamers (rotational isomers) of each amino acid type observed in nature are tabulated and the side chains are replaced, so as to eliminate possible positional overlaps between side-chain atoms and bring the side-chain conformations closer to the real structure, wherein: p is the probability that the contact distance of the residue pair lies within
Figure FDA0003660693940000031
and d_max is the theoretical maximum of the Cβ-Cβ distance at which a contact relationship exists between the two types of residues.
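The rank-sum screening described in this claim can be sketched as follows. The per-pair energy formula is given only as an image in the source, so the sketch takes it as a caller-supplied function; `pair_energy`, `contact_energy`, `ranks`, and `select_by_rank_sum` are hypothetical names, and the toy energy used in the example is a placeholder rather than the patented formula.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]


def contact_energy(pair_probs: Dict[Pair, float],
                   pair_energy: Callable[[float, Pair], float],
                   min_prob: float = 0.3) -> float:
    """Sum the pair energies over residue pairs with probability > min_prob."""
    return sum(pair_energy(p, pair)
               for pair, p in pair_probs.items() if p > min_prob)


def ranks(values: List[float]) -> List[int]:
    """Rank values from smallest (rank 1) to largest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return rank


def select_by_rank_sum(contact_e: List[float], knowledge_e: List[float]) -> int:
    """Return the index of the candidate with the smallest sum of ranks."""
    sums = [a + b for a, b in zip(ranks(contact_e), ranks(knowledge_e))]
    return min(range(len(sums)), key=lambda i: sums[i])


# Placeholder energy: more probable contacts contribute lower energy.
toy_energy = lambda p, pair: -p
print(contact_energy({(1, 10): 0.8, (2, 20): 0.3, (3, 30): 0.5}, toy_energy))
print(select_by_rank_sum([-5.0, -9.0, -7.0], [-120.0, -110.0, -130.0]))
```

Summing ranks rather than raw energies avoids having to put the two energy functions on a common scale; ties in the rank sum are resolved here by taking the first candidate, which is an assumption of this sketch.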
7. A system for implementing the knowledge-based energy function optimization membrane protein three-dimensional structure prediction method of any one of claims 1 to 6, comprising: a multiple sequence alignment module, a transmembrane region prediction module, a residue contact prediction module, and a three-dimensional structure prediction module, wherein: the multiple sequence alignment module receives the input and obtains homologous sequences; the transmembrane region prediction module is connected to the multiple sequence alignment module and the three-dimensional structure prediction module, performs transmembrane region prediction using the multiple sequence alignment result, and passes its result to the three-dimensional structure prediction module; the residue contact prediction module is likewise connected to the multiple sequence alignment module and the three-dimensional structure prediction module, performs residue contact prediction using the multiple sequence alignment result, and passes its result to the three-dimensional structure prediction module; and the three-dimensional structure prediction module combines the information provided by the multiple sequence alignment module, the transmembrane region prediction module and the residue contact prediction module to complete the three-dimensional structure prediction.
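The data flow between the four modules of this claim can be sketched as a single orchestration function. Each module is passed in as a callable so that the wiring is visible without assuming anything about the modules' internals; `predict_structure` and its parameter names are hypothetical, and the stub lambdas in the example stand in for real predictors.

```python
from typing import Any, Callable


def predict_structure(sequence: str,
                      run_msa: Callable[[str], Any],
                      predict_tm: Callable[[str, Any], Any],
                      predict_contacts: Callable[[str, Any], Any],
                      fold: Callable[[str, Any, Any, Any], Any]) -> Any:
    """Wire the four modules together in the order the claim describes."""
    msa = run_msa(sequence)                     # homologous sequences
    tm = predict_tm(sequence, msa)              # transmembrane regions
    contacts = predict_contacts(sequence, msa)  # residue contact prediction
    # The structure module receives all three upstream results.
    return fold(sequence, msa, tm, contacts)


result = predict_structure(
    "MKT",
    run_msa=lambda s: [s, s.lower()],
    predict_tm=lambda s, m: [(0, len(s))],
    predict_contacts=lambda s, m: {(0, 2): 0.9},
    fold=lambda s, m, t, c: {"n_homologs": len(m), "tm": t, "contacts": c},
)
print(result)
```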
CN202110636292.8A 2021-06-08 2021-06-08 Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method Active CN113205855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110636292.8A CN113205855B (en) 2021-06-08 2021-06-08 Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110636292.8A CN113205855B (en) 2021-06-08 2021-06-08 Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method

Publications (2)

Publication Number Publication Date
CN113205855A (en) 2021-08-03
CN113205855B (en) 2022-08-05

Family

ID=77024177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110636292.8A Active CN113205855B (en) 2021-06-08 2021-06-08 Knowledge energy function optimization-based membrane protein three-dimensional structure prediction method

Country Status (1)

Country Link
CN (1) CN113205855B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008157729A2 (en) * 2007-06-21 2008-12-24 California Institute Of Technology Methods for predicting three-dimensional structures for alpha helical membrane proteins and their use in design of selective ligands
CN104504299A (en) * 2014-12-29 2015-04-08 中国科学院深圳先进技术研究院 Method for predicting action relation between residues of membrane protein
CN104615910A (en) * 2014-12-30 2015-05-13 中国科学院深圳先进技术研究院 Method for predicating helix interactive relationship of alpha transmembrane protein based on random forest
CN110390995A (en) * 2019-07-01 2019-10-29 上海交通大学 α spiral transmembrane protein topological structure prediction technique and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130303383A1 (en) * 2012-05-09 2013-11-14 Sloan-Kettering Institute For Cancer Reseach Methods and apparatus for predicting protein structure
US20130304432A1 (en) * 2012-05-09 2013-11-14 Memorial Sloan-Kettering Cancer Center Methods and apparatus for predicting protein structure
CN109033744B (en) * 2018-06-19 2021-08-03 浙江工业大学 Protein structure prediction method based on residue distance and contact information
CN112233723B (en) * 2020-10-26 2022-10-25 上海天壤智能科技有限公司 Protein structure prediction method and system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Topology Prediction Improvement of α-helical Transmembrane Proteins Through Helix-tail Modeling and Multiscale Deep Learning Fusion; Shi-Hao Feng et al.; Journal of Molecular Biology; 20191221; pp. 1279-1296 *

Also Published As

Publication number Publication date
CN113205855A (en) 2021-08-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant