CN111815036A - Protein structure prediction method based on multi-residue contact map cooperative constraint - Google Patents

Publication number
CN111815036A
Authority
CN
China
Prior art keywords
conformation
fragment
protein
server
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010578257.0A
Other languages
Chinese (zh)
Other versions
CN111815036B (en)
Inventor
张贵军
彭春祥
刘俊
周晓根
夏瑜豪
赵凯龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhaoji Biotechnology Co ltd
Shenzhen Xinrui Gene Technology Co ltd
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202010578257.0A
Publication of CN111815036A
Application granted
Publication of CN111815036B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A protein structure prediction method based on multi-residue contact map cooperative constraint, built on the Rosetta framework. First, a population is initialized using the first and second stages of Rosetta, and new test conformations are generated by applying mutation and crossover to the target conformations. Second, based on the residue contact maps predicted by four contact-prediction servers, a cosine similarity index over the residue contact maps is designed to assist the Rosetta energy function score3 in updating the conformations, thereby guiding the sampling toward conformations with lower energy and more compact structure. The invention provides a protein structure prediction method based on multi-residue contact map cooperative constraint with high prediction accuracy.

Description

Protein structure prediction method based on multi-residue contact map cooperative constraint
Technical Field
The invention relates to the fields of bioinformatics and computer application, in particular to a protein structure prediction method based on multi-residue contact map cooperative constraint.
Background
Protein structure prediction is a central research topic of structural bioinformatics and an important basic scientific problem that the central dogma of molecular biology leaves unsolved. At the global protein structure prediction competition CASP13, held in Cancun, Mexico, in early December 2018, AlphaFold, developed by Google's DeepMind, ranked first overall. AlphaFold brought the frontier basic research problem of protein structure prediction out of the scientific community and into public view, made it a current hot topic, and is expected to become an important milestone in the development of structural bioinformatics; the work also shows that deep cross-disciplinary integration of computer technology, information technology and the life sciences can effectively drive and accelerate new scientific discoveries.
The importance of protein structure prediction stems from the limitations of current experimental determination methods. X-ray crystallography is currently the most effective method for determining protein structures and achieves a precision unmatched by other methods; its main drawbacks are that protein crystals are difficult to grow and that determining a crystal structure takes a long time. Multidimensional nuclear magnetic resonance (NMR) can directly determine the conformation of a protein in solution, but it requires a large amount of high-purity sample and can currently be applied only to small proteins. For membrane proteins that serve as drug targets, the three-dimensional structure is extremely difficult to obtain with existing experimental techniques.
A protein can perform its specific biological function only after folding into a specific three-dimensional structure. Understanding the three-dimensional structure (native-state structure) of a protein is therefore key to understanding its biological function. The three-dimensional structure can be obtained by experimental methods such as nuclear magnetic resonance and X-ray crystallography; however, these methods are time-consuming and extremely expensive, and they are unsuitable for proteins that do not crystallize easily. Therefore, following Anfinsen's thermodynamic hypothesis (the conformation with the lowest energy is taken to be the native-state structure), many computational algorithms have been proposed for protein structure prediction.
Driven by both theoretical exploration and application requirements, and guided by Anfinsen's principle, computational protein structure prediction developed vigorously at the end of the 20th century. The CASP competition, initiated in 1994 by Moult, a scientist at the University of Maryland, is a worldwide protein structure prediction and assessment activity; it objectively reflects the state of the art in the field and is known as the Olympics of protein structure prediction. The competition aims to attract experts from different fields, such as computer science and biophysics, to the highly challenging bioinformatics problem of protein three-dimensional structure prediction, and to jointly assess the current state of development and discuss future trends.
Computational protein structure prediction usually evaluates conformations with a very complex energy function; the energy surface has thousands of degrees of freedom and a large number of local optima, and the conformational search space is extremely large. To search the conformational space, a de novo prediction method typically first obtains a global minimum of the conformational space under a knowledge-based coarse-grained energy model and then refines the corresponding conformation to obtain the predicted structure. A de novo prediction method therefore has to solve two problems: 1. establishing a suitable energy function to evaluate how reasonable a conformation is; 2. providing an effective conformational space search method to find the globally optimal solution. The first problem is essentially one of molecular mechanics: the energy value corresponding to each protein structure must be computable. The second problem is essentially one of global optimization: a suitable optimization method is chosen to search the conformational space quickly and obtain the conformation corresponding to the global minimum energy.
The differential evolution (DE) algorithm has been applied successfully to protein structure prediction because of its simple structure, ease of implementation, robustness and fast convergence. However, as the amino acid sequence grows longer, the number of degrees of freedom of the protein molecular system increases, and obtaining the global optimum of a large protein conformational space by sampling with a traditional population-based algorithm becomes challenging. In addition, the coarse-grained model reduces the conformational search space but loses information about the interaction forces, which directly affects the prediction accuracy.
Conventional protein structure prediction methods are therefore deficient in sampling efficiency and prediction accuracy and need to be improved.
Disclosure of Invention
In order to overcome the defects of low sampling efficiency and low prediction accuracy of the conventional protein structure prediction method, the invention introduces a plurality of residue contact maps to guide conformational space optimization based on Rosetta, and provides a protein structure prediction method based on multi-residue contact map cooperative constraint with high efficiency and high prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A protein structure prediction method based on multi-residue contact map cooperative constraint, the method comprising the following steps:
1) input the sequence information of the target protein;
2) obtain fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, the fragment library files comprising a 3-fragment library file and a 9-fragment library file;
3) according to the target protein sequence, obtain four contact maps, namely contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon, using the RaptorX-Contact server (http://raptorx.uchicago.edu/ContactMap/), the ResPRE server (https://zhanglab.ccmb.med.umich.edu/ResPRE/), the DeepMetaPSICOV server (http://bioinf.cs.ucl.ac.uk/psipred/) and the NeBcon server (https://zhanglab.ccmb.med.umich.edu/NeBcon/);
4) set the parameters: the population size NP, the maximum number of iterations G and the crossover factor CR, and initialize the generation counter g = 0;
5) population initialization: perform random fragment assembly to generate NP initial conformations $C_i$, i ∈ {1,2,…,NP};
6) regard each conformation $C_i$, i ∈ {1,2,3,…,NP}, in the population as the target conformation $C_i^{target}$ and perform the following operations to generate a mutated conformation $C_i^{mutant}$:
6.1) randomly generate positive integers n1, n2, n3 in the range 1 to NP, where n1 ≠ n2 ≠ n3 ≠ i;
6.2) randomly select a 9-fragment at some position in conformation $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; randomly select from conformation $C_{n2}$ a 9-fragment at a position different from the one selected in $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; then perform one 3-fragment assembly on conformation $C_{n3}$ to generate the mutated conformation $C_i^{mutant}$;
7) for each mutated conformation $C_i^{mutant}$, i ∈ {1,2,3,…,NP}, perform a crossover operation to generate a test conformation $C_i^{trial}$:
7.1) generate a random number rand1, where rand1 ∈ (0,1);
7.2) if rand1 ≤ CR, randomly select a 3-fragment from the target conformation $C_i^{target}$ and substitute it into the mutated conformation $C_i^{mutant}$; otherwise the mutated conformation $C_i^{mutant}$ remains unchanged;
8) for each target conformation $C_i^{target}$ and test conformation $C_i^{trial}$, perform a selection operation:
8.1) compute the energies of $C_i^{target}$ and $C_i^{trial}$ with the Rosetta score3 energy function: $E_i^{target}$ and $E_i^{trial}$;
8.2) if $E_i^{trial} > E_i^{target}$, the conformation $C_i^{trial}$ is rejected; otherwise continue with step 8.3);
8.3) first, convert $C_i^{target}$ and $C_i^{trial}$ into one-dimensional vectors of length L × L, $V_i^{target}$ and $V_i^{trial}$, and convert contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon into four one-dimensional vectors of length L × L, $V^{RaptorX}$, $V^{ResPRE}$, $V^{DeepMetaPSICOV}$ and $V^{NeBcon}$, where L is the length of the protein sequence; then compute the cosine similarity between $V_i^{target}$ and each of the four contact vectors, and between $V_i^{trial}$ and each of the four contact vectors, and sum them to obtain $S_i^{target}$ and $S_i^{trial}$, calculated as follows:
$$S_i^{target} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{target} \cdot V^{k}}{\lVert V_i^{target} \rVert \, \lVert V^{k} \rVert}$$
$$S_i^{trial} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{trial} \cdot V^{k}}{\lVert V_i^{trial} \rVert \, \lVert V^{k} \rVert}$$
8.4) if $S_i^{trial} > S_i^{target}$, the conformation $C_i^{trial}$ replaces the conformation $C_i^{target}$; go to step 9);
9) set g = g + 1 and repeat steps 6) to 8) until g > G;
10) output the result.
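The summed cosine similarity of step 8.3) can be sketched as follows. This is a minimal illustration in plain Python, not the inventors' implementation: the function names `flatten`, `cosine_similarity` and `contact_score`, the zero-vector convention and the 3 × 3 toy maps are all assumptions introduced here.

```python
import math

def flatten(contact_map):
    """Flatten an L x L matrix (list of rows) into a vector of length L * L."""
    return [value for row in contact_map for value in row]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros (a convention assumed here).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def contact_score(conformation_map, predicted_maps):
    """Sum of the cosine similarities between one conformation's contact
    vector and each predicted contact vector (step 8.3)."""
    v = flatten(conformation_map)
    return sum(cosine_similarity(v, flatten(p)) for p in predicted_maps)

# Toy data for L = 3 (hypothetical values, not taken from the patent).
conformation = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
predicted = [
    [[0.0, 0.9, 0.1], [0.9, 0.0, 0.8], [0.1, 0.8, 0.0]],  # stands in for RaptorX
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],  # stands in for ResPRE
    [[0.0, 0.2, 0.9], [0.2, 0.0, 0.1], [0.9, 0.1, 0.0]],  # stands in for DeepMetaPSICOV
    [[0.0, 0.7, 0.3], [0.7, 0.0, 0.6], [0.3, 0.6, 0.0]],  # stands in for NeBcon
]
score = contact_score(conformation, predicted)
print(round(score, 3))
```

Because every entry of a contact map is non-negative, each cosine term lies between 0 and 1, so the summed score over four servers lies between 0 and 4.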
The technical concept of the invention is as follows: within the Rosetta framework, a population is initialized using the first and second stages of Rosetta, and new test conformations are generated by applying mutation and crossover to the target conformations; then, based on the residue contact maps predicted by four contact-prediction servers, a cosine similarity index over the residue contact maps is designed to assist the Rosetta energy function score3 in updating the conformations, thereby guiding the sampling toward conformations with lower energy and more compact structure. The invention provides a protein structure prediction method based on multi-residue contact map cooperative constraint.
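The way the contact-map index assists score3 in the selection of step 8) can be sketched as a two-stage test. The sketch below assumes one reading of the steps: a test conformation is rejected when its energy is higher than that of the target, and otherwise replaces the target only when its summed cosine similarity is higher. The function `select`, the dictionary representation and all numeric values are hypothetical.

```python
def select(target, trial, energy, contact_score):
    """Return the conformation kept for the next generation.

    Stage 1 (assumed reading of step 8.2): reject the trial if its energy
    is higher than the target's.
    Stage 2 (assumed reading of step 8.4): accept the trial only if its
    summed cosine similarity to the predicted contact maps is higher.
    """
    if energy(trial) > energy(target):
        return target
    if contact_score(trial) > contact_score(target):
        return trial
    return target

# Hypothetical, precomputed energies and contact scores.
target = {"name": "target", "energy": -50.0, "score": 2.1}
trial_a = {"name": "trial_a", "energy": -55.0, "score": 2.6}  # lower energy, better contacts
trial_b = {"name": "trial_b", "energy": -60.0, "score": 1.9}  # lower energy, worse contacts
trial_c = {"name": "trial_c", "energy": -40.0, "score": 3.5}  # higher energy

def energy(conformation):
    return conformation["energy"]

def contact_score(conformation):
    return conformation["score"]

survivors = [select(target, t, energy, contact_score)["name"]
             for t in (trial_a, trial_b, trial_c)]
print(survivors)
```

Under this reading a test conformation has to pass both criteria, which is consistent with the stated aim of sampling conformations that are both lower in energy and more compact.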
The beneficial effects of the invention are as follows: first, combining the residue contact map information predicted by different servers addresses the insufficient recall and accuracy of a single residue contact map; second, a cosine similarity index over the residue contact maps is designed to assist the Rosetta energy function score3 in updating the conformations, thereby guiding the sampling toward conformations with lower energy and more compact structure.
Drawings
FIG. 1 shows the four predicted residue contact maps.
FIG. 2 is the distribution of conformations obtained when the protein structure prediction method based on multi-residue contact map cooperative constraint samples protein 1TEN.
FIG. 3 is the three-dimensional structure of protein 1TEN predicted by the protein structure prediction method based on multi-residue contact map cooperative constraint.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, a protein structure prediction method based on multi-residue contact map cooperative constraint comprises the following steps:
1) input the sequence information of the target protein;
2) obtain fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, the fragment library files comprising a 3-fragment library file and a 9-fragment library file;
3) according to the target protein sequence, obtain four contact maps, namely contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon, using the RaptorX-Contact server (http://raptorx.uchicago.edu/ContactMap/), the ResPRE server (https://zhanglab.ccmb.med.umich.edu/ResPRE/), the DeepMetaPSICOV server (http://bioinf.cs.ucl.ac.uk/psipred/) and the NeBcon server (https://zhanglab.ccmb.med.umich.edu/NeBcon/);
4) set the parameters: the population size NP, the maximum number of iterations G and the crossover factor CR, and initialize the generation counter g = 0;
5) population initialization: perform random fragment assembly to generate NP initial conformations $C_i$, i ∈ {1,2,…,NP};
6) regard each conformation $C_i$, i ∈ {1,2,3,…,NP}, in the population as the target conformation $C_i^{target}$ and perform the following operations to generate a mutated conformation $C_i^{mutant}$:
6.1) randomly generate positive integers n1, n2, n3 in the range 1 to NP, where n1 ≠ n2 ≠ n3 ≠ i;
6.2) randomly select a 9-fragment at some position in conformation $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; randomly select from conformation $C_{n2}$ a 9-fragment at a position different from the one selected in $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; then perform one 3-fragment assembly on conformation $C_{n3}$ to generate the mutated conformation $C_i^{mutant}$;
7) for each mutated conformation $C_i^{mutant}$, i ∈ {1,2,3,…,NP}, perform a crossover operation to generate a test conformation $C_i^{trial}$:
7.1) generate a random number rand1, where rand1 ∈ (0,1);
7.2) if rand1 ≤ CR, randomly select a 3-fragment from the target conformation $C_i^{target}$ and substitute it into the mutated conformation $C_i^{mutant}$; otherwise the mutated conformation $C_i^{mutant}$ remains unchanged;
8) for each target conformation $C_i^{target}$ and test conformation $C_i^{trial}$, perform a selection operation:
8.1) compute the energies of $C_i^{target}$ and $C_i^{trial}$ with the Rosetta score3 energy function: $E_i^{target}$ and $E_i^{trial}$;
8.2) if $E_i^{trial} > E_i^{target}$, the conformation $C_i^{trial}$ is rejected; otherwise continue with step 8.3);
8.3) first, convert $C_i^{target}$ and $C_i^{trial}$ into one-dimensional vectors of length L × L, $V_i^{target}$ and $V_i^{trial}$, and convert contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon into four one-dimensional vectors of length L × L, $V^{RaptorX}$, $V^{ResPRE}$, $V^{DeepMetaPSICOV}$ and $V^{NeBcon}$, where L is the length of the protein sequence; then compute the cosine similarity between $V_i^{target}$ and each of the four contact vectors, and between $V_i^{trial}$ and each of the four contact vectors, and sum them to obtain $S_i^{target}$ and $S_i^{trial}$, calculated as follows:
$$S_i^{target} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{target} \cdot V^{k}}{\lVert V_i^{target} \rVert \, \lVert V^{k} \rVert}$$
$$S_i^{trial} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{trial} \cdot V^{k}}{\lVert V_i^{trial} \rVert \, \lVert V^{k} \rVert}$$
8.4) if $S_i^{trial} > S_i^{target}$, the conformation $C_i^{trial}$ replaces the conformation $C_i^{target}$; go to step 9);
9) set g = g + 1 and repeat steps 6) to 8) until g > G;
10) output the result.
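The mutation and crossover of steps 6) and 7) can be sketched on a simplified representation. The sketch assumes that a conformation is a list of per-residue torsion-angle tuples, that "n1 ≠ n2 ≠ n3 ≠ i" means four mutually distinct indices, that the 3-fragment taken from the target is copied to the same position in the mutant, and that the 3-fragment library is a dictionary from start position to candidate fragments. None of these details, nor the helper names, come from the patent or from the Rosetta API.

```python
import random

def copy_window(dst, src, start, length):
    """Return a copy of dst whose residues [start, start + length) come from src."""
    out = list(dst)
    out[start:start + length] = src[start:start + length]
    return out

def pick_distinct_indices(population_size, i, rng):
    """Step 6.1: three mutually distinct indices, all different from i."""
    candidates = [k for k in range(population_size) if k != i]
    return rng.sample(candidates, 3)

def mutate(population, i, frag3_library, rng):
    """Step 6.2 sketch: two 9-residue windows copied into C_n3, then one
    3-fragment insertion from a (hypothetical) fragment library."""
    n1, n2, n3 = pick_distinct_indices(len(population), i, rng)
    length = len(population[i])
    starts = list(range(length - 9 + 1))
    p1 = rng.choice(starts)
    p2 = rng.choice([s for s in starts if s != p1])  # different position than p1
    mutant = copy_window(population[n3], population[n1], p1, 9)
    mutant = copy_window(mutant, population[n2], p2, 9)
    p3 = rng.randrange(length - 3 + 1)
    mutant[p3:p3 + 3] = rng.choice(frag3_library[p3])
    return mutant

def crossover(target, mutant, cr, rng):
    """Step 7 sketch: with probability CR, copy one random 3-residue window
    from the target into the mutant; otherwise keep the mutant unchanged."""
    if rng.random() <= cr:
        start = rng.randrange(len(target) - 3 + 1)
        return copy_window(mutant, target, start, 3)
    return list(mutant)

def random_residue(rng):
    """A toy (phi, psi) pair in degrees."""
    return (rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))

rng = random.Random(42)
NP, L = 6, 15  # toy sizes, far smaller than a real run
population = [[random_residue(rng) for _ in range(L)] for _ in range(NP)]
frag3_library = {p: [[random_residue(rng) for _ in range(3)] for _ in range(2)]
                 for p in range(L - 2)}

target = population[0]
mutant = mutate(population, 0, frag3_library, rng)
trial = crossover(target, mutant, 1.0, rng)  # CR = 1.0 forces the crossover branch
print(sum(1 for a, b in zip(trial, mutant) if a != b))
```

Copying whole windows rather than single angles keeps locally consistent backbone geometry together, which is the usual motivation for fragment-based moves.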
Taking protein 1TEN, with a sequence length of 87, as an example, a protein structure prediction method based on multi-residue contact map cooperative constraint comprises the following steps:
1) input the sequence information of the target protein;
2) obtain fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, the fragment library files comprising a 3-fragment library file and a 9-fragment library file;
3) according to the target protein sequence, obtain four contact maps, namely contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon, using the RaptorX-Contact server (http://raptorx.uchicago.edu/ContactMap/), the ResPRE server (https://zhanglab.ccmb.med.umich.edu/ResPRE/), the DeepMetaPSICOV server (http://bioinf.cs.ucl.ac.uk/psipred/) and the NeBcon server (https://zhanglab.ccmb.med.umich.edu/NeBcon/);
4) set the parameters: the population size NP = 100, the maximum number of iterations G = 300 and the crossover factor CR = 0.5, and initialize the generation counter g = 0;
5) population initialization: perform random fragment assembly to generate NP initial conformations $C_i$, i ∈ {1,2,…,NP};
6) regard each conformation $C_i$, i ∈ {1,2,3,…,NP}, in the population as the target conformation $C_i^{target}$ and perform the following operations to generate a mutated conformation $C_i^{mutant}$:
6.1) randomly generate positive integers n1, n2, n3 in the range 1 to NP, where n1 ≠ n2 ≠ n3 ≠ i;
6.2) randomly select a 9-fragment at some position in conformation $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; randomly select from conformation $C_{n2}$ a 9-fragment at a position different from the one selected in $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; then perform one 3-fragment assembly on conformation $C_{n3}$ to generate the mutated conformation $C_i^{mutant}$;
7) for each mutated conformation $C_i^{mutant}$, i ∈ {1,2,3,…,NP}, perform a crossover operation to generate a test conformation $C_i^{trial}$:
7.1) generate a random number rand1, where rand1 ∈ (0,1);
7.2) if rand1 ≤ CR, randomly select a 3-fragment from the target conformation $C_i^{target}$ and substitute it into the mutated conformation $C_i^{mutant}$; otherwise the mutated conformation $C_i^{mutant}$ remains unchanged;
8) for each target conformation $C_i^{target}$ and test conformation $C_i^{trial}$, perform a selection operation:
8.1) compute the energies of $C_i^{target}$ and $C_i^{trial}$ with the Rosetta score3 energy function: $E_i^{target}$ and $E_i^{trial}$;
8.2) if $E_i^{trial} > E_i^{target}$, the conformation $C_i^{trial}$ is rejected; otherwise continue with step 8.3);
8.3) first, convert $C_i^{target}$ and $C_i^{trial}$ into one-dimensional vectors of length L × L, $V_i^{target}$ and $V_i^{trial}$, and convert contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon into four one-dimensional vectors of length L × L, $V^{RaptorX}$, $V^{ResPRE}$, $V^{DeepMetaPSICOV}$ and $V^{NeBcon}$, where L is the length of the protein sequence; then compute the cosine similarity between $V_i^{target}$ and each of the four contact vectors, and between $V_i^{trial}$ and each of the four contact vectors, and sum them to obtain $S_i^{target}$ and $S_i^{trial}$, calculated as follows:
$$S_i^{target} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{target} \cdot V^{k}}{\lVert V_i^{target} \rVert \, \lVert V^{k} \rVert}$$
$$S_i^{trial} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{trial} \cdot V^{k}}{\lVert V_i^{trial} \rVert \, \lVert V^{k} \rVert}$$
8.4) if $S_i^{trial} > S_i^{target}$, the conformation $C_i^{trial}$ replaces the conformation $C_i^{target}$; go to step 9);
9) set g = g + 1 and repeat steps 6) to 8) until g > G;
10) output the result.
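Step 8.3) converts a conformation into a vector of length L × L, but the text does not say how the underlying L × L matrix is built. One common choice, assumed here purely for illustration, is a binary contact map in which two residues are in contact when their representative atoms are closer than a cutoff; the 8 Å cutoff, the single point per residue and the function names below are all assumptions.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary L x L contact map from one 3D point per residue.

    Two different residues are marked as in contact when their distance is
    below `cutoff` (in angstroms). The diagonal is left at 0.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

def to_vector(cmap):
    """Row-major flattening of the L x L map into a vector of length L * L."""
    return [value for row in cmap for value in row]

# Four toy residues on a straight line, 3.8 angstroms apart (hypothetical).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
vector = to_vector(cmap)
for row in cmap:
    print(row)
```

For the 1TEN example with L = 87, such a vector would have 87 × 87 = 7569 entries.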
Taking protein 1TEN, with a sequence length of 87, as an example, the above method was used to obtain near-native conformations of the protein. After running 300 generations, the average root mean square deviation (RMSD) between the obtained structures and the native structure is 2.86 Å and the minimum RMSD is 2.01 Å; the predicted three-dimensional structure is shown in FIG. 3.
The foregoing illustrates one embodiment of the invention. The invention is evidently not limited to the embodiment described above and may be practiced with various modifications without departing from its essential spirit.

Claims (1)

1. A protein structure prediction method based on multi-residue contact map cooperative constraint, characterized in that the method comprises the following steps:
1) input the sequence information of the target protein;
2) obtain fragment library files from the ROBETTA server according to the target protein sequence, the fragment library files comprising a 3-fragment library file and a 9-fragment library file;
3) according to the target protein sequence, use the RaptorX-Contact server, the ResPRE server, the DeepMetaPSICOV server and the NeBcon server to obtain four contact maps, namely contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon;
4) set the parameters: the population size NP, the maximum number of iterations G and the crossover factor CR, and initialize the generation counter g = 0;
5) population initialization: perform random fragment assembly to generate NP initial conformations $C_i$, i ∈ {1,2,…,NP};
6) regard each conformation $C_i$, i ∈ {1,2,3,…,NP}, in the population as the target conformation $C_i^{target}$, and for each $C_i^{target}$ perform the following operations to generate a mutated conformation $C_i^{mutant}$:
6.1) randomly generate positive integers n1, n2, n3 in the range 1 to NP, where n1 ≠ n2 ≠ n3 ≠ i;
6.2) randomly select a 9-fragment at some position in conformation $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; randomly select from conformation $C_{n2}$ a 9-fragment at a position different from the one selected in $C_{n1}$ and use it to replace the fragment at the same position in conformation $C_{n3}$; then perform one 3-fragment assembly on conformation $C_{n3}$ to generate the mutated conformation $C_i^{mutant}$;
7) for each mutated conformation $C_i^{mutant}$, i ∈ {1,2,3,…,NP}, perform a crossover operation to generate a test conformation $C_i^{trial}$:
7.1) generate a random number rand1, where rand1 ∈ (0,1);
7.2) if rand1 ≤ CR, randomly select a 3-fragment from the target conformation $C_i^{target}$ and substitute it into the mutated conformation $C_i^{mutant}$; otherwise the mutated conformation $C_i^{mutant}$ remains unchanged;
8) for each target conformation $C_i^{target}$ and test conformation $C_i^{trial}$, perform a selection operation:
8.1) compute the energies of $C_i^{target}$ and $C_i^{trial}$ with the Rosetta score3 energy function: $E_i^{target}$ and $E_i^{trial}$;
8.2) if $E_i^{trial} > E_i^{target}$, the conformation $C_i^{trial}$ is rejected; otherwise continue with step 8.3);
8.3) first, convert $C_i^{target}$ and $C_i^{trial}$ into one-dimensional vectors of length L × L, $V_i^{target}$ and $V_i^{trial}$, and convert contactRaptorX, contactResPRE, contactDeepMetaPSICOV and contactNeBcon into four one-dimensional vectors of length L × L, $V^{RaptorX}$, $V^{ResPRE}$, $V^{DeepMetaPSICOV}$ and $V^{NeBcon}$, where L is the length of the protein sequence; then compute the cosine similarity between $V_i^{target}$ and each of the four contact vectors, and between $V_i^{trial}$ and each of the four contact vectors, and sum them to obtain $S_i^{target}$ and $S_i^{trial}$, calculated as follows:
$$S_i^{target} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{target} \cdot V^{k}}{\lVert V_i^{target} \rVert \, \lVert V^{k} \rVert}$$
$$S_i^{trial} = \sum_{k \in \{RaptorX,\, ResPRE,\, DeepMetaPSICOV,\, NeBcon\}} \frac{V_i^{trial} \cdot V^{k}}{\lVert V_i^{trial} \rVert \, \lVert V^{k} \rVert}$$
8.4) if $S_i^{trial} > S_i^{target}$, the conformation $C_i^{trial}$ replaces the conformation $C_i^{target}$; go to step 9);
9) set g = g + 1 and repeat steps 6) to 8) until g > G;
10) output the result.
CN202010578257.0A 2020-06-23 2020-06-23 Protein structure prediction method based on multi-residue contact map cooperative constraint Active CN111815036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010578257.0A CN111815036B (en) 2020-06-23 2020-06-23 Protein structure prediction method based on multi-residue contact map cooperative constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010578257.0A CN111815036B (en) 2020-06-23 2020-06-23 Protein structure prediction method based on multi-residue contact map cooperative constraint

Publications (2)

Publication Number Publication Date
CN111815036A (en) 2020-10-23
CN111815036B (en) 2022-04-08

Family

ID=72845425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010578257.0A Active CN111815036B (en) 2020-06-23 2020-06-23 Protein structure prediction method based on multi-residue contact map cooperative constraint

Country Status (1)

Country Link
CN (1) CN111815036B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846256A (en) * 2018-06-07 2018-11-20 Zhejiang University of Technology Group protein structure prediction method based on residue contact information
CN109086565A (en) * 2018-07-12 2018-12-25 Zhejiang University of Technology Protein structure prediction method based on inter-residue contact constraints
CN109346128A (en) * 2018-08-01 2019-02-15 Zhejiang University of Technology Protein structure prediction method based on a residue information dynamic selection strategy
CN110148437A (en) * 2019-04-16 2019-08-20 Zhejiang University of Technology Residue contact auxiliary strategy self-adaptive protein structure prediction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUN-XIANG PENG: "De novo Protein Structure Prediction by Coupling Contact with Distance Profile", TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS *
NI Hongjie et al.: "A differential evolution algorithm with stage-wise strategy adaptation", Computer Science *

Also Published As

Publication number Publication date
CN111815036B (en) 2022-04-08

Similar Documents

Publication Publication Date Title
CN108846256B (en) Group protein structure prediction method based on residue contact information
CN109524058B (en) Protein dimer structure prediction method based on differential evolution
CN110148437B (en) Residue contact auxiliary strategy self-adaptive protein structure prediction method
CN109215732B (en) Protein structure prediction method based on residue contact information self-learning
CN115458039B (en) Method and system for predicting single-sequence protein structure based on machine learning
CN109360599B (en) Protein structure prediction method based on residue contact information cross strategy
CN109872770B (en) Variable strategy protein structure prediction method combined with displacement degree evaluation
CN111815036B (en) Protein structure prediction method based on multi-residue contact map cooperative constraint
CN111180004B (en) Multi-contact information sub-population strategy protein structure prediction method
CN109360597B (en) Group protein structure prediction method based on global and local strategy cooperation
Hong et al. fastmsa: Accelerating multiple sequence alignment with dense retrieval on protein language
CN109509510B (en) Protein structure prediction method based on multi-population ensemble variation strategy
CN108920894B (en) Protein conformation space optimization method based on brief abstract convex estimation
CN109360598B (en) Protein structure prediction method based on two-stage sampling
CN109147867B (en) Group protein structure prediction method based on dynamic segment length
CN109326321B (en) Abstract convex estimation-based k-nearest neighbor protein structure prediction method
CN109243526B (en) Protein structure prediction method based on specific fragment crossing
CN109448786B (en) Method for predicting protein structure by lower bound estimation dynamic strategy
CN110706741B (en) Multi-modal protein structure prediction method based on sequence niche
CN109461471B (en) Adaptive protein structure prediction method based on championship mechanism
CN109461470B (en) Protein structure prediction energy function weight optimization method
CN109300504B (en) Protein structure prediction method based on variable isoelite selection
CN109300503B (en) Global and local lower bound estimation synergistic group protein structure prediction method
CN111161791B (en) Experimental data-assisted adaptive strategy protein structure prediction method
CN112085246B (en) Protein structure prediction method based on residue pair distance constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231219

Address after: 518054, D1101, Building 4, Software Industry Base, No. 19, 17, and 18 Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Xinrui Gene Technology Co.,Ltd.

Address before: 510075 No. n2248, floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Effective date of registration: 20231219

Address after: 510075 No. n2248, floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Address before: 310014 No. 18 Chaowang Road, Xiacheng District, Hangzhou City, Zhejiang Province

Patentee before: ZHEJIANG UNIVERSITY OF TECHNOLOGY