CN113257338A - Protein structure prediction method based on residue contact diagram information game mechanism

Publication number
CN113257338A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110440653.1A
Other languages
Chinese (zh)
Inventor
张贵军
侯铭桦
魏源
彭春祥
杨涛
郭赛赛
周晓根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding

Abstract

A protein structure prediction method based on a residue contact map information game mechanism. First, four protein residue contact prediction servers (RaptorX, ResPRE, NeBcon and DeepMetaPSICOV), selected according to their Jaccard indexes in the CASP competition, are used to obtain several residue contact maps, from which several energy functions are constructed. Second, the population is initialized using the first and second stages of Rosetta, and new test conformations are generated by mutation and crossover of the target conformations. Finally, a Pareto-based multi-objective optimization algorithm updates the conformations according to the energy functions constructed from the four residue contact maps, guiding the sampling toward conformations whose structure is closer to the native state.

Description

Protein structure prediction method based on residue contact diagram information game mechanism
Technical Field
The invention relates to the fields of bioinformatics and computational intelligence, in particular to a protein structure prediction method based on a residue contact diagram information game mechanism.
Background
Life and health is a leading direction of future industrial development worldwide and a fundamental field for raising the level of public health and enhancing people's sense of well-being. All life processes, including the reproduction of species, are closely related to the synthesis, decomposition and transformation of proteins. The three-dimensional structure of a protein determines its specific biological function and is the material basis of life activities. A misfolded protein may fail to function properly. For example, the brains of patients with senile dementia contain numerous disordered protein aggregates formed by misfolded proteins. Therefore, obtaining the three-dimensional structure of proteins is a prerequisite for achieving breakthroughs in the field of life and health, for understanding life phenomena and life processes more deeply, and for developing targeted drugs.
At present, conventional wet-lab experimental methods, including X-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy, can determine the three-dimensional structure of proteins, but they place high demands on materials, instruments and personnel and are extremely time-consuming. There is therefore an urgent need to model structures from sequences and to explore protein structure prediction using computational techniques.
Protein structure prediction is a major research problem in bioinformatics, and two main approaches currently exist. The first constructs an energy function model from physicochemical knowledge of biomolecules; it has led the field since the early CASP competitions and still holds a very important position. It is represented by Rosetta from the Baker laboratory at the University of Washington and I-TASSER from the Zhang Yang laboratory at the University of Michigan. As a structure prediction tool, Rosetta can predict, design and analyze a variety of biomolecular systems, including proteins, RNA, DNA, peptides, small molecules, and non-canonical or derivatized amino acids. I-TASSER is a method for predicting protein structure and function; it predicts the function of a target using, among others, the multiple-threading method LOMETS and the protein function database BioLiP. Physicochemical model methods have produced abundant results, but they also show shortcomings such as insufficient accuracy of expression and incomplete features. The second approach uses deep learning to predict contacts, distances and other information, from which a knowledge-based model is constructed. In the recently announced CASP14 results, AlphaFold, proposed by Google, ranked first in the human group, far ahead of the runner-up, and Tencent's tFold, in its first participation, also performed well and ranked first in the contact group.
From the perspective of contact prediction in the CASP competition, although the accuracy of contact prediction keeps improving, erroneous information still exists, and the Jaccard distance map shows that different prediction servers capture different sets of information. In addition, although deep learning has made great progress in protein structure prediction, especially in residue contact prediction, several different sets of residue contact information are usually integrated by simple weighted superposition when a protein structure is folded, so part of the predicted residue contact information is lost and the prediction accuracy inevitably suffers. On the other hand, protein structure prediction by computational techniques usually evaluates conformations with a single energy function, which limits the sampling ability; the conformation finally obtained may be optimal in energy but is not necessarily the best structure, i.e. the conformation with the lowest energy is not necessarily the one closest to the native conformation.
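The observation that different servers capture different sets of contacts can be made concrete with the Jaccard index of two predicted contact sets. The following is a minimal Python sketch; the function name `jaccard_index` and the two toy contact lists are hypothetical and are not taken from the patent or from any server output.

```python
def jaccard_index(contacts_a, contacts_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of residue pairs.

    Each contact is a pair of residue indices; (i, j) and (j, i) are
    treated as the same contact.
    """
    a = {tuple(sorted(pair)) for pair in contacts_a}
    b = {tuple(sorted(pair)) for pair in contacts_b}
    union = a | b
    if not union:
        # Two empty predictions are considered identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical contacts predicted by two different servers.
server_a = [(1, 9), (2, 14), (5, 30), (7, 21)]
server_b = [(9, 1), (2, 14), (6, 30), (8, 40)]

similarity = jaccard_index(server_a, server_b)
print("Jaccard index:", similarity)
print("Jaccard distance:", 1.0 - similarity)
```

A low index (a large Jaccard distance) means the two servers agree on few contacts, which is the situation the background describes as motivation for keeping the contact maps separate instead of merging them by weighted superposition.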
Therefore, the existing protein structure prediction methods have shortcomings in data reception efficiency and conformation selection evaluation, and improvements are needed.
Summary of the Invention
In order to overcome the low data reception efficiency and low prediction accuracy of existing protein structure prediction methods, the invention provides a protein structure prediction method based on a residue contact map information game mechanism, in which several energy functions are constructed from several residue contact maps, based on the four protein residue contact servers RaptorX, ResPRE, NeBcon and DeepMetaPSICOV and on the Rosetta platform, and a multi-objective optimization method guides the optimization of the conformation space.
The technical solution adopted by the invention to solve the technical problem is as follows:
A protein structure prediction method based on a residue contact map information game mechanism, the method comprising the following steps:
1) the sequence information of the target protein is given;
2) according to the sequence information of the target protein, the following four contact map prediction servers are used:
RaptorX (http://raptorx.uchicago.edu/ContactMap/);
ResPRE (https://zhanglab.ccmb.med.umich.edu/ResPRE/);
NeBcon (https://zhanglab.ccmb.med.umich.edu/NeBcon/);
DeepMetaPSICOV (http://bioinf.cs.ucl.ac.uk/psipred/);
to obtain four residue contact information files, which are processed to generate four contact map files, namely contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV;
3) according to the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n) are constructed respectively, with the formulas as follows:
[equation images: formulas of the four energy functions]
where the confidence terms [equation image] represent the contact confidence of the k-th residue pair (i, j) in the residue contact maps contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV; the distance term [equation image] represents the actual distance between the k-th residue pair (i, j); d_con = 8 is the threshold, i.e. the maximum distance at which two residues are in contact; and the score terms [equation images] respectively represent the contact scores of conformation C_n under the four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n);
4) fragment library files are obtained from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, comprising a 3-fragment library file and a 9-fragment library file;
5) parameter setting: the population size NP, the crossover factor CR and the number of iterations G are set, and the initial iteration counter g is set to 0;
6) population initialization: NP initial conformations C_n, n ∈ {1, 2, …, NP}, are generated by random fragment assembly;
7) each conformation C_n is substituted into the four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n) to obtain four energy values, which are assembled into an energy array for that conformation;
8) according to the energy arrays, a first conformation pool is constructed as follows:
8.1) the initial number of conformations N is set to 0;
8.2) the population is traversed and the energy array of each conformation C_n is compared with those of all other conformations; if no other conformation C_m is better than the current conformation C_n on all four energy values, where C_m is any conformation other than C_n, then C_n is recorded as a Pareto-efficient solution;
8.3) the conformations that are Pareto-efficient solutions are placed into the first conformation pool, the current number of conformations is recorded as N, and the remaining conformations are removed;
9) loop: g = g + 1; if g > G, go to step 14);
10) each conformation C_n, n ∈ {1, 2, 3, …, N}, in the first conformation pool is regarded as the target conformation, and the following operations are performed to generate a mutated conformation:
10.1) positive integers n1, n2 and n3 are randomly generated in the range 1 to N, with n1 ≠ n2 ≠ n3 ≠ n;
10.2) a 9-fragment at a randomly selected position of conformation C_n1 replaces the fragment at the same position of conformation C_n3; a 9-fragment is then randomly selected from conformation C_n2, at a position different from the one chosen in C_n1, and replaces the fragment at the same position of conformation C_n3; finally, 3-fragment assembly is performed on conformation C_n3 to generate the mutated conformation;
11) a crossover operation is performed on each mutated conformation, n ∈ {1, 2, 3, …, N}, to generate a test conformation, as follows:
11.1) a random number rand1 ∈ (0, 1) is generated;
11.2) if rand1 ≤ CR, a 3-fragment is randomly selected from the target conformation and substituted into the mutated conformation; otherwise the mutated conformation is left unchanged;
11.3) the generated test conformation is placed into a second conformation pool;
12) each test conformation in the second conformation pool is substituted into the four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n) to obtain four energy values, which are assembled into an energy array for that test conformation;
13) the second conformation pool is traversed and the conformations that are Pareto-efficient solutions over the whole population are retained, as follows:
13.1) the conformations within the second conformation pool are compared with one another, the conformations that are Pareto-efficient solutions are retained, and their number is recorded as N_TBA;
13.2) each conformation m ∈ {1, 2, 3, …, N_TBA} in the second conformation pool is compared with the conformations n ∈ {1, 2, 3, …, N} in the first conformation pool:
13.2.1) if there exists [equation image] such that [equation image], the test conformation concerned is deleted from the second conformation pool;
13.2.2) if there exists [equation image] such that [equation image], and [equation image] such that [equation image], the test conformation replaces the conformation concerned in the first conformation pool and is deleted from the second conformation pool;
13.2.3) if, for every conformation in the first conformation pool, there exists k ∈ {1, 2, 3, 4} such that [equation image], the test conformation is retained in the second conformation pool;
13.2.4) N_TBA is updated to the number of conformations currently in the second conformation pool;
14) a selection operation is performed on the conformations of the first conformation pool and the conformations of the second conformation pool, as follows:
14.1) if the total number of conformations in the first and second conformation pools is greater than the set population size, i.e. N + N_TBA > NP, continue with step 14.2); otherwise, the conformations in the second conformation pool are put into the first conformation pool, the second conformation pool is emptied, and the procedure jumps to step 9);
14.2) the conformational similarity index RMSD is introduced: the RMSD value between each conformation and all remaining conformations in the two conformation pools is calculated as shown in formula (5), where the first coordinate term [equation image] denotes the (x, y, z) coordinates of the atoms in conformation C_i and the second coordinate term [equation image] denotes the (x, y, z) coordinates of the atoms in any remaining conformation C_j;
[equation image: formula (5), RMSD]
14.3) the similarity of the conformations is judged according to the RMSD values, the NP conformations with the greatest diversity are selected and put into the first conformation pool, the second conformation pool is emptied, and the procedure goes to step 9);
15) the result is output.
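Steps 14.2) and 14.3) can be illustrated with a short Python sketch. Several points are assumptions and not the patent's implementation: the RMSD here is the plain root-mean-square deviation over matching atoms without structural superposition, the text does not say how "the greatest diversity" is decided so a greedy farthest-point rule is swapped in, and the names `rmsd` and `select_diverse` are hypothetical.

```python
import math


def rmsd(coords_i, coords_j):
    """Plain RMSD between two conformations given as lists of (x, y, z) atoms.

    Assumption: atoms are matched by index and no superposition is applied.
    """
    if len(coords_i) != len(coords_j) or not coords_i:
        raise ValueError("conformations must have the same, non-zero atom count")
    squared = sum(
        (a - b) ** 2
        for atom_i, atom_j in zip(coords_i, coords_j)
        for a, b in zip(atom_i, atom_j)
    )
    return math.sqrt(squared / len(coords_i))


def select_diverse(conformations, np_size):
    """Return indices of np_size conformations chosen by a greedy rule.

    Assumed rule (not from the patent): start with the conformation that has
    the largest summed RMSD to all others, then repeatedly add the one whose
    smallest RMSD to the already chosen set is largest.
    """
    count = len(conformations)
    if count <= np_size:
        return list(range(count))
    dist = [[rmsd(conformations[i], conformations[j]) for j in range(count)]
            for i in range(count)]
    chosen = [max(range(count), key=lambda i: sum(dist[i]))]
    while len(chosen) < np_size:
        remaining = [i for i in range(count) if i not in chosen]
        chosen.append(max(remaining,
                          key=lambda i: min(dist[i][c] for c in chosen)))
    return chosen


# Toy pool of two-atom "conformations"; indices 0 and 1 are nearly identical.
pool = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)],
]
print(select_diverse(pool, 3))
```

With this toy pool, only one of the two near-duplicate conformations survives the selection, which is the behaviour step 14.3) aims for when the merged pools exceed NP.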
The technical conception of the invention is as follows: first, several residue contact maps are obtained from the four protein residue contact servers RaptorX, ResPRE, NeBcon and DeepMetaPSICOV, from which several energy functions are constructed; second, the population is initialized using the first and second stages of Rosetta, and new test conformations are generated by mutation and crossover of the target conformations; finally, a Pareto-based multi-objective optimization algorithm updates the conformations according to the energy functions constructed from the four residue contact maps, guiding the sampling toward conformations whose structure is closer to the native state.
The beneficial effects of the invention are as follows: first, the residue contact information is predicted by different servers, which increases the diversity of the sources of contact information and reduces the influence on structure prediction of the missing or erroneous information that a single contact map may contain; second, a conformation selection method based on a residue contact information game mechanism is designed in combination with a multi-objective optimization algorithm, which avoids the misguidance of conformations caused by the inaccuracy of a traditional energy model.
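The Pareto-based update can be sketched in Python as a non-dominated filter over four-element energy arrays. This is a sketch under stated assumptions: lower energy is taken to be better, and the standard textbook dominance rule is used (no worse on every objective and strictly better on at least one), which may differ in detail from the conditions in the patent's equation images. The names `dominates` and `pareto_front` and the sample energy values are hypothetical.

```python
def dominates(energy_a, energy_b):
    """True if energy_a Pareto-dominates energy_b (lower is assumed better)."""
    no_worse = all(a <= b for a, b in zip(energy_a, energy_b))
    strictly_better = any(a < b for a, b in zip(energy_a, energy_b))
    return no_worse and strictly_better


def pareto_front(energies):
    """Return indices of conformations not dominated by any other one."""
    return [
        i for i, energy in enumerate(energies)
        if not any(dominates(other, energy)
                   for j, other in enumerate(energies) if j != i)
    ]


# Hypothetical energy arrays, one row per conformation, one column per
# contact-map energy function (RaptorX, ResPRE, NeBcon, DeepMetaPSICOV).
energies = [
    (1.0, 4.0, 3.0, 2.0),
    (2.0, 5.0, 4.0, 3.0),  # worse than row 0 on every objective
    (3.0, 1.0, 2.0, 4.0),
    (0.9, 4.5, 3.5, 2.5),
]
print(pareto_front(energies))
```

Because no single score is minimized, a conformation that one contact map rates poorly can still survive if another map rates it well, which is how the four maps "compete" instead of being merged into one weighted energy.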
Drawings
FIG. 1 shows the processed information of the four predicted residue contact maps.
FIG. 2 is the distribution of conformations obtained by sampling protein 1ELW with the protein structure prediction method based on the residue contact map information game mechanism.
FIG. 3 is the three-dimensional structure of protein 1ELW predicted by the protein structure prediction method based on the residue contact map information game mechanism.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 to FIG. 3, the protein structure prediction method based on a residue contact map information game mechanism comprises steps 1) to 15) as set out above.
Taking protein 1ELW with the sequence length of 117 as an implementation case, the protein structure prediction method based on the residue contact map information game mechanism comprises the following steps:
1) the sequence information of the target protein is given;
2) according to the sequence information of the target protein, the following four contact map prediction servers are used:
RaptorX (http://raptorx.uchicago.edu/ContactMap/);
ResPRE (https://zhanglab.ccmb.med.umich.edu/ResPRE/);
NeBcon (https://zhanglab.ccmb.med.umich.edu/NeBcon/);
DeepMetaPSICOV (http://bioinf.cs.ucl.ac.uk/psipred/);
to obtain four residue contact information files, which are processed to generate four contact map files, namely contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV;
3) according to the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n) are constructed respectively, with the formulas as follows:
[equation images: formulas of the four energy functions]
where the confidence terms [equation image] represent the contact confidence of the k-th residue pair (i, j) in the residue contact maps contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV; the distance term [equation image] represents the actual distance between the k-th residue pair (i, j); d_con = 8 is the threshold, i.e. the maximum distance at which two residues are in contact; and the score terms [equation images] respectively represent the contact scores of conformation C_n under the four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n);
4) fragment library files are obtained from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence, comprising a 3-fragment library file and a 9-fragment library file;
5) parameter setting: the population size NP = 200, the crossover factor CR = 0.5, the number of iterations G = 500, and the initial iteration counter g = 0;
6) population initialization: NP initial conformations C_n, n ∈ {1, 2, …, NP}, are generated by random fragment assembly;
7) each conformation C_n is substituted into the four energy functions Energy_RaptorX(C_n), Energy_ResPRE(C_n), Energy_NeBcon(C_n) and Energy_PSICOV(C_n) to obtain four energy values, which are assembled into an energy array for that conformation;
8) according to the energy arrays, a first conformation pool is constructed as follows:
8.1) the initial number of conformations N is set to 0;
8.2) the population is traversed and the energy array of each conformation C_n is compared with those of all other conformations; if no other conformation C_m is better than the current conformation C_n on all four energy values, where C_m is any conformation other than C_n, then C_n is recorded as a Pareto-efficient solution;
8.3) the conformations that are Pareto-efficient solutions are placed into the first conformation pool, the current number of conformations is recorded as N, and the remaining conformations are removed;
9) and (3) circulation: g +1, if G > G, go to step 14);
10) Treat each conformation C_n, n ∈ {1, 2, 3, …, N}, in the first conformation pool as the target conformation [Figure BDA0003035148580000125] and perform the following operations to generate the mutant conformation [Figure BDA0003035148580000126]:
10.1) Randomly generate positive integers n1, n2, n3 in the range 1 to N, with n1 ≠ n2 ≠ n3 ≠ n;
10.2) Randomly select a 9-fragment at some position in conformation C_n1 and use it to replace the fragment at the same position in conformation C_n3; randomly select from conformation C_n2 a 9-fragment at a position different from the one selected in C_n1 and use it to replace the fragment at the same position in conformation C_n3; then perform 3-fragment assembly on conformation C_n3 to generate the mutant conformation [Figure BDA0003035148580000127];
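The mutation of step 10 can be sketched as follows, under stated assumptions. A conformation is represented here as a list with one entry per residue (for example a torsion-angle tuple), and the 3-fragment assembly is reduced to a single random 3-fragment insertion. The name `mutate`, the `frag3_library` dictionary, and this representation are hypothetical stand-ins, not the patent's data structures. The pool must hold at least four conformations of at least ten residues each.

```python
import random


def mutate(pool, n, frag3_library, rng=random):
    """Build a mutant for target index n from three other pool members.

    pool          -- list of conformations, each a list of per-residue entries
    frag3_library -- dict: start position -> list of 3-residue fragments
                     (hypothetical stand-in for the 3-fragment library file)
    """
    others = [i for i in range(len(pool)) if i != n]
    n1, n2, n3 = rng.sample(others, 3)  # distinct, and none equal to n
    mutant = list(pool[n3])             # work on a copy of C_n3
    length = len(mutant)

    # Copy a 9-fragment from C_n1 into the same position of the copy.
    pos1 = rng.randrange(length - 8)
    mutant[pos1:pos1 + 9] = pool[n1][pos1:pos1 + 9]

    # Copy a 9-fragment from C_n2 at a different start position.
    pos2 = rng.choice([p for p in range(length - 8) if p != pos1])
    mutant[pos2:pos2 + 9] = pool[n2][pos2:pos2 + 9]

    # Simplified 3-fragment assembly: one random library insertion.
    pos3 = rng.choice(sorted(frag3_library))
    mutant[pos3:pos3 + 3] = rng.choice(frag3_library[pos3])
    return mutant
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. The two 9-fragment windows may overlap in this sketch, in which case the second copy overwrites part of the first.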
11) Perform a crossover operation on each mutant conformation [Figure BDA0003035148580000128], n ∈ {1, 2, 3, …, N}, to generate the test conformation [Figure BDA0003035148580000129], as follows:
11.1) Generate a random number rand1 ∈ (0, 1);
11.2) If rand1 ≤ CR, randomly select a 3-fragment from the target conformation [Figure BDA00030351485800001210] and substitute it into the mutant conformation [Figure BDA00030351485800001211]; otherwise the mutant conformation [Figure BDA00030351485800001212] remains unchanged;
11.3) Place the generated test conformation [Figure BDA00030351485800001213] into the second conformation pool;
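A minimal Python sketch of the crossover in step 11, using the same hypothetical list-per-residue representation of a conformation. It assumes the 3-fragment taken from the target conformation is written into the mutant at the same residue position, which the text does not state explicitly.

```python
import random


def crossover(target, mutant, cr=0.5, rng=random):
    """Return a test conformation built from the mutant and the target."""
    trial = list(mutant)                      # leave the mutant untouched
    if rng.random() <= cr:
        pos = rng.randrange(len(target) - 2)  # valid start of a 3-residue window
        trial[pos:pos + 3] = target[pos:pos + 3]
    return trial
```

With CR = 0.5, roughly half of the test conformations carry one 3-fragment from their target and the rest are unchanged copies of the mutant.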
12) Substitute each test conformation [Figure BDA00030351485800001214] in the second conformation pool into the four energy functions EnergyRaptorX(C_n), EnergyResPRE(C_n), EnergyNeBcon(C_n), and EnergyPSICOV(C_n) to obtain the energy values [Figure BDA00030351485800001215] [Figure BDA00030351485800001216], which are assembled into the energy array [Figure BDA00030351485800001217];
13) Traverse the second conformation pool and retain the conformations [Figure BDA0003035148580000131] that are Pareto-efficient solutions with respect to the whole population, as follows:
13.1) Compare the conformations within the second conformation pool with one another, retain the Pareto-efficient conformations [Figure BDA0003035148580000132], and record their number as NTBA;
13.2) Compare each conformation [Figure BDA0003035148580000133], m ∈ {1, 2, 3, …, NTBA}, in the second conformation pool with each conformation [Figure BDA0003035148580000134], n ∈ {1, 2, 3, …, N}, in the first conformation pool:
13.2.1) If [Figure BDA0003035148580000135] such that [Figure BDA0003035148580000136], delete the conformation [Figure BDA0003035148580000137] from the second conformation pool;
13.2.2) If [Figure BDA0003035148580000138] such that [Figure BDA0003035148580000139], and [Figure BDA00030351485800001310] such that [Figure BDA00030351485800001311], replace the conformation [Figure BDA00030351485800001313] in the first conformation pool with the conformation [Figure BDA00030351485800001312], and delete the conformation [Figure BDA00030351485800001314] from the second conformation pool;
13.2.3) If, for every conformation [Figure BDA00030351485800001315] in the first conformation pool, there exists k ∈ {1, 2, 3, 4} such that [Figure BDA00030351485800001316], retain the conformation [Figure BDA00030351485800001317] in the second conformation pool;
13.2.4) Update NTBA to the number of conformations currently in the second conformation pool;
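The pool update of step 13 can be sketched in Python on energy arrays alone. This is one reading of steps 13.1 to 13.2.4, using standard Pareto dominance with lower energy assumed better. Where a test conformation dominates several members of the first pool, the sketch drops all of them, a choice the text does not spell out. The function names are hypothetical.

```python
def dominates(a, b):
    """True if energy array a Pareto-dominates energy array b (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_pools(first, second):
    """Return the updated (first, second) lists of energy arrays."""
    # 13.1: keep only the non-dominated members of the second pool.
    second = [
        t for i, t in enumerate(second)
        if not any(dominates(u, t) for j, u in enumerate(second) if j != i)
    ]
    first = list(first)
    kept = []
    for t in second:
        if any(dominates(f, t) for f in first):
            continue                    # 13.2.1: dominated, so deleted
        if any(dominates(t, f) for f in first):
            # 13.2.2: t replaces what it dominates and leaves the second pool.
            first = [f for f in first if not dominates(t, f)] + [t]
        else:
            kept.append(t)              # 13.2.3: mutually non-dominated, retained
    return first, kept                  # 13.2.4: NTBA would be len(kept)
```

In a full implementation each energy array would travel together with its conformation; only the arrays are shown here to keep the comparison logic visible.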
14) Perform a selection operation on the conformations [Figure BDA00030351485800001318] of the first conformation pool and the conformations [Figure BDA00030351485800001319] of the second conformation pool, as follows:
14.1) If the total number of conformations in the first and second conformation pools is not less than the set population size, i.e., N + NTBA ≥ NP, continue to step 14.2); otherwise, move the conformations in the second conformation pool into the first conformation pool, empty the second conformation pool, and jump to step 9);
14.2) Introduce the conformational similarity index RMSD: calculate the RMSD value between each conformation and all remaining conformations in the two conformation pools, as shown in formula (5), where [Figure BDA00030351485800001320] are the (x, y, z) coordinates of the atoms in conformation C_i and [Figure BDA00030351485800001321] are the (x, y, z) coordinates of the atoms in any of the remaining conformations C_j;
[Figure BDA00030351485800001322] (formula (5))
14.3) Judge conformational similarity according to the RMSD values, select the NP most diverse conformations, place them into the first conformation pool, empty the second conformation pool, and go to step 9);
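A hedged Python sketch of the diversity selection in step 14. The RMSD below is computed directly on the given coordinates without superposition, and "most diverse" is interpreted as the largest mean RMSD to all other conformations in the merged pools. Both are assumptions, since formula (5) and the exact selection rule are not reproduced in this text. The function names are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) tuples, no superposition."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def select_diverse(conformations, np_size):
    """Keep the np_size conformations with the largest mean RMSD to the rest."""
    def mean_rmsd(i):
        values = [
            rmsd(conformations[i], c)
            for j, c in enumerate(conformations) if j != i
        ]
        return sum(values) / len(values)

    ranked = sorted(range(len(conformations)), key=mean_rmsd, reverse=True)
    chosen = sorted(ranked[:np_size])   # restore the original pool order
    return [conformations[i] for i in chosen]
```

The input would be the union of the first and second conformation pools, and the result would become the new first conformation pool. The all-pairs comparison costs a number of RMSD evaluations that grows quadratically with the merged pool size.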
15) Output the result.
Taking the protein 1ELW, with a sequence length of 117, as an example, the method above yields a near-native conformation of the protein. After 500 generations, the average root-mean-square deviation between the obtained structures and the native structure is 2.34 Å and the minimum root-mean-square deviation is 1.65 Å; the predicted three-dimensional structure is shown in FIG. 3.
The foregoing describes one embodiment of the invention. The invention is evidently not limited to the embodiment described above and may be practiced with various modifications that do not depart from its essential spirit.

Claims (1)

1. A protein structure prediction method based on a residue contact map information game mechanism, characterized in that the method comprises the following steps:
1) Provide the sequence information of the target protein;
2) According to the sequence information of the target protein, use the following four contact map prediction servers: RaptorX, ResPRE, NeBcon, and DeepMetaPSICOV; obtain four residue contact information files and process the data to generate four contact map files, named contactMapRaptorX, contactMapRespre, contactMapNeBcon, and contactMapDeepMetaPSICOV, respectively;
3) Construct the energy functions EnergyRaptorX(C_n), EnergyResPRE(C_n), EnergyNeBcon(C_n), and EnergyPSICOV(C_n) from the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon, and contactMapDeepMetaPSICOV, respectively, with the following formulas:
Figure FDA0003035148570000011
Figure FDA0003035148570000012
Figure FDA0003035148570000013
Figure FDA0003035148570000014
Figure FDA0003035148570000015
Figure FDA0003035148570000016
Figure FDA0003035148570000017
Figure FDA0003035148570000021
where [Figure FDA0003035148570000022] represents the contact confidence of the k-th residue pair (i, j) in the residue contact maps contactMapRaptorX, contactMapRespre, contactMapNeBcon, and contactMapDeepMetaPSICOV; [Figure FDA0003035148570000023] represents the actual distance between the k-th residue pair (i, j); d_con, with a threshold value of 8, is the maximum distance at which two residues are in contact; and [Figure FDA0003035148570000024] [Figure FDA0003035148570000025] respectively represent the contact scores of conformation C_n under the four energy functions EnergyRaptorX(C_n), EnergyResPRE(C_n), EnergyNeBcon(C_n), and EnergyPSICOV(C_n);
4) Acquire the fragment library files from the ROBETTA server according to the target protein sequence, comprising a 3-fragment library file and a 9-fragment library file;
5) Set parameters: population size NP, crossover factor CR, number of iterations G, and initial generation counter g = 0;
6) Population initialization: generate NP initial conformations C_n, n ∈ {1, 2, …, NP}, by random fragment assembly;
7) Substitute each conformation C_n into the four energy functions EnergyRaptorX(C_n), EnergyResPRE(C_n), EnergyNeBcon(C_n), and EnergyPSICOV(C_n) to obtain the energy values [Figure FDA0003035148570000026], which are assembled into the energy array [Figure FDA0003035148570000027];
8) According to the energy array [Figure FDA0003035148570000028], construct the first conformation pool as follows:
8.1) Set the initial conformation count N to 0;
8.2) Traverse the population and compare the energy array [Figure FDA0003035148570000029] of each conformation C_n with those of all other conformations. If no other conformation has four energy values that are all better than those of the current conformation C_n, i.e. [Figure FDA00030351485700000210] such that [Figure FDA00030351485700000211], where C_m is any conformation other than the current conformation C_n, record C_n as a Pareto-efficient solution;
8.3) Place the Pareto-efficient conformations into the first conformation pool, record the current number of conformations as N, and discard the remaining conformations;
9) Loop: g = g + 1; if g > G, go to step 14);
10) Treat each conformation C_n, n ∈ {1, 2, 3, …, N}, in the first conformation pool as the target conformation [Figure FDA0003035148570000031] and perform the following operations to generate the mutant conformation [Figure FDA0003035148570000032]:
10.1) Randomly generate positive integers n1, n2, n3 in the range 1 to N, with n1 ≠ n2 ≠ n3 ≠ n;
10.2) Randomly select a 9-fragment at some position in conformation C_n1 and use it to replace the fragment at the same position in conformation C_n3; randomly select from conformation C_n2 a 9-fragment at a position different from the one selected in C_n1 and use it to replace the fragment at the same position in conformation C_n3; then perform 3-fragment assembly on conformation C_n3 to generate the mutant conformation [Figure FDA0003035148570000033];
11) Perform a crossover operation on each mutant conformation [Figure FDA0003035148570000034] to generate the test conformation [Figure FDA0003035148570000035], as follows:
11.1) Generate a random number rand1 ∈ (0, 1);
11.2) If rand1 ≤ CR, randomly select a 3-fragment from the target conformation [Figure FDA0003035148570000036] and substitute it into the mutant conformation [Figure FDA0003035148570000037]; otherwise the mutant conformation [Figure FDA0003035148570000038] remains unchanged;
11.3) Place the generated test conformation [Figure FDA0003035148570000039] into the second conformation pool;
12) Substitute each test conformation [Figure FDA00030351485700000310] in the second conformation pool into the four energy functions EnergyRaptorX(C_n), EnergyResPRE(C_n), EnergyNeBcon(C_n), and EnergyPSICOV(C_n) to obtain the energy values [Figure FDA00030351485700000311] [Figure FDA00030351485700000312], which are assembled into the energy array [Figure FDA00030351485700000313];
13) Traverse the second conformation pool and retain the conformations [Figure FDA00030351485700000314] that are Pareto-efficient solutions with respect to the whole population, as follows:
13.1) Compare the conformations within the second conformation pool with one another, retain the Pareto-efficient conformations [Figure FDA00030351485700000315], and record their number as NTBA;
13.2) Compare each conformation [Figure FDA00030351485700000316] in the second conformation pool with each conformation [Figure FDA00030351485700000317] in the first conformation pool:
13.2.1) If [Figure FDA00030351485700000318] such that [Figure FDA00030351485700000319], delete the conformation [Figure FDA00030351485700000320] from the second conformation pool;
13.2.2) If [Figure FDA00030351485700000321] such that [Figure FDA00030351485700000322], and [Figure FDA00030351485700000323] such that [Figure FDA00030351485700000324], replace the conformation [Figure FDA00030351485700000326] in the first conformation pool with the conformation [Figure FDA00030351485700000325], and delete the conformation [Figure FDA0003035148570000041] from the second conformation pool;
13.2.3) If, for every conformation [Figure FDA0003035148570000042] in the first conformation pool, there exists k ∈ {1, 2, 3, 4} such that [Figure FDA0003035148570000043], retain the conformation [Figure FDA0003035148570000044] in the second conformation pool;
13.2.4) Update NTBA to the number of conformations currently in the second conformation pool;
14) Perform a selection operation on the conformations [Figure FDA0003035148570000045] of the first conformation pool and the conformations [Figure FDA0003035148570000046] of the second conformation pool, as follows:
14.1) If the total number of conformations in the first and second conformation pools is not less than the set population size, i.e., N + NTBA ≥ NP, continue to step 14.2); otherwise, move the conformations in the second conformation pool into the first conformation pool, empty the second conformation pool, and jump to step 9);
14.2) Introduce the conformational similarity index RMSD: calculate the RMSD value between each conformation and all remaining conformations in the two conformation pools, as shown in formula (5), where [Figure FDA0003035148570000047] are the (x, y, z) coordinates of the atoms in conformation C_i and [Figure FDA0003035148570000048] are the (x, y, z) coordinates of the atoms in any of the remaining conformations C_j;
[Figure FDA0003035148570000049] (formula (5))
14.3) Judge conformational similarity according to the RMSD values, select the NP most diverse conformations, place them into the first conformation pool, empty the second conformation pool, and go to step 9);
15) Output the result.
CN202110440653.1A 2021-04-23 2021-04-23 Protein structure prediction method based on residue contact diagram information game mechanism Withdrawn CN113257338A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110440653.1A CN113257338A (en) 2021-04-23 2021-04-23 Protein structure prediction method based on residue contact diagram information game mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110440653.1A CN113257338A (en) 2021-04-23 2021-04-23 Protein structure prediction method based on residue contact diagram information game mechanism

Publications (1)

Publication Number Publication Date
CN113257338A true CN113257338A (en) 2021-08-13

Family

ID=77221402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110440653.1A Withdrawn CN113257338A (en) 2021-04-23 2021-04-23 Protein structure prediction method based on residue contact diagram information game mechanism

Country Status (1)

Country Link
CN (1) CN113257338A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121146A (en) * 2021-11-29 2022-03-01 山东建筑大学 RNA three-level structure prediction method based on parallel and Monte Carlo strategies
CN114121146B (en) * 2021-11-29 2023-10-03 山东建筑大学 RNA tertiary structure prediction method based on parallel and Monte Carlo strategies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210813
