CN113257338A - Protein structure prediction method based on residue contact diagram information game mechanism - Google Patents
- Publication number: CN113257338A
- Application number: CN202110440653.1A
- Authority: CN (China)
- Legal status: Withdrawn
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
Abstract
A protein structure prediction method based on a residue contact map information game mechanism comprises the following steps. First, multiple residue contact maps are obtained from four protein residue contact prediction servers, RaptorX, ResPRE, NeBcon and DeepMetaPSICOV, which are selected according to their Jaccard indices in the CASP competitions, and these maps are used to construct multiple energy functions. Second, the population is initialized using the first and second stages of Rosetta, and new trial conformations are generated by mutation and crossover of the target conformations. Finally, a Pareto-based multi-objective optimization algorithm updates the conformations according to the energy functions built from the four residue contact maps, guiding the sampling toward conformations whose structures are closer to the native state. The invention thus provides a protein structure prediction method based on a residue contact map information game mechanism.
Description
Technical Field
The invention relates to the fields of bioinformatics and computational intelligence, and in particular to a protein structure prediction method based on a residue contact map information game mechanism.
Background
Life and health is a leading direction of future industrial development worldwide and a fundamental field for raising people's health and sense of well-being. All life processes, including the reproduction of species, are closely related to the synthesis, degradation and transformation of proteins. The three-dimensional structure of a protein determines its specific biological function and is the material basis of life activities, and a misfolded protein may fail to function properly. For example, the brains of patients with senile dementia contain numerous disordered aggregates formed by misfolded proteins. Therefore, to achieve breakthroughs in the life and health field, to understand life phenomena and processes more deeply, and to enable targeted drug development, a prerequisite is to obtain the three-dimensional structures of proteins.
At present, conventional wet-lab methods, including X-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy, can determine the three-dimensional structure of proteins, but they place high demands on materials, instruments and personnel and are extremely time-consuming. There is therefore an urgent need to model structures directly from sequences and to predict protein structures computationally.
Protein structure prediction is a major research problem in bioinformatics, and current approaches fall into two broad categories. The first builds energy-function models from the physicochemical knowledge of biomolecules; this approach has led the field since the early CASP competitions and still holds a very important position today. Representative methods are Rosetta from the Baker laboratory at the University of Washington and I-TASSER from the Yang Zhang laboratory at the University of Michigan. As a structure prediction tool, Rosetta can predict, design and analyze a wide range of biomolecular systems, including proteins, RNA, DNA, peptides, small molecules, and non-canonical or derivatized amino acids. I-TASSER predicts protein structure and function, inferring the function of a target with the meta-threading method LOMETS, the protein function database BioLiP, and other resources. Physicochemical model methods have achieved substantial results, but they also suffer from insufficiently accurate energy expressions and incomplete features. The second category uses deep learning to predict contacts, distances and similar information, from which knowledge-based models are built. In the recently released CASP14 results, AlphaFold from Google ranked first in the human group, far ahead of the second-place team, and Tencent's tFold, in its first competition, also performed well and ranked first in the contact prediction group.
Regarding contact prediction in the CASP competitions, although the precision of contact prediction keeps improving, the predictions still contain errors, and the Jaccard distance graph shows that different prediction servers capture different sets of information. In addition, although deep learning has made great progress in protein structure prediction, and especially in residue contact prediction, structure folding methods often integrate several sets of residue contact information by simple weighted superposition, which loses part of the predicted contact information and inevitably reduces prediction accuracy. Furthermore, computational structure prediction usually evaluates conformations with a single energy function, which limits sampling ability: the final conformation may be optimal in energy but not structurally optimal, because a low-energy conformation is not necessarily the one closest to the native conformation.
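The Jaccard comparison between servers can be illustrated with a minimal sketch. It assumes each server's output has already been reduced to (i, j, probability) triples, and it compares the top-L contacts (L = sequence length) with a minimum sequence separation of 6; both cut-offs, and the helper names, are illustrative assumptions rather than values or code taken from this document.

```python
def top_contacts(predictions, seq_len, factor=1.0, min_sep=6):
    """Top-ranked residue pairs of one server as a set of (i, j) with i < j.

    predictions: iterable of (i, j, probability) triples, 1-based residue indices.
    Keeps the int(factor * seq_len) most confident pairs with |i - j| >= min_sep.
    """
    eligible = [(p, i, j) for i, j, p in predictions if abs(i - j) >= min_sep]
    eligible.sort(reverse=True)  # most confident first
    keep = int(factor * seq_len)
    return {(min(i, j), max(i, j)) for _, i, j in eligible[:keep]}


def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_distance(a, b):
    """1 - Jaccard index: 0 for identical contact sets, 1 for disjoint ones."""
    return 1.0 - jaccard_index(a, b)
```

A Jaccard index close to 1 means two servers predict largely the same contacts; values well below 1 are consistent with the observation that different servers capture different information.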
Therefore, existing protein structure prediction methods have shortcomings both in how efficiently they use the received contact data and in how they select and evaluate conformations, and improvements are needed.
Summary of the Invention
To overcome the low efficiency in using the received data and the limited prediction accuracy of conventional protein structure prediction methods, the invention provides a protein structure prediction method based on a residue contact map information game mechanism. Multiple energy functions are constructed on the Rosetta platform from the residue contact maps of four protein residue contact servers, RaptorX, ResPRE, NeBcon and DeepMetaPSICOV, and a multi-objective optimization method guides the search of the conformational space.
The technical solution adopted by the invention to solve the above technical problems is as follows:
A protein structure prediction method based on a residue contact map information game mechanism comprises the following steps:
1) Input the sequence information of the target protein;
2) Based on the sequence information of the target protein, use the following four contact map prediction servers:
RaptorX(http://raptorx.uchicago.edu/ContactMap/);
ResPRE(https://zhanglab.ccmb.med.umich.edu/ResPRE/);
NeBcon(https://zhanglab.ccmb.med.umich.edu/NeBcon/);
DeepMetaPSICOV(http://bioinf.cs.ucl.ac.uk/psipred/);
obtain four residue contact information files and process them into four contact map files, named contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV;
3) From the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, construct the energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively, according to the following formula:
where the confidence term is the predicted contact confidence of the $k$-th residue pair $(i, j)$ in the residue contact map contactMapRaptorX, contactMapRespre, contactMapNeBcon or contactMapDeepMetaPSICOV, respectively; the distance term is the actual distance between the residues of the $k$-th pair $(i, j)$ in conformation $C_n$; $d_{con} = 8$ Å is the contact threshold, i.e., the maximum distance at which two residues are considered to be in contact; and the four resulting values are the contact scores of conformation $C_n$ under EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively;
4) Obtain fragment library files for the target protein sequence from the ROBETTA server (http://www.robetta.org/), namely a 3-mer fragment library file and a 9-mer fragment library file;
5) Set parameters: population size NP, crossover factor CR and maximum number of generations G; initialize the generation counter g = 0;
6) Population initialization: generate NP initial conformations $C_n$, $n = 1, 2, \dots, NP$, by random fragment assembly;
7) Substitute each conformation $C_n$ into the four energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$) to obtain four energy values, which form the energy array of $C_n$;
8.1) Set the initial conformation count N = 0;
8.2) Traverse the population and compare the energy array of each conformation $C_n$ with those of all other conformations $C_m$ ($m \neq n$). If no $C_m$ is at least as good as $C_n$ on all four energy values and strictly better on at least one, that is, if $C_n$ is not Pareto-dominated by any other conformation, record $C_n$ as a Pareto-efficient solution;
8.3) Place the Pareto-efficient conformations in the first conformation pool, record their number as N, and discard the remaining conformations;
9) Loop: g = g + 1; if g > G, go to step 15);
10) Treat each conformation $C_n$, $n \in \{1, 2, \dots, N\}$, in the first conformation pool as the target conformation and generate a mutant conformation as follows:
10.1) Randomly generate integers $n_1, n_2, n_3$ in the range 1 to N such that $n_1$, $n_2$, $n_3$ and $n$ are mutually distinct;
10.2) Randomly select a 9-mer fragment at some position of conformation $C_{n_1}$ and use it to replace the fragment at the same position of conformation $C_{n_3}$; then randomly select from conformation $C_{n_2}$ a 9-mer fragment at a position different from the one chosen in $C_{n_1}$ and use it to replace the fragment at the same position of $C_{n_3}$; finally, perform 3-mer fragment assembly on $C_{n_3}$ to produce the mutant conformation;
11) Apply crossover to each mutant conformation ($n \in \{1, 2, \dots, N\}$) to generate a trial conformation as follows:
11.1) Generate a random number rand1 ∈ (0, 1);
11.2) If rand1 ≤ CR, randomly select a 3-mer fragment from the target conformation $C_n$ and substitute it into the mutant conformation; otherwise, leave the mutant conformation unchanged;
12) Substitute each trial conformation in the second conformation pool into the four energy functions EnergyRaptorX, EnergyResPRE, EnergyNeBcon and EnergyPSICOV to obtain four energy values, which form its energy array;
13) Traverse the second conformation pool and retain the trial conformations that are Pareto-efficient with respect to the whole population, as follows:
13.1) Compare the conformations in the second conformation pool with one another, retain the Pareto-efficient ones, and record their number as $N_{TBA}$;
13.2) Compare each trial conformation $m \in \{1, 2, \dots, N_{TBA}\}$ in the second conformation pool with each conformation $C_n$, $n \in \{1, 2, \dots, N\}$, in the first conformation pool:
13.2.2) If the trial conformation is at least as good as $C_n$ on every energy value $k \in \{1, 2, 3, 4\}$ and strictly better on at least one $k$, replace $C_n$ in the first conformation pool with the trial conformation and delete the trial conformation from the second conformation pool;
13.2.3) If, for every conformation in the first conformation pool, there exists $k \in \{1, 2, 3, 4\}$ such that the trial conformation is better on the $k$-th energy value, retain the trial conformation in the second conformation pool;
13.2.4) Update $N_{TBA}$ to the number of conformations currently in the second conformation pool;
14) Perform selection on the conformations of the first and second conformation pools as follows:
14.1) If the total number of conformations in the first and second conformation pools exceeds the set population size, i.e., $N + N_{TBA} > NP$, continue with step 14.2); otherwise, move the conformations of the second pool into the first pool, empty the second pool, and return to step 9);
14.2) Use the root-mean-square deviation (RMSD) as the conformational similarity index and compute the RMSD between each conformation and every other conformation in the two pools according to formula (5):
$$\mathrm{RMSD}(C_i, C_j) = \sqrt{\frac{1}{M}\sum_{a=1}^{M}\left[(x_a^{i} - x_a^{j})^2 + (y_a^{i} - y_a^{j})^2 + (z_a^{i} - z_a^{j})^2\right]} \qquad (5)$$
where $(x_a^{i}, y_a^{i}, z_a^{i})$ are the spatial coordinates of the $a$-th atom of conformation $C_i$, $(x_a^{j}, y_a^{j}, z_a^{j})$ are those of the corresponding atom of any other conformation $C_j$, and $M$ is the number of atoms compared;
14.3) Assess conformational similarity from the RMSD values, select the NP conformations that together give the greatest diversity, place them in the first conformation pool, empty the second conformation pool, and return to step 9);
15) Output the result.
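How the steps above fit together can be summarized as a control-flow sketch. It is not the patented implementation: energy evaluation, trial generation (mutation plus crossover), the Pareto update and the diversity selection are passed in as hypothetical callables, and the sketch assumes that the search terminates and outputs the first pool (step 15) once the generation counter exceeds G.

```python
import random


def search(population, evaluate, make_trial, pareto_update, select_diverse,
           np_size, max_gen, seed=0):
    """Control-flow sketch of steps 7-15. Every callable is a hypothetical stand-in:

    evaluate(conf)               -> tuple of four energy values (steps 7 and 12)
    make_trial(first, n, rng)    -> trial conformation built from target n (steps 10-11)
    pareto_update(first, second) -> (first pool, retained second pool); called with an
                                    empty first pool it must return the non-dominated
                                    members of `second` (step 8), otherwise step 13
    select_diverse(items, k)     -> k items chosen for diversity (steps 14.2-14.3)

    Pools are lists of (conformation, energy_array) pairs.
    """
    rng = random.Random(seed)
    scored = [(c, evaluate(c)) for c in population]            # step 7
    _, first = pareto_update([], scored)                        # step 8
    for _g in range(1, max_gen + 1):                            # step 9: stop once g > G
        second = []
        for n in range(len(first)):                             # steps 10-12
            trial = make_trial(first, n, rng)
            second.append((trial, evaluate(trial)))
        first, second = pareto_update(first, second)            # step 13
        if len(first) + len(second) > np_size:                  # step 14.1
            first = select_diverse(first + second, np_size)     # steps 14.2-14.3
        else:
            first = first + second
    return first                                                 # step 15
```

Keeping the operators as parameters separates the game-style selection logic from the Rosetta-specific sampling, which is the part the text describes only at a high level.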
The technical concept of the invention is as follows: first, multiple residue contact maps are obtained from the four protein residue contact servers RaptorX, ResPRE, NeBcon and DeepMetaPSICOV and used to construct multiple energy functions; second, the population is initialized using the first and second stages of Rosetta, and new trial conformations are generated by mutation and crossover of the target conformations; finally, a Pareto-based multi-objective optimization algorithm updates the conformations according to the energy functions built from the four residue contact maps, guiding the sampling toward conformations whose structures are closer to the native state.
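The Pareto-based update can be sketched as follows. The sketch assumes that a lower value is better for each of the four contact energies and represents each conformation as a (conformation, energy_array) pair. Where the text leaves details open (which first-pool member a dominating trial replaces, and what happens to a trial that is neither promoted nor retained), the choices are assumptions and are marked in the comments. Step 8 corresponds to calling `non_dominated` on the initial population.

```python
def dominates(a, b):
    """True if energy array a Pareto-dominates b, assuming lower energy is better:
    a is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(items):
    """Steps 8.2 and 13.1: keep the (conformation, energies) items that no other
    item in the same list dominates."""
    return [item for i, item in enumerate(items)
            if not any(dominates(other[1], item[1])
                       for j, other in enumerate(items) if j != i)]


def update_pools(first, second):
    """Sketch of steps 13.1-13.2.4. Both pools hold (conformation, energies) pairs.

    Returns the updated first pool and the trials kept in the second pool;
    the length of the latter is the new N_TBA.
    """
    first = list(first)  # do not modify the caller's list
    kept = []
    for trial in non_dominated(second):
        # 13.2.2: a trial that dominates a first-pool member replaces it
        # (assumption: only the first dominated member found is replaced).
        victim = next((idx for idx, target in enumerate(first)
                       if dominates(trial[1], target[1])), None)
        if victim is not None:
            first[victim] = trial
            continue
        # 13.2.3: keep the trial if, against every first-pool member,
        # it is better on at least one energy value.
        if all(any(t < f for t, f in zip(trial[1], target[1])) for target in first):
            kept.append(trial)
        # Otherwise the trial is dropped (assumption; the text does not say).
    return first, kept
```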
The beneficial effects of the invention are as follows: first, the residue contact information is predicted by different servers, which diversifies its sources and reduces the impact on structure prediction of the information loss, errors and omissions that a single contact map may cause; second, a conformation selection method based on a residue contact information game mechanism is designed in combination with a multi-objective optimization algorithm, which avoids conformational guidance errors caused by the inaccuracy of traditional energy models.
Drawings
FIG. 1 shows the processed information of the four predicted residue contact maps.
FIG. 2 shows the distribution of conformations sampled for protein 1ELW by the protein structure prediction method based on a residue contact map information game mechanism.
FIG. 3 shows the three-dimensional structure of protein 1ELW predicted by the protein structure prediction method based on a residue contact map information game mechanism.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 to FIG. 3, a protein structure prediction method based on a residue contact map information game mechanism comprises the following steps:
1) Input the sequence information of the target protein;
2) Based on the sequence information of the target protein, use the following four contact map prediction servers:
RaptorX(http://raptorx.uchicago.edu/ContactMap/);
ResPRE(https://zhanglab.ccmb.med.umich.edu/ResPRE/);
NeBcon(https://zhanglab.ccmb.med.umich.edu/NeBcon/);
DeepMetaPSICOV(http://bioinf.cs.ucl.ac.uk/psipred/);
obtain four residue contact information files and process them into four contact map files, named contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV;
3) From the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, construct the energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively, according to the following formula:
where the confidence term is the predicted contact confidence of the $k$-th residue pair $(i, j)$ in the residue contact map contactMapRaptorX, contactMapRespre, contactMapNeBcon or contactMapDeepMetaPSICOV, respectively; the distance term is the actual distance between the residues of the $k$-th pair $(i, j)$ in conformation $C_n$; $d_{con} = 8$ Å is the contact threshold, i.e., the maximum distance at which two residues are considered to be in contact; and the four resulting values are the contact scores of conformation $C_n$ under EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively;
4) Obtain fragment library files for the target protein sequence from the ROBETTA server (http://www.robetta.org/), namely a 3-mer fragment library file and a 9-mer fragment library file;
5) Set parameters: population size NP, crossover factor CR and maximum number of generations G; initialize the generation counter g = 0;
6) Population initialization: generate NP initial conformations $C_n$, $n = 1, 2, \dots, NP$, by random fragment assembly;
7) Substitute each conformation $C_n$ into the four energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$) to obtain four energy values, which form the energy array of $C_n$;
8.1) Set the initial conformation count N = 0;
8.2) Traverse the population and compare the energy array of each conformation $C_n$ with those of all other conformations $C_m$ ($m \neq n$). If no $C_m$ is at least as good as $C_n$ on all four energy values and strictly better on at least one, that is, if $C_n$ is not Pareto-dominated by any other conformation, record $C_n$ as a Pareto-efficient solution;
8.3) Place the Pareto-efficient conformations in the first conformation pool, record their number as N, and discard the remaining conformations;
9) Loop: g = g + 1; if g > G, go to step 15);
10) Treat each conformation $C_n$, $n \in \{1, 2, \dots, N\}$, in the first conformation pool as the target conformation and generate a mutant conformation as follows:
10.1) Randomly generate integers $n_1, n_2, n_3$ in the range 1 to N such that $n_1$, $n_2$, $n_3$ and $n$ are mutually distinct;
10.2) Randomly select a 9-mer fragment at some position of conformation $C_{n_1}$ and use it to replace the fragment at the same position of conformation $C_{n_3}$; then randomly select from conformation $C_{n_2}$ a 9-mer fragment at a position different from the one chosen in $C_{n_1}$ and use it to replace the fragment at the same position of $C_{n_3}$; finally, perform 3-mer fragment assembly on $C_{n_3}$ to produce the mutant conformation;
11) Apply crossover to each mutant conformation ($n \in \{1, 2, \dots, N\}$) to generate a trial conformation as follows:
11.1) Generate a random number rand1 ∈ (0, 1);
11.2) If rand1 ≤ CR, randomly select a 3-mer fragment from the target conformation $C_n$ and substitute it into the mutant conformation; otherwise, leave the mutant conformation unchanged;
12) Substitute each trial conformation in the second conformation pool into the four energy functions EnergyRaptorX, EnergyResPRE, EnergyNeBcon and EnergyPSICOV to obtain four energy values, which form its energy array;
13) Traverse the second conformation pool and retain the trial conformations that are Pareto-efficient with respect to the whole population, as follows:
13.1) Compare the conformations in the second conformation pool with one another, retain the Pareto-efficient ones, and record their number as $N_{TBA}$;
13.2) Compare each trial conformation $m \in \{1, 2, \dots, N_{TBA}\}$ in the second conformation pool with each conformation $C_n$, $n \in \{1, 2, \dots, N\}$, in the first conformation pool:
13.2.2) If the trial conformation is at least as good as $C_n$ on every energy value $k \in \{1, 2, 3, 4\}$ and strictly better on at least one $k$, replace $C_n$ in the first conformation pool with the trial conformation and delete the trial conformation from the second conformation pool;
13.2.3) If, for every conformation in the first conformation pool, there exists $k \in \{1, 2, 3, 4\}$ such that the trial conformation is better on the $k$-th energy value, retain the trial conformation in the second conformation pool;
13.2.4) Update $N_{TBA}$ to the number of conformations currently in the second conformation pool;
14) Perform selection on the conformations of the first and second conformation pools as follows:
14.1) If the total number of conformations in the first and second conformation pools exceeds the set population size, i.e., $N + N_{TBA} > NP$, continue with step 14.2); otherwise, move the conformations of the second pool into the first pool, empty the second pool, and return to step 9);
14.2) Use the root-mean-square deviation (RMSD) as the conformational similarity index and compute the RMSD between each conformation and every other conformation in the two pools according to formula (5):
$$\mathrm{RMSD}(C_i, C_j) = \sqrt{\frac{1}{M}\sum_{a=1}^{M}\left[(x_a^{i} - x_a^{j})^2 + (y_a^{i} - y_a^{j})^2 + (z_a^{i} - z_a^{j})^2\right]} \qquad (5)$$
where $(x_a^{i}, y_a^{i}, z_a^{i})$ are the spatial coordinates of the $a$-th atom of conformation $C_i$, $(x_a^{j}, y_a^{j}, z_a^{j})$ are those of the corresponding atom of any other conformation $C_j$, and $M$ is the number of atoms compared;
14.3) Assess conformational similarity from the RMSD values, select the NP conformations that together give the greatest diversity, place them in the first conformation pool, empty the second conformation pool, and return to step 9);
15) Output the result.
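The data processing of step 2 and the contact energies of step 3 can be illustrated with a minimal sketch. Because the step-3 formula itself is not reproduced in this text, the energy below uses an assumed form: minus the summed confidence of the predicted contacts whose residues lie within $d_{con}$ = 8 Å in the conformation, so lower is better. The sketch further assumes that the processed files use the five-column CASP RR contact line `i j d1 d2 p` and that each residue is represented by a single atom (for example Cβ); `parse_rr`, `contact_energy` and `energy_array` are hypothetical helper names.

```python
import math

D_CON = 8.0  # contact threshold of step 3 (unit assumed to be angstroms)


def parse_rr(lines):
    """Read CASP RR-style contact lines 'i j d1 d2 p' into (i, j, p) triples.

    Header, sequence and END lines that do not match the five-column pattern are skipped.
    """
    contacts = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            i, j, p = int(fields[0]), int(fields[1]), float(fields[4])
        except ValueError:
            continue
        contacts.append((i, j, p))
    return contacts


def contact_energy(coords, contacts, d_con=D_CON):
    """Assumed energy form: minus the summed confidence of satisfied predicted contacts.

    coords holds one representative atom per residue; contacts use 1-based indices.
    Lower (more negative) values mean more of the predicted contacts are satisfied.
    """
    energy = 0.0
    for i, j, p in contacts:
        if math.dist(coords[i - 1], coords[j - 1]) <= d_con:
            energy -= p
    return energy


def energy_array(coords, contact_maps):
    """One energy per contact map, e.g. [RaptorX, ResPRE, NeBcon, DeepMetaPSICOV]."""
    return [contact_energy(coords, contacts) for contacts in contact_maps]
```

Calling `energy_array` with the four parsed contact maps yields the four-value energy array used in steps 7 and 12.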
Taking protein 1ELW, whose sequence length is 117, as an embodiment, the protein structure prediction method based on a residue contact map information game mechanism comprises the following steps:
1) Input the sequence information of the target protein;
2) Based on the sequence information of the target protein, use the following four contact map prediction servers:
RaptorX(http://raptorx.uchicago.edu/ContactMap/);
ResPRE(https://zhanglab.ccmb.med.umich.edu/ResPRE/);
NeBcon(https://zhanglab.ccmb.med.umich.edu/NeBcon/);
DeepMetaPSICOV(http://bioinf.cs.ucl.ac.uk/psipred/);
obtain four residue contact information files and process them into four contact map files, named contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV;
3) From the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, construct the energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively, according to the following formula:
where the confidence term is the predicted contact confidence of the $k$-th residue pair $(i, j)$ in the residue contact map contactMapRaptorX, contactMapRespre, contactMapNeBcon or contactMapDeepMetaPSICOV, respectively; the distance term is the actual distance between the residues of the $k$-th pair $(i, j)$ in conformation $C_n$; $d_{con} = 8$ Å is the contact threshold, i.e., the maximum distance at which two residues are considered to be in contact; and the four resulting values are the contact scores of conformation $C_n$ under EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively;
4) Obtain fragment library files for the target protein sequence from the ROBETTA server (http://www.robetta.org/), namely a 3-mer fragment library file and a 9-mer fragment library file;
5) Set parameters: population size NP = 200, crossover factor CR = 0.5, maximum number of generations G = 500, and initial generation counter g = 0;
6) Population initialization: generate NP initial conformations $C_n$, $n = 1, 2, \dots, NP$, by random fragment assembly;
7) Substitute each conformation $C_n$ into the four energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$) to obtain four energy values, which form the energy array of $C_n$;
8.1) Set the initial conformation count N = 0;
8.2) Traverse the population and compare the energy array of each conformation $C_n$ with those of all other conformations $C_m$ ($m \neq n$). If no $C_m$ is at least as good as $C_n$ on all four energy values and strictly better on at least one, that is, if $C_n$ is not Pareto-dominated by any other conformation, record $C_n$ as a Pareto-efficient solution;
8.3) Place the Pareto-efficient conformations in the first conformation pool, record their number as N, and discard the remaining conformations;
9) Loop: g = g + 1; if g > G, go to step 15);
10) Treat each conformation $C_n$, $n \in \{1, 2, \dots, N\}$, in the first conformation pool as the target conformation and generate a mutant conformation as follows:
10.1) Randomly generate integers $n_1, n_2, n_3$ in the range 1 to N such that $n_1$, $n_2$, $n_3$ and $n$ are mutually distinct;
10.2) Randomly select a 9-mer fragment at some position of conformation $C_{n_1}$ and use it to replace the fragment at the same position of conformation $C_{n_3}$; then randomly select from conformation $C_{n_2}$ a 9-mer fragment at a position different from the one chosen in $C_{n_1}$ and use it to replace the fragment at the same position of $C_{n_3}$; finally, perform 3-mer fragment assembly on $C_{n_3}$ to produce the mutant conformation;
11) Apply crossover to each mutant conformation ($n \in \{1, 2, \dots, N\}$) to generate a trial conformation as follows:
11.1) Generate a random number rand1 ∈ (0, 1);
11.2) If rand1 ≤ CR, randomly select a 3-mer fragment from the target conformation $C_n$ and substitute it into the mutant conformation; otherwise, leave the mutant conformation unchanged;
12) Substitute each trial conformation in the second conformation pool into the four energy functions EnergyRaptorX, EnergyResPRE, EnergyNeBcon and EnergyPSICOV to obtain four energy values, which form its energy array;
13) Traverse the second conformation pool and retain the trial conformations that are Pareto-efficient with respect to the whole population, as follows:
13.1) Compare the conformations in the second conformation pool with one another, retain the Pareto-efficient ones, and record their number as $N_{TBA}$;
13.2) Compare each trial conformation $m \in \{1, 2, \dots, N_{TBA}\}$ in the second conformation pool with each conformation $C_n$, $n \in \{1, 2, \dots, N\}$, in the first conformation pool:
13.2.2) If the trial conformation is at least as good as $C_n$ on every energy value $k \in \{1, 2, 3, 4\}$ and strictly better on at least one $k$, replace $C_n$ in the first conformation pool with the trial conformation and delete the trial conformation from the second conformation pool;
13.2.3) If, for every conformation in the first conformation pool, there exists $k \in \{1, 2, 3, 4\}$ such that the trial conformation is better on the $k$-th energy value, retain the trial conformation in the second conformation pool;
13.2.4) Update $N_{TBA}$ to the number of conformations currently in the second conformation pool;
14) Perform selection on the conformations of the first and second conformation pools as follows:
14.1) If the total number of conformations in the first and second conformation pools exceeds the set population size, i.e., $N + N_{TBA} > NP$, continue with step 14.2); otherwise, move the conformations of the second pool into the first pool, empty the second pool, and return to step 9);
14.2) Use the root-mean-square deviation (RMSD) as the conformational similarity index and compute the RMSD between each conformation and every other conformation in the two pools according to formula (5):
$$\mathrm{RMSD}(C_i, C_j) = \sqrt{\frac{1}{M}\sum_{a=1}^{M}\left[(x_a^{i} - x_a^{j})^2 + (y_a^{i} - y_a^{j})^2 + (z_a^{i} - z_a^{j})^2\right]} \qquad (5)$$
where $(x_a^{i}, y_a^{i}, z_a^{i})$ are the spatial coordinates of the $a$-th atom of conformation $C_i$, $(x_a^{j}, y_a^{j}, z_a^{j})$ are those of the corresponding atom of any other conformation $C_j$, and $M$ is the number of atoms compared;
14.3) Assess conformational similarity from the RMSD values, select the NP conformations that together give the greatest diversity, place them in the first conformation pool, empty the second conformation pool, and return to step 9);
15) Output the result.
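Steps 10 and 11 can be sketched on a simplified representation in which a conformation is a list of per-residue backbone torsion triples (phi, psi, omega) and inserting a fragment copies torsions window by window. This representation is an assumption for illustration. The 3-mer fragment assembly that closes step 10.2 needs a fragment library and a scoring function, so it is represented by a hypothetical `assemble3` callback, and the crossover is assumed to copy the 3-mer window into the same positions of the mutant. The sketch requires at least four conformations in the first pool and at least ten residues; with the embodiment's CR = 0.5, roughly half of the trials receive a 3-mer from their target.

```python
import random

FRAG9, FRAG3 = 9, 3  # fragment lengths used in steps 10 and 11


def copy_window(dst, src, start, length):
    """Overwrite residues start..start+length-1 of dst with those of src (in place)."""
    dst[start:start + length] = src[start:start + length]


def mutate(pool, n, rng, assemble3=lambda conf: conf):
    """Step 10 sketch: build a mutant for target index n from three other members."""
    others = [i for i in range(len(pool)) if i != n]
    n1, n2, n3 = rng.sample(others, 3)            # 10.1: n1, n2, n3 and n mutually distinct
    mutant = list(pool[n3])                        # work on a copy of C_n3
    last = len(mutant) - FRAG9                     # last valid 9-mer start index
    pos1 = rng.randint(0, last)
    copy_window(mutant, pool[n1], pos1, FRAG9)     # 9-mer taken from C_n1
    pos2 = rng.choice([p for p in range(last + 1) if p != pos1])
    copy_window(mutant, pool[n2], pos2, FRAG9)     # 9-mer from C_n2 at another position
    return assemble3(mutant)                       # hypothetical 3-mer assembly step


def crossover(target, mutant, cr, rng):
    """Step 11 sketch: with probability CR copy a random 3-mer window from the target."""
    trial = list(mutant)
    if rng.random() <= cr:                         # 11.1-11.2
        start = rng.randint(0, len(trial) - FRAG3)
        copy_window(trial, target, start, FRAG3)
    return trial
```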
Taking protein 1ELW, whose sequence length is 117, as an example, near-native protein conformations are obtained with the above method: after 500 generations, the average root-mean-square deviation (RMSD) between the predicted structures and the native structure is 2.34 Å and the minimum RMSD is 1.65 Å. The predicted three-dimensional structure is shown in FIG. 3.
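The RMSD used both for the diversity selection of step 14 and for the comparison with the native 1ELW structure can be computed as below, together with one possible reading of "select the NP conformations with the greatest diversity": a greedy max-min selection. Both are illustrative assumptions rather than the document's implementation. In particular, `rmsd` does not superimpose the structures, so the coordinates are assumed to be aligned beforehand (for example with the Kabsch algorithm); `select_diverse` and `rmsd_summary` are hypothetical helper names.

```python
import math


def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) coordinates, as in formula (5).

    Assumes the conformations are already superimposed; no fitting is done here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformations must have the same non-zero number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def select_diverse(confs, np_size):
    """Greedy max-min choice of np_size mutually dissimilar conformations (indices).

    One possible reading of step 14.3; the document does not fix the rule.
    """
    n = len(confs)
    if np_size >= n:
        return list(range(n))
    dist = [[rmsd(confs[i], confs[j]) for j in range(n)] for i in range(n)]
    # Seed with the conformation that is, in total, least similar to all others.
    chosen = [max(range(n), key=lambda i: sum(dist[i]))]
    while len(chosen) < np_size:
        # Add the candidate whose closest already-chosen conformation is farthest away.
        best = max((i for i in range(n) if i not in chosen),
                   key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(best)
    return chosen


def rmsd_summary(models, native):
    """Average and minimum RMSD of predicted models against the native structure."""
    values = [rmsd(m, native) for m in models]
    return sum(values) / len(values), min(values)
```

The pairwise RMSD matrix grows quadratically with the pool size, which is one reason the selection step is only triggered when the two pools together exceed NP.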
The foregoing describes one embodiment of the invention. The invention is clearly not limited to this embodiment and can be practiced with various modifications without departing from its essential spirit.
Claims (1)
1. A protein structure prediction method based on a residue contact map information game mechanism, characterized in that the method comprises the following steps:
1) Input the sequence information of the target protein;
2) Based on the sequence information of the target protein, use the four contact map prediction servers RaptorX, ResPRE, NeBcon and DeepMetaPSICOV to obtain four residue contact information files, and process them into four contact map files named contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, respectively;
3) From the four contact map files contactMapRaptorX, contactMapRespre, contactMapNeBcon and contactMapDeepMetaPSICOV, construct the energy functions EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively, according to the following formula:
where the confidence term is the predicted contact confidence of the $k$-th residue pair $(i, j)$ in the residue contact map contactMapRaptorX, contactMapRespre, contactMapNeBcon or contactMapDeepMetaPSICOV, respectively; the distance term is the actual distance between the residues of the $k$-th pair $(i, j)$ in conformation $C_n$; $d_{con} = 8$ Å is the contact threshold, i.e., the maximum distance at which two residues are considered to be in contact; and the four resulting values are the contact scores of conformation $C_n$ under EnergyRaptorX($C_n$), EnergyResPRE($C_n$), EnergyNeBcon($C_n$) and EnergyPSICOV($C_n$), respectively;
4) acquiring fragment library files from the ROBETTA server according to the target protein sequence, namely a 3-residue fragment library file and a 9-residue fragment library file;
5) setting parameters: setting the initial generation counter g = 0, and setting the population size NP, the crossover factor CR and the maximum number of generations G;
6) population initialization: generating NP initial conformations C_n, n = 1, 2, …, NP, by random fragment assembly;
7) substituting each conformation C_n into the four energy functions EnergyRaptorX(C_n), EnergyResPRE(C_n), EnergyNeBcon(C_n) and EnergyPSICOV(C_n) to obtain four energy values, which form the energy array of C_n;
8.1) setting the initial conformation number N to 0;
8.2) traversing the population and comparing the energy array of each conformation C_n with those of all other conformations C_m, where C_m is any conformation other than the current conformation C_n; if no C_m is better than C_n, that is, no C_m is no worse than C_n on all four energy values and strictly better on at least one, recording C_n as a Pareto-efficient solution;
8.3) placing the Pareto-efficient conformations into a first conformation pool, recording their number as N, and removing the remaining conformations;
9) looping: g = g + 1; if g > G, going to step 15);
10) regarding each conformation C_n, n ∈ {1, 2, 3, …, N}, in the first conformation pool as the target conformation and performing the following operations to generate a mutated conformation:
10.1) randomly generating positive integers n1, n2 and n3 in the range 1 to N such that n1, n2, n3 and n are mutually distinct;
10.2) randomly selecting a 9-residue fragment at some position of conformation C_n1 and using it to replace the fragment at the same position of conformation C_n3; then randomly selecting from conformation C_n2 a 9-residue fragment at a position different from the one selected in C_n1 and using it to replace the fragment at the same position of C_n3; and then performing 3-residue fragment assembly on C_n3 to generate the mutated conformation individual;
11) performing a crossover operation on the mutated conformation to generate a trial conformation, as follows:
11.1) generating a random number rand1 ∈ (0, 1);
11.2) if rand1 ≤ CR, randomly selecting a 3-residue fragment from the target conformation and substituting it into the mutated conformation; otherwise leaving the mutated conformation unchanged;
12) substituting each trial conformation in the second conformation pool into the four energy functions EnergyRaptorX, EnergyResPRE, EnergyNeBcon and EnergyPSICOV to obtain four energy values, which form its energy array;
13) traversing the second conformation pool and retaining the trial conformations that are Pareto-efficient solutions over the whole population, as follows:
13.1) comparing the conformations in the second conformation pool with one another, retaining the Pareto-efficient conformations, and recording their number as N_TBA;
13.2) comparing each conformation in the second conformation pool with the conformations in the first conformation pool:
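13.2.2) if, for all k ∈ {1, 2, 3, 4}, the k-th energy value of the trial conformation is no worse than that of a conformation in the first conformation pool, and for at least one k it is strictly better, replacing that conformation in the first conformation pool with the trial conformation and deleting the trial conformation from the second conformation pool;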
13.2.3) if, for every conformation in the first conformation pool, there exists k ∈ {1, 2, 3, 4} for which the k-th energy value of the trial conformation is better, retaining the trial conformation in the second conformation pool;
13.2.4) updating N_TBA to the number of conformations currently in the second conformation pool;
14) performing a selection operation on the conformations of the first conformation pool and those of the second conformation pool, as follows:
14.1) if the total number of conformations in the first and second conformation pools exceeds the set population size, i.e. N + N_TBA > NP, continuing to step 14.2); otherwise putting the conformations of the second conformation pool into the first conformation pool, emptying the second conformation pool and jumping to step 9);
14.2) introducing RMSD as the conformational similarity index and calculating the RMSD between each conformation and all remaining conformations in the two conformation pools, as shown in formula (5), in which the (x, y, z) coordinates of the atoms of conformation C_i are compared with the (x, y, z) coordinates of the corresponding atoms of any remaining conformation C_j;
14.3) judging conformational similarity from the RMSD values, selecting the NP conformations with the richest diversity, putting them into the first conformation pool, emptying the second conformation pool and going to step 9);
15) outputting the result.
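The formulas of step 3) are not reproduced in this text, so the exact form of the contact score is unknown here. As a purely hypothetical illustration of how a residue contact map can be turned into an energy term, the Python sketch below scores a conformation by subtracting the confidence of every predicted contact whose residues lie within d_con = 8 of each other, so lower values are better. The function names, the (i, j, confidence) contact format and the one-coordinate-per-residue representation are assumptions, not the patent's definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]   # one representative atom per residue (assumed)
Contact = Tuple[int, int, float]     # (i, j, confidence) from a contact map (assumed format)

D_CON = 8.0  # contact threshold d_con from step 3)


def contact_energy(coords: Sequence[Coord], contacts: Sequence[Contact],
                   d_con: float = D_CON) -> float:
    """Hypothetical contact energy: minus the summed confidence of the
    predicted contacts that are satisfied (distance <= d_con) in the conformation."""
    energy = 0.0
    for i, j, confidence in contacts:
        if math.dist(coords[i], coords[j]) <= d_con:
            energy -= confidence
    return energy


def energy_array(coords: Sequence[Coord],
                 contact_maps: Dict[str, Sequence[Contact]]) -> List[float]:
    """One energy value per contact map, in the dict's insertion order (step 7)."""
    return [contact_energy(coords, contacts) for contacts in contact_maps.values()]


# Toy data: residues 0 and 1 are 5 apart, residues 0 and 2 are 20 apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
maps = {
    "RaptorX": [(0, 1, 0.9), (0, 2, 0.5)],
    "ResPRE": [(0, 2, 0.7)],
    "NeBcon": [(0, 1, 0.4)],
    "DeepMetaPSICOV": [],
}
print(energy_array(coords, maps))  # [-0.9, 0.0, -0.4, 0.0]
```

Only the satisfied contacts contribute, so the 0-2 pair (20 apart) adds nothing in either map that lists it.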
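Steps 8.2) and 8.3) keep only the Pareto-efficient conformations of the initial population. A minimal sketch of that filter follows, assuming each conformation is represented only by its four-value energy array and that a lower energy is better; the helper names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if energy array a is no worse than b on every objective and
    strictly better on at least one (lower energy assumed better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(energies: List[Sequence[float]]) -> List[int]:
    """Indices of conformations that no other conformation dominates (step 8.2)."""
    return [n for n, e_n in enumerate(energies)
            if not any(dominates(e_m, e_n)
                       for m, e_m in enumerate(energies) if m != n)]


population = [(1, 2, 3, 4), (2, 3, 4, 5), (0, 5, 5, 5), (1, 2, 3, 4)]
first_pool = pareto_front(population)   # step 8.3: N = len(first_pool)
print(first_pool)  # [0, 2, 3]
```

Identical energy arrays do not dominate each other, so both copies of (1, 2, 3, 4) survive.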
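Steps 10) and 11) resemble differential-evolution mutation and crossover applied to fragment windows. The sketch below is one possible reading, assuming a conformation is a list of per-residue (phi, psi) torsion pairs and that the 3-residue fragment assembly of step 10.2) is simplified to inserting a random candidate from a hypothetical library frag3, where frag3[p] lists the fragments starting at residue p. Real fragment assembly (as in Rosetta) also scores each insertion and accepts or rejects it, which is omitted here.

```python
import random
from typing import List, Tuple

Angles = Tuple[float, float]        # (phi, psi) of one residue (assumed representation)
Conformation = List[Angles]
FragLib = List[List[List[Angles]]]  # hypothetical: frag3[p] = candidate 3-mers starting at p


def copy_window(src: Conformation, dst: Conformation, pos: int, length: int) -> None:
    """Overwrite dst[pos:pos+length] with the same window of src."""
    dst[pos:pos + length] = src[pos:pos + length]


def mutate(pool: List[Conformation], n: int, frag3: FragLib,
           rng: random.Random) -> Conformation:
    """Step 10 sketch: needs at least 4 conformations and 10 residues."""
    n1, n2, n3 = rng.sample([i for i in range(len(pool)) if i != n], 3)  # 10.1
    length = len(pool[n3])
    mutant = list(pool[n3])                    # work on a copy of C_n3
    p1 = rng.randrange(length - 8)             # 9-residue window taken from C_n1
    copy_window(pool[n1], mutant, p1, 9)
    p2 = rng.choice([p for p in range(length - 8) if p != p1])  # other window, from C_n2
    copy_window(pool[n2], mutant, p2, 9)
    p3 = rng.randrange(length - 2)             # simplified 3-residue fragment insertion
    mutant[p3:p3 + 3] = rng.choice(frag3[p3])
    return mutant


def crossover(target: Conformation, mutant: Conformation, cr: float,
              rng: random.Random) -> Conformation:
    """Step 11 sketch: if rand1 <= CR, copy one random 3-residue window of the target."""
    trial = list(mutant)
    if rng.random() <= cr:
        copy_window(target, trial, rng.randrange(len(target) - 2), 3)
    return trial


# Toy pool: conformation i has every residue set to (i, i).
pool = [[(float(i), float(i))] * 20 for i in range(5)]
frag3 = [[[(-1.0, -1.0)] * 3] for _ in range(18)]
rng = random.Random(0)
mutant = mutate(pool, 0, frag3, rng)
trial = crossover(pool[0], mutant, cr=1.0, rng=rng)
```

The sketch builds the mutant from a copy of C_n3, so the first pool is not modified; whether the original method overwrites C_n3 in place is not clear from the text.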
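Step 13) merges the trial conformations into the first pool. The sketch below follows steps 13.1) to 13.2.4) on energy arrays only (the conformations themselves are omitted) and again assumes lower energies are better. When a trial dominates several first-pool conformations it replaces only the first one found; this is an assumption, since the text does not say which one is replaced.

```python
from typing import List, Sequence, Tuple

Energies = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b everywhere and strictly better somewhere (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def update_pools(first: List[Energies], second: List[Energies]
                 ) -> Tuple[List[Energies], List[Energies]]:
    # 13.1: keep only trials that no other trial dominates
    second = [t for i, t in enumerate(second)
              if not any(dominates(u, t) for j, u in enumerate(second) if j != i)]
    first = list(first)
    kept: List[Energies] = []
    for t in second:
        victim = next((i for i, c in enumerate(first) if dominates(t, c)), None)
        if victim is not None:
            # 13.2.2: t dominates a first-pool member: replace it, drop t from pool 2
            first[victim] = t
        elif all(any(x < y for x, y in zip(t, c)) for c in first):
            # 13.2.3: t beats every first-pool member on at least one energy
            kept.append(t)
    return first, kept  # 13.2.4: N_TBA = len(kept)


first_pool = [(5, 5, 5, 5), (1, 9, 9, 9)]
second_pool = [(4, 4, 4, 4), (0, 10, 10, 10), (6, 6, 6, 6)]
first_pool, second_pool = update_pools(first_pool, second_pool)
print(first_pool, second_pool)  # [(4, 4, 4, 4), (1, 9, 9, 9)] [(0, 10, 10, 10)]
```

Here (6, 6, 6, 6) is discarded in 13.1, (4, 4, 4, 4) replaces (5, 5, 5, 5), and (0, 10, 10, 10) stays in the second pool, so N_TBA becomes 1.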
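Steps 14.1) to 14.3) reduce the merged pools to NP diverse conformations, but formula (5) and the exact diversity criterion are not reproduced here. The sketch below assumes the usual coordinate RMSD over matched atoms, computed without structural superposition as a simplification, and uses greedy max-min selection as a stand-in for "selecting the NP conformations with the richest diversity": it starts from the conformation with the largest summed RMSD to the others and repeatedly adds the one whose nearest already-selected conformation is farthest away. Both choices are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Coordinate RMSD over matched atoms, without superposition (simplification)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def select_diverse(pool: List[Sequence[Coord]], np_size: int) -> List[int]:
    """Indices of np_size conformations chosen by greedy max-min RMSD (assumed criterion)."""
    if len(pool) <= np_size:                  # 14.1: no selection needed
        return list(range(len(pool)))
    n = len(pool)
    d = [[rmsd(pool[i], pool[j]) for j in range(n)] for i in range(n)]  # 14.2
    chosen = [max(range(n), key=lambda i: sum(d[i]))]
    while len(chosen) < np_size:              # 14.3
        rest = [i for i in range(n) if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(d[i][j] for j in chosen)))
    return chosen


# Four one-atom toy conformations on the x-axis (first pool + second pool); keep NP = 2.
merged = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(10.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
print(select_diverse(merged, 2))  # [3, 0]
```

The near-duplicate at x = 0.1 is skipped in favour of the two extremes, which is the behaviour a diversity-preserving selection is meant to show.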
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110440653.1A CN113257338A (en) | 2021-04-23 | 2021-04-23 | Protein structure prediction method based on residue contact diagram information game mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113257338A true CN113257338A (en) | 2021-08-13 |
Family
ID=77221402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110440653.1A Withdrawn CN113257338A (en) | 2021-04-23 | 2021-04-23 | Protein structure prediction method based on residue contact diagram information game mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113257338A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114121146A (en) * | 2021-11-29 | 2022-03-01 | 山东建筑大学 | RNA three-level structure prediction method based on parallel and Monte Carlo strategies |
CN114121146B (en) * | 2021-11-29 | 2023-10-03 | 山东建筑大学 | RNA tertiary structure prediction method based on parallel and Monte Carlo strategies |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cristianini et al. | Introduction to computational genomics: a case studies approach | |
CN109524058B (en) | Protein dimer structure prediction method based on differential evolution | |
CN108846256B (en) | Group protein structure prediction method based on residue contact information | |
CN109727637B (en) | Method for identifying key proteins based on mixed frog-leaping algorithm | |
CN113257338A (en) | Protein structure prediction method based on residue contact diagram information game mechanism | |
CN109101785B (en) | Protein structure prediction method based on secondary structure similarity selection strategy | |
CN111048145B (en) | Method, apparatus, device and storage medium for generating protein prediction model | |
CN109378034B (en) | Protein prediction method based on distance distribution estimation | |
Souza et al. | Detecting clustered independent rare variant associations using genetic algorithms | |
CN110444249B (en) | Method for predicting fluorescent protein based on calculation | |
CN110189794B (en) | Residue contact guided loop perturbation population protein structure prediction method | |
CN110610742B (en) | Functional module detection method based on protein interaction network | |
CN109300505B (en) | Protein structure prediction method based on biased sampling | |
CN109243526B (en) | Protein structure prediction method based on specific fragment crossing | |
CN112967751A (en) | Protein conformation space optimization method based on evolution search | |
WO2008134261A2 (en) | A method for protein structure determination, gene identification, mutational analysis, and protein design | |
CN110729023B (en) | Protein structure prediction method based on contact assistance of secondary structure elements | |
Liu et al. | Bayesian methods in biological sequence analysis | |
CN111815036B (en) | Protein structure prediction method based on multi-residue contact map cooperative constraint | |
Majhi et al. | Artificial Intelligence in Bioinformatics | |
CN109658979B (en) | Protein structure prediction method based on fragment library information enhancement | |
CN109461472B (en) | Protein conformation space optimization method based on replica exchange and biased distribution estimation | |
Floden | Alignment uncertainty, regressive alignment and large scale deployment | |
Lee et al. | Development of a library with feature selection algorithm based on microarray gene expression dataset for biomarker identification | |
CN114530194A (en) | Protein structure prediction method based on distance map constraint among multiple residues |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210813 |