CN111968707A - Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method - Google Patents

Info

Publication number
CN111968707A
Authority
CN
China
Prior art keywords
density map
electron density
model
energy
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010789510.7A
Other languages
Chinese (zh)
Other versions
CN111968707B (en
Inventor
张彪
沈红斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010789510.7A
Publication of CN111968707A
Application granted
Publication of CN111968707B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An energy-based multi-objective optimization method for fitting an atomic structure to an electron density map. Starting from the three-dimensional structure of a protein and its electron density map, a reference data set of predicted structures and electron density maps is built; the predicted atomic structure is then preliminarily moved to the center of the density map using information from the electron density map, and N initial models are generated; a Pareto set is selected by a multi-objective particle swarm optimization algorithm, an optimal model is selected from the Pareto set by a knee algorithm, and the fitting result between the atomic structure and the electron density map is computed. The invention addresses the potential bias caused by minimizing only a single energy function.

Description

Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method
Technical Field
The invention relates to a technique in the field of bioinformatics, in particular to a multi-objective optimization fitting prediction method for an atomic structure and an electron density map based on global and local energy.
Background
High-resolution protein structures are important for understanding the function of proteins and the mechanisms of associated diseases. Several methods, such as X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM), have been used to obtain macromolecular structures. In recent years, a large number of cryo-EM density maps have emerged thanks to revolutionary developments in cryo-EM imaging techniques, but these density maps typically have relatively low resolution (the range is given in the source as image BDA0002623253960000011). Although most density maps derived from electron microscopy do not have sufficiently high resolution, they generally describe the molecular topology and can therefore be used to optimize atomic structures. Optimization under density map constraints has become a popular direction in the field of protein structure prediction. The modeling process typically includes three steps: (1) predicting an atomic structure; (2) fitting the atomic structure to the electron density map; (3) optimizing the atomic structure against the electron density map. Fitting the atomic structure to the electron density map is the basis for optimizing the atomic structure, because it reduces the search space of the subsequent optimization. Searching this space is especially difficult for low- or medium-resolution density maps. To obtain high-resolution protein structures, advanced computational techniques are required to make up for the information missing from electron density maps.
At present, several computational methods have been applied successfully to fitting atomic structures into electron microscopy density maps, including EMFIT, Situs, 3SOM, MultiFit, ADP_EM, Attract-EM, EMatch, PowerFit and UCSF Chimera; these programs were developed so that structure-related molecular functions can be interpreted. They typically search automatically over possible rotations and translations to maximize a cross-correlation function and thereby find the best fit. ADP_EM is a multi-resolution docking method that searches rotation space with a fast rotational matching method to maximize the correlation. The colores program of Situs is a contour-based matching method that uses fast Fourier transforms to accelerate the spatial search and quickly locate the atomic structure relative to the density map. EMatch uses template matching to identify secondary structure elements in the electron density map and thereby align the structure. Despite partial success, several problems limit the efficacy of existing fitting algorithms. First, the measure of fitting quality is critical because it guides the search direction. Most existing algorithms use a single global objective (e.g., the correlation coefficient (CC)) as the optimization criterion. The fit between a predicted structure and an electron density map can be very complex, and different fitted positions can result from complicating factors such as the quality of the predicted structure, the resolution of the density map, the signal-to-noise ratio (SNR) of the density map, and the misalignment between the predicted structure and the density map. A single global objective function typically cannot deliver accurate and robust fits under such complications. Second, because the search space is not restricted during fitting, the exhaustive searches performed by existing fitting tools are time-consuming.
Thus, a more heuristic search algorithm will help balance performance and robustness.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an energy-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map, which addresses the potential bias caused by minimizing only a single energy function.
The invention is realized by the following technical scheme:
The invention relates to a multi-objective optimization fitting prediction method for an atomic structure and an electron density map based on energy optimization. Starting from the three-dimensional structure of a protein (PDB) and its electron density map, a reference data set of predicted structures and electron density maps is built; the predicted atomic structure is then preliminarily moved to the center of the density map using information from the electron density map, and N initial models are generated; a Pareto set is selected by a multi-objective particle swarm optimization algorithm, an optimal model is selected from the Pareto set by a knee algorithm, and the fitting result between the atomic structure and the electron density map is computed.
The invention also relates to a system for implementing the method, comprising an initial model generation unit, a multi-objective optimization unit, a model selection unit and a full-atom model output unit, wherein: the initial model generation unit receives the system input, performs initialization and outputs a plurality of initial states to the multi-objective optimization unit; the multi-objective optimization unit receives the output of the initial model generation unit, performs the core optimization and outputs a plurality of optimization results to the model selection unit; the model selection unit receives the output of the multi-objective optimization unit, selects the optimal result and outputs it to the full-atom model output unit; and the full-atom model output unit receives the output of the model selection unit, completes the missing atoms and outputs the full-atom model to the user.
Technical effects
The invention as a whole solves the problem of fitting a predicted structure to an electron density map with high precision. Existing algorithms for fitting atomic structures to electron density maps address a similar problem but suffer from unstable fitting results and low precision.
Compared with single-target optimization, the method can avoid falling into a local optimal value, so that better fitting performance is obtained. In fitting the predicted structure to the density map, the present invention takes into account global and local cross-correlations, which include the correlation of the entire model, the correlation of the amino acid fragments, and the residue level correlation score. Multiobjective optimization allows complementary tradeoffs between multiple objectives to achieve an optimal solution. The best solution will be obtained from the non-dominated Pareto set. Global correlations evaluate the preference of the fit from the global shape, while local correlations evaluate the result of the fit from the local topology, the global and local correlations being complementary to produce the final optimized result.
The invention uses a reference data set of predicted structures and electron density maps (292 predicted structures and 292 electron density maps) that reflects realistic conditions, which makes the fitting task harder, and on it reaches an average cRMSD of 2.46. This is a substantial improvement over the currently popular single-objective methods.
Drawings
FIG. 1 is a schematic diagram of an atomic structure rigid docking algorithm based on an electron density map;
FIG. 2 is a distribution histogram of TM-score of initial structures corresponding to 292 test proteins;
FIG. 3 is a resolution distribution histogram of a simulated density map corresponding to 292 test proteins;
FIG. 4 is a schematic representation of the conversion of atomic structures into a calculated density map;
FIG. 5 is a graph showing the results of a comparison of MOFIT and the other three methods on 292 test proteins;
in the figure: left: MOFIT vs ADP_EM; middle: MOFIT vs PowerFit; right: MOFIT vs Situs;
FIG. 6 is a schematic diagram of the energy distribution of the non-dominated particles in the Pareto set for the target protein T0880;
FIG. 7 is a schematic diagram of the fitting results for the terminal domain of the MADV2 fiber from murine adenovirus 2 (ID: T0880);
in the figure: all gray transparent objects are electron density maps and black objects are atomic structures. (A) The structure at the position of maximum CC (CC value 0.383, A-2) has a cRMSD of [value given as image BDA0002623253960000031]; the structure at the TM-score alignment position (CC value 0.357, A-1) has a cRMSD of 0.0; A-3 is the native PDB model. (B) The different positions are the fitting results of the different methods: the PowerFit model (B-1, [image BDA0002623253960000032]), the Situs model (B-2, [images BDA0002623253960000033, BDA0002623253960000034]), the ADP_EM model (B-3, [image BDA0002623253960000035]) and the MOFIT model (B-4, [image BDA0002623253960000036]). (C) The different positions are the final models obtained by refining the different fitted structures with RosettaMolding: the refined PowerFit model has an RMSD of [image BDA0002623253960000037] and a TM-score of 0.460 (C-1); the refined Situs model has an RMSD of [image BDA0002623253960000038] and a TM-score of 0.465 (C-2); the refined ADP_EM model has an RMSD of [image BDA0002623253960000039] and a TM-score of 0.503 (C-3); the refined MOFIT model has an RMSD of [image BDA00026232539600000310] and a TM-score of 0.504 (C-4);
FIG. 8 is a schematic diagram of the fitting and optimization results for the single-domain protein 3a1iA on a simulated density map at a resolution of [value given as image BDA00026232539600000311];
in the figure: all gray transparent objects are electron density maps and black objects are atomic structures. (A) The structure at the TM-score alignment position (A-1) has a cRMSD of 0.0; A-2 is the native PDB model. (B) The different positions are the different fitting results: the cRMSD of the PowerFit model is [image BDA00026232539600000312], that of the Situs model is [images BDA00026232539600000313, BDA00026232539600000314], that of the ADP_EM model is [image BDA00026232539600000315], and that of the MOFIT model is [image BDA00026232539600000316]. (C) The different positions are the models after optimization with EM-Refiner: the RMSD of the optimized PowerFit model is [image BDA00026232539600000317], that of the optimized Situs model is [image BDA00026232539600000318], that of the optimized ADP_EM model is [image BDA00026232539600000319], and that of the optimized MOFIT model is [image BDA00026232539600000320];
FIG. 9 is a diagram comparing each single objective function with the combined multiple objective functions;
in the figure: each square represents a cRMSD value.
Detailed Description
As shown in FIG. 1, this embodiment relates to an energy-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map. A reference data set of predicted structures and electron density maps is built; the predicted atomic structure is then preliminarily moved to the center of the density map using information from the electron density map, and N initial models are generated; a Pareto set is selected by a multi-objective particle swarm optimization algorithm, an optimal model is selected from the Pareto set by a knee algorithm, and the fitting result between the atomic structure and the electron density map is computed.
The reference data set of predicted structures and electron density maps is constructed as follows (statistics of the data set are shown in FIG. 2 and FIG. 3):
S11, first, all PDB entries that have an electron density map (1809 in total) are extracted from the PDB database, and the 1809 PDB structures are then split into 37952 single-chain PDB structures;
As shown in FIG. 4, the electron density map is obtained by converting the protein structure into a calculated density map: the atomic coordinates are transformed by a coordinate conversion into the grid space in which the density map is located. For a given protein structure containing only Cα atoms with coordinates $x_1, \ldots, x_N$, the density at grid point $y$ is:
$$\rho_c(y) = \sum_{i=1}^{N} C \, e^{-k \lVert y - x_i \rVert^{2}}$$
wherein: $k = \left(\pi / (2.4 + 0.8 R_0)\right)^{2}$ and $C = a \cdot (k/\pi)^{1.5}$; $k$ and $C$ are parameters describing the shape of the Gaussian kernel, $R_0$ is the resolution of the electron density map, and $a$ is the mass of the Cα atom.
S12, sequences with more than 90% redundancy are removed using CD-HIT, leaving 2488 samples. After samples with sequences that are too short or discontinuous are eliminated from these 2488 samples, 1186 samples remain;
S13, 292 sequences are randomly selected from the 1186 samples as the initial samples of the data set, and the corresponding atomic structures are predicted with I-TASSER; FIG. 2 gives the TM-score distribution histogram of the 292 predicted initial protein structures;
S14, from the native PDB structures of the 292 targets, the corresponding noise-free and noisy density maps are simulated with EMAN2 and Xmipp (FIG. 3 shows the resolution distribution histogram of the simulated density maps for the 292 proteins); these density maps and the structures predicted in S13 constitute the reference data set of the method.
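The conversion of a Cα-only structure into a calculated density map, as described above with the Gaussian kernel parameters k and C, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by EMAN2, Xmipp or the inventors: the function name `gaussian_density`, the default mass `a = 1.0` and the brute-force double loop are all hypothetical choices made for clarity.

```python
import math

def gaussian_density(ca_coords, grid_points, resolution, mass=1.0):
    """Calculated density at each grid point y: sum_i C * exp(-k * |y - x_i|^2).

    ca_coords   -- list of (x, y, z) C-alpha coordinates
    grid_points -- list of (x, y, z) grid point coordinates
    resolution  -- R0, the resolution of the density map
    mass        -- a, the mass assigned to each C-alpha atom (assumed value)
    """
    k = (math.pi / (2.4 + 0.8 * resolution)) ** 2
    c = mass * (k / math.pi) ** 1.5
    density = []
    for y in grid_points:
        rho = 0.0
        for x in ca_coords:
            d2 = sum((yj - xj) ** 2 for yj, xj in zip(y, x))
            rho += c * math.exp(-k * d2)
        density.append(rho)
    return density
```

With C = a·(k/π)^1.5 each atom contributes a three-dimensional Gaussian whose integral equals a, so a lower resolution (larger R0) spreads the same mass over a wider region.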
The N initial models are generated by preliminarily moving the predicted atomic structure to the center of the density map using information from the electron density map. Specifically, the grid and origin information of the electron density map is read from the header of the map file, the predicted atomic structure is moved to the center of the electron density map, and random rotations are then applied to generate N initial models at different positions.
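The initial model generation described above can be sketched as follows. The sketch assumes that the header supplies an origin, a grid shape and a single voxel size, that the structure's centroid is what gets placed on the map center, and that each random rotation is built from three Euler angles drawn from [-90°, 90°]; these details and all function names are illustrative assumptions, not taken from the source.

```python
import math
import random

def map_center(origin, grid_shape, voxel_size):
    """Center of the map box from header values (origin, grid counts, voxel size)."""
    return tuple(o + n * voxel_size / 2.0 for o, n in zip(origin, grid_shape))

def centroid(coords):
    n = len(coords)
    return tuple(sum(p[a] for p in coords) / n for a in range(3))

def rotation_matrix(ax, ay, az):
    """Rotation Rz(az) * Ry(ay) * Rx(ax); angles in radians."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def initial_models(coords, center, n_models, rng):
    """Move the structure onto the map center, then apply random rotations about it."""
    c = centroid(coords)
    shifted = [tuple(p[a] - c[a] for a in range(3)) for p in coords]  # centroid at origin
    models = []
    for _ in range(n_models):
        angles = [math.radians(rng.uniform(-90.0, 90.0)) for _ in range(3)]
        rot = rotation_matrix(*angles)
        models.append([
            tuple(sum(rot[r][a] * p[a] for a in range(3)) + center[r] for r in range(3))
            for p in shifted
        ])
    return models
```

Because every rotation is applied about the centroid, all N models share the same center and differ only in orientation, which matches the role of the initial swarm in the optimization that follows.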
The multi-objective particle swarm optimization algorithm specifically comprises the following steps:
Step 1: the model predicted by I-TASSER is first moved to the center of the density map and then randomly rotated to generate N initial structures at different positions. In the MOPSO optimization, the structure at each position is treated as one particle. The i-th particle is represented by the two-component vector $C = [t, r]$, where $t$ and $r$ are the rigid-body translation and the rotation matrix, which lie in [range given as image BDA0002623253960000042] and [-90°, 90°], respectively.
Step 2: in each simulation iteration, the position of each model is updated by rigid body translation and rotation. In each simulation, three energy functions are calculated as objective functions according to the conformational coordinates, and the non-dominated solution with at least two energy functions decreasing is put into the Pareto set.
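The Pareto set maintenance in Step 2 can be illustrated with a standard non-dominance filter over three energies that are all minimized. This sketch uses the textbook definition of Pareto dominance; it does not reproduce the source's additional acceptance rule ("at least two energy functions decreasing"), and the function names are hypothetical.

```python
def dominates(a, b):
    """True if energy tuple a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates (minimization)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other is not s)
    ]
```

For example, among the energy triples (1, 2, 3), (2, 3, 4) and (3, 1, 2), the second is dominated by the first, while the first and third trade off against each other and both remain in the set.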
The three energy functions evaluate the fitting quality on the global state and on the local structure, and are defined as follows.
The first is the correlation energy function between the density map $\rho_c(y)$ converted from the atomic structure and the experimental density map $\rho_o(y)$, which evaluates the fit over the entire structure:
[equation given as image BDA0002623253960000051]
wherein: [symbols given as images BDA0002623253960000052 and BDA0002623253960000053] are the mean values over the grid points of the converted density map and of the experimental density map, respectively. RB(l) is the set of all grid points. CC is a global score that is very sensitive to the shape of the density map.
The second is the energy function of local correlation:
[equation given as image BDA0002623253960000054]
wherein: CC(i) is the correlation between the grid points occupied by amino acid i and the corresponding grid points in the experimental density map; $L_{ali}$ is the set of all amino acids with a correlation greater than 0; L is the number of amino acids in the atomic structure. This scoring function represents the correlation between local structures.
The third is the correlation energy function evaluated over all amino acid fragments with a correlation greater than 0:
[equation given as image BDA0002623253960000055]
wherein:
[equation given as image BDA0002623253960000056]
f(i) = 1 when the correlation of amino acid i is greater than 0; i_up denotes the consecutive amino acids with higher residue numbers than i whose correlations are greater than 0; i_down denotes the consecutive amino acids with lower residue numbers than i whose correlations are greater than 0; $L_{ali}$ is the set of all amino acid fragments with a CV(i) greater than 5. This scoring function represents the similarity between structural fragments.
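The global correlation CC used by the first energy function can be sketched as a mean-centered cross-correlation between the calculated and experimental densities over the same grid points. The exact formula is only available as an image in the source, so the Pearson form below, the flat-list representation of the maps and the function name are assumptions; how CC is turned into an energy to be minimized is also not shown here.

```python
import math

def cross_correlation(rho_c, rho_o):
    """Mean-centered correlation between two densities sampled on the same grid points."""
    n = len(rho_c)
    mean_c = sum(rho_c) / n
    mean_o = sum(rho_o) / n
    num = sum((c - mean_c) * (o - mean_o) for c, o in zip(rho_c, rho_o))
    den = math.sqrt(
        sum((c - mean_c) ** 2 for c in rho_c) * sum((o - mean_o) ** 2 for o in rho_o)
    )
    return num / den if den > 0.0 else 0.0
```

The same function could be applied to the grid points occupied by a single amino acid to obtain a per-residue value in the spirit of CC(i), again as an assumption rather than the documented procedure.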
The position of each model is updated as follows:
[equation given as image BDA0002623253960000057]
wherein: [symbol given as image BDA0002623253960000058] is the displacement of model i in the k-th iteration, and [symbol given as image BDA0002623253960000059] is the new conformation of the i-th model at the k-th iteration. ω is the inertia weight factor; based on tests of the method, its value decreases linearly from 1.5 to 0.5 with the number of iterations. γ is a random number in [0, 1] used to introduce perturbation. $c_1$ and $c_2$ are constants. [symbol given as image BDA00026232539600000510] is the best conformation of model i in the previous iterations, and [symbol given as image BDA00026232539600000511] is a non-dominated solution randomly chosen from the Pareto solution set.
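The update can be illustrated with the standard particle swarm form, in which the new displacement combines the previous displacement (scaled by ω), a pull toward the particle's own best conformation and a pull toward a non-dominated solution drawn from the Pareto set. The source gives the actual equation only as an image, so this form is an assumption; the values `c1 = c2 = 2.0`, the flat parameter vector per particle and the function names are likewise illustrative.

```python
import random

def inertia_weight(k, n_iter, w_start=1.5, w_end=0.5):
    """Inertia weight decreasing linearly from w_start to w_end over n_iter iterations."""
    if n_iter <= 1:
        return w_end
    return w_start - (w_start - w_end) * k / (n_iter - 1)

def pso_step(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """One standard PSO update of a particle's rigid-body parameters.

    x, v    -- current parameters and displacement of the particle
    p_best  -- best parameters this particle has visited so far
    g_best  -- a non-dominated solution picked at random from the Pareto set
    """
    new_v = [
        w * vi + c1 * rng.random() * (pb - xi) + c2 * rng.random() * (gb - xi)
        for xi, vi, pb, gb in zip(x, v, p_best, g_best)
    ]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

A large ω early on favors exploration of translations and rotations, while the smaller ω in later iterations lets particles settle near the guides taken from the Pareto set.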
Step 3: all models in the Pareto set are ranked, and the optimal conformation is selected as the final structure.
The optimal model is selected from the Pareto set as follows: the Pareto solutions are ranked by their knee score using a knee algorithm, and the model corresponding to the knee point is selected as the final solution.
After the Pareto solutions are obtained, each Pareto solution is projected into a three-dimensional energy space with energy axes E1, E2, E3. The knee solution on the Pareto front is searched for by a marginal utility method, specifically: $U(x, \lambda) = \lambda_1 f_1(x) + \lambda_2 f_2(x) + \lambda_3 f_3(x)$, subject to $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and $\lambda_1, \lambda_2, \lambda_3 > 0$, wherein x is a non-dominated solution in the Pareto solution set and $\lambda_1, \lambda_2, \lambda_3$ are weighting factors. The utility of each solution can be computed from randomly sampled values of $\lambda_1, \lambda_2, \lambda_3$. The solution with the greatest marginal utility (the knee point) is the final output solution.
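One common way to turn the utility function above into a knee selection is to estimate an expected marginal utility by sampling: for each random weight vector, the solution with the lowest utility is credited with the gap to the second-best solution, and the solution with the largest accumulated credit is taken as the knee. The sketch below follows that scheme as an assumption about the details the source leaves open; the sampling of λ by normalizing uniform random numbers, the sample count and the function name are hypothetical.

```python
import random

def knee_by_marginal_utility(front, n_samples=1000, rng=None):
    """Return the index of the solution in `front` with the largest expected marginal utility.

    front -- list of energy tuples (f1, f2, f3), all to be minimized
    """
    rng = rng or random.Random(0)
    if len(front) == 1:
        return 0
    m = len(front[0])
    totals = [0.0] * len(front)
    for _ in range(n_samples):
        # Random positive weights normalized so that they sum to 1.
        raw = [rng.random() + 1e-12 for _ in range(m)]
        s = sum(raw)
        lam = [r / s for r in raw]
        utils = [sum(l * f for l, f in zip(lam, sol)) for sol in front]
        order = sorted(range(len(front)), key=utils.__getitem__)
        best, second = order[0], order[1]
        # Marginal utility: how much worse the best alternative would be.
        totals[best] += utils[second] - utils[best]
    return max(range(len(front)), key=totals.__getitem__)
```

A solution that is best for most weightings, and by a clear margin, accumulates the most credit; extreme solutions that win only under very lopsided weights accumulate little.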
The final fitting result is completed into a full-atom model with the atom-completion software PULCHRA: its input is the main-chain atoms, and the full structure is built from statistical bond lengths, bond angles and dihedral angles and then output.
The method calculates the cRMSD of the fitted model:
[equation given as image BDA0002623253960000061]
wherein: L is the length of the protein sequence; [symbol given as image BDA0002623253960000062] denotes the coordinates of the predicted structure aligned to the native structure using the TM-score alignment program; [symbol given as image BDA0002623253960000063] denotes the coordinates of the corresponding atoms of the predicted structure after it has been fitted to the density map using MOFIT. The experimental results are as follows:
Table 1. Statistics on the 292 test proteins.
[Table 1 is given as image BDA0002623253960000064 in the source]
In Table 1: cRMSD (a) is the cRMSD between the structure aligned by TM-score and the structure placed by the fitting program; p-value (b) is the result of Student's t-test on the cRMSD between the conformations from MOFIT and from the other fitting programs; Better cases (c) is the number of targets for which the MOFIT fitting result is better than that of the other method.
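The cRMSD reported in Table 1 compares two placements of the same predicted structure, so no superposition is performed before measuring the deviation. A minimal sketch, assuming one coordinate per residue (for example the Cα atom) and a root-mean-square form over the L residues; the equation itself is only available as an image, and the function name is hypothetical:

```python
import math

def crmsd(coords_ref, coords_fit):
    """Root-mean-square distance between corresponding atoms, without re-superposition.

    coords_ref -- predicted structure placed by alignment to the native structure
    coords_fit -- the same predicted structure as placed by the fitting program
    """
    if len(coords_ref) != len(coords_fit) or not coords_ref:
        raise ValueError("both coordinate lists must be non-empty and of equal length")
    total = sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(coords_ref, coords_fit)
    )
    return math.sqrt(total / len(coords_ref))
```

A value of 0.0 means the fitting program placed the structure exactly where the alignment to the native structure puts it, which is why the TM-score alignment position serves as the reference in FIG. 7 and FIG. 8.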
FIGS. 5-8 show the test results on specific proteins. The method is compared with the currently most popular algorithms (ADP_EM, Situs, PowerFit), and the comparison shows the advantage of multi-objective optimization over single-objective optimization when fitting a protein structure to a density map.
FIG. 9 compares each individual objective function with the combined multiple objective functions.
As can be seen from FIGS. 5-8 and 9, the method performs better on a large number of proteins. Table 1 gives the statistics on the 292 test proteins. Columns 2-3 of Table 1 give the results on the 292 noise-free density maps. On the noise-free data set, MOFIT achieved a cRMSD of 2.46, which is lower than that of ADP_EM, Situs and PowerFit by [values given as images BDA0002623253960000065 and BDA0002623253960000066], respectively, with p-values of $3.51\times10^{-3}$, $6.35\times10^{-4}$ and $7.92\times10^{-9}$. This indicates that on noise-free density maps MOFIT outperforms the other three methods and that the difference is statistically significant. Columns 4-5 of Table 1 give the results on the noisy data set. There MOFIT obtains a result worse than in the noise-free case ([value given as image BDA0002623253960000067]), but its cRMSD is again lower than that of the other three methods, by [values given as image BDA0002623253960000068], with p-values of $1.19\times10^{-4}$, $1.11\times10^{-3}$ and $1.11\times10^{-89}$. This shows that MOFIT is also superior to the other methods on noisy data. Table 1 also gives the number of proteins on which MOFIT outperforms the other methods. On the noise-free data, the number of proteins for which the cRMSD of MOFIT is smaller than that of ADP_EM, Situs and PowerFit is 188, 162 and 262, respectively; on the noisy data it is 200, 148 and 286, respectively.
The method uses three purpose-designed global and local cross-correlation energy functions (the correlation of the whole model, the correlation of amino acid fragments, and the residue-level correlation) together with multi-objective optimization. The optima of the individual objectives are complementary, and trading them off across the objectives leads to an optimal solution. On the data set assembled for the invention, the method obtains [value given as image BDA0002623253960000069], an average cRMSD lower than that of the current methods ADP_EM, Situs and PowerFit by [values given as image BDA0002623253960000071], respectively, with p-values of $3.51\times10^{-3}$, $6.35\times10^{-4}$ and $7.92\times10^{-9}$. This demonstrates that the invention is statistically significantly better than the existing methods.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (10)

1. A multi-objective optimization fitting prediction method for an atomic structure and an electron density map based on energy optimization, characterized in that, starting from the three-dimensional structure of a protein and its electron density map, a reference data set of predicted structures and electron density maps is built; the predicted atomic structure is then preliminarily moved to the center of the density map using information from the electron density map, and N initial models are generated; a Pareto set is selected by a multi-objective particle swarm optimization algorithm, an optimal model is selected from the Pareto set by a knee algorithm, and the fitting result between the atomic structure and the electron density map is computed.
2. The multi-objective optimization fitting prediction method for an atomic structure and an electron density map based on energy optimization as claimed in claim 1, wherein building the reference data set of predicted structures and electron density maps specifically comprises:
S11, first extracting all PDB entries that have an electron density map from the PDB database, and then splitting the 1809 PDB structures into 37952 single-chain PDB structures;
S12, removing sequences with more than 90% redundancy using CD-HIT, leaving 2488 samples, and removing samples that are too short or discontinuous from the 2488 samples, leaving 1186 samples;
S13, randomly selecting 292 sequences from the 1186 samples as initial samples, and predicting the corresponding atomic structures with I-TASSER;
S14, simulating the corresponding noise-free and noisy density maps from the native PDB structures of the 292 targets using EMAN2 and Xmipp, wherein the noise-free density maps, the noisy density maps and the structures predicted in step S13 form the reference data set of the method.
3. The multi-objective optimization fitting prediction method for an atomic structure and an electron density map based on energy optimization as claimed in claim 1, wherein the initial models are generated by preliminarily moving the predicted atomic structure to the center of the density map using information from the electron density map, specifically: reading the grid and origin information of the electron density map from the header of the map file, moving the predicted atomic structure to the center of the electron density map, and then applying random rotations to generate N initial models at different positions.
4. The energy optimization-based multi-objective optimization fitting prediction method for the atomic structure and electron density map based on the energy optimization as claimed in claim 1, wherein the multi-objective particle swarm optimization algorithm specifically comprises:
Step 1: first, the model predicted by I-TASSER is translated to the center of the density map and then randomly rotated to generate N initial structures at different positions; in the MOPSO optimization, the structure at each position is treated as a particle; the i-th particle is represented by the two-dimensional vector C = [t, r], where t and r denote the rigid-body translation and the rotation matrix, respectively, with ranges
Figure FDA0002623253950000011
and [-90°, 90°];
Step 2: in each simulation iteration, the position of each model is updated by rigid-body translation and rotation; in each iteration, three energy functions are computed from the coordinates of the conformation and used as objective functions, and the non-dominated solutions for which at least two energy functions decrease are added to the Pareto set;
Step 3: all models in the Pareto set are ranked, and the optimal conformation is selected as the final structure.
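The Pareto bookkeeping in Step 2 can be illustrated with a short sketch. It assumes that all three energies are minimized and that each candidate is summarized as a tuple of three energy values; the function names are hypothetical, and `improves_at_least_two` is only one possible reading of "at least two energy functions decreasing".

```python
def dominates(a, b):
    """True if energy vector a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def improves_at_least_two(old, new):
    """One reading of the claim: at least two of the three energies decreased."""
    return sum(n < o for o, n in zip(old, new)) >= 2

def update_pareto_set(pareto, candidate):
    """Add candidate to the archive and keep only mutually non-dominated entries."""
    if any(dominates(p, candidate) or p == candidate for p in pareto):
        return pareto  # candidate is dominated or already present
    survivors = [p for p in pareto if not dominates(candidate, p)]
    return survivors + [candidate]
```

A caller would evaluate the three energies after each rigid-body move and pass the resulting tuple to `update_pareto_set`, optionally gating the call with `improves_at_least_two`.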
5. The energy-based atomic structure and electron density map multi-objective optimization fitting prediction method according to claim 4, wherein the three energy functions evaluate the fitting quality in terms of both the global state and the local structure, and are respectively defined as:
(1) the correlation energy function between the density map $\rho_c(y)$ converted from the atomic structure and the experimental density map $\rho_o(y)$, which evaluates the fit over the entire structure, specifically:
Figure FDA0002623253950000021
wherein:
Figure FDA0002623253950000022
and
Figure FDA0002623253950000023
are the mean values over the grid points of the converted density map and the experimental density map, respectively; RB(l) is the set of all grid points; CC is a global score that is highly sensitive to the shape of the density map;
(2) the local correlation energy function:
Figure FDA0002623253950000024
wherein: CC(i) is the correlation between the grid points occupied by amino acid i and the corresponding grid points in the experimental density map; $L_{ali}$ is the set of all amino acids with a correlation greater than 0; L is the length of the atomic structure in amino acids; this second scoring function represents the correlation between local structures;
(3) the correlation energy function evaluated over all amino acid fragments with a correlation greater than 0:
Figure FDA0002623253950000025
wherein:
Figure FDA0002623253950000026
when the correlation of amino acid i is greater than 0, f(i) = 1; i_up denotes all amino acids with a larger residue number than i whose correlations are consecutively greater than 0; i_down denotes all amino acids with a smaller residue number than i whose correlations are consecutively greater than 0; $L_{ali}$ is the set of all amino acid fragments with CV(i) greater than 5; this scoring function represents the similarity between fragments of the structures.
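The first two scores of claim 5 can be illustrated with a small sketch. The exact formulas are given only in the patent figures, so the code below makes two assumptions: the global score is taken to be a standard Pearson correlation over grid-point densities, and the local score is taken to be the sum of the positive per-residue correlations divided by the chain length L. The function names are hypothetical, and any sign convention or offset that turns these correlations into energies is left out.

```python
import math

def pearson_cc(calc, obs):
    """Pearson correlation between two equally long lists of grid-point densities."""
    n = len(calc)
    mean_c = sum(calc) / n
    mean_o = sum(obs) / n
    cov = sum((c - mean_c) * (o - mean_o) for c, o in zip(calc, obs))
    var_c = sum((c - mean_c) ** 2 for c in calc)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_c == 0.0 or var_o == 0.0:
        return 0.0  # a flat density carries no shape information
    return cov / math.sqrt(var_c * var_o)

def local_cc_score(residue_calc, residue_obs):
    """Assumed local score: sum of positive per-residue CC(i) divided by chain length L."""
    ccs = [pearson_cc(c, o) for c, o in zip(residue_calc, residue_obs)]
    positive = [cc for cc in ccs if cc > 0.0]  # plays the role of the set L_ali
    return sum(positive) / len(ccs)
```

Here `residue_calc[i]` and `residue_obs[i]` hold the converted and experimental densities at the grid points assigned to amino acid i; how grid points are assigned to residues is not specified in this sketch.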
6. The energy-based atomic structure and electron density map multi-objective optimization fitting prediction method according to claim 1, wherein the updating of the position of each model specifically comprises:
Figure FDA0002623253950000027
wherein:
Figure FDA0002623253950000028
denotes the magnitude of the movement of model i in the k-th iteration,
Figure FDA0002623253950000029
denotes the new position of the i-th model at the k-th iteration; ω is the inertia weight factor, which, according to tests of the method, decreases linearly from 1.5 to 0.5 as the number of iterations increases; γ takes values in [0, 1] and introduces a perturbation factor; $c_1$ and $c_2$ are often set as
Figure FDA00026232539500000210
is the best conformation of model i in the previous iterations,
Figure FDA00026232539500000211
is a non-dominated solution randomly chosen from the Pareto solution set.
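The update in claim 6 has the shape of a standard particle swarm step, which can be sketched as below. The exact formula is in the patent figure, so this follows the textbook PSO velocity and position update as an assumption. The values of `c1` and `c2` are not recoverable from the text and are therefore left as required arguments; the particle is treated as a flat list of rigid-body parameters, and all function names are hypothetical.

```python
import random

def inertia_weight(k, n_iter, w_start=1.5, w_end=0.5):
    """Inertia weight decreasing linearly from w_start (k = 0) to w_end (k = n_iter - 1)."""
    if n_iter <= 1:
        return w_start
    return w_start - (w_start - w_end) * k / (n_iter - 1)

def pso_step(x, v, pbest, leader, k, n_iter, c1, c2, rng):
    """One textbook PSO update of position x and velocity v (assumed form of the claim)."""
    w = inertia_weight(k, n_iter)
    g1, g2 = rng.random(), rng.random()  # perturbation factors drawn from [0, 1)
    new_v = [w * vi + c1 * g1 * (pb - xi) + c2 * g2 * (ld - xi)
             for xi, vi, pb, ld in zip(x, v, pbest, leader)]
    new_x = [xi + nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v
```

In the multi-objective setting, `leader` would be a non-dominated solution drawn at random from the Pareto set, and `pbest` the best conformation found so far by the same model.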
7. The energy-based atomic structure and electron density map multi-objective optimization fitting prediction method as claimed in claim 1, wherein the optimal model is selected from the Pareto set as follows: the Pareto solutions are ranked using a knee algorithm, and the model corresponding to the knee point is selected as the final solution; specifically, the Pareto solutions are ranked by knee score, and the knee point is then selected as the final solution.
8. The energy-based atomic structure and electron density map multi-objective optimization fitting prediction method as claimed in claim 7, wherein after the Pareto solutions are obtained, each Pareto solution is projected into a three-dimensional energy space with energy axes E1, E2, E3;
the knee solution on the Pareto front is searched for by a marginal utility method, specifically: $U(x, \lambda) = \lambda_1 f_1(x) + \lambda_2 f_2(x) + \lambda_3 f_3(x)$, s.t. $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and $\lambda_1, \lambda_2, \lambda_3 > 0$, wherein: x is a non-dominated solution in the Pareto solution set and $\lambda_1, \lambda_2, \lambda_3$ are weighting factors; the utility of each solution is computed with random values of $\lambda_1, \lambda_2, \lambda_3$, and the solution with the greatest marginal utility, i.e., the knee point, is taken as the final output solution.
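The knee selection in claims 7 and 8 can be sketched with an expected-marginal-utility estimate. This is an illustration under stated assumptions rather than the patented procedure: energies are minimized, so a lower utility U(x, λ) is better; the weights are sampled uniformly from the simplex; and the marginal utility credited to the best solution for a given λ is the gap to the second-best solution. The function names and the `n_samples` parameter are hypothetical.

```python
import random

def utility(energies, lam):
    """Linear utility U(x, lambda) = sum_j lambda_j * f_j(x); lower is better here."""
    return sum(l * e for l, e in zip(lam, energies))

def random_weights(rng, dim=3):
    """Positive weights summing to 1, sampled uniformly from the simplex."""
    raw = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(raw)
    return [r / total for r in raw]

def knee_index(pareto, n_samples=2000, seed=0):
    """Index of the solution with the largest accumulated marginal utility."""
    if len(pareto) == 1:
        return 0
    rng = random.Random(seed)
    marginal = [0.0] * len(pareto)
    for _ in range(n_samples):
        lam = random_weights(rng)
        ranked = sorted(range(len(pareto)), key=lambda i: utility(pareto[i], lam))
        best, second = ranked[0], ranked[1]
        # Credit the best solution with the loss incurred if it were removed.
        marginal[best] += utility(pareto[second], lam) - utility(pareto[best], lam)
    return max(range(len(pareto)), key=lambda i: marginal[i])
```

For a front such as `[(0, 10, 10), (10, 0, 10), (10, 10, 0), (1, 1, 1)]`, the balanced solution `(1, 1, 1)` wins for almost every weight vector and is returned as the knee.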
9. The energy-based atomic structure and electron density map multi-objective optimization fitting prediction method according to claim 1, wherein the fitting result is completed into a full-atom model by the atom-completion software PULCHRA, whose input is the main-chain atoms and which completes and outputs the full structure using statistical bond lengths, bond angles, and dihedral angles.
10. A system for implementing the method of any of claims 1-9, comprising: an initial model generation unit, a multi-objective optimization unit, a model selection unit, and a full-atom model output unit, wherein: the initial model generation unit receives the system input, performs initialization, and outputs a plurality of initial states to the multi-objective optimization unit; the multi-objective optimization unit receives the output of the initial model generation unit, performs the core optimization, and outputs a plurality of optimized results to the model selection unit; the model selection unit receives the output of the multi-objective optimization unit, selects the optimal result, and outputs the optimal solution to the full-atom model output unit; and the full-atom model output unit receives the output of the model selection unit, performs atom completion, and outputs the full-atom model to the user.
CN202010789510.7A 2020-08-07 2020-08-07 Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method Active CN111968707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789510.7A CN111968707B (en) 2020-08-07 2020-08-07 Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010789510.7A CN111968707B (en) 2020-08-07 2020-08-07 Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method

Publications (2)

Publication Number Publication Date
CN111968707A (en) 2020-11-20
CN111968707B (en) 2022-06-17

Family

ID=73365910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789510.7A Active CN111968707B (en) 2020-08-07 2020-08-07 Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method

Country Status (1)

Country Link
CN (1) CN111968707B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035268A (en) * 2021-04-09 2021-06-25 上海交通大学 Protein structure optimization method based on multi-objective decomposition optimization strategy
CN113990384A (en) * 2021-08-12 2022-01-28 清华大学 Deep learning-based frozen electron microscope atomic model structure building method and system and application
CN114612501A (en) * 2022-02-07 2022-06-10 清华大学 Neural network model training method and cryoelectron microscope density map resolution estimation method
CN114841898A (en) * 2022-06-29 2022-08-02 华中科技大学 Deep learning-based post-processing method and device for three-dimensional density map of cryoelectron microscope
CN115035947A (en) * 2022-06-10 2022-09-09 水木未来(北京)科技有限公司 Protein structure modeling method and device, electronic device and storage medium
CN115083513A (en) * 2022-06-21 2022-09-20 华中科技大学 Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image
CN115239999A (en) * 2022-07-22 2022-10-25 水木未来(北京)科技有限公司 Protein electron density map processing method, device, electronic apparatus and storage medium
CN117495434A (en) * 2023-12-25 2024-02-02 天津大学 Electric energy demand prediction method, model training method, device and electronic equipment
CN117995317A (en) * 2024-04-03 2024-05-07 北京云庐科技有限公司 Method, device and medium for estimating heavy atom position based on electron density map
CN117995317B (en) * 2024-04-03 2024-06-21 北京云庐科技有限公司 Method, device and medium for estimating heavy atom position based on electron density map

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103161A1 (en) * 2015-10-13 2017-04-13 The Governing Council Of The University Of Toronto Methods and systems for 3d structure estimation
US20170329892A1 (en) * 2016-05-10 2017-11-16 Accutar Biotechnology Inc. Computational method for classifying and predicting protein side chain conformations
CN107657311A (en) * 2017-11-03 2018-02-02 电子科技大学 Test method for optimizing based on multi-objective particle swarm algorithm
CN110582301A (en) * 2016-12-14 2019-12-17 利甘达尔股份有限公司 Methods and compositions for nucleic acid and protein payload delivery

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103161A1 (en) * 2015-10-13 2017-04-13 The Governing Council Of The University Of Toronto Methods and systems for 3d structure estimation
US20170329892A1 (en) * 2016-05-10 2017-11-16 Accutar Biotechnology Inc. Computational method for classifying and predicting protein side chain conformations
CN110582301A (en) * 2016-12-14 2019-12-17 利甘达尔股份有限公司 Methods and compositions for nucleic acid and protein payload delivery
CN107657311A (en) * 2017-11-03 2018-02-02 电子科技大学 Test method for optimizing based on multi-objective particle swarm algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUO YIN ET AL.: "Clustering Enhancement of Noisy Cryo-Electron Microscopy Single-Particle Images with a Network Structural Similarity Metric", 《JOURNAL OF CHEMICAL INFORMATION AND MODELING》 *
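初环宇 (CHU HUANYU): "Research on Several Problems in Protein Design and Structure Simulation", 《China Doctoral and Master's Dissertations Full-text Database (Doctoral), Basic Sciences》 *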

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035268A (en) * 2021-04-09 2021-06-25 上海交通大学 Protein structure optimization method based on multi-objective decomposition optimization strategy
CN113990384A (en) * 2021-08-12 2022-01-28 清华大学 Deep learning-based frozen electron microscope atomic model structure building method and system and application
CN113990384B (en) * 2021-08-12 2024-04-30 清华大学 Deep learning-based method, system and application for constructing atomic model structure of frozen electron microscope
CN114612501A (en) * 2022-02-07 2022-06-10 清华大学 Neural network model training method and cryoelectron microscope density map resolution estimation method
CN114612501B (en) * 2022-02-07 2024-02-13 清华大学 Neural network model training method and frozen electron microscope density map resolution estimation method
CN115035947B (en) * 2022-06-10 2023-03-10 水木未来(北京)科技有限公司 Protein structure modeling method and device, electronic device and storage medium
CN115035947A (en) * 2022-06-10 2022-09-09 水木未来(北京)科技有限公司 Protein structure modeling method and device, electronic device and storage medium
CN115083513B (en) * 2022-06-21 2023-03-10 华中科技大学 Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image
CN115083513A (en) * 2022-06-21 2022-09-20 华中科技大学 Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image
CN114841898A (en) * 2022-06-29 2022-08-02 华中科技大学 Deep learning-based post-processing method and device for three-dimensional density map of cryoelectron microscope
CN115239999A (en) * 2022-07-22 2022-10-25 水木未来(北京)科技有限公司 Protein electron density map processing method, device, electronic apparatus and storage medium
CN115239999B (en) * 2022-07-22 2023-04-21 水木未来(北京)科技有限公司 Protein electron density map processing method, device, electronic equipment and storage medium
CN117495434A (en) * 2023-12-25 2024-02-02 天津大学 Electric energy demand prediction method, model training method, device and electronic equipment
CN117495434B (en) * 2023-12-25 2024-04-05 天津大学 Electric energy demand prediction method, model training method, device and electronic equipment
CN117995317A (en) * 2024-04-03 2024-05-07 北京云庐科技有限公司 Method, device and medium for estimating heavy atom position based on electron density map
CN117995317B (en) * 2024-04-03 2024-06-21 北京云庐科技有限公司 Method, device and medium for estimating heavy atom position based on electron density map

Also Published As

Publication number Publication date
CN111968707B (en) 2022-06-17

Similar Documents

Publication Publication Date Title
CN111968707B (en) Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method
Zhang et al. Bag of freebies for training object detection neural networks
EP2026279B1 (en) Method and system for aligning three-dimensional surfaces
Hurtado et al. Deep transfer learning in the assessment of the quality of protein models
CN108846256B (en) Group protein structure prediction method based on residue contact information
Ioerger et al. Automatic modeling of protein backbones in electron-density maps via prediction of Cα coordinates
CN111429481B (en) Target tracking method, device and terminal based on adaptive expression
Carr et al. Scalable contour tree computation by data parallel peak pruning
Purnell et al. Rapid synthesis of cryo-et data for training deep learning models
CN109346128B (en) Protein structure prediction method based on residue information dynamic selection strategy
Kofler et al. Kd-tree based n-body simulations with volume-mass heuristic on the GPU
CN112991402B (en) Cultural relic point cloud registration method and system based on improved differential evolution algorithm
Liu et al. Wang-Landau sampling in face-centered-cubic hydrophobic-hydrophilic lattice model proteins
He et al. Protein structural model selection based on protein-dependent scoring function
CN109360600B (en) Protein structure prediction method based on residue characteristic distance
Chen et al. SEGEM: A fast and accurate automated protein backbone structure modeling method for cryo-EM
Makino et al. High-order description of the dynamics in FFAGs and related accelerators
West et al. A robust fitness function and genetic algorithm to morphologically constrain the dynamics of interacting galaxies
CN112884653B (en) Broken block splicing method and system for terracotta soldiers and horses based on fracture surface information
Matsumoto A new approach for building an atomic model from a three-dimensional electron microscopy data
Martino et al. Parallel algorithms in molecular biology
Ülker et al. Gravitational Search Algorithm for NURBS Curve Fitting
Albertsson et al. Towards Fast Displaced Vertex Finding
Mi et al. GDFold2: a fast and parallelizable protein folding environment with freely defined objective functions
CN113077851A (en) Crystal structure prediction method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant