CN111968707A - Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method - Google Patents
- Publication number
- CN111968707A (CN202010789510.7A)
- Authority
- CN
- China
- Prior art keywords
- density map
- electron density
- model
- energy
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Crystallography & Structural Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An energy-based multi-objective optimization method for fitting an atomic structure to an electron density map. A reference data set of predicted structures and electron density maps is built from three-dimensional protein structures and electron density maps; the predicted atomic structure is then moved to the center of the density map using the information in the electron density map, and N initial models are generated; a Pareto set is selected by a multi-objective particle swarm optimization algorithm, an optimal model is selected from the Pareto set by a knee algorithm, and the fitting result between the atomic structure and the electron density map is computed. The invention can solve the potential bias problem caused by minimizing only a single energy function.
Description
Technical Field
The invention relates to a technology in the field of bioinformatics, in particular to a multi-objective optimization fitting prediction method, based on global and local energy, for an atomic structure and an electron density map.
Background
High-resolution protein structures are important for understanding protein function and the mechanisms of associated diseases. Several methods, such as X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM), have been used to obtain macromolecular structures. In recent years, a large number of cryo-EM density maps have emerged owing to revolutionary developments in cryo-EM imaging techniques, but these density maps typically have relatively low resolution. Although most EM-derived density maps are not of sufficiently high resolution, they can generally describe the molecular topology and can therefore be used to optimize atomic structures. Optimization under density map constraints has become a popular direction in the field of protein structure prediction. The modeling process typically includes three steps: (1) predicting an atomic structure; (2) fitting the atomic structure to the electron density map; (3) optimizing the atomic structure against the electron density map. Fitting the atomic structure to the electron density map is the basis for optimizing the atomic structure, because it reduces the search space of the subsequent optimization. Searching this space is especially difficult for low- and medium-resolution density maps. To obtain high-resolution protein structures, advanced computational techniques are required to make up for the information missing from the electron density maps.
At present, several computational methods have been successfully applied to fitting atomic structures into electron microscopy density maps, for example EMFIT, Situs, 3SOM, MultiFit, ADP_EM, Attract-EM, EMatch, PowerFit and UCSF Chimera; these programs make it possible to interpret structure-related molecular function. They typically search automatically over possible rotations and translations to maximize a cross-correlation function and thereby find the best fit. ADP_EM is a multi-resolution docking method that searches rotation space with a fast rotational matching method to maximize the correlation. The colores tool of Situs is a contour-based matching method that uses fast Fourier transforms to accelerate the search and quickly locate the atomic structure relative to the density map. EMatch uses template matching to identify secondary structure elements in the electron density map and thereby align the structure. Despite partial success, several problems limit the efficacy of existing fitting algorithms. First, the measure of fitting quality guides the search direction, so how to evaluate fitting quality is an important question. Most existing algorithms use a single global objective, e.g., the correlation coefficient (CC), as the optimization criterion. The fit between a predicted structure and an electron density map can be very complex and can yield different fitting positions because of the following complications: the quality of the predicted structure, the resolution of the density map, the signal-to-noise ratio (SNR) of the density map, the mismatch between the predicted structure and the density map, and so on. A single global objective function typically cannot deliver good and robust fitting under such complications. Second, because the search space is not restricted during fitting, the exhaustive searches performed by existing fitting tools are time-consuming. A more heuristic search algorithm would therefore help balance performance and robustness.
Disclosure of Invention
In view of the above defects of the prior art, the invention provides an energy-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map, which can solve the potential bias problem caused by minimizing only a single energy function.
The invention is realized by the following technical scheme:
The invention relates to an energy-optimization-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map, which comprises: building a reference data set of predicted structures and electron density maps from three-dimensional protein structures (PDB) and electron density maps; moving the predicted atomic structure to the center of the density map using the information in the electron density map to generate N initial models; selecting a Pareto set through a multi-objective particle swarm optimization algorithm; selecting an optimal model from the Pareto set through a knee algorithm; and computing the fitting result between the atomic structure and the electron density map.
The invention also relates to a system for implementing the method, comprising an initial model generation unit, a multi-objective optimization unit, a model selection unit and an all-atom model output unit, wherein: the initial model generation unit receives the system input, performs initialization, and outputs a plurality of initial states to the multi-objective optimization unit; the multi-objective optimization unit receives the output of the initial model generation unit, performs the core optimization, and outputs a plurality of optimized results to the model selection unit; the model selection unit receives the output of the multi-objective optimization unit, selects the optimal result, and outputs it to the all-atom model output unit; and the all-atom model output unit receives the output of the model selection unit, completes the missing atoms, and outputs the all-atom model to the user.
Technical effects
The invention as a whole solves the problem of fitting a predicted structure to an electron density map with high accuracy. Existing algorithms for fitting atomic structures to electron density maps address similar fitting problems but suffer from unstable fitting results, low accuracy and the like.
Compared with single-objective optimization, the method can avoid being trapped in a local optimum and thus obtains better fitting performance. When fitting the predicted structure to the density map, the invention takes both global and local cross-correlations into account, namely the correlation of the whole model, the correlation of amino acid fragments, and a residue-level correlation score. Multi-objective optimization allows complementary trade-offs between the objectives, and the best solution is taken from the non-dominated Pareto set. The global correlation evaluates the fit from the overall shape, while the local correlations evaluate the fit from the local topology; the two are complementary and together produce the final optimized result.
The invention uses a reference data set of predicted structures and electron density maps (292 predicted structures and 292 electron density maps) that reflects realistic conditions and therefore raises the fitting difficulty; on this data set the fitting accuracy reaches an average cRMSD of 2.46, a large improvement over currently popular single-objective methods.
Drawings
FIG. 1 is a schematic diagram of an atomic structure rigid docking algorithm based on an electron density map;
FIG. 2 is a distribution histogram of TM-score of initial structures corresponding to 292 test proteins;
FIG. 3 is a resolution distribution histogram of a simulated density map corresponding to 292 test proteins;
FIG. 4 is a schematic representation of the conversion of atomic structures into a calculated density map;
FIG. 5 is a graph showing the results of a comparison of MOFIT and the other three methods on 292 test proteins;
in the figure: left: MOFIT vs ADP_EM; middle: MOFIT vs PowerFit; right: MOFIT vs Situs;
FIG. 6 is a schematic diagram showing the energy distribution of the non-dominated particles in the Pareto set for the target protein T0880;
FIG. 7 is a schematic representation of the assembly on the domains of the termini of MADV2 fibers from murine adenovirus 2 (ID: T0880);
in the figure: all grey transparent objects are electron density maps and black objects are atomic structures. (A) the cRMSD of the structure at the position of maximum CC (CC value 0.383, A-2) is […]; the cRMSD of the structure at the TM-score matching position (CC value 0.357, A-1) is 0.0; native PDB model (A-3); (B) the different positions show the fitting results of the different methods: the PowerFit fitting model (B-1, […]), the Situs fitting model (B-2, […]), the ADP_EM fitting model (B-3, […]) and the MOFIT fitting model (B-4, […]); (C) the different positions show the final models obtained by optimizing the different fitted structures using RosettaMolding: the optimized PowerFit model has an RMSD of […] and a TM-score of 0.460 (C-1); the optimized Situs model has an RMSD of […] and a TM-score of 0.465 (C-2); the optimized ADP_EM model has an RMSD of […] and a TM-score of 0.503 (C-3); the optimized MOFIT model has an RMSD of […] and a TM-score of 0.504 (C-4);
FIG. 8 is a schematic diagram of the fitting and optimization results for the single-domain protein 3a1iA on a simulated density map of […] resolution;
in the figure: all grey transparent objects are electron density maps and black objects are atomic structures. (A) the cRMSD of the structure at the TM-score matching position (A-1) is 0.0; the native PDB model is (A-2); (B) the different positions show the different fitting results: the cRMSD of the PowerFit fitting model is […], that of the Situs fitting model is […], that of the ADP_EM fitting model is […], and that of the MOFIT fitting model is […]; (C) the different positions show the models after optimization using EM-Refiner: the RMSD of the optimized PowerFit model is […], that of the optimized Situs model is […], that of the optimized ADP_EM model is […], and that of the optimized MOFIT model is […];
FIG. 9 is a diagram comparing each single objective function with the combined multi-objective function;
in the figure: each square represents a cRMSD coordinate.
Detailed Description
As shown in FIG. 1, this embodiment relates to an energy-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map, which comprises: building a reference data set of predicted structures and electron density maps; moving the predicted atomic structure to the center of the density map using the information in the electron density map to generate N initial models; selecting a Pareto set through a multi-objective particle swarm optimization algorithm; selecting an optimal model from the Pareto set through a knee algorithm; and computing the fitting result between the atomic structure and the electron density map.
The reference data set of predicted structures and electron density maps is built in the following steps; the statistics of the data set are shown in FIG. 2 and FIG. 3:
S11: first, extract all PDB entries that contain an electron density map (1809 in total) from the PDB database, and split the 1809 PDB structures into 37952 single-chain PDB structures;
S12: use CD-HIT to delete sequences with more than 90% redundancy, leaving 2488 samples; after removing samples whose sequences are too short or discontinuous, 1186 samples remain;
S13: randomly select 292 sequences from the 1186 samples as the initial samples of the data set, and predict the corresponding atomic structures with I-TASSER; FIG. 2 gives the TM-score distribution histogram of the 292 predicted initial structures;
S14: from the native PDB structures of the 292 targets, simulate the corresponding noise-free and noisy density maps with EMAN2 and Xmipp (FIG. 3 gives the resolution distribution histogram of the simulated density maps of the 292 proteins); these maps, together with the structures predicted in S13, constitute the reference data set of the method.
As shown in FIG. 4, the electron density map is obtained by converting the protein structure into a calculated density map: a coordinate transformation maps the atomic coordinates into the grid space of the density map. For a given protein structure containing only Cα atoms with coordinates $x_1, \dots, x_N$, the density at grid point $y$ is:
$$\rho_c(y) = \sum_{i=1}^{N} C \exp\left(-k \lVert y - x_i \rVert^2\right)$$
where $k = (\pi/(2.4 + 0.8R_0))^2$ and $C = a \cdot (k/\pi)^{1.5}$ are the parameters describing the shape of the Gaussian kernel, $R_0$ is the resolution of the electron density map, and $a$ is the mass of the Cα atom.
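The conversion of a Cα trace into a calculated density map can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the density at each grid point is a plain sum of one Gaussian per Cα atom with the parameters k and C given in the text, and the function names, the explicit list of grid points and the default mass are hypothetical.

```python
import math

def gaussian_kernel_params(resolution, mass=1.0):
    """Return (k, C) for a map of resolution R0, using the expressions in the text."""
    k = (math.pi / (2.4 + 0.8 * resolution)) ** 2
    c = mass * (k / math.pi) ** 1.5
    return k, c

def calculated_density(ca_coords, grid_points, resolution, mass=1.0):
    """Assumed form: rho_c(y) = sum_i C * exp(-k * |y - x_i|^2) at each grid point y."""
    k, c = gaussian_kernel_params(resolution, mass)
    density = []
    for y in grid_points:
        rho = 0.0
        for x in ca_coords:
            dist_sq = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
            rho += c * math.exp(-k * dist_sq)
        density.append(rho)
    return density
```

A real implementation would evaluate the kernel only within a cutoff radius around each atom rather than over every grid point; the double loop here is kept for readability.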
The N initial models are generated by moving the predicted atomic structure to the center of the density map using the information in the electron density map, specifically: the grid and origin information of the electron density map is read from its header, the predicted atomic structure is moved to the center of the electron density map, and the structure is then randomly rotated to generate N initial models at different positions.
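The initialization described above can be sketched as follows. The sketch makes several assumptions that are not in the source: the header is represented by hypothetical `origin`, `grid_shape` and `voxel_size` values, the structure is rotated about its own centroid, and rotations are drawn uniformly from all orientations via a random unit quaternion rather than from the [-90°, 90°] range mentioned for the optimization.

```python
import math
import random

def map_center(origin, grid_shape, voxel_size):
    """Center of the map box, from hypothetical header fields."""
    return tuple(o + 0.5 * n * voxel_size for o, n in zip(origin, grid_shape))

def centroid(coords):
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def random_rotation(rng):
    """Rotation matrix built from a random unit quaternion (w, x, y, z)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    w, x = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    y, z = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def initial_models(coords, origin, grid_shape, voxel_size, n_models, seed=0):
    """Translate the structure to the map center, then apply N random rotations."""
    rng = random.Random(seed)
    center = map_center(origin, grid_shape, voxel_size)
    cen = centroid(coords)
    centered = [tuple(c[i] - cen[i] for i in range(3)) for c in coords]
    models = []
    for _ in range(n_models):
        rot = random_rotation(rng)
        models.append([
            tuple(sum(rot[r][j] * p[j] for j in range(3)) + center[r] for r in range(3))
            for p in centered
        ])
    return models
```

Because every model is rotated about the centroid after centering, all N models share the same center and differ only in orientation.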
The multi-objective particle swarm optimization algorithm specifically comprises the following steps:
Step 1: the model predicted by I-TASSER is first translated to the center of the density map and then randomly rotated to generate initial structures at N different positions. In the MOPSO optimization, the structure at each position is regarded as one particle. The ith particle is represented by the two-component vector $C_i = [t, r]$, where $t$ and $r$ are the rigid-body translation and rotation, with ranges […] and [-90°, 90°], respectively.
Step 2: in each simulation iteration, the position of each model is updated by rigid-body translation and rotation. In each iteration, three energy functions are computed from the conformation coordinates as objective functions, and the non-dominated solutions for which at least two energy functions decrease are put into the Pareto set.
The three energy functions evaluate the fitting quality in terms of both the global state and the local structure, and are defined as follows:
The correlation energy function between the density map $\rho_c(y)$ converted from the atomic structure and the experimental density map $\rho_o(y)$ evaluates the fit over the entire structure and is built on the correlation coefficient:
$$CC = \frac{\sum_{y \in RB(l)} (\rho_c(y) - \bar{\rho}_c)(\rho_o(y) - \bar{\rho}_o)}{\sqrt{\sum_{y \in RB(l)} (\rho_c(y) - \bar{\rho}_c)^2 \sum_{y \in RB(l)} (\rho_o(y) - \bar{\rho}_o)^2}}$$
where $\bar{\rho}_c$ and $\bar{\rho}_o$ are the mean values over the grid points of the converted density map and the experimental density map, and $RB(l)$ is the set of all grid points. CC is a global score that is very sensitive to the shape of the density map.
The energy function of local correlation is […], where: $CC(i)$ is the correlation between the grid points occupied by amino acid $i$ and the corresponding grid points in the experimental density map; $L_{ali}$ is the set of all amino acids with a correlation greater than 0; and $L$ is the number of amino acids in the atomic structure. This scoring function represents the correlation between local structures.
The correlation energy function that evaluates all amino acid fragments with correlation greater than 0 is […], where: $f(i) = 1$ when the correlation of amino acid $i$ is greater than 0; i_up denotes all consecutive amino acids with residue numbers larger than $i$ whose correlation is greater than 0; i_down denotes all consecutive amino acids with residue numbers smaller than $i$ whose correlation is greater than 0; and $L_{ali}$ is the set of all amino acid fragments with CV(i) greater than 5. This scoring function represents the similarity between fragments of the structures.
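The correlation quantities that the three energy functions rely on can be sketched as follows. The sketch assumes CC is the usual Pearson correlation over grid-point values and that the grid points belonging to each residue are supplied as index lists (a hypothetical input); it deliberately stops at CC, CC(i) and the set of positively correlated residues, because the exact form of the local and fragment energy functions is not recoverable from the text.

```python
import math

def correlation_coefficient(rho_c, rho_o):
    """Pearson correlation between calculated and experimental density values."""
    n = len(rho_c)
    mean_c = sum(rho_c) / n
    mean_o = sum(rho_o) / n
    cov = sum((c - mean_c) * (o - mean_o) for c, o in zip(rho_c, rho_o))
    var_c = sum((c - mean_c) ** 2 for c in rho_c)
    var_o = sum((o - mean_o) ** 2 for o in rho_o)
    denom = math.sqrt(var_c * var_o)
    # A flat map has no defined correlation; report 0.0 in that case.
    return cov / denom if denom > 0 else 0.0

def residue_correlations(rho_c, rho_o, residue_grid_indices):
    """CC(i) for each residue i, restricted to the grid points assigned to it."""
    return [
        correlation_coefficient([rho_c[j] for j in idx], [rho_o[j] for j in idx])
        for idx in residue_grid_indices
    ]

def positively_correlated(cc_per_residue):
    """Indices of residues with CC(i) > 0, the set the text calls L_ali."""
    return [i for i, cc in enumerate(cc_per_residue) if cc > 0]
```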
The position of each model is updated as follows:
$$v_i^{k} = \omega v_i^{k-1} + c_1 \gamma \left(p_i - x_i^{k-1}\right) + c_2 \gamma \left(g - x_i^{k-1}\right), \qquad x_i^{k} = x_i^{k-1} + v_i^{k}$$
where: $v_i^{k}$ is the displacement of model $i$ in the kth iteration; $x_i^{k}$ is the new conformation of the ith model at the kth iteration; $\omega$ is the inertia weight, which, according to tests of the method, decreases linearly with the number of iterations from 1.5 to 0.5; $\gamma$ is a random number in [0, 1] that introduces a perturbation; $c_1$ and $c_2$ are constants usually set to […]; $p_i$ is the best conformation of model $i$ in the previous iterations; and $g$ is a non-dominated solution chosen at random from the Pareto solution set.
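The update described above matches the shape of a standard particle swarm step, which can be sketched as follows. Everything beyond the linearly decreasing inertia weight (1.5 to 0.5) is an assumption: the textbook velocity rule is used, each particle is flattened into a list of numbers (translation components followed by rotation angles), γ is drawn independently for each term, and `c1 = c2 = 2.0` is a placeholder because the value used in the method is not recoverable from the text.

```python
import random

def inertia_weight(iteration, max_iterations, w_start=1.5, w_end=0.5):
    """Inertia weight decreasing linearly from w_start to w_end over the run."""
    frac = iteration / max(1, max_iterations - 1)
    return w_start + (w_end - w_start) * frac

def update_particle(position, velocity, personal_best, leader, omega,
                    c1=2.0, c2=2.0, rng=random):
    """One textbook PSO step; `leader` is a solution drawn from the Pareto set."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, leader):
        gamma1, gamma2 = rng.random(), rng.random()  # random factors in [0, 1)
        v_new = omega * v + c1 * gamma1 * (p - x) + c2 * gamma2 * (g - x)
        new_velocity.append(v_new)
        new_position.append(x + v_new)
    return new_position, new_velocity
```

In a full loop, `leader` would be re-drawn at random from the current Pareto set for each particle, and the rotation components would be clamped to the allowed range after each step.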
Step 3: all models in the Pareto set are ranked, and the optimal conformation is selected as the final structure.
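Maintaining the Pareto set during the iterations of Step 2 can be sketched as follows. The sketch uses the textbook dominance definition for minimization (no worse in every energy and strictly better in at least one), which is an assumption; the text's own admission criterion, that at least two energy functions decrease, is not modeled here. The `(energies, model)` tuple layout is hypothetical.

```python
def dominates(a, b):
    """True if energy vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_pareto_set(archive, candidate):
    """Return the archive after offering it one (energies, model) candidate."""
    cand_energies = candidate[0]
    # Reject the candidate if an archived solution dominates or duplicates it.
    if any(dominates(e, cand_energies) or e == cand_energies for e, _ in archive):
        return archive
    # Otherwise drop every archived solution the candidate dominates.
    kept = [(e, m) for e, m in archive if not dominates(cand_energies, e)]
    kept.append(candidate)
    return kept
```

For example, offering energies (3, 3, 3), then (2, 2, 2), then (1, 3, 2) leaves the last two in the archive, since neither dominates the other.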
The optimal model is selected from the Pareto set as follows: the Pareto solutions are ranked by knee score using the knee algorithm, and the model corresponding to the knee point is selected as the final solution.
After the Pareto solutions are obtained, each Pareto solution is projected into a three-dimensional energy space with energy axes E1, E2 and E3. The knee solution on the Pareto front is searched for by a marginal utility method, specifically:
$$U(x, \lambda) = \lambda_1 f_1(x) + \lambda_2 f_2(x) + \lambda_3 f_3(x), \quad \text{s.t. } \lambda_1 + \lambda_2 + \lambda_3 = 1 \text{ and } \lambda_1, \lambda_2, \lambda_3 > 0$$
where $x$ is a non-dominated solution in the Pareto solution set and $\lambda_1, \lambda_2, \lambda_3$ are weighting factors. The utility values are computed with randomly drawn values of $\lambda_1, \lambda_2, \lambda_3$. The solution with the greatest marginal utility (the knee point) is the final output solution.
The fitting result is completed into an all-atom model by the atom-completion software PULCHRA, which takes the backbone atoms as input, rebuilds the full structure using statistical bond lengths, bond angles and dihedral angles, and outputs it.
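Knee selection by marginal utility can be sketched as follows. This is one common formulation and an assumption about the details: for each random weight vector on the simplex, the solution with the lowest utility is credited with the gap to the second-best utility, and the solution with the largest accumulated credit is returned. The number of samples, the seed and the exponential sampling of weights are illustrative choices.

```python
import random

def random_weights(rng, n=3):
    """Random positive weights that sum to 1 (uniform on the simplex)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def knee_point(pareto_energies, n_samples=1000, seed=0):
    """Index of the solution with the largest accumulated marginal utility."""
    if len(pareto_energies) == 1:
        return 0
    rng = random.Random(seed)
    marginal = [0.0] * len(pareto_energies)
    for _ in range(n_samples):
        lam = random_weights(rng, len(pareto_energies[0]))
        # U(x, lambda) = sum_j lambda_j * f_j(x); lower is better.
        utilities = [sum(l * f for l, f in zip(lam, e)) for e in pareto_energies]
        order = sorted(range(len(utilities)), key=utilities.__getitem__)
        best, second = order[0], order[1]
        # Credit the winner with what would be lost if it were removed.
        marginal[best] += utilities[second] - utilities[best]
    return max(range(len(marginal)), key=marginal.__getitem__)
```

Because the utilities mix the three energies directly, the energies would normally be rescaled to comparable ranges before this step; that normalization is left out of the sketch.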
The method computes the cRMSD of the fitted model as
$$cRMSD = \sqrt{\frac{1}{L} \sum_{i=1}^{L} \left\lVert x_i^{TM} - x_i^{fit} \right\rVert^2}$$
where $L$ is the length of the protein sequence, $x_i^{TM}$ denotes the coordinates of the predicted structure aligned to the native structure by the TM-score alignment program, and $x_i^{fit}$ denotes the coordinates of the corresponding atoms after the predicted structure is fitted to the density map by MOFIT. The experimental results are as follows:
Table 1. Statistics on the 292 test proteins.
In Table 1: cRMSD^a is the cRMSD between the structure aligned by TM-score and the structure placed by the fitting program; p-value^b is the result of Student's t-test on the cRMSD between the conformations from MOFIT and from the other fitting programs; Better cases^c is the number of targets on which the MOFIT fitting result is better than that of the other method.
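The cRMSD and the "Better cases" count used in Table 1 can be sketched as follows, assuming cRMSD is the root-mean-square distance between corresponding atoms of the TM-score-aligned structure and the fitted structure, computed without any further superposition. The function names are hypothetical.

```python
import math

def crmsd(reference_coords, fitted_coords):
    """Root-mean-square distance between corresponding atoms, no superposition."""
    if len(reference_coords) != len(fitted_coords):
        raise ValueError("structures must have the same number of atoms")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(reference_coords, fitted_coords))
    return math.sqrt(total / len(reference_coords))

def better_cases(crmsd_method, crmsd_other):
    """Number of targets on which the first method has the smaller cRMSD."""
    return sum(1 for m, o in zip(crmsd_method, crmsd_other) if m < o)
```

Since the reference is the predicted structure placed at its TM-score-aligned position, a perfect fit gives a cRMSD of 0.0 regardless of how accurate the predicted structure itself is.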
FIGS. 5-8 show the test results on specific proteins. The method is compared with the currently most popular algorithms (ADP_EM, Situs, PowerFit), and the comparison demonstrates the superiority of multi-objective over single-objective optimization in fitting protein structures to density maps.
FIG. 9 compares each individual objective function with the combined multi-objective function.
As can be seen from FIGS. 5-9, the method performs better on many proteins. Table 1 gives the statistics on the 292 test proteins. Columns 2-3 of Table 1 give the results on the 292 noise-free density maps. On the noise-free data set, MOFIT achieves a cRMSD of 2.46, a reduction of […] relative to ADP_EM, Situs and PowerFit, respectively, with p-values of 3.51 × 10⁻³, 6.35 × 10⁻⁴ and 7.92 × 10⁻⁹. This indicates that on noise-free density maps MOFIT outperforms the other three methods and that the difference is statistically significant. Columns 4-5 of Table 1 give the statistics on the noisy data set. There MOFIT achieves a cRMSD of […], somewhat worse than without noise, but relative to the other three methods the cRMSD is likewise reduced, by […], with p-values of 1.19 × 10⁻⁴, 1.11 × 10⁻³ and 1.11 × 10⁻⁸⁹. This demonstrates that MOFIT is also superior to the other methods on noisy data. Table 1 also gives the number of proteins on which MOFIT outperforms the other methods. On the noise-free data, MOFIT has a smaller cRMSD than ADP_EM, Situs and PowerFit on 188, 162 and 262 proteins, respectively; on the noisy data, the numbers are 200, 148 and 286.
The method uses three designed global and local cross-correlation terms, namely the correlation of the whole model, the correlation of amino acid fragments and a residue-level correlation energy function, together with multi-objective optimization to select local optima; the different local optima of the single objectives complement and compromise with one another across the objectives, so that an optimal solution is found. On the data set compiled for the invention, the average cRMSD obtained is […], a reduction of […] relative to the current methods ADP_EM, Situs and PowerFit, respectively, with p-values of 3.51 × 10⁻³, 6.35 × 10⁻⁴ and 7.92 × 10⁻⁹. This demonstrates that the invention is statistically significantly superior to the prior-art methods.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (10)
1. An energy-optimization-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map, characterized in that: a reference data set of predicted structures and electron density maps is built from three-dimensional protein structures and electron density maps; the predicted atomic structure is then moved to the center of the density map using the information in the electron density map to generate N initial models; and a Pareto set is selected through a multi-objective particle swarm optimization algorithm, an optimal model is selected from the Pareto set through a knee algorithm, and the fitting result between the atomic structure and the electron density map is computed.
2. The energy-optimization-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map according to claim 1, wherein building the reference data set of predicted structures and electron density maps specifically comprises:
S11: first, extracting all PDB entries that contain an electron density map from the PDB database, and splitting the 1809 PDB structures into 37952 single-chain PDB structures;
S12: deleting sequences with more than 90% redundancy using CD-HIT, leaving 2488 samples, and removing the samples that are too short or discontinuous from the 2488 samples, leaving 1186 samples;
S13: randomly selecting 292 sequences from the 1186 samples as initial samples, and predicting the corresponding atomic structures with I-TASSER;
S14: simulating the corresponding noise-free and noisy density maps from the native PDB structures of the 292 targets using EMAN2 and Xmipp, wherein the noise-free density maps, the noisy density maps and the structures predicted in step S13 constitute the reference data set of the method.
3. The energy-optimization-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map according to claim 1, wherein the initial models are generated by moving the predicted atomic structure to the center of the density map using the information in the electron density map, specifically: the grid and origin information of the electron density map is read from its header, the predicted atomic structure is moved to the center of the electron density map, and the structure is then randomly rotated to generate N initial models at different positions.
4. The energy-optimization-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map according to claim 1, wherein the multi-objective particle swarm optimization algorithm specifically comprises:
Step 1: first translating the model predicted by I-TASSER to the center of the density map, and then randomly rotating it to generate initial structures at N different positions; in the MOPSO optimization, the structure at each position is regarded as one particle; the ith particle is represented by the two-component vector $C_i = [t, r]$, where $t$ and $r$ are the rigid-body translation and rotation, with ranges […] and [-90°, 90°], respectively;
Step 2: updating the position of each model by rigid-body translation and rotation in each simulation iteration; in each iteration, three energy functions are computed from the conformation coordinates as objective functions, and the non-dominated solutions for which at least two energy functions decrease are put into the Pareto set;
Step 3: ranking all models in the Pareto set and selecting the optimal conformation as the final structure.
5. The energy-optimization-based multi-objective optimization fitting prediction method for an atomic structure and an electron density map according to claim 4, wherein the three energy functions evaluate the fitting quality in terms of both the global state and the local structure, and are respectively defined as:
the correlation energy function between the density map $\rho_c(y)$ converted from the atomic structure and the experimental density map $\rho_o(y)$, which evaluates the fit over the entire structure and is built on the correlation coefficient:
$$CC = \frac{\sum_{y \in RB(l)} (\rho_c(y) - \bar{\rho}_c)(\rho_o(y) - \bar{\rho}_o)}{\sqrt{\sum_{y \in RB(l)} (\rho_c(y) - \bar{\rho}_c)^2 \sum_{y \in RB(l)} (\rho_o(y) - \bar{\rho}_o)^2}}$$
where $\bar{\rho}_c$ and $\bar{\rho}_o$ are the mean values over the grid points of the converted density map and the experimental density map; $RB(l)$ is the set of all grid points; CC is a global score that is very sensitive to the shape of the density map;
the energy function of local correlation […], where: $CC(i)$ is the correlation between the grid points occupied by amino acid $i$ and the corresponding grid points in the experimental density map; $L_{ali}$ is the set of all amino acids with a correlation greater than 0; $L$ is the number of amino acids in the atomic structure; this second scoring function represents the correlation between local structures;
the correlation energy function that evaluates all amino acid fragments with correlation greater than 0 […], where: $f(i) = 1$ when the correlation of amino acid $i$ is greater than 0; i_up denotes all consecutive amino acids with residue numbers larger than $i$ whose correlation is greater than 0; i_down denotes all consecutive amino acids with residue numbers smaller than $i$ whose correlation is greater than 0; $L_{ali}$ is the set of all amino acid fragments with CV(i) greater than 5; this scoring function represents the similarity between fragments of the structures.
6. The energy-based multi-objective optimization fitting prediction method for atomic structures and electron density maps according to claim 1, wherein the updating of the position of each model specifically comprises:
wherein: the velocity term represents the magnitude of the movement of model i in the k-th iteration; the position term represents the new position of the i-th model at the k-th iteration; ω is an inertia weight factor which, according to tests of the method, decreases linearly with the number of iterations from 1.5 to 0.5; γ is a random number in [0, 1] used to introduce a perturbation; $c_1$ and $c_2$ are constants that are usually set to a fixed value; the personal-best term is the best conformation of model i in the previous iterations; and the guide term is a non-dominated solution randomly chosen from the Pareto solution set.
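The update formula itself is missing from this text. The sketch below assumes the standard particle-swarm form (new velocity = ω · velocity + c1 · γ · (personal best − position) + c2 · γ · (guide − position)), with the inertia weight decreasing linearly from 1.5 to 0.5 as stated. The pose is represented as a plain list of numbers (for example three translations and three rotation angles), and `inertia_weight`, `pso_step` and the `c1`, `c2` values used in the usage example are hypothetical; the constants' actual values are not given in this text.

```python
import random
from typing import List, Sequence, Tuple


def inertia_weight(k: int, n_iter: int, w_start: float = 1.5, w_end: float = 0.5) -> float:
    """Linearly decrease the inertia weight from w_start (k = 0) to w_end (k = n_iter - 1)."""
    if n_iter <= 1:
        return w_start
    return w_start - (w_start - w_end) * k / (n_iter - 1)


def pso_step(position: Sequence[float], velocity: Sequence[float],
             personal_best: Sequence[float], pareto_set: Sequence[Sequence[float]],
             k: int, n_iter: int, c1: float, c2: float,
             rng=random) -> Tuple[List[float], List[float]]:
    """One assumed PSO update of a rigid-body pose vector."""
    guide = rng.choice(pareto_set)  # random non-dominated solution acts as the leader
    w = inertia_weight(k, n_iter)
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, guide):
        gamma1, gamma2 = rng.random(), rng.random()  # random factors in [0, 1)
        nv = w * v + c1 * gamma1 * (p - x) + c2 * gamma2 * (g - x)
        new_velocity.append(nv)
        new_position.append(x + nv)
    return new_position, new_velocity


# Usage with arbitrary illustrative constants (c1 = c2 = 1.0 is NOT from the source).
pos, vel = pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [[1.0, 2.0]],
                    k=0, n_iter=10, c1=1.0, c2=1.0, rng=random.Random(0))
```

Drawing the guide at random from the Pareto set, rather than using a single global best, is what keeps the swarm spread along the trade-off front instead of collapsing onto one objective.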
7. The energy-based multi-objective optimization fitting prediction method for atomic structures and electron density maps according to claim 1, wherein the optimal model is selected from the Pareto set as follows: the Pareto solutions are ranked by their knee score using a knee algorithm, and the model corresponding to the knee point (the inflection point) is selected as the final solution.
8. The energy-based multi-objective optimization fitting prediction method for atomic structures and electron density maps according to claim 7, wherein, after the Pareto solutions are obtained, each Pareto solution is projected into a three-dimensional energy space with energy axes E1, E2 and E3;
a knee solution is searched for on the Pareto front by a marginal utility method, specifically: $U(x,\lambda)=\lambda_1 f_1(x)+\lambda_2 f_2(x)+\lambda_3 f_3(x)$, s.t. $\lambda_1+\lambda_2+\lambda_3=1$ and $\lambda_1,\lambda_2,\lambda_3>0$, wherein: x is a non-dominated solution in the Pareto solution set and $\lambda_1,\lambda_2,\lambda_3$ are weighting factors; the utility of each solution is computed with random values of $\lambda_1,\lambda_2,\lambda_3$, and the solution with the greatest marginal utility, i.e. the knee point, is taken as the final output solution.
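One common way to turn the utility function above into a knee score is the expected marginal utility: for each random weight vector, the solution with the lowest utility is credited with the gap to the second-best solution, and the credits are accumulated over many samples. The sketch below assumes that reading and assumes all three energies are minimized; `random_weights`, `knee_point`, the sample count and the seed are illustrative choices, not values from the source.

```python
import random
from typing import List, Sequence


def random_weights(rng: random.Random) -> List[float]:
    """Draw (l1, l2, l3) with l1 + l2 + l3 = 1 by normalizing exponential variates."""
    raw = [rng.expovariate(1.0) for _ in range(3)]
    total = sum(raw)
    return [r / total for r in raw]


def knee_point(energies: Sequence[Sequence[float]], n_samples: int = 1000, seed: int = 0) -> int:
    """Return the index of the solution with the largest accumulated marginal utility."""
    if not energies:
        raise ValueError("the Pareto set is empty")
    if len(energies) == 1:
        return 0
    rng = random.Random(seed)
    marginal = [0.0] * len(energies)
    for _ in range(n_samples):
        lam = random_weights(rng)
        # U(x, lambda) = l1*f1(x) + l2*f2(x) + l3*f3(x) for every solution x
        utilities = [sum(l * f for l, f in zip(lam, e)) for e in energies]
        order = sorted(range(len(energies)), key=utilities.__getitem__)
        best, second = order[0], order[1]
        # Cost of losing the best solution under this weighting.
        marginal[best] += utilities[second] - utilities[best]
    return max(range(len(energies)), key=marginal.__getitem__)


front = [(0.0, 10.0, 10.0), (10.0, 0.0, 10.0), (10.0, 10.0, 0.0), (1.0, 1.0, 1.0)]
knee = knee_point(front)
```

In this example the three extreme solutions each excel in one energy at a large cost in the other two, while the last solution is a balanced compromise; it wins under almost every weighting, so it accumulates by far the largest marginal utility.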
9. The energy-based multi-objective optimization fitting prediction method for atomic structures and electron density maps according to claim 1, wherein the fitting result is completed into an all-atom model by the atom-filling software PULCHRA, the input being the main-chain atoms, and the complete structure being built from statistical bond lengths, bond angles and dihedral angles and then output.
10. A system for implementing the method of any one of claims 1 to 9, comprising an initial model generation unit, a multi-objective optimization unit, a model selection unit and an all-atom model output unit, wherein: the initial model generation unit receives the system input information, performs initialization, and outputs a plurality of initial states to the multi-objective optimization unit; the multi-objective optimization unit receives the output of the initial model generation unit, performs the core optimization, and outputs a plurality of optimization results to the model selection unit; the model selection unit receives the output of the multi-objective optimization unit, selects the optimal result, and outputs the optimal solution to the all-atom model output unit; and the all-atom model output unit receives the output of the model selection unit, performs atom completion, and outputs the all-atom model to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010789510.7A CN111968707B (en) | 2020-08-07 | 2020-08-07 | Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111968707A (en) | 2020-11-20 |
CN111968707B (en) | 2022-06-17 |
Family
ID=73365910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010789510.7A Active CN111968707B (en) | 2020-08-07 | 2020-08-07 | Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111968707B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170103161A1 (en) * | 2015-10-13 | 2017-04-13 | The Governing Council Of The University Of Toronto | Methods and systems for 3d structure estimation |
US20170329892A1 (en) * | 2016-05-10 | 2017-11-16 | Accutar Biotechnology Inc. | Computational method for classifying and predicting protein side chain conformations |
CN107657311A (en) * | 2017-11-03 | 2018-02-02 | 电子科技大学 | Test method for optimizing based on multi-objective particle swarm algorithm |
CN110582301A (en) * | 2016-12-14 | 2019-12-17 | 利甘达尔股份有限公司 | Methods and compositions for nucleic acid and protein payload delivery |
Non-Patent Citations (2)
Title |
---|
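SHUO YIN ET AL.: "Clustering Enhancement of Noisy Cryo-Electron Microscopy Single-Particle Images with a Network Structural Similarity Metric", 《JOURNAL OF CHEMICAL INFORMATION AND MODELING》 * |
CHU HUANYU (初环宇): "Research on Several Problems in Protein Design and Structure Simulation" (蛋白质设计和结构模拟若干问题研究), 《China Doctoral and Master's Dissertations Full-text Database (Doctoral), Basic Sciences》 * |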
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113035268A (en) * | 2021-04-09 | 2021-06-25 | 上海交通大学 | Protein structure optimization method based on multi-objective decomposition optimization strategy |
CN113990384A (en) * | 2021-08-12 | 2022-01-28 | 清华大学 | Deep learning-based frozen electron microscope atomic model structure building method and system and application |
CN113990384B (en) * | 2021-08-12 | 2024-04-30 | 清华大学 | Deep learning-based method, system and application for constructing atomic model structure of frozen electron microscope |
CN114612501A (en) * | 2022-02-07 | 2022-06-10 | 清华大学 | Neural network model training method and cryoelectron microscope density map resolution estimation method |
CN114612501B (en) * | 2022-02-07 | 2024-02-13 | 清华大学 | Neural network model training method and frozen electron microscope density map resolution estimation method |
CN115035947B (en) * | 2022-06-10 | 2023-03-10 | 水木未来(北京)科技有限公司 | Protein structure modeling method and device, electronic device and storage medium |
CN115035947A (en) * | 2022-06-10 | 2022-09-09 | 水木未来(北京)科技有限公司 | Protein structure modeling method and device, electronic device and storage medium |
CN115083513B (en) * | 2022-06-21 | 2023-03-10 | 华中科技大学 | Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image |
CN115083513A (en) * | 2022-06-21 | 2022-09-20 | 华中科技大学 | Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image |
CN114841898A (en) * | 2022-06-29 | 2022-08-02 | 华中科技大学 | Deep learning-based post-processing method and device for three-dimensional density map of cryoelectron microscope |
CN115239999A (en) * | 2022-07-22 | 2022-10-25 | 水木未来(北京)科技有限公司 | Protein electron density map processing method, device, electronic apparatus and storage medium |
CN115239999B (en) * | 2022-07-22 | 2023-04-21 | 水木未来(北京)科技有限公司 | Protein electron density map processing method, device, electronic equipment and storage medium |
CN117495434A (en) * | 2023-12-25 | 2024-02-02 | 天津大学 | Electric energy demand prediction method, model training method, device and electronic equipment |
CN117495434B (en) * | 2023-12-25 | 2024-04-05 | 天津大学 | Electric energy demand prediction method, model training method, device and electronic equipment |
CN117995317A (en) * | 2024-04-03 | 2024-05-07 | 北京云庐科技有限公司 | Method, device and medium for estimating heavy atom position based on electron density map |
CN117995317B (en) * | 2024-04-03 | 2024-06-21 | 北京云庐科技有限公司 | Method, device and medium for estimating heavy atom position based on electron density map |
Also Published As
Publication number | Publication date |
---|---|
CN111968707B (en) | 2022-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111968707B (en) | Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method | |
Zhang et al. | Bag of freebies for training object detection neural networks | |
EP2026279B1 (en) | Method and system for aligning three-dimensional surfaces | |
Hurtado et al. | Deep transfer learning in the assessment of the quality of protein models | |
CN108846256B (en) | Group protein structure prediction method based on residue contact information | |
Ioerger et al. | Automatic modeling of protein backbones in electron-density maps via prediction of Cα coordinates | |
CN111429481B (en) | Target tracking method, device and terminal based on adaptive expression | |
Carr et al. | Scalable contour tree computation by data parallel peak pruning | |
Purnell et al. | Rapid synthesis of cryo-et data for training deep learning models | |
CN109346128B (en) | Protein structure prediction method based on residue information dynamic selection strategy | |
Kofler et al. | Kd-tree based n-body simulations with volume-mass heuristic on the GPU | |
CN112991402B (en) | Wen Wudian cloud registration method and system based on improved differential evolution algorithm | |
Liu et al. | Wang-Landau sampling in face-centered-cubic hydrophobic-hydrophilic lattice model proteins | |
He et al. | Protein structural model selection based on protein-dependent scoring function | |
CN109360600B (en) | Protein structure prediction method based on residue characteristic distance | |
Chen et al. | SEGEM: A fast and accurate automated protein backbone structure modeling method for cryo-EM | |
Makino et al. | High-order description of the dynamics in FFAGs and related accelerators | |
West et al. | A robust fitness function and genetic algorithm to morphologically constrain the dynamics of interacting galaxies | |
CN112884653B (en) | Broken block splicing method and system for terracotta soldiers and horses based on fracture surface information | |
Matsumoto | A new approach for building an atomic model from a three-dimensional electron microscopy data | |
Martino et al. | Parallel algorithms in molecular biology | |
Ülker et al. | Gravitational Search Algorithm for NURBS Curve Fitting | |
Albertsson et al. | Towards Fast Displaced Vertex Finding | |
Mi et al. | GDFold2: a fast and parallelizable protein folding environment with freely defined objective functions | |
CN113077851A (en) | Crystal structure prediction method based on generation countermeasure network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||