CN114121146B - RNA tertiary structure prediction method based on parallel and Monte Carlo strategies - Google Patents

Info

Publication number
CN114121146B
Authority
CN
China
Prior art keywords
energy
conformation
rna
value
potential energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111428461.5A
Other languages
Chinese (zh)
Other versions
CN114121146A (en)
Inventor
刘振栋
杨玉荣
李冬雁
陈曦
吕欣荣
秦梦颖
柏苛
何志强
李晓峰
王少华
胡国胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Application filed by Shandong Jianzhu University
Priority to CN202111428461.5A
Publication of CN114121146A
Application granted
Publication of CN114121146B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Abstract

The invention discloses an RNA tertiary structure prediction method based on parallel and Monte Carlo strategies, and belongs to the field of structure prediction. The method comprises: sampling the conformational space using a parallel mechanism; scoring conformations with the latest updated energy function; judging, through two rounds of potential energy judgment, whether each Monte Carlo operation performed on a conformation under the 'Stepwise ansatz' is reasonable; and finally judging structural integrity and modeling accuracy and processing the result until a stable RNA tertiary structure with high accuracy and high integrity is obtained. The method is highly flexible: the number of Monte Carlo cycles can be specified, so the user can trade off modeling accuracy against modeling time. The method solves the problem of incomplete modeling of RNA motifs in the prior art; it increases the breadth and depth of conformational sampling, reduces the influence of false minimum free energies, and improves modeling accuracy.

Description

RNA tertiary structure prediction method based on parallel and Monte Carlo strategies
Technical Field
The invention belongs to the field of structure prediction, and particularly relates to an RNA tertiary structure prediction method based on parallel and Monte Carlo strategies.
Background
Recent studies have found that RNA has a range of complex biological functions. Because structure determines function, the structure of an RNA must be known before its function can be explored. At present there are two main approaches to determining RNA tertiary structure. The first is experimental measurement, such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy; experimental results are relatively accurate and reliable, but the number of conformations increases exponentially with the length of the RNA, so the cost is high. The second is structure prediction based on biological computation; current RNA tertiary structure prediction algorithms are mainly knowledge-mining-based methods and physics-based methods. Knowledge-mining-based tertiary structure prediction methods rely on a library of known RNA templates; physics-based methods reduce the dependence on databases, but their structural modeling accuracy is still not high enough to meet current structure prediction requirements. The existing methods therefore need to be improved.
In the protein field, there is a hypothesis that the native conformation of a macromolecule has the lowest free energy, and that the free energy function can be approximated as the sum of hydrogen bond, van der Waals, electrostatic, and solvation terms. However, because proteins and RNA molecules fold differently, applying protein methods directly to RNA gives poor results. To address this drawback of the prior art, we still assume that the native conformation of the macromolecule has the lowest free energy, but assign different weights to the different tertiary interactions and add them linearly to obtain the free energy. In addition, to overcome the limited conformational search ability of a single thread, a parallel mechanism is adopted and the modeling result is subjected to multiple judgments, yielding the stepwise Monte Carlo parallelization method (SMCP), a method dedicated to predicting RNA tertiary structure.
Disclosure of Invention
To address the shortcomings of existing RNA structure prediction methods, the invention provides SMCP, an RNA tertiary structure prediction method based on parallel and Monte Carlo strategies. SMCP increases the breadth and depth of conformational sampling through a parallel mechanism, screens intermediate results through multiple potential energy judgments, and improves the integrity and accuracy of the modeling result through result judgment. The method aims to overcome the limitations of single-threaded conformational sampling and the low modeling integrity and accuracy of current RNA structure prediction methods.
The RNA tertiary structure prediction method based on parallel and Monte Carlo strategies comprises the following steps:
(1) Initializing an RNA motif, and determining the number of parallel threads n and the number of Monte Carlo cycles m; n is a natural number greater than 1, and m is a natural number from 200 to 50000;
(2) Constructing a conformational space for the RNA motif, performing efficient conformational sampling on the conformational space by using a parallel mechanism and a 'Stepwise ansatz' hypothesis, performing operations such as adding, deleting, combining, resampling and the like on single nucleotide, and performing multiple random operations to obtain a candidate conformational set;
(3) Calculating the potential energy value of each candidate conformation obtained in step (2) using an energy function, wherein the potential energy of the biomolecule is approximated by the Rosetta energy function value, and the Rosetta energy value is calculated according to the formula $\Delta E_{total} = \sum_i \omega_i E_i(\Theta_i, aa_i)$ as the linear sum of all energy terms scaled by their weights, wherein $E_i$ is an energy term, $\omega_i$ is the weight of that energy term, $\Theta_i$ is the geometric degrees of freedom, and $aa_i$ is the chemical identity; in addition, the potential energy $E_x$ of each energy term x is calculated according to a connection weight formula, wherein $E_x$ is the potential energy value of energy term x;
(4) Judging the conformational potential energy value obtained in step (3), wherein, after the random operation of step (2), the change in potential energy value $\Delta E$ determines whether the nucleotide addition, deletion, conformation merging, or resampling operation can be accepted; according to the Metropolis criterion, an operation that does not raise the energy ($\Delta E \le 0$) is accepted, and an operation that raises the energy ($\Delta E > 0$) is accepted only if $\alpha \le \exp(-\Delta E / kT)$, where $\alpha$ is a random number between 0 and 1; a real candidate conformation set is obtained after this preliminary potential energy judgment;
(5) Further potential energy judgment is carried out on the candidate conformation set obtained in the step (4), and the conformation structure with a low potential energy value is more stable, so that the conformation with the lowest potential energy value is selected as the current best candidate conformation by integrating all threads;
(6) Performing accuracy calculation on the current best candidate conformation obtained in step (5), wherein RMSD is an important index describing the structural similarity of two conformations of a molecule; RMSD is calculated according to the formula $RMSD = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\delta_j^2}$ to describe modeling accuracy, wherein $\delta_j$ is the distance between atom j and either the reference conformation or the average position of the m equivalent atoms; in addition, a rigid superposition is typically performed to minimize the RMSD, and this minimum, calculated between the two given sets of points, is returned as the final accuracy value;
(7) Performing accuracy judgment on the current best candidate conformation obtained in steps (5) and (6); a predicted conformation whose RMSD from the experimentally determined conformation lies within the accuracy threshold is regarded as the native conformation (that is, the modeling accuracy is required to be within this threshold); the RMSD is therefore judged against this threshold;
(8) Performing integrity judgment on the current best candidate conformation obtained in step (7), by judging whether the missing value of the conformation is 0;
(9) The conformation obtained in step (8) is a high-accuracy, high-integrity conformation and is the final modeling result; visual analysis is carried out using UCSF Chimera, with which the experimentally determined conformation and the conformation predicted by the RNA tertiary structure prediction method can also be compared.
Preferably, n in the step (1) has a value of 3, and m has a value of 10000;
preferably, the step (3) of obtaining conformational potential energy value comprises the steps of:
(1-1) calculating the energy of atom-pair interactions. The inter- and intra-residue atom-pair interactions include: van der Waals forces, electrostatic forces, solvation terms, hydrogen bonds, and disulfide bonds. The energy terms representing atom-pair interactions include: fa_rep, fa_intra_rep, fa_atr, fa_elec, fa_sol, lk_ball_wtd, hbond_sc, dslf_fa13, hbond_lr_bb, hbond_sr_bb, hbond_bb_sc;
(1-2) calculating the energy related to protein backbone and side-chain torsions. The terms describing torsion angles are the Ramachandran plot, the backbone design term, and the side-chain conformation; the relevant energy terms include: rama_prepro, p_aa_pp, fa_dun;
(1-3) calculating the energy of special-case torsion terms (peptide bond dihedral angle). The relevant energy terms include: omega, pro_close, yhh_planarity;
(1-4) calculating the energy of non-ideal bond lengths and angles (Cartesian bonded energy). The relevant energy terms include: cart_bonded;
(1-5) the energy terms of all energy functions under the Rosetta framework are the same, and the difference between different energy functions is the difference of the weight values of the energy terms;
according to the formula $E_{total} = \omega_{fa\_rep}E_{fa\_rep} + \omega_{fa\_intra\_rep}E_{fa\_intra\_rep} + \omega_{fa\_atr}E_{fa\_atr} + \omega_{fa\_elec}E_{fa\_elec} + \omega_{fa\_sol}E_{fa\_sol} + \omega_{lk\_ball\_wtd}E_{lk\_ball\_wtd} + \omega_{hbond\_sc}E_{hbond\_sc} + \omega_{dslf\_fa13}E_{dslf\_fa13} + \omega_{hbond\_lr\_bb}E_{hbond\_lr\_bb} + \omega_{hbond\_sr\_bb}E_{hbond\_sr\_bb} + \omega_{hbond\_bb\_sc}E_{hbond\_bb\_sc} + \omega_{rama\_prepro}E_{rama\_prepro} + \omega_{p\_aa\_pp}E_{p\_aa\_pp} + \omega_{fa\_dun}E_{fa\_dun} + \omega_{omega}E_{omega} + \omega_{pro\_close}E_{pro\_close} + \omega_{yhh\_planarity}E_{yhh\_planarity} + \omega_{cart\_bonded}E_{cart\_bonded}$, the weighted sum of all energy terms in the above steps is calculated, wherein $\omega_x$ is the weight of energy term x; the result is the potential energy value of the candidate conformation.
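The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `total_energy` and the numeric energies and weights are hypothetical placeholders, not Rosetta defaults and not values from the patent.

```python
from typing import Dict


def total_energy(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted linear sum of energy terms: sum over x of w_x * E_x."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight for energy terms: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical per-term energies and weights for three of the listed terms.
example_terms = {"fa_atr": -4.0, "fa_rep": 1.5, "hbond_sc": -2.0}
example_weights = {"fa_atr": 1.0, "fa_rep": 0.5, "hbond_sc": 1.0}
example_total = total_energy(example_terms, example_weights)
```

Because every energy function shares the same terms, switching to a different energy function in this sketch would only mean passing a different `weights` dictionary.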
Preferably, the core of step (4) is calculating the system energy change ΔE. The Metropolis criterion described in step (4) determines the acceptance probability according to the formula $P = \begin{cases} 1, & df < 0 \\ \exp(-df/T), & df \ge 0 \end{cases}$, where df is the difference in fitness between the new conformation and the original conformation, i.e. df = f(new) - f(old), and T is the control parameter of the annealing process.
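A minimal Python sketch of a Metropolis acceptance test follows, assuming the usual form in which a move that lowers the fitness is always accepted and a move that raises it is accepted with probability exp(-df/T). The function name `metropolis_accept` and its parameters are illustrative, not taken from the patent.

```python
import math
import random


def metropolis_accept(df: float, temperature: float, rng: random.Random) -> bool:
    """Decide whether to accept a move whose fitness change is df = f(new) - f(old)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if df <= 0.0:
        # Downhill or neutral moves are always accepted.
        return True
    # Uphill moves: draw alpha in [0, 1) and accept if alpha <= exp(-df / T).
    alpha = rng.random()
    return alpha <= math.exp(-df / temperature)
```

A larger `temperature` makes uphill moves more likely to be accepted, which helps a walker escape local minima at the cost of slower convergence.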
Compared with the prior art, the method has the beneficial effects that:
the algorithm advances RNA tertiary structure prediction and achieves efficient structure prediction. It is based on the 'Stepwise ansatz' hypothesis: by manipulating a single nucleotide at a time, it avoids enumerating all conformations at once; the structure is predicted by stochastically sampling conformations with single nucleotides added, so modeling does not depend on a fragment-based or coarse-grained stage, which reduces computation and saves modeling time; and the algorithm is optimized by parallelization, running multiple programs simultaneously and screening layer by layer according to the energy value, which improves prediction accuracy and modeling integrity and saves modeling time.
Drawings
FIG. 1 is a schematic diagram of a parallel mechanism;
FIG. 2 is a flow chart of the SMCP method;
FIG. 3 is an example of predicting RNA tertiary structure using the SMCP method;
FIG. 4 is a graph comparing modeling accuracy results of predicting RNA tertiary structure using the SMCP method and SWM method under Rosetta framework.
Detailed Description
In order to clearly illustrate the technical solution of the present invention, the present invention is described below with reference to the accompanying drawings (1-3) and examples, which are provided herein for the purpose of illustrating the present invention only and are not limiting.
Fig. 1 shows a schematic diagram of serial sampling and parallel sampling. With serial sampling, a random search starts from position s to sample conformations, and the Monte Carlo mechanism can find a local energy minimum; however, the search ability of a single thread is limited, it is difficult to cross energy barriers to find the true lowest energy, and the lowest potential energy found by a single-threaded conformational search may be a false minimum, which lowers the prediction accuracy of the RNA tertiary structure prediction method. With parallel sampling, several threads start random searches of the same conformational space from different initial positions s, and each thread reaches a local energy minimum; the local conformation samples from all threads are then processed together, which increases the probability of reaching the true lowest energy valley in the conformational space and yields high-quality samples, thereby improving prediction accuracy.
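The idea in Fig. 1 can be illustrated with a toy sketch: several Monte Carlo walkers start from different positions on a one-dimensional energy landscape, and the lowest energy over all walkers is kept. Everything here is an assumption made for illustration (the landscape `toy_energy`, the step size, the temperature factor, and the use of Python threads); it is not the patent's implementation and does not model RNA.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def toy_energy(x: float) -> float:
    """Hypothetical 1-D landscape with two wells; the deeper one lies near x = 2."""
    return (x * x - 4.0) ** 2 / 4.0 - x


def monte_carlo_walk(start: float, cycles: int, seed: int,
                     kt: float = 0.5, step: float = 0.3) -> Tuple[float, float]:
    """Run one Metropolis walk and return (best_energy, best_x) seen along the way."""
    rng = random.Random(seed)
    x = best_x = start
    e = best_e = toy_energy(x)
    for _ in range(cycles):
        cand = x + rng.uniform(-step, step)
        cand_e = toy_energy(cand)
        delta = cand_e - e
        if delta <= 0.0 or rng.random() <= math.exp(-delta / kt):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = x, e
    return best_e, best_x


def parallel_search(starts: List[float], cycles: int) -> Tuple[float, float]:
    """Run one walker per start position and keep the lowest-energy result."""
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        results = list(pool.map(monte_carlo_walk, starts,
                                [cycles] * len(starts), range(len(starts))))
    return min(results)
```

Threads are used here only to keep the sketch short and self-contained; CPU-bound sampling in Python would normally use separate processes, and each walker gets its own seed so the runs are reproducible and independent.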
FIG. 2 shows the flow of the SMCP method for predicting RNA tertiary structure. The RNA motif chosen as an example is l1_sam_ll_riboswitch (PDB ID: 2QWY, motif length: 7, sequence: GCAGUCG). The input of the SMCP method includes two 3D structure files in pdb format: one is the initial conformation of the l1_sam_ll_riboswitch motif, on which the SMCP method builds its model; the other is the native conformation of the l1_sam_ll_riboswitch motif, i.e. the experimentally determined structure, which is compared with the structure predicted by the SMCP method to analyze prediction accuracy. In addition, one fasta sequence file and one flag command file must be provided, together with the number of threads n=3 and the number of Monte Carlo cycles m=10000. The output of the SMCP method is the predicted RNA tertiary structure and its prediction accuracy. The specific steps of RNA tertiary structure prediction are as follows:
1. conformational sampling
Using the parallel mechanism and the 'Stepwise ansatz' hypothesis, efficient conformational sampling is performed on the conformational space (which contains 7 nucleotides: GCAGUCG). Given the known structure of GCUCG, random operations such as addition, deletion, and resampling are performed on nucleotides A and G, and a candidate conformation set is obtained through 10000 random Monte Carlo operations. The sampling process is as follows (examples only).
Resampling success operation (9904 th sample is taken as an example):
(1) Modeling 1-2 4-7
(2) Modeling mobile nucleotide No. 4G linked to nucleotide No. 5U
(3) RMSD 1.512 (atom 23 of nucleotide G4) superimposed on atom 86 of nucleotide 1-2 5-7 (RMSD 0.0000007)
(4) Number of attempts: 10000, the successful times are 13;
resampling failure operation (9999 th sampling for example):
(1) Modeling 1-3 5-7
(2) Modeling mobile nucleotide 3A linked to nucleotide 2C
(3) RMSD 3.536 (22 atoms of nucleotide A No. 3) superimposed on the 86 atom of nucleotide 1-2 5-7 (RMSD 0.0000005)
(4) Number of attempts: 3092, number of successes is 20;
Failed addition operation (taking the 9998th sample as an example); the criterion is the same as for resampling, so only a failed example is given:
(1) Modeling 1-3 5-7;
(2) When nucleotide G No. 4 is added, it is linked to nucleotide A No. 3;
(3) RMSD 5.777 (atom No. 27 of nucleotide a 3), superimposed to atom No. 86 of the other nucleotides (RMSD 0.0000008);
(4) Number of attempted addition positions: 100000, times of success, 17;
Failed deletion operation (taking the 10000th sample as an example); the criterion is the same as for resampling, so only a failed example is given:
(1) Modeling 1-3 5-7;
(2) Deleting nucleotide No. 3 a linked to nucleotide No. 2C;
(3) RMSD0.000, superimposed to atom number 86 of the other nucleotides (RMSD 0.0000003);
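The move types in the examples above (add, delete, resample) can be sketched as operations on the set of residues currently modeled. This is a simplified illustration under stated assumptions: residues are tracked only by number, a residue can be added only next to an already modeled neighbour, the merge operation is omitted, and the names `legal_moves`, `apply_move` and `propose` are hypothetical.

```python
import random
from typing import List, Set, Tuple

Move = Tuple[str, int]  # (operation, 1-based residue number)


def legal_moves(built: Set[int], flexible: Set[int]) -> List[Move]:
    """List the single-nucleotide moves available for the current partial model."""
    moves: List[Move] = []
    for i in sorted(flexible):
        if i in built:
            moves.append(("delete", i))
            moves.append(("resample", i))
        elif (i - 1) in built or (i + 1) in built:
            moves.append(("add", i))
    return moves


def apply_move(built: Set[int], move: Move) -> Set[int]:
    """Return the residue set after a move; resampling changes coordinates only."""
    kind, i = move
    if kind == "add":
        return built | {i}
    if kind == "delete":
        return built - {i}
    return set(built)


def propose(built: Set[int], flexible: Set[int], rng: random.Random) -> Move:
    """Pick one legal move uniformly at random."""
    return rng.choice(legal_moves(built, flexible))
```

For the GCAGUCG example, the fixed residues would be {1, 2, 5, 6, 7} and the flexible residues {3, 4}; a model written as "1-2 4-7" corresponds to `built = {1, 2, 4, 5, 6, 7}`.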
2. scoring of energy functions
The potential energy value of each sampled candidate conformation is calculated using an energy function. The potential energy of the biomolecule is approximated by the Rosetta energy function value, and the Rosetta energy value is calculated according to the formula $\Delta E_{total} = \sum_i \omega_i E_i(\Theta_i, aa_i)$ as the linear sum of all energy terms scaled by their weights, wherein $E_i$ is an energy term, $\omega_i$ is the weight of that energy term, $\Theta_i$ is the geometric degrees of freedom, and $aa_i$ is the chemical identity; in addition, the potential energy $E_x$ of each energy term x is calculated according to a connection weight formula, wherein $E_x$ is the potential energy value of energy term x. The calculation process comprises the following steps:
(1-1) calculating the energy of atom-pair interactions. The inter- and intra-residue atom-pair interactions include: van der Waals forces, electrostatic forces, solvation terms, hydrogen bonds, and disulfide bonds. The energy terms representing atom-pair interactions include: fa_rep, fa_intra_rep, fa_atr, fa_elec, fa_sol, lk_ball_wtd, hbond_sc, dslf_fa13, hbond_lr_bb, hbond_sr_bb, hbond_bb_sc;
(1-2) calculating the energy related to protein backbone and side-chain torsions. The terms describing torsion angles are the Ramachandran plot, the backbone design term, and the side-chain conformation; the relevant energy terms include: rama_prepro, p_aa_pp, fa_dun;
(1-3) calculating the energy of special-case torsion terms (peptide bond dihedral angle). The relevant energy terms include: omega, pro_close, yhh_planarity;
(1-4) calculating the energy of non-ideal bond lengths and angles (Cartesian bonded energy). The relevant energy terms include: cart_bonded;
(1-5) the energy terms of all energy functions under the Rosetta framework are the same, and different energy functions differ only in the weights of the energy terms; the weighted sum of all energy terms in the above steps is calculated to obtain the potential energy value of the candidate conformation.
After calculation of the energy function, the potential energy of different random operations is changed as follows:
resampling successful operating potential energy value change (9904 th sample is taken as an example): -5.247→ -7.460, potential energy value decrease (initial structural potential energy value: -5.247);
resampling failure operating potential energy value change (9999 th sampling for example): -2.184 → -3.170 potential energy value decrease (initial structural potential energy value: -2.184);
add failed operating potential value change (9998 th sample for example): 17.900 → -1.671, potential energy value decrease (initial structural potential energy value: 15.326);
deletion failure operation potential value change (taking 10000 th sampling as an example): -3.702 → -3.702, the potential energy value is unchanged (initial structural potential energy value: -3.702);
3. potential energy evaluation further determines conformation
The core of the potential energy judgment is calculating the system energy change ΔE. The change in potential energy value determines whether a nucleotide addition, deletion, conformation merging, or resampling operation can be accepted;
wherein the Metropolis criterion applies the Monte Carlo idea. When the energy rises, a random number α between 0 and 1 is generated and compared with exp(-ΔE/kT): if α > exp(-ΔE/kT), the operation is rejected; otherwise it is accepted, and a real candidate conformation set is obtained.
And judging each operation, and finally selecting to accept or reject, wherein the potential energy judging process is as follows:
resampling success operation (9904 th sample is taken as an example):
(1) The inverse operation of resampling nucleotide G4 linked to nucleotide U5 is performed: resampling nucleotides 1-2 5-7 attached to nucleotide No. 4G;
(2) After execution, modeling is 1-2 4-7;
(3) Potential energy value change: -6.82358 → -7.46016, potential energy value is reduced;
(4) Is the Monte Carlo operation accepted? Accepted (both the original and the reverse operation reduce the potential energy value);
resampling failure operation (9999 th sampling for example):
(1) Performing the inverse of nucleotide a resampling number 3 linked to nucleotide C number 2: resampling nucleotide No. 3 a linked to nucleotide No. 2C;
(2) After execution, modeling is 1-3 5-7;
(3) Potential energy value change: -6.33765 → -3.16991, potential energy value increases;
(4) Is the Monte Carlo operation accepted? Rejected (the original operation decreases the potential energy value, the reverse operation increases it);
add failure operation (9998 th sample, for example):
(1) Performing the reverse operation to delete nucleotide No. 4G linked to nucleotide No. 3 a;
(2) Modeling 1-7 after deleting;
(3) Potential energy value change: -6.33765 → -1.67202, potential energy value increases;
(4) Is the Monte Carlo operation accepted? Rejected (the original operation decreases the potential energy value, the reverse operation increases it);
delete failure operation (taking sample 10000 as an example):
(1) Performing the reverse operation, adding nucleotide No. 3 a linked to nucleotide No. 2C;
(2) After execution, modeling is 1-2 5-7;
(3) Potential energy value change: -6.33765 → -3.70202 potential energy value decrease (initial structural potential energy value: -3.702, no change from initial value);
(4) Is the Monte Carlo operation accepted? Rejected (the original operation leaves the potential energy value unchanged; the reverse operation reduces it, but only to the initial value);
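As a worked illustration of the criterion on the two energy-raising examples above, assume a hypothetical temperature factor kT = 1 REU (the value actually used is not stated in the text). The threshold that the random number α is compared against would then be:

```latex
\begin{align*}
\Delta E_{9999} &= -3.16991 - (-6.33765) = 3.16774, &
e^{-\Delta E_{9999}/kT} &= e^{-3.16774} \approx 0.042 \\
\Delta E_{9998} &= -1.67202 - (-6.33765) = 4.66563, &
e^{-\Delta E_{9998}/kT} &= e^{-4.66563} \approx 0.0094
\end{align*}
```

Under that assumption, each operation is rejected whenever α exceeds the corresponding threshold, that is, in roughly 96% and 99% of draws respectively.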
4. multithreading comprehensive potential energy judgment
A conformation with a lower potential energy value is more stable, so the results of all threads are combined and the conformation with the lowest potential energy value is selected as the current best candidate conformation. The conformational potential energy values obtained by the 3 threads run on the l1_sam_ll_riboswitch motif are -11.133 REU (REU: Rosetta Energy Units), -10.123 REU, and -12.155 REU. Following the principle that the lower the potential energy value of a structure, the more stable the structure, the conformation with the lowest potential energy value, namely -12.155 REU, is selected.
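The selection across threads amounts to taking a minimum. A short sketch using the three energies quoted above follows; the thread labels and the helper name `select_best` are hypothetical.

```python
from typing import Dict, Tuple


def select_best(energies: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, energy) pair with the lowest potential energy."""
    label = min(energies, key=energies.get)
    return label, energies[label]


# Potential energies in REU reported for the three threads (labels are illustrative).
thread_energies = {"thread_1": -11.133, "thread_2": -10.123, "thread_3": -12.155}
best_thread, best_energy = select_best(thread_energies)
```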
5. Modeling accuracy calculation
RMSD is an important indicator describing the structural similarity of two conformations of a molecule. RMSD is calculated according to the basic formula $RMSD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\delta_i^2}$ to describe modeling accuracy, where $\delta_i$ is the distance between atom i and either the reference conformation or the average position of the n equivalent atoms. A rigid superposition is typically performed during the calculation to minimize the RMSD, and this minimum, calculated between the two given sets of points, is returned as the final accuracy value. The modeling accuracy of the l1_sam_ll_riboswitch motif is obtained from this calculation; its potential energy value is -12.155 REU.
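The basic RMSD formula can be sketched in plain Python as follows. The sketch assumes the two conformations are given as equal-length lists of paired atom coordinates that have already been superimposed; the rigid superposition step itself is not implemented here, and the function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation over paired atoms of two superimposed conformations."""
    if len(model) != len(reference) or not model:
        raise ValueError("conformations must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        # Squared distance delta_i^2 between the paired atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))
```

To obtain the minimized RMSD described in the text, the model coordinates would first be optimally rotated and translated onto the reference before calling this function.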
6. Modeling accuracy and integrity determination and processing
Accuracy judgment is performed on the current best candidate conformation: the modeling accuracy (RMSD) of the current best conformation is compared with the accuracy threshold; it meets the accuracy requirement, so re-modeling is not needed;
integrity judgment is performed on the current best candidate conformation: the missing value of the current conformation is 0, which indicates that the SMCP method has completely modeled the 7 nucleotides of GCAGUCG;
after 10000 modeling passes on l1_sam_ll_riboswitch motif are completed, the statistics related to the monte carlo random operation are as follows:
(1) The number of times of addition: 1095; acceptance rate: 0.2868;
(2) Number of deletions: 3968; acceptance rate: 0.0769;
(3) Number of resampling: 4937; acceptance rate: 0.4588;
Through stochastic Monte Carlo parallelized sampling, the A and G nucleotides are added at positions 24 and 25 of chain A; after the accuracy and integrity judgments and processing, this conformation is the final modeling result.
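The accuracy and integrity judgments can be read as a loop that repeats modeling until both checks pass. The sketch below is one possible reading, not the patent's procedure: `run_once` stands in for a full modeling round, the RMSD threshold is a caller-supplied parameter because its value is not given in the text, and `max_rounds` is an added safeguard.

```python
from typing import Callable, Set, Tuple


def missing_count(sequence: str, modeled: Set[int]) -> int:
    """Number of residues (1-based positions) in the sequence that were not modeled."""
    return sum(1 for i in range(1, len(sequence) + 1) if i not in modeled)


def model_until_acceptable(run_once: Callable[[int], Tuple[float, Set[int]]],
                           sequence: str, rmsd_threshold: float,
                           max_rounds: int = 5) -> Tuple[int, float]:
    """Repeat modeling until RMSD is within the threshold and no residue is missing."""
    for round_index in range(1, max_rounds + 1):
        rmsd_value, modeled = run_once(round_index)
        accurate = rmsd_value <= rmsd_threshold
        complete = missing_count(sequence, modeled) == 0
        if accurate and complete:
            return round_index, rmsd_value
    raise RuntimeError("no acceptable conformation within max_rounds")
```

For the GCAGUCG example, a round that models only residues {1, 2, 5, 6, 7} has a missing value of 2 and would trigger another round, while a round that models all 7 residues within the threshold ends the loop.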
7. Conformational visualization analysis
The high-accuracy, high-integrity structure obtained after modeling is visualized using UCSF Chimera, with which the experimentally determined conformation and the conformation predicted by the SMCP method can also be compared. The comparison is shown in Fig. 3, where panel A is the experimentally determined structure of the l1_sam_ll_riboswitch motif and panel B is the structure predicted by the SMCP method; panels A and B show that the RNA structure predicted by the SMCP method is highly similar to the experimentally determined structure. According to the modeling result data, the RMSD of the SMCP model of the l1_sam_ll_riboswitch motif is lower than the RMSD obtained by SWM, currently the best RNA tertiary structure prediction method under the Rosetta framework, so the SMCP method is more accurate than the SWM method in predicting RNA tertiary structure.
A benchmark consisting of 9 RNAs was modeled using both the SMCP method and the SWM method of the Rosetta framework. Fig. 4 compares the RMSD results of tertiary structure modeling on this benchmark: the abscissa is the RMSD of the RNA tertiary structure predicted by the SWM method, the ordinate is the RMSD of the RNA tertiary structure predicted by the SMCP method, and each dot or square represents one RNA motif.
As can be seen from Fig. 4, for the 9 RNA motifs in the benchmark, the RMSD values obtained by the SMCP method are lower, i.e. the modeling accuracy is higher, which shows that the SMCP method has an advantage in high-accuracy modeling of RNA tertiary structure;
in Fig. 4, 2 RNA motifs are marked with black squares; the SWM method cannot model these 2 RNA motifs completely, whereas the SMCP method models all of their nucleotides completely, which shows that the SMCP method also achieves higher integrity when predicting RNA tertiary structure.

Claims (2)

1. The RNA tertiary structure prediction method based on the parallel and Monte Carlo strategies is characterized by comprising the following steps:
(1) Initializing an RNA motif, and determining the number of parallel threads n and the number of Monte Carlo cycles m; n is a natural number greater than 1, and m is a natural number from 200 to 50000;
(2) Constructing a conformational space for the RNA motif, performing efficient conformational sampling on the conformational space by using a parallel mechanism and a 'Stepwise ansatz' hypothesis, performing single nucleotide addition, deletion, merging and resampling operation, and performing multiple random operations to obtain a candidate conformational set;
(3) Calculating the potential energy value of each candidate conformation obtained in step (2) using an energy function, wherein the potential energy of the biomolecule is approximated by the Rosetta energy function value, and the Rosetta energy value is calculated according to the formula $\Delta E_{total} = \sum_i \omega_i E_i(\Theta_i, aa_i)$ as the linear sum of all energy terms scaled by their weights, wherein $E_i$ is an energy term, $\omega_i$ is the weight of that energy term, $\Theta_i$ is the geometric degrees of freedom, and $aa_i$ is the chemical identity; in addition, the potential energy of each energy term is calculated according to a connection weight formula;
the step (3) of obtaining conformational potential energy value comprises the following steps:
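(1-1) calculating the energy of atom-pair interactions: the inter- and intra-residue atom-pair interactions include: van der Waals forces, electrostatic forces, solvation terms, hydrogen bonds, and disulfide bonds; the energy terms representing atom-pair interactions include: fa_rep, fa_intra_rep, fa_atr, fa_elec, fa_sol, lk_ball_wtd, hbond_sc, dslf_fa13, hbond_lr_bb, hbond_sr_bb, hbond_bb_sc;
(1-2) calculating the energy related to protein backbone and side-chain torsions: the terms describing torsion angles are the Ramachandran plot, the backbone design term, and the side-chain conformation; the relevant energy terms include: rama_prepro, p_aa_pp, fa_dun;
(1-3) calculating the energy of the peptide bond dihedral angle torsion terms: the relevant energy terms include: omega, pro_close, yhh_planarity;
(1-4) calculating the energy of non-ideal bond lengths and angles (Cartesian bonded energy): the relevant energy terms include: cart_bonded;
(1-5) the energy terms of all energy functions under the Rosetta framework are the same, and different energy functions differ only in the weights of the energy terms; according to the formula $E_{total} = \omega_{fa\_rep}E_{fa\_rep} + \omega_{fa\_intra\_rep}E_{fa\_intra\_rep} + \omega_{fa\_atr}E_{fa\_atr} + \omega_{fa\_elec}E_{fa\_elec} + \omega_{fa\_sol}E_{fa\_sol} + \omega_{lk\_ball\_wtd}E_{lk\_ball\_wtd} + \omega_{hbond\_sc}E_{hbond\_sc} + \omega_{dslf\_fa13}E_{dslf\_fa13} + \omega_{hbond\_lr\_bb}E_{hbond\_lr\_bb} + \omega_{hbond\_sr\_bb}E_{hbond\_sr\_bb} + \omega_{hbond\_bb\_sc}E_{hbond\_bb\_sc} + \omega_{rama\_prepro}E_{rama\_prepro} + \omega_{p\_aa\_pp}E_{p\_aa\_pp} + \omega_{fa\_dun}E_{fa\_dun} + \omega_{omega}E_{omega} + \omega_{pro\_close}E_{pro\_close} + \omega_{yhh\_planarity}E_{yhh\_planarity} + \omega_{cart\_bonded}E_{cart\_bonded}$,
the weighted sum of all energy terms in the above steps is calculated;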
(4) Evaluating the conformational potential energy value obtained in step (3), wherein, after each random operation in step (2), the change in the potential energy value determines whether the nucleotide addition, deletion, conformation merging or resampling operation is accepted; whether a random operation on a nucleotide is accepted is determined according to the criterion [criterion not reproduced in the source], wherein the Metropolis criterion is expressed by the formula [formula not reproduced in the source], and a real candidate conformation set is obtained after this preliminary potential energy judgment; α is a random number between 0 and 1;
(5) Performing a further potential energy judgment on the candidate conformation set obtained in step (4): because a conformation with a lower potential energy value is more stable, the conformation with the lowest potential energy value across all threads is selected as the current best candidate conformation;
(6) Calculating the accuracy of the current best candidate conformation obtained in step (5), wherein RMSD is an important index describing the structural similarity of two conformations of a molecule; the RMSD is calculated according to the formula $RMSD = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\delta_j^2}$ to describe the modeling accuracy, wherein $\delta_j$ is the distance between atom j and the average position of the m equivalent atoms; in addition, a rigid superposition is performed to minimize the RMSD, and the minimum is returned as the final accuracy value, calculated according to the formula [formula not reproduced in the source], wherein [symbols not reproduced in the source] represent the two given points;
(7) Judging the accuracy of the current best candidate conformation obtained in steps (5)-(6): when the error between the predicted conformation and the experimentally determined conformation is within 2 Å, the predicted conformation is regarded as the native conformation, i.e. the modeling accuracy requirement is RMSD ≤ 2 Å; the following judgment is therefore performed: [judgment condition not reproduced in the source];
(8) Judging the integrity of the current best candidate conformation obtained in step (7): [judgment condition not reproduced in the source];
(9) The conformation obtained in step (8) is a conformation with high accuracy and high integrity, and is taken as the final predicted RNA tertiary structure.
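The scoring, acceptance and accuracy checks named in steps (3), (4), (6) and (7) of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The energy-term names, the weights, the temperature factor `kT` and the exact acceptance rule (the standard Metropolis form) are assumptions, because the corresponding formulas are not legible in the source text. The RMSD helper assumes the two coordinate sets are already superimposed, so the rigid superposition of step (6) is omitted.

```python
import math
import random


def total_energy(terms, weights):
    """Step (3): weighted linear sum of energy terms, sum_i w_i * E_i.

    `terms` and `weights` are dicts keyed by hypothetical term names.
    """
    return sum(weights[name] * value for name, value in terms.items())


def metropolis_accept(e_old, e_new, kT=1.0, rng=random):
    """Step (4), assuming the standard Metropolis rule.

    A move that does not raise the energy is always accepted. An uphill
    move is accepted when a uniform random number alpha in [0, 1) is
    smaller than exp(-(e_new - e_old) / kT).
    """
    delta = e_new - e_old
    if delta <= 0:
        return True
    alpha = rng.random()
    return alpha < math.exp(-delta / kT)


def rmsd(coords_a, coords_b):
    """Step (6): RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def meets_accuracy(rmsd_value, cutoff=2.0):
    """Step (7): accuracy requirement, RMSD within `cutoff` angstroms."""
    return rmsd_value <= cutoff
```

For example, `total_energy({"vdw": -3.0, "hbond": -1.5, "torsion": 0.5}, {"vdw": 1.0, "hbond": 2.0, "torsion": 0.5})` evaluates to `-5.75`, and a candidate whose energy is lower than the current one always passes `metropolis_accept`.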
2. The RNA tertiary structure prediction method based on parallel and Monte Carlo strategies according to claim 1, wherein in step (1) the value of n is 3 and the value of m is 10000.
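The overall flow of the claims (n parallel workers, m Monte Carlo cycles each, lowest-energy conformation kept) can be sketched as follows, using n = 3 and m = 10000 as defaults. Everything specific here is a hypothetical stand-in: a conformation is reduced to a list of angles, `toy_energy` is a placeholder and not the Rosetta energy function, the merge move is omitted, and the acceptance rule is the standard Metropolis form with an assumed `kT`. Threads mirror the wording of the claims; in CPython a process pool would be needed for real CPU parallelism.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

TARGET_LEN = 6  # hypothetical motif size in nucleotides


def toy_energy(conf):
    """Placeholder score: penalises unbuilt nucleotides and angles far from 0."""
    missing = TARGET_LEN - len(conf)
    return 5.0 * missing + sum(1.0 - math.cos(angle) for angle in conf)


def propose(conf, rng):
    """Return a copy of `conf` after one random add, delete or resample move.

    An impossible add (motif already full) falls back to resampling; an
    impossible delete or resample (empty conformation) leaves it unchanged.
    """
    new = list(conf)
    move = rng.choice(["add", "delete", "resample"])
    if move == "add" and len(new) < TARGET_LEN:
        new.append(rng.uniform(-math.pi, math.pi))
    elif move == "delete" and new:
        new.pop()
    elif new:
        new[rng.randrange(len(new))] = rng.uniform(-math.pi, math.pi)
    return new


def run_chain(seed, m, kT=1.0):
    """Run m Monte Carlo cycles and return (best_energy, best_conformation)."""
    rng = random.Random(seed)  # one private generator per worker
    conf = []
    energy = toy_energy(conf)
    best_conf, best_energy = conf, energy
    for _ in range(m):
        candidate = propose(conf, rng)
        candidate_energy = toy_energy(candidate)
        delta = candidate_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            conf, energy = candidate, candidate_energy
            if energy < best_energy:
                best_conf, best_energy = conf, energy
    return best_energy, best_conf


def predict(n=3, m=10000):
    """Run n chains in parallel and keep the lowest-energy conformation."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda seed: run_chain(seed, m), range(n)))
    return min(results, key=lambda result: result[0])
```

Calling `energy, conformation = predict()` runs three chains of 10000 cycles each. Because every chain is seeded from its index, repeated calls with the same arguments return the same result.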
CN202111428461.5A 2021-11-29 2021-11-29 RNA tertiary structure prediction method based on parallel and Monte Carlo strategies Active CN114121146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111428461.5A CN114121146B (en) 2021-11-29 2021-11-29 RNA tertiary structure prediction method based on parallel and Monte Carlo strategies

Publications (2)

Publication Number Publication Date
CN114121146A CN114121146A (en) 2022-03-01
CN114121146B true CN114121146B (en) 2023-10-03

Family

ID=80370758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111428461.5A Active CN114121146B (en) 2021-11-29 2021-11-29 RNA tertiary structure prediction method based on parallel and Monte Carlo strategies

Country Status (1)

Country Link
CN (1) CN114121146B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant