
CN103729577A - Protein thermodynamic analysis high-efficiency stochastic simulation method based on hybrid parallel mode

Info

Publication number: CN103729577A
Application number: CN201310683507.7A
Authority: CN
Inventors: 彭丰斌; 魏彦杰; 张慧玲; 弓英瑛
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Hongzhituoxin Venture Capital Enterprise LP
Priority date: 2013-12-12
Filing date: 2013-12-12
Publication date: 2014-04-16
Anticipated expiration: 2033-12-12
Also published as: CN103729577B

Abstract

The invention relates to the technical field of biological information analysis and provides a protein thermodynamic analysis high-efficiency stochastic simulation method based on a hybrid parallel mode. The method includes step A, determining a protein energy model and an energy range; step B, determining a sectioning mode of the protein energy range; step C, simulating and calculating protein system state density. By adopting the method, the whole thermodynamic process of protein folding can be analyzed and studied efficiently so as to explore and study the protein folding process.

Description

Efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode

[technical field]

The present invention relates to the technical field of biological information analysis, and in particular to an efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode.

[background technology]

Protein folding research mainly studies how a protein folds, within a short time, from a one-dimensional polypeptide chain into its native three-dimensional structure, forming a macromolecule with biological function. The genetic information of an organism (DNA) is passed to proteins through RNA transcription and translation (the central dogma), so protein folding is also called the second genetic code, and studying it can help reveal how genetic information is expressed and how function is transmitted. While folding from a one-dimensional polypeptide chain into the native three-dimensional structure, a protein may misfold or aggregate, which damages its structure and function and causes "folding diseases" such as senile dementia. Protein folding research is therefore of great significance for exploring the mechanisms of the various "folding diseases".

At present, most algorithms for studying protein folding are implemented through molecular dynamics simulation or stochastic simulation. Generally speaking, molecular dynamics simulation is used to study the dynamic processes of a protein system, whereas stochastic simulation can study the whole thermodynamic process of a protein system. A simulation with a high-accuracy all-atom protein model must compute multiple interaction forces among thousands of atoms, so molecular dynamics can only simulate folding processes on the nanosecond scale and is severely limited for studying protein folding on microsecond-to-millisecond time scales; in addition, molecular dynamics simulation is affected by the initial experimental configuration. Stochastic simulation, by contrast, can study protein folding on microsecond-to-millisecond time scales, does not depend on a specific initial configuration, and can search configuration space more widely.

The classical Wang-Landau algorithm is one of the most attractive and promising algorithms in the field of stochastic simulation, and it can address many challenges in fields such as bioinformatics and statistical physics. In protein folding research, the algorithm has two notable advantages: first, the protein simulation is not confined to local energy minima, so it can walk freely over the whole energy range; second, the algorithm can simulate and calculate the density of states of the protein system, from which many thermodynamic quantities, such as specific heat, can be obtained over a broad temperature range, so that the whole thermodynamic process of protein folding can be analyzed and studied efficiently. However, the Wang-Landau algorithm still needs improvement in computational accuracy and speed.

In view of this, overcoming the defects of the prior art is a problem that urgently needs to be solved in this technical field.

[summary of the invention]

The technical problem to be solved by the present invention is to provide an efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode.

The present invention adopts the following technical solution:

An efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode, comprising:

Step A: determining the protein energy model and the energy range;

Step B: determining the segmentation mode of the protein energy range;

Step C: simulating and calculating the density of states of the protein system.

Further, step A further comprises:

Adopting the ECEPP protein energy model, where the energy of the ECEPP force field is expressed as:

E_{ECEPP} = E_C + E_{LJ} + E_{HB} + E_{Tor}

where E_C is the Coulomb interaction between two charges, with r_{ij} denoting the distance between atoms i and j; E_{LJ} is the Lennard-Jones interaction between two atoms; E_{HB} is the hydrogen-bond interaction; and E_{Tor} = \sum_l U_l (1 \pm \cos(n_l \xi_l)) is the dihedral torsion term, where \xi_l is the l-th dihedral angle.
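The explicit expressions for the first three terms are not reproduced in this text. For orientation only, the block below sketches the form in which the ECEPP/2 force field is commonly written. The partial charges q_i, the dielectric constant \varepsilon, the pair coefficients A_{ij}, B_{ij}, C_{ij}, D_{ij} and the factor 332 (which yields kcal/mol for charges in electron units and distances in angstroms) come from that common form and are assumptions here, not quotations from the patent.

```latex
E_C    = \sum_{(i,j)} \frac{332\, q_i q_j}{\varepsilon\, r_{ij}}, \qquad
E_{LJ} = \sum_{(i,j)} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right), \qquad
E_{HB} = \sum_{(i,j)} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right)
```

In this form all four terms depend on the atomic positions only through the pair distances r_{ij} and the dihedral angles \xi_l.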

Further, step A further comprises:

Discretizing the protein energy range used: if k energy bins are taken, [E_{min}, E_{max}] is divided evenly into k bins, and the energy value at the middle of each bin represents that bin.
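The discretization can be illustrated with a minimal Python sketch. The function names `make_bins` and `bin_index`, and the choice to assign the right edge E_max to the last bin, are illustrative assumptions rather than part of the method as stated.

```python
def make_bins(e_min, e_max, k):
    """Return the k bin centres of an even split of [e_min, e_max]."""
    width = (e_max - e_min) / k
    return [e_min + (i + 0.5) * width for i in range(k)]


def bin_index(energy, e_min, e_max, k):
    """Map an energy inside [e_min, e_max] to its bin index 0..k-1."""
    if not e_min <= energy <= e_max:
        raise ValueError("energy outside the simulated range")
    width = (e_max - e_min) / k
    # Assumption: the right edge E_max is assigned to the last bin.
    return min(int((energy - e_min) / width), k - 1)


# Example: four bins over [-100, 0]; each bin is represented by its centre.
centres = make_bins(-100.0, 0.0, 4)
```

With this mapping, S(E) and H(E) can be stored as plain arrays of length k indexed by bin.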

Further, step B further comprises:

Step B1: dividing the energy range evenly into M segments, where the overlap between adjacent sub-energy ranges is set to Δ bins and each segment contains [formula not reproduced] bins;

Step B2: according to the distribution of the logarithm S(E) of the currently calculated density-of-states function of the protein system, adaptively segmenting the energy range so that \nabla S(E) is balanced across the segments, where, for a sub-energy range [E_{begin}, E_{end}],

\nabla S(E) = S(E_{end}) - S(E_{begin}).
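Steps B1 and B2 can be sketched in Python as follows. The window width used in `equal_segments` is derived here from a counting argument (M windows of n bins that share Δ bins with each neighbour cover M·n − (M−1)·Δ bins, which is set equal to k); it is an assumption and may differ from the expression in the original publication. `adaptive_segments` assumes S(E) is strictly increasing over the bins and that each segment spans at least one bin; both function names are hypothetical.

```python
import math
from bisect import bisect_left


def equal_segments(k, m, overlap):
    """Step B1 sketch: m equal-width windows over bins 0..k-1 sharing `overlap` bins."""
    width = -(-(k + (m - 1) * overlap) // m)  # ceiling division
    stride = width - overlap
    segments = []
    for i in range(m):
        start = min(i * stride, k - width)   # keep the last window inside the range
        segments.append((start, start + width - 1))
    return segments


def adaptive_segments(S, m, overlap=0):
    """Step B2 sketch: choose cuts so S(E_end) - S(E_begin) is about equal per segment."""
    total = S[-1] - S[0]
    # First bin whose S value reaches each equally spaced target level.
    cuts = [bisect_left(S, S[0] + total * i / m) for i in range(1, m)]
    starts = [0] + cuts
    ends = [min(c + overlap - 1, len(S) - 1) for c in cuts] + [len(S) - 1]
    return list(zip(starts, ends))


# Demo input: a concave, monotonically increasing stand-in for S(E).
S_demo = [math.sqrt(i) for i in range(101)]
```

Because S(E) rises steeply at low energies and flattens at high energies, the adaptive windows come out narrow at the low end and wide at the high end, whereas the equal split gives every thread the same number of bins.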

Further, step C further comprises:

Simulating and calculating the density of states of the protein system through the master-slave process mode of MPI combined with the multi-threaded parallel mode of OpenMP.

Further, among the N processes of the master-slave process mode, process 1 is the master process and the remaining processes are slave processes.

Further, the master process comprises the following steps:

Step S11: initializing the logarithm of the density-of-states function of the protein system, S(E) = \ln g(E) = 0, the histogram H(E) = 0 (E_{min} \le E \le E_{max}), and the modification factor df = 1 (= \ln f = \ln e);

Step S12: setting s = 1;

Step S13: dividing the energy range (E_{min} \le E \le E_{max}) into M segments according to the determined segmentation mode of the protein energy range, assigning the segments to M threads, and setting t = 1;

Step S14: in each thread, randomly perturbing the current configuration, restricted to the corresponding sub-energy range, to produce a new configuration, calculating its energy E_{new}, determining the probability of accepting the new configuration according to the Metropolis criterion, and setting t = t + 1;

Step S14 is repeated tmax times;

Step S15: all threads communicate with one another, and S(E) and H(E) over the whole range are obtained by combining their results; s = s + 1;

Steps S14 and S15 are repeated smax times;

Step S16: all processes communicate with one another; the master process collects S_{tmp}(E) and H_{tmp}(E) from all slave processes and accumulates the global S(E) and H(E), that is, global S(E) = S(E) + S_{tmp}(E) of all slave processes and global H(E) = H(E) + H_{tmp}(E) of all slave processes; the global S(E) and H(E) are broadcast to all slave processes, and the histogram flatness condition is checked:

\frac{\max(H(E)) - \min(H(E))}{\max(H(E)) + \min(H(E))} < φ (0 < φ < 1)

If the condition is not met, execution returns to step S12 and the iteration continues; if it is met, step S17 is executed;

Step S17: changing the modification factor df, then returning to step S12 and continuing the iteration until the program termination condition [formula not reproduced] is met, wherein [formula not reproduced]; S(E) is thereby obtained, giving the relative density of states of the protein system, g(E) = e^{S(E)}.
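The flatness condition checked in step S16 translates directly into code. The sketch below assumes H(E) is held as a list of per-bin counts; the name `is_flat` and the handling of an empty histogram are illustrative choices.

```python
def is_flat(hist, phi):
    """Return True when (max(H) - min(H)) / (max(H) + min(H)) < phi."""
    hi, lo = max(hist), min(hist)
    if hi + lo == 0:
        return False  # nothing has been sampled yet
    return (hi - lo) / (hi + lo) < phi
```

Since 0 < φ < 1, the ratio equals 1 whenever any bin is still empty, so the condition can only be satisfied after every bin in the range has been visited at least once.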

Further, in step S14, determining the probability of accepting the new configuration according to the Metropolis criterion further comprises:

P(old \rightarrow new) = \min(1, e^{-[S(E_{new}) - S(E_{old})]})

If the new configuration is accepted:

S(E_{new}) = S(E_{new}) + df, H(E_{new}) = H(E_{new}) + 1;

Otherwise:

S(E_{old}) = S(E_{old}) + df, H(E_{old}) = H(E_{old}) + 1.
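The acceptance rule and the update of S(E) and H(E) can be sketched on a toy system. Everything about the toy is an assumption made for illustration: the "configuration" is a chain of two-state units, its energy is simply the number of units set to 1 (standing in for E_ECEPP), and a move flips one randomly chosen unit. The window limits `e_lo` and `e_hi` play the role of one thread's sub-energy range.

```python
import math
import random


def wl_step(state, e_old, S, H, df, e_lo, e_hi, rng):
    """One move: propose a flip, accept with min(1, exp(-[S(E_new) - S(E_old)]))."""
    i = rng.randrange(len(state))
    e_new = e_old + (1 if state[i] == 0 else -1)  # energy after flipping unit i
    accept = False
    if e_lo <= e_new <= e_hi:  # proposals leaving the thread's window are rejected
        delta = S[e_new] - S[e_old]
        # Same probability as min(1, exp(-delta)), without overflowing exp().
        accept = delta <= 0 or rng.random() < math.exp(-delta)
    if accept:
        state[i] ^= 1
        e_old = e_new
    # Update at the energy now occupied: E_new if accepted, E_old otherwise.
    S[e_old] += df
    H[e_old] += 1
    return e_old


def run_window(n_units, e_lo, e_hi, n_steps, df=1.0, seed=0):
    """Run n_steps moves inside [e_lo, e_hi] and return (S, H)."""
    rng = random.Random(seed)
    state = [1] * e_lo + [0] * (n_units - e_lo)  # start at the lower window edge
    S = [0.0] * (n_units + 1)
    H = [0] * (n_units + 1)
    e = e_lo
    for _ in range(n_steps):
        e = wl_step(state, e, S, H, df, e_lo, e_hi, rng)
    return S, H


S_toy, H_toy = run_window(n_units=10, e_lo=2, e_hi=6, n_steps=5000)
```

Because S(E) grows at every visited energy, moves toward rarely visited energies become more likely over time, which is what lets the walk cover the whole window instead of settling in one region.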

Further, each slave process comprises the following steps:

Step S21: initializing the logarithm of the density-of-states function of the protein system, S(E) = \ln g(E) = 0 and S_{tmp}(E) = \ln g_{tmp}(E) = 0, the histograms H(E) = 0 and H_{tmp}(E) = 0 (E_{min} \le E \le E_{max}), and the modification factor df = 1 (= \ln f = \ln e);

Step S22: setting s = 1;

Step S23: dividing the energy range (E_{min} \le E \le E_{max}) into M segments according to the determined segmentation mode of the protein energy range, assigning the segments to M threads, and setting t = 1;

Step S24: in each thread, randomly perturbing the current configuration, restricted to the corresponding sub-energy range, to produce a new configuration, calculating its energy E_{new}, determining the probability of accepting the new configuration according to the Metropolis criterion, and setting t = t + 1;

Step S24 is repeated tmax times;

Step S25: all threads communicate with one another, and S_{tmp}(E) and H_{tmp}(E) over the whole range are obtained by combining their results; s = s + 1;

Steps S24 and S25 are repeated smax times;

Step S26: all processes communicate with one another; the slave process sends S_{tmp}(E) and H_{tmp}(E) to the master process, then receives the global S(E) and H(E) calculated by the master process and uses them to update its own S(E) and H(E), resets S_{tmp}(E) and H_{tmp}(E) to 0, and checks the histogram flatness condition:

\frac{\max(H(E)) - \min(H(E))}{\max(H(E)) + \min(H(E))} < φ (0 < φ < 1)

If the condition is not met, execution returns to step S22 and the iteration continues; if it is met, step S27 is executed;

Step S27: changing the modification factor df, then returning to step S22 and continuing the iteration until the program termination condition [formula not reproduced] is met, wherein [formula not reproduced].

Further, in step S24, determining the probability of accepting the new configuration according to the Metropolis criterion further comprises:

P(old \rightarrow new) = \min(1, e^{-[S(E_{new}) - S(E_{old})]})

If the new configuration is accepted:

S(E_{new}) = S(E_{new}) + df, H(E_{new}) = H(E_{new}) + 1,

S_{tmp}(E_{new}) = S_{tmp}(E_{new}) + df, H_{tmp}(E_{new}) = H_{tmp}(E_{new}) + 1;

Otherwise:

S(E_{old}) = S(E_{old}) + df, H(E_{old}) = H(E_{old}) + 1,

S_{tmp}(E_{old}) = S_{tmp}(E_{old}) + df, H_{tmp}(E_{old}) = H_{tmp}(E_{old}) + 1.
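The exchange between master and slave processes (the accumulation on the master and the reset of the temporary arrays on each slave) can be sketched without any MPI dependency by treating each slave's S_tmp(E) and H_tmp(E) as plain lists. In an actual MPI program the collection and the broadcast could be carried out with collective operations such as MPI_Reduce and MPI_Bcast; the text does not name specific routines, so that mapping and the function names below are assumptions.

```python
def master_merge(S, H, slaves_S_tmp, slaves_H_tmp):
    """Master side: global S(E) = S(E) + sum of S_tmp(E); likewise for H(E)."""
    for S_tmp, H_tmp in zip(slaves_S_tmp, slaves_H_tmp):
        for b in range(len(S)):
            S[b] += S_tmp[b]
            H[b] += H_tmp[b]
    return S, H


def slave_update(S_global, H_global, n_bins):
    """Slave side: adopt the broadcast global arrays and reset the temporaries."""
    S = list(S_global)
    H = list(H_global)
    S_tmp = [0.0] * n_bins
    H_tmp = [0] * n_bins
    return S, H, S_tmp, H_tmp


# Two slaves report the increments they accumulated since the last exchange.
S_glob, H_glob = master_merge(
    [1.0, 2.0], [1, 2],
    slaves_S_tmp=[[1.0, 0.0], [0.0, 3.0]],
    slaves_H_tmp=[[1, 0], [0, 3]],
)
S_loc, H_loc, S_tmp_loc, H_tmp_loc = slave_update(S_glob, H_glob, n_bins=2)
```

Sending only the increments S_tmp(E) and H_tmp(E), rather than each slave's full S(E), prevents the previously broadcast global values from being counted again at the next exchange.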

Further, in steps S17 and S27, the modification factor f is changed as follows:

First, N consecutive iterations of f = f^{α} (0 < α < 1) are performed, followed by 1 iteration of [formula not reproduced];

The above pattern is repeated multiple times.
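Since the simulation works with df = ln f, the update f = f^α corresponds to multiplying df by α, because ln(f^α) = α ln f. The sketch below counts how many such reductions are needed before df drops below a threshold. The threshold `df_final` is a hypothetical stand-in for the termination condition, and the additional single iteration in the schedule is left out because its expression is not available in this text.

```python
def reductions_until(df, alpha, df_final):
    """Count the f -> f**alpha updates needed to bring df = ln f below df_final."""
    n = 0
    while df >= df_final:
        df *= alpha  # ln(f**alpha) = alpha * ln(f)
        n += 1
    return n


# Starting from df = 1 with alpha = 0.5 and an assumed threshold of 1e-4.
n_updates = reductions_until(1.0, 0.5, 1e-4)
```

A smaller α reaches the threshold in fewer updates but leaves less time for S(E) to be refined at each value of df.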

Compared with the prior art, the beneficial effects of the present invention are as follows:

Compared with the classical Wang-Landau algorithm, the present invention updates the modification factor using an annealing-based mechanism, which can improve computational accuracy and speed; uses a flexible segmentation mode of the energy range, which balances the load among threads; and adopts the MPI+OpenMP hybrid parallel paradigm, which can greatly accelerate simulation and calculation. With the method provided by the invention, the whole thermodynamic process of protein folding can be analyzed and studied efficiently, so that the protein folding process can be explored and studied.

[accompanying drawing explanation]

Fig. 1 is a flowchart of the efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the segmentation mode of the protein energy range in an embodiment of the present invention;

Fig. 3 is a flowchart of the hybrid parallel method for simulating and calculating the density of states of the protein system.

[embodiment]

In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.

In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.

The invention provides an efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode. As shown in Fig. 1, the method comprises:

Step A: determining the protein energy model and the energy range;

Step B: determining the segmentation mode of the protein energy range;

Step C: simulating and calculating the density of states of the protein system.

In step A, the ECEPP protein energy model can be adopted, where the energy of the ECEPP force field is expressed as:

E_{ECEPP} = E_C + E_{LJ} + E_{HB} + E_{Tor}

where E_C is the Coulomb interaction between two charges, with r_{ij} denoting the distance between atoms i and j; E_{LJ} is the Lennard-Jones interaction between two atoms; E_{HB} is the hydrogen-bond interaction; and E_{Tor} = \sum_l U_l (1 \pm \cos(n_l \xi_l)) is the dihedral torsion term, where \xi_l is the l-th dihedral angle. Because the ECEPP force field uses an angular coordinate system, its computational efficiency is higher than that of other protein models based on a Cartesian coordinate system.

To facilitate computer simulation, the protein energy range used can also be discretized: if k energy bins are taken, [E_{min}, E_{max}] is divided evenly into k bins, and the energy value at the middle of each bin represents that bin.

In step B, in order to balance the load among threads, the following flexible, adaptive segmentation mode of the energy range can be adopted:

Step B1: dividing the energy range evenly into M segments, where the overlap between adjacent sub-energy ranges is set to Δ bins and each segment contains [formula not reproduced] bins;

The overlap between adjacent sub-energy ranges is set to Δ bins because adjacent sub-energy ranges need a suitable overlap in order to combine the S(E) and H(E) of all sub-energy ranges into S(E) and H(E) over the whole range.

Step B2: according to the distribution of the logarithm S(E) of the currently calculated density-of-states function of the protein system, adaptively segmenting the energy range, where, for a sub-energy range [E_{begin}, E_{end}],

\nabla S(E) = S(E_{end}) - S(E_{begin}).

In general, the logarithm S(E) = \ln g(E) of the density-of-states function of a protein system is a monotonically increasing concave function, so the energy range needs to be segmented adaptively so that \nabla S(E) is balanced across the segments, as shown in Fig. 2.
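The role of the overlap between adjacent sub-energy ranges can be illustrated with a small sketch. The text does not spell out how the per-segment results are combined; the approach below, which shifts each window by the constant that makes it agree on average with its neighbour over the shared bins, is one common choice for windowed Wang-Landau sampling and is an assumption here. It relies on the fact that each window determines S(E) only up to an additive constant. The function name and the `(start_bin, values)` layout are hypothetical.

```python
def join_windows(windows):
    """Stitch per-window S(E) estimates into one curve.

    `windows` is a list of (start_bin, values) pairs sorted by start_bin,
    where each window overlaps the previous one by at least one bin.
    """
    start0, vals0 = windows[0]
    S = {start0 + i: v for i, v in enumerate(vals0)}
    for start, vals in windows[1:]:
        shared = [b for b in range(start, start + len(vals)) if b in S]
        # Constant that aligns this window with the curve built so far.
        shift = sum(S[b] - vals[b - start] for b in shared) / len(shared)
        for i, v in enumerate(vals):
            S.setdefault(start + i, v + shift)  # keep earlier values in the overlap
    return [S[b] for b in sorted(S)]


# The second window carries an arbitrary offset of +8 relative to the first.
joined = join_windows([(0, [0.0, 1.0, 2.0, 3.0]), (2, [10.0, 11.0, 12.0, 13.0])])
```

Without shared bins there would be no way to fix the relative offset between two windows, which is why a suitable overlap is required.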

In step C, the density of states of the protein system is simulated and calculated through the master-slave process mode of MPI combined with the multi-threaded parallel mode of OpenMP; among the N processes of the master-slave process mode, process 1 is the master process and the remaining processes are slave processes.

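As shown in Fig. 3, the master process comprises the following steps:

Step S11: initializing the logarithm of the density-of-states function of the protein system, S(E) = \ln g(E) = 0, the histogram H(E) = 0 (E_{min} \le E \le E_{max}), and the modification factor df = 1 (= \ln f = \ln e);

Step S12: setting s = 1;

Step S13: dividing the energy range (E_{min} \le E \le E_{max}) into M segments according to the determined segmentation mode of the protein energy range, assigning the segments to M threads, and setting t = 1;

Step S14: in each thread, randomly perturbing the current configuration, restricted to the corresponding sub-energy range, to produce a new configuration, calculating its energy E_{new}, and determining the probability of accepting the new configuration according to the Metropolis criterion (referred to as an MCS step):

P(old \rightarrow new) = \min(1, e^{-[S(E_{new}) - S(E_{old})]})

If the new configuration is accepted:

S(E_{new}) = S(E_{new}) + df, H(E_{new}) = H(E_{new}) + 1;

Otherwise:

S(E_{old}) = S(E_{old}) + df, H(E_{old}) = H(E_{old}) + 1.

t = t + 1; step S14 is repeated tmax times (that is, tmax MCS steps are performed, for example 10);

Step S15: all threads communicate with one another, and S(E) and H(E) over the whole range are obtained by combining their results; s = s + 1;

Steps S14 and S15 are repeated smax times (for example 100);

Step S16: all processes communicate with one another; the master process collects S_{tmp}(E) and H_{tmp}(E) from all slave processes and accumulates the global S(E) and H(E), that is, global S(E) = S(E) + S_{tmp}(E) of all slave processes and global H(E) = H(E) + H_{tmp}(E) of all slave processes; the global S(E) and H(E) are broadcast to all slave processes, and the histogram flatness condition is checked:

\frac{\max(H(E)) - \min(H(E))}{\max(H(E)) + \min(H(E))} < φ (0 < φ < 1)

If the condition is not met, execution returns to step S12 and the iteration continues; if it is met, step S17 is executed;

Step S17: changing the modification factor df, then returning to step S12 and continuing the iteration until the program termination condition [formula not reproduced] is met (that is, [formula not reproduced], where, for example, [formula not reproduced] may be taken as 0.0001); S(E) is thereby obtained, giving the relative density of states of the protein system, g(E) = e^{S(E)}. The modification factor f is changed as follows:

First, N consecutive iterations of f = f^{α} (0 < α < 1) are performed, followed by 1 iteration of [formula not reproduced];

The above pattern is repeated multiple times.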

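Each slave process comprises the following steps:

Step S21: initializing the logarithm of the density-of-states function of the protein system, S(E) = \ln g(E) = 0 and S_{tmp}(E) = \ln g_{tmp}(E) = 0, the histograms H(E) = 0 and H_{tmp}(E) = 0 (E_{min} \le E \le E_{max}), and the modification factor df = 1 (= \ln f = \ln e);

Step S22: setting s = 1;

Step S23: dividing the energy range (E_{min} \le E \le E_{max}) into M segments according to the determined segmentation mode of the protein energy range, assigning the segments to M threads, and setting t = 1;

Step S24: in each thread, randomly perturbing the current configuration, restricted to the corresponding sub-energy range, to produce a new configuration, calculating its energy E_{new}, and determining the probability of accepting the new configuration according to the Metropolis criterion (referred to as an MCS step):

P(old \rightarrow new) = \min(1, e^{-[S(E_{new}) - S(E_{old})]})

If the new configuration is accepted:

S(E_{new}) = S(E_{new}) + df, H(E_{new}) = H(E_{new}) + 1,

S_{tmp}(E_{new}) = S_{tmp}(E_{new}) + df, H_{tmp}(E_{new}) = H_{tmp}(E_{new}) + 1;

Otherwise:

S(E_{old}) = S(E_{old}) + df, H(E_{old}) = H(E_{old}) + 1,

S_{tmp}(E_{old}) = S_{tmp}(E_{old}) + df, H_{tmp}(E_{old}) = H_{tmp}(E_{old}) + 1.

t = t + 1; step S24 is repeated tmax times (that is, tmax MCS steps are performed, for example 10);

Step S25: all threads communicate with one another, and S_{tmp}(E) and H_{tmp}(E) over the whole range are obtained by combining their results; s = s + 1;

Steps S24 and S25 are repeated smax times (for example 100);

Step S26: all processes communicate with one another; the slave process sends S_{tmp}(E) and H_{tmp}(E) to the master process, then receives the global S(E) and H(E) calculated by the master process and uses them to update its own S(E) and H(E), resets S_{tmp}(E) and H_{tmp}(E) to 0, and checks the histogram flatness condition:

\frac{\max(H(E)) - \min(H(E))}{\max(H(E)) + \min(H(E))} < φ (0 < φ < 1)

If the condition is not met, execution returns to step S22 and the iteration continues; if it is met, step S27 is executed;

Step S27: changing the modification factor df, then returning to step S22 and continuing the iteration until the program termination condition [formula not reproduced] is met (that is, [formula not reproduced], where, for example, [formula not reproduced] may be taken as 0.0001). The modification factor f is changed as follows:

First, N consecutive iterations of f = f^{α} (0 < α < 1) are performed, followed by 1 iteration of [formula not reproduced];

The above pattern is repeated multiple times.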

Compared with the classical Wang-Landau algorithm, the present invention updates the modification factor using an annealing-based mechanism, which can improve computational accuracy and speed; uses a flexible segmentation mode of the energy range, which balances the load among threads; and adopts the MPI+OpenMP hybrid parallel paradigm, which can greatly accelerate simulation and calculation. The MPI+OpenMP hybrid paradigm makes full use of the advantages of both programming models: MPI handles the coarse-grained communication between processes on different multiprocessor computers, while OpenMP provides lightweight threads that handle the interaction among the processors inside each multiprocessor computer well.

The present invention can effectively simulate and calculate the density of states of the protein system, from which many thermodynamic quantities, such as specific heat, can be further obtained over a broad temperature range; the whole thermodynamic process of protein folding can therefore be studied, and the protein folding process can be explored and studied.
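As an illustration of how a thermodynamic quantity follows from the density of states, the sketch below evaluates the specific heat C(T) = (⟨E²⟩ − ⟨E⟩²) / (k_B T²) from S(E) = ln g(E) by canonical reweighting, with Boltzmann weights g(E) e^{−E/(k_B T)}. This is the standard statistical-mechanics relation rather than a procedure quoted from the text; the function name, the use of reduced units (k_B = 1 by default) and the subtraction of the largest exponent for numerical stability are assumptions.

```python
import math


def specific_heat(energies, S, T, k_B=1.0):
    """Specific heat at temperature T from bin energies and S(E) = ln g(E)."""
    beta = 1.0 / (k_B * T)
    # ln of the unnormalised canonical weight g(E) * exp(-beta * E) per bin.
    log_w = [s - beta * e for e, s in zip(energies, S)]
    w_max = max(log_w)  # subtract the maximum so exp() cannot overflow
    w = [math.exp(x - w_max) for x in log_w]
    z = sum(w)
    mean_e = sum(wi * e for wi, e in zip(w, energies)) / z
    mean_e2 = sum(wi * e * e for wi, e in zip(w, energies)) / z
    return (mean_e2 - mean_e ** 2) / (k_B * T * T)


# Two-level check: energies 0 and 1 with equal degeneracy at T = 1.
c_two_level = specific_heat([0.0, 1.0], [0.0, 0.0], T=1.0)
```

Because only differences of S(E) enter the weights, the relative density of states g(E) = e^{S(E)} is sufficient, and one simulation yields C(T) at any temperature by evaluating the function over a range of T values.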

A person of ordinary skill in the art will understand that all or part of the steps of the methods in the embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.

The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode, characterized in that it comprises:

Step A: determining the protein energy model and the energy range;

Step B: determining the segmentation mode of the protein energy range;

Step C: simulating and calculating the density of states of the protein system.

2. the method for claim 1, is characterized in that, described steps A further comprises:

E _ECEPP=E _C+E _LJ+E _HB+E _Tor

Wherein,

it is the Lan Na-Jones acting force between two atoms;

3. the method for claim 1, is characterized in that, described steps A further comprises:

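4. The method of claim 3, characterized in that step B further comprises:

Step B1: dividing the energy range evenly into M segments, where the overlap between adjacent sub-energy ranges is set to Δ bins and each segment contains [formula not reproduced] bins;

Step B2: according to the distribution of the logarithm S(E) of the currently calculated density-of-states function of the protein system, adaptively segmenting the energy range, where, for a sub-energy range [E_{begin}, E_{end}],

\nabla S(E) = S(E_{end}) - S(E_{begin}).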

5. the method for claim 1, is characterized in that, described step C further comprises:

6. The method of claim 5, characterized in that, among the N processes of the master-slave process mode, process 1 is the master process and the remaining processes are slave processes.

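7. The method of claim 6, characterized in that the master process comprises the following steps:

Step S11: initializing the logarithm of the density-of-states function of the protein system, S(E) = \ln g(E) = 0, the histogram H(E) = 0 (E_{min} \le E \le E_{max}), and the modification factor df = 1 (= \ln f = \ln e);

Step S12: setting s = 1;

Step S13: dividing the energy range (E_{min} \le E \le E_{max}) into M segments according to the determined segmentation mode of the protein energy range, assigning the segments to M threads, and setting t = 1;

Step S14: in each thread, randomly perturbing the current configuration, restricted to the corresponding sub-energy range, to produce a new configuration, calculating its energy E_{new}, determining the probability of accepting the new configuration according to the Metropolis criterion, and setting t = t + 1;

Step S14 is repeated tmax times;

Step S15: all threads communicate with one another, and S(E) and H(E) over the whole range are obtained by combining their results; s = s + 1;

Steps S14 and S15 are repeated smax times;

Step S16: all processes communicate with one another; the master process collects S_{tmp}(E) and H_{tmp}(E) from all slave processes and accumulates the global S(E) and H(E), that is, global S(E) = S(E) + S_{tmp}(E) of all slave processes and global H(E) = H(E) + H_{tmp}(E) of all slave processes; the global S(E) and H(E) are broadcast to all slave processes, and the histogram flatness condition is checked:

\frac{\max(H(E)) - \min(H(E))}{\max(H(E)) + \min(H(E))} < φ (0 < φ < 1)

If the condition is not met, execution returns to step S12 and the iteration continues; if it is met, step S17 is executed;

Step S17: changing the modification factor df, then returning to step S12 and continuing the iteration until the program termination condition [formula not reproduced] is met, wherein [formula not reproduced]; S(E) is thereby obtained, giving the relative density of states of the protein system, g(E) = e^{S(E)}.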

8. The method of claim 7, characterized in that, in step S14, determining the probability of accepting the new configuration according to the Metropolis criterion further comprises:

P(old \rightarrow new) = \min(1, e^{-[S(E_{new}) - S(E_{old})]})

If the new configuration is accepted:

S(E_{new}) = S(E_{new}) + df, H(E_{new}) = H(E_{new}) + 1;

Otherwise:

S(E_{old}) = S(E_{old}) + df, H(E_{old}) = H(E_{old}) + 1.

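9. The method of claim 6, characterized in that each slave process comprises the following steps:

Step S21: initializing the logarithm of the density-of-states function of the protein system, S(E) = \ln g(E) = 0 and S_{tmp}(E) = \ln g_{tmp}(E) = 0, the histograms H(E) = 0 and H_{tmp}(E) = 0 (E_{min} \le E \le E_{max}), and the modification factor df = 1 (= \ln f = \ln e);

Step S22: setting s = 1;

Step S23: dividing the energy range (E_{min} \le E \le E_{max}) into M segments according to the determined segmentation mode of the protein energy range, assigning the segments to M threads, and setting t = 1;

Step S24: in each thread, randomly perturbing the current configuration, restricted to the corresponding sub-energy range, to produce a new configuration, calculating its energy E_{new}, determining the probability of accepting the new configuration according to the Metropolis criterion, and setting t = t + 1;

Step S24 is repeated tmax times;

Step S25: all threads communicate with one another, and S_{tmp}(E) and H_{tmp}(E) over the whole range are obtained by combining their results; s = s + 1;

Steps S24 and S25 are repeated smax times;

Step S26: all processes communicate with one another; the slave process sends S_{tmp}(E) and H_{tmp}(E) to the master process, then receives the global S(E) and H(E) calculated by the master process and uses them to update its own S(E) and H(E), resets S_{tmp}(E) and H_{tmp}(E) to 0, and checks the histogram flatness condition:

\frac{\max(H(E)) - \min(H(E))}{\max(H(E)) + \min(H(E))} < φ (0 < φ < 1)

If the condition is not met, execution returns to step S22 and the iteration continues; if it is met, step S27 is executed;

Step S27: changing the modification factor df, then returning to step S22 and continuing the iteration until the program termination condition [formula not reproduced] is met, wherein [formula not reproduced].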

10. The method of claim 9, characterized in that, in step S24, determining the probability of accepting the new configuration according to the Metropolis criterion further comprises:

P(old \rightarrow new) = \min(1, e^{-[S(E_{new}) - S(E_{old})]})

If the new configuration is accepted:

S(E_{new}) = S(E_{new}) + df, H(E_{new}) = H(E_{new}) + 1,

S_{tmp}(E_{new}) = S_{tmp}(E_{new}) + df, H_{tmp}(E_{new}) = H_{tmp}(E_{new}) + 1;

Otherwise:

S(E_{old}) = S(E_{old}) + df, H(E_{old}) = H(E_{old}) + 1,

S_{tmp}(E_{old}) = S_{tmp}(E_{old}) + df, H_{tmp}(E_{old}) = H_{tmp}(E_{old}) + 1.

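11. The method of claim 7 or 9, characterized in that, in steps S17 and S27, the modification factor f is changed as follows:

First, N consecutive iterations of f = f^{α} (0 < α < 1) are performed, followed by 1 iteration of [formula not reproduced];

The above pattern is repeated multiple times.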