[summary of the invention]
The technical problem addressed by the present invention is to provide an efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode.
The present invention adopts the following technical scheme:
An efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode, comprising:
Step A: determining the protein energy model and the energy range;
Step B: determining how the protein energy range is segmented;
Step C: simulating and calculating the density of states of the protein system.
Further, step A further comprises:
Adopting the ECEPP protein energy model, whose force field has the form:
$$E_{\mathrm{ECEPP}} = E_C + E_{LJ} + E_{HB} + E_{Tor}$$
where $E_C$ is the Coulomb interaction between pairs of charges, with $r_{ij}$ denoting the distance between atoms $i$ and $j$; $E_{LJ}$ is the Lennard-Jones interaction between two atoms; $E_{HB}$ is the hydrogen-bonding interaction; and $E_{Tor} = \sum_l U_l \left(1 \pm \cos(n_l \xi_l)\right)$ is the dihedral torsion energy, where $\xi_l$ is the $l$-th dihedral angle.
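As a small illustration of the torsion term only, the sketch below evaluates $E_{Tor} = \sum_l U_l (1 \pm \cos(n_l \xi_l))$ for a list of dihedrals. The function name and the parameter values are hypothetical, not ECEPP table parameters; the sign of each cosine term is passed explicitly.

```python
import math


def torsion_energy(U, n, xi, sign):
    """E_Tor = sum_l U_l * (1 + sign_l * cos(n_l * xi_l)).

    U: barrier heights, n: periodicities, xi: dihedral angles in radians,
    sign: +1 or -1 per dihedral (the "±" in the ECEPP torsion term).
    """
    if not (len(U) == len(n) == len(xi) == len(sign)):
        raise ValueError("all per-dihedral lists must have the same length")
    return sum(u * (1 + s * math.cos(k * x)) for u, k, x, s in zip(U, n, xi, sign))


# Hypothetical parameters for two dihedrals (illustrative, not ECEPP table values).
E_tor = torsion_energy(U=[1.0, 0.5], n=[3, 2], xi=[0.0, math.pi / 2], sign=[+1, -1])
```

Here the first dihedral contributes $1.0 \cdot (1 + \cos 0) = 2$ and the second $0.5 \cdot (1 - \cos \pi) = 1$.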
Further, step A further comprises:
Discretizing the protein energy range used: taking $k$ energy bins, the interval $[E_{min}, E_{max}]$ is divided evenly into $k$ bins, and each bin is represented by the energy value at its center.
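A minimal sketch of this discretization, assuming bins are half-open except the last one, which also contains $E_{max}$; the helper names are hypothetical.

```python
def make_bins(e_min, e_max, k):
    """Split [e_min, e_max] into k equal bins; return the bin width and bin-centre energies."""
    width = (e_max - e_min) / k
    centres = [e_min + (i + 0.5) * width for i in range(k)]
    return width, centres


def energy_to_bin(e, e_min, width, k):
    """Index of the bin containing energy e; e == e_max is put in the last bin."""
    if e < e_min or e > e_min + k * width:
        raise ValueError("energy outside [E_min, E_max]")
    return min(int((e - e_min) / width), k - 1)


width, centres = make_bins(-10.0, 10.0, 4)  # width 5.0, centres -7.5, -2.5, 2.5, 7.5
```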
Further, step B further comprises:
Step B1: dividing the energy range evenly into $M$ segments, where adjacent sub-ranges overlap by $\Delta$ bins, so that every segment contains the same number of bins;
Step B2: according to the distribution of the currently calculated logarithm of the density of states of the protein system, $S(E)$, adaptively re-segmenting the energy range, each sub-range being denoted $[E_{begin}, E_{end}]$.
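The bin count per segment in Step B1 is not spelled out here. Assuming all $M$ segments have (nearly) equal length and consecutive segments share exactly $\Delta$ bins, covering $k$ bins requires $k + (M-1)\Delta$ bin slots in total, i.e. about $(k + (M-1)\Delta)/M$ bins per segment. A sketch of Step B1 under that assumption (the function name is hypothetical):

```python
def split_with_overlap(k, M, delta):
    """Return M half-open bin ranges [begin, end) covering bins 0..k-1,
    with consecutive ranges sharing `delta` bins (assumed equal-length rule)."""
    total = k + (M - 1) * delta  # bin slots counted with overlaps
    if M < 1 or delta < 0 or total < M * (delta + 1):
        raise ValueError("segments too short for the requested overlap")
    base, extra = divmod(total, M)  # spread any remainder over the first segments
    ranges, begin = [], 0
    for i in range(M):
        end = begin + base + (1 if i < extra else 0)
        ranges.append((begin, end))
        begin = end - delta  # next segment re-uses the last `delta` bins
    return ranges


print(split_with_overlap(100, 4, 4))  # [(0, 28), (24, 52), (48, 76), (72, 100)]
```

Because the segment lengths sum to $k + (M-1)\Delta$ and each of the $M-1$ joins steps back by $\Delta$, the last segment always ends exactly at bin $k$.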
Further, step C further comprises:
Simulating and calculating the density of states of the protein system using MPI in master-slave process mode combined with OpenMP multithreaded parallelism.
Further, among the $N$ processes of the master-slave mode, process 1 is the master process and the remaining processes are slave processes.
Further, the master process comprises the following steps:
Step S11: initialize the logarithm of the density of states of the protein system, $S(E) = \ln g(E) = 0$, and the histogram $H(E) = 0$ for $E_{min} \le E \le E_{max}$; set the modification factor $df = 1$ ($df = \ln f = \ln e$);
Step S12: set $s = 1$;
Step S13: according to the determined segmentation mode, divide the energy range $E_{min} \le E \le E_{max}$ into $M$ segments, assign them to $M$ threads, and set $t = 1$;
Step S14: in each thread, randomly perturb the current configuration, restricted to the thread's sub-range, to produce a new configuration; calculate its energy $E_{new}$; decide whether to accept the new configuration according to the Metropolis criterion; set $t = t + 1$;
Step S14 is repeated $t_{max}$ times;
Step S15: all threads communicate and merge their results into $S(E)$ and $H(E)$ over the whole range; set $s = s + 1$;
Steps S14 and S15 are repeated $s_{max}$ times;
Step S16: all processes communicate. The master process collects $S_{tmp}(E)$ and $H_{tmp}(E)$ from all slave processes and accumulates the global $S(E)$ and $H(E)$: global $S(E) = S(E) + \sum_{\text{slaves}} S_{tmp}(E)$ and global $H(E) = H(E) + \sum_{\text{slaves}} H_{tmp}(E)$. It then broadcasts the global $S(E)$ and $H(E)$ to all slave processes and checks the histogram flatness condition:
If the condition is not met, return to step S12 and continue iterating; if it is met, go to step S17;
Step S17: change the modification factor $df$, then return to step S12 and continue iterating until the termination condition is met. From the resulting $S(E)$, the relative density of states of the protein system is obtained as $g(E) = e^{S(E)}$.
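The accumulation and flatness check of Step S16 can be sketched serially as below. In an MPI program this would typically be a reduction to the master rank followed by a broadcast (for example `MPI_Reduce` with `MPI_SUM`, then `MPI_Bcast`); here plain lists stand in for the messages. The flatness criterion is not stated in the text, so the sketch assumes the common Wang-Landau test $\min H(E) \ge 0.8\,\langle H(E) \rangle$, with 0.8 as an illustrative threshold. Function names are hypothetical.

```python
def master_accumulate(S, H, slave_S_tmp, slave_H_tmp):
    """Step S16: global S = S + sum of slave S_tmp, global H = H + sum of slave H_tmp."""
    S_glob, H_glob = list(S), list(H)
    for s_tmp, h_tmp in zip(slave_S_tmp, slave_H_tmp):
        for i in range(len(S_glob)):
            S_glob[i] += s_tmp[i]
            H_glob[i] += h_tmp[i]
    return S_glob, H_glob


def is_flat(H, ratio=0.8):
    """Assumed flatness test: min(H) >= ratio * mean(H) over all bins."""
    mean = sum(H) / len(H)
    return mean > 0 and min(H) >= ratio * mean


# Master state plus increments reported by two slaves (toy numbers).
S, H = master_accumulate([0.5, 0.0, 0.0], [1, 0, 0],
                         [[0.5, 1.0, 0.0], [0.0, 0.5, 1.0]],
                         [[1, 2, 0], [0, 1, 2]])
print(S, H, is_flat(H))  # [1.0, 1.5, 1.0] [2, 3, 2] True
```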
Further, in step S14, deciding whether to accept the new configuration according to the Metropolis criterion further comprises:
If the new configuration is accepted:
$S(E_{new}) = S(E_{new}) + df$, $H(E_{new}) = H(E_{new}) + 1$;
Otherwise:
$S(E_{old}) = S(E_{old}) + df$, $H(E_{old}) = H(E_{old}) + 1$.
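The update rule above can be sketched for a single walker as follows. The text only names the Metropolis criterion; the sketch assumes the standard Wang-Landau acceptance probability $\min\left(1, g(E_{old})/g(E_{new})\right) = \min\left(1, e^{S(E_{old}) - S(E_{new})}\right)$, and additionally assumes that a move leaving the thread's sub-range is rejected. `propose` and `energy_bin` are hypothetical callbacks standing in for the configuration perturbation and the ECEPP energy evaluation.

```python
import math
import random


def wl_step(x, b, propose, energy_bin, S, H, df, lo, hi, rng):
    """One Wang-Landau move for a walker confined to energy bins [lo, hi).

    x: current configuration, b: its energy bin. S and H are updated in place
    for the bin the walker occupies after the move (E_new if accepted, else E_old).
    """
    x_new = propose(x, rng)
    b_new = energy_bin(x_new)
    accept = False
    if lo <= b_new < hi:  # moves that leave the thread's sub-range are rejected
        log_ratio = S[b] - S[b_new]  # ln(g(E_old) / g(E_new))
        accept = log_ratio >= 0 or rng.random() < math.exp(log_ratio)
    if accept:
        x, b = x_new, b_new
    S[b] += df
    H[b] += 1
    return x, b


# Toy usage: configurations are the integers 0..9 and the energy bin is the value itself.
rng = random.Random(1)
S, H = [0.0] * 10, [0] * 10
x, b = 0, 0
for _ in range(20000):
    x, b = wl_step(x, b, lambda y, r: y + r.choice((-1, 1)), lambda y: y,
                   S, H, 0.01, 0, 10, rng)
```

Every call adds exactly one count to $H$, whether or not the move is accepted, which is what drives the histogram towards flatness.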
Further, each slave process comprises the following steps:
Step S21: initialize the logarithm of the density of states of the protein system, $S(E) = \ln g(E) = 0$, the temporary logarithm $S_{tmp}(E) = \ln g_{tmp}(E) = 0$, the histogram $H(E) = 0$ and the temporary histogram $H_{tmp}(E) = 0$ for $E_{min} \le E \le E_{max}$; set the modification factor $df = 1$ ($df = \ln f = \ln e$);
Step S22: set $s = 1$;
Step S23: according to the determined segmentation mode, divide the energy range $E_{min} \le E \le E_{max}$ into $M$ segments, assign them to $M$ threads, and set $t = 1$;
Step S24: in each thread, randomly perturb the current configuration, restricted to the thread's sub-range, to produce a new configuration; calculate its energy $E_{new}$; decide whether to accept the new configuration according to the Metropolis criterion; set $t = t + 1$;
Step S24 is repeated $t_{max}$ times;
Step S25: all threads communicate and merge their results into $S_{tmp}(E)$ and $H_{tmp}(E)$ over the whole range; set $s = s + 1$;
Steps S24 and S25 are repeated $s_{max}$ times;
Step S26: all processes communicate. The slave process sends $S_{tmp}(E)$ and $H_{tmp}(E)$ to the master process, then receives the global $S(E)$ and $H(E)$ computed by the master process and uses them to replace its own $S(E)$ and $H(E)$; $S_{tmp}(E)$ and $H_{tmp}(E)$ are reset to 0. The histogram flatness condition is then checked:
If the condition is not met, return to step S22 and continue iterating; if it is met, go to step S27;
Step S27: change the modification factor $df$, then return to step S22 and continue iterating until the termination condition is met.
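How overlapping sub-ranges are reconciled when the threads merge their results (Steps S15 and S25) is not detailed here. One simple approach, assumed for this sketch, is to shift each window's $S(E)$ by a constant so that it agrees on average with the already-merged windows in the overlap (ln g is only defined up to an additive constant), then average the overlapping bins. The function name is hypothetical.

```python
def merge_windows(k, windows):
    """Merge per-thread ln g(E) estimates into one list of length k.

    windows: (begin, end, S_win) tuples sorted by begin, where S_win holds
    ln g for bins begin..end-1. Bins covered by no window stay None.
    """
    S = [None] * k
    for begin, end, S_win in windows:
        overlap = [b for b in range(begin, end) if S[b] is not None]
        # ln g is defined only up to a constant: shift this window onto the merged part
        shift = (sum(S[b] - S_win[b - begin] for b in overlap) / len(overlap)
                 if overlap else 0.0)
        for b in range(begin, end):
            value = S_win[b - begin] + shift
            S[b] = value if S[b] is None else 0.5 * (S[b] + value)  # average overlap bins
    return S


# Two windows of the same underlying S(E) = [0, 1, ..., 5]; the second is offset by +8.
merged = merge_windows(6, [(0, 4, [0.0, 1.0, 2.0, 3.0]), (2, 6, [10.0, 11.0, 12.0, 13.0])])
print(merged)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```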
Further, in step S24, deciding whether to accept the new configuration according to the Metropolis criterion further comprises:
If the new configuration is accepted:
$S(E_{new}) = S(E_{new}) + df$, $H(E_{new}) = H(E_{new}) + 1$,
$S_{tmp}(E_{new}) = S_{tmp}(E_{new}) + df$, $H_{tmp}(E_{new}) = H_{tmp}(E_{new}) + 1$;
Otherwise:
$S(E_{old}) = S(E_{old}) + df$, $H(E_{old}) = H(E_{old}) + 1$,
$S_{tmp}(E_{old}) = S_{tmp}(E_{old}) + df$, $H_{tmp}(E_{old}) = H_{tmp}(E_{old}) + 1$.
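The reason a slave keeps both $S(E)$ and $S_{tmp}(E)$ follows from Step S16: the master adds only the increments accumulated since the last synchronization, so sending the full $S(E)$ would count the shared baseline repeatedly, while the slave still needs the full $S(E)$ for its own acceptance decisions. A serial sketch of this bookkeeping, where `exchange` is a hypothetical stand-in for the send/receive with the master:

```python
def slave_record(S, H, S_tmp, H_tmp, b, df):
    """Step S24: update the running estimate and the since-last-sync increments for bin b."""
    S[b] += df
    H[b] += 1
    S_tmp[b] += df
    H_tmp[b] += 1


def slave_sync(S_tmp, H_tmp, exchange):
    """Step S26: send increments via `exchange` (hypothetical stand-in for MPI
    send/receive), adopt the global arrays it returns, and reset the increments."""
    S_glob, H_glob = exchange(list(S_tmp), list(H_tmp))
    for i in range(len(S_tmp)):
        S_tmp[i] = 0.0
        H_tmp[i] = 0
    return list(S_glob), list(H_glob)


# Toy round: the master's own arrays already hold its own increments.
S_master, H_master = [1.0, 0.0], [2, 0]


def exchange(s_tmp, h_tmp):
    # master side of Step S16 for a single slave: add increments, return global arrays
    return ([a + d for a, d in zip(S_master, s_tmp)],
            [a + d for a, d in zip(H_master, h_tmp)])


S, H, S_tmp, H_tmp = [0.0, 0.0], [0, 0], [0.0, 0.0], [0, 0]
slave_record(S, H, S_tmp, H_tmp, 1, 0.5)
slave_record(S, H, S_tmp, H_tmp, 1, 0.5)
S, H = slave_sync(S_tmp, H_tmp, exchange)
print(S, H, S_tmp, H_tmp)  # [1.0, 1.0] [2, 2] [0.0, 0.0] [0, 0]
```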
Further, in steps S17 and S27, the modification factor $f$ is changed as follows:
First, $f = f^{\alpha}$ ($0 < \alpha < 1$) is applied in $N$ consecutive iterations, followed by one iteration of a different update;
This pattern is repeated.
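Since $df = \ln f$, the update $f \to f^{\alpha}$ is simply $df \to \alpha\, df$. The second update in the repeating pattern is not specified in this text, so the sketch below takes it as a caller-supplied function (`other_update`, hypothetical) and assumes the run stops once $df$ falls below a threshold `df_min`. It yields the successive values of $df$ used by steps S17 and S27.

```python
def df_schedule(df0, alpha, N, other_update, df_min):
    """Yield successive df = ln f values: N updates df -> alpha * df (f -> f**alpha),
    then one update by `other_update`, repeated until df < df_min."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    df, count = df0, 0
    while df >= df_min:
        yield df
        if count < N:
            df *= alpha  # ln(f**alpha) = alpha * ln f
            count += 1
        else:
            df = other_update(df)  # hypothetical second step of the pattern
            count = 0


# Illustration only: the "other" update here is a made-up reheating by 1.5x.
values = list(df_schedule(1.0, 0.5, 2, lambda d: 1.5 * d, 0.1))
print(values)  # [1.0, 0.5, 0.25, 0.375, 0.1875]
```

Whatever `other_update` is in practice, the pattern only terminates if one full cycle shrinks $df$ overall (here $0.5 \cdot 0.5 \cdot 1.5 = 0.375 < 1$).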
Compared with the prior art, the beneficial effects of the present invention are:
Compared with the classic Wang-Landau algorithm, the present invention updates the modification factor with an annealing-based scheme, which can improve computational accuracy and speed; it uses a flexible energy-range segmentation scheme to balance the load among threads; and it adopts the MPI+OpenMP hybrid parallel programming paradigm, which can substantially accelerate simulation and computation. With the method provided by the invention, the entire thermodynamic process of protein folding can be analyzed efficiently, enabling further exploration and study of the protein folding process.
[embodiment]
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
In addition, the technical features involved in the embodiments of the present invention described below can be combined with one another as long as they do not conflict.
The invention provides an efficient stochastic simulation method for protein thermodynamic analysis based on a hybrid parallel mode. As shown in Figure 1, the method comprises:
Step A: determining the protein energy model and the energy range;
Step B: determining how the protein energy range is segmented;
Step C: simulating and calculating the density of states of the protein system.
In step A, the ECEPP protein energy model can be adopted, whose force field has the form:
$$E_{\mathrm{ECEPP}} = E_C + E_{LJ} + E_{HB} + E_{Tor}$$
where $E_C$ is the Coulomb interaction between pairs of charges, with $r_{ij}$ denoting the distance between atoms $i$ and $j$; $E_{LJ}$ is the Lennard-Jones interaction between two atoms; $E_{HB}$ is the hydrogen-bonding interaction; and $E_{Tor} = \sum_l U_l \left(1 \pm \cos(n_l \xi_l)\right)$ is the dihedral torsion energy, where $\xi_l$ is the $l$-th dihedral angle. Because the ECEPP force field uses an angular (internal) coordinate system, it is computationally more efficient than protein models based on Cartesian coordinates.
For convenience of computer simulation, the protein energy range used can also be discretized: taking $k$ energy bins, the interval $[E_{min}, E_{max}]$ is divided evenly into $k$ bins, and each bin is represented by the energy value at its center.
In step B, to balance the load among threads, the following flexible, self-adaptive segmentation of the energy range can be adopted:
Step B1: dividing the energy range evenly into $M$ segments, where adjacent sub-ranges overlap by $\Delta$ bins, so that every segment contains the same number of bins;
Here, adjacent sub-ranges overlap by $\Delta$ bins because a suitable overlap is needed to merge the $S(E)$ and $H(E)$ of all sub-ranges into $S(E)$ and $H(E)$ over the whole range.
Step B2: according to the distribution of the currently calculated logarithm of the density of states of the protein system, $S(E)$, adaptively re-segmenting the energy range, each sub-range being denoted $[E_{begin}, E_{end}]$.
Here, the logarithm of the density of states of a protein system, $S(E) = \ln g(E)$, is usually a monotonically increasing concave function, so the energy range must be segmented adaptively so that the segments are balanced against one another; the balancing of a sub-range $[E_{begin}, E_{end}]$ is illustrated in Figure 2.
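The exact balancing criterion shown in Figure 2 is not reproduced in this text. As one plausible choice, assumed here, the sketch cuts the bins so that each segment spans roughly the same increase in $S(E)$; because $S(E)$ is increasing and concave, this yields narrower segments at low energy, where $S(E)$ is steepest. The function name is hypothetical, and `delta` extends each segment except the last so that neighbours overlap as in Step B1.

```python
def adaptive_segments(S, M, delta=0):
    """Cut bins 0..k-1 into M contiguous segments, placing the m-th cut where S first
    reaches S[0] + m/M of the total increase (assumed criterion). S must be non-decreasing."""
    k = len(S)
    if not 1 <= M <= k:
        raise ValueError("need 1 <= M <= number of bins")
    total = S[-1] - S[0]
    cuts = [0]
    for m in range(1, M):
        target = S[0] + total * m / M
        b = cuts[-1] + 1  # every segment keeps at least one bin
        while b < k - (M - m) and S[b] < target:  # leave room for the remaining segments
            b += 1
        cuts.append(b)
    cuts.append(k)
    # extend all but the last segment to the right so that neighbours share `delta` bins
    return [(cuts[i], min(k, cuts[i + 1] + (delta if i < M - 1 else 0))) for i in range(M)]


S_toy = [0, 5, 8, 10, 11, 12, 13, 14, 15, 16]  # increasing, concave toy ln g(E)
print(adaptive_segments(S_toy, 3))  # [(0, 2), (2, 4), (4, 10)]
```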
In step C, the density of states of the protein system is simulated and calculated using MPI in master-slave process mode combined with OpenMP multithreaded parallelism. Among the $N$ processes of the master-slave mode, process 1 is the master process and the remaining processes are slave processes.
As shown in Figure 3, the master process comprises the following steps:
Step S11: initialize the logarithm of the density of states of the protein system, $S(E) = \ln g(E) = 0$, and the histogram $H(E) = 0$ for $E_{min} \le E \le E_{max}$; set the modification factor $df = 1$ ($df = \ln f = \ln e$);
Step S12: set $s = 1$;
Step S13: according to the determined segmentation mode, divide the energy range $E_{min} \le E \le E_{max}$ into $M$ segments, assign them to $M$ threads, and set $t = 1$;
Step S14: in each thread, randomly perturb the current configuration, restricted to the thread's sub-range, to produce a new configuration; calculate its energy $E_{new}$; and decide whether to accept the new configuration according to the Metropolis criterion (one such move is referred to as an MCS step):
If the new configuration is accepted:
$S(E_{new}) = S(E_{new}) + df$, $H(E_{new}) = H(E_{new}) + 1$;
Otherwise:
$S(E_{old}) = S(E_{old}) + df$, $H(E_{old}) = H(E_{old}) + 1$.
Set $t = t + 1$; step S14 is repeated $t_{max}$ times, i.e., $t_{max}$ MCS steps are performed (e.g., $t_{max} = 10$);
Step S15: all threads communicate and merge their results into $S(E)$ and $H(E)$ over the whole range; set $s = s + 1$;
Steps S14 and S15 are repeated $s_{max}$ times (e.g., $s_{max} = 100$);
Step S16: all processes communicate. The master process collects $S_{tmp}(E)$ and $H_{tmp}(E)$ from all slave processes and accumulates the global $S(E)$ and $H(E)$: global $S(E) = S(E) + \sum_{\text{slaves}} S_{tmp}(E)$ and global $H(E) = H(E) + \sum_{\text{slaves}} H_{tmp}(E)$. It then broadcasts the global $S(E)$ and $H(E)$ to all slave processes and checks the histogram flatness condition:
If the condition is not met, return to step S12 and continue iterating; if it is met, go to step S17;
Step S17: change the modification factor $df$, then return to step S12 and continue iterating until the termination condition is met (the threshold in this condition may be taken as, e.g., 0.0001). From the resulting $S(E)$, the relative density of states of the protein system is obtained as $g(E) = e^{S(E)}$. The modification factor $f$ is changed as follows:
First, $f = f^{\alpha}$ ($0 < \alpha < 1$) is applied in $N$ consecutive iterations, followed by one iteration of a different update;
This pattern is repeated.
Each slave process comprises the following steps:
Step S21: initialize the logarithm of the density of states of the protein system, $S(E) = \ln g(E) = 0$, the temporary logarithm $S_{tmp}(E) = \ln g_{tmp}(E) = 0$, the histogram $H(E) = 0$ and the temporary histogram $H_{tmp}(E) = 0$ for $E_{min} \le E \le E_{max}$; set the modification factor $df = 1$ ($df = \ln f = \ln e$);
Step S22: set $s = 1$;
Step S23: according to the determined segmentation mode, divide the energy range $E_{min} \le E \le E_{max}$ into $M$ segments, assign them to $M$ threads, and set $t = 1$;
Step S24: in each thread, randomly perturb the current configuration, restricted to the thread's sub-range, to produce a new configuration; calculate its energy $E_{new}$; and decide whether to accept the new configuration according to the Metropolis criterion (one such move is referred to as an MCS step):
If the new configuration is accepted:
$S(E_{new}) = S(E_{new}) + df$, $H(E_{new}) = H(E_{new}) + 1$,
$S_{tmp}(E_{new}) = S_{tmp}(E_{new}) + df$, $H_{tmp}(E_{new}) = H_{tmp}(E_{new}) + 1$;
Otherwise:
$S(E_{old}) = S(E_{old}) + df$, $H(E_{old}) = H(E_{old}) + 1$,
$S_{tmp}(E_{old}) = S_{tmp}(E_{old}) + df$, $H_{tmp}(E_{old}) = H_{tmp}(E_{old}) + 1$.
Set $t = t + 1$; step S24 is repeated $t_{max}$ times, i.e., $t_{max}$ MCS steps are performed (e.g., $t_{max} = 10$);
Step S25: all threads communicate and merge their results into $S_{tmp}(E)$ and $H_{tmp}(E)$ over the whole range; set $s = s + 1$;
Steps S24 and S25 are repeated $s_{max}$ times (e.g., $s_{max} = 100$);
Step S26: all processes communicate. The slave process sends $S_{tmp}(E)$ and $H_{tmp}(E)$ to the master process, then receives the global $S(E)$ and $H(E)$ computed by the master process and uses them to replace its own $S(E)$ and $H(E)$; $S_{tmp}(E)$ and $H_{tmp}(E)$ are reset to 0. The histogram flatness condition is then checked:
If the condition is not met, return to step S22 and continue iterating; if it is met, go to step S27;
Step S27: change the modification factor $df$, then return to step S22 and continue iterating until the termination condition is met (the threshold in this condition may be taken as, e.g., 0.0001). The modification factor $f$ is changed as follows:
First, $f = f^{\alpha}$ ($0 < \alpha < 1$) is applied in $N$ consecutive iterations, followed by one iteration of a different update;
This pattern is repeated.
Compared with the classic Wang-Landau algorithm, the present invention updates the modification factor with an annealing-based scheme, which can improve computational accuracy and speed; it uses a flexible energy-range segmentation scheme to balance the load among threads; and it adopts the MPI+OpenMP hybrid parallel programming paradigm, which can substantially accelerate simulation and computation. The MPI+OpenMP hybrid paradigm exploits the strengths of both programming models: MPI handles coarse-grained communication between processes running on different multiprocessor nodes, while OpenMP provides lightweight threads that handle interaction among the processors within each multiprocessor node.
The present invention can effectively simulate and calculate the density of states of a protein system, from which many thermodynamic quantities, such as the specific heat, can be obtained over a wide temperature range. The whole thermodynamic process of protein folding can therefore be studied, and the protein folding process further explored.
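As an example of such thermodynamic quantities, canonical averages follow from the relative density of states through $Z(T) = \sum_E g(E)\, e^{-E/k_B T}$, and the specific heat is $C(T) = \left(\langle E^2 \rangle - \langle E \rangle^2\right)/(k_B T^2)$; the unknown normalization of $g(E)$ cancels. A sketch working directly with $S(E) = \ln g(E)$, assuming bin-centre energies and $k_B = 1$ by default (reduced units); the function name is hypothetical.

```python
import math


def thermo_from_dos(E, S, T, kB=1.0):
    """Return canonical <E> and specific heat C(T) from bin energies E and S(E) = ln g(E)."""
    beta = 1.0 / (kB * T)
    logw = [s - beta * e for s, e in zip(S, E)]  # ln[g(E) exp(-E / kB T)]
    m = max(logw)  # log-sum-exp shift avoids overflow for large S(E)
    w = [math.exp(x - m) for x in logw]
    Z = sum(w)
    mean_E = sum(wi * e for wi, e in zip(w, E)) / Z
    mean_E2 = sum(wi * e * e for wi, e in zip(w, E)) / Z
    C = (mean_E2 - mean_E ** 2) / (kB * T ** 2)
    return mean_E, C


# Two-level toy system with equal degeneracy, evaluated at T = 1.
mean_E, C = thermo_from_dos([0.0, 1.0], [0.0, 0.0], T=1.0)
```

Sweeping $T$ over a grid and plotting $C(T)$ is a common way to locate the folding transition as a specific-heat peak.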
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.