CN105787227A

CN105787227A - Multi-GPU molecular dynamics simulation method for structural material radiation damage

Info

Publication number: CN105787227A
Application number: CN201610311112.8A
Authority: CN
Inventors: 杨磊; 王苍龙; 高笑菲; 田园; 祁美玲
Original assignee: Institute of Modern Physics of CAS
Current assignee: Institute of Modern Physics of CAS
Priority date: 2016-05-11
Filing date: 2016-05-11
Publication date: 2016-07-20
Anticipated expiration: 2036-05-11
Also published as: CN105787227B

Abstract

The invention discloses a multi-GPU molecular dynamics simulation method for radiation damage in structural materials. The method comprises the following steps: initializing; dynamically dividing the grid among the nodes; performing inter-node communication; building a sorted cell list on the GPU; updating the time step; finding the correspondence between particles and grid cell numbers from the particle coordinates; predicting the displacement, velocity and acceleration of each particle; calculating the force on each particle; using the force to correct the displacement, velocity and acceleration of each particle; correcting the velocities according to the ensemble to keep the system temperature constant; applying periodic boundary conditions to correct the particle positions; storing the current calculation result; and iterating the above steps until a preset number of steps is reached. The method can efficiently and conveniently simulate the radiation damage process in materials on larger spatial and temporal scales and explain the long-time evolution of radiation damage at the micro scale.

Description

Multi-GPU molecular dynamics simulation method for structural material radiation damage

Technical field

The present disclosure relates to the technical field of molecular dynamics (MD) simulation, and in particular to a molecular dynamics simulation method that uses multiple graphics processing unit (GPU) accelerator cards to simulate the radiation damage process in structural materials.

Background technology

The stability of structural materials is the basis for the safe operation of a reactor. The extreme environment inside a reactor (irradiation by high-energy particles) can severely damage the microstructure of core components and thereby degrade their performance. Radiation damage in structural materials is a complex multi-parameter process. It involves the formation, diffusion, recombination and aggregation of many kinds of defects, and in particular the coupling of displacement collision-cascade damage with damage from the transmutation impurities hydrogen and helium. It is also a multiscale materials problem spanning the microscopic, mesoscopic and macroscopic scales. Because primary damage events occur on the picosecond (ps) time scale, experimental study is costly and cannot observe the defects in situ, so molecular dynamics simulation has become the only feasible approach. For better comparison with experimental results, a molecular dynamics simulation needs to model systems of tens of millions or even hundreds of millions of atoms, which poses a great challenge to conventional serial molecular dynamics simulation. Developing high-performance parallel computing methods is therefore imperative.

In the early days of parallel high-performance computing, parallelization was based on MPI and OpenMP, which is a coarse-grained parallel mode. As GPU technology has matured in recent years, GPU parallelism has been introduced into molecular dynamics simulation with notable effect; this is a fine-grained parallel mode. Compared with MPI multi-node parallelism, GPUs offer higher performance per watt, per square foot and per unit price. However, when the simulated system approaches tens of millions of particles, computational efficiency declines because of the limited number of stream processors on a GPU. Some applications currently run multiple GPUs in parallel on a single node to enlarge the computation scale, but the achievable scale is still limited and the simulation results still cannot be compared with experimental results.

Summary of the invention

Embodiments of the present disclosure combine MPI with GPUs and disclose an MD simulation method for radiation damage in structural materials accelerated by multiple GPUs. The method can effectively simulate the radiation damage process on larger spatial and temporal scales, explain the evolution of radiation damage at the micro scale, and predict the irradiation performance of materials. In one embodiment, a multi-GPU molecular dynamics simulation method for radiation damage in structural materials is proposed, in which the multiple GPUs are located on multiple nodes and can run in parallel. The method includes:

a. initializing on the plurality of nodes, including allocating memory, reading in parameters, generating the material structure, and setting the initial positions and velocities of the material particles;

b. dynamically dividing the grid among the nodes, each grid cell holding one or more particles;

c. performing inter-node communication, in which the particles that have left the grid region of each node, together with the overlap-region particles, are sent to the corresponding other nodes;

d. building a sorted cell list on the GPUs of the plurality of nodes;

e. updating the time step on the GPUs of the plurality of nodes;

f. finding, on the GPUs of the plurality of nodes, the correspondence between particles and grid cell numbers from the particle coordinates;

g. predicting the displacement, velocity and acceleration of each particle on the GPUs of the plurality of nodes;

h. calculating, on the GPUs of the plurality of nodes, the force on each particle according to the sorted cell list obtained in step d;

i. correcting, on the GPUs of the plurality of nodes, the displacement, velocity and acceleration of each particle using the force obtained in step h;

j. correcting the velocities according to the ensemble on the GPUs of the plurality of nodes, to keep the system temperature constant;

k. applying periodic boundary conditions on the GPUs of the plurality of nodes to correct the particle positions;

l. storing the current calculation result on the plurality of nodes;

m. iterating steps b to l until the set number of steps is reached.

The method according to embodiments of the present disclosure exploits the strong floating-point capability, high bandwidth and lightweight compute cores of GPUs, and implements the algorithm with, for example, NVIDIA's CUDA parallel architecture and the MPI communication mechanism. In addition, the algorithm structure is designed around the GPU hardware so that it fits the GPU parallel model, and a series of optimization strategies are applied, such as contiguous memory access, shared memory and registers, to improve run-time efficiency. Simulation results for one embodiment, radiation damage in metallic iron, show that the method can simulate the radiation damage process on larger spatial and temporal scales while preserving computational accuracy, and thus explain the long-time evolution of radiation damage at the micro scale, while also reducing the energy consumption and maintenance cost of the computing equipment.

Brief description of the drawings

Embodiments of the present disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the structure of a system implementing the multi-GPU molecular dynamics simulation of radiation damage in structural materials according to an embodiment of the present disclosure; and

Fig. 2 is a schematic flowchart of the multi-GPU molecular dynamics simulation method for radiation damage in structural materials according to an embodiment of the present disclosure.

It should be noted that throughout the drawings, similar reference numerals are used to denote the same or similar elements, features and structures.

Detailed description of the invention

The present disclosure may have various embodiments, and various modifications and variations may be made to them. The disclosure is therefore described in detail with reference to the specific embodiments shown in the drawings. It should be understood, however, that the disclosure is not limited to the specific embodiments but includes all modifications, equivalents and/or alternatives within its spirit and scope. In the description of the drawings, similar reference numerals denote similar elements.

The words "include", "may include" and their variants used in the embodiments of the present disclosure indicate the presence of the corresponding disclosed functions, operations and elements, and do not limit one or more additional functions, operations or elements. In addition, the terms "include", "have" and their variants used in the various embodiments of the present disclosure are intended only to denote particular characteristics, numbers, steps, operations, elements, components or combinations thereof, and should not be construed as excluding in advance the presence or possible addition of one or more other characteristics, numbers, steps, operations, elements, components or combinations thereof.

In the embodiments of the present disclosure, the expression "or" or "at least one of A or/and B" includes any or all combinations of the listed words. For example, the expression "A or B" or "at least one of A or/and B" may include A, may include B, or may include both A and B.

In the present disclosure, expressions including ordinal numbers such as "first" or "second" may modify elements. However, the elements are not limited by these expressions. For example, the expressions do not limit the order and/or importance of the elements. They are used only to distinguish one element from another. For example, without departing from the scope of the various embodiments of the present disclosure, a first element may be referred to as a second element, and similarly a second element may be referred to as a first element.

The terms used in the embodiments of the present disclosure are used only to describe particular embodiments and are not intended to limit the disclosure. Unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well. In addition, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the disclosure belongs. Terms defined in general dictionaries should be interpreted as having the same meaning as in the context of the related art and, unless explicitly defined in the embodiments of the present disclosure, should not be interpreted as having an idealized or excessively formal meaning.

Preferred embodiments of the present disclosure are described below with reference to the drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the disclosure and are not intended to limit it. For clarity and conciseness, descriptions of well-known functions and structures are omitted.

Fig. 1 is a schematic diagram of the structure of a system for the MD simulation method of radiation damage in structural materials implemented in parallel on multiple GPUs according to an embodiment of the present disclosure. As shown in Fig. 1, the system includes a client 1, a high-performance computing cluster 2 containing GPU accelerator cards, and a network 3 for transmitting information. Although not shown in Fig. 1, it should be understood that the client 1 and the computing cluster 2 may be connected in any suitable manner. For example, the client 1 may communicate with the computing cluster 2 via a local area network or a wide area network to carry out simulation experiments remotely.

The client 1 may include hardware devices for input, such as a keyboard or a touchpad, and software for remotely managing the computing cluster, such as PuTTY or XManager.

The computing cluster 2 may include multiple computing nodes 4-1, 4-2, ..., 4-n (n being an integer greater than or equal to 2). Each computing node may contain a CPU and a GPU and, optionally, a storage device. As an example, the GPU in a computing node may be an NVIDIA general-purpose computing card with a GF110 or later core, programmed in parallel with the CUDA architecture developed by NVIDIA, together with an interface protocol for inter-node communication such as MPI. Using MPI together with CUDA, large-scale radiation damage processes can be simulated. The computing nodes can compute in parallel, and each GPU on each node includes multiple stream processors that process in parallel, each stream processor handling one particle. MPI assigns each node a logical number, usually starting from 0; for example, with four nodes the node numbers are 0, 1, 2 and 3. During a run, node 0 is the master node by default.

The above system is only one example implementation. Those skilled in the art will understand that other forms of system architecture may be adopted; for example, the functions of the above components may be reallocated and combined to form other system architectures.

The multi-GPU molecular dynamics simulation method for radiation damage in structural materials according to an embodiment of the present disclosure is described below with reference to Fig. 2, in conjunction with the system architecture of Fig. 1.

As shown in Fig. 2, in step 202, initialization is performed on the multiple nodes, including, for example, allocating memory, reading in parameters, generating the material structure, and setting the initial positions and velocities of the material particles. For example, according to an example embodiment, initialization may include:

(1) reading the number of nodes nnodes and the total number of particles nm, and determining the size of the CPU and GPU variable space on each node; the space length is computed as nmPerNode = 2*((nm+nnodes-1)/nnodes);

(2) dynamically allocating space for the variable arrays on the CPU and GPU of each node, including the particle positions x0, y0, z0, velocities x1, y1, z1, accelerations x2, y2, z2, derivatives of acceleration x3, y3, z3, particle type ispec, and the forces on the particles fx, fy, fz, each with space length nmPerNode;

(3) for each node, reading in the relevant physical parameters from files stored on the hard disk, including material information such as particle type, particle mass, lattice type, lattice constant, box size, ensemble and number of simulation steps, as well as the parameters and switches needed to build the lattice structure; reading in the corresponding potential parameters from the potential parameter file; and reading in the number, position, direction, energy and switches of the primary knock-on atoms (PKA) from the PKA file;

(4) optionally, converting units so that the units are consistent throughout;

(5) on the CPU of master node 0, dynamically allocating variable space of length nm for storing the positions and velocities of all particles in the x, y and z directions;

(6) calculating the cohesive energy, which is used for verification;

(7) generating the initial crystal structure on master node 0. For a relaxation run, the initial particle positions are generated from the lattice structure; for a cascade run, the particle coordinates obtained after relaxation are read in directly. The crystal structure is laid out in x, y, z order following the right-hand rule; because tasks are divided by a linear grid partition, this keeps as many particles as possible on their corresponding nodes and reduces communication. Relaxation or cascade must be selected when the program is run; the two are alternative modes. Physically, relaxation and cascade are two different processes. The program first reads the relaxation input file and runs the specified number of steps to obtain the particle information of the equilibrated system, including particle positions, velocities and accelerations; it then reads the cascade input file together with the particle information obtained from relaxation and carries out the radiation damage simulation;

(8) for a relaxation run, master node 0 generates random particle velocities in the range -0.5 to 0.5 and then corrects them so that the total momentum of the system is 0 and the temperature is unchanged. For a cascade run, the velocities, accelerations, particle types and so on obtained from relaxation are read in directly.
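As an illustration only, the array sizing in (1) and the velocity initialization in (8) might be sketched in Python as follows. The function names, the equal-mass assumption, the reduced units (kb = 1) and the interpretation of "temperature unchanged" as a rescaling to a target temperature are assumptions of this sketch, not details given in the text.

```python
import math
import random


def nm_per_node(nm: int, nnodes: int) -> int:
    """Per-node array length from item (1): 2 * ceil(nm / nnodes)."""
    return 2 * ((nm + nnodes - 1) // nnodes)


def init_velocities(n, target_temp, mass=1.0, kb=1.0, seed=0):
    """Random velocities in [-0.5, 0.5), shifted to zero total momentum and
    rescaled so that the kinetic temperature equals target_temp (assumed)."""
    rng = random.Random(seed)
    v = [[rng.random() - 0.5 for _ in range(3)] for _ in range(n)]
    # Remove the centre-of-mass velocity (equal masses assumed).
    for d in range(3):
        mean = sum(p[d] for p in v) / n
        for p in v:
            p[d] -= mean
    # Kinetic temperature with 3n - 3 degrees of freedom.
    kinetic = 0.5 * mass * sum(c * c for p in v for c in p)
    temp = 2.0 * kinetic / ((3 * n - 3) * kb)
    scale = math.sqrt(target_temp / temp)
    return [[c * scale for c in p] for p in v]
```

The factor of 2 in nm_per_node leaves headroom on each node for particles received from neighbouring nodes and for the overlap region.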

In step 204, tasks are distributed, that is, the grid is dynamically divided among the nodes, each grid cell holding one or more particles. According to an example embodiment, task distribution may include:

(1) distributing the particle positions and velocities evenly to the CPUs of the nodes, and broadcasting the calculated tensor to every node;

(2) copying the displacements, velocities and accelerations from the CPU to the GPU on each node; all array variables in the embodiments of the present disclosure are stored as one-dimensional vectors;

(3) calculating the start position cellPerNodeStart[i] and end position cellPerNodeEnd[i] of the grid sub-region of each node i; the initial values give each node the average number of grid cells. In one embodiment, tasks are divided by linear partitioning: if the initial total number of grid cells is Ncell and the number of nodes is NG, each node computes approximately Ncell/NG cells, so node 0 computes the particles in cells 0 to Ncell/NG, node 1 computes the particles in cells Ncell/NG+1 to 2×Ncell/NG, and so on. Because particles may gather on certain nodes during a cascade collision, a dynamic balancing method (described below) may be used in the loop to adjust the number of cells computed on each node at every step, so that the particle counts are approximately balanced.

Linear task partitioning is chosen for the (coarse-grained) division of tasks between nodes. Compared with existing methods, such as the "minimum surface area method" used by LAMMPS and HOOMD, the method according to embodiments of the present disclosure allows an arbitrary number of nodes, requires each node to communicate only with its adjacent nodes, and makes it easy to adjust the computing task of each node. LAMMPS and HOOMD are internationally used molecular dynamics packages that also adopt the MPI+CUDA parallel mode; LAMMPS parallelizes the computationally intensive parts of its code, while HOOMD is general-purpose molecular dynamics software based entirely on the GPU. The "minimum surface area" partitioning has the smallest communication volume, but it has to handle communication of the overlap regions between a node and its 26 surrounding nodes, requires the number of nodes to be a power of two (2ⁿ), and makes load balancing cumbersome. By contrast, the method according to embodiments of the present disclosure only needs to communicate with the two adjacent nodes above and below, its dynamic load-balancing algorithm is very simple to implement, and its requirements on the number of nodes are low.
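A minimal Python sketch of the initial linear partition described above is given here for illustration. It assumes half-open cell ranges (node i owns cells start[i] to end[i]-1) and a periodic chain of nodes; the function names are hypothetical, and the text itself only states that each node initially receives about Ncell/NG cells.

```python
def linear_partition(ncell: int, nnodes: int):
    """Return (start, end) lists; node i owns cells start[i] .. end[i]-1."""
    base, extra = divmod(ncell, nnodes)
    start, end = [], []
    first = 0
    for i in range(nnodes):
        # The first `extra` nodes take one additional cell each.
        count = base + (1 if i < extra else 0)
        start.append(first)
        end.append(first + count)
        first += count
    return start, end


def neighbours(rank: int, nnodes: int):
    """Previous and next node in a periodic one-dimensional chain."""
    return (rank - 1) % nnodes, (rank + 1) % nnodes
```

Because the partition is one-dimensional, each node only ever exchanges data with the two ranks returned by neighbours, whatever the total number of nodes.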

According to an example embodiment, communication between the nodes uses the MPI communication protocol. The nodes are divided into a master node and slave nodes; the master node is responsible for distributing and collecting data, and the grid is dynamically divided among the nodes according to the load-balancing principle.

In step 206, the particles that have left the region of the current node, together with the overlap-region particles, are sent via the MPI communication protocol to the corresponding other nodes. In step 208, each node builds its sorted cell list on the GPU. Steps 206 and 208 are interleaved. In an example embodiment, the two steps may include:

(1) processing each particle in parallel with the stream processors on each GPU, and calculating the grid cell number of each particle from its coordinates;

(2) processing each particle in parallel with the stream processors on each GPU to find the particles that no longer belong to the current node and store them in buffers, recording the corresponding particle counts at the same time. Particles whose cell number is less than the start position of the current node's grid are stored in one buffer, and particles whose cell number is greater than the end position of the current node's grid are stored in another buffer; the particle number and cell number of each such particle are then set to the maximum value;

Each stream processor corresponds to one thread. To ensure that every thread has finished, a synchronization mechanism may be used to synchronize the threads;

(3) calculating the numbers of the two nodes adjacent to the current node, sending the particles that no longer belong to the current node to the buffer of the corresponding node, and reading the information sent by the neighbors from the buffer. Before sending, a pre-communication is needed to let the receiving node know how many particles it will receive;

When the number of nodes is greater than 2, to avoid a deadlock on buffer reads and writes, master node 0 first receives the buffer information while the other nodes send; when the number of nodes equals 2, the two nodes can send information directly;

(4) copying the received buffer information to the GPU, where the buffered particles are stored in parallel at the end of the corresponding variables;

(5) sorting the particles on each node by their grid cell number, for example with the sort_by_key function of the CUDA Thrust library;

(6) to keep the addresses accessed by GPU threads contiguous and improve execution efficiency, sorting the positions, velocities and accelerations of the particles on the GPU of each node according to the sorted cell numbers. Because the cell numbers of all particles not belonging to the current node were previously set to the maximum value, the corresponding particle information is now at the very end;

(7) because the subsequent calculation needs to traverse the 26 cells surrounding each cell, adjacent nodes must exchange cells to form an overlap region. With linear task partitioning and periodic boundaries, the full information of the cells surrounding the current node's cells is transferred to the current node in a single pass to reduce the communication volume. The cell number sent forward is cellPerNodeStart[i]+startoffset, where startoffset = nlcx*nlcy + (nlcx*nlcy - cellPerNodeStart[i] % (nlcx*nlcy)) % (nlcx*nlcy), and nlcx and nlcy are the numbers of cells in the x and y directions respectively. The cell number sent backward is cellPerNodeEnd[i]-endoffset, where endoffset = nlcx*nlcy + cellPerNodeEnd[i] % (nlcx*nlcy). The information is sent, received and stored in the same way as in (3) and (4); when stored, it is copied to just after the valid particle information, overwriting the earlier information of particles not belonging to the current node;

(8) on the GPU of each node, using shared memory to record, from the sorted cell numbers, the start and end positions of the particles in each cell, stored in the arrays cellStart and cellEnd respectively.
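The construction of the sorted cell list in (1), (5), (6) and (8) can be sketched serially in Python as follows. This is only an illustration: the cell numbering (x fastest, then y, then z), the box origin at 0 and the helper names are assumptions, and on the GPU the sort would be done with Thrust's sort_by_key rather than Python's sorted.

```python
def cell_index(pos, box, ncells):
    """Map a 3-D position in [0, box) to a 1-D cell number (x fastest)."""
    idx = [min(int(p / length * n), n - 1)
           for p, length, n in zip(pos, box, ncells)]
    return idx[0] + ncells[0] * (idx[1] + ncells[1] * idx[2])


def build_sorted_cell_list(positions, box, ncells):
    """Sort particles by cell number and record each cell's particle range."""
    keys = [cell_index(p, box, ncells) for p in positions]
    # Serial stand-in for sort_by_key: order particle indices by cell number.
    order = sorted(range(len(positions)), key=keys.__getitem__)
    sorted_keys = [keys[i] for i in order]
    sorted_pos = [positions[i] for i in order]
    total = ncells[0] * ncells[1] * ncells[2]
    cell_start = [-1] * total  # -1 marks an empty cell
    cell_end = [-1] * total
    for slot, key in enumerate(sorted_keys):
        # A cell starts where the key differs from the previous slot ...
        if slot == 0 or key != sorted_keys[slot - 1]:
            cell_start[key] = slot
        # ... and ends (exclusive) where it differs from the next slot.
        if slot == len(sorted_keys) - 1 or key != sorted_keys[slot + 1]:
            cell_end[key] = slot + 1
    return sorted_pos, sorted_keys, cell_start, cell_end
```

After sorting, the particles of cell c occupy the contiguous slots cell_start[c] to cell_end[c]-1, which is what allows neighbouring threads to read neighbouring addresses.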

In step 210, the time step is updated dynamically: it is the smallest of the times required by the maximum time step, the maximum energy cutoff and the maximum distance cutoff. The extreme values are first found on each node and then across nodes, and from them the smallest of those three times is obtained. In an example embodiment, the time step is updated as follows:

(1) on the GPU of each node, finding the extreme values of the velocity, the acceleration and its derivative with the CUDA Thrust library functions thrust::min_element and thrust::max_element;

(2) collecting the extreme values of each node on master node 0 with the MPI_Send() and MPI_Recv() functions of the MPI communication protocol, and then finding the overall maximum and minimum;

(3) on the CPU of the master node, taking the smallest of the times required by the maximum time step dtmax, the maximum energy cutoff demax and the maximum distance cutoff dxmax;

(4) on the GPU, correcting the velocity, acceleration and its derivative of each particle according to the ratio of the time steps before and after the update; for the NPT ensemble, the tensor is corrected at the same time.
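The two-level reduction in (1) to (3) might look as follows in a serial Python sketch. The text does not give the formulas behind the energy and distance limits, so the criteria used here (no particle moves farther than dxmax or changes its energy by more than demax in one step, estimated from the speed and from |F·v|) are assumptions of this sketch, as are the function names.

```python
def local_extremes(velocities, forces):
    """Per-node reduction: largest speed and largest |F.v| on this node."""
    vmax = 0.0
    pmax = 0.0
    for v, f in zip(velocities, forces):
        speed = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
        power = abs(f[0] * v[0] + f[1] * v[1] + f[2] * v[2])
        vmax = max(vmax, speed)
        pmax = max(pmax, power)
    return vmax, pmax


def choose_time_step(per_node_extremes, dtmax, demax, dxmax):
    """Global reduction on the master, then the smallest of three limits."""
    vmax = max(e[0] for e in per_node_extremes)
    pmax = max(e[1] for e in per_node_extremes)
    candidates = [dtmax]
    if vmax > 0.0:
        candidates.append(dxmax / vmax)  # assumed distance criterion
    if pmax > 0.0:
        candidates.append(demax / pmax)  # assumed energy criterion
    return min(candidates)
```

In an MPI program, per_node_extremes would be the values gathered on master node 0; here it is simply a list with one tuple per node.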

In step 212, on the GPUs of the multiple nodes, the correspondence between particles and grid cell numbers is found from the particle coordinates. This is in effect dynamic load balancing: the number of particles in each cell is first counted on master node 0; then, taking the average number of particles per node as the target, the start and end positions of the grid on each node are adjusted so that the particle counts computed on the nodes are approximately equal.
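One possible way to adjust the cell ranges from the per-cell particle counts is sketched below in Python. The greedy prefix-sum rule, the half-open ranges and the requirement that every node keeps at least one cell are assumptions of this sketch; the text only states that the start and end positions are adjusted towards the average particle count per node.

```python
def rebalance(cell_counts, nnodes):
    """Move the cell boundaries so each node gets about total/nnodes particles.

    Returns (start, end); node i owns cells start[i] .. end[i]-1.
    Assumes len(cell_counts) >= nnodes.
    """
    total = sum(cell_counts)
    ncell = len(cell_counts)
    start, end = [0], []
    running = 0
    node = 0
    for cell, count in enumerate(cell_counts):
        running += count
        target = total * (node + 1) / nnodes  # cumulative share of this node
        cells_left = ncell - (cell + 1)
        nodes_left = nnodes - (node + 1)
        # Close the current node once its share is reached, or when the
        # remaining cells are only just enough to give one to each node.
        if node < nnodes - 1 and (running >= target or cells_left == nodes_left):
            end.append(cell + 1)
            start.append(cell + 1)
            node += 1
    end.append(ncell)
    return start, end
```

For example, if a cascade concentrates half of all particles in the first cell, the first node ends up owning only that cell while the second node takes all the others.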

In step 214, each particle is processed in parallel by the stream processors on each GPU, for example predicting the displacement, velocity and acceleration of the particle with the Verlet method.

For the NPT ensemble, the tensor also needs to be predicted on the GPU.

In step 216, on the GPUs of the plurality of nodes, the force on each particle is calculated according to the sorted cell list obtained in step 208. Specifically, the many-body potential of each particle is evaluated to obtain the force on each particle. In an example embodiment, the force on a particle includes components in the x, y and z directions: the pair potential between the particle and all surrounding particles, the embedding potential between the particle and the electrons, and the correction term are summed to give the total force on the particle, which is then projected onto the x, y and z directions. The total force on a particle is calculated in the following steps:

(1) based on the sorted arrays, having the stream processors on the GPU traverse contiguously all particles in the cell containing the particle and in its 26 adjacent cells, calculating the distance to each of them, and from these calculating the electron density corresponding to the particle and the contribution of the electron density non-cyanide leaching part corresponding to the particle;

(2) calculating the embedding potential of the particle and its derivative from the calculated electron density, and calculating the short-range correction term of the particle and its derivative;

(3) based on the sorted arrays, having the stream processors on the GPU traverse contiguously all particles in the cell containing the particle and in its 26 adjacent cells, calculating the distance to each of them, and calculating the pair potential between each pair of particles and its derivative from the pair potential function; and

(4) summing the pair potential between the particle and all surrounding particles, the embedding potential and the correction term to obtain the total force on the particle, and then projecting it onto the x, y and z directions.
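The multi-pass structure of steps (1) to (4) resembles an embedded-atom-style calculation, which can be sketched in Python as follows. The functional forms below are toy functions chosen only so that the derivatives are easy to verify; they are not the potential used in the embodiment. The sketch also loops over all pairs instead of the 27 neighbouring cells, ignores periodic boundaries, and omits the correction term.

```python
import math


def density(r):      # toy electron density contribution f(r)
    return math.exp(-r)


def d_density(r):    # f'(r)
    return -math.exp(-r)


def embed(rho):      # toy embedding energy F(rho)
    return -math.sqrt(rho)


def d_embed(rho):    # F'(rho)
    return -0.5 / math.sqrt(rho)


def pair(r):         # toy pair potential phi(r)
    return math.exp(-2.0 * r)


def d_pair(r):       # phi'(r)
    return -2.0 * math.exp(-2.0 * r)


def eam_energy_forces(positions):
    """Total energy and per-particle forces, O(N^2) for clarity."""
    n = len(positions)
    # Pass 1: electron density at every particle.
    rho = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i] += density(math.dist(positions[i], positions[j]))
    # Pass 2: embedding energy and its derivative.
    energy = sum(embed(r) for r in rho)
    d_f = [d_embed(r) for r in rho]
    # Pass 3: pair and embedding terms, projected onto x, y and z.
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = math.dist(positions[i], positions[j])
            energy += 0.5 * pair(r)
            radial = (d_f[i] + d_f[j]) * d_density(r) + d_pair(r)
            for d in range(3):
                forces[i][d] -= radial * (positions[i][d] - positions[j][d]) / r
    return energy, forces
```

The density must be complete for every particle before any force is accumulated, because the force on particle i depends on the embedding derivative of each neighbour j; this is why the density pass and the force pass are separate traversals.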

In step 218, each particle is processed in parallel by the stream processors on each GPU, and the displacement, velocity and acceleration of the particle are corrected according to the force obtained in step 216.

For the NPT ensemble, the tensor is also corrected.
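The predict, force, correct sequence of steps 214 to 218 can be illustrated with the velocity form of the Verlet method, sketched below in Python. This is a substitute chosen for brevity: the arrays listed at initialization also hold the derivative of the acceleration, which suggests that the embodiment may use a higher-order scheme, and the text does not give its update formulas. The function names are hypothetical.

```python
def predict(x, v, a, dt):
    """Predictor: half-step velocity kick, then full-step position drift."""
    for i in range(len(x)):
        for d in range(3):
            v[i][d] += 0.5 * dt * a[i][d]
            x[i][d] += dt * v[i][d]


def correct(v, a, forces, mass, dt):
    """Corrector: new acceleration from the new forces, then the second
    half-step velocity kick."""
    for i in range(len(v)):
        for d in range(3):
            a[i][d] = forces[i][d] / mass
            v[i][d] += 0.5 * dt * a[i][d]
```

One time step is then predict, followed by a force evaluation at the new positions, followed by correct. Each particle is updated independently, which is why one GPU thread per particle suffices for both functions.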

In step 220, on the GPUs of the multiple nodes, the velocities are corrected according to the ensemble to keep the system temperature constant. For example, under any ensemble other than NVE, the velocities are corrected on the GPU to keep the system temperature constant.

In step 222, on the GPUs of the multiple nodes, periodic boundary conditions are applied to correct the particle positions.
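The position correction of step 222 amounts to folding each coordinate back into the simulation box. A minimal Python sketch, assuming an orthogonal box with its origin at 0 (the function name is hypothetical):

```python
import math


def wrap(pos, box):
    """Fold a position back into the primary box, direction by direction."""
    return [c - math.floor(c / length) * length for c, length in zip(pos, box)]
```

Using floor rather than a single add-or-subtract of the box length also handles particles that have moved more than one box length, as can happen to a fast recoil atom.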

In step 224, the current calculation result is stored on the multiple nodes. For example, depending on the storage requirements, the information on the GPU side is copied to the CPU.

In step 226, it is determined whether the loop has reached the set number of steps; if not, the method returns to step 204 and continues the loop.

If the loop has reached the set number of steps, the loop ends and the allocated GPU memory and system memory are released.

According to an example embodiment, the radiation damage process includes the radiation damage process of materials with different lattice structures and under different ensembles, and the above method further includes adjusting the tensor under the NPT ensemble.

Optionally, to further improve the accuracy of the calculation, one or more of the following steps may be added between step 208 and step 210:

Initial acceleration calculation step: on the GPU of each node, the many-body potential of each particle is evaluated from the particle positions to obtain the total force on each particle and hence its initial acceleration, including:

(1) based on the sorted arrays, having the stream processors on the GPU of each node traverse contiguously all particles in the cell containing the particle and in its 26 adjacent cells (a cell not belonging to the current node is read from the corresponding cell stored at the end of the arrays), calculating the distance to each of them, and from these calculating the electron density corresponding to the particle and the contribution of the electron density non-cyanide leaching part corresponding to the particle;

(2) processing each particle in parallel on the GPU of each node, calculating the embedding potential of the particle and its derivative from the calculated electron density, and calculating the short-range correction term of the particle and its derivative;

(3) based on the sorted arrays, having the stream processors on the GPU of each node traverse contiguously all particles in the cell containing the particle and in its 26 adjacent cells, calculating the distance to each of them, and calculating the pair potential between each pair of particles and its derivative from the pair potential function; the pair potential between the particle and all surrounding particles, the embedding potential and the correction term are summed to obtain the total force on the particle, which is then projected onto the x, y and z directions;

In molecular dynamics, the choice of the potential energy function acting between particles is very important; for different kinds of materials, the potential energy function chosen differs because their structures differ.

When the ensemble is NPT, the corresponding tensor also needs to be initialized.

(4) processing each particle in parallel on the GPU, and obtaining the initial acceleration of each particle from its force and velocity.

PKA setting step: it is determined whether a PKA is to be set, and if so, the PKA is set. The position and velocity of the PKA are set on master node 0. A PKA can be set repeatedly, and its position and direction can be random or specified. With multiple GPUs, it must be determined in which cell of which node the PKA lies: the corresponding cell is first calculated from the PKA position and the corresponding node is found; the PKA information is sent to that node; the 27 cells around the PKA are traversed to find the particle at the smallest distance; and it is determined whether that position belongs to the current node. If not, the information is sent to the corresponding other node, where the position is found and the particle replaced. Whether a PKA is set at the start of a loop is controlled as required by a manually set switch that is read from the input file.
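The core of the PKA setting, finding the particle closest to the requested position and giving it the recoil velocity, can be sketched on a single node in Python as follows. The function name, the scan over all particles instead of the 27 surrounding cells, and the use of consistent units in E = m v² / 2 are assumptions of this sketch; the inter-node hand-off described above is not shown.

```python
import math


def set_pka(positions, velocities, pka_pos, direction, energy, mass):
    """Give the particle nearest to pka_pos the kinetic energy `energy`
    along `direction`; returns the index of the chosen particle."""
    nearest = min(range(len(positions)),
                  key=lambda i: math.dist(positions[i], pka_pos))
    norm = math.sqrt(sum(c * c for c in direction))
    speed = math.sqrt(2.0 * energy / mass)  # from E = m v^2 / 2
    velocities[nearest] = [speed * c / norm for c in direction]
    return nearest
```

Normalizing the direction inside the function means the input file only has to specify the orientation of the recoil, not a unit vector.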

Optionally, the following step may be included between steps 218 and 220:

Determine whether quenching needs to be performed and, if so, perform it. This can be controlled by the switch iquen according to the setting in the input file.

Optionally, to verify the correctness of the method, in each loop step the electron energies of all particles and the interparticle energies are summed and compared with the cohesive energy. For example, the cohesive energy cohe calculated from the formula in the initialization stage is compared with the energy sum pe calculated at the end of each loop. When pe and cohe are equal, the calculation is correct.

The MPI function MPI_Reduce() can be used to reduce the pe values of the nodes.
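The check can be illustrated with a small serial Python sketch in which a plain sum stands in for MPI_Reduce() with a sum operation. The function names and the relative tolerance are assumptions of this sketch; a tolerance is used because floating-point sums accumulated in different orders on different nodes rarely agree to the last bit.

```python
import math


def reduce_sum(per_node_values):
    """Serial stand-in for a sum reduction onto the master node."""
    return math.fsum(per_node_values)


def energy_matches(per_node_pe, cohe, rel_tol=1e-6):
    """Compare the reduced energy pe with the reference value cohe."""
    pe = reduce_sum(per_node_pe)
    return math.isclose(pe, cohe, rel_tol=rel_tol)
```

A mismatch typically points to particles that were lost or counted twice during the inter-node exchange.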

The following table lists the simulation results for embodiment one, radiation damage in metallic iron, on multiple NVIDIA GPUs. The table compares CPU and GPU run times for different system sizes at 5000 run steps. The results show that the present disclosure greatly improves computational efficiency while reducing energy consumption.

The methods according to the various embodiments of the disclosure can be implemented by computer-readable code stored in a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices in which data readable by a computer system are stored. Examples of such recording media include ROM, RAM, optical discs, magnetic tape, floppy disks, hard disks and non-volatile memory. In addition, the computer-readable recording medium may store code that is distributed across computer systems connected by a network, so that the code can be read and executed by computers in a distributed manner.

Although the disclosure has been illustrated and described with reference to its various embodiments, those skilled in the art will understand that various changes in form and detail may be made therein without departing from the scope of the disclosure as defined by the appended claims and their equivalents.

It will be appreciated that embodiments of the disclosure may be implemented in hardware, in software, or in a combination of hardware and software. Any such software may be stored in the form of volatile or non-volatile storage (for example, a storage device such as a ROM, whether erasable or rewritable or not), in the form of memory (for example, RAM, memory chips, devices or integrated circuits), or on an optically or magnetically readable medium (for example, a CD, DVD, magnetic disk or magnetic tape). It will be appreciated that the storage devices and storage media are embodiments of machine-readable storage suitable for storing a program comprising instructions that, when executed, implement embodiments of the disclosure.

Throughout the description and claims of this specification, the words "include" and "comprise" and their variations mean "including but not limited to" and are not intended to exclude other components, integers or steps.

Throughout the description and claims of this specification, the singular encompasses the plural unless the context requires otherwise. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.

Features, integers or characteristics described in conjunction with a particular aspect, embodiment or example of the disclosure are to be understood as applicable to any other aspect, embodiment or example described herein, unless incompatible therewith.

It will also be understood that, throughout the description and claims of this specification, language in the general form "X for Y" (where Y is some action, activity or step and X is some means for carrying out that action, activity or step) encompasses means X adapted or arranged specifically, but not exclusively, to do Y.

Furthermore, the embodiments disclosed in this specification are provided to describe and aid understanding of the technical content and do not limit the scope of the disclosure. Accordingly, the scope of the disclosure should be interpreted as including all modifications or various other embodiments based on the technical idea of the disclosure.

Claims

1. A multi-graphics-processing-unit (GPU) molecular dynamics simulation method for irradiation damage of structural materials, wherein the multiple GPUs are located on a plurality of nodes and can run in parallel, the method comprising:

d. establishing a sorted cell list on the GPUs of the plurality of nodes;

e. updating the time step on the GPUs of the plurality of nodes;

l. storing the current calculation result on the plurality of nodes;

m. iteratively performing steps b to l until a set number of steps is reached.

2. The method according to claim 1, wherein communication among the plurality of nodes uses the MPI communication protocol, the plurality of nodes are divided into a master node and slave nodes, the master node is responsible for distributing and collecting data, and in step b the grid is dynamically divided for each node on the master node according to the load-balancing principle.

3. The method according to claim 1, wherein the irradiation damage process includes irradiation damage processes of materials with different lattice structures and under different ensembles,

the method further comprising handling the adjustment of the tensor under the NPT ensemble.

4. The method according to claim 1, wherein, when a relaxation process is performed, the system temperature is set to 0, the number of primary knock-on atoms (PKA) is set to 0, and steps a to l are performed until the set number of steps is reached, so that the material particles evolve from the initial non-equilibrium state to an equilibrium state.

5. The method according to claim 1, wherein, when a cascade process is performed, the velocities, coordinates and accelerations obtained from the relaxation process are used and steps a to l are performed, so as to simulate material irradiation damage at the specified temperature and under the specified ensemble.

6. The method according to claim 1, wherein the space allocation in step a dynamically allocates space for the variable arrays on the CPU and the GPU of each node,

and the space allocation on the CPU further comprises: allocating space on the master node for the positions and velocities of all particles.

7. The method according to claim 1, wherein the parameter reading in step a reads the relevant physical parameters on each node from a file stored on a hard disk, the parameters including material information, the corresponding potential parameters and PKA information.

8. The method according to claim 1, wherein, in step a, the generation of the material structure and the setting of the initial particle positions differ between the relaxation process and the cascade process,

for relaxation, the initial particle positions are generated according to the lattice structure, and for a cascade, the particle positions obtained after relaxation are read in directly.

9. The method according to claim 1, wherein step a further comprises task distribution, in which the particle positions and velocities obtained on the master node are evenly distributed to each node through the MPI interface.

10. The method according to claim 1, wherein dynamically dividing the grid for each node in step b comprises: after the generated three-dimensional lattice structure is divided into grid cells, numbering the cells sequentially along the x, y and z directions, and linearly partitioning them according to the total number of cells and the total number of nodes under the load-balancing principle, so as to find the node corresponding to each cell.
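A minimal sketch of such a linear partition is given below. It assumes that every grid cell carries roughly the same amount of work, so that load balance reduces to giving each node a contiguous block of cell numbers whose sizes differ by at most one; a division weighted by the number of particles per cell would be an equally plausible reading. The function names `grid_number`, `partition_cells` and `node_of_cell` are hypothetical.

```python
def grid_number(ix, iy, iz, nx, ny):
    """Sequential cell number, counting along x first, then y, then z."""
    return ix + nx * (iy + ny * iz)


def partition_cells(total_cells, n_nodes):
    """Split cell numbers 0..total_cells-1 into contiguous half-open ranges, one per node."""
    base, extra = divmod(total_cells, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional cell each.
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def node_of_cell(cell_no, ranges):
    """Return the node whose range contains the given cell number."""
    for node, (start, end) in enumerate(ranges):
        if start <= cell_no < end:
            return node
    raise ValueError("cell number outside the partitioned range")
```

For example, 10 cells on 3 nodes would be split into blocks of 4, 3 and 3 cells.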

11. The method according to claim 1, wherein steps c and d are performed interactively and specifically comprise:

calculating, on the GPU of each node, the grid cell number corresponding to each particle according to the particle coordinates;

finding, on the GPU, the particles that do not belong to the node of this GPU, and transferring them to the corresponding other nodes through the MPI communication mechanism;

sorting the particle positions, velocities and accelerations on each node according to the grid cell number;

sending the particle information of the overlap region to the corresponding other nodes; and

finding the start and end positions of the particles handled on each node.
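The sorting and range-finding parts of the steps above can be sketched on the host as follows. This is a serial illustration only, with hypothetical names (`sort_by_cell`, `cell_ranges`); on a GPU a parallel key-value sort would typically play the role of `sort_by_cell`, and the inter-node transfers are omitted. Empty cells are marked with -1, which is an assumption of this sketch.

```python
def sort_by_cell(cell_ids, *arrays):
    """Reorder per-particle arrays so that particles in the same cell are contiguous."""
    order = sorted(range(len(cell_ids)), key=lambda i: cell_ids[i])
    sorted_ids = [cell_ids[i] for i in order]
    sorted_arrays = [[a[i] for i in order] for a in arrays]
    return sorted_ids, sorted_arrays


def cell_ranges(sorted_ids, n_cells):
    """Return start and end (exclusive) indices of each cell in the sorted arrays."""
    start = [-1] * n_cells
    end = [-1] * n_cells
    last = len(sorted_ids) - 1
    for i, c in enumerate(sorted_ids):
        if i == 0 or sorted_ids[i - 1] != c:
            start[c] = i  # first particle of cell c
        if i == last or sorted_ids[i + 1] != c:
            end[c] = i + 1  # one past the last particle of cell c
    return start, end
```

With the start and end arrays, all particles of a given cell can be read from consecutive memory locations, which is what makes traversal of a cell and its neighbours efficient.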

12. The method according to claim 1, wherein the time step in step e is the smallest of the times required by the maximum time step, the maximum energy cutoff and the maximum distance cutoff.

13. The method according to claim 12, wherein the extreme values are first obtained on each node and the extreme values across the nodes are then obtained, so as to obtain the smallest of the times required by the maximum time step, the maximum energy cutoff and the maximum distance cutoff.
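One possible reading of claims 12 and 13 is sketched below. The formulas are assumptions that are not stated in the disclosure: the distance criterion is taken as the time for the fastest particle to travel `d_max`, and the energy criterion as the time for the largest power |F·v| to change a particle's energy by `e_max`. The per-node extremes are combined with a plain `max`, which stands in for a reduction across nodes with a maximum operation. All names are hypothetical.

```python
import math


def local_extremes(velocities, forces):
    """Per-node extremes: largest speed and largest power |F . v| among local particles."""
    v_max = max(math.sqrt(sum(c * c for c in v)) for v in velocities)
    p_max = max(
        abs(sum(fc * vc for fc, vc in zip(f, v)))
        for f, v in zip(forces, velocities)
    )
    return v_max, p_max


def choose_time_step(per_node_extremes, dt_max, d_max, e_max):
    """Global time step: the smallest of the three candidate times."""
    v_max = max(v for v, _ in per_node_extremes)  # inter-node maximum speed
    p_max = max(p for _, p in per_node_extremes)  # inter-node maximum power
    candidates = [dt_max]
    if v_max > 0.0:
        candidates.append(d_max / v_max)  # distance cutoff
    if p_max > 0.0:
        candidates.append(e_max / p_max)  # energy cutoff
    return min(candidates)
```

Taking the local extremes first keeps the inter-node communication down to two scalars per node per step.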

14. The method according to claim 1, wherein each GPU on each node includes multiple stream processors that perform parallel processing, each stream processor handling one particle.

15. The method according to claim 1, wherein, in step h, the force on a particle includes the forces in the x, y and z directions, namely the projections onto the x, y and z directions of the total force obtained by adding the pair potential between the particle and all surrounding particles, the embedding potential between the particle and the electrons, and the correction term, and the total force on the particle is calculated by the following steps:

based on the sorted arrays, having a stream processor on the GPU traverse consecutively all particles in the grid cell containing the particle and in its 26 adjacent grid cells, calculating their respective distances, and then calculating the electron density corresponding to the particle and the electron-density contribution needed for the correction term of the particle;

calculating the embedding potential of the particle and its derivative from the calculated electron density, and then calculating the correction term of the particle and its derivative;

based on the sorted arrays, having a stream processor on the GPU traverse consecutively all particles in the grid cell containing the particle and in its 26 adjacent grid cells, calculating their respective distances, and calculating the pair potential between two particles and its derivative according to the pair potential function; and

adding the pair potential between the particle and all surrounding particles, the embedding potential and the correction term to obtain the total force on the particle, and then projecting out the forces in the x, y and z directions.
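The two-pass structure of this force calculation can be sketched as follows. The sketch rests on several assumptions: the potential functions `f`, `phi` and `embed` are toy forms chosen only so that the code runs, the correction term is omitted, boundaries are open, and a brute-force loop over all pairs within the cutoff `RC` replaces the traversal of the 27 grid cells. All names are hypothetical.

```python
import math

RC = 3.0  # toy cutoff radius (assumed)


def f(r):  # toy electron-density contribution of a neighbour at distance r
    return (RC - r) ** 2

def df(r):
    return -2.0 * (RC - r)

def phi(r):  # toy pair potential
    return (RC - r) ** 3

def dphi(r):
    return -3.0 * (RC - r) ** 2

def embed(rho):  # toy embedding energy
    return -math.sqrt(rho)

def dembed(rho):
    return -0.5 / math.sqrt(rho) if rho > 0.0 else 0.0


def pairs(positions):
    """Yield (i, j, separation vector r_i - r_j, distance) for ordered pairs within RC."""
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i == j:
                continue
            d = [a - b for a, b in zip(pi, pj)]
            r = math.sqrt(sum(c * c for c in d))
            if r < RC:
                yield i, j, d, r


def densities(positions):
    """Pass 1: accumulate the electron density at each particle."""
    rho = [0.0] * len(positions)
    for i, _, _, r in pairs(positions):
        rho[i] += f(r)
    return rho


def total_energy(positions):
    rho = densities(positions)
    pair_sum = sum(phi(r) for _, _, _, r in pairs(positions))
    return sum(embed(x) for x in rho) + 0.5 * pair_sum


def forces(positions):
    """Pass 2: combine pair and embedding derivatives, then project onto x, y, z."""
    d_embed = [dembed(x) for x in densities(positions)]
    out = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j, d, r in pairs(positions):
        scalar = dphi(r) + (d_embed[i] + d_embed[j]) * df(r)
        for k in range(3):
            out[i][k] -= scalar * d[k] / r
    return out
```

The density pass must finish for all particles before the force pass starts, because the force on a particle depends on the embedding derivative of its neighbours as well as its own.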