CN114580144B - GPU parallel implementation method for near field dynamics problem

Info

Publication number: CN114580144B
Application number: CN202210047282.5A
Authority: CN
Inventors: 何庆; 王晓明; 安博洋; 王启航; 王平; 黄�俊; 董唯佳; 王宁; 黄洪; 张岷; 刘启宾; 匡俊; 余天乐
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2022-01-17
Filing date: 2022-01-17
Publication date: 2024-05-17
Anticipated expiration: 2042-01-17
Also published as: CN114580144A

Abstract

The invention relates to the accelerated computation of near-field dynamics (peridynamics), and in particular to a GPU parallel implementation method for near-field dynamics problems. The method performs its calculations on a GPU device using registers and shared memory, which have high read/write efficiency, and stores constant parameters in constant memory, which has a broadcast mechanism. It thereby achieves a far greater acceleration than existing methods and overcomes their limited gains in computational efficiency, which stem from insufficient use of the GPU device and of the parallel scheme.

Description

GPU parallel implementation method for near field dynamics problem

Technical Field

The invention relates to the accelerated computation of near-field dynamics (peridynamics), and in particular to a GPU parallel implementation method for near-field dynamics problems.

Background

Continuum mechanics, which is based on partial differential equations, breaks down mathematically when solving discontinuous problems such as crack initiation and propagation, so the widely used finite element method built on continuum mechanics cannot simulate such problems well. Near-field dynamics is an emerging mechanical theory based on the idea of non-local interaction. It replaces the partial differential equations with integral equations, so it does not suffer from the failure of the mathematical tools of continuum mechanics at discontinuities, and it has unique advantages in simulating impact, cracking, functionally graded materials and similar problems. However, because near-field dynamics is a non-local theory, one particle can interact with tens or even hundreds of points within its near-field range, which makes near-field dynamics simulation computationally expensive. In particular, simulating fatigue and transient problems requires a large number of iterative solutions, which sharply increases the computation time and thereby limits the application of near-field dynamics.

The rapid development of GPUs has greatly advanced parallel computing, and large-scale parallel computation can greatly improve computational efficiency. Near-field dynamics simulations are typically solved with a meshless method, which is particularly well suited to parallel computing. GPU parallel computing can therefore alleviate the low computational efficiency of near-field dynamics to a certain extent and broaden its range of application. However, parallel computation for near-field dynamics has not received wide attention. Existing work has only used parallel libraries such as OpenMP, OpenACC, OpenCL and CUDA to accelerate near-field dynamics calculations, simulating composite materials, crack propagation and similar problems with some gain in computational efficiency. These methods do not give a detailed implementation and do not fully exploit the performance of the GPU, so the improvement in computational efficiency is limited.

Disclosure of Invention

In view of the problems described in the background, a GPU parallel implementation method for near-field dynamics problems is provided. The method performs its calculations on a GPU device using registers and shared memory, which have high read/write efficiency, and stores constant parameters in constant memory, which has a broadcast mechanism. It thereby achieves a far greater acceleration than existing methods and solves the problem that existing methods cannot make full use of the GPU device and the parallel scheme and therefore achieve only a limited improvement in computational efficiency.

The invention provides a GPU parallel implementation method for near field dynamics, which comprises the following steps:

S1, creating a structure PD_Parameter containing the geometry, material and other parameters of the model; declaring a global variable of type PD_Parameter on the host side (CPU) and a constant-memory variable on the device side (GPU);

S2, creating the structures Atom and Bond, which contain the arrays related to the particles and to the bonds, respectively;

S3, assigning values to the host-side PD_Parameter variable and transferring them to the device-side constant-memory variable;

S4, allocating device memory for the arrays of the structures Atom and Bond, wherein each particle-related array is allocated N times the byte size of a single datum, each bond-related array is allocated the maximum possible size, N×MN times the byte size of a single datum, and the attribute data of all bonds within the near-field range of one particle are stored in one contiguous interval of length MN;

S5, mapping the coordinate calculation of each particle to a GPU thread, launching as many threads as there are particles so that each particle is mapped to a dedicated thread;

S6, generating the interactions within the neighborhood of each particle and determining the neighbor count array NN and the neighbor index array NL, parallelized by point mapping;

S7, correcting the volume of the particles, and determining the original bond length array idist and the volume correction coefficient array fac in the process; the calculation formula of the volume correction is as follows:

S8, applying displacement and boundary conditions to the model by constraining the displacement and velocity of particles, wherein the calculation for each particle is independent of the others;

S9, initializing all values of the bond failure array to 1 and, as the model requires, setting the value corresponding to each bond crossed by a crack to 0, indicating that the bond is broken; cracks are added to the model in this way;

S10, judging whether bond-based or ordinary state-based near-field dynamics is used; bond-based near-field dynamics performs steps S11-S12; ordinary state-based near-field dynamics performs steps S13-S15;

S11, determining the bond correction coefficient array scr according to the strain energy direction correction method, wherein the strain energy calculation on each bond is independent;

S12, calculating the elongation of each bond by bond-mapping parallelism, provided that the bond is not broken:

Wherein ζ represents the relative position vector of the two particles after deformation, and η represents the relative displacement vector after deformation; the value of the bond failure array is set to 0 for each bond whose elongation exceeds the critical value, indicating that the bond is broken; the near-field force of each particle is calculated using registers and shared memory in combination, and the calculation formula of the near-field force is as follows:

Wherein c represents the bond constant, M represents the bond direction, and H_x represents the set of particles formed by all neighbor points within the near-field range;

S13, calculating the influence function array omega and the weighted volume array m by bond-mapping parallelism, using registers and shared memory in combination, wherein the calculation formula is as follows:

S14, calculating the volumetric strain of the particles by bond-mapping parallelism, using registers and shared memory in combination, provided that the corresponding bond is not broken, wherein the calculation formula is as follows:

In the formula, e = |ζ+η| − |ζ|;

S15, calculating, by bond-mapping parallelism and provided that the corresponding bond is not broken, the elongation of each bond to judge whether the bond breaks, and calculating the near-field force of the particles using registers and shared memory in combination, wherein, when the model is uniformly discretized, the calculation formula of the near-field force is as follows:

wherein k and G represent bulk modulus and shear modulus, respectively, and subscripts i and j represent particles i and j, respectively;

S16, judging whether the problem is a quasi-static problem or a transient problem; a quasi-static problem uses the adaptive dynamic relaxation method, and a transient problem updates the velocity and displacement of the particles using an explicit integration method based on central differences;

The calculation formula of the adaptive dynamic relaxation method is as follows:

Wherein n represents the iteration number, D is the fictitious diagonal density matrix, C is the damping coefficient, and F is the sum of the near-field force and the external force acting on the particle;

The calculation formula of the explicit integration based on central differences is as follows:

Wherein a, v and u are the acceleration, velocity and displacement of the particles, respectively; t represents the current time, and Δt represents the time step;

S17, judging whether the iteration is finished; if so, executing S18; if not, returning, according to the type of near-field dynamics, to S12 for the bond-based type or to S14 for the state-based type;

S18, calculating the damage index of the particles by bond-mapping parallelism, using registers and shared memory in combination, wherein the damage index is the ratio of the volume of the neighbor points whose bonds are broken to the volume of the whole neighborhood, and the calculation formula is as follows:

S19, transferring the required calculation results from the device side to the host side and writing them to a file for storage;

S20, releasing the memory and ending the calculation.
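Steps S12 and S18 can be illustrated with a minimal CPU-side sketch in Python. It assumes the standard bond stretch, that is the elongation e = |ζ+η| − |ζ| divided by the original bond length |ζ|, and a critical value named s_crit. The function names and the two-dimensional tuples are hypothetical and are not taken from the patent, whose own formulas are not reproduced in this text.

```python
import math


def bond_stretch(zeta, eta):
    """Return (|zeta + eta| - |zeta|) / |zeta| for a 2D bond (assumed definition)."""
    length0 = math.hypot(zeta[0], zeta[1])
    length1 = math.hypot(zeta[0] + eta[0], zeta[1] + eta[1])
    return (length1 - length0) / length0


def update_failure(fail, stretches, s_crit):
    """Set the flag of every intact bond stretched beyond s_crit to 0."""
    return [0 if (f == 1 and s > s_crit) else f
            for f, s in zip(fail, stretches)]


def damage_index(fail, volumes):
    """Volume fraction of one particle's neighborhood whose bonds are broken."""
    broken = sum(v for f, v in zip(fail, volumes) if f == 0)
    return broken / sum(volumes)


# Illustrative values only: three bonds of one particle.
stretches = [0.05, 0.20, 0.00]
fail = update_failure([1, 1, 0], stretches, s_crit=0.1)
print(fail)                                   # bond 1 breaks, bond 2 was already broken
print(damage_index(fail, [1.0, 1.0, 2.0]))    # 3.0 of 4.0 volume units are cut off
```

Each list element is computed without reference to any other bond, which is the property that lets the method assign one GPU thread per bond.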

Preferably, in S1, the structure PD_Parameter should also include the total number of particles in the model N, the maximum number of neighbor points of a particle MN, the integral number of particles N_Int, the particle spacing Δ, the near-field range δ, and the numbers of particles in the x, y and z directions.

Preferably, in S2, the particle-related arrays include displacement, velocity, acceleration, coordinates and near-field force; the bond-related arrays include the bond correction coefficient, the bond failure array, the volume correction coefficient and the influence function.

Preferably, in S7, the calculations of the bond length and of the volume correction coefficient of each bond are independent of one another, so N×MN threads are launched and the calculation for each bond is mapped to its own thread; this is called bond-mapping parallelism.
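The N×MN layout can be illustrated with a small CPU-side sketch. It assumes, which the text does not state explicitly, that the thread with flat index t handles neighbor slot k = t mod MN of particle i = t div MN, and that slots at or beyond the actual neighbor count NN[i] are padding and stay idle. The helper names are hypothetical.

```python
def bond_thread_to_slot(t, MN):
    """Decode a flat thread index into (particle index, neighbor slot)."""
    return divmod(t, MN)


def active_bonds(NN, MN):
    """List the (particle, slot) pairs that do real work among N*MN threads."""
    pairs = []
    for t in range(len(NN) * MN):      # one loop iteration stands for one GPU thread
        i, k = bond_thread_to_slot(t, MN)
        if k < NN[i]:                  # padded slots hold no bond
            pairs.append((i, k))
    return pairs


NN = [2, 3, 1]   # illustrative neighbor counts for N = 3 particles
MN = 4           # illustrative maximum neighbor count
print(active_bonds(NN, MN))
# prints [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0)]
```

With this layout the data of one particle's bonds occupy one contiguous interval of length MN, at the cost of idle threads for particles with fewer than MN neighbors.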

Preferably, in S11, after the strain energy of each bond has been calculated, the strain energy on the bonds needs to be summed onto the corresponding particle; registers and shared memory, which have high read/write efficiency, are used in combination for this summation.
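The idea of summing bond values onto a particle through registers and shared memory can be emulated on the CPU. The sketch below is not the patent's pseudo code. It assumes one thread block per particle, one thread per bond slot, a thread-local variable standing in for a register, a Python list standing in for shared memory, and a tree reduction whose loop boundaries mark where a GPU kernel would place a block-level barrier.

```python
def block_sum(values):
    """Emulate a shared-memory tree reduction over one thread block."""
    size = 1
    while size < len(values):          # pad the block to a power of two
        size *= 2
    shared = list(values) + [0.0] * (size - len(values))  # stands for shared memory
    stride = size // 2
    while stride > 0:
        for tid in range(stride):      # only threads with tid < stride are active
            shared[tid] += shared[tid + stride]
        stride //= 2                   # a block-level barrier would sit here on a GPU
    return shared[0]


def sum_bonds_to_particles(bond_values, fail, N, MN):
    """Sum per-bond values onto their particle, skipping broken bonds."""
    totals = []
    for i in range(N):                 # one block per particle (assumption)
        regs = []
        for k in range(MN):            # one thread per bond slot
            idx = i * MN + k
            reg = bond_values[idx] * fail[idx]   # thread-local value, like a register
            regs.append(reg)
        totals.append(block_sum(regs))
    return totals


# Illustrative data: N = 2 particles, MN = 3 bond slots each, one broken bond.
print(sum_bonds_to_particles([1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 1, 1], 2, 3))
# prints [3.0, 15.0]
```

In this arrangement each bond value is written to the fast on-chip array once and the partial sums never leave it until the final result is ready, which is the motivation the text gives for combining registers with shared memory.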

Preferably, the specific usage is as shown in the following pseudo code:

Compared with the prior art, the invention has the following beneficial technical effects:

The invention provides a GPU parallel implementation method for near-field dynamics problems. The method performs its calculations on a GPU device using registers and shared memory, which have high read/write efficiency, and stores constant parameters in constant memory, which has a broadcast mechanism. It thereby achieves a far greater acceleration than existing methods and solves the problem that existing methods cannot make full use of the GPU device and the parallel scheme and therefore achieve only a limited improvement in computational efficiency.

Drawings

FIG. 1 is a schematic diagram of a GPU parallel scheme of a prior art method;

FIG. 2 is a schematic drawing of a uniaxially stretched flat sheet;

FIG. 3 is a schematic illustration of the centerline displacement of a uniaxially stretched flat panel;

FIG. 4 is a schematic diagram of a crack-containing slab subjected to velocity boundary conditions.

Detailed Description

The GPU parallel implementation method for the near field dynamics problem provided by the invention comprises the following steps:

S1, creating a structure PD_Parameter containing the geometry, material and other parameters of the model; declaring a global variable of type PD_Parameter on the host side (CPU) and a constant-memory variable on the device side (GPU); the structure PD_Parameter should further include the constant parameters needed by the subsequent procedures, such as the total number of particles in the model N, the maximum number of neighbor points MN, the integral number of particles N_Int, the particle spacing Δ, the near-field range δ, and the numbers of particles in the x, y and z directions;

S2, creating the structures Atom and Bond, which contain the arrays related to the particles and to the bonds, respectively; the particle-related arrays include displacement, velocity, acceleration, coordinates, near-field force and the like; the bond-related arrays include the bond correction coefficient, the bond failure array, the volume correction coefficient, the influence function and the like;

S5, as in the existing method (FIG. 1), mapping the coordinate calculation of each particle to a GPU thread, but launching as many threads as there are particles so that each particle is mapped to a dedicated thread; unlike the existing method, multiple points are never mapped to the same thread; this parallel mode is called point-mapping parallelism;

S7, since the particles at the boundary of a neighborhood may not be entirely contained in the neighborhood, correcting the volume of the particles, and determining the original bond length array idist and the volume correction coefficient array fac in the process; the calculations of the bond length and of the volume correction coefficient of each bond are independent of one another, so N×MN threads are launched and the calculation for each bond is mapped to its own thread; this parallel mode is called bond-mapping parallelism;

The calculation formula of the volume correction is as follows:

S11, determining the bond correction coefficient array scr according to the strain energy direction correction method, wherein the strain energy calculation on each bond is independent; after the strain energy of each bond has been calculated, the strain energy on the bonds needs to be summed onto the corresponding particle, and registers and shared memory, which have high read/write efficiency, are used in combination for this summation;

the specific use is as shown in the following pseudo code:

In the formula, e = |ζ+η| − |ζ|;

The calculation formula of the adaptive dynamic relaxation method is as follows:

S19, transferring the required calculation results, such as displacement and damage index, from the device side to the host side and writing them to a file for storage;

S20, releasing the memory and ending the calculation.
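For the transient branch of step S16, an explicit update can be sketched as follows. The patent's own formula is not reproduced in this text, so the scheme below is an assumption: it is the leapfrog form of the central-difference method, in which the velocity is understood at half time steps, and it takes the force per unit volume and a uniform mass density as inputs. The function name and arguments are hypothetical.

```python
def explicit_step(u, v, force, density, dt):
    """Advance displacement u and velocity v of every particle by one time step."""
    a = [f / density for f in force]                    # acceleration from force density
    v_new = [vi + ai * dt for vi, ai in zip(v, a)]      # velocity update
    u_new = [ui + vi * dt for ui, vi in zip(u, v_new)]  # displacement uses the new velocity
    return u_new, v_new, a


# Illustrative one-particle, one-component example.
u, v, a = explicit_step(u=[0.0], v=[1.0], force=[2.0], density=2.0, dt=0.5)
print(u, v, a)
# prints [0.75] [1.5] [1.0]
```

Every particle is updated from its own values only, so this step fits the point-mapping parallel mode with one thread per particle.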

Based on the above embodiments, the present invention demonstrates the benefits of the GPU parallel implementation approach of near field dynamics problem with the following two examples.

The first example is a flat plate tensile test, shown in FIG. 2, in which the plate length is l = 1 m, the plate width is w = 0.5 m, and a tensile load equal to one thousandth of the elastic modulus of the material is applied at both the left and right ends. This is a two-dimensional quasi-static problem, which is solved using bond-based near-field dynamics. The acceleration achieved by a parallel algorithm is typically evaluated by the speed-up ratio, i.e. the ratio of the running time of the serial algorithm to that of the parallel algorithm. The particle spacing of the model is varied to change the computation scale, and the acceleration relative to the serial algorithm is shown in Table 1.

Table 1. Comparison of the computation time of the parallel and serial methods for the flat plate tensile test

The speed-up ratio was found to increase with the computation scale, rising from 50.5 to 442.9 as the computation scale increased from 50,000 to 500,000. When the computation scale is increased to 2,000,000, the computation time of the serial method becomes so long that it frequently crashes, whereas the parallel method still computes the result quickly. The computed result of the parallel method is compared with the analytical solution in FIG. 3; the two agree very well, which shows that the parallel method is highly accurate.
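The reported trend can be checked with a short calculation that uses only the figures quoted in this paragraph (computation scales of 50,000 and 500,000, speed-up ratios of 50.5 and 442.9). The helper names are hypothetical, and no timing data from the tables are assumed.

```python
def speed_up(t_serial, t_parallel):
    """Speed-up ratio: serial running time divided by parallel running time."""
    return t_serial / t_parallel


def growth_factor(first, last):
    """How many times larger the last value is than the first."""
    return last / first


scale_growth = growth_factor(50_000, 500_000)   # computation scale
speedup_growth = growth_factor(50.5, 442.9)     # reported speed-up ratios
print(round(scale_growth, 2), round(speedup_growth, 2))
# prints 10.0 8.77
```

A tenfold increase in scale is thus accompanied by a speed-up ratio about 8.8 times larger, so the growth is close to, though slightly below, strict proportionality.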

The second example is a rapid loading test on a flat plate containing a crack, shown in FIG. 4, in which the plate length is l = 0.05 m, the plate width is w = 0.05 m and the crack length is 2a = 0.01 m; fictitious boundary layers are applied at the upper and lower ends, and a rapid tensile load of 20 m/s is applied. This is a two-dimensional transient problem, which is solved using ordinary state-based near-field dynamics; the acceleration is shown in Table 2. An extremely high speed-up ratio is still obtained after changing the solution type: the speed-up ratio is 207.4 at a computation scale of 1.006 million, far higher than that of existing methods. The crack growth computed by the serial and parallel methods is essentially identical, which shows that the parallel method is highly accurate. Compared with a two-dimensional problem, a three-dimensional problem differs only in computation scale, so the method is also applicable to three-dimensional problems.

Table 2. Comparison of the computation time of the parallel and serial methods for the rapid loading test on the flat plate containing a crack
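The initial crack of such a model is introduced in step S9 by setting the failure value of every bond crossed by the crack to 0. A minimal two-dimensional sketch of that test is given below. It assumes that a bond counts as cut when the straight segment between its two particles strictly crosses the crack segment. The function names, the particle coordinates and the placement of the 0.01 m crack on the x axis are illustrative assumptions, not details taken from the patent.

```python
def _orient(p, q, r):
    """Signed area test: positive if p, q, r turn counter-clockwise."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def bond_crosses_crack(xi, xj, c0, c1):
    """True if segment xi-xj strictly crosses the crack segment c0-c1."""
    d1 = _orient(c0, c1, xi)
    d2 = _orient(c0, c1, xj)
    d3 = _orient(xi, xj, c0)
    d4 = _orient(xi, xj, c1)
    return d1 * d2 < 0 and d3 * d4 < 0


def initial_failure(coords, bonds, c0, c1):
    """Return 1 for intact bonds and 0 for bonds cut by the crack."""
    return [0 if bond_crosses_crack(coords[i], coords[j], c0, c1) else 1
            for i, j in bonds]


# Crack of length 2a = 0.01 m, assumed to lie on the x axis around the origin.
c0, c1 = (-0.005, 0.0), (0.005, 0.0)
coords = [(0.0, -0.001), (0.0, 0.001), (0.02, -0.001), (0.02, 0.001)]
bonds = [(0, 1), (2, 3), (0, 2)]   # (particle, neighbor) index pairs
print(initial_failure(coords, bonds, c0, c1))
# prints [0, 1, 1]
```

Only the first bond straddles the crack; the second lies beyond the crack tip and the third runs parallel to the crack on one side, so both stay intact.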

In summary, the GPU parallel method for near-field dynamics simulation provided by the application is universally applicable, and can greatly improve the calculation efficiency.

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited thereto, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims

1. The GPU parallel implementation method for the near-field dynamics problem is characterized by comprising the following steps of:

S1, creating a structure PD_Parameter containing the geometry, material and other parameters of the model; declaring a global variable of type PD_Parameter on the host side (CPU) and a constant-memory variable on the device side (GPU);

the structure PD_Parameter further comprises the total number of particles in the model N, the maximum number of neighbor points MN, the integral number of particles N_Int, the particle spacing Δ, the near-field range δ, and the numbers of particles in the x, y and z directions;

In the formula, e = |ζ+η| − |ζ|;

The calculation formula of the adaptive dynamic relaxation method is as follows:

S20, releasing the memory and ending the calculation.

2. The method according to claim 1, wherein in S2, the particle-related arrays include displacement, velocity, acceleration, coordinates and near-field force; the bond-related arrays include the bond correction coefficient, the bond failure array, the volume correction coefficient and the influence function.

3. The method according to claim 1, wherein in S7, the calculations of the bond length and of the volume correction coefficient of each bond are independent of one another, so that N×MN threads are launched and the calculation for each bond is mapped to its own thread, which is called bond-mapping parallelism.

4. The method according to claim 1, wherein in S11, after the strain energy of each bond has been calculated, the strain energy on the bonds is summed onto the corresponding particle, and registers and shared memory, which have high read/write efficiency, are used in combination for the summation.