CN114580144B - GPU parallel implementation method for near field dynamics problem - Google Patents

Info

Publication number
CN114580144B
Authority
CN
China
Prior art keywords
key
array
near field
particles
particle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210047282.5A
Other languages
Chinese (zh)
Other versions
CN114580144A (en)
Inventor
何庆
王晓明
安博洋
王启航
王平
黄�俊
董唯佳
王宁
黄洪
张岷
刘启宾
匡俊
余天乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University
Priority to CN202210047282.5A
Publication of CN114580144A
Application granted
Publication of CN114580144B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the accelerated computation of peridynamics (rendered "near field dynamics" in the machine translation), and in particular to a GPU parallel implementation method for peridynamics problems. The method runs on a GPU device, performs its calculations in registers and shared memory, which have high read/write efficiency, and stores constant parameters in constant memory, which has a broadcast mechanism. It thereby achieves a far greater speed-up than existing methods and overcomes the limited efficiency gains that result from their insufficient use of the GPU device and of the parallel scheme.

Description

GPU parallel implementation method for near field dynamics problem
Technical Field
The invention relates to the accelerated computation of peridynamics (near-field dynamics), and in particular to a GPU parallel implementation method for peridynamics problems.
Background
Continuum mechanics, which is based on partial differential equations, breaks down mathematically when applied to discontinuous problems such as crack initiation and propagation, so the widely used finite element method, which is built on continuum mechanics, cannot simulate such problems well. Peridynamics is an emerging theory of mechanics based on the idea of non-local interaction. It replaces the partial differential equations with integral equations, so it does not suffer the failure of the mathematical tools of continuum mechanics at discontinuities, and it has unique advantages in simulating impact, cracking, functionally graded materials, and similar problems. However, because peridynamics is a non-local theory, one particle may interact with tens or even hundreds of points within its horizon (the "near field range"), which makes peridynamic simulation computationally expensive. In particular, simulating fatigue and transient problems requires a large number of iterations, which sharply increases the computation time and thereby limits the application of peridynamics.
The rapid development of GPUs has greatly advanced parallel computing, and large-scale parallel computation can greatly improve computational efficiency. Peridynamic simulations are usually solved with a meshless method, which is particularly well suited to parallel computing. GPU parallel computing can therefore alleviate the low computational efficiency of peridynamics to some extent and widen its range of application. However, parallel computation for peridynamics has not received wide attention. Existing work has only used parallel libraries such as OpenMP, OpenACC, OpenCL, and CUDA to accelerate peridynamic calculations when simulating composite materials, crack propagation, and similar problems, achieving some improvement in efficiency. These methods do not give a detailed implementation and do not fully exploit the performance of the GPU, so the improvement in computational efficiency is limited.
Disclosure of Invention
To address the problems described in the background, a GPU parallel implementation method for peridynamics problems is provided. The method runs on a GPU device, performs its calculations in registers and shared memory, which have high read/write efficiency, and stores constant parameters in constant memory, which has a broadcast mechanism. It thereby achieves a far greater speed-up than existing methods and solves the problem that existing methods cannot make full use of the GPU device and of the parallel scheme, which limits their gains in computational efficiency.
The invention provides a GPU parallel implementation method for peridynamics problems, comprising the following steps:
S1, creating a structure PD_Parameter holding the geometric, material, and other parameters of the model; declaring a global PD_Parameter variable on the host side (CPU) and a constant-memory variable on the device side (GPU);
S2, creating the structures Atom and Bond, which hold the particle-related arrays and the bond-related arrays, respectively (the machine translation renders "bond" as "key");
S3, assigning values to the host-side PD_Parameter and copying them to the constant-memory variable on the device side;
S4, allocating device memory for the arrays of the structures Atom and Bond, wherein each particle-related array is allocated N times the byte size of a single element, each bond-related array is allocated the maximum possible size, N×MN times the byte size of a single element, and each contiguous block of MN entries stores the attribute data of all bonds within the horizon of one particle;
S5, mapping the coordinate calculation for each material point to a GPU thread, launching as many threads as there are material points so that each point is mapped to its own dedicated thread;
S6, generating the interactions within the neighborhood of each particle and determining the neighbor-count array NN and the neighbor-index array NL of the particles, parallelized by point mapping;
S7, correcting the particle volumes, and determining the initial bond-length array idist and the volume-correction-factor array fac in the process; the volume correction is calculated as follows:
S8, applying displacement and boundary conditions to the model by constraining the displacements and velocities of particles, wherein the calculation for each particle is independent of the others;
S9, initializing every entry of the bond-failure array to 1 and then, as the model requires, setting to 0 the entries corresponding to bonds crossed by a crack, which marks those bonds as broken; cracks are introduced into the model in this way;
S10, determining whether bond-based peridynamics or ordinary state-based peridynamics is used; bond-based peridynamics performs steps S11-S12; ordinary state-based peridynamics performs steps S13-S15;
S11, determining the bond-correction-factor array scr by the strain-energy directional correction method, wherein the strain energy of each bond is calculated independently;
S12, using bond-mapping parallelism, and only for bonds that are not yet broken, calculating the bond elongation:
Wherein ζ represents the relative position vector of the two particles before deformation, and η represents their relative displacement vector after deformation; setting to 0 the bond-failure entry of every bond whose elongation exceeds the critical value, which marks the bond as broken; calculating the peridynamic force on each particle using registers and shared memory in combination, the peridynamic force being given by:
Wherein c represents the bond constant, M represents the bond direction, and H_x represents the family of the particle, namely all neighbor points within its horizon;
S13, using bond-mapping parallelism with registers and shared memory in combination, calculating the influence-function array omega and the weighted-volume array m, as follows:
S14, using bond-mapping parallelism with registers and shared memory in combination, and only for bonds that are not broken, calculating the volumetric strain of each particle, as follows:
In the formula, e = |ζ+η| - |ζ|;
S15, using bond-mapping parallelism, and only for bonds that are not yet broken, calculating the bond elongation to determine whether the bond breaks, and calculating the peridynamic force on each particle using registers and shared memory in combination; when the model is uniformly discretized, the peridynamic force is given by:
wherein k and G represent the bulk modulus and the shear modulus, respectively, and the subscripts i and j denote particles i and j;
S16, determining whether the problem is quasi-static or transient; a quasi-static problem is solved with the adaptive dynamic relaxation method, and a transient problem updates the particle velocities and displacements by explicit central-difference integration;
The adaptive dynamic relaxation method is calculated as follows:
Wherein n represents the iteration number, D is the fictitious diagonal density matrix, C is the damping coefficient, and F is the sum of the peridynamic force and the external force acting on the particle;
The explicit central-difference integration is calculated as follows:
wherein a, v and u are the acceleration, velocity and displacement of the material point, respectively; t represents the current time, and Δt represents the time step;
S17, determining whether the iteration has finished; if so, executing S18; if not, returning to S12 for bond-based peridynamics or to S14 for state-based peridynamics;
S18, using bond-mapping parallelism with registers and shared memory in combination, calculating the damage index of each particle, which is the ratio of the volume of the neighbor points whose bonds are broken to the volume of the whole neighborhood, as follows:
S19, transferring the required results from the device side to the host side and writing them to a file for storage;
S20, freeing the memory and ending the calculation.
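The failure check of S12 and the damage index of S18 can be illustrated with a small CPU-side Python sketch. It is not the patented GPU code: the function name, the 2D coordinates, the per-neighbor volume array `vol`, and the use of relative elongation (elongation divided by initial length) compared against a critical value `s_crit` are all assumptions made for illustration, since the formulas themselves are not reproduced in this text. The flat index `i * MN + k` follows the N×MN layout described in S4.

```python
import math

def update_failure_and_damage(i, MN, NN, NL, coord, disp, vol, fail, s_crit):
    """Break overstretched bonds of particle i and return its damage index."""
    broken_vol = 0.0
    total_vol = 0.0
    for k in range(NN[i]):              # on a GPU, one thread per bond slot
        t = i * MN + k                  # flat index of bond slot k of particle i
        j = NL[t]                       # neighbor stored in that slot
        zx = coord[j][0] - coord[i][0]  # relative position (undeformed)
        zy = coord[j][1] - coord[i][1]
        ex = disp[j][0] - disp[i][0]    # relative displacement
        ey = disp[j][1] - disp[i][1]
        length0 = math.hypot(zx, zy)
        length = math.hypot(zx + ex, zy + ey)
        # Assumed criterion: relative elongation above s_crit breaks the bond.
        if fail[t] == 1 and (length - length0) / length0 > s_crit:
            fail[t] = 0
        total_vol += vol[j]
        broken_vol += (1 - fail[t]) * vol[j]
    # Volume of neighbors with broken bonds over the whole neighborhood volume.
    return broken_vol / total_vol

# Three particles on a line; the last one has moved 0.5 to the right.
coord = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
disp = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.0)]
vol = [1.0, 1.0, 1.0]
MN = 2
NN = [1, 2, 1]                          # neighbor counts
NL = [1, -1, 0, 2, 1, -1]               # neighbor indices, -1 marks unused slots
fail = [1] * (3 * MN)                   # 1 = intact, 0 = broken
damage = [update_failure_and_damage(i, MN, NN, NL, coord, disp, vol, fail, 0.1)
          for i in range(3)]
```

In this toy case only the bond between the second and third particles is stretched, so the middle particle loses one of its two bonds and the end particle loses its only bond.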
Preferably, in S1, the structure PD_Parameter also includes the total number of material points N, the maximum number of neighbor points MN, the number of integration material points N_Int, the particle spacing Δ, the horizon δ, and the numbers of material points in the x, y, and z directions.
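As a rough illustration of such a parameter structure, the Python sketch below collects the listed fields in a frozen dataclass. The field names `dx`, `horizon`, `nx`, `ny`, `nz`, the helper `make_parameters`, and the default values (horizon of three particle spacings, `MN=32`) are hypothetical; freezing the dataclass only loosely mirrors the read-only nature of constant memory on the device.

```python
from dataclasses import dataclass

@dataclass(frozen=True)        # read-only after creation, like constant memory
class PDParameter:
    N: int                     # total number of material points
    MN: int                    # maximum number of neighbor points per particle
    N_Int: int                 # N_Int as named in the text
    dx: float                  # particle spacing (Δ in the text)
    horizon: float             # horizon radius (δ in the text)
    nx: int                    # material points in the x direction
    ny: int                    # material points in the y direction
    nz: int                    # material points in the z direction

def make_parameters(nx, ny, nz, dx, horizon_factor=3.0, MN=32, N_Int=0):
    """Fill the structure on the host; the default values are assumptions."""
    return PDParameter(N=nx * ny * nz, MN=MN, N_Int=N_Int, dx=dx,
                       horizon=horizon_factor * dx, nx=nx, ny=ny, nz=nz)

params = make_parameters(nx=4, ny=3, nz=1, dx=0.01)
```

On a real GPU the filled structure would be copied once to the device and then read by every thread without modification.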
Preferably, in S2, the particle-related arrays include displacement, velocity, acceleration, coordinates, and peridynamic force; the bond-related arrays include the bond correction factor, the bond-failure array, the volume correction factor, and the influence function.
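A minimal Python sketch of this layout, under stated assumptions, is given below: particle arrays receive N entries and bond arrays receive N×MN entries, with the MN slots of particle i stored contiguously from index `i * MN`. Placing `NN` in `atom` and `NL` in `bond`, the dictionary representation, the scalar placeholders, and the brute-force neighbor search are choices made for this illustration and are not taken from the patent.

```python
import math

def allocate(N, MN):
    """Particle arrays get N entries; bond arrays get N * MN entries."""
    atom = {name: [0.0] * N for name in ("disp", "vel", "acc", "coord", "force")}
    atom["NN"] = [0] * N                # number of neighbors of each particle
    bond = {
        "NL": [-1] * (N * MN),          # neighbor index per slot, -1 = unused
        "fail": [1] * (N * MN),         # 1 = intact, 0 = broken
        "scr": [1.0] * (N * MN),        # bond correction factor
        "fac": [0.0] * (N * MN),        # volume correction factor
        "omega": [0.0] * (N * MN),      # influence function
    }
    return atom, bond

def find_neighbors(coord, horizon, MN, atom, bond):
    """Brute-force search; on a GPU each particle i would have its own thread."""
    for i in range(len(coord)):
        for j in range(len(coord)):
            if j != i and math.dist(coord[i], coord[j]) <= horizon:
                if atom["NN"][i] == MN:
                    raise ValueError("MN is too small for this horizon")
                bond["NL"][i * MN + atom["NN"][i]] = j
                atom["NN"][i] += 1

nx = ny = 5
MN = 8
coord = [(float(ix), float(iy)) for iy in range(ny) for ix in range(nx)]
atom, bond = allocate(len(coord), MN)
atom["coord"] = coord
find_neighbors(coord, 1.5, MN, atom, bond)
```

Padding every particle to MN slots wastes some memory near boundaries, but it lets any bond be addressed from a flat index without an offset table.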
Preferably, in S7, the bond length and the volume correction factor of each bond are calculated independently of the others, so N×MN threads are launched and the calculation for each bond is mapped to its own thread; this parallel scheme is called bond-mapping parallelism.
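The bond mapping can be sketched on the CPU as a function that plays the role of one thread: a flat thread index `t` is decomposed into a particle index and a slot index. The volume correction formula is not reproduced in this text, so the piecewise-linear form used below (full volume well inside the horizon, linear taper across a band of one particle spacing around it) is an assumption borrowed from common peridynamics practice, and the names `bond_thread`, `dx`, and `horizon` are hypothetical.

```python
import math

def bond_thread(t, MN, NN, NL, coord, dx, horizon, idist, fac):
    """Work of one 'thread' in bond mapping: t -> (particle i, slot k)."""
    i, k = divmod(t, MN)
    if k >= NN[i]:
        return                          # padding slot, no bond stored here
    d = math.dist(coord[i], coord[NL[t]])
    idist[t] = d                        # initial bond length
    r = dx / 2.0
    if d <= horizon - r:                # neighbor cell fully inside the horizon
        fac[t] = 1.0
    elif d <= horizon + r:              # neighbor cell cut by the horizon
        fac[t] = (horizon + r - d) / (2.0 * r)
    else:
        fac[t] = 0.0

dx, horizon, MN = 1.0, 3.0, 6
coord = [(float(i), 0.0) for i in range(7)]
N = len(coord)
NN, NL = [0] * N, [-1] * (N * MN)
for i in range(N):                      # simple neighbor search for the demo
    for j in range(N):
        if j != i and math.dist(coord[i], coord[j]) <= horizon:
            NL[i * MN + NN[i]] = j
            NN[i] += 1
idist, fac = [0.0] * (N * MN), [0.0] * (N * MN)
for t in range(N * MN):                 # stands in for launching N * MN threads
    bond_thread(t, MN, NN, NL, coord, dx, horizon, idist, fac)
```

Because no thread writes to a slot owned by another thread, the loop over `t` can run in any order, which is what makes the scheme suitable for a GPU.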
Preferably, in S11, after the strain energy of each bond has been calculated, the bond strain energies must be summed onto the corresponding particle; registers and shared memory, both of which have high read/write efficiency, are used in combination for this summation.
Preferably, the specific usage is as shown in the following pseudo code:
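The pseudocode itself is not reproduced in this text. As a stand-in, the Python sketch below emulates one widely used pattern for such a summation: one thread block per particle, each thread loading its bond value into a local variable (the register), writing it to a block-shared buffer, and a tree reduction halving the active range at each step. Whether the patent uses exactly this pattern is an assumption, as are the function name and the requirement that MN be a power of two.

```python
def block_reduce_sum(bond_values, fail, i, MN):
    """Emulate one thread block summing the MN bond slots of particle i."""
    assert MN > 0 and MN & (MN - 1) == 0, "this sketch assumes MN is a power of two"
    shared = [0.0] * MN                 # stands in for a shared-memory buffer
    for tid in range(MN):               # each thread: global load -> register -> shared
        reg = bond_values[i * MN + tid] * fail[i * MN + tid]
        shared[tid] = reg
    stride = MN // 2
    while stride > 0:                   # each pass is one barrier-separated step
        for tid in range(stride):       # only the first 'stride' threads are active
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]                    # thread 0 would write this to the particle array

MN = 8
values = [float(v) for v in range(1, 9)] + [0.5] * 8   # bond data of two particles
fail = [1] * 8 + [1, 1, 0, 1, 1, 0, 1, 1]              # two broken bonds on particle 1
sums = [block_reduce_sum(values, fail, i, MN) for i in range(2)]
```

On a GPU a barrier would separate the passes of the `while` loop, and broken or unused slots contribute zero because of the multiplication by the failure flag.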
Compared with the prior art, the invention has the following beneficial technical effects:
The invention provides a GPU parallel implementation method for peridynamics problems that runs on a GPU device, performs its calculations in registers and shared memory, which have high read/write efficiency, and stores constant parameters in constant memory, which has a broadcast mechanism. It thereby achieves a far greater speed-up than existing methods and solves the problem that existing methods cannot make full use of the GPU device and of the parallel scheme, which limits their gains in computational efficiency.
Drawings
FIG. 1 is a schematic diagram of a GPU parallel scheme of a prior art method;
FIG. 2 is a schematic drawing of a uniaxially stretched flat sheet;
FIG. 3 is a schematic illustration of the centerline displacement of a uniaxially stretched flat panel;
FIG. 4 is a schematic diagram of a crack-containing slab subjected to velocity boundary conditions.
Detailed Description
The GPU parallel implementation method for peridynamics problems provided by the invention comprises the following steps:
S1, creating a structure PD_Parameter holding the geometric, material, and other parameters of the model; declaring a global PD_Parameter variable on the host side (CPU) and a constant-memory variable on the device side (GPU); the structure PD_Parameter also includes the constant parameters used in subsequent steps, such as the total number of material points N, the maximum number of neighbor points MN, the number of integration material points N_Int, the particle spacing Δ, the horizon δ, and the numbers of material points in the x, y, and z directions;
S2, creating the structures Atom and Bond, which hold the particle-related arrays and the bond-related arrays, respectively; the particle-related arrays include displacement, velocity, acceleration, coordinates, peridynamic force, and so on; the bond-related arrays include the bond correction factor, the bond-failure array, the volume correction factor, the influence function, and so on;
S3, assigning values to the host-side PD_Parameter and copying them to the constant-memory variable on the device side;
S4, allocating device memory for the arrays of the structures Atom and Bond, wherein each particle-related array is allocated N times the byte size of a single element, each bond-related array is allocated the maximum possible size, N×MN times the byte size of a single element, and each contiguous block of MN entries stores the attribute data of all bonds within the horizon of one particle;
S5, as in the existing method (FIG. 1), mapping the coordinate calculation for each material point to a GPU thread, but launching as many threads as there are material points so that each point is mapped to its own dedicated thread; unlike the existing method, several points are never mapped to the same thread; this parallel scheme is called point-mapping parallelism;
S6, generating the interactions within the neighborhood of each particle and determining the neighbor-count array NN and the neighbor-index array NL of the particles, parallelized by point mapping;
S7, because the volume of a particle at the boundary of a neighborhood may not lie entirely within the neighborhood, correcting the particle volumes, and determining the initial bond-length array idist and the volume-correction-factor array fac in the process; the bond length and the volume correction factor of each bond are calculated independently of the others, so N×MN threads are launched and the calculation for each bond is mapped to its own thread; this parallel scheme is called bond-mapping parallelism;
The volume correction is calculated as follows:
S8, applying displacement and boundary conditions to the model by constraining the displacements and velocities of particles, wherein the calculation for each particle is independent of the others;
S9, initializing every entry of the bond-failure array to 1 and then, as the model requires, setting to 0 the entries corresponding to bonds crossed by a crack, which marks those bonds as broken; cracks are introduced into the model in this way;
S10, determining whether bond-based peridynamics or ordinary state-based peridynamics is used; bond-based peridynamics performs steps S11-S12; ordinary state-based peridynamics performs steps S13-S15;
S11, determining the bond-correction-factor array scr by the strain-energy directional correction method, wherein the strain energy of each bond is calculated independently; after the strain energy of each bond has been calculated, the bond strain energies must be summed onto the corresponding particle, and registers and shared memory, both of which have high read/write efficiency, are used in combination for this summation;
the specific usage is shown in the following pseudo code:
S12, using bond-mapping parallelism, and only for bonds that are not yet broken, calculating the bond elongation:
Wherein ζ represents the relative position vector of the two particles before deformation, and η represents their relative displacement vector after deformation; setting to 0 the bond-failure entry of every bond whose elongation exceeds the critical value, which marks the bond as broken; calculating the peridynamic force on each particle using registers and shared memory in combination, the peridynamic force being given by:
Wherein c represents the bond constant, M represents the bond direction, and H_x represents the family of the particle, namely all neighbor points within its horizon;
S13, using bond-mapping parallelism with registers and shared memory in combination, calculating the influence-function array omega and the weighted-volume array m, as follows:
S14, using bond-mapping parallelism with registers and shared memory in combination, and only for bonds that are not broken, calculating the volumetric strain of each particle, as follows:
In the formula, e = |ζ+η| - |ζ|;
S15, using bond-mapping parallelism, and only for bonds that are not yet broken, calculating the bond elongation to determine whether the bond breaks, and calculating the peridynamic force on each particle using registers and shared memory in combination; when the model is uniformly discretized, the peridynamic force is given by:
wherein k and G represent the bulk modulus and the shear modulus, respectively, and the subscripts i and j denote particles i and j;
S16, determining whether the problem is quasi-static or transient; a quasi-static problem is solved with the adaptive dynamic relaxation method, and a transient problem updates the particle velocities and displacements by explicit central-difference integration;
The adaptive dynamic relaxation method is calculated as follows:
Wherein n represents the iteration number, D is the fictitious diagonal density matrix, C is the damping coefficient, and F is the sum of the peridynamic force and the external force acting on the particle;
The explicit central-difference integration is calculated as follows:
wherein a, v and u are the acceleration, velocity and displacement of the material point, respectively; t represents the current time, and Δt represents the time step;
S17, determining whether the iteration has finished; if so, executing S18; if not, returning to S12 for bond-based peridynamics or to S14 for state-based peridynamics;
S18, using bond-mapping parallelism with registers and shared memory in combination, calculating the damage index of each particle, which is the ratio of the volume of the neighbor points whose bonds are broken to the volume of the whole neighborhood, as follows:
S19, transferring the required results, such as displacement and damage index, from the device side to the host side and writing them to a file for storage;
S20, freeing the memory and ending the calculation.
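The explicit update used for transient problems in S16 can be sketched as follows. The integration formula is not reproduced in this text, so the sketch assumes the staggered (leapfrog) form of the central-difference scheme, in which the stored velocity lives at half time steps; the function name, the scalar degrees of freedom, and the uniform density `rho` are illustrative assumptions.

```python
def explicit_step(u, v, force, rho, dt):
    """Advance all particles by one time step, in place.

    force[i] is the total force density on particle i (peridynamic plus external).
    Each particle is independent, so on a GPU this loop becomes one thread per particle.
    """
    for i in range(len(u)):
        a = force[i] / rho              # acceleration at the current time
        v[i] += a * dt                  # velocity at the next half step
        u[i] += v[i] * dt               # displacement at the next full step

u, v = [0.0], [0.0]
history = []
for _ in range(2):                      # constant force, two steps
    explicit_step(u, v, force=[4.0], rho=2.0, dt=0.5)
    history.append((v[0], u[0]))
```

In a full solver the force array would be recomputed from the current displacements before every call, which is the loop back to S12 or S14 described in S17.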
Building on the above embodiment, the benefits of the GPU parallel implementation method for peridynamics problems are demonstrated with the following two examples.
The first example is a flat-plate tensile test, shown in FIG. 2: the plate has length L = 1 m and width W = 0.5 m, and a tensile load equal to one thousandth of the elastic modulus of the material is applied at the left and right ends. This is a two-dimensional quasi-static problem, solved with bond-based peridynamics. The acceleration achieved by a parallel algorithm is usually measured by the speed-up ratio, i.e. the ratio of the running time of the serial algorithm to that of the parallel algorithm. The problem size was varied by changing the particle spacing of the model, and the resulting speed-up over the serial algorithm is shown in Table 1.
Table 1. Computation time of the parallel and serial methods for the flat-plate tensile test
The speed-up ratio grew with the problem size, rising from 50.5 to 442.9 as the problem size increased from 50,000 to 500,000. When the problem size was increased to 2 million, the serial method ran for so long that it frequently crashed, whereas the parallel method still produced results quickly. The result computed by the parallel method is compared with the analytical solution in FIG. 3; the two agree very well, which indicates that the parallel method is highly accurate.
The second example is a rapid-loading test on a pre-cracked flat plate, shown in FIG. 4: the plate has length L = 0.05 m and width W = 0.05 m, the crack length is 2a = 0.01 m, fictitious boundary layers are applied at the upper and lower ends, and a rapid tensile load of 20 m/s is applied. This is a two-dimensional transient problem, solved with ordinary state-based peridynamics; the speed-up is shown in Table 2. A very high speed-up ratio is obtained with this solver type as well, reaching 207.4 at a problem size of 1.006 million, far above that of existing methods. The serial and parallel methods give essentially the same crack-growth results, which indicates that the parallel method is accurate. Compared with a two-dimensional problem, a three-dimensional problem differs only in problem size, so the method is also applicable to three-dimensional problems.
Table 2. Computation time of the parallel and serial methods for the rapid-loading test on the pre-cracked flat plate
In summary, the GPU parallel method for peridynamic simulation provided by the application is generally applicable and can greatly improve computational efficiency.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited thereto, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (4)

1. A GPU parallel implementation method for peridynamics problems, characterized by comprising the following steps:
S1, creating a structure PD_Parameter holding the geometric, material, and other parameters of the model; declaring a global PD_Parameter variable on the host side (CPU) and a constant-memory variable on the device side (GPU);
the structure PD_Parameter also includes the total number of material points N, the maximum number of neighbor points MN, the number of integration material points N_Int, the particle spacing Δ, the horizon δ, and the numbers of material points in the x, y, and z directions;
S2, creating the structures Atom and Bond, which hold the particle-related arrays and the bond-related arrays, respectively;
S3, assigning values to the host-side PD_Parameter and copying them to the constant-memory variable on the device side;
S4, allocating device memory for the arrays of the structures Atom and Bond, wherein each particle-related array is allocated N times the byte size of a single element, each bond-related array is allocated the maximum possible size, N×MN times the byte size of a single element, and each contiguous block of MN entries stores the attribute data of all bonds within the horizon of one particle;
S5, mapping the coordinate calculation for each material point to a GPU thread, launching as many threads as there are material points so that each point is mapped to its own dedicated thread;
S6, generating the interactions within the neighborhood of each particle and determining the neighbor-count array NN and the neighbor-index array NL of the particles, parallelized by point mapping;
S7, correcting the particle volumes, and determining the initial bond-length array idist and the volume-correction-factor array fac in the process; the volume correction is calculated as follows:
S8, applying displacement and boundary conditions to the model by constraining the displacements and velocities of particles, wherein the calculation for each particle is independent of the others;
S9, initializing every entry of the bond-failure array to 1 and then, as the model requires, setting to 0 the entries corresponding to bonds crossed by a crack, which marks those bonds as broken; cracks are introduced into the model in this way;
S10, determining whether bond-based peridynamics or ordinary state-based peridynamics is used; bond-based peridynamics performs steps S11-S12; ordinary state-based peridynamics performs steps S13-S15;
S11, determining the bond-correction-factor array scr by the strain-energy directional correction method, wherein the strain energy of each bond is calculated independently;
S12, using bond-mapping parallelism, and only for bonds that are not yet broken, calculating the bond elongation:
Wherein ζ represents the relative position vector of the two particles before deformation, and η represents their relative displacement vector after deformation; setting to 0 the bond-failure entry of every bond whose elongation exceeds the critical value, which marks the bond as broken; calculating the peridynamic force on each particle using registers and shared memory in combination, the peridynamic force being given by:
Wherein c represents the bond constant, M represents the bond direction, and H_x represents the family of the particle, namely all neighbor points within its horizon;
S13, using bond-mapping parallelism with registers and shared memory in combination, calculating the influence-function array omega and the weighted-volume array m, as follows:
S14, using bond-mapping parallelism with registers and shared memory in combination, and only for bonds that are not broken, calculating the volumetric strain of each particle, as follows:
In the formula, e = |ζ+η| - |ζ|;
S15, using bond-mapping parallelism, and only for bonds that are not yet broken, calculating the bond elongation to determine whether the bond breaks, and calculating the peridynamic force on each particle using registers and shared memory in combination; when the model is uniformly discretized, the peridynamic force is given by:
wherein k and G represent the bulk modulus and the shear modulus, respectively, and the subscripts i and j denote particles i and j;
S16, judging whether the problem is a quasi-static problem or a transient problem; for a quasi-static problem, the adaptive dynamic relaxation method is used; for a transient problem, the velocity and displacement of the particles are updated by an explicit integration method based on central differences;
the calculation formula of the adaptive dynamic relaxation method is:
wherein n represents the iteration number, D is the fictitious diagonal density matrix, C is the damping coefficient, and F is the sum of the near-field force and the external force acting on the particle;
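The relaxation formula itself is missing from this text. The sketch below follows a commonly used form of adaptive dynamic relaxation, which is an assumption here: the half-step velocity is updated as ((2 - C·Δt)·v + 2·Δt·F/D)/(2 + C·Δt), the first step uses v = Δt·F/(2D), and the damping coefficient C is re-estimated in every iteration from the change in force. `force_fn` and `density` (the diagonal of D) are hypothetical names.

```python
import math

def adr_solve(force_fn, density, dt=1.0, steps=200):
    """Drive F(u) to zero with adaptive dynamic relaxation (assumed form)."""
    n_dof = len(density)
    u = [0.0] * n_dof
    f_old = force_fn(u)
    # First iteration: start from rest with half a step of acceleration.
    v_half = [dt * f / (2.0 * d) for f, d in zip(f_old, density)]
    u = [a + dt * b for a, b in zip(u, v_half)]
    for _ in range(1, steps):
        f = force_fn(u)
        num = 0.0
        den = 0.0
        for i in range(n_dof):
            if v_half[i] != 0.0:
                # Local stiffness estimate from the change in force.
                k_loc = -(f[i] - f_old[i]) / (density[i] * dt * v_half[i])
                num += u[i] * u[i] * k_loc
            den += u[i] * u[i]
        c = 2.0 * math.sqrt(num / den) if den > 0.0 and num > 0.0 else 0.0
        v_half = [((2.0 - c * dt) * v + 2.0 * dt * fi / d) / (2.0 + c * dt)
                  for v, fi, d in zip(v_half, f, density)]
        u = [a + dt * b for a, b in zip(u, v_half)]
        f_old = f
    return u
```

For example, a single spring with F(u) = 1 - 4u and fictitious density 4 relaxes toward the static solution u = 0.25.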
the calculation formula of the explicit integration based on central differences is:
wherein a, v and u are the acceleration, velocity and displacement of the particles, respectively; t represents the current time, and Δt represents the time step;
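The integration formula is not reproduced in this text. One common explicit update that is consistent with the symbols a, v, u, t and Δt, and that is assumed here, is a = F/ρ, then v(t+Δt) = v(t) + a·Δt, then u(t+Δt) = u(t) + v(t+Δt)·Δt. The function name and the scalar density `rho` are hypothetical.

```python
def explicit_step(u, v, force, rho, dt):
    """Advance displacement and velocity by one time step (assumed update)."""
    a = [f / rho for f in force]                      # acceleration at time t
    v_new = [vi + ai * dt for vi, ai in zip(v, a)]    # velocity update
    u_new = [ui + vi * dt for ui, vi in zip(u, v_new)]  # displacement update
    return u_new, v_new, a

# Usage: two steps under a constant force, starting from rest.
u, v = [0.0], [0.0]
for _ in range(2):
    u, v, a = explicit_step(u, v, force=[2.0], rho=1.0, dt=0.1)
```

Every particle is updated independently of the others, which is why this step parallelizes with one thread per particle.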
S17, judging whether the iteration is finished; if so, executing S18; if not, returning to S12 for the bond-based formulation or to S14 for the state-based formulation, according to the type of near field dynamics;
S18, calculating the damage index of each particle using bond-mapping parallelism combined with registers and shared memory, wherein the damage index is the ratio of the volume of the neighbor points whose bonds are broken to the volume of the whole neighborhood, the calculation formula being:
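As a minimal sketch of the ratio described in S18, the damage index of a particle can be computed as the summed volume of neighbors with broken bonds divided by the summed volume of all neighbors. The function name and the per-bond lists `fail` (1 intact, 0 broken) and `nbr_vol` are hypothetical.

```python
def damage_index(fail, nbr_vol):
    """Return the damage index of each particle, between 0 and 1."""
    phi = []
    for mu_row, vol_row in zip(fail, nbr_vol):
        total = sum(vol_row)
        broken = sum(v for mu, v in zip(mu_row, vol_row) if mu == 0)
        phi.append(broken / total if total > 0.0 else 0.0)
    return phi
```

A value of 0 means that no bond of the particle is broken, and a value of 1 means that all of them are.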
S19, transferring the required calculation results from the device to the host and writing them to a file for storage;
S20, releasing the memory and ending the calculation.
2. The method of claim 1, wherein in S2, the particle-related arrays comprise displacement, velocity, acceleration, coordinate and near-field force arrays; the bond-related arrays comprise the bond correction coefficient, bond failure, volume correction coefficient and influence function arrays.
3. The method according to claim 1, wherein in S7, the calculations of the bond length and the volume correction coefficient of each bond are independent of one another, so n×mn threads are started and the calculation for each bond is mapped to its own thread, which is called bond-mapping parallelism.
4. The method according to claim 1, wherein in S11, after the strain energy of each bond is calculated, the bond strain energies are summed onto the corresponding particles, and the summation uses registers, which have high read-write efficiency, in combination with shared memory.
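The two ideas in claims 3 and 4 can be sketched serially as follows. The sketch assumes that n is the number of particles and mn the padded number of neighbor slots per particle (this excerpt does not define either), and that mn is a power of two. A flat thread index is mapped to one bond, and the per-bond values of a particle are then summed with the halving pattern typical of a shared-memory block reduction. All function names are hypothetical.

```python
def bond_of_thread(tid, mn):
    """Map a flat thread index to (particle index, neighbor slot)."""
    return divmod(tid, mn)

def tree_reduce(values):
    """Sum a power-of-two-length buffer with a halving stride."""
    buf = list(values)                 # stands in for the shared-memory buffer
    stride = len(buf) // 2
    while stride > 0:
        for t in range(stride):        # threads with t < stride are active
            buf[t] += buf[t + stride]
        stride //= 2                   # a GPU kernel would synchronize here
    return buf[0]

def sum_bonds_to_particles(bond_values, n, mn):
    """Sum per-bond values onto particles; unused slots must hold 0.0."""
    totals = []
    for i in range(n):                 # one thread block per particle (assumed)
        row = [bond_values[i * mn + k] for k in range(mn)]
        totals.append(tree_reduce(row))
    return totals
```

In a real kernel each thread would first hold its bond value in a register, write it to shared memory once, and only the final sum would be written back to global memory.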
CN202210047282.5A 2022-01-17 2022-01-17 GPU parallel implementation method for near field dynamics problem Active CN114580144B (en)

Publications (2)

Publication Number Publication Date
CN114580144A (en) 2022-06-03
CN114580144B (en) 2024-05-17

Family

ID=81772921

Country Status (1)

Country Link
CN (1) CN114580144B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9432298B1 (en) * 2011-12-09 2016-08-30 P4tents1, LLC System, method, and computer program product for improving memory systems
CN107247973A (en) * 2017-06-29 2017-10-13 中国矿业大学 A kind of preferred Parallel Particle Swarm Optimization optimization method of SVMs parameter based on spark
WO2018119153A2 (en) * 2016-12-21 2018-06-28 Intel Corporation Wireless communication technology, apparatuses, and methods
EP3506108A1 (en) * 2017-12-30 2019-07-03 Intel Corporation Compression in machine learning and deep learning processing
CN111339594A (en) * 2020-02-26 2020-06-26 河北工业大学 DIC technology-based near-field dynamics parameter experiment inversion system and use method
CN111523778A (en) * 2020-04-10 2020-08-11 三峡大学 Power grid operation safety assessment method based on particle swarm algorithm and gradient lifting tree
CN113761760A (en) * 2021-07-21 2021-12-07 山东大学 PD-FEM numerical calculation method and system for engineering scale rock mass fracture overall process simulation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11720408B2 (en) * 2018-05-08 2023-08-08 Vmware, Inc. Method and system for assigning a virtual machine in virtual GPU enabled systems

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
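A GPU parallel scheme for accelerating 2D and 3D peridynamics models; Xiaoming Wang et al.; Theoretical and Applied Fracture Mechanics; 2022-10-30; pp. 1-15 *
Peridynamic numerical simulation of the three-point bending test of ice; 薛彦卓, 陆锡奎, 王庆, 白晓龙, 李志军; Journal of Harbin Engineering University; 2018-02-05 (No. 04); pp. 5-11 *
GPU-based parallelization method for peridynamic simulation; 刘肃肃, 胡乐, 余音; Journal of Shanghai Jiao Tong University; 2016-09-28 (No. 09); pp. 32-37, 45 *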

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant