CN113467945A - Sensitivity parallelism and GPU acceleration method based on meshless topology optimization - Google Patents


Info

Publication number: CN113467945A (application CN202110736591.9A); granted as CN113467945B
Authority: CN (China)
Prior art keywords: node, value, integration point, thread
Legal status: Granted; Active
Inventors: 卢海山, 龚曙光, 谢桂兰, 张建平, 尹硕辉
Original and current assignee: Xiangtan University
Application filed by Xiangtan University
Other languages: Chinese (zh)

Classifications

    • G06F9/5016: Allocation of resources, the resource being the memory
    • G06F9/5027: Allocation of resources, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/544: Interprogram communication using buffers, shared memory or pipes
    • G06T1/20: Processor architectures; processor configuration, e.g. pipelining
    • G06T1/60: Memory management
    • G06F2209/5018: Indexing scheme, thread allocation
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a sensitivity parallelization and GPU acceleration method based on meshless topology optimization. The method comprises: equivalently transforming the calculation formula for the sensitivity of the objective function in a meshless structural topology optimization model into a functional form of a part Q, associated with the global characteristic matrix and degree-of-freedom vector of the structure, and a residual part R; traversing all integration points as coarse-grained parallel computing units, computing the Q value at each integration point in parallel, and temporarily storing it in a corresponding storage unit; traversing all nodes as coarse-grained parallel computing units, extracting the Q values of all integration points within each node's influence domain, computing the objective-function sensitivity value S at each node in parallel, and storing it in a corresponding storage unit; and releasing all storage units that temporarily hold Q values. The invention has low hardware cost and strong generality, effectively improves the efficiency of objective-function sensitivity calculation, and greatly reduces the time consumed by the topology optimization process.

Description

Sensitivity parallelism and GPU acceleration method based on meshless topology optimization
Technical Field
The invention belongs to the technical field of simulation in computer-aided engineering, and in particular relates to parallel analysis of objective-function sensitivity in meshless structural topology optimization and a GPU (Graphics Processing Unit) acceleration method therefor.
Background
In recent decades, meshless methods have developed rapidly in the field of computational simulation. Unlike the traditional finite element method, which interpolates field variables over elements, a meshless method approximates field variables from discrete node information alone and needs no mesh connectivity between nodes, fundamentally avoiding numerical difficulties such as mesh distortion. Various meshless methods have been developed, such as smoothed particle hydrodynamics (SPH), the element-free Galerkin method (EFGM), the reproducing kernel particle method (RKPM), and the material point method (MPM). Meshless methods offer high accuracy and fast convergence, and are widely applied to moving-boundary problems (such as dynamic crack propagation) and large-deformation problems (such as metal plastic forming).
In recent years, meshless methods have been introduced into the field of structural topology optimization. Topology optimization models built on nodal density variables have been very successful in suppressing numerical instabilities such as checkerboard patterns and mesh dependency, and the inherent advantages of meshless methods also carry over to the topology optimization of large-deformation structures.
When solving a meshless topology optimization model, a gradient-based method is generally adopted to improve the convergence speed, which requires the sensitivity of the objective function with respect to the design variables. In this sensitivity analysis, the influence domain of each node in the meshless method contains a large number of integration points, so computing the objective-function sensitivity values is very time-consuming, especially in large-scale three-dimensional topology optimization, which severely limits the application of meshless methods to large-scale or three-dimensional problems.
With the rapid development of computer technology, parallel computing has become an effective means of tackling large-scale, time-consuming problems. In particular, NVIDIA's release of the Compute Unified Device Architecture (CUDA) in 2006 opened the era of general-purpose GPU computing, and parallel computing has since succeeded in many scientific and engineering fields. However, existing parallel computing, especially on GPUs, places strict requirements on the parallelizability of an algorithm; otherwise the computing time may not decrease, and the results may even be wrong.
In objective-function sensitivity calculation for meshless topology optimization, directly parallelizing the traditional loop over integration points causes a data race: each integration point's definition domain contains several nodes, while the sensitivity values correspond one-to-one with nodes, so different parallel threads write to the same storage unit, producing unpredictable results. Atomic operations avoid this problem, but severely reduce parallel efficiency.
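The restructuring that removes this race can be illustrated outside the GPU context. In the Python sketch below (connectivity and contribution values are hypothetical, not the patent's formulas), the scatter pattern of the traditional integration-point loop has several work items updating the same node slot, while the gather pattern, looping over nodes, gives every output slot exactly one owner:

```python
import random

random.seed(0)
n_nodes, n_points = 6, 10
# hypothetical connectivity: each integration point's domain covers 3 nodes
nodes_of = [random.sample(range(n_nodes), 3) for _ in range(n_points)]
contrib = [random.random() for _ in range(n_points)]  # per-point contribution

# Scatter: loop over integration points, accumulate into node slots.
# Run in parallel, two threads handling different points may update the
# same S[node] slot concurrently -> data race unless atomics are used.
S_scatter = [0.0] * n_nodes
for g, nodes in enumerate(nodes_of):
    for I in nodes:
        S_scatter[I] += contrib[g]

# Gather: loop over nodes; each node sums only the points that influence
# it, so every parallel unit writes a slot nobody else touches.
points_of = [[g for g, nodes in enumerate(nodes_of) if I in nodes]
             for I in range(n_nodes)]
S_gather = [sum(contrib[g] for g in points_of[I]) for I in range(n_nodes)]

assert all(abs(a - b) < 1e-12 for a, b in zip(S_scatter, S_gather))
```

Both patterns compute the same sums; only the ownership of the output slots differs, which is what makes the gather form safe to parallelize without atomics.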
Disclosure of Invention
The aim of the invention is to provide a sensitivity parallelization and GPU acceleration method for meshless topology optimization, addressing the long runtime of sensitivity calculation and the lack of parallel characteristics in the traditional analysis method, thereby effectively reducing the time consumed by sensitivity calculation in meshless topology optimization.
The sensitivity parallelization and GPU acceleration method based on meshless topology optimization of the invention comprises the following steps, in order:
(1) equivalently transforming the calculation formula for the sensitivity of the objective function in the meshless structural topology optimization model into a functional form of a part Q, associated with the global characteristic matrix and degree-of-freedom vector of the structure, and a residual part R, as shown in formula (1):
S=S(Q,R) (1);
wherein Q corresponds one-to-one with integration points, R corresponds one-to-one with nodes, and S is the objective-function sensitivity value;
(2) traversing all integration points as coarse-grained parallel computing units, computing the Q value at each integration point in parallel, and temporarily storing it in a corresponding storage unit;
(3) traversing all nodes as coarse-grained parallel computing units, extracting the Q values of all integration points within each node's influence domain, computing the objective-function sensitivity value S at each node in parallel, and storing it in a corresponding storage unit;
(4) releasing all storage units that temporarily hold Q values.
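The four steps above can be sketched serially in Python; the combination S_g = Q_g * R_g below is a hypothetical stand-in for the model-specific functional S(Q, R), and the connectivity data is a toy example:

```python
def pass1_q_values(point_domains, pair_contrib):
    """Step (2): for each integration point g, sum the contributions of
    all node pairs (I, J) in its definition domain (I == J included)."""
    return {g: sum(pair_contrib(g, I, J)
                   for I in nodes for J in nodes if I <= J)
            for g, nodes in point_domains.items()}

def pass2_sensitivities(node_domains, Q, r_contrib):
    """Step (3): for each node I, gather Q_g over the integration points
    in its influence domain, combine with R_g, and sum the per-point
    terms. The Q storage can then be released (step (4))."""
    return {I: sum(Q[g] * r_contrib(I, g) for g in points)
            for I, points in node_domains.items()}

# toy data: 2 integration points, 3 nodes
point_domains = {0: [0, 1], 1: [1, 2]}      # nodes in each point's domain
node_domains = {0: [0], 1: [0, 1], 2: [1]}  # points in each node's domain
Q = pass1_q_values(point_domains, lambda g, I, J: 1.0)
S = pass2_sensitivities(node_domains, Q, lambda I, g: 0.5)
```

Each pass writes only to slots it owns (one per integration point, then one per node), which is the property that makes both passes parallelizable.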
Specifically, step (2) comprises the following sub-steps:
(a) for a given integration point in the coarse-grained parallel computation, acquiring the information of all nodes in its definition domain and forming node pairs by pairwise combination, where the pairwise combinations also include the pairing of each node with itself;
(b) taking each node pair in the integration-point definition domain of step (a) as a fine-grained parallel computing unit and allocating a corresponding shared storage unit for each node pair, accessible to all parallel threads corresponding to that integration point and initialized to zero;
(c) computing the contribution of each node pair in the integration-point definition domain to the Q value, and storing the result in the shared storage unit corresponding to that node pair;
(d) once the contributions of all node pairs in the integration-point definition domain have been computed, summing the values in all the corresponding shared storage units to obtain the Q value at the integration point, denoted Q_g, and temporarily storing it in the corresponding storage unit.
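The pairwise combinations of sub-step (a), each node's pairing with itself included, can be enumerated directly: a definition domain of n nodes yields n(n+1)/2 node pairs, hence n(n+1)/2 fine-grained parallel units per integration point. A small Python sketch with illustrative node labels:

```python
from itertools import combinations_with_replacement

def node_pairs(domain_nodes):
    """All unordered node pairs of an integration point's definition
    domain, each node's pairing with itself included."""
    return list(combinations_with_replacement(domain_nodes, 2))

pairs = node_pairs(["I", "J", "K", "L", "M"])
# 5 nodes -> 5 * 6 // 2 = 15 pairs, one shared storage unit each
assert len(pairs) == 15
assert ("I", "I") in pairs and ("I", "J") in pairs  # self-pair and cross-pair
```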
Specifically, step (3) comprises the following sub-steps:
(a) for a given node in the coarse-grained parallel computation, acquiring the information of all integration points in its influence domain, taking each such integration point as a fine-grained parallel computing unit, and allocating a corresponding shared storage unit for each integration point, accessible to all parallel threads corresponding to that node and initialized to zero;
(b) computing the contribution of each integration point in the node's influence domain to the R value, denoted R_g; extracting the Q_g value of each integration point in the influence domain; then, from the Q_g and R_g values of each integration point, computing its contribution to the objective-function sensitivity value S, denoted S_g, and storing the result in the shared storage unit corresponding to that integration point;
(c) once the contributions of all integration points in the node's influence domain have been computed, summing the values in all the corresponding shared storage units, i.e. Σ S_g, to obtain the objective-function sensitivity value S of the node, and storing it in the corresponding storage unit.
When a GPU is used to accelerate the calculation of the Q values, the method comprises the following steps:
(1) assigning the integration points one-to-one to thread blocks on the GPU, where the number of threads in each thread block must be set to a positive-integer power of 2;
(2) for the integration point corresponding to a given thread block, acquiring the information of all nodes in its definition domain and forming node pairs by pairwise combination, where the pairwise combinations also include the pairing of each node with itself;
(3) assigning each node pair in the integration-point definition domain of step (2) to a thread of the thread block corresponding to that integration point, and allocating a corresponding GPU shared-memory unit for each thread, accessible to every thread of that thread block and initialized to zero;
(4) each thread of the thread block computes the contribution of its node pair to the Q value and stores the result in its storage unit within the block's GPU shared memory;
(5) once all node pairs of the integration point have been processed by the thread block, reduction-summing the contribution values in all the GPU shared-memory units of the block's threads to obtain the Q_g value of the integration point handled by that block, and temporarily storing it in the corresponding storage unit in GPU global memory.
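The power-of-two thread-count requirement in step (1) exists because the in-block reduction of step (5) halves the range of active threads at every stride. A serial Python simulation of that shared-memory tree sum (values illustrative):

```python
def block_reduce(shared):
    """Simulate the in-block reduction: threads tid < stride each add the
    slot at tid + stride into their own, then the stride halves (on a real
    GPU a __syncthreads() barrier would separate the steps)."""
    n = len(shared)
    assert n > 0 and n & (n - 1) == 0, "thread count must be a power of 2"
    stride = n // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]  # thread 0 holds Q_g for the block's integration point

# 8 "threads", each slot holding one node pair's contribution to Q_g;
# surplus threads simply keep their initial value of zero
contribs = [1.0, 2.0, 3.0, 4.0, 0.5, 0.5, 1.0, 0.0]
Qg = block_reduce(contribs[:])
assert Qg == 12.0
```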
When a GPU is used to accelerate the calculation of the objective-function sensitivity values S, the method comprises the following steps:
(1) assigning the nodes one-to-one to thread blocks on the GPU, where the number of threads in each thread block must be set to a positive-integer power of 2;
(2) for the node corresponding to a given thread block, acquiring the information of all integration points in its influence domain;
(3) assigning each integration point in the node's influence domain of step (2) to a thread of the thread block corresponding to that node, and allocating a corresponding GPU shared-memory unit for each thread, accessible to every thread of that thread block and initialized to zero;
(4) computing the contribution of each integration point of the node to the R value, denoted R_g; each thread extracts the Q_g value of its integration point and, from the corresponding Q_g and R_g values, computes that integration point's contribution to the objective-function sensitivity value S, denoted S_g, storing the result in its storage unit within the block's GPU shared memory;
(5) once all integration points of the node have been processed, reduction-summing the contribution values in all the GPU shared-memory units of the block's threads, i.e. Σ S_g, to obtain the objective-function sensitivity value S of the node, and storing it in the corresponding storage unit in GPU global memory.
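The mapping of this node pass, one thread block per node and one thread per integration point, can be simulated serially in Python; the Q_g values and the product Q_g * r below stand in for the model-specific quantities:

```python
def gpu_like_sensitivity(node_domains, Qg, r_contrib):
    """One 'block' per node (coarse grain); inside it, one 'thread' per
    integration point fills a private shared-memory slot with its S_g
    term, and a final in-block reduction writes the node's S value to
    its global-memory slot."""
    S_global = {}
    for I, points in node_domains.items():       # blocks run independently
        shared = [Qg[g] * r_contrib(I, g) for g in points]  # threads
        S_global[I] = sum(shared)                # in-block reduction
    return S_global

Qg = {"g": 2.0, "h": 3.0, "i": 4.0}              # saved by the Q pass
node_domains = {"I": ["g", "h", "i"]}            # points influencing node I
S = gpu_like_sensitivity(node_domains, Qg, lambda I, g: 0.5)
assert S["I"] == 4.5
```

Because each block writes one distinct global-memory slot, no inter-block synchronization or atomic update is needed.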
The sensitivity parallelization and GPU acceleration method based on meshless topology optimization not only has excellent parallel characteristics, in that no data race can occur during parallelization, but is also well suited to the two-level parallel organization of thread blocks and threads on a GPU: thread blocks provide the coarse-grained parallelism, and the threads within a block provide the fine-grained parallelism. In addition, the method is easy to program, laying a foundation for applying the meshless method to large-scale or three-dimensional structural topology optimization problems.
Drawings
FIG. 1 is a schematic diagram of a parallel Q-value calculation method according to the present invention.
FIG. 2 is a schematic diagram of a parallel calculation method of the sensitivity value S in the method of the present invention.
FIG. 3 is a diagram illustrating GPU parallel computation of Q values in the method of the present invention.
FIG. 4 is a diagram illustrating GPU-parallel computation of a sensitivity value S according to the method of the present invention.
FIG. 5 is a schematic model diagram of example 1 of the method of the present invention.
Fig. 6 and 7 are graphs comparing the calculation results of example 1 of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The sensitivity parallelization and GPU acceleration method based on meshless topology optimization of the invention is implemented in the following specific steps:
(1) equivalently transforming the calculation formula for the sensitivity of the objective function in the meshless structural topology optimization model into a functional form of a part Q (corresponding one-to-one with integration points), associated with the global characteristic matrix and degree-of-freedom vector of the structure, and a residual part R (corresponding one-to-one with nodes), as in formula (1):
S=S(Q,R) (1);
wherein S is the sensitivity value of the objective function.
(2) Referring to fig. 1, traversing all integration points as coarse-grained parallel computing units, computing the Q value at each integration point in parallel, and temporarily storing it in the corresponding storage unit; the specific sub-steps are:
(a) for a given integration point in the coarse-grained parallel computation, such as integration point g in fig. 1, acquiring the information of all nodes in its definition domain and forming node pairs by pairwise combination (including the pairing of each node with itself), such as the node pairs IJ, KL and MM in fig. 1;
(b) taking each node pair in the integration-point definition domain as a fine-grained parallel computing unit and allocating a corresponding shared storage unit for each node pair (accessible to all parallel threads corresponding to that integration point), initialized to zero;
(c) computing the contribution of each node pair in the integration-point definition domain to the Q value, such as Q_IJ, Q_KL and Q_MM in fig. 1, and storing the result in the shared storage unit corresponding to that node pair;
(d) once the contributions of all node pairs in the integration-point definition domain have been computed, summing the values in all the corresponding shared storage units to obtain the Q value at the integration point, such as Q_g in fig. 1, and temporarily storing it in the corresponding storage unit.
(3) Referring to fig. 2, traversing all nodes as coarse-grained parallel computing units, extracting the Q values of all integration points within each node's influence domain, computing the objective-function sensitivity value S at each node in parallel, and storing it in the corresponding storage unit; the specific sub-steps are:
(a) for a given node in the coarse-grained parallel computation, such as node I in fig. 2, acquiring the information of all integration points in its influence domain, such as integration points g, h and i in fig. 2; taking each such integration point as a fine-grained parallel computing unit and allocating a corresponding shared storage unit for each integration point (accessible to all parallel threads corresponding to that node), initialized to zero;
(b) computing the contribution of each integration point in the node's influence domain to the R value, such as R_g, R_h and R_i in fig. 2; extracting the Q value of each integration point in the influence domain; then, from the Q and R values of each integration point, computing its contribution to the objective-function sensitivity value S, such as S_g, S_h and S_i in fig. 2, and storing the result in the shared storage unit corresponding to that integration point;
(c) once the contributions of all integration points in the node's influence domain have been computed, summing the values in all the corresponding shared storage units, i.e. S_g + S_h + S_i + …, to obtain the objective-function sensitivity value of the node, such as S_I in fig. 2, and storing it in the corresponding storage unit.
(4) Referring to fig. 1, releasing all storage units that temporarily hold Q values.
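Attaching hypothetical numbers to the fig. 2 notation (the Q and R values are placeholders, and the product Q * R stands in for the model-specific per-point term), sub-steps (b) and (c) for node I amount to:

```python
# Numeric walk-through of sub-steps (b)-(c) for node I of fig. 2, with
# placeholder values; Q * R stands in for the model-specific S_g term.
Q = {"g": 2.0, "h": 3.0, "i": 4.0}    # saved in step (2)
R = {"g": 0.1, "h": 0.2, "i": 0.3}    # per-point contributions to R at I

slots = {p: Q[p] * R[p] for p in Q}   # S_g, S_h, S_i in their shared units
S_I = sum(slots.values())             # reduction: S_g + S_h + S_i
assert abs(S_I - 2.0) < 1e-12         # 0.2 + 0.6 + 1.2
```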
(5) Referring to fig. 3, when a GPU is used to accelerate the calculation of the Q values in step (2), the specific sub-steps are:
(a) assigning the integration points one-to-one to thread blocks on the GPU (where the number of threads in each thread block must be set to a positive-integer power of 2);
(b) for the integration point corresponding to a given thread block, acquiring the information of all nodes in its definition domain and forming node pairs by pairwise combination (including the pairing of each node with itself);
(c) assigning each node pair in the integration-point definition domain to a thread of the thread block corresponding to that integration point, and allocating a corresponding GPU shared-memory unit for each thread (accessible to every thread of that thread block), initialized to zero;
(d) each thread of the thread block computes the contribution of its node pair to the Q value and stores the result in its storage unit within the block's GPU shared memory;
(e) once all node pairs of the integration point have been processed, reduction-summing the contribution values in all the GPU shared-memory units of the block's threads to obtain the Q_g value of the integration point handled by that block, and temporarily storing it in the corresponding storage unit in GPU global memory.
(6) Referring to fig. 4, when a GPU is used to accelerate the calculation of the objective-function sensitivity values S in step (3), the specific sub-steps are:
(a) assigning the nodes one-to-one to thread blocks on the GPU (where the number of threads in each thread block must be set to a positive-integer power of 2);
(b) for the node corresponding to a given thread block, acquiring the information of all integration points in its influence domain;
(c) assigning each integration point in the node's influence domain to a thread of the thread block corresponding to that node, and allocating a corresponding GPU shared-memory unit for each thread (accessible to every thread of that thread block), initialized to zero;
(d) computing the contribution of each integration point of the node to the R value, denoted R_g; each thread extracts the Q_g value of its integration point and, from the corresponding Q_g and R_g values, computes that integration point's contribution to the objective-function sensitivity value S, denoted S_g, storing the result in its storage unit within the block's GPU shared memory;
(e) once all integration points of the node have been processed, reduction-summing the contribution values in all the GPU shared-memory units of the block's threads, i.e. Σ S_g, to obtain the objective-function sensitivity value S of the node, and storing it in the corresponding storage unit in GPU global memory.
The following is a specific example of applying the method of the invention, used to test its performance.
Referring to fig. 5, example 1 is a rectangular parallelepiped model of regular shape, with length 100, height 50 and width 4.
Fig. 6 and 7 compare the final topology optimization results of the method of the invention and the traditional method, respectively. As can be seen from fig. 6 and 7, the results of the two methods match completely.
The computational performance of the method of the invention and of the traditional method was tested at different numbers of nodes. Table 1 lists the sensitivity-calculation time and the speedup ratio for example 1, where the speedup ratio is the time of the traditional method divided by the time of the method of the invention. As can be seen from Table 1, the method achieves a speedup of more than 70 times; moreover, as the number of nodes grows, the time of the traditional method increases much faster than that of the method of the invention, which grows only slightly. This shows that the method has excellent computational performance, and that its advantage becomes more pronounced as the computation scale increases.
Table 1 sensitivity calculation time-to-acceleration ratio of example 1
[Table 1 is available only as an image in the source; its data is not reproduced here.]
Although the invention has been described with reference to preferred embodiments, the above description does not limit its scope; any modifications, improvements, and the like within the spirit and principles of the invention fall within its scope of protection.

Claims (5)

1. A sensitivity parallelization and GPU acceleration method based on meshless topology optimization, characterized by comprising the following steps:
(1) equivalently transforming the calculation formula for the sensitivity of the objective function in the meshless structural topology optimization model into a functional form of a part Q, associated with the global characteristic matrix and degree-of-freedom vector of the structure, and a residual part R, as shown in formula (1):
S=S(Q,R) (1);
wherein, Q and the integral point are in one-to-one correspondence relationship; r and nodes are in one-to-one correspondence relationship; s is a sensitivity value of the objective function;
(2) taking the integration points as coarse-grained parallel computing units, traversing all integration points, computing the Q value of each integration point in parallel, and temporarily storing it in the corresponding storage unit;
(3) taking the nodes as coarse-grained parallel computing units, traversing all nodes, extracting the Q values of all integration points within each node's influence domain, computing the objective-function sensitivity value S at each node in parallel, and storing it in the corresponding storage unit;
(4) releasing all storage units that temporarily hold Q values.
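The four steps of claim 1 can be sketched as a two-phase computation. In the illustrative Python sketch below, `pair_contribution`, `node_R`, and `combine` are hypothetical stand-ins for the problem-specific meshless kernels; only the data flow of the decomposition, an integration-point phase producing Q and a node phase producing S, is shown.

```python
# Illustrative sketch of the two-phase decomposition in claim 1.
# pair_contribution, node_R and combine are hypothetical stand-ins
# for the actual meshless kernels; only the data flow is from the claim.

def compute_Q(integration_points, domain_nodes, pair_contribution):
    """Phase 1 (steps 1-2): one Q value per integration point.

    domain_nodes[g] lists the nodes in the definition domain of point g;
    node pairs include each node paired with itself (claim 2)."""
    Q = {}
    for g in integration_points:          # coarse-grained parallel unit
        nodes = domain_nodes[g]
        Q[g] = sum(pair_contribution(g, i, j)
                   for i in nodes for j in nodes)
    return Q

def compute_S(nodes, influence_points, Q, node_R, combine):
    """Phase 2 (step 3): one sensitivity value S per node, assembled
    from the stored Q values of the integration points in its
    influence domain."""
    S = {}
    for n in nodes:                       # coarse-grained parallel unit
        S[n] = sum(combine(Q[g], node_R(n, g))
                   for g in influence_points[n])
    return S
```

After phase 2, the temporary Q storage can be discarded (step 4). On the GPU, each loop body above becomes the work of one thread block, as detailed in claims 4 and 5.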
2. The sensitivity parallelism and GPU acceleration method based on meshless topology optimization according to claim 1, characterized in that step (2) specifically comprises the following steps:
(a) for a given integration point in the coarse-grained parallel computation, acquiring the information of all nodes in the definition domain of the integration point, and forming node pairs by pairwise combination, where pairwise combination also includes pairing each node with itself;
(b) taking each node pair in the integration-point definition domain of step (a) as a fine-grained parallel computing unit, and allocating a corresponding shared storage unit to each node pair, i.e. one accessible to all parallel threads corresponding to the integration point, all initialized to zero;
(c) calculating the portion each node pair in the integration-point definition domain contributes to the Q value, and storing the result in the shared storage unit corresponding to that node pair;
(d) once the contribution values of all node pairs in the integration-point definition domain have been calculated, summing the contribution values in all their shared storage units to obtain the Q value at the integration point, denoted Q_g, and temporarily storing it in the corresponding storage unit.
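A fine-grained view of claim 2 at a single integration point can be sketched as follows: one zero-initialized "shared" slot per node pair, each slot written independently (as one fine-grained thread would), then summed into Q_g. `pair_contribution` is again a hypothetical stand-in for the actual kernel.

```python
# Fine-grained sketch of claim 2 at a single integration point g.
# pair_contribution is a hypothetical stand-in for the real kernel.

def Q_at_point(g, nodes, pair_contribution):
    pairs = [(i, j) for i in nodes for j in nodes]  # step (a): includes (i, i)
    shared = [0.0] * len(pairs)                     # step (b): zeroed slots
    for t, (i, j) in enumerate(pairs):              # step (c): one write each
        shared[t] = pair_contribution(g, i, j)
    return sum(shared)                              # step (d): Q_g
```

Because each slot is written by exactly one fine-grained unit, the writes are independent and need no synchronization until the final sum.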
3. The sensitivity parallelism and GPU acceleration method based on meshless topology optimization according to claim 2, characterized in that step (3) specifically comprises the following steps:
(a) for a given node in the coarse-grained parallel computation, acquiring the information of all integration points in the influence domain of the node, taking each such integration point as a fine-grained parallel computing unit, and allocating a corresponding shared storage unit to each integration point, i.e. one accessible to all parallel threads corresponding to the node, all initialized to zero;
(b) calculating the portion each integration point in the node's influence domain contributes to the R value, denoted R_g; extracting the Q_g value of each integration point in the node's influence domain; then, using each integration point's Q_g and R_g values, calculating the portion that integration point contributes to the objective-function sensitivity value S, denoted S_g, and storing the result in the shared storage unit corresponding to that integration point;
(c) once the contribution values of all integration points in the node's influence domain have been calculated, summing the contribution values in all their shared storage units, i.e. ΣS_g, to obtain the objective-function sensitivity value S of the node, and storing it in the corresponding storage unit.
4. The sensitivity parallelism and GPU acceleration method based on meshless topology optimization according to claim 2, characterized in that the calculation of the Q value is accelerated on the GPU as follows:
(1) assigning the integration points to the thread blocks on the GPU in one-to-one correspondence, where the number of threads per thread block must be set to a positive integer power of two;
(2) for the integration point corresponding to a given thread block, acquiring the information of all nodes in its definition domain, and forming node pairs by pairwise combination, where pairwise combination also includes pairing each node with itself;
(3) assigning each node pair in the integration-point definition domain of step (2) to a thread of the thread block corresponding to the integration point, and allocating a corresponding GPU shared-memory unit to each thread, i.e. one accessible to every thread of that thread block, all initialized to zero;
(4) each thread of the thread block corresponding to the integration point calculates the portion its assigned node pair contributes to the Q value, and stores the result in its corresponding storage unit in the thread block's GPU shared memory;
(5) after all node pairs belonging to the integration point have been processed, performing a reduction sum over the contribution values in the GPU shared-memory units of all threads of the thread block, to obtain the Q_g value of the integration point processed by that thread block, and temporarily storing it in the corresponding storage unit in GPU global memory.
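The reduction sum of step (5), and the matching step in claim 5, is the reason the thread count per block must be a power of two: the standard shared-memory tree reduction halves the number of active threads on each pass. A serial Python sketch of that reduction:

```python
# Serial sketch of the shared-memory tree reduction used in step (5)
# of claims 4 and 5. The halving pattern is why the number of threads
# per block must be a power of two.

def block_reduce(shared):
    n = len(shared)
    assert n > 0 and n & (n - 1) == 0, "thread count must be a power of two"
    buf = list(shared)            # copy of the block's shared memory
    stride = n // 2
    while stride > 0:
        for t in range(stride):   # threads t < stride are active this pass
            buf[t] += buf[t + stride]
        stride //= 2              # a __syncthreads() would separate passes
    return buf[0]                 # the reduced Q_g (or S_g) for the block
```

With a power-of-two block size, every pass pairs each active thread with exactly one partner, so no thread ever reads an out-of-range slot.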
5. The sensitivity parallelism and GPU acceleration method based on meshless topology optimization according to claim 3, characterized in that the calculation of the objective-function sensitivity value S is accelerated on the GPU as follows:
(1) assigning the nodes to the thread blocks on the GPU in one-to-one correspondence, where the number of threads per thread block must be set to a positive integer power of two;
(2) for the node corresponding to a given thread block, acquiring the information of all integration points in its influence domain;
(3) assigning each integration point in the node influence domain of step (2) to a thread of the thread block corresponding to the node, and allocating a corresponding GPU shared-memory unit to each thread, i.e. one accessible to every thread of that thread block, all initialized to zero;
(4) calculating the portion each integration point corresponding to the node contributes to the R value, denoted R_g; each thread extracts the Q_g value of its integration point and, from the corresponding Q_g and R_g values, calculates the portion its integration point contributes to the objective-function sensitivity value S, denoted S_g, storing the result in its corresponding storage unit in the thread block's GPU shared memory;
(5) after all integration points belonging to the node have been processed, performing a reduction sum over the contribution values in the GPU shared-memory units of all threads of the thread block, i.e. ΣS_g, to obtain the objective-function sensitivity value S of the node, and storing it in the corresponding storage unit in GPU global memory.
CN202110736591.9A 2021-06-30 2021-06-30 Sensitivity parallel based on grid-free topology optimization and GPU acceleration method thereof Active CN113467945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110736591.9A CN113467945B (en) 2021-06-30 2021-06-30 Sensitivity parallel based on grid-free topology optimization and GPU acceleration method thereof

Publications (2)

Publication Number Publication Date
CN113467945A true CN113467945A (en) 2021-10-01
CN113467945B CN113467945B (en) 2024-03-12

Family

ID=77876582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110736591.9A Active CN113467945B (en) 2021-06-30 2021-06-30 Sensitivity parallel based on grid-free topology optimization and GPU acceleration method thereof

Country Status (1)

Country Link
CN (1) CN113467945B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970960A (en) * 2014-05-23 2014-08-06 湘潭大学 Grid-free Galerkin method structural topology optimization method based on GPU parallel acceleration
JP6856286B1 * 2020-05-19 2021-04-07 Guangzhou University BESO topology optimization method based on dynamic evolution rate and adaptive grid

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAI Yong; LI Sheng: "GPU parallel computing in Matlab and its application in topology optimization", Journal of Computer Applications, no. 03 *
QIU Zhiping; QI Wuchao: "Collocation interval finite element method", Chinese Journal of Theoretical and Applied Mechanics, no. 03 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998091A (en) * 2022-06-22 2022-09-02 湘潭大学 PCG solving and GPU accelerating method without grid method and matrix diagonal preprocessing
CN114998091B (en) * 2022-06-22 2024-04-26 湘潭大学 Grid-free method and matrix-free diagonal preprocessing PCG solving and GPU accelerating method

Also Published As

Publication number Publication date
CN113467945B (en) 2024-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant