CN112733401A - Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly - Google Patents


Info

Publication number
CN112733401A
CN112733401A (application CN202011607981.8A)
Authority
CN
China
Prior art keywords
finite element
matrix
dense
strategy
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011607981.8A
Other languages
Chinese (zh)
Other versions
CN112733401B (en)
Inventor
张纪林
张鋆宸
王珏
冯仰德
聂宁明
丁佳明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Hangzhou Dianzi University
Original Assignee
Computer Network Information Center of CAS
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS, Hangzhou Dianzi University filed Critical Computer Network Information Center of CAS
Priority to CN202011607981.8A
Publication of CN112733401A
Application granted
Publication of CN112733401B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/23: Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E30/00: Energy generation of nuclear origin
    • Y02E30/30: Nuclear fission reactors

Abstract

The invention discloses a finite element tearing and interconnecting (FETI) method and system for numerical simulation of a reactor core assembly. Each of the n computing nodes is provided with the FETI system, and each computing node is equipped with g GPU accelerators. The invention adopts a load balancing strategy so that the dense-matrix memory footprint of each process tends toward the average value, cluster resources are fully utilized, and the solution is accelerated. HIP programming is adopted so that the method runs on both the Nvidia CUDA platform and the AMD ROCm platform. In the dense matrix-vector multiplication stage of the iterative solution, a dynamic matrix allocation strategy assigns an appropriate amount of computation to each processor, fully utilizing computing resources and accelerating the solution. In the vector inner-product stage, a vector inner-product acceleration strategy and a communication-computation overlap strategy are adopted; by introducing a dedicated communication thread, communication waiting time is reduced and the inner product is computed faster.

Description

Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly
Technical Field
The invention relates to finite element tearing and interconnecting (FETI) processing technology, and in particular to a FETI method and system for numerical simulation of a reactor core assembly.
Background
In the core assembly of a nuclear reactor, deformation of the assembly and abrasion of the fuel rods can occur under high temperature, irradiation, fluid flow, pressure, and similar conditions, causing a series of problems such as difficult fuel loading and unloading, assembly damage, and fatigue damage, which affect the safe operation of the reactor. Because the special arrangement of the core assemblies makes theoretical analysis very difficult, numerical simulation of the core assembly by the finite element method is required.
The finite element tearing and interconnecting (FETI) method is an effective scheme for solving the structural mechanics problems of a reactor. It is mainly used to handle the large-scale problems obtained by discretizing partial differential equations, is an important method for large-scale numerical simulation of reactor core assemblies, and is also applicable to fields such as electromagnetics, aeronautics, and mechanical manufacturing. The FETI method was originally proposed in the field of structural mechanics by C. Farhat and F.-X. Roux. It is a non-overlapping domain decomposition method that divides a model into many non-overlapping subdomains, each of which is independent. To ensure continuity between subdomains, the FETI method adds a set of unknowns (Lagrange multipliers, LM). In the actual solution, the LM are obtained with a Krylov subspace iterative method, and a subdomain equation is then solved in each subdomain.
However, the original FETI method is not computationally efficient. To address this, Farhat et al. proposed the FETI-DP method (dual-primal FETI) in 2001, which no longer requires a second set of Lagrange multipliers and unifies the previously developed one-level and two-level FETI methods into a single dual-primal framework. FETI-DP is more robust than FETI, has higher computational efficiency, and is suitable for solving second-order and fourth-order problems. In 2006, Dostál et al. proposed the TFETI method (Total FETI), a variant of FETI in which the Dirichlet boundary conditions are also enforced through Lagrange multipliers. However, the coarse problem remains an important factor limiting the scalability of FETI methods.
To reduce the impact of the coarse problem and improve scalability, Klawonn and Rheinbach proposed the HFETI method (Hybrid FETI) in 2010. It combines the FETI and FETI-DP methods, aggregating multiple subdomains into clusters, and can be regarded as a three-level domain decomposition method. First, a FETI-DP system is built to handle all clusters; each cluster is then composed of multiple subdomains, and the subdomains within each cluster are processed with the conventional FETI method. Similarly, in 2012, Kozubek et al. proposed the HTFETI method (Hybrid Total FETI), which combines FETI with TFETI, using TFETI for the subdomains within each cluster and FETI with projections across clusters. The HTFETI method can effectively reduce the coarse problem.
In the iterative solution phase of FETI, however, the sparse matrix-vector operations consume a large amount of time, so Riha et al. proposed the LSC method (local Schur complement) in 2016, replacing the sparse matrix-vector operations with more efficient dense matrix-vector multiplication (GEMV), effectively trading memory for speed. This dense BLAS level-2 operation has contiguous memory access and therefore better performance for memory-bandwidth-bound applications. Dense matrix-vector multiplication is also well suited to GPU accelerators: in 2018, Vavrik et al. used CUDA programming to offload the dense matrix-vector products to the GPU.
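To illustrate the LSC idea on a toy problem: the interior unknowns of a subdomain are eliminated once up front, producing a dense local Schur complement that each subsequent iteration applies as a plain dense matrix-vector product. A minimal pure-Python sketch (the 3 x 3 stiffness matrix and the interior/boundary index split are invented for illustration, not taken from the patent):

```python
def lsc_demo():
    # Toy symmetric subdomain stiffness matrix K, partitioned into
    # one interior unknown (index 0) and two boundary unknowns (1, 2).
    K = [[4.0, 1.0, 2.0],
         [1.0, 3.0, 0.0],
         [2.0, 0.0, 5.0]]
    K_ii = K[0][0]                     # interior block (1x1 here)
    K_ib = [K[0][1], K[0][2]]          # interior-boundary coupling
    K_bb = [[K[1][1], K[1][2]],
            [K[2][1], K[2][2]]]        # boundary block

    # One-time setup: dense local Schur complement
    # S = K_bb - K_bi * K_ii^-1 * K_ib (K is symmetric, so K_bi = K_ib^T).
    S = [[K_bb[r][c] - K_ib[r] * K_ib[c] / K_ii for c in range(2)]
         for r in range(2)]

    # Per-iteration work is now a dense GEMV: y = S x.
    x = [1.0, 1.0]
    y = [sum(S[r][c] * x[c] for c in range(2)) for r in range(2)]
    return S, y
```

The sparse factorization work is paid once when forming S; every later iteration touches only the small dense block with contiguous memory access.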
However, the existing finite element tearing and interconnecting methods still have the following urgent problems: 1) existing heterogeneous-parallel FETI solvers use CUDA (Compute Unified Device Architecture) programming, so they can run only on the Nvidia CUDA platform and do not support other types of GPU accelerators; 2) while the GPU computes the dense matrix-vector products, the CPU is idle, so the cluster's computing resources are not fully utilized; 3) in actual numerical simulation of a reactor core assembly, the dense-matrix memory assembled by each process differs greatly, with differences in computation time and memory size of up to a factor of 6, so processes with less work spend a large amount of time waiting for the other processes, increasing the solution time.
Disclosure of Invention
The invention aims to solve the problems of existing finite element tearing and interconnecting methods and provides a finite element tearing and interconnecting system for numerical simulation of a reactor core assembly that fully utilizes cluster resources, accelerates the solution, reduces communication waiting time, and improves portability.
A finite element tearing and interconnecting system for reactor numerical simulation comprises an input module, a region dividing module, a matrix assembly module, a resource collection module, a load balancing module, an iteration solving module, and a local solving module.
The input module acquires the mesh file data and sets the initialization parameters.
The region dividing module divides the mesh into a number of regions and divides each region into a number of sub-regions.
The matrix assembly module generates the corresponding finite element matrices in each sub-region.
The resource collection module collects the dense-matrix size information of each process and compares memory occupation.
The load balancing module invokes the load balancing strategy and reallocates the dense matrices of the processes.
The iteration solving module solves for the displacement of the boundary nodes of each region using an existing iterative method, and invokes the vector inner-product acceleration strategy and the communication-computation overlap strategy.
The local solving module solves for the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing and interconnecting system, and each computing node is equipped with g GPU accelerators.
Another object of the present invention is to provide a finite element tearing and interconnecting method for numerical simulation of a reactor core assembly, comprising the following steps:
step 1: and acquiring geometric model data of the reactor core assembly, and performing meshing on the geometric model data through the existing software to generate a mesh file.
Step 2: each computing node acquires a grid file of the reactor core component through an input module, and initializes related parameters: finite element method, iteration method, maximum iteration number, iteration precision, reactor core component material parameters, reactor core component boundary conditions and the like.
The finite element method may be a FETI or HTFETI method.
And step 3: each computing node in the n computing nodes starts g processes, each process starts T threads, the grid obtained by the input module is divided into g x n regions through the region dividing module, and each region is distributed with one process; while each region is further divided into s sub-regions.
And 4, step 4: and generating a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method in each process, wherein each subdomain generates a dense matrix. Thus, each process will generate s dense matrices.
And 5: collecting dense matrix information of each process by using a resource collection module, wherein the memory occupied by the dense matrix of the process i is LiLet Lmin=min{L1,L2,L3...Ln*g},Lmax=max{L1,L2,L3...Ln*g}. If it is not
Figure BDA0002872332370000031
X represents a threshold value, the phenomenon of load imbalance of the reactor core component is considered to occur in the finite element processing process, a load balancing strategy is adopted for adjustment, and thenEntering a step 6; otherwise, the load is considered to be balanced in the finite element processing process, and the step 7 is directly carried out.
Step 6: starting a load balancing strategy through a load balancing module, and adjusting the memory occupied by the matrix of each process to be near the average value, specifically:
6-1, calculating the average memory size of the dense matrix according to the memory size of the dense matrix of each process;
6-2 comparing the memory size of the dense matrix of each process with the average value, if the memory size of the dense matrix of each process is larger than the average value, considering that the calculation amount of the process is larger, needing other process help, and setting the process as a helped person; if the average value is smaller than the average value, the calculated amount of the process is considered to be smaller, and other processes can be helped and set as helpers;
6-3, dividing the process into two groups, namely a helper group and a helped group, sorting each group according to the size of a dense matrix memory, and correspondingly selecting a helper and a helped person;
6-4 the helped person sends 1 dense matrix to the helper;
6-5, repeating the step 6-4 until the dense matrix memory of the current helped person is smaller than the average value, and then changing the next helped person, or the dense matrix memory of the helper is larger than the average value, and then changing the next helper, and entering the step 6-4;
6-6 repeat steps 6-4 through 6-5 until the memory of all helped persons is less than the average or the dense matrix memory of all helpers is greater than the average.
And 7: and (3) carrying out iterative solution on each process through an iterative solution module, wherein a vector inner product acceleration strategy and a communication calculation overlapping strategy are adopted for vector inner product operation in each iteration step of the iterative solution, dense matrix vector multiplication is calculated on a GPU-like accelerator by adopting HIP (heterogeneous calculation portable interface) programming, and a dynamic matrix allocation strategy is adopted.
The vector inner-product acceleration strategy solves each process's local vector inner product in parallel with multiple threads.
The communication-computation overlap strategy uses 1 thread per process for communication while the remaining T-1 threads continue to participate in the computation of the local inner product; after the communication thread finishes communicating, it rejoins the computation of the local inner product.
In the dynamic matrix allocation strategy, during the dense matrix-vector multiplication each process uses 1 thread to call the hipBLAS library and perform the dense matrix-vector products on one GPU-like accelerator, while the remaining T-1 threads call the Intel MKL library and perform dense matrix-vector products on the CPU. At each iteration, the number of matrices assigned to the CPU and the GPU-like accelerator is redistributed according to their respective computation times for the dense matrix-vector products. The specific formulas are:

x_tmp = N * (x_c / t_c) / (x_c / t_c + x_d / t_d)
x_c' = round(x_tmp)
x_d' = N - x_c'
x_c_sub = x_d' / (T - 1)

where N is the total number of dense matrices the current process must handle, x_c' is the number of dense matrices assigned to the GPU-like accelerator in the next iteration, x_d' is the number assigned to the CPU in the next iteration, x_c and x_d are the numbers assigned to the GPU-like accelerator and the CPU in the previous iteration, t_c is the computation time of the GPU-like accelerator in the previous iteration, t_d is the computation time of the CPU in the previous iteration, x_c_sub is the number of dense matrices assigned to a single CPU core, and x_tmp is a temporary variable.
And 8: and each process obtains the displacement of all the nodes by solving the displacement of the internal nodes through a local solving module according to the iteration solving result.
It is a further object of the present invention to provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method described above.
The invention has the following beneficial effects:
When the finite element tearing and interconnecting method is used for numerical simulation of a reactor core assembly, the load balancing strategy makes the dense-matrix memory footprint of each process tend toward the average value, so cluster resources are fully utilized and the solution is accelerated. Meanwhile, the invention adopts HIP programming, so the method runs on both the Nvidia CUDA platform and the AMD ROCm platform, increasing the portability of the code. In the dense matrix-vector multiplication stage of the iterative solution, the dynamic matrix allocation strategy assigns an appropriate amount of computation to each processor, fully utilizing computing resources and accelerating the solution. In the vector inner-product stage, the vector inner-product acceleration strategy and the communication-computation overlap strategy are adopted; by introducing a communication thread, communication waiting time is reduced and the inner product is computed faster.
Drawings
FIG. 1 is a flow chart of the finite element tearing and interconnecting acceleration method;
FIG. 2 is a comparison of dense-matrix memory occupation;
FIG. 3 is a comparison of dense matrix-vector multiplication computation time.
Detailed Description
The present invention is further analyzed with reference to the following specific example.
The finite element tearing and interconnecting method for numerical simulation of a reactor core assembly is applied here to predicting the deformation of a core assembly under high-temperature conditions.
A method for predicting the deformation of a core assembly in a nuclear reactor comprises the finite element tearing and interconnecting acceleration method for numerical simulation of the reactor core assembly; the deformation of the current core assembly is obtained from the solution results (i.e., the displacements of all nodes), providing a basis for the analysis and design of the core assembly.
The specific implementation steps and descriptions are as follows:
A finite element tearing and interconnecting system for numerical simulation of a reactor core assembly comprises an input module, a region dividing module, a matrix assembly module, a resource collection module, a load balancing module, an iteration solving module, and a local solving module.
The input module acquires the mesh file data and sets the initialization parameters.
The region dividing module divides the mesh into a number of regions and divides each region into a number of sub-regions.
The matrix assembly module generates the corresponding finite element matrices in each sub-region.
The resource collection module collects the dense-matrix size information of each process and compares memory occupation.
The load balancing module invokes the load balancing strategy and reallocates the dense matrices of the processes.
The iteration solving module solves for the displacement of the boundary nodes of each region using an existing iterative method, and invokes the vector inner-product acceleration strategy and the communication-computation overlap strategy.
The local solving module solves for the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing and interconnecting system, and each computing node is equipped with g GPU accelerators.
A finite element tearing and interconnecting method for numerical simulation of a reactor core assembly comprises the following specific steps, as shown in FIG. 1:
step 1: and acquiring geometric model data of the reactor core assembly, and performing meshing on the geometric model data through the existing software to generate a mesh file.
Step 2: each computing node acquires a grid file of the reactor core component through an input module, and initializes related parameters: finite element method, iteration method, maximum iteration number, iteration precision, reactor core component material parameters, reactor core component boundary conditions and the like.
The finite element method may be a FETI or HTFETI method.
The iterative method is the preconditioned conjugate gradient (PCG) method, used to solve the following finite element equations:
| F    G | | λ |   | d |
| G^T  0 | | α | = | e |

where:

F = B K^+ B^T
G = B R
d = B K^+ f
e = R^T f

The matrix B is the displacement compatibility matrix, which enforces equal node displacements on the contact faces of adjacent subdomains. The matrix K is the finite element stiffness matrix and K^+ is its generalized inverse. The matrix R is a basis of the null space of the stiffness matrix K, and f is the load vector. λ and α are the unknowns. By choosing the projection matrix P = I - G(G^T G)^-1 G^T, the variable α is eliminated, and the subdomain boundary node displacements λ are then obtained with the preconditioned conjugate gradient method.
The specific algorithm is described as follows:
[Algorithm listing (preconditioned conjugate gradient iteration) rendered as an image in the original document.]
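The core of the iteration can be sketched as a minimal conjugate gradient loop in pure Python (identity preconditioner and a tiny 2 x 2 SPD system for illustration; the projection P and the FETI operator F described above are omitted for brevity):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    """Solve A x = b for symmetric positive-definite A: the core loop
    that a preconditioned/projected FETI solver builds upon."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                 # residual r = b - A x (x = 0 initially)
    p = r[:]                 # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges in two iterations to approximately [1/11, 7/11]. In the FETI setting, the matrix-vector product inside this loop is exactly the dense GEMV stage that the patent offloads to the accelerators.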
and step 3: the grid input method comprises the following steps that n computing nodes are started, each computing node starts g processes, each process starts T threads, a grid obtained by an input module is divided into g x n regions through a region dividing module, and each region is distributed with one process; while each region is further divided into s sub-regions.
And 4, step 4: and generating a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method in each process, wherein each subdomain generates a dense matrix. Thus, each process will generate s dense matrices.
And 5: collecting dense matrix information of each process by using a resource collection module, wherein the memory occupied by the dense matrix of the process i is LiLet Lmin=min{L1,L2,L3…Ln*g},Lmax=max{L1,L2,L3…Ln*g}. If it is not
Figure BDA0002872332370000072
Considering that the reactor core assembly has a load imbalance phenomenon in the finite element processing process, adjusting by adopting a load balancing strategy, and entering step 6; otherwise, the load is considered to be balanced in the finite element processing process, and the step 7 is directly carried out.
Step 6: and starting a load balancing strategy through a load balancing module, and adjusting the memory occupied by the matrix of each process to be close to the average value.
The load balancing strategy specifically comprises:
a) calculating the average memory size of the dense matrix according to the memory size of the dense matrix of each process;
b) comparing the size of the dense matrix memory of each process with the average value, if the size is larger than the average value, considering that the calculated amount of the process is larger, needing the help of other processes, and setting the process as a helped person; if the average value is smaller than the average value, the calculated amount of the process is considered to be smaller, the process of the group can be helped, and the process can be set as a helper;
c) dividing the process into two groups, namely a helper group and a helped group, sorting each group according to the size of a dense matrix memory, and correspondingly selecting one helper and one helped group;
d) the helped person sends 1 dense matrix to the helper;
e) repeating the step d) until the dense matrix memory of the current helped person is smaller than the average value, and then changing the next helped person, or the dense matrix memory of the helper is larger than the average value, and then changing the next helper, and entering the step d);
d) repeating steps d) and e) until the memory of all helped persons is less than the average value, or the dense matrix memory of all helpers is greater than the average value.
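The steps above can be sketched in pure Python (the per-process matrix memory sizes are invented sample data; a real implementation would ship the matrices between MPI ranks rather than between lists):

```python
def balance(procs):
    """procs: one list of dense-matrix memory sizes per process.
    Greedily moves matrices from over-average (helped) processes to
    under-average (helper) processes, one matrix at a time, and
    returns the transfer log as (from, to, size) tuples."""
    avg = sum(sum(p) for p in procs) / len(procs)
    helped = sorted((i for i, p in enumerate(procs) if sum(p) > avg),
                    key=lambda i: -sum(procs[i]))   # largest load first
    helpers = sorted((i for i, p in enumerate(procs) if sum(p) < avg),
                     key=lambda i: sum(procs[i]))   # smallest load first
    transfers = []
    hi = 0
    for d in helped:
        while sum(procs[d]) > avg and hi < len(helpers):
            h = helpers[hi]
            m = procs[d].pop()          # the helped process sends 1 matrix
            procs[h].append(m)
            transfers.append((d, h, m))
            if sum(procs[h]) > avg:     # this helper is full; take the next
                hi += 1
    return transfers
```

For example, `balance([[10, 10, 10], [2, 2, 2], [4, 4, 4]])` moves two matrices from process 0 to process 1 and then stops, because process 0 has dropped below the average.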
And 7: and carrying out iterative solution on each process through an iterative solution module, wherein a vector inner product acceleration strategy and a communication calculation overlapping strategy are adopted for vector inner product operation in each iteration step of the iterative solution, and a dense matrix vector multiplication is calculated on a GPU-like accelerator by adopting HIP programming and a dynamic matrix allocation strategy is adopted.
The vector inner product acceleration strategy is to solve the local vector inner product of each process in parallel by multiple threads.
The communication calculation overlap strategy uses 1 thread for communication for each process, the rest t-1 threads continue to participate in the calculation of the local vector inner product, and when the communication thread finishes the communication, the thread participates in the calculation of the local vector inner product again. The specific algorithm is described as follows:
[Algorithm listing (multithreaded inner product with communication-computation overlap) rendered as an image in the original document.]
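The overlap strategy can be sketched with Python threads: T - 1 worker threads compute chunks of the local inner product while one dedicated thread performs the communication. Here the communication is simulated with a short sleep standing in for the MPI call, and the thread count and data are illustrative:

```python
import threading
import time

def overlapped_dot(x, y, T=4):
    """Local inner product computed by T-1 worker threads while 1 thread
    'communicates' concurrently (a stand-in for an MPI reduction)."""
    n = len(x)
    workers = T - 1
    partial = [0.0] * workers

    def work(k):
        lo, hi = (k * n) // workers, ((k + 1) * n) // workers
        partial[k] = sum(x[i] * y[i] for i in range(lo, hi))

    def communicate():
        time.sleep(0.001)   # placeholder for the actual communication call

    threads = [threading.Thread(target=work, args=(k,)) for k in range(workers)]
    comm = threading.Thread(target=communicate)
    comm.start()                      # communication overlaps the computation
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    comm.join()                       # the comm thread then rejoins the workers
    return sum(partial)
```

Because the communication thread runs concurrently with the workers, its latency is hidden behind useful computation instead of adding to the iteration's critical path.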
the dynamic allocation matrix strategy is characterized in that when dense matrix vector multiplication is carried out, 1 thread is used for calling a hipMALS library in each process, a block of GPU accelerator is used for carrying out dense matrix vector multiplication calculation, the rest T-1 threads call an Intel MKL library, and a CPU is used for carrying out dense matrix vector multiplication calculation. And dynamically allocating the matrix quantity to the CPU and the GPU-like accelerator according to the calculation time of the dense matrix vector multiplied by the CPU and the GPU-like accelerator during each iteration. The specific formula is as follows:
Figure BDA0002872332370000091
Figure BDA0002872332370000092
Figure BDA0002872332370000093
Figure BDA0002872332370000094
where N represents the total number of dense matrices that the current process needs to process,
Figure BDA0002872332370000095
representing the number of dense matrices assigned by the GPU-like accelerator in the next iteration,
Figure BDA0002872332370000096
indicating the number of dense matrices to which the CPU is assigned in the next iteration,
Figure BDA0002872332370000097
representing the number of dense matrices allocated by the last iteration class GPU accelerator,
Figure BDA0002872332370000098
representing the number of dense matrices, t, assigned by the CPU in the previous iterationcRepresenting the computation time, t, of the last iteration of the class GPU acceleratordRepresenting the computation time of the CPU in the last iteration, xc_subRepresenting the number of dense matrices, x, to which a single CPU core is allocatedtmpIs a temporary variable.
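Under the variable definitions above, the redistribution can be sketched as a throughput-proportional rule. Since the published formulas appear in the original only as images, the exact expressions below are an assumption consistent with those variable definitions, not a transcription:

```python
def next_allocation(N, x_gpu_prev, x_cpu_prev, t_gpu, t_cpu):
    """Split N dense matrices between GPU and CPU for the next iteration,
    in proportion to the throughput each achieved in the previous one.
    (Assumed formula; the patent's own expressions are image-only.)"""
    r_gpu = x_gpu_prev / t_gpu          # matrices per second on the accelerator
    r_cpu = x_cpu_prev / t_cpu          # matrices per second on the CPU threads
    x_tmp = N * r_gpu / (r_gpu + r_cpu) # ideal (fractional) GPU share
    x_gpu = round(x_tmp)
    x_cpu = N - x_gpu
    return x_gpu, x_cpu

def per_core(x_cpu, T):
    """Matrices handled by each of the T-1 CPU worker threads (x_c_sub)."""
    return x_cpu / (T - 1)
```

For example, if the GPU handled 50 matrices in 1.0 s and the CPU 50 matrices in 4.0 s in the previous iteration, `next_allocation(100, 50, 50, 1.0, 4.0)` gives `(80, 20)`: the faster device receives proportionally more work next time.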
And 8: and each process obtains the displacement of all the nodes by solving the displacement of the internal nodes through a local solving module according to the iteration solving result.
FIG. 2 compares the dense-matrix memory sizes before and after load balancing, showing that the load balancing strategy adjusts each process's load toward the average value, avoiding the long communication waits caused by load imbalance and fully utilizing cluster resources. FIG. 3 shows the computation time of the dense matrix-vector multiplication before and after load balancing; the comparison shows that the load balancing strategy effectively accelerates the solution.

Claims (9)

1. A finite element tearing and interconnecting method for numerical simulation of a reactor core assembly, characterized by comprising the following steps:
Step 1: Acquire the geometric model data of the reactor core assembly and mesh it to generate a mesh file;
Step 2: Each computing node acquires the mesh file of the reactor core assembly and initializes the relevant parameters;
Step 3: Each of the n computing nodes starts g processes, and each process starts T threads; the mesh of the reactor core assembly is divided into g × n regions, each region is assigned to one process, and each region is further divided into s sub-regions;
Step 4: In each process, according to the assigned region and the selected finite element method, the corresponding finite element matrices are generated in each subdomain, and each subdomain produces one dense matrix;
Step 5: Collect the dense-matrix information of each process and, after comparison, judge whether the load is balanced during finite element processing; if the load is considered unbalanced, go to step (6), otherwise go to step (7);
step 6: starting a load balancing strategy, and adjusting the size of the memory occupied by the matrix of each process to be close to the average value; the method comprises the following steps:
6-1, calculating the average value of the memory size of the dense matrix according to the memory size of the dense matrix of each process;
6-2 comparing the memory size of the dense matrix of each process with the average value, if the memory size of the dense matrix of each process is larger than the average value, considering that the calculation amount of the process is larger, and setting the process as a helped person; if the average value is smaller than the average value, the calculated amount of the process is considered to be smaller, and the process is set as a helper;
6-3, dividing the process into two groups, namely a helper group and a helped group, sorting each group according to the size of a dense matrix memory, and correspondingly selecting a helper and a helped person;
6-4 the helped person sends 1 dense matrix to the helper;
6-5, repeating the step 6-4 until the dense matrix memory of the current helped person is smaller than the average value, and then changing the next helped person, or the dense matrix memory of the helper is larger than the average value, and then changing the next helper, and entering the step 6-4;
6-6 repeating the steps 6-4 to 6-5 until the memories of all helped persons are smaller than the average value or the memories of all the dense matrixes of the helpers are larger than the average value;
and 7: each process carries out iterative solution, vector inner product operation in each iteration of the iterative solution adopts a vector inner product acceleration strategy and a communication calculation overlapping strategy, dense matrix vector multiplication is calculated on a GPU-like accelerator by adopting HIP programming, and a dynamic matrix allocation strategy is adopted;
and 8: and each process obtains the displacement of all the nodes by solving the iteration result and locally solving the displacement of the internal nodes.
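Steps 6-1 to 6-6 of claim 1 can be sketched as follows. This is an illustrative Python reading only, not the claimed implementation: each process is modelled as a list of its dense-matrix memory sizes, and "sending" a matrix is modelled as moving one list entry.

```python
# Hypothetical sketch of the load-balancing strategy (steps 6-1 to 6-6).
# The process/matrix representation and function name are illustrative.

def rebalance(procs):
    """procs: dict {pid: [matrix_memory_size, ...]}. Returns a rebalanced copy."""
    procs = {p: list(ms) for p, ms in procs.items()}
    avg = sum(sum(ms) for ms in procs.values()) / len(procs)  # step 6-1

    def load(p):
        return sum(procs[p])

    # Steps 6-2/6-3: split into helped (above average) and helpers (below),
    # each group sorted by current dense-matrix memory.
    helped = sorted((p for p in procs if load(p) > avg), key=load, reverse=True)
    helpers = sorted((p for p in procs if load(p) < avg), key=load)

    hi, lo = 0, 0
    while hi < len(helped) and lo < len(helpers):
        src, dst = helped[hi], helpers[lo]
        # Step 6-4: the helped process sends one dense matrix to the helper.
        procs[dst].append(procs[src].pop())
        # Step 6-5: advance once a process crosses the average.
        if load(src) <= avg:
            hi += 1
        if load(dst) >= avg:
            lo += 1
    return procs
```

Because one matrix moves per step and a drained process has zero load, the loop in this sketch always terminates (step 6-6).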
2. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly as claimed in claim 1, wherein judging the load balance in step (5) is specifically:
let the memory size occupied by the dense matrices of process i be L_i, and let L_min = min{L_1, L_2, ..., L_(n*g)} and L_max = max{L_1, L_2, ..., L_(n*g)}; if
[inequality image FDA0002872332360000021]
holds, where X represents a threshold value, load imbalance of the reactor core assembly is considered to occur in the finite element processing, the load balancing strategy is applied for adjustment, and the method proceeds to step 6; otherwise the load is considered balanced and the method proceeds directly to step 7.
3. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 1 or 2, wherein the vector inner product acceleration strategy of step (7) solves the local vector inner product of each process in parallel with multiple threads.
4. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 1 or 3, wherein in the communication-computation overlapping strategy of step (7), each process uses 1 thread for communication while the remaining T-1 threads continue computing the local vector inner product; the communication thread joins the local vector inner product computation after its communication completes.
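The combination of claims 3 and 4 can be sketched as follows. This is a minimal illustration using Python threads; the communication step is injected as a callable (in practice it would be, e.g., a non-blocking MPI exchange), and all names are assumptions.

```python
import threading

def overlapped_inner_product(x, y, T=4, communicate=lambda: None):
    """Multi-threaded local inner product (claim 3) where thread 0 first
    performs communication, then joins the computation (claim 4)."""
    partial = [0.0] * T
    chunk = (len(x) + T - 1) // T

    def work(tid, do_comm):
        if do_comm:
            communicate()  # overlapped with the other T-1 threads' computation
        lo = tid * chunk
        hi = min(lo + chunk, len(x))
        partial[tid] = sum(a * b for a, b in zip(x[lo:hi], y[lo:hi]))

    threads = [threading.Thread(target=work, args=(t, t == 0)) for t in range(T)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

In this sketch the communication thread simply computes its own chunk after communicating; a real FETI solver would additionally reduce the local results across processes.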
5. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 1 or 4, wherein the dynamic matrix allocation strategy of step (7) is specifically: when performing dense matrix-vector multiplication, each process uses 1 thread to call the hipBLAS library and perform the multiplication on the GPU-like accelerator, while the remaining T-1 threads call the Intel MKL library and perform the multiplication on the CPU; at each iteration, the numbers of matrices allocated to the CPU and the GPU-like accelerator are adjusted dynamically according to their dense matrix-vector multiplication computation times.
6. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 5, wherein the dynamic matrix allocation of step (7) follows the update formulas below, which appear as equation images in the original document:
[four equation images, FDA0002872332360000022 through FDA0002872332360000025: the update rules for the next-iteration matrix counts]
where N represents the total number of dense matrices the current process needs to process; the first two image variables denote the numbers of dense matrices assigned to the GPU-like accelerator and to the CPU in the next iteration, and the next two denote the corresponding numbers in the previous iteration; t_c represents the computation time of the GPU-like accelerator in the previous iteration; t_d represents the computation time of the CPU in the previous iteration; x_c_sub represents the number of dense matrices assigned to a single CPU core; and x_tmp is a temporary variable.
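Since the exact update formulas of claim 6 are equation images in the original, the sketch below is only one plausible reading of the idea: re-split the N dense matrices between the GPU-like accelerator and the CPU in proportion to their measured throughput in the previous iteration. The proportional rule and all names are assumptions.

```python
def redistribute(N, x_gpu_prev, x_cpu_prev, t_c, t_d, cpu_cores):
    """Assumed reconstruction of the dynamic allocation of claim 6.
    N: total dense matrices of the current process; x_*_prev: previous
    allocation; t_c/t_d: previous computation times (accelerator/CPU)."""
    rate_gpu = x_gpu_prev / t_c        # matrices per second on the accelerator
    rate_cpu = x_cpu_prev / t_d        # matrices per second on the CPU side
    x_tmp = N * rate_gpu / (rate_gpu + rate_cpu)
    x_gpu = round(x_tmp)               # next-iteration accelerator share
    x_cpu = N - x_gpu                  # remainder goes to the CPU threads
    x_c_sub = x_cpu // cpu_cores       # matrices handled by each CPU core
    return x_gpu, x_cpu, x_c_sub
```

Under this rule, a device that finished its previous batch faster receives a proportionally larger share in the next iteration, which is the stated goal of the strategy.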
7. A finite element tearing and butt joint system for reactor numerical simulation, deployed on computing nodes each provided with g GPU-like accelerators, characterized by comprising an input module, a region division module, a matrix assembly module, a resource collection module, a load balancing module, an iterative solving module and a local solving module, wherein:
the input module acquires the mesh file data and sets the initialization parameters;
the region division module divides the mesh into a plurality of regions and divides each region into a plurality of subdomains;
the matrix assembly module generates the corresponding finite element matrix in each subdomain;
the resource collection module collects the dense matrix size information of each process and compares memory occupation;
the load balancing module invokes the load balancing strategy and redistributes the dense matrices of the processes;
the iterative solving module solves the displacements of the boundary nodes of each region by an iterative method, invoking the vector inner product acceleration strategy and the communication-computation overlapping strategy;
and the local solving module solves the displacements of the internal nodes of each region.
8. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-6.
9. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-6.
CN202011607981.8A 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly Active CN112733401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011607981.8A CN112733401B (en) 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly


Publications (2)

Publication Number Publication Date
CN112733401A true CN112733401A (en) 2021-04-30
CN112733401B CN112733401B (en) 2024-03-12

Family

ID=75610898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011607981.8A Active CN112733401B (en) 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly

Country Status (1)

Country Link
CN (1) CN112733401B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102232282A (en) * 2010-10-29 2011-11-02 华为技术有限公司 Method and apparatus for realizing load balance of resources in data center
CN103731498A (en) * 2013-12-31 2014-04-16 浙江鸿程计算机系统有限公司 Big data real-time enquiry system load balancing method based on copy selection
CN105045670A (en) * 2015-09-01 2015-11-11 浪潮(北京)电子信息产业有限公司 Method and system for balancing loads of central processing units and graphic processing units
CN110472187A (en) * 2019-08-06 2019-11-19 中国原子能科学研究院 A kind of load balancing parallel method of the three-dimensional neutron transport method of characteristic curves
CN112016232A (en) * 2020-08-31 2020-12-01 中国原子能科学研究院 Tear finite element process processing method and system


Non-Patent Citations (3)

Title
Radim Vavřík et al.: "Acceleration Techniques for FETI Solvers for GPU Accelerators", International Conference on High Performance Computing & Simulation *
刘朵 et al.: "A greedy algorithm for Reduce load balancing on the Hadoop platform", 《计算机应用研究》 (Application Research of Computers), vol. 33, no. 9, p. 2658
宛汀 et al.: "Finite element-boundary integral combined with tearing and interconnecting for electromagnetic scattering analysis", 《系统工程与电子技术》 (Systems Engineering and Electronics), vol. 32, no. 9

Also Published As

Publication number Publication date
CN112733401B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
Ida Lattice H-matrices on distributed-memory systems
CN112101525A (en) Method, device and system for designing neural network through NAS
Iwashita et al. Software framework for parallel BEM analyses with H-matrices using MPI and OpenMP
Alonso et al. An efficient multiblock method for aerodynamic analysis and design on distributed memory systems
Lastovetsky et al. Data distribution for dense factorization on computers with memory heterogeneity
Tesser et al. Improving the performance of seismic wave simulations with dynamic load balancing
Kopysov et al. Hybrid Multi-GPU solver based on Schur complement method
Ida et al. Parallel hierarchical matrices with block low-rank representation on distributed memory computer systems
CN112733401A (en) Finite element tearing and butt joint method and system for numerical simulation of reactor core assembly
CN108879691B (en) Large-scale continuous power flow calculation method and device
KR20190093932A (en) Arithmetic processing apparatus and method in deep running system
CN109101708B (en) Implicit finite element parallel method based on two-stage region decomposition
Li et al. Performance optimization algorithm of radar signal processing system
Lastovetsky et al. Data partitioning for multiprocessors with memory heterogeneity and memory constraints
Sarje et al. Parallel performance optimizations on unstructured mesh-based simulations
Heuveline et al. Parallel smoothers for matrix-based geometric multigrid methods on locally refined meshes using multicore CPUs and GPUs
CN108599173B (en) Method and device for solving batch power flows
CN114021070A (en) Deep convolution calculation method and system based on micro-architecture processor
Sarma et al. Exploiting activation based gradient output sparsity to accelerate backpropagation in CNNs
CN112016232A (en) Tear finite element process processing method and system
Korch et al. Parallelization of particle-in-cell codes for nonlinear kinetic models from mathematical physics
Marrakchi et al. Static scheduling with load balancing for solving triangular band linear systems on multicore processors
Pinar et al. Improving load balance with flexibly assignable tasks
Gao et al. Revisiting thread configuration of SpMV kernels on GPU: A machine learning based approach
CN116151340B (en) Parallel random computing neural network system and hardware compression method and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant