CN112733401A - Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly - Google Patents


Info

Publication number
CN112733401A
CN112733401A (application CN202011607981.8A)
Authority
CN
China
Prior art keywords
finite element
matrix
dense
strategy
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011607981.8A
Other languages
Chinese (zh)
Other versions
CN112733401B (en)
Inventor
张纪林
张鋆宸
王珏
冯仰德
聂宁明
丁佳明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Hangzhou Dianzi University
Original Assignee
Computer Network Information Center of CAS
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS, Hangzhou Dianzi University filed Critical Computer Network Information Center of CAS
Priority to CN202011607981.8A
Publication of CN112733401A
Application granted
Publication of CN112733401B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/23: Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E30/00: Energy generation of nuclear origin
    • Y02E30/30: Nuclear fission reactors

Abstract

The invention discloses a finite element tearing and interconnecting (FETI) method and system for numerical simulation of a reactor core assembly. Each of the n computing nodes is provided with the FETI system, and each computing node is equipped with g GPU accelerators. The invention adopts a load balancing strategy so that the dense-matrix memory footprint of each process tends toward the average value, cluster resources are fully utilized, and the solution is accelerated. HIP programming is adopted so that the method runs on both the Nvidia CUDA platform and the AMD ROCm platform. In the dense matrix-vector multiplication stage of the iterative solution, a dynamic matrix allocation strategy assigns an appropriate amount of computation to each processor, fully utilizing computing resources and accelerating the solution. In the vector inner-product stage, a vector inner-product acceleration strategy and a communication-computation overlap strategy are adopted; by introducing a dedicated communication thread, communication waiting time is reduced and the inner product is computed faster.

Description

Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly
Technical Field
The invention relates to finite element tearing and interconnecting (FETI) processing technology, and in particular to a FETI method and system for numerical simulation of a reactor core assembly.
Background
In the core assembly of a nuclear reactor, deformation of the assembly and abrasion of the fuel rods can occur under high temperature, irradiation, fluid flow, pressure, and similar conditions, causing a series of problems such as difficult fuel loading and unloading, assembly damage, and fatigue damage, which affect the safe operation of the reactor. Because the special arrangement of the core assemblies makes theoretical analysis very difficult, numerical simulation of the core assembly by the finite element method is required.
The finite element tearing and interconnecting (FETI) method is an effective scheme for solving the structural mechanics problems of a reactor. It is mainly used to handle the large-scale problems obtained by discretizing partial differential equations, is an important method for large-scale numerical simulation of reactor core assemblies, and is also applicable to fields such as electromagnetics, aeronautics, and mechanical manufacturing. The FETI method was originally proposed in the field of structural mechanics by C. Farhat and F.-X. Roux. It is a non-overlapping domain decomposition method that divides a model into many non-overlapping subdomains, each of which is independent. To ensure continuity between subdomains, the FETI method adds a set of unknowns (Lagrange multipliers, LM). In the actual solution, the LM are obtained with a Krylov subspace iterative method, and a subdomain equation is then solved in each subdomain.
However, the original FETI method is not computationally efficient. To address this, Farhat et al. proposed the FETI-DP method (dual-primal FETI) in 2001, which no longer requires a second set of Lagrange multipliers and unifies the previously developed one-level and two-level FETI methods into a single dual-primal framework. FETI-DP is more robust than FETI, has higher computational efficiency, and is suitable for solving second-order and fourth-order problems. In 2006, Dostál et al. proposed the TFETI method (Total FETI), a variant of FETI in which the Dirichlet boundary conditions are also enforced through Lagrange multipliers. However, the coarse problem remains an important factor limiting the scalability of FETI methods.
To reduce the impact of the coarse problem and improve scalability, Klawonn and Rheinbach proposed the HFETI method (Hybrid FETI) in 2010. It combines the FETI and FETI-DP methods, aggregating multiple subdomains into clusters, and can be regarded as a three-level domain decomposition method. First, a FETI-DP system is built to handle all clusters; each cluster is then composed of multiple subdomains, and the subdomains within each cluster are processed with the conventional FETI method. Similarly, in 2012, Kozubek et al. proposed the HTFETI method (Hybrid Total FETI), which combines FETI with TFETI, using TFETI for the subdomains within each cluster and FETI with projections across clusters. The HTFETI method can effectively reduce the coarse problem.
In the iterative solution phase of FETI, however, the sparse matrix-vector operations consume a large amount of time, so Riha et al. proposed the LSC method (local Schur complement) in 2016, replacing the sparse matrix-vector operations with more efficient dense matrix-vector multiplication (GEMV), effectively trading memory for speed. This dense BLAS level-2 operation has contiguous memory access and therefore better performance for memory-bandwidth-bound applications. Dense matrix-vector multiplication is also well suited to GPU accelerators: in 2018, Vavrik et al. used CUDA programming to offload the dense matrix-vector products to the GPU.
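To illustrate the LSC idea on a toy problem: the interior unknowns of a subdomain are eliminated once up front, producing a dense local Schur complement that each subsequent iteration applies as a plain dense matrix-vector product. A minimal pure-Python sketch (the 3 x 3 stiffness matrix and the interior/boundary index split are invented for illustration, not taken from the patent):

```python
def lsc_demo():
    # Toy symmetric subdomain stiffness matrix K, partitioned into
    # one interior unknown (index 0) and two boundary unknowns (1, 2).
    K = [[4.0, 1.0, 2.0],
         [1.0, 3.0, 0.0],
         [2.0, 0.0, 5.0]]
    K_ii = K[0][0]                     # interior block (1x1 here)
    K_ib = [K[0][1], K[0][2]]          # interior-boundary coupling
    K_bb = [[K[1][1], K[1][2]],
            [K[2][1], K[2][2]]]        # boundary block

    # One-time setup: dense local Schur complement
    # S = K_bb - K_bi * K_ii^-1 * K_ib (K is symmetric, so K_bi = K_ib^T).
    S = [[K_bb[r][c] - K_ib[r] * K_ib[c] / K_ii for c in range(2)]
         for r in range(2)]

    # Per-iteration work is now a dense GEMV: y = S x.
    x = [1.0, 1.0]
    y = [sum(S[r][c] * x[c] for c in range(2)) for r in range(2)]
    return S, y
```

The sparse factorization work is paid once when forming S; every later iteration touches only the small dense block with contiguous memory access.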
However, the existing finite element tearing and interconnecting methods still have the following urgent problems: 1) existing heterogeneous-parallel FETI solvers use CUDA (Compute Unified Device Architecture) programming, so they can run only on the Nvidia CUDA platform and do not support other types of GPU accelerators; 2) while the GPU computes the dense matrix-vector products, the CPU is idle, so the cluster's computing resources are not fully utilized; 3) in actual numerical simulation of a reactor core assembly, the dense-matrix memory assembled by each process differs greatly, with differences in computation time and memory size of up to a factor of 6, so processes with less work spend a large amount of time waiting for the other processes, increasing the solution time.
Disclosure of Invention
The invention aims to solve the problems of existing finite element tearing and interconnecting methods and provides a finite element tearing and interconnecting system for numerical simulation of a reactor core assembly that fully utilizes cluster resources, accelerates the solution, reduces communication waiting time, and improves portability.
A finite element tearing and interconnecting system for reactor numerical simulation comprises an input module, a region dividing module, a matrix assembly module, a resource collection module, a load balancing module, an iteration solving module, and a local solving module.
The input module acquires the mesh file data and sets the initialization parameters.
The region dividing module divides the mesh into a number of regions and divides each region into a number of sub-regions.
The matrix assembly module generates the corresponding finite element matrices in each sub-region.
The resource collection module collects the dense-matrix size information of each process and compares memory occupation.
The load balancing module invokes the load balancing strategy and reallocates the dense matrices of the processes.
The iteration solving module solves for the displacement of the boundary nodes of each region using an existing iterative method, and invokes the vector inner-product acceleration strategy and the communication-computation overlap strategy.
The local solving module solves for the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing and interconnecting system, and each computing node is equipped with g GPU accelerators.
Another object of the present invention is to provide a finite element tearing and interconnecting method for numerical simulation of a reactor core assembly, comprising the following steps:
step 1: and acquiring geometric model data of the reactor core assembly, and performing meshing on the geometric model data through the existing software to generate a mesh file.
Step 2: each computing node acquires a grid file of the reactor core component through an input module, and initializes related parameters: finite element method, iteration method, maximum iteration number, iteration precision, reactor core component material parameters, reactor core component boundary conditions and the like.
The finite element method may be a FETI or HTFETI method.
And step 3: each computing node in the n computing nodes starts g processes, each process starts T threads, the grid obtained by the input module is divided into g x n regions through the region dividing module, and each region is distributed with one process; while each region is further divided into s sub-regions.
And 4, step 4: and generating a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method in each process, wherein each subdomain generates a dense matrix. Thus, each process will generate s dense matrices.
And 5: collecting dense matrix information of each process by using a resource collection module, wherein the memory occupied by the dense matrix of the process i is LiLet Lmin=min{L1,L2,L3...Ln*g},Lmax=max{L1,L2,L3...Ln*g}. If it is not
Figure BDA0002872332370000031
X represents a threshold value, the phenomenon of load imbalance of the reactor core component is considered to occur in the finite element processing process, a load balancing strategy is adopted for adjustment, and thenEntering a step 6; otherwise, the load is considered to be balanced in the finite element processing process, and the step 7 is directly carried out.
Step 6: starting a load balancing strategy through a load balancing module, and adjusting the memory occupied by the matrix of each process to be near the average value, specifically:
6-1, calculating the average memory size of the dense matrix according to the memory size of the dense matrix of each process;
6-2 comparing the memory size of the dense matrix of each process with the average value, if the memory size of the dense matrix of each process is larger than the average value, considering that the calculation amount of the process is larger, needing other process help, and setting the process as a helped person; if the average value is smaller than the average value, the calculated amount of the process is considered to be smaller, and other processes can be helped and set as helpers;
6-3, dividing the process into two groups, namely a helper group and a helped group, sorting each group according to the size of a dense matrix memory, and correspondingly selecting a helper and a helped person;
6-4 the helped person sends 1 dense matrix to the helper;
6-5, repeating the step 6-4 until the dense matrix memory of the current helped person is smaller than the average value, and then changing the next helped person, or the dense matrix memory of the helper is larger than the average value, and then changing the next helper, and entering the step 6-4;
6-6 repeat steps 6-4 through 6-5 until the memory of all helped persons is less than the average or the dense matrix memory of all helpers is greater than the average.
And 7: and (3) carrying out iterative solution on each process through an iterative solution module, wherein a vector inner product acceleration strategy and a communication calculation overlapping strategy are adopted for vector inner product operation in each iteration step of the iterative solution, dense matrix vector multiplication is calculated on a GPU-like accelerator by adopting HIP (heterogeneous calculation portable interface) programming, and a dynamic matrix allocation strategy is adopted.
The vector inner-product acceleration strategy solves each process's local vector inner product in parallel with multiple threads.
The communication-computation overlap strategy uses 1 thread per process for communication while the remaining T-1 threads continue to participate in the computation of the local inner product; after the communication thread finishes communicating, it rejoins the computation of the local inner product.
In the dynamic matrix allocation strategy, during the dense matrix-vector multiplication each process uses 1 thread to call the hipBLAS library and perform the dense matrix-vector products on one GPU-like accelerator, while the remaining T-1 threads call the Intel MKL library and perform dense matrix-vector products on the CPU. At each iteration, the number of matrices assigned to the CPU and the GPU-like accelerator is redistributed according to their respective computation times for the dense matrix-vector products. The specific formulas are:

x_tmp = N * (x_c / t_c) / (x_c / t_c + x_d / t_d)
x_c' = round(x_tmp)
x_d' = N - x_c'
x_c_sub = x_d' / (T - 1)

where N is the total number of dense matrices the current process must handle, x_c' is the number of dense matrices assigned to the GPU-like accelerator in the next iteration, x_d' is the number assigned to the CPU in the next iteration, x_c and x_d are the numbers assigned to the GPU-like accelerator and the CPU in the previous iteration, t_c is the computation time of the GPU-like accelerator in the previous iteration, t_d is the computation time of the CPU in the previous iteration, x_c_sub is the number of dense matrices assigned to a single CPU core, and x_tmp is a temporary variable.
And 8: and each process obtains the displacement of all the nodes by solving the displacement of the internal nodes through a local solving module according to the iteration solving result.
It is a further object of the present invention to provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method described above.
The invention has the following beneficial effects:
When the finite element tearing and interconnecting method is used for numerical simulation of a reactor core assembly, the load balancing strategy makes the dense-matrix memory footprint of each process tend toward the average value, so cluster resources are fully utilized and the solution is accelerated. Meanwhile, the invention adopts HIP programming, so the method runs on both the Nvidia CUDA platform and the AMD ROCm platform, increasing the portability of the code. In the dense matrix-vector multiplication stage of the iterative solution, the dynamic matrix allocation strategy assigns an appropriate amount of computation to each processor, fully utilizing computing resources and accelerating the solution. In the vector inner-product stage, the vector inner-product acceleration strategy and the communication-computation overlap strategy are adopted; by introducing a communication thread, communication waiting time is reduced and the inner product is computed faster.
Drawings
FIG. 1 is a flow chart of the finite element tearing and interconnecting acceleration method;
FIG. 2 is a comparison of dense-matrix memory occupation;
FIG. 3 is a comparison of dense matrix-vector multiplication computation time.
Detailed Description
The present invention is further analyzed with reference to the following specific example.
The finite element tearing and interconnecting method for numerical simulation of a reactor core assembly is applied here to predicting the deformation of a core assembly under high-temperature conditions.
A method for predicting the deformation of a core assembly in a nuclear reactor comprises the finite element tearing and interconnecting acceleration method for numerical simulation of the reactor core assembly; the deformation of the current core assembly is obtained from the solution results (i.e., the displacements of all nodes), providing a basis for the analysis and design of the core assembly.
The specific implementation steps and descriptions are as follows:
A finite element tearing and interconnecting system for numerical simulation of a reactor core assembly comprises an input module, a region dividing module, a matrix assembly module, a resource collection module, a load balancing module, an iteration solving module, and a local solving module.
The input module acquires the mesh file data and sets the initialization parameters.
The region dividing module divides the mesh into a number of regions and divides each region into a number of sub-regions.
The matrix assembly module generates the corresponding finite element matrices in each sub-region.
The resource collection module collects the dense-matrix size information of each process and compares memory occupation.
The load balancing module invokes the load balancing strategy and reallocates the dense matrices of the processes.
The iteration solving module solves for the displacement of the boundary nodes of each region using an existing iterative method, and invokes the vector inner-product acceleration strategy and the communication-computation overlap strategy.
The local solving module solves for the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing and interconnecting system, and each computing node is equipped with g GPU accelerators.
A finite element tearing and interconnecting method for numerical simulation of a reactor core assembly comprises the following specific steps, as shown in FIG. 1:
step 1: and acquiring geometric model data of the reactor core assembly, and performing meshing on the geometric model data through the existing software to generate a mesh file.
Step 2: each computing node acquires a grid file of the reactor core component through an input module, and initializes related parameters: finite element method, iteration method, maximum iteration number, iteration precision, reactor core component material parameters, reactor core component boundary conditions and the like.
The finite element method may be a FETI or HTFETI method.
The iterative method is the preconditioned conjugate gradient (PCG) method, used to solve the following finite element equations:
| F    G | | λ |   | d |
| G^T  0 | | α | = | e |

where:

F = B K^+ B^T
G = B R
d = B K^+ f
e = R^T f

The matrix B is the displacement compatibility matrix, which enforces equal node displacements on the contact faces of adjacent subdomains. The matrix K is the finite element stiffness matrix and K^+ is its generalized inverse. The matrix R is a basis of the null space of the stiffness matrix K, and f is the load vector. λ and α are the unknowns. By choosing the projection matrix P = I - G(G^T G)^-1 G^T, the variable α is eliminated, and the subdomain boundary node displacements λ are then obtained with the preconditioned conjugate gradient method.
The specific algorithm is described as follows:
[Algorithm listing (preconditioned conjugate gradient iteration) rendered as an image in the original document.]
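The core of the iteration can be sketched as a minimal conjugate gradient loop in pure Python (identity preconditioner and a tiny 2 x 2 SPD system for illustration; the projection P and the FETI operator F described above are omitted for brevity):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    """Solve A x = b for symmetric positive-definite A: the core loop
    that a preconditioned/projected FETI solver builds upon."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                 # residual r = b - A x (x = 0 initially)
    p = r[:]                 # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges in two iterations to approximately [1/11, 7/11]. In the FETI setting, the matrix-vector product inside this loop is exactly the dense GEMV stage that the patent offloads to the accelerators.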
and step 3: the grid input method comprises the following steps that n computing nodes are started, each computing node starts g processes, each process starts T threads, a grid obtained by an input module is divided into g x n regions through a region dividing module, and each region is distributed with one process; while each region is further divided into s sub-regions.
And 4, step 4: and generating a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method in each process, wherein each subdomain generates a dense matrix. Thus, each process will generate s dense matrices.
And 5: collecting dense matrix information of each process by using a resource collection module, wherein the memory occupied by the dense matrix of the process i is LiLet Lmin=min{L1,L2,L3…Ln*g},Lmax=max{L1,L2,L3…Ln*g}. If it is not
Figure BDA0002872332370000072
Considering that the reactor core assembly has a load imbalance phenomenon in the finite element processing process, adjusting by adopting a load balancing strategy, and entering step 6; otherwise, the load is considered to be balanced in the finite element processing process, and the step 7 is directly carried out.
Step 6: and starting a load balancing strategy through a load balancing module, and adjusting the memory occupied by the matrix of each process to be close to the average value.
The load balancing strategy specifically comprises:
a) calculating the average memory size of the dense matrix according to the memory size of the dense matrix of each process;
b) comparing the size of the dense matrix memory of each process with the average value, if the size is larger than the average value, considering that the calculated amount of the process is larger, needing the help of other processes, and setting the process as a helped person; if the average value is smaller than the average value, the calculated amount of the process is considered to be smaller, the process of the group can be helped, and the process can be set as a helper;
c) dividing the process into two groups, namely a helper group and a helped group, sorting each group according to the size of a dense matrix memory, and correspondingly selecting one helper and one helped group;
d) the helped person sends 1 dense matrix to the helper;
e) repeating the step d) until the dense matrix memory of the current helped person is smaller than the average value, and then changing the next helped person, or the dense matrix memory of the helper is larger than the average value, and then changing the next helper, and entering the step d);
d) repeating steps d) and e) until the memory of all helped persons is less than the average value, or the dense matrix memory of all helpers is greater than the average value.
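The steps above can be sketched in pure Python (the per-process matrix memory sizes are invented sample data; a real implementation would ship the matrices between MPI ranks rather than between lists):

```python
def balance(procs):
    """procs: one list of dense-matrix memory sizes per process.
    Greedily moves matrices from over-average (helped) processes to
    under-average (helper) processes, one matrix at a time, and
    returns the transfer log as (from, to, size) tuples."""
    avg = sum(sum(p) for p in procs) / len(procs)
    helped = sorted((i for i, p in enumerate(procs) if sum(p) > avg),
                    key=lambda i: -sum(procs[i]))   # largest load first
    helpers = sorted((i for i, p in enumerate(procs) if sum(p) < avg),
                     key=lambda i: sum(procs[i]))   # smallest load first
    transfers = []
    hi = 0
    for d in helped:
        while sum(procs[d]) > avg and hi < len(helpers):
            h = helpers[hi]
            m = procs[d].pop()          # the helped process sends 1 matrix
            procs[h].append(m)
            transfers.append((d, h, m))
            if sum(procs[h]) > avg:     # this helper is full; take the next
                hi += 1
    return transfers
```

For example, `balance([[10, 10, 10], [2, 2, 2], [4, 4, 4]])` moves two matrices from process 0 to process 1 and then stops, because process 0 has dropped below the average.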
And 7: and carrying out iterative solution on each process through an iterative solution module, wherein a vector inner product acceleration strategy and a communication calculation overlapping strategy are adopted for vector inner product operation in each iteration step of the iterative solution, and a dense matrix vector multiplication is calculated on a GPU-like accelerator by adopting HIP programming and a dynamic matrix allocation strategy is adopted.
The vector inner product acceleration strategy is to solve the local vector inner product of each process in parallel by multiple threads.
The communication calculation overlap strategy uses 1 thread for communication for each process, the rest t-1 threads continue to participate in the calculation of the local vector inner product, and when the communication thread finishes the communication, the thread participates in the calculation of the local vector inner product again. The specific algorithm is described as follows:
[Algorithm listing (multithreaded inner product with communication-computation overlap) rendered as an image in the original document.]
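The overlap strategy can be sketched with Python threads: T - 1 worker threads compute chunks of the local inner product while one dedicated thread performs the communication. Here the communication is simulated with a short sleep standing in for the MPI call, and the thread count and data are illustrative:

```python
import threading
import time

def overlapped_dot(x, y, T=4):
    """Local inner product computed by T-1 worker threads while 1 thread
    'communicates' concurrently (a stand-in for an MPI reduction)."""
    n = len(x)
    workers = T - 1
    partial = [0.0] * workers

    def work(k):
        lo, hi = (k * n) // workers, ((k + 1) * n) // workers
        partial[k] = sum(x[i] * y[i] for i in range(lo, hi))

    def communicate():
        time.sleep(0.001)   # placeholder for the actual communication call

    threads = [threading.Thread(target=work, args=(k,)) for k in range(workers)]
    comm = threading.Thread(target=communicate)
    comm.start()                      # communication overlaps the computation
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    comm.join()                       # the comm thread then rejoins the workers
    return sum(partial)
```

Because the communication thread runs concurrently with the workers, its latency is hidden behind useful computation instead of adding to the iteration's critical path.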
the dynamic allocation matrix strategy is characterized in that when dense matrix vector multiplication is carried out, 1 thread is used for calling a hipMALS library in each process, a block of GPU accelerator is used for carrying out dense matrix vector multiplication calculation, the rest T-1 threads call an Intel MKL library, and a CPU is used for carrying out dense matrix vector multiplication calculation. And dynamically allocating the matrix quantity to the CPU and the GPU-like accelerator according to the calculation time of the dense matrix vector multiplied by the CPU and the GPU-like accelerator during each iteration. The specific formula is as follows:
Figure BDA0002872332370000091
Figure BDA0002872332370000092
Figure BDA0002872332370000093
Figure BDA0002872332370000094
where N represents the total number of dense matrices that the current process needs to process,
Figure BDA0002872332370000095
representing the number of dense matrices assigned by the GPU-like accelerator in the next iteration,
Figure BDA0002872332370000096
indicating the number of dense matrices to which the CPU is assigned in the next iteration,
Figure BDA0002872332370000097
representing the number of dense matrices allocated by the last iteration class GPU accelerator,
Figure BDA0002872332370000098
representing the number of dense matrices, t, assigned by the CPU in the previous iterationcRepresenting the computation time, t, of the last iteration of the class GPU acceleratordRepresenting the computation time of the CPU in the last iteration, xc_subRepresenting the number of dense matrices, x, to which a single CPU core is allocatedtmpIs a temporary variable.
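Under the variable definitions above, the redistribution can be sketched as a throughput-proportional rule. Since the published formulas appear in the original only as images, the exact expressions below are an assumption consistent with those variable definitions, not a transcription:

```python
def next_allocation(N, x_gpu_prev, x_cpu_prev, t_gpu, t_cpu):
    """Split N dense matrices between GPU and CPU for the next iteration,
    in proportion to the throughput each achieved in the previous one.
    (Assumed formula; the patent's own expressions are image-only.)"""
    r_gpu = x_gpu_prev / t_gpu          # matrices per second on the accelerator
    r_cpu = x_cpu_prev / t_cpu          # matrices per second on the CPU threads
    x_tmp = N * r_gpu / (r_gpu + r_cpu) # ideal (fractional) GPU share
    x_gpu = round(x_tmp)
    x_cpu = N - x_gpu
    return x_gpu, x_cpu

def per_core(x_cpu, T):
    """Matrices handled by each of the T-1 CPU worker threads (x_c_sub)."""
    return x_cpu / (T - 1)
```

For example, if the GPU handled 50 matrices in 1.0 s and the CPU 50 matrices in 4.0 s in the previous iteration, `next_allocation(100, 50, 50, 1.0, 4.0)` gives `(80, 20)`: the faster device receives proportionally more work next time.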
And 8: and each process obtains the displacement of all the nodes by solving the displacement of the internal nodes through a local solving module according to the iteration solving result.
FIG. 2 compares the dense-matrix memory sizes before and after load balancing, showing that the load balancing strategy adjusts each process's load toward the average value, avoiding the long communication waits caused by load imbalance and fully utilizing cluster resources. FIG. 3 shows the computation time of the dense matrix-vector multiplication before and after load balancing; the comparison shows that the load balancing strategy effectively accelerates the solution.

Claims (9)

1. A finite element tearing and interconnecting method for numerical simulation of a reactor core assembly, characterized by comprising the following steps:
Step 1: Acquire the geometric model data of the reactor core assembly and mesh it to generate a mesh file;
Step 2: Each computing node acquires the mesh file of the reactor core assembly and initializes the relevant parameters;
Step 3: Each of the n computing nodes starts g processes, and each process starts T threads; the mesh of the reactor core assembly is divided into g × n regions, each region is assigned to one process, and each region is further divided into s sub-regions;
Step 4: In each process, according to the assigned region and the selected finite element method, the corresponding finite element matrices are generated in each subdomain, and each subdomain produces one dense matrix;
Step 5: Collect the dense-matrix information of each process and, after comparison, judge whether the load is balanced during finite element processing; if the load is considered unbalanced, go to step (6), otherwise go to step (7);
step 6: starting a load balancing strategy, and adjusting the size of the memory occupied by the matrix of each process to be close to the average value; the method comprises the following steps:
6-1, calculating the average value of the memory size of the dense matrix according to the memory size of the dense matrix of each process;
6-2 comparing the memory size of the dense matrix of each process with the average value, if the memory size of the dense matrix of each process is larger than the average value, considering that the calculation amount of the process is larger, and setting the process as a helped person; if the average value is smaller than the average value, the calculated amount of the process is considered to be smaller, and the process is set as a helper;
6-3, dividing the process into two groups, namely a helper group and a helped group, sorting each group according to the size of a dense matrix memory, and correspondingly selecting a helper and a helped person;
6-4 the helped person sends 1 dense matrix to the helper;
6-5, repeating the step 6-4 until the dense matrix memory of the current helped person is smaller than the average value, and then changing the next helped person, or the dense matrix memory of the helper is larger than the average value, and then changing the next helper, and entering the step 6-4;
6-6 repeating the steps 6-4 to 6-5 until the memories of all helped persons are smaller than the average value or the memories of all the dense matrixes of the helpers are larger than the average value;
and 7: each process carries out iterative solution, vector inner product operation in each iteration of the iterative solution adopts a vector inner product acceleration strategy and a communication calculation overlapping strategy, dense matrix vector multiplication is calculated on a GPU-like accelerator by adopting HIP programming, and a dynamic matrix allocation strategy is adopted;
and 8: and each process obtains the displacement of all the nodes by solving the iteration result and locally solving the displacement of the internal nodes.
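Steps 6-1 to 6-6 of claim 1 can be sketched as follows. This is an illustrative Python reading only, not the claimed implementation: each process is modelled as a list of its dense-matrix memory sizes, and "sending" a matrix is modelled as moving one list entry.

```python
# Hypothetical sketch of the load-balancing strategy (steps 6-1 to 6-6).
# The process/matrix representation and function name are illustrative.

def rebalance(procs):
    """procs: dict {pid: [matrix_memory_size, ...]}. Returns a rebalanced copy."""
    procs = {p: list(ms) for p, ms in procs.items()}
    avg = sum(sum(ms) for ms in procs.values()) / len(procs)  # step 6-1

    def load(p):
        return sum(procs[p])

    # Steps 6-2/6-3: split into helped (above average) and helpers (below),
    # each group sorted by current dense-matrix memory.
    helped = sorted((p for p in procs if load(p) > avg), key=load, reverse=True)
    helpers = sorted((p for p in procs if load(p) < avg), key=load)

    hi, lo = 0, 0
    while hi < len(helped) and lo < len(helpers):
        src, dst = helped[hi], helpers[lo]
        # Step 6-4: the helped process sends one dense matrix to the helper.
        procs[dst].append(procs[src].pop())
        # Step 6-5: advance once a process crosses the average.
        if load(src) <= avg:
            hi += 1
        if load(dst) >= avg:
            lo += 1
    return procs
```

Because one matrix moves per step and a drained process has zero load, the loop in this sketch always terminates (step 6-6).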
2. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly as claimed in claim 1, wherein judging the load balance in step (5) is specifically:
let the memory size occupied by the dense matrices of process i be L_i, and let L_min = min{L_1, L_2, ..., L_(n*g)} and L_max = max{L_1, L_2, ..., L_(n*g)}; if
[inequality image FDA0002872332360000021]
holds, where X represents a threshold value, load imbalance of the reactor core assembly is considered to occur in the finite element processing, the load balancing strategy is applied for adjustment, and the method proceeds to step 6; otherwise the load is considered balanced and the method proceeds directly to step 7.
3. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 1 or 2, wherein the vector inner product acceleration strategy of step (7) solves the local vector inner product of each process in parallel with multiple threads.
4. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 1 or 3, wherein in the communication-computation overlapping strategy of step (7), each process uses 1 thread for communication while the remaining T-1 threads continue computing the local vector inner product; the communication thread joins the local vector inner product computation after its communication completes.
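The combination of claims 3 and 4 can be sketched as follows. This is a minimal illustration using Python threads; the communication step is injected as a callable (in practice it would be, e.g., a non-blocking MPI exchange), and all names are assumptions.

```python
import threading

def overlapped_inner_product(x, y, T=4, communicate=lambda: None):
    """Multi-threaded local inner product (claim 3) where thread 0 first
    performs communication, then joins the computation (claim 4)."""
    partial = [0.0] * T
    chunk = (len(x) + T - 1) // T

    def work(tid, do_comm):
        if do_comm:
            communicate()  # overlapped with the other T-1 threads' computation
        lo = tid * chunk
        hi = min(lo + chunk, len(x))
        partial[tid] = sum(a * b for a, b in zip(x[lo:hi], y[lo:hi]))

    threads = [threading.Thread(target=work, args=(t, t == 0)) for t in range(T)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

In this sketch the communication thread simply computes its own chunk after communicating; a real FETI solver would additionally reduce the local results across processes.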
5. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 1 or 4, wherein the dynamic matrix allocation strategy of step (7) is specifically: when performing dense matrix-vector multiplication, each process uses 1 thread to call the hipBLAS library and perform the multiplication on the GPU-like accelerator, while the remaining T-1 threads call the Intel MKL library and perform the multiplication on the CPU; at each iteration, the numbers of matrices allocated to the CPU and the GPU-like accelerator are adjusted dynamically according to their dense matrix-vector multiplication computation times.
6. The finite element tearing and butt joint method oriented to numerical simulation of a reactor core assembly according to claim 5, wherein the dynamic matrix allocation of step (7) follows the update formulas below, which appear as equation images in the original document:
[four equation images, FDA0002872332360000022 through FDA0002872332360000025: the update rules for the next-iteration matrix counts]
where N represents the total number of dense matrices the current process needs to process; the first two image variables denote the numbers of dense matrices assigned to the GPU-like accelerator and to the CPU in the next iteration, and the next two denote the corresponding numbers in the previous iteration; t_c represents the computation time of the GPU-like accelerator in the previous iteration; t_d represents the computation time of the CPU in the previous iteration; x_c_sub represents the number of dense matrices assigned to a single CPU core; and x_tmp is a temporary variable.
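Since the exact update formulas of claim 6 are equation images in the original, the sketch below is only one plausible reading of the idea: re-split the N dense matrices between the GPU-like accelerator and the CPU in proportion to their measured throughput in the previous iteration. The proportional rule and all names are assumptions.

```python
def redistribute(N, x_gpu_prev, x_cpu_prev, t_c, t_d, cpu_cores):
    """Assumed reconstruction of the dynamic allocation of claim 6.
    N: total dense matrices of the current process; x_*_prev: previous
    allocation; t_c/t_d: previous computation times (accelerator/CPU)."""
    rate_gpu = x_gpu_prev / t_c        # matrices per second on the accelerator
    rate_cpu = x_cpu_prev / t_d        # matrices per second on the CPU side
    x_tmp = N * rate_gpu / (rate_gpu + rate_cpu)
    x_gpu = round(x_tmp)               # next-iteration accelerator share
    x_cpu = N - x_gpu                  # remainder goes to the CPU threads
    x_c_sub = x_cpu // cpu_cores       # matrices handled by each CPU core
    return x_gpu, x_cpu, x_c_sub
```

Under this rule, a device that finished its previous batch faster receives a proportionally larger share in the next iteration, which is the stated goal of the strategy.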
7. A finite element tearing and butt joint system for reactor numerical simulation, deployed on computing nodes each provided with g GPU-like accelerators, characterized by comprising an input module, a region division module, a matrix assembly module, a resource collection module, a load balancing module, an iterative solving module and a local solving module, wherein:
the input module acquires the mesh file data and sets the initialization parameters;
the region division module divides the mesh into a plurality of regions and divides each region into a plurality of subdomains;
the matrix assembly module generates the corresponding finite element matrix in each subdomain;
the resource collection module collects the dense matrix size information of each process and compares memory occupation;
the load balancing module invokes the load balancing strategy and redistributes the dense matrices of the processes;
the iterative solving module solves the displacements of the boundary nodes of each region by an iterative method, invoking the vector inner product acceleration strategy and the communication-computation overlapping strategy;
and the local solving module solves the displacements of the internal nodes of each region.
8. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-6.
9. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-6.
CN202011607981.8A 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly Active CN112733401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011607981.8A CN112733401B (en) 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly


Publications (2)

Publication Number Publication Date
CN112733401A true CN112733401A (en) 2021-04-30
CN112733401B CN112733401B (en) 2024-03-12

Family

ID=75610898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011607981.8A Active CN112733401B (en) 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly

Country Status (1)

Country Link
CN (1) CN112733401B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102232282A (en) * 2010-10-29 2011-11-02 华为技术有限公司 Method and apparatus for realizing load balance of resources in data center
CN103731498A (en) * 2013-12-31 2014-04-16 浙江鸿程计算机系统有限公司 Big data real-time enquiry system load balancing method based on copy selection
CN105045670A (en) * 2015-09-01 2015-11-11 浪潮(北京)电子信息产业有限公司 Method and system for balancing loads of central processing units and graphic processing units
CN110472187A (en) * 2019-08-06 2019-11-19 中国原子能科学研究院 A kind of load balancing parallel method of the three-dimensional neutron transport method of characteristic curves
CN112016232A (en) * 2020-08-31 2020-12-01 中国原子能科学研究院 Tear finite element process processing method and system


Non-Patent Citations (3)

Title
Radim Vavřík et al.: "Acceleration Techniques for FETI Solvers for GPU Accelerators", International Conference on High Performance Computing & Simulation *
刘朵 et al.: "A greedy algorithm for Reduce load balancing on the Hadoop platform", 《计算机应用研究》 (Application Research of Computers), vol. 33, no. 9, p. 2658
宛汀 et al.: "Finite element-boundary integral combined with tearing and interconnecting for electromagnetic scattering analysis", 《系统工程与电子技术》 (Systems Engineering and Electronics), vol. 32, no. 9

Also Published As

Publication number Publication date
CN112733401B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
Ida Lattice H-matrices on distributed-memory systems
CN112101525A (en) Method, device and system for designing neural network through NAS
Iwashita et al. Software framework for parallel BEM analyses with H-matrices using MPI and OpenMP
Alonso et al. An efficient multiblock method for aerodynamic analysis and design on distributed memory systems
Lastovetsky et al. Data distribution for dense factorization on computers with memory heterogeneity
Tesser et al. Improving the performance of seismic wave simulations with dynamic load balancing
Kopysov et al. Hybrid Multi-GPU solver based on Schur complement method
Ida et al. Parallel hierarchical matrices with block low-rank representation on distributed memory computer systems
CN112733401A (en) Finite element tearing and butt joint method and system for numerical simulation of reactor core assembly
CN108879691B (en) Large-scale continuous power flow calculation method and device
KR20190093932A (en) Arithmetic processing apparatus and method in deep running system
CN109101708B (en) Implicit finite element parallel method based on two-stage region decomposition
Li et al. Performance optimization algorithm of radar signal processing system
Lastovetsky et al. Data partitioning for multiprocessors with memory heterogeneity and memory constraints
Sarje et al. Parallel performance optimizations on unstructured mesh-based simulations
Heuveline et al. Parallel smoothers for matrix-based geometric multigrid methods on locally refined meshes using multicore CPUs and GPUs
CN108599173B (en) Method and device for solving batch power flows
CN114021070A (en) Deep convolution calculation method and system based on micro-architecture processor
Sarma et al. Exploiting activation based gradient output sparsity to accelerate backpropagation in CNNs
CN112016232A (en) Tear finite element process processing method and system
Korch et al. Parallelization of particle-in-cell codes for nonlinear kinetic models from mathematical physics
Marrakchi et al. Static scheduling with load balancing for solving triangular band linear systems on multicore processors
Pinar et al. Improving load balance with flexibly assignable tasks
Gao et al. Revisiting thread configuration of SpMV kernels on GPU: A machine learning based approach
CN116151340B (en) Parallel random computing neural network system and hardware compression method and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant