CN109101708A - Implicit finite element parallel method based on two-level domain decomposition - Google Patents


Info

Publication number: CN109101708A
Authority: CN (China)
Legal status: Granted
Application number: CN201810826770.XA
Other languages: Chinese (zh)
Other versions: CN109101708B (en)
Inventors: 付朝江, 王天奇, 林悦荣, 潘钦锋
Current assignee: Fujian University of Technology
Original assignee: Fujian University of Technology
Application filed by Fujian University of Technology
Priority to CN201810826770.XA
Publication of CN109101708A
Application granted
Publication of CN109101708B
Current legal status: Active


Classifications

    • G06F30/23 Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization


Abstract

The present invention provides an implicit finite element parallel method based on two-level domain decomposition. The method comprises: establishing the solution procedure for implicit finite element nonlinear analysis; performing two-level decomposition of the domain and establishing the parallel solution steps for implicit finite element nonlinear analysis; selecting a preconditioner and building an LPCG solver to solve the equilibrium equations arising in Newton iteration; constructing a dependency graph; and parallelizing the preconditioner with a weighted balanced colouring algorithm while overlapping computation with communication. The advantages of the invention are: two-level domain decomposition lets each processor perform fine-grained parallel computation; combining an unstructured dependency graph with a weighted colouring algorithm achieves good parallelism both within and between regions; and the HW preconditioner reduces computation time and gives better performance.

Description

Implicit finite element parallel method based on two-level domain decomposition
Technical field
The present invention relates to the field of nonlinear dynamic finite element analysis of structures, and in particular to an implicit finite element parallel method based on two-level domain decomposition.
Background art
In practical engineering problems, nonlinear dynamic finite element analysis of structures is a computation-intensive task. Numerical simulation with the traditional serial finite element method on a single machine requires very long CPU time, and the limitations of serial computation are increasingly apparent. For nonlinear dynamic analysis of structures, the implicit finite element scheme is an effective way to obtain the time-history response. In each time step the nonlinear equations are solved by Newton iteration, and iteratively solving the resulting linear equilibrium equations takes a large share of the computation time. Parallel computation can substantially reduce structural analysis time and makes detailed analysis of large, complex structures feasible, so developing parallel solution methods with high parallel efficiency is of great importance.
Parallel computation solves the dynamic equilibrium equations in parallel by domain decomposition. Existing domain decomposition methods fall into traditional domain decomposition methods and iterative domain decomposition methods. Traditional domain decomposition methods mostly use parallel direct solvers; iterative domain decomposition methods use parallel iterative solvers, such as the linear preconditioned conjugate gradient (LPCG) method.
Existing sparse direct methods solve the system of equations by factorizing the coefficient matrix. Although they reliably solve nonsingular systems, their operation count and memory requirements grow rapidly with problem size. Iterative methods usually scale better in parallel and need less memory, making them suitable for both distributed-memory and shared-memory systems. However, for systems with large condition numbers, iterative methods may fail to converge within a reasonable time, or may not converge at all.
The element-by-element (EBE) method was developed early on to reduce memory requirements, and it has great advantages on fine-grained vector parallel computers. On current hardware, EBE lends itself naturally to element-based domain decomposition. An EBE-based LPCG solver requires that the preconditioner be expressible as element-level computations. To make parallel computation scalable, the preconditioner must suit distributed-memory computers. The preconditioners commonly used with EBE solvers are the diagonal preconditioner and the Hughes-Winget (HW) preconditioner.
Although the diagonal preconditioner is easy to parallelize, it converges poorly unless the system is strongly diagonally dominant, whereas HW often provides a better preconditioner. On coarse-grained parallel computers, HW preconditioning is implemented by partitioning the finite element mesh into regions, with each processor performing the preconditioning computation on its own region. Each region consists of boundary elements and interior elements. The HW preconditioning of interior elements proceeds simultaneously on all processors, but the computation for boundary elements needs careful synchronization to prevent processors from modifying the values of shared nodes at the same time.
Attar proposed distributing interior elements among processors while keeping all boundary elements on one processor, so interior elements are processed in parallel but boundary elements serially. For large three-dimensional problems, the serial computation of boundary elements dominates the solution speed and reduces parallel performance. King implemented HW preconditioning with a simple 1-D topological domain decomposition. However, finite element meshes for three-dimensional solid mechanics usually have highly unstructured topology and cannot be divided into regions with simple boundaries. Efficient parallel HW preconditioning on such highly unstructured models requires a scheduling algorithm that preserves synchronization and element ordering while keeping good load balance and minimizing communication delay between processors. A review of the existing literature shows that existing scheduling algorithms have difficulty maintaining good load balance, which increases communication delay between processors and lowers computational efficiency.
Summary of the invention
The technical problem to be solved by the present invention is to provide an implicit finite element parallel method based on two-level domain decomposition that reduces computation time and gives better performance.
The present invention is implemented as follows: an implicit finite element parallel method based on two-level domain decomposition, the method comprising:
Establishing the solution procedure for implicit finite element nonlinear analysis;
Performing two-level domain decomposition on the domain and establishing the parallel solution steps for implicit finite element nonlinear analysis;
Selecting a preconditioner and building an LPCG solver to solve the equilibrium equations of the Newton iteration;
Constructing a dependency graph;
Parallelizing the preconditioner with a weighted balanced colouring algorithm, and overlapping computation with communication.
Further, establishing the solution procedure for implicit finite element nonlinear analysis specifically comprises:
Initialization step:
Compute the equivalent nodal loads;
Determine the effective load increment;
Compute the norm of the effective load increment;
Newton loop step:
Step a, compute the element stiffness matrices;
Step b, compute the nodal masses;
Step c, compute the effective stiffness matrix and load vector;
Step d, solve for the new displacement increment with the EBE-based LPCG solver;
Step e, compute strains, stresses and the internal force vector;
Step f, compute the residual force and its norm;
Step g, check whether the residual force has converged; if not, return to step a; if so, update the strain and stress values and end the loop.
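The structure of the Newton loop can be illustrated with a minimal single-degree-of-freedom sketch. It assumes a static problem, so the mass and effective-stiffness terms of steps b and c drop out and the LPCG solve of step d reduces to a division; `newton_solve` and its arguments are hypothetical names for illustration only.

```python
def newton_solve(internal_force, tangent, f_ext, u0, tol=1e-8, max_iter=25):
    """Scalar Newton-Raphson sketch: find u such that internal_force(u) == f_ext."""
    u = u0
    for it in range(max_iter):
        residual = f_ext - internal_force(u)               # step f: residual force
        if abs(residual) <= tol * max(abs(f_ext), 1.0):    # step g: convergence check
            return u, it
        k_t = tangent(u)                                   # step a: (re)compute tangent stiffness
        u += residual / k_t                                # step d: solve K_T du = R (scalar here)
    raise RuntimeError("Newton iteration did not converge")
```

For example, a hardening spring with internal force u + u**3 loaded by f_ext = 2 converges to u = 1 in a few iterations.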
Further, performing two-level domain decomposition on the domain and establishing the parallel solution steps for implicit finite element nonlinear analysis specifically comprises:
First, the finite element mesh of the domain is partitioned into regions, and each region is assigned to one processor. Then each region is divided into homogeneous element blocks, such that all elements within each homogeneous element block have the same element type, integration order, strain-displacement relation and material constitutive model. Once the division is complete, the parallel solution steps for implicit finite element nonlinear analysis can be established.
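A minimal sketch of the two levels, assuming the region assignment comes from some mesh partitioner (the patent does not name one) and that each element carries the four attributes that must match within a homogeneous block. The attribute keys and the function `two_level_decompose` are hypothetical.

```python
from collections import defaultdict

def two_level_decompose(elements, region_of):
    """Level 1: group elements by region (one region per processor).
    Level 2: inside each region, group elements into homogeneous blocks whose
    members share element type, integration order, kinematics and material model."""
    blocks = defaultdict(lambda: defaultdict(list))
    for e, attr in elements.items():
        key = (attr["type"], attr["order"], attr["kinematics"], attr["material"])
        blocks[region_of[e]][key].append(e)
    # region id -> list of homogeneous blocks (each a list of element ids)
    return {r: list(b.values()) for r, b in blocks.items()}
```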
Further, selecting a preconditioner and building an LPCG solver to solve the equilibrium equations of the Newton iteration specifically comprises:
The HW preconditioner is used, and an EBE-based LPCG solver is built on the LPCG algorithm. HW preconditioning applies a Crout decomposition to the preconditioning matrix C of the LPCG algorithm, so that applying C requires only element-level operations;
The linear system $K_T \Delta u = R$ is taken as the equilibrium equations of the Newton iteration, where $K_T$ is the tangent stiffness matrix, $\Delta u$ is the correction to the displacement increment and $R$ is the dynamic residual vector, and $K_T \Delta u = R$ is solved with the EBE-based LPCG solver.
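The key EBE operation inside the LPCG solver is the matrix-vector product, formed from element matrices without ever assembling $K_T$. A minimal pure-Python sketch, assuming each element is given as its global degree-of-freedom indices plus a dense local matrix (`ebe_matvec` is a hypothetical name):

```python
def ebe_matvec(elements, p, ndof):
    """Return K p where K = sum of element matrices, without assembling K.
    elements: list of (dofs, ke), dofs = global indices, ke = dense local matrix."""
    y = [0.0] * ndof
    for dofs, ke in elements:
        local = [p[g] for g in dofs]                  # gather element entries of p
        for a, ga in enumerate(dofs):
            # scatter-add the element contribution (ke @ local)[a]
            y[ga] += sum(ke[a][b] * local[b] for b in range(len(dofs)))
    return y
```

For two unit bar elements sharing node 1, the result equals the product with the assembled tridiagonal matrix.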
Further, constructing the dependency graph specifically comprises:
The element-ordering and concurrency problem within each region is converted into a scheduling problem on a dependency graph, which describes the processing dependencies between elements located on region boundaries. Specifically, graph nodes are used to decompose and define the boundary groups of each region, and the dependencies between boundary groups are established by defining connections between graph nodes; the remainder of each region, outside its boundary groups, consists of interior elements.
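A minimal sketch of building the dependency graph. It assumes, following the embodiment's remark that all elements in a boundary group share one communication pattern, that a boundary group is keyed by the set of neighbouring regions its elements share nodes with, and that two boundary groups in different regions are connected when they touch a common node. `build_dependency_graph` and its input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_dependency_graph(connectivity, region_of):
    """connectivity: element -> node ids; region_of: element -> region (processor).
    Returns interior elements per region, boundary groups, and group-to-group edges."""
    node_regions = defaultdict(set)
    for e, nodes in connectivity.items():
        for v in nodes:
            node_regions[v].add(region_of[e])
    interior = defaultdict(list)
    groups = defaultdict(list)        # graph node (region, neighbour regions) -> elements
    group_of = {}
    for e, nodes in connectivity.items():
        r = region_of[e]
        others = frozenset(set().union(*(node_regions[v] for v in nodes)) - {r})
        if others:                    # element touches another region: boundary group
            group_of[e] = (r, others)
            groups[(r, others)].append(e)
        else:
            interior[r].append(e)
    node_groups = defaultdict(set)    # node -> boundary groups touching it
    for e, key in group_of.items():
        for v in connectivity[e]:
            node_groups[v].add(key)
    edges = set()
    for keys in node_groups.values():
        for a, b in combinations(keys, 2):
            if a[0] != b[0]:          # dependency only between different regions
                edges.add(frozenset((a, b)))
    return interior, groups, edges
```

On a chain of six 1-D elements split over three regions, this yields two interior elements, four boundary groups and two dependency edges.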
Further, parallelizing the preconditioner with the weighted balanced colouring algorithm specifically comprises: the weighted balanced colouring algorithm splits the graph nodes into parallel groups such that no two graph nodes in the same parallel group are connected. The weighted balanced colouring algorithm is implemented as follows:
Step 11, list all graph nodes in descending order of weight factor;
Step 12, construct connections between all graph nodes on the same processor;
Step 13, select an independent set of nodes from the currently unassigned nodes using the colouring algorithm;
Step 14, find the graph node with the most elements in each parallel group and take its element count as the target;
Step 15, loop over each processor whose element count is below the target, specifically:
(1) the processor loops over its graph nodes;
(2) if a graph node does not conflict with any node in the parallel group and adding it will not exceed the target element count, the graph node is added to the parallel group; otherwise it is not added;
(3) all newly added graph nodes are marked as assigned;
(4) check whether any unassigned nodes remain; if so, continue with step 15; if not, end the loop.
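The colouring steps above can be sketched as follows. Several points are interpretations rather than the patent's stated code: the weight of a graph node is taken as its estimated work (for example its element count), step 12's same-processor connections are used only while picking the initial independent set (so each processor contributes at most one node there), and step 15 then lets processors below the target add further nodes that have no dependency edge into the group. `weighted_balanced_colouring` is a hypothetical name.

```python
def weighted_balanced_colouring(weight, owner, edges):
    """weight: graph node -> estimated work; owner: graph node -> processor;
    edges: iterable of (a, b) dependency pairs. Returns a list of parallel groups."""
    adj = {n: set() for n in weight}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    unassigned = sorted(weight, key=lambda n: -weight[n])      # step 11
    parallel_groups = []
    while unassigned:
        # Steps 12-13: greedy independent set; same-processor nodes count as connected.
        group, used_procs = [], set()
        for n in unassigned:
            if owner[n] not in used_procs and not adj[n].intersection(group):
                group.append(n)
                used_procs.add(owner[n])
        # Step 14: the heaviest load per processor in the group sets the target.
        load = {}
        for n in group:
            load[owner[n]] = load.get(owner[n], 0) + weight[n]
        target = max(load.values())
        # Step 15: processors below the target take extra non-conflicting nodes.
        for n in unassigned:
            p = owner[n]
            if n in group or load.get(p, 0) + weight[n] > target:
                continue
            if not adj[n].intersection(group):
                group.append(n)
                load[p] = load.get(p, 0) + weight[n]
        parallel_groups.append(group)
        unassigned = [n for n in unassigned if n not in group]
    return parallel_groups
```

Every returned group is an independent set of the dependency graph, so its boundary work can run on all processors at once.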
Further, overlapping computation and communication specifically comprises: when computation of a new parallel group begins, each processor in the parallel group performs the preconditioning computation for all the elements it owns in that group. As soon as a processor finishes, it starts a non-blocking send, transmitting the updated values of the nodes on its region boundary to all adjacent processors. Then, to overlap communication with computation, the preconditioning computation proceeds with a fixed number of allocated interior elements. After these interior elements have been computed, the processor checks whether the non-blocking sends and receives have completed; if so, it starts computing the next parallel group; if not, it waits until they complete and then starts the next parallel group.
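The overlap pattern on one processor can be sketched with a single background thread standing in for non-blocking communication; in an MPI code, `submit` and `result` would roughly correspond to posting `MPI_Isend`/`MPI_Irecv` and later calling `MPI_Waitall`. `run_overlapped`, `compute`, `exchange` and the fixed `quota` of interior elements per parallel group are hypothetical names for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_overlapped(parallel_groups, interior, quota, compute, exchange):
    """parallel_groups: this processor's boundary elements per parallel group;
    interior: interior elements; quota: interior elements processed per group."""
    pending = list(interior)
    with ThreadPoolExecutor(max_workers=1) as comm:
        for g, elems in enumerate(parallel_groups):
            for e in elems:
                compute(e)                      # boundary work of parallel group g
            request = comm.submit(exchange, g)  # start the "non-blocking" exchange
            batch, pending = pending[:quota], pending[quota:]
            for e in batch:
                compute(e)                      # interior work overlaps the exchange
            request.result()                    # wait before starting the next group
    for e in pending:
        compute(e)                              # any interior elements left over
```

The design keeps all element computation on the main thread, so only the communication runs concurrently, mirroring the schedule described above.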
The present invention has the advantage that
(1) level-2 area is taken to decompose.It is region that the first order, which is by FEM meshing, and each region can be realized well Load balance and reduce processor between communication;The second level is by each Region Decomposition into homogeneous unit block, by each In region carry out cell block division so that cell level calculating carried out in each cell block, in this way, it is inner most circulation always into Row unit calculates, and occupies very big workload using the inside circulation of this framework creation, is conducive to local parallel;
(2) it by defining boundary group, construct related figure and realize parallelization using colouring algorithm, can avoid algorithm to having The Region Decomposition for limiting first grid applies limitation.By reducing the sum of boundary group, it can be achieved that good Region Decomposition is logical to reduce Letter;It takes weighting colouring algorithm to form parallel group, internal element is distributed to parallel group, load balance can be refined, and make simultaneously It being capable of overlapping communication and calculating between row group;
(3) HW pretreatment sublist reveals more preferable convergence property, can make to calculate time reduction using HW pretreatment, have more Good performance.
Brief description of the drawings
The present invention is further described below with reference to the drawings and embodiments.
Fig. 1 is the execution flowchart of the implicit finite element parallel method based on two-level domain decomposition according to the present invention.
Fig. 2 is a schematic diagram of the two-level domain decomposition in the present invention.
Fig. 3 shows the parallel construction process of the preconditioner in the present invention.
Fig. 4 is the circular shell computation model in the embodiment.
Fig. 5 is the displacement response of the central node in the embodiment.
Fig. 6 is the first schematic diagram of the speed-up ratio of the algorithms in the embodiment (41,600 elements).
Fig. 7 is the second schematic diagram of the speed-up ratio of the algorithms in the embodiment (183,200 elements).
Detailed description of the embodiments
Referring to Figs. 1 to 7, a preferred embodiment of the implicit finite element parallel method based on two-level domain decomposition according to the present invention comprises:
Step S1, establish the solution procedure for implicit finite element nonlinear analysis;
In a specific implementation, step S1 comprises the following steps:
Step S11, initialization:
Compute the equivalent nodal loads;
Determine the effective load increment;
Compute the norm of the effective load increment;
Step S12, Newton loop:
Step a, compute the element stiffness matrices;
Step b, compute the nodal masses;
Step c, compute the effective stiffness matrix and load vector;
Step d, solve for the new displacement increment with the EBE-based LPCG solver;
Step e, compute strains, stresses and the internal force vector;
Step f, compute the residual force and its norm;
Step g, check whether the residual force has converged; if not, return to step a; if so, update the strain and stress values and end the loop.
If the above solution procedure for implicit finite element nonlinear analysis were simulated with a traditional serial finite element method, it would require very long CPU time, whereas parallel computation can substantially reduce structural analysis time. Parallel solution steps therefore need to be established.
Step S2, perform two-level domain decomposition on the domain and establish the parallel solution steps for implicit finite element nonlinear analysis;
Step S2 specifically comprises: first, the finite element mesh of the domain is partitioned into regions, and each region is assigned to one processor; then each region is divided into homogeneous element blocks, such that all elements in each homogeneous element block have the same element type, integration order, strain-displacement relation and material constitutive model. The inner loops created by this structure carry a large share of the workload, which favours local parallelism.
The region division in step S2 is further described with reference to Fig. 2. The first level is the region level: the finite element mesh is partitioned into regions and each region is assigned to one processor, which achieves efficient coarse-grained parallelism. The second level is the block level: each region is further subdivided into homogeneous element blocks to achieve efficient fine-grained parallelism on each processor. Because each region is divided into element blocks, element-level computations can be performed block by block, so the innermost loop always runs over elements. For example, in the stiffness computation the outer loop runs over element blocks, the next loop over Gauss points, and the innermost loop over the elements in the block.
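The loop order for the stiffness computation can be illustrated with 2-node bar elements integrated by 2-point Gauss quadrature, a deliberately simple stand-in for the 8-node shell elements used later. Within a homogeneous block every element does identical work, so in a compiled implementation the innermost loop over elements is the one that vectorizes. `block_stiffness` and `stiffness_by_blocks` are hypothetical names.

```python
import math

GAUSS_2 = [(-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0)]  # (point, weight)

def block_stiffness(block):
    """block: list of (EA, length) for 2-node bars of one homogeneous block.
    Loop order: Gauss points outside, elements innermost."""
    ke = [[[0.0, 0.0], [0.0, 0.0]] for _ in block]
    for _xi, w in GAUSS_2:              # B is constant for a linear bar, so xi is unused
        for i, (ea, length) in enumerate(block):    # innermost loop over elements
            b = (-1.0 / length, 1.0 / length)       # strain-displacement matrix B
            det_j = length / 2.0                    # Jacobian of the parent mapping
            for a in range(2):
                for c in range(2):
                    ke[i][a][c] += w * det_j * b[a] * ea * b[c]
    return ke

def stiffness_by_blocks(blocks):
    return [block_stiffness(block) for block in blocks]   # outermost loop over blocks
```

Each bar stiffness comes out as EA/L times [[1, -1], [-1, 1]], as expected.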
Once the division is complete, the parallel solution steps for implicit finite element nonlinear analysis can be established; the parallel solution steps obtained after division in the present invention are shown in Table 1.
Table 1 Parallel solution steps
Step S3, select a preconditioner and build an LPCG solver to solve the equilibrium equations of the Newton iteration;
Step S3 specifically comprises:
The HW preconditioner is used, and an EBE-based LPCG solver is built on the LPCG algorithm. HW preconditioning applies a Crout decomposition to the preconditioning matrix C of the LPCG algorithm, so that applying C requires only element-level operations;
The LPCG algorithm is as follows:
In the LPCG algorithm above, r denotes the linear residual, u the displacement vector and C the preconditioning matrix. The matrix-vector product Kp and the preconditioning step consume most of the solution time; the remaining work consists of simple vector operations (dot products, vector addition and subtraction). The EBE implementation avoids both forming the inverse $C^{-1}$ and explicitly assembling the structural stiffness matrix $K_T$: the product is formed by summing the element contributions $K_T^{(e)} p_i^{(e)}$.
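For reference, a textbook preconditioned conjugate gradient loop with the structure described above (one product with K and one preconditioning step per iteration, plus dot products and vector updates) is sketched below. It is a generic PCG, not a reproduction of the patent's algorithm listing; `pcg`, `matvec` and `apply_prec` are hypothetical names, and `matvec` could be an EBE product while `apply_prec` could be a diagonal or HW preconditioner.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(matvec, apply_prec, b, tol=1e-10, max_iter=200):
    """Preconditioned CG for K x = b, K symmetric positive definite.
    matvec(p) returns K p; apply_prec(r) returns C^-1 r. Returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                         # residual b - K x for the start value x = 0
    z = apply_prec(r)                   # preconditioning step
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        q = matvec(p)                   # the expensive K p product
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```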
The linear system $K_T \Delta u = R$ is taken as the equilibrium equations of the Newton iteration, where $K_T$ is the tangent stiffness matrix, $\Delta u$ is the correction to the displacement increment and $R$ is the dynamic residual vector; the EBE-based LPCG solver is used to solve $K_T \Delta u = R$.
In the actual solution, an effective preconditioning matrix C should approximate $K_T$. To build the preconditioner, $K_T$ is approximated by a product of element matrices: because $K_T$ is assembled as a sum of element stiffness matrices, that sum is approximated by a product.
To this end, the structural stiffness matrix is written as:
In formula (1), $D_s$ denotes the diagonal of $K_T$.
$K_T$ is expressed as a sum of element stiffness matrices:
In formula (2), $K_e$ is the tangent stiffness matrix of element e, and $D_e$ is the diagonal of that tangent stiffness matrix.
The approximation of $K_T$ is written in product form:
In formula (3), the inner factor is called the Winget regularization of the tangent stiffness matrix of element e. The final form of the HW preconditioner is obtained from the Crout decomposition of the Winget regularization of each element.
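For reference, the standard Hughes-Winget construction, which the definitions above ($D_s$, $K_e$, $D_e$, Winget regularization) appear to follow, can be written as below. This is a hedged restatement in our own notation ($\bar{K}_e$ and $n_e$ are introduced here, and each $\bar{K}_e$ is the identity outside the degrees of freedom of element e), not a reproduction of formulas (1) to (3).

```latex
\begin{aligned}
K_T &= D_s^{1/2}\left(I + D_s^{-1/2}(K_T - D_s)D_s^{-1/2}\right)D_s^{1/2}, && D_s = \operatorname{diag}(K_T) \\
K_T &= \sum_{e=1}^{n_e} K_e, \quad D_s = \sum_{e=1}^{n_e} D_e, && D_e = \operatorname{diag}(K_e) \\
K_T &\approx D_s^{1/2}\left(\prod_{e=1}^{n_e} \bar{K}_e\right)D_s^{1/2}, && \bar{K}_e = I + D_s^{-1/2}(K_e - D_e)D_s^{-1/2}
\end{aligned}
```

The product is a first-order approximation: summing the element terms gives $\sum_e (\bar{K}_e - I) = D_s^{-1/2}(K_T - D_s)D_s^{-1/2}$, and $\prod_e (I + A_e) = I + \sum_e A_e$ plus higher-order terms.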
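To obtain a symmetric preconditioning matrix, the factors can be rearranged as:
In formula (4), the triangular factors come from the Crout decomposition of the Winget-regularized element matrices, and the diagonal factors are their corresponding diagonals.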
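The symmetric form can be applied without assembling any global matrix. The following pure-Python sketch assumes the standard symmetric Hughes-Winget form $C = D_s^{1/2}(\prod_{e=1}^{n_e} L_e)(\prod_{e=1}^{n_e} \bar{D}_e)(\prod_{e=n_e}^{1} L_e^T)D_s^{1/2}$, with each Winget-regularized element matrix factorized as $L_e \bar{D}_e L_e^T$ (the symmetric variant of a Crout decomposition; it assumes these matrices are positive definite). The names `ldl`, `hw_setup` and `hw_apply` are hypothetical. Setup needs only element data plus the assembled diagonal; applying $C^{-1}$ is a forward sweep over elements, a diagonal scaling, and a backward sweep in reverse element order.

```python
import math

def ldl(a):
    """Factor a small symmetric matrix as a = L D L^T with unit lower-triangular L."""
    n = len(a)
    low = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(low[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            low[i][j] = (a[i][j] - sum(low[i][k] * low[j][k] * d[k] for k in range(j))) / d[j]
    return low, d

def hw_setup(elements, ndof):
    """elements: list of (dofs, ke). Returns the assembled diagonal W and, per element,
    the LDL^T factors of its Winget regularization I + W^-1/2 (ke - diag(ke)) W^-1/2."""
    w = [0.0] * ndof
    for dofs, ke in elements:
        for a, g in enumerate(dofs):
            w[g] += ke[a][a]
    factors = []
    for dofs, ke in elements:                    # no communication needed here
        s = [1.0 / math.sqrt(w[g]) for g in dofs]
        n = len(dofs)
        kbar = [[1.0 if a == b else ke[a][b] * s[a] * s[b] for b in range(n)] for a in range(n)]
        factors.append((dofs,) + ldl(kbar))
    return w, factors

def hw_apply(w, factors, r):
    """Return z = C^-1 r for the symmetric HW preconditioner."""
    z = [ri / math.sqrt(wi) for ri, wi in zip(r, w)]
    for dofs, low, d in factors:                 # forward sweep, elements 1..n
        for a in range(len(dofs)):
            z[dofs[a]] -= sum(low[a][b] * z[dofs[b]] for b in range(a))
    for dofs, low, d in factors:                 # diagonal scaling by every D_e
        for a, g in enumerate(dofs):
            z[g] /= d[a]
    for dofs, low, d in reversed(factors):       # backward sweep, elements n..1
        for a in reversed(range(len(dofs))):
            z[dofs[a]] -= sum(low[b][a] * z[dofs[b]] for b in range(a + 1, len(dofs)))
    return [zi / math.sqrt(wi) for zi, wi in zip(z, w)]
```

Two sanity checks follow from the formula: for a single element, C reproduces $K_T$ exactly, so `hw_apply` returns $K_T^{-1} r$; for purely diagonal element matrices, it reduces to the diagonal (Jacobi) preconditioner.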
To determine the preconditioning matrix C, the Crout decomposition of each element can be computed fully in parallel across regions without any communication, and this initialization accounts for only a small fraction of the total computation. Elements are combined into element blocks such that no two elements in a block share a common node, which resolves the element-ordering and concurrency problem. The preconditioning step is then carried out serially over the element blocks, with vectorized operations inside each block, which makes coarse-grained parallelism achievable in both distributed-memory and shared-memory parallel environments.
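Grouping elements into blocks in which no two elements share a node is an element colouring problem. A simple first-fit greedy sketch is shown below; the patent does not specify which colouring heuristic it uses, and `colour_elements` is a hypothetical name.

```python
def colour_elements(connectivity):
    """Greedy element colouring: elements placed in the same block never share a
    node, so their contributions can be scattered without write conflicts."""
    blocks = []                              # list of (element list, nodes already used)
    for e, nodes in connectivity.items():
        for elems, used in blocks:
            if used.isdisjoint(nodes):       # first block with no shared node
                elems.append(e)
                used.update(nodes)
                break
        else:                                # conflicts with every block: open a new one
            blocks.append(([e], set(nodes)))
    return [elems for elems, _ in blocks]
```

For a chain of four 1-D elements this alternates the elements between two blocks.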
Step S4, construct the dependency graph, i.e., describe the dependencies with an abstract topological graph;
Step S4 specifically comprises:
The element-ordering and concurrency problem within each region is converted into a scheduling problem on the dependency graph, which describes the processing dependencies between elements on region boundaries. Specifically, graph nodes are used to decompose and define the boundary groups of each region (each region shares connections with its adjacent regions), and the dependencies between boundary groups are established by defining connections between graph nodes (when the preconditioner C is applied, all elements in a boundary group of a region must exchange data with other regions using the same communication pattern). A connection between two boundary groups thus represents a dependency. The remainder of each region, outside its boundary groups, consists of interior elements.
Step S5, parallelize the preconditioner with the weighted balanced colouring algorithm, and overlap computation with communication.
Parallelizing the preconditioner with the weighted balanced colouring algorithm specifically comprises: the weighted balanced colouring algorithm splits the graph nodes into parallel groups such that no two graph nodes in the same parallel group are connected. After all preconditioning computations in a parallel group are complete, each processor sends the newly computed boundary-node entries to all processors that share them, which synchronizes the values of shared nodes. The weighted balanced colouring algorithm is implemented as follows:
Step 11, list all graph nodes in descending order of weight factor;
Step 12, construct connections between all graph nodes on the same processor;
Step 13, select an independent set of nodes from the currently unassigned nodes using the colouring algorithm;
Step 14, find the graph node with the most elements in each parallel group and take its element count as the target;
Step 15, loop over each processor whose element count is below the target, specifically:
(1) the processor loops over its graph nodes;
(2) if a graph node does not conflict with any node in the parallel group and adding it will not exceed the target element count, the graph node is added to the parallel group; otherwise it is not added;
(3) all newly added graph nodes are marked as assigned;
(4) check whether any unassigned nodes remain; if so, continue with step 15; if not, end the loop.
The whole process of steps S4 and S5 is illustrated below for HW preconditioning of a 6 × 6 finite element mesh on four processors. The 6 × 6 mesh is partitioned into four regions, which are computed in parallel by four processors, as shown in Fig. 3(a) and (b). Topological analysis of each region determines its interior elements and boundary groups: as shown in Fig. 3(c), each processor has 4 interior elements and 3 boundary groups. The dependencies between boundary groups are described with graph nodes, as shown in Fig. 3(d), and the weight of a graph node equals the estimated computation that the preconditioning operation requires for its group. Using this information, the colouring algorithm builds parallel groups with good load balance; supplementary elements are then added to the parallel groups and the colouring algorithm is called again to improve the load balance further. Fig. 3(e) shows the final schedule of parallel groups and interior elements, in which every parallel group is well balanced. Finally, during data synchronization between parallel group 1 and parallel group 2, a group of interior elements is computed concurrently with the communication.
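The topological analysis of Fig. 3(c) can be checked with a short sketch. It assumes the four regions are the four 3 × 3 quadrants of the mesh (the figure may partition differently) and that a boundary group collects the elements of a region that share nodes with the same set of neighbouring regions; under these assumptions each region indeed has 4 interior elements and 3 boundary groups. The function name `analyse_grid` is hypothetical.

```python
from collections import defaultdict

def analyse_grid(n=6, parts=2):
    """Split an n x n grid of 4-node elements into parts x parts regions and classify
    each region's elements as interior or as members of boundary groups."""
    size = n // parts
    region_of = {(i, j): (i // size) * parts + (j // size) for i in range(n) for j in range(n)}

    def nodes(e):                      # corner nodes of element (i, j)
        i, j = e
        return [(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)]

    node_regions = defaultdict(set)    # node -> regions whose elements touch it
    for e, r in region_of.items():
        for v in nodes(e):
            node_regions[v].add(r)

    interior = defaultdict(list)
    groups = defaultdict(dict)         # region -> {neighbour regions: elements}
    for e, r in region_of.items():
        others = frozenset(set().union(*(node_regions[v] for v in nodes(e))) - {r})
        if others:
            groups[r].setdefault(others, []).append(e)
        else:
            interior[r].append(e)
    return interior, groups
```

In each quadrant, the corner element next to the mesh centre forms a group shared with all three other regions, and the two edge strips form one group each.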
Overlapping computation and communication specifically comprises: when computation of a new parallel group begins, each processor in the parallel group performs the preconditioning computation for all the elements it owns in that group. As soon as a processor finishes, it starts a non-blocking send, transmitting the updated values of the nodes on its region boundary to all adjacent processors. Then, to overlap communication with computation, the preconditioning computation proceeds with a fixed number of allocated interior elements. After these interior elements have been computed, the processor checks whether the non-blocking sends and receives have completed; if so, it starts computing the next parallel group; if not, it waits until they complete and then starts the next parallel group. Because all interior elements of a region are processed within parallel groups, load balance can be refined, and communication and computation can overlap between parallel groups.
When the finite element mesh and domain decomposition provide enough interior elements and the colouring algorithm produces a balanced schedule, no processor has to wait for non-blocking MPI sends or receives to complete. The HW preconditioning operation is then fully parallel in every conjugate gradient iteration, and the algorithm is highly efficient.
Example embodiment of the present invention:
Consider the circular shell computation model shown in Fig. 4, with L = 1000 mm, R = 1000 mm and θ = π/6; the two straight edges are fixed. A normal point load acts at the centre, applied as shown in Fig. 5. The shell thickness is 2 mm, E = 2.06 × 10^5 MPa, Poisson's ratio ν = 0.3, yield stress σs = 235 MPa and mass density ρ = 7.8 × 10^3 kg/m^3. Geometric nonlinearity is considered, and 8-node shell elements are used for the finite element analysis of the shell structure. The displacement time-history response of the central node under the load is shown in Fig. 5.
To test the performance of the parallel algorithm, the problem size was increased by using finite element meshes with different numbers of elements. Computations were performed with the traditional diagonal preconditioner (D) and with the HW preconditioner. The CPU times (for the 41,600-element mesh, as an example) are shown in Table 2.
Table 2 CPU time of the algorithms (41,600 elements)
The speed-up ratios of the algorithms are shown in Fig. 6 (41,600 elements) and Fig. 7 (183,200 elements). Figs. 6 and 7 show that, for problems of the same size, the HW-LPCG parallel algorithm using the HW preconditioner of the present invention is faster than the diagonal-preconditioner algorithm (D-LPCG) and achieves a higher speed-up ratio, and that its computational performance improves as the problem size increases. It follows that the HW-preconditioned algorithm used in the present invention has better parallel performance.
In summary, the present invention has the following advantages:
(1) A two-level domain decomposition is used. At the first level, the finite element mesh is partitioned into subdomains, which achieves good load balance and reduces communication between processors. At the second level, each subdomain is decomposed into homogeneous element blocks, so that element-level computations are carried out within each block. In this way the innermost loop always performs element computations, and the inner loops created by this framework carry a large share of the workload, which favors local parallelism;
(2) Boundary groups are defined, a dependency graph is constructed, and parallelization is achieved with a coloring algorithm, so the algorithm imposes no restrictions on how the finite element mesh is decomposed into subdomains. Reducing the total number of boundary groups yields a good domain decomposition with less communication. A weighted coloring algorithm forms the parallel groups and distributes the interior elements among them, which refines the load balance and allows communication and computation to overlap across parallel groups;
(3) The HW preconditioner shows better convergence properties; using it reduces the computation time and gives better performance.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the described embodiments are merely exemplary and are not intended to limit the scope of the invention. Equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the invention shall fall within the scope of protection claimed by the invention.

Claims (7)

1. An implicit finite element parallel method based on two-level domain decomposition, characterized in that the method comprises:
Establishing the solution procedure for implicit finite element nonlinear analysis;
Performing a two-level domain decomposition of the domain and establishing the parallel solution procedure for implicit finite element nonlinear analysis;
Selecting a preconditioner and establishing an LPCG solver to solve the equilibrium equations of the Newton iteration;
Constructing a dependency graph;
Parallelizing the preconditioner with a weighted balanced coloring algorithm, and overlapping computation with communication.
2. The implicit finite element parallel method based on two-level domain decomposition according to claim 1, characterized in that establishing the solution procedure for implicit finite element nonlinear analysis specifically comprises:
Initialization:
Compute the equivalent nodal loads;
Determine the effective load increment;
Compute the norm of the effective load increment;
Newton loop:
Step a: compute the element stiffness matrices;
Step b: compute the nodal masses;
Step c: compute the effective stiffness matrix and the load vector;
Step d: solve for the new displacement increment with the element-by-element (EBE) based LPCG solver;
Step e: compute the strains, stresses, and internal force vector;
Step f: compute the residual force and its norm;
Step g: check whether the residual force has converged; if not, return to Step a; if so, update the strains and stresses and exit the loop.
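The Newton loop of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it treats a static problem (the nodal-mass and inertia terms of Steps b and c are omitted), uses dense Gaussian elimination in place of the EBE-based LPCG solver of Step d, and the names `newton_load_step`, `tangent`, and `internal_force` are hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def solve_dense(K, r):
    """Gaussian elimination with partial pivoting (stand-in for the LPCG solver)."""
    n = len(r)
    A = [list(row) + [r[i]] for i, row in enumerate(K)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, n):
            m = A[i][c] / A[c][c]
            for j in range(c, n + 1):
                A[i][j] -= m * A[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def newton_load_step(tangent, internal_force, load, u0, tol=1e-10, max_iter=30):
    """One load increment of a Newton loop (static sketch, inertia omitted)."""
    u = list(u0)
    load_norm = _norm(load)                          # initialization: load-increment norm
    R = [p - f for p, f in zip(load, internal_force(u))]
    for it in range(1, max_iter + 1):
        K = tangent(u)                               # steps a-c: tangent stiffness
        du = solve_dense(K, R)                       # step d: solve K_T du = R
        u = [ui + dui for ui, dui in zip(u, du)]
        f_int = internal_force(u)                    # step e: internal force vector
        R = [p - f for p, f in zip(load, f_int)]     # step f: residual force
        if _norm(R) <= tol * load_norm:              # step g: convergence check
            return u, it
    raise RuntimeError("Newton iteration did not converge")
```

For example, with a hardening two-DOF spring system whose internal force is $K_0 u + c\,u^3$ (cube taken componentwise), `newton_load_step(k_t, f_int, [1.0, 0.5], [0.0, 0.0])` returns displacements whose internal force matches the load to within the tolerance.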
3. The implicit finite element parallel method based on two-level domain decomposition according to claim 1, characterized in that performing the two-level domain decomposition of the domain and establishing the parallel solution procedure for implicit finite element nonlinear analysis specifically comprises:
The finite element mesh of the domain is first partitioned into subdomains, and each subdomain is assigned to one processor; each subdomain is then divided into homogeneous element blocks, in which all elements of a block have the same element type, integration order, strain-displacement relation, and material constitutive model. Once the partitioning is complete, the parallel solution procedure for implicit finite element nonlinear analysis can be established.
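The second-level split of claim 3 amounts to grouping each subdomain's elements by the four attributes that must be shared within a block. The sketch below assumes a hypothetical `Element` record and an optional `max_block` cap on block length (the claim does not state a block size); both are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    eid: int
    subdomain: int     # first level: subdomain (processor) that owns the element
    etype: str         # element type, e.g. "hex8"
    order: int         # integration order
    kinematics: str    # strain-displacement relation, e.g. "small" or "large"
    material: str      # material constitutive model


def homogeneous_blocks(elements, max_block=64):
    """Second level: within each subdomain, group elements that share element
    type, integration order, kinematics and material, then cut each group
    into blocks of at most max_block elements (hypothetical cap)."""
    groups = defaultdict(list)
    for e in elements:
        key = (e.etype, e.order, e.kinematics, e.material)
        groups[(e.subdomain, key)].append(e.eid)
    blocks = defaultdict(list)   # subdomain -> [(key, [element ids]), ...]
    for (sub, key), ids in sorted(groups.items()):
        for start in range(0, len(ids), max_block):
            blocks[sub].append((key, ids[start:start + max_block]))
    return dict(blocks)
```

Grouping by a composite key keeps the work in the innermost loop uniform within a block, which is what the description credits for favoring local parallelism.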
4. The implicit finite element parallel method based on two-level domain decomposition according to claim 3, characterized in that selecting the preconditioner and establishing the LPCG solver to solve the equilibrium equations of the Newton iteration specifically comprises:
Using the HW preconditioner and constructing an EBE-based LPCG solver from the LPCG algorithm, wherein the HW preconditioner applies a Crout decomposition to the preconditioning matrix C of the LPCG algorithm, so that operations with C are carried out only at the element level;
Using the linear system $K_T\,\delta\Delta u = R$ as the equilibrium equations of the Newton iteration, where $K_T$ is the tangent stiffness matrix, $\delta\Delta u$ is the correction to the displacement increment, and $R$ is the dynamic residual vector, and solving this linear system with the constructed EBE-based LPCG solver.
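To make the difference between the D and HW preconditioners concrete, the sketch below runs preconditioned conjugate gradients with an element-by-element matrix-vector product and either preconditioner. It assumes that "HW" denotes the Hughes-Winget element-by-element preconditioner and uses one common form of it (Winget regularization of each element matrix followed by an element-level Crout factorization L D L^T); the patent does not give these formulas, "LPCG" is treated here simply as preconditioned CG, and all function names are hypothetical.

```python
import math


def ebe_matvec(elems, x):
    """y = K x accumulated element by element; K is never assembled."""
    y = [0.0] * len(x)
    for dofs, Ke in elems:
        for a, I in enumerate(dofs):
            y[I] += sum(Ke[a][b] * x[J] for b, J in enumerate(dofs))
    return y


def assembled_diagonal(elems, n):
    w = [0.0] * n
    for dofs, Ke in elems:
        for a, I in enumerate(dofs):
            w[I] += Ke[a][a]
    return w


def diagonal_prec(elems, n):
    """D preconditioner: z = W^-1 r with W = diag(K)."""
    w = assembled_diagonal(elems, n)
    return lambda r: [r[i] / w[i] for i in range(n)]


def hw_prec(elems, n):
    """Assumed Hughes-Winget form: C = W^1/2 (L_1..L_m)(D_1..D_m)(U_m..U_1) W^1/2,
    where L_e D_e U_e is the Crout factorization of
    Kbar_e = I + W^-1/2 (K_e - diag K_e) W^-1/2 and U_e = L_e^T."""
    s = [math.sqrt(v) for v in assembled_diagonal(elems, n)]
    factors = []
    for dofs, Ke in elems:
        m = len(dofs)
        A = [[1.0 if a == b else Ke[a][b] / (s[dofs[a]] * s[dofs[b]])
              for b in range(m)] for a in range(m)]
        L, d = [[0.0] * m for _ in range(m)], [0.0] * m
        for j in range(m):                      # element-level Crout (L D L^T)
            L[j][j] = 1.0
            d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
            for i in range(j + 1, m):
                L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
        factors.append((dofs, L, d))

    def apply(r):
        z = [r[i] / s[i] for i in range(n)]
        for dofs, L, _ in factors:              # (L_1..L_m)^-1: L_1^-1 applied first
            for a in range(len(dofs)):
                z[dofs[a]] -= sum(L[a][k] * z[dofs[k]] for k in range(a))
        for dofs, _, d in factors:              # product of the diagonal D_e
            for a, I in enumerate(dofs):
                z[I] /= d[a]
        for dofs, L, _ in reversed(factors):    # (U_m..U_1)^-1: U_m^-1 applied first
            for a in reversed(range(len(dofs))):
                z[dofs[a]] -= sum(L[k][a] * z[dofs[k]] for k in range(a + 1, len(dofs)))
        return [z[i] / s[i] for i in range(n)]
    return apply


def pcg(elems, b, prec, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients with an EBE matrix-vector product."""
    n = len(b)
    x, r = [0.0] * n, list(b)
    z = prec(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    bnorm = math.sqrt(sum(v * v for v in b))
    for it in range(1, max_iter + 1):
        q = ebe_matvec(elems, p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if math.sqrt(sum(v * v for v in r)) <= tol * bnorm:
            return x, it
        z = prec(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("PCG did not converge")
```

For a chain of springs of stiffness 2 grounded at DOF 0 and loaded by a unit force at DOF 3, both `pcg(elems, b, diagonal_prec(elems, 4))` and `pcg(elems, b, hw_prec(elems, 4))` recover the displacements 0.5, 1, 1.5, 2. Note that applying the HW factors sweeps through the elements in a fixed order, so elements that share a node cannot simply be processed concurrently; this is the dependency that the graph and coloring of claims 5 and 6 address.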
5. The implicit finite element parallel method based on two-level domain decomposition according to claim 3, characterized in that constructing the dependency graph specifically comprises:
Converting the element ordering and concurrency problem within each subdomain into a scheduling operation on a relation graph, and using the relation graph to describe the processing dependencies between elements on subdomain boundaries, specifically: the boundary of each subdomain is decomposed into boundary groups represented as graph nodes; dependencies between boundary groups are established by defining connections between graph nodes; and the elements of each subdomain that do not belong to a boundary group are interior elements.
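A dependency graph like the one in claim 5 can be built as sketched below. The linking rule (two boundary groups conflict when their elements share a mesh node, since updating both at once would write to the same nodal entries) and the boundary/interior test (an element is a boundary element if it touches an interface node) are assumptions made for illustration; the claim does not specify them, and the function names are hypothetical.

```python
from itertools import combinations


def split_boundary_interior(conn, boundary_nodes):
    """conn: {element_id: tuple of mesh nodes}. Elements touching a subdomain
    interface node are boundary elements; all others are interior elements."""
    boundary = [e for e, nodes in conn.items() if boundary_nodes & set(nodes)]
    interior = [e for e, nodes in conn.items() if not boundary_nodes & set(nodes)]
    return boundary, interior


def group_node_sets(conn, boundary_groups):
    """{group_id: [element ids]} -> {group_id: set of mesh nodes it touches}."""
    return {g: {n for e in elems for n in conn[e]} for g, elems in boundary_groups.items()}


def build_dependency_graph(node_sets):
    """Graph nodes are boundary groups; an edge links two groups that share a mesh node."""
    adj = {g: set() for g in node_sets}
    for g1, g2 in combinations(node_sets, 2):
        if node_sets[g1] & node_sets[g2]:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj
```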
6. The implicit finite element parallel method based on two-level domain decomposition according to claim 5, characterized in that parallelizing the preconditioner with the weighted balanced coloring algorithm specifically comprises: splitting the graph nodes into parallel groups with the weighted balanced coloring algorithm such that no two graph nodes in the same parallel group share a connection; the weighted balanced coloring algorithm is implemented as follows:
Step 11: list all graph nodes in descending order of weight factor;
Step 12: construct connections between all graph nodes on the same processor;
Step 13: select an independent node set from the currently unassigned nodes using a coloring algorithm;
Step 14: find the graph node with the most elements in each parallel group and take its element count as the target;
Step 15: loop over each processor whose element count is below the target, specifically:
(1) the processor loops over its graph nodes;
(2) if a graph node does not conflict with any node in the parallel group and adding it does not exceed the target element count, the graph node is added to the parallel group; otherwise it is not added;
(3) all newly added graph nodes are marked as assigned;
(4) check whether any unassigned nodes remain; if so, continue with Step 15; if not, end the loop.
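The steps of claim 6 can be read as the greedy procedure sketched below. Several points are interpretations rather than statements from the claim: the weight of a graph node is taken to be its element count; the same-processor links of Step 12 are used only in Step 13, so that each new parallel group starts with at most one graph node per processor; and the target of Step 14 is the element count of the heaviest node selected in Step 13. The function name is hypothetical.

```python
def weighted_balanced_coloring(weight, proc, adj):
    """Split graph nodes into parallel groups with no conflicting pair in a group.

    weight[g]: number of elements in graph node (boundary group) g
    proc[g]:   processor that owns g
    adj[g]:    set of graph nodes that conflict with g (assumed symmetric)
    """
    order = sorted(weight, key=lambda g: -weight[g])   # Step 11: heaviest first
    unassigned = set(weight)
    groups = []
    while unassigned:                                  # Step 15(4): until all assigned
        group, started = [], set()
        # Steps 12-13: independent set with at most one starting node per processor
        for g in order:
            if g in unassigned and proc[g] not in started and not adj[g] & set(group):
                group.append(g)
                started.add(proc[g])
        target = max(weight[g] for g in group)         # Step 14: target element count
        load = {}
        for g in group:
            load[proc[g]] = load.get(proc[g], 0) + weight[g]
        # Step 15(1)-(2): top up processors whose load is below the target
        for p in sorted(set(proc.values())):
            for g in order:
                fits = load.get(p, 0) + weight[g] <= target
                if (proc[g] == p and g in unassigned and g not in group
                        and not adj[g] & set(group) and fits):
                    group.append(g)
                    load[p] = load.get(p, 0) + weight[g]
        unassigned -= set(group)                       # Step 15(3): mark as assigned
        groups.append(group)
    return groups
```

Every graph node ends up in exactly one parallel group and no group contains two conflicting nodes, so the elements of one group can be preconditioned concurrently. Progress is guaranteed because the first unassigned node in weight order always starts the next group.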
7. The implicit finite element parallel method based on two-level domain decomposition according to claim 6, characterized in that overlapping computation with communication specifically comprises: when computation of a new parallel group begins, each processor in that parallel group performs the preconditioning computation on all of the elements it owns in the group; as soon as a processor finishes this computation, it starts a non-blocking send of the updated values of the nodes on the subdomain boundary to all of its neighboring processors. Then, to overlap communication with computation, the preconditioning computation proceeds on interior elements allocated in fixed amounts; after the allocated interior elements have been computed, the processor checks whether the non-blocking sends and receives have completed. If they have, it starts computing the next parallel group; if not, it waits until they complete and then starts the next parallel group.
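The per-processor schedule of claim 7 can be sketched as below, with a background thread standing in for the non-blocking send and receive (in an MPI code this role would be played by calls such as `MPI_Isend`/`MPI_Irecv` followed by `MPI_Test` or `MPI_Wait`). The fixed interior chunk size `chunk`, the callbacks `compute` and `send_boundary`, and the handling of interior elements left over after the last group are hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_parallel_groups(groups, interior, compute, send_boundary, chunk=2):
    """groups: this processor's boundary elements, one list per parallel group.
    interior: interior elements, consumed `chunk` at a time to hide communication.
    compute(e): preconditioning work for one element (hypothetical callback).
    send_boundary(gidx): stand-in for the non-blocking boundary exchange."""
    log = []
    next_interior = 0
    with ThreadPoolExecutor(max_workers=1) as comm:
        for gidx, elems in enumerate(groups):
            for e in elems:
                compute(e)                               # boundary work of this group
            pending = comm.submit(send_boundary, gidx)   # start the "non-blocking" send
            batch = interior[next_interior:next_interior + chunk]
            next_interior += len(batch)
            for e in batch:
                compute(e)                               # interior work overlaps the send
            if not pending.done():
                log.append(("wait", gidx))               # exchange still in flight
            pending.result()                             # wait before the next group
            log.append(("group_done", gidx))
        for e in interior[next_interior:]:               # assumption: finish leftovers
            compute(e)
    return log
```

The key design point mirrored here is that the wait happens only after the interior chunk is done, so communication time is hidden whenever the chunk takes at least as long as the exchange.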
CN201810826770.XA 2018-07-25 2018-07-25 Implicit finite element parallel method based on two-stage region decomposition Active CN109101708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810826770.XA CN109101708B (en) 2018-07-25 2018-07-25 Implicit finite element parallel method based on two-stage region decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810826770.XA CN109101708B (en) 2018-07-25 2018-07-25 Implicit finite element parallel method based on two-stage region decomposition

Publications (2)

Publication Number Publication Date
CN109101708A true CN109101708A (en) 2018-12-28
CN109101708B CN109101708B (en) 2022-06-28

Family

ID=64847446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810826770.XA Active CN109101708B (en) 2018-07-25 2018-07-25 Implicit finite element parallel method based on two-stage region decomposition

Country Status (1)

Country Link
CN (1) CN109101708B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140046993A1 (en) * 2012-08-13 2014-02-13 Nvidia Corporation System and method for multi-color dilu preconditioner
CN107688680A (en) * 2016-08-05 2018-02-13 南京理工大学 A kind of efficient time-Domain FEM domain decomposition parallel method
CN108228970A (en) * 2017-12-11 2018-06-29 上海交通大学 The explicit asynchronous long parallel calculating method of structural dynamical model

Non-Patent Citations (4)

Title
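Fu Chaojiang (付朝江): "Parallel algorithm for explicit finite element analysis based on an efficient parallel solution strategy", Journal of Computer Applications (计算机应用) *
Fu Chaojiang (付朝江): "Two-level domain decomposition parallel solution algorithm for stochastic finite element analysis", Chinese Journal of Applied Mechanics (应用力学学报) *
Fu Chaojiang (付朝江): "Parallel finite element solution scheme for implicit nonlinear dynamic analysis", Engineering Mechanics (工程力学) *
Fu Chaojiang (付朝江) et al.: "Implicit parallel algorithm with overlapping domain decomposition for nonlinear dynamic finite element analysis", Chinese Journal of Computational Mechanics (计算力学学报) *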

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111914455A (en) * 2020-07-31 2020-11-10 英特工程仿真技术(大连)有限公司 Finite element parallel computing method based on node overlapping type region decomposition without Schwarz alternation
CN111914455B (en) * 2020-07-31 2024-03-15 英特工程仿真技术(大连)有限公司 Finite element parallel computing method based on node overlap type regional decomposition Schwarz alternation-free
CN112506469A (en) * 2021-02-05 2021-03-16 支付宝(杭州)信息技术有限公司 Method and device for processing private data
CN112506469B (en) * 2021-02-05 2021-04-27 支付宝(杭州)信息技术有限公司 Method and device for processing private data

Also Published As

Publication number Publication date
CN109101708B (en) 2022-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant