CN109684061A

CN109684061A - Coarse-grained many-core parallelization method for unstructured grids

Info

Publication number: CN109684061A
Application number: CN201811583475.2A
Authority: CN
Inventors: 刘鑫; 李芳�; 徐金秀; 陈德训; 孙唯哲; 范昊; 何香
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2018-12-24
Filing date: 2018-12-24
Publication date: 2019-04-26

Abstract

The present invention discloses a coarse-grained many-core parallelization method for unstructured grids. On top of the first-level domain decomposition of the unstructured grid, the method adds a second-level, thread-level domain decomposition. Each slave core solves its own independent computational subdomain, which guarantees the data hit rate of the kernel computation tasks on the slave cores and realizes coarse-grained parallelism both at the MPI process level and at the slave-core thread level. The invention addresses the adaptability problem of general unstructured-grid applications on polymorphic heterogeneous processors: according to the data scale of the unstructured grid, it automatically completes the two-level load balancing and the coarse-grained many-core parallelization of the computational kernels, and it improves the computational efficiency and parallel efficiency of unstructured-grid numerical simulation on heterogeneous architectures.

Description

A coarse-grained many-core parallelization method for unstructured grids

Technical field

The present invention relates to the field of unstructured-grid flow field computation, and in particular to a coarse-grained many-core parallel computing method for unstructured grids.

Background art

In recent years, to raise system computing capability, multi-core and many-core processors have become the main building blocks of high-performance computers. Supercomputer architectures are trending toward polymorphic, heterogeneous designs at very large parallel scale, yet the application fields lack efficient parallel algorithms matched to them. As users' research deepens, the computational domains and topologies of practical numerical simulations become increasingly complex and fine, and unstructured grids, thanks to their flexibility and strong adaptability to complex geometries, are used more and more widely in engineering computing software. On the other hand, because unstructured grids discard the structural constraints on mesh nodes, data storage becomes irregular and the memory access bottleneck is more pronounced, which makes it difficult to exploit the very high computing power of many-core processors. Efficient parallel numerical algorithms for unstructured grids on many-core processors therefore need to be studied.

At present, most parallel implementations for heterogeneous computer systems are based on a two-level parallel model: coarse-grained parallelism at the MPI process level plus fine-grained parallelism at the many-core thread level. The MPI process-level coarse-grained parallelism uses the message-passing programming model and achieves parallel computation through coarse-grained load balancing by domain decomposition and through message communication; its parallel granularity is relatively large. The many-core thread-level fine-grained parallelism is based on the shared-variable programming model: it distributes and load-balances many-core tasks at loop level for the loops of the computational kernels, and it tunes performance with methods such as data layout optimization and overlapping computation with memory access; its parallel granularity is relatively small. This two-level parallel model has been applied to structured-grid applications with good performance. However, memory access in most computational kernels of unstructured-grid applications is discontinuous, so directly parallelizing the kernel loops across the many cores in the conventional way yields low performance.
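The discontinuous access pattern can be illustrated with a small sketch. The six-cell mesh, its face list, and the helper `touched_span` are hypothetical and are not taken from the patent; they only show that a loop over faces reaches cell data through an index array, so a contiguous chunk of loop iterations need not touch a contiguous range of cells.

```python
# Hypothetical 6-cell unstructured mesh; each face stores the two cells it joins.
# The face order is arbitrary, as is common after mesh generation.
faces = [(0, 5), (2, 3), (1, 4), (0, 1), (4, 5), (2, 5), (1, 2), (3, 4)]

def touched_span(face_chunk):
    """Return (lowest, highest) cell index read by a chunk of the face loop."""
    cells = [c for face in face_chunk for c in face]
    return min(cells), max(cells)

# Loop-level (fine-grained) split: two threads each take half of the iterations.
chunk_a, chunk_b = faces[:4], faces[4:]
span_a = touched_span(chunk_a)  # cells reached by the first thread
span_b = touched_span(chunk_b)  # cells reached by the second thread
```

In this toy case both chunks reach almost the whole cell array, which is the situation a small per-core data store handles poorly.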

In summary, current unstructured-grid applications built around the structural features of heterogeneous many-core processors mostly use process-level coarse-grained parallelism together with thread-level fine-grained parallelism. However, owing to the flexibility and irregularity of unstructured grids, the computation involves a large amount of irregular, discrete memory access, so many-core parallel efficiency is low and the performance problem is prominent.

Summary of the invention

The object of the present invention is to provide a coarse-grained many-core parallelization method for unstructured grids that solves the problems described in the Background section above.

To achieve this purpose, the present invention adopts the following technical scheme:

A coarse-grained many-core parallelization method for unstructured grids. The method uses two-level load balancing based on the architectural features of a domestic heterogeneous many-core processor: on top of the first-level domain decomposition of the unstructured grid, it adds a second-level domain decomposition, and each slave core completes the computation task of its corresponding subdomain. This guarantees the data hit rate of the slave-core computation and realizes coarse-grained many-core parallelism at the compute-kernel thread level.

In particular, the two-level load balancing based on the architectural features of the domestic heterogeneous many-core processor specifically includes: S101, completing the first-level, process-level load balancing, in which a graph partitioning method decomposes the computational domain so that the load is balanced between processes and communication is minimized; S102, completing the second-level, thread-level load balancing, in which the maximum task size that each slave core can compute is calculated from the capacity of the slave-core data storage space, a graph partitioning method is likewise applied to the computational domain of each process to perform the second-level domain decomposition, and the tasks are assigned to the slave cores.
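The sizing part of step S102 can be sketched as simple arithmetic. The patent gives no formula, so everything below is an assumption made for illustration: the function names, the 64 KiB local store, the 16 KiB reserve, the ten double-precision values per cell, and the choice to round the subdomain count up to a multiple of the slave-core count.

```python
import math

def max_cells_per_slave_core(ldm_bytes, reserved_bytes, bytes_per_cell):
    """Largest number of cells whose data fits in one slave core's local store."""
    usable = ldm_bytes - reserved_bytes
    if usable <= 0 or bytes_per_cell <= 0:
        raise ValueError("no usable local memory for cell data")
    return usable // bytes_per_cell

def second_level_parts(cells_in_process, cells_per_part, n_slave_cores):
    """Number of second-level subdomains, rounded up to a multiple of the core count."""
    parts = math.ceil(cells_in_process / cells_per_part)
    return math.ceil(parts / n_slave_cores) * n_slave_cores

# Assumed figures, for illustration only: 64 KiB local store, 16 KiB reserved
# for buffers and stack, 10 double-precision values (8 bytes each) per cell.
cap = max_cells_per_slave_core(64 * 1024, 16 * 1024, 10 * 8)
parts = second_level_parts(100_000, cap, 64)
```

Rounding up to a multiple of the core count keeps every slave core busy for the same number of rounds; other policies are equally compatible with the text.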

In particular, the coarse-grained many-core parallelism at the compute-kernel thread level specifically includes: first, the tasks are distributed to the processes according to the load balancing result of step S101; then the tasks are distributed to the slave-core threads according to the load balancing result of step S102; each slave core then completes, in the order required by the computational fluid dynamics calculation, the computation of each kernel, including flux calculation and the solution of sparse algebraic equation systems.
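The two distribution steps amount to grouping cells first by process and then by slave-core thread. A minimal sketch follows, assuming the two load balancing results are available as per-cell arrays; the names `proc_of_cell`, `core_of_cell`, and `build_task_lists` are hypothetical.

```python
from collections import defaultdict

def build_task_lists(proc_of_cell, core_of_cell):
    """Group cell ids first by MPI rank (S101 result), then by slave core (S102 result)."""
    tasks = defaultdict(lambda: defaultdict(list))
    for cell, (rank, core) in enumerate(zip(proc_of_cell, core_of_cell)):
        tasks[rank][core].append(cell)
    return tasks

# Hypothetical results for an 8-cell mesh, 2 processes, 2 slave cores per process.
proc_of_cell = [0, 0, 0, 0, 1, 1, 1, 1]
core_of_cell = [0, 0, 1, 1, 0, 1, 1, 0]
tasks = build_task_lists(proc_of_cell, core_of_cell)
```

Here `tasks[rank][core]` is the list of cells that one slave-core thread owns for the whole computation.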

The coarse-grained many-core parallelization method for unstructured grids proposed by the present invention adds a second-level, thread-level domain decomposition on top of the first-level domain decomposition of the unstructured grid. Each slave core solves its own independent computational subdomain, which guarantees the data hit rate of the kernel computation tasks on the slave cores and realizes coarse-grained parallelism both at the MPI process level and at the slave-core thread level. The invention addresses the adaptability problem of general unstructured-grid applications on polymorphic heterogeneous processors: according to the data scale of the unstructured grid, it automatically completes the two-level load balancing and the coarse-grained many-core parallelization of the computational kernels, and it improves the computational efficiency and parallel efficiency of unstructured-grid numerical simulation on heterogeneous architectures.

Brief description of the drawings

Fig. 1 is a flowchart of the coarse-grained many-core parallelization method for unstructured grids provided by an embodiment of the present invention.

Detailed description of the embodiments

The present invention is further explained below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only some, not all, of the content related to the present invention. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this invention belongs. The terms used herein are intended only to describe specific embodiments and are not intended to limit the invention.

Referring to Fig. 1, Fig. 1 is a flowchart of the coarse-grained many-core parallelization method for unstructured grids provided by an embodiment of the present invention.

In this embodiment, the coarse-grained many-core parallelization method for unstructured grids uses two-level load balancing based on the architectural features of a domestic heterogeneous many-core processor: on top of the first-level domain decomposition of the unstructured grid, it adds a second-level domain decomposition, and each slave core completes the computation task of its corresponding subdomain. This guarantees the data hit rate of the slave-core computation and realizes coarse-grained many-core parallelism at the compute-kernel thread level. The architectural features of the domestic heterogeneous many-core processor include: a large number of cores, relatively strong slave-core computing capability, and slave-core instruction and data spaces of a certain capacity.

Specifically, the two-level load balancing based on the architectural features of the heterogeneous many-core processor in this embodiment includes: S101, completing the first-level, process-level load balancing, in which a graph partitioning method decomposes the computational domain so that the load is balanced between processes and communication is minimized; S102, completing the second-level, thread-level load balancing, in which the maximum task size that each slave core can compute is calculated from the capacity of the slave-core data storage space, a graph partitioning method is likewise applied to the computational domain of each process to perform the second-level domain decomposition, and the tasks are assigned to the slave cores.
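Steps S101 and S102 apply the same partitioning idea at two levels. The sketch below shows only that nesting. It is not the patent's partitioner: `grow_partition` is a simple breadth-first region-growing stand-in that balances cell counts but does not explicitly minimize communication, which a production graph partitioning library would do. All names, the eight-cell chain mesh, and the fixed number of parts per process are assumptions for illustration.

```python
from collections import deque

def grow_partition(adjacency, cells, n_parts):
    """Split `cells` into n_parts chunks of near-equal size by breadth-first growth.

    Stand-in for a real graph partitioner: balances counts, ignores edge cut.
    """
    cells = sorted(cells)
    allowed = set(cells)
    part_of = {}
    target = -(-len(cells) // n_parts)  # ceiling division
    part, size = 0, 0
    for seed in cells:
        if seed in part_of:
            continue
        queue = deque([seed])
        while queue:
            c = queue.popleft()
            if c in part_of:
                continue
            if size == target and part < n_parts - 1:
                part, size = part + 1, 0  # current part is full, start the next
            part_of[c] = part
            size += 1
            queue.extend(n for n in adjacency[c] if n in allowed and n not in part_of)
    return part_of

def two_level_decomposition(adjacency, n_procs, n_cores):
    """S101 then S102: return {cell: (rank, core)}."""
    rank_of = grow_partition(adjacency, adjacency.keys(), n_procs)
    owner = {}
    for rank in range(n_procs):
        local = [c for c, r in rank_of.items() if r == rank]
        # Second level: partition only this process's own cells.
        for c, core in grow_partition(adjacency, local, n_cores).items():
            owner[c] = (rank, core)
    return owner

# Hypothetical 8-cell chain mesh: cell i touches cells i-1 and i+1.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
owner = two_level_decomposition(chain, n_procs=2, n_cores=2)
```

In a fuller version the number of second-level parts would come from the slave-core storage capacity rather than being passed in directly.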

The coarse-grained many-core parallelism at the compute-kernel thread level in this embodiment specifically includes: S103, first distributing the tasks to the processes according to the load balancing result of step S101, then distributing the tasks to the slave-core threads according to the load balancing result of step S102; each slave core then completes, in the order required by the computational fluid dynamics calculation, the computation of each kernel, including flux calculation and the solution of sparse algebraic equation systems.
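The execution pattern of step S103, in which every slave core runs the whole kernel sequence on its own subdomain, can be sketched as follows. This is a toy model under stated assumptions: Python threads stand in for slave-core threads, the subdomains are one-dimensional and solved independently with no halo exchange, the flux is a plain difference of neighbouring values, and the sparse system is a tridiagonal one solved by Jacobi sweeps. None of these specifics comes from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_flux(sub):
    """Kernel 1: face flux as the difference of neighbouring cell values."""
    u = sub["u"]
    # Boundary fluxes are taken as zero in this toy model.
    sub["flux"] = [0.0] + [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]

def solve_sparse(sub, sweeps=200):
    """Kernel 2: Jacobi sweeps for 2*x[i] - x[i-1] - x[i+1] = b[i]."""
    flux, n = sub["flux"], len(sub["u"])
    b = [flux[i + 1] - flux[i] for i in range(n)]  # flux balance per cell
    x = [0.0] * n
    for _ in range(sweeps):
        x = [(b[i]
              + (x[i - 1] if i > 0 else 0.0)
              + (x[i + 1] if i < n - 1 else 0.0)) / 2.0
             for i in range(n)]
    sub["dx"] = x

KERNELS = [compute_flux, solve_sparse]  # fixed order: the solve needs the fluxes

def run_subdomain(sub):
    """One slave-core thread: run every kernel, in order, on its own subdomain."""
    for kernel in KERNELS:
        kernel(sub)
    return sub

# Two hypothetical subdomains, one per stand-in slave-core thread.
subdomains = [{"u": [1.0, 2.0, 4.0, 7.0]}, {"u": [7.0, 5.0, 4.0, 4.0]}]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_subdomain, subdomains))
```

The point of the pattern is that each thread touches only its own subdomain's data from the first kernel to the last, instead of sharing the iterations of a single loop.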

The technical solution of the present invention adds a second-level, thread-level domain decomposition on top of the first-level domain decomposition of the unstructured grid. Each slave core solves its own independent computational subdomain, which guarantees the data hit rate of the kernel computation tasks on the slave cores and realizes coarse-grained parallelism both at the MPI process level and at the slave-core thread level. The invention addresses the adaptability problem of general unstructured-grid applications on polymorphic heterogeneous processors: according to the data scale of the unstructured grid, it automatically completes the two-level load balancing and the coarse-grained many-core parallelization of the computational kernels, and it improves the computational efficiency and parallel efficiency of unstructured-grid numerical simulation on heterogeneous architectures.

Those of ordinary skill in the art will appreciate that all parts of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that they can make various obvious changes, readjustments, and substitutions without departing from the protection scope of the invention. Therefore, although the invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims

1. A coarse-grained many-core parallelization method for unstructured grids, characterized in that the method uses two-level load balancing based on the architectural features of a domestic heterogeneous many-core processor, adds a second-level domain decomposition on top of the first-level domain decomposition of the unstructured grid, and has each slave core complete the computation task of its corresponding subdomain, thereby guaranteeing the data hit rate of the slave-core computation and realizing coarse-grained many-core parallelism at the compute-kernel thread level.

2. The coarse-grained many-core parallelization method for unstructured grids according to claim 1, characterized in that the two-level load balancing based on the architectural features of the domestic heterogeneous many-core processor specifically includes: S101, completing the first-level, process-level load balancing, in which a graph partitioning method decomposes the computational domain so that the load is balanced between processes and communication is minimized; S102, completing the second-level, thread-level load balancing, in which the maximum task size that each slave core can compute is calculated from the capacity of the slave-core data storage space, a graph partitioning method is applied to the computational domain of each process to perform the second-level domain decomposition, and the tasks are assigned to the slave cores.

3. The coarse-grained many-core parallelization method for unstructured grids according to claim 2, characterized in that the coarse-grained many-core parallelism at the compute-kernel thread level specifically includes: first distributing the tasks to the processes according to the load balancing result of step S101, then distributing the tasks to the slave-core threads according to the load balancing result of step S102, each slave core completing, in the order required by the computational fluid dynamics calculation, the computation of each kernel, including flux calculation and the solution of sparse algebraic equation systems.