CN109684061A - Coarse-grained many-core parallelization method for unstructured grids - Google Patents

Info

Publication number
CN109684061A
CN109684061A CN201811583475.2A CN201811583475A CN109684061A CN 109684061 A CN109684061 A CN 109684061A CN 201811583475 A CN201811583475 A CN 201811583475A CN 109684061 A CN109684061 A CN 109684061A
Authority
CN
China
Prior art keywords
core
level
many
unstructured grid
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811583475.2A
Other languages
Chinese (zh)
Inventor
刘鑫
李芳�
徐金秀
陈德训
孙唯哲
范昊
何香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201811583475.2A
Publication of CN109684061A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a coarse-grained many-core parallelization method for unstructured grids. On top of the first-level domain decomposition of the unstructured grid, the method adds a second, thread-level domain decomposition. Each slave core solves its own independent computational subdomain, which ensures the data hit rate of the kernel computation tasks on the slave cores and achieves coarse-grained parallelism at both the MPI process level and the slave-core thread level. The invention addresses the adaptability problem of general unstructured-grid applications on polymorphic heterogeneous processors: it automatically performs two-level load balancing and coarse-grained many-core parallelization of the computational kernels according to the scale of the unstructured-grid data, improving the computational efficiency and parallel efficiency of unstructured-grid numerical simulation on heterogeneous architectures.

Description

Coarse-grained many-core parallelization method for unstructured grids
Technical field
The present invention relates to the field of unstructured-grid flow field computation, and in particular to a coarse-grained many-core parallel computing method for unstructured grids.
Background
In recent years, multicore and many-core processors have become the main building blocks of high-performance computers as a means of raising system computing capability. Supercomputer architectures are trending toward polymorphic, heterogeneous designs with extremely large parallel scale, yet efficient parallel algorithms matched to these architectures are lacking in application domains. As user research deepens, the computational domains and topologies of practical numerical simulations are becoming increasingly complex and fine-grained, and unstructured grids, owing to their flexibility and strong adaptability to complex geometries, are used ever more widely in engineering computation software. On the other hand, because unstructured grids abandon structural constraints on grid nodes, data storage becomes irregular, the memory access bottleneck becomes more pronounced, and it is difficult to exploit the very high computing power of many-core processors. Efficient parallel numerical algorithms for unstructured grids on many-core processors therefore need to be studied.
At present, most parallel implementations for heterogeneous computer systems are based on a two-level parallel model: coarse-grained parallelism at the MPI process level plus fine-grained parallelism at the many-core thread level. MPI process-level coarse-grained parallelism uses the message-passing programming model and achieves parallel computation through coarse-grained load balancing, such as domain decomposition, together with message communication; its parallel granularity is relatively large. Many-core thread-level fine-grained parallelism is based on the shared-variable programming model; it performs loop-level many-core task distribution and load balancing for the kernel computation loops and tunes performance with methods such as data layout optimization and overlapping computation with memory access; its parallel granularity is relatively small. This two-level parallel model has been applied to structured-grid applications with good performance. However, the memory access pattern of most kernel computation sections in unstructured-grid codes is non-contiguous, so directly parallelizing the kernel loops across many cores in the conventional way yields low performance because of the non-contiguous memory accesses.
In summary, most current unstructured-grid applications on heterogeneous many-core processors use process-level coarse-grained parallelism combined with thread-level fine-grained parallelism. However, because of the flexibility and irregularity of unstructured grids, the computation involves a large number of irregular, scattered memory accesses, so many-core parallel efficiency is low and the performance problem is prominent.
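The irregular memory access described in this background can be illustrated with a minimal sketch. It is not taken from the patent: it assumes a face-based mesh layout with hypothetical `owner` and `neighbour` arrays (one entry per face), a toy flux formula, and a hypothetical helper `mean_index_gap` that serves only as a rough indicator of how scattered the accesses are.

```python
def face_loop_residual(owner, neighbour, cell_value):
    """Accumulate a toy face flux into both adjacent cells.

    owner[f] and neighbour[f] are the two cells sharing face f (assumed layout).
    Every read and write goes through these index arrays, so consecutive
    loop iterations may touch cells that are far apart in memory.
    """
    residual = [0.0] * len(cell_value)
    for o, n in zip(owner, neighbour):
        flux = 0.5 * (cell_value[n] - cell_value[o])  # toy flux, illustration only
        residual[o] += flux   # indirect write
        residual[n] -= flux   # indirect write
    return residual


def mean_index_gap(owner, neighbour):
    """Average distance between the two cell indices of a face.

    A small gap means neighbouring cells are stored close together;
    a large gap indicates scattered, cache-unfriendly access.
    """
    gaps = [abs(o - n) for o, n in zip(owner, neighbour)]
    return sum(gaps) / len(gaps)
```

On a structured grid the neighbour of cell `i` is found by fixed index arithmetic, whereas here the neighbour index comes from a lookup table, which is the property that makes loop-level many-core parallelization of such kernels inefficient.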
Summary of the invention
The object of the present invention is to solve the problems mentioned in the Background section above by means of a coarse-grained many-core parallelization method for unstructured grids.
To this end, the present invention adopts the following technical solution:
A coarse-grained many-core parallelization method for unstructured grids, in which the method uses two-level load balancing based on the architectural features of a domestic heterogeneous many-core processor: a second-level domain decomposition is added on top of the first-level domain decomposition of the unstructured grid, and each slave core completes the computation task of its corresponding subdomain, which ensures the data hit rate of the slave-core computation and achieves thread-level coarse-grained many-core parallelism of the computational kernels.
In particular, the two-level load balancing based on the architectural features of the domestic heterogeneous many-core processor specifically comprises: S101, performing first-level, process-level load balancing: the computational domain is decomposed using a graph partitioning method so that the load is balanced across processes and communication is minimized; S102, performing second-level, thread-level load balancing: the maximum task size that each slave core can compute is calculated from the capacity of the slave core's data storage space, a second-level domain decomposition of each process's computational domain is likewise performed using a graph partitioning method, and the tasks are assigned to the slave cores.
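The capacity calculation in step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's formula: the function names, the per-cell byte count, the reserved fraction of local storage, and the choice to round the subdomain count up to a multiple of the slave-core count are all hypothetical.

```python
import math


def max_cells_per_core(local_store_bytes, bytes_per_cell, reserve_fraction=0.25):
    """Largest number of cells whose data fits in one slave core's data store.

    reserve_fraction is an assumed share of the store kept free for
    buffers and stack; the patent does not specify such a value.
    """
    usable = local_store_bytes * (1.0 - reserve_fraction)
    return int(usable // bytes_per_cell)


def second_level_part_count(cells_in_process, cells_per_core_limit, cores):
    """Number of second-level subdomains for one process.

    Each subdomain must not exceed the per-core limit. The count is rounded
    up to a multiple of `cores` (an assumption made here so that every
    slave core receives the same number of subdomains).
    """
    if cells_per_core_limit <= 0:
        raise ValueError("per-core cell limit must be positive")
    needed = math.ceil(cells_in_process / cells_per_core_limit)
    rounds = math.ceil(needed / cores)
    return rounds * cores
```

With assumed values of a 64 KiB data store, 128 bytes per cell, and 64 slave cores, `max_cells_per_core(65536, 128)` gives 384 cells, and a process holding 100000 cells would be split into `second_level_part_count(100000, 384, 64)`, that is 320, subdomains.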
In particular, the thread-level coarse-grained many-core parallelism of the computational kernels specifically comprises: first, tasks are distributed to the processes according to the load balancing result of step S101; then tasks are distributed to the slave-core threads according to the load balancing result of step S102; each slave core then completes, in order, the computation of each kernel, including flux computation and the solution of sparse systems of algebraic equations, as required by the computational fluid dynamics calculation.
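The idea that each slave core runs the whole kernel sequence on its own subdomain can be sketched as follows. This is an illustration only: Python threads stand in for slave-core threads, the subdomain dictionary layout (`values`, `faces` with local indices) is hypothetical, the flux formula is a toy, and Jacobi iteration on a made-up diagonally dominant system is swapped in for whatever sparse solver a real CFD code would use.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_flux(sub):
    """Toy flux kernel over the subdomain's internal faces (local indices only)."""
    res = [0.0] * len(sub["values"])
    for a, b in sub["faces"]:
        f = sub["values"][b] - sub["values"][a]
        res[a] += f
        res[b] -= f
    return res


def jacobi_solve(diag, off, rhs, sweeps=50):
    """Toy sparse solve: Jacobi sweeps, off[i] is a list of (j, a_ij) pairs."""
    x = [0.0] * len(rhs)
    for _ in range(sweeps):
        x = [(rhs[i] - sum(a * x[j] for j, a in off[i])) / diag[i]
             for i in range(len(rhs))]
    return x


def run_subdomain(sub):
    """Run the kernels in order on one subdomain: flux first, then the solve."""
    rhs = compute_flux(sub)
    n = len(rhs)
    off = [[] for _ in range(n)]
    for a, b in sub["faces"]:
        off[a].append((b, -1.0))
        off[b].append((a, -1.0))
    # Assumed matrix: strictly diagonally dominant, so Jacobi converges.
    diag = [len(off[i]) + 1.0 for i in range(n)]
    return jacobi_solve(diag, off, rhs)


def run_all(subdomains, cores=4):
    """Emulate slave cores with a thread pool; results keep subdomain order."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(run_subdomain, subdomains))
```

The point of the sketch is the granularity: the unit of parallel work is a whole subdomain carried through every kernel, not a single loop iteration, so each worker touches only data that belongs to its own subdomain.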
The coarse-grained many-core parallelization method for unstructured grids proposed by the present invention adds a second, thread-level domain decomposition on top of the first-level domain decomposition of the unstructured grid. Each slave core solves its own independent computational subdomain, which ensures the data hit rate of the kernel computation tasks on the slave cores and achieves coarse-grained parallelism at both the MPI process level and the slave-core thread level. The invention addresses the adaptability problem of general unstructured-grid applications on polymorphic heterogeneous processors: it automatically performs two-level load balancing and coarse-grained many-core parallelization of the computational kernels according to the scale of the unstructured-grid data, improving the computational efficiency and parallel efficiency of unstructured-grid numerical simulation on heterogeneous architectures.
Brief description of the drawings
Fig. 1 is a flowchart of the coarse-grained many-core parallelization method for unstructured grids provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only some, not all, of the content related to the invention. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which the invention belongs. The terms used herein are intended only to describe specific embodiments and are not intended to limit the invention.
Referring to Fig. 1, Fig. 1 is a flowchart of the coarse-grained many-core parallelization method for unstructured grids provided by an embodiment of the present invention.
In this embodiment, the coarse-grained many-core parallelization method for unstructured grids uses two-level load balancing based on the architectural features of a domestic heterogeneous many-core processor: a second-level domain decomposition is added on top of the first-level domain decomposition of the unstructured grid, and each slave core completes the computation task of its corresponding subdomain, which ensures the data hit rate of the slave-core computation and achieves thread-level coarse-grained many-core parallelism of the computational kernels. The architectural features of the domestic heterogeneous many-core processor include: a large number of cores, relatively strong slave-core computing capability, and slave-core instruction and data storage spaces of a certain capacity.
Specifically, the two-level load balancing based on the architectural features of the heterogeneous many-core processor in this embodiment comprises: S101, performing first-level, process-level load balancing: the computational domain is decomposed using a graph partitioning method so that the load is balanced across processes and communication is minimized; S102, performing second-level, thread-level load balancing: the maximum task size that each slave core can compute is calculated from the capacity of the slave core's data storage space, a second-level domain decomposition of each process's computational domain is likewise performed using a graph partitioning method, and the tasks are assigned to the slave cores.
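The patent names only "a graph partitioning method" without specifying one. As a stand-in chosen for brevity, the sketch below uses a simple greedy breadth-first graph-growing heuristic; a production code would more likely call a dedicated partitioning library. The function names and the adjacency-list input format are assumptions of this sketch.

```python
from collections import deque


def grow_partition(adjacency, num_parts):
    """Assign each cell to a part by breadth-first growth.

    adjacency[c] lists the cells sharing a face with cell c.
    Parts are filled one after another up to ceil(n / num_parts) cells,
    which balances the load; growing along neighbours keeps parts
    compact, which tends to reduce the number of cut edges.
    """
    n = len(adjacency)
    target = -(-n // num_parts)  # ceiling division
    part = [-1] * n
    current, size = 0, 0
    for seed in range(n):        # outer loop handles disconnected meshes
        if part[seed] != -1:
            continue
        queue = deque([seed])
        while queue:
            c = queue.popleft()
            if part[c] != -1:
                continue
            if size == target and current < num_parts - 1:
                current += 1     # start filling the next part
                size = 0
            part[c] = current
            size += 1
            queue.extend(nb for nb in adjacency[c] if part[nb] == -1)
    return part


def edge_cut(adjacency, part):
    """Number of cell-to-cell connections that cross a part boundary."""
    return sum(1 for c, nbs in enumerate(adjacency)
               for nb in nbs if c < nb and part[c] != part[nb])
```

The same routine could be applied twice: once on the whole mesh with the number of processes (S101), and once on each process's local cell graph with the number of slave-core subdomains (S102). The edge cut is a proxy for communication volume at the first level and for cross-core data dependence at the second.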
The thread-level coarse-grained many-core parallelism of the computational kernels in this embodiment specifically comprises: S103, first distributing tasks to the processes according to the load balancing result of step S101, then distributing tasks to the slave-core threads according to the load balancing result of step S102; each slave core then completes, in order, the computation of each kernel, including flux computation and the solution of sparse systems of algebraic equations, as required by the computational fluid dynamics calculation.
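The task distribution in S103 amounts to turning the two partition results into one work list per (process, slave core) pair. The sketch below shows this under assumed inputs: `proc_of_cell` and `core_of_cell` are hypothetical arrays holding the first-level and second-level part of every cell, and `local_face_fraction` is a made-up, rough proxy for the data hit rate the patent refers to.

```python
from collections import defaultdict


def build_task_lists(proc_of_cell, core_of_cell):
    """Map each (process, slave core) pair to the list of cells it owns."""
    tasks = defaultdict(list)
    for cell, key in enumerate(zip(proc_of_cell, core_of_cell)):
        tasks[key].append(cell)
    return dict(tasks)


def local_face_fraction(faces, proc_of_cell, core_of_cell):
    """Share of faces whose two cells sit on the same process and slave core.

    A higher value means more of a core's flux work touches only data
    it already holds (used here as a stand-in for data hit rate).
    """
    local = sum(1 for a, b in faces
                if proc_of_cell[a] == proc_of_cell[b]
                and core_of_cell[a] == core_of_cell[b])
    return local / len(faces)
```

For example, a chain of eight cells split over two processes with two slave cores each yields four task lists of two cells, and four of its seven faces are fully local to one core.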
The technical solution of the present invention adds a second, thread-level domain decomposition on top of the first-level domain decomposition of the unstructured grid. Each slave core solves its own independent computational subdomain, which ensures the data hit rate of the kernel computation tasks on the slave cores and achieves coarse-grained parallelism at both the MPI process level and the slave-core thread level. The invention addresses the adaptability problem of general unstructured-grid applications on polymorphic heterogeneous processors: it automatically performs two-level load balancing and coarse-grained many-core parallelization of the computational kernels according to the scale of the unstructured-grid data, improving the computational efficiency and parallel efficiency of unstructured-grid numerical simulation on heterogeneous architectures.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Note that the above describes only preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the appended claims.

Claims (3)

1. A coarse-grained many-core parallelization method for unstructured grids, characterized in that the method uses two-level load balancing based on the architectural features of a domestic heterogeneous many-core processor: a second-level domain decomposition is added on top of the first-level domain decomposition of the unstructured grid, and each slave core completes the computation task of its corresponding subdomain, ensuring the data hit rate of the slave-core computation and achieving thread-level coarse-grained many-core parallelism of the computational kernels.
2. The coarse-grained many-core parallelization method for unstructured grids according to claim 1, characterized in that the two-level load balancing based on the architectural features of the domestic heterogeneous many-core processor specifically comprises: S101, performing first-level, process-level load balancing: the computational domain is decomposed using a graph partitioning method so that the load is balanced across processes and communication is minimized; S102, performing second-level, thread-level load balancing: the maximum task size that each slave core can compute is calculated from the capacity of the slave core's data storage space, a second-level domain decomposition of each process's computational domain is performed using a graph partitioning method, and the tasks are assigned to the slave cores.
3. The coarse-grained many-core parallelization method for unstructured grids according to claim 2, characterized in that the thread-level coarse-grained many-core parallelism of the computational kernels specifically comprises: first distributing tasks to the processes according to the load balancing result of step S101, then distributing tasks to the slave-core threads according to the load balancing result of step S102, each slave core completing, in order, the computation of each kernel, including flux computation and the solution of sparse systems of algebraic equations, as required by the computational fluid dynamics calculation.
CN201811583475.2A 2018-12-24 2018-12-24 Coarse-grained many-core parallelization method for unstructured grids Pending CN109684061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811583475.2A CN109684061A (en) 2018-12-24 2018-12-24 Coarse-grained many-core parallelization method for unstructured grids

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811583475.2A CN109684061A (en) 2018-12-24 2018-12-24 Coarse-grained many-core parallelization method for unstructured grids

Publications (1)

Publication Number Publication Date
CN109684061A (en) 2019-04-26

Family

ID=66188193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811583475.2A Pending CN109684061A (en) 2018-12-24 2018-12-24 Coarse-grained many-core parallelization method for unstructured grids

Country Status (1)

Country Link
CN (1) CN109684061A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140039853A1 (en) * 2011-01-10 2014-02-06 Saudi Arabian Oil Company Scalable simulation of multiphase flow in a fractured subterranean reservoir with multiple interacting continua by matrix solution
CN104239661A (en) * 2013-06-08 2014-12-24 中国石油化工股份有限公司 Large-scale numerical reservoir simulation calculation method
CN104375882A (en) * 2014-11-21 2015-02-25 北京应用物理与计算数学研究所 Multistage nested data drive calculation method matched with high-performance computer structure
CN108595277A (en) * 2018-04-08 2018-09-28 西安交通大学 A kind of communication optimization method of the CFD simulated programs based on OpenMP/MPI hybrid programmings
CN109064559A (en) * 2018-05-28 2018-12-21 杭州阿特瑞科技有限公司 Vascular flow analogy method and relevant apparatus based on mechanical equation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543663A (en) * 2019-07-22 2019-12-06 西安交通大学 Coarse-grained MPI + OpenMP hybrid parallel-oriented structural grid area division method
CN110543663B (en) * 2019-07-22 2021-07-13 西安交通大学 Coarse-grained MPI + OpenMP hybrid parallel-oriented structural grid area division method
CN110633149A (en) * 2019-09-10 2019-12-31 中国人民解放军国防科技大学 Parallel load balancing method for balancing calculation amount of unstructured grid unit
CN110633149B (en) * 2019-09-10 2021-06-04 中国人民解放军国防科技大学 Parallel load balancing method for balancing calculation amount of unstructured grid unit

Similar Documents

Publication Publication Date Title
Zhang et al. Energy-efficient scheduling for real-time systems based on deep Q-learning model
Shan et al. FPMR: MapReduce framework on FPGA
Yang et al. Adaptive optimization for petascale heterogeneous CPU/GPU computing
US10355966B2 (en) Managing variations among nodes in parallel system frameworks
CN107209548A (en) Performing power management in a multicore processor
Liu et al. Power-efficient time-sensitive mapping in heterogeneous systems
US20210263739A1 (en) Vector reductions using shared scratchpad memory
CN109918199B (en) GPU-based distributed graph processing system
CN104636187B (en) Dispatching method of virtual machine in NUMA architecture based on load estimation
CN106462219A (en) Systems and methods of managing processor device power consumption
CN1981280A (en) Apparatus and method for heterogeneous chip multiprocessors via resource allocation and restriction
CN104969182A (en) High dynamic range software-transparent heterogeneous computing element processors, methods, and systems
Liao et al. Long-term generation scheduling of hydropower system using multi-core parallelization of particle swarm optimization
Khairy et al. Efficient utilization of gpgpu cache hierarchy
Talbi et al. Metaheuristics on gpus
Morad et al. Generalized MultiAmdahl: Optimization of heterogeneous multi-accelerator SoC
Sato et al. Co-design and system for the supercomputer “Fugaku”
CN109684061A (en) Coarse-grained many-core parallelization method for unstructured grids
Li et al. A hybrid particle swarm optimization algorithm for load balancing of MDS on heterogeneous computing systems
CN103399799B (en) Computational physics resource node load evaluation method and device in cloud operating system
Wang et al. Architecture and compiler support for gpus using energy-efficient affine register files
Terzopoulos et al. Performance evaluation of a real-time grid system using power-saving capable processors
Rossinelli et al. Mesh–particle interpolations on graphics processing units and multicore central processing units
Du et al. Feature-aware task scheduling on CPU-FPGA heterogeneous platforms
Dally On the model of computation: point

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190426)