CN117573375B - Dynamic load balance parallel computing method oriented to self-adaptive decoupling equation - Google Patents

Dynamic load balance parallel computing method oriented to self-adaptive decoupling equation

Info

Publication number
CN117573375B
CN117573375B CN202410054454.0A CN202410054454A CN117573375B CN 117573375 B CN117573375 B CN 117573375B CN 202410054454 A CN202410054454 A CN 202410054454A CN 117573375 B CN117573375 B CN 117573375B
Authority
CN
China
Prior art keywords
chemical reaction
transfer
grid
load
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410054454.0A
Other languages
Chinese (zh)
Other versions
CN117573375A (en)
Inventor
张斌
肖辰祥
胡桐
李林颖
刘淏旸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Research Institute Of Shanghai Jiaotong University
Original Assignee
Sichuan Research Institute Of Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Research Institute Of Shanghai Jiaotong University
Priority to CN202410054454.0A
Publication of CN117573375A
Application granted
Publication of CN117573375B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic load balancing parallel computing method for adaptive decoupled equations. Building on the dynamic data transfer scheme, it provides a more flexible and efficient dynamic load balancing algorithm, based on hybrid MPI+OpenMP parallelism, for the load imbalance that arises when solving the adaptive decoupled NS equations. The method addresses three technical problems of existing load balancing methods: they do not fully exploit the shared memory available to thread-level parallelism, their load balancing algorithms are complex, and their data transfer cost is high. It achieves dynamic load balancing more flexibly and efficiently, reduces program blocking time, and improves computational efficiency.

Description

Dynamic load balance parallel computing method oriented to self-adaptive decoupling equation
Technical Field
The invention relates to the field of load balancing, in particular to a dynamic load balancing parallel computing method oriented to a self-adaptive decoupling equation.
Background
With the rapid development of computer technology, computational fluid dynamics (CFD) simulation has become an important tool for studying chemically reacting flows. However, the stiffness problem in such flows, i.e. the mismatch between the characteristic time scale of the chemical reactions and that of the flow, poses a serious challenge to conventional numerical methods. On the one hand, when stiffness is severe, a general numerical method may fail to converge or may even produce a spurious numerical solution. On the other hand, stiffness sharply increases the computational cost, to the point of making conventional numerical methods unusable. Existing acceleration methods include reducing the model size, adaptive mesh refinement, and high-performance parallel computing (HPC). HPC has become a powerful tool for accelerating CFD simulation, typically using the shared-memory models Open Multi-Processing (OpenMP) and Pthreads and the distributed-memory Message Passing Interface (MPI). However, while HPC accelerates the calculation, stiffness introduces load imbalance: the computation is concentrated in regions where the chemical reactions are intense, so the computational load across processes becomes seriously unbalanced and parallel efficiency drops sharply. Load balancing has therefore become an important research topic.
Existing load balancing schemes fall into three main categories:
1. Dynamic domain decomposition: the load distribution is monitored during the calculation iterations; when the load balance threshold is reached, the grid is re-partitioned, each process handles the computing tasks of its own grid cells, and boundary data is communicated between processes.
2. Dynamic data transfer: the grid partition that is optimal for the flow calculation is kept throughout the calculation iterations, and computing tasks are transferred between processes when the load balance threshold is reached. The method exploits the local nature of chemical reactions, i.e. each grid cell only needs to iterate its chemical reaction from the flow field data, so the transferred computing tasks have no data dependencies and the method is very flexible. Disadvantages: most schemes only use MPI to achieve process-level parallelism, and the data transfer algorithm is complex.
3. CPU/GPU heterogeneous computing: during the calculation iterations, grid nodes with high stiffness are assigned to the CPU for implicit solution and grid nodes with low stiffness are assigned to the GPU for explicit solution. This plays to the strengths of both the CPU and the GPU and gives them similar amounts of computation, which balances the load. Disadvantages: the implementation is complex and the hardware requirements are high.
The problem addressed by this patent is the load imbalance that arises when solving the adaptive decoupled NS (Navier-Stokes) equations. The NS equations describe fluid flow and are the core of CFD. In CFD calculations of chemically reacting flows, decoupling is a common approach: the chemical reaction is solved separately from the flow, which gives high time accuracy and very low memory consumption and facilitates parallel computing and software engineering. The adaptive approach adds a stiffness-based prediction step before the chemical reaction solve. Within one flow time step, the ordinary differential equations (ODEs) of the chemical reaction require multiple sub-steps, and the prediction step yields the optimal number of sub-iteration steps for each grid cell at that time. This both ensures a reasonable numerical solution and greatly reduces the computational cost of the algorithm. In parallel computing, however, the chemical reaction time dominates and the amount of computation differs between processes, which causes load imbalance. A lightly loaded process that has finished its calculation must wait for the heavily loaded processes, resulting in a large amount of blocking time (MPI_Barrier). Because of the decoupling, a load balancing strategy based on dynamic data transfer is well suited to this problem.
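To make the source of the imbalance concrete, the following minimal sketch shows how a prediction step can assign very different sub-iteration counts to different grid cells. Everything specific here is an assumption for illustration only: the patent does not give its prediction rule, so `predict_substeps`, the per-cell rate constants, and the bound `max_rate_dt` are hypothetical, and the model ODE dy/dt = -rate * y merely stands in for real chemistry.

```python
import math

def predict_substeps(rate, flow_dt, max_rate_dt=0.5):
    """Hypothetical predictor: choose enough sub-steps that rate * sub_dt <= max_rate_dt."""
    return max(1, math.ceil(rate * flow_dt / max_rate_dt))

def integrate_cell(y0, rate, flow_dt, n_sub):
    """Explicit Euler for dy/dt = -rate * y over one flow time step, using n_sub sub-steps."""
    y, sub_dt = y0, flow_dt / n_sub
    for _ in range(n_sub):
        y += sub_dt * (-rate * y)
    return y

flow_dt = 1.0e-3
rates = [10.0, 200.0, 5000.0, 80000.0]   # illustrative per-cell stiffness values
substeps = [predict_substeps(k, flow_dt) for k in rates]
results = [integrate_cell(1.0, k, flow_dt, n) for k, n in zip(rates, substeps)]
```

Cells with fast reactions need far more sub-steps than cells with slow ones, so a process that owns the reacting region accumulates much more work than its neighbours.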
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a dynamic load balance parallel computing method oriented to a self-adaptive decoupling equation.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
A dynamic load balancing parallel computing method for adaptive decoupled equations includes the following steps:
S1, acquiring the load of each process and determining the load imbalance, wherein the load is the sum of the chemical reaction iteration steps of all grid cells of each process;
S2, generating a chemical reaction iteration step transfer list to obtain the sending process, the receiving process, and the number of iteration steps to transfer, and generating a grid transfer list to map the transferred iteration steps onto specific grid data;
S3, the sending process sends the packed data to the corresponding receiving process, which unpacks it; non-blocking communication allows computation to proceed while the data is being sent;
S4, after the transferred data has been received and unpacked, solving the chemical reactions of the transferred grid cells.
Further, the sum of the chemical reaction iteration steps in S1 is expressed as:
$$L_i = \sum_{j=1}^{N_i} n_{i,j}, \qquad i = 1, 2, \ldots, P$$
where $L_i$ is the load of the i-th process, $N_i$ is the number of grid cells of the i-th process, $n_{i,j}$ is the number of chemical reaction iteration steps of the j-th grid cell of the i-th process, and $P$ is the number of processes.
Further, the load imbalance in S1 is calculated as follows:
[equation not preserved in the source text]
where the quantities in the equation are the load imbalance, the maximum load among the processes, and the average load of the processes;
when the load imbalance is greater than a balance threshold, the dynamic load balancing algorithm is started.
Further, the chemical reaction iteration step transfer list in S2 is generated as follows:
A1, using priority queues, dividing the loads into a part greater than or equal to the average and a part smaller than the average;
A2, based on a greedy algorithm, repeatedly taking the element with the largest difference from the part greater than or equal to the average and the element with the smallest difference from the part smaller than the average, and transferring load between them until the number of transfers reaches a threshold, which yields the chemical reaction iteration step transfer list;
A3, each entry of the resulting chemical reaction iteration step transfer list associates a sending process, a receiving process, and the number of iteration steps to transfer.
Further, the grid transfer list in S2 is generated as follows:
B1, taking as input the chemical reaction iteration step transfer list obtained above, together with the ranks of the processes to which each process must send data;
B2, traversing the grid cells: if the list of target processes to send to is empty, exit the loop; otherwise, starting from the first target process, determine whether the chemical reaction iteration steps of the current grid cell can be added to that target's transfer grid; if so, add the current grid cell and its associated data to the corresponding transfer data; if not, check whether the cell can be added for the next target process, so that target processes are filled in order until each meets its required number of iteration steps;
B3, exiting the loop once all target processes meet their required number of iteration steps, which yields the grid transfer list.
Further, in S3, the sending process uses non-blocking communication to send the packed flow field data and chemical reaction data to be transferred to the corresponding receiving process, where they are unpacked.
Further, S4 includes the chemical reaction solve of the receiving process and the chemical reaction solve of the sending process.
Further, the chemical reaction solve of the receiving process comprises the following steps:
S41, receiving the transferred data, unpacking it, and immediately starting to solve the chemical reactions of the transferred grid cells;
S42, after the solve is complete, packing the results and updating the data;
S43, sending the updated data back to the original process with the non-blocking MPI_Isend and then starting to solve the chemical reactions of the process's own grid cells, so that the non-blocking send overlaps the MPI communication time with the chemical reaction computation time;
the chemical reaction solve of the sending process only solves the chemical reactions of its own grid cells.
The invention has the following beneficial effects:
the hybrid MPI+OpenMP parallel scheme makes full use of the distributed memory of MPI process-level parallelism and the shared memory of OpenMP thread-level parallelism, reducing data transmission and improving parallel efficiency;
the data transfer list is generated with a priority queue, which reduces the time complexity to O(n log n) and improves the efficiency of the dynamic load balancing algorithm;
non-blocking communication overlaps communication time with computation time, reducing the communication overhead of frequent data transfers;
for the decoupled NS equation solution with adaptive chemical reaction steps, an approximately uniform load distribution is achieved: blocking time falls from 78.65% with the standard adaptive decoupling algorithm to 4.77%, and parallel efficiency rises from 18.42% with the standard algorithm to 47.75%.
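As a quick reading of the reported figures (the ratios below are derived here from the stated percentages and do not appear in the patent), the improvement factors work out to:

```latex
\[
\frac{\text{parallel efficiency (balanced)}}{\text{parallel efficiency (standard)}}
  = \frac{47.75\%}{18.42\%} \approx 2.59,
\qquad
\frac{\text{blocking time (standard)}}{\text{blocking time (balanced)}}
  = \frac{78.65\%}{4.77\%} \approx 16.5 .
\]
```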
Drawings
FIG. 1 is a flow chart of a dynamic load balancing algorithm of the present invention.
FIG. 2 is an MPI+OpenMP hybrid parallel model diagram according to an embodiment of the present invention.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the present invention. It should be understood that the invention is not limited to the scope of these embodiments; to those of ordinary skill in the art, all variations that fall within the spirit and scope of the invention as defined in the appended claims, and all inventions that make use of the inventive concept, are protected.
A dynamic load balancing parallel computing method for adaptive decoupled equations, as shown in FIG. 1, comprises the following steps:
S1, acquiring the load of each process and determining the load imbalance, wherein the load is the sum of the chemical reaction iteration steps of all grid cells of each process;
First, a hybrid MPI+OpenMP parallel scheme is designed to make full use of both process-level parallelism with distributed memory and thread-level parallelism with shared memory.
MPI distributes computing tasks to different processes through grid decomposition and task division by the main process, and transmits the necessary information (such as boundary information) between processes. The general procedure is: a) initialize the MPI parallel environment with the MPI_Init function to start the MPI parallel computation; b) divide the computing task into n parts and assign each part to an MPI process; c) each process computes independently in its own memory space; d) when processes need to exchange information, message passing functions such as MPI_Send and MPI_Recv are called; e) when all processes have finished their computing tasks, the MPI_Finalize function is called to terminate. OpenMP typically accelerates computation by running hot-spot loops on multiple threads, following the fork-join model: a) the computing task runs serially on the main thread; b) when execution reaches a hot-spot loop, multi-threaded computation starts (fork), and because memory is shared no communication is needed; c) when the hot-spot loop ends, execution returns to the main thread (join) and serial computation continues.
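The fork-join pattern can be sketched structurally as follows. This is only an analogue, not OpenMP: it uses Python's standard `ThreadPoolExecutor` in place of an OpenMP parallel loop, and `chemistry_step` and `solve_hot_loop` are hypothetical names with a placeholder update rule. In CPython the global interpreter lock also means this sketch shows the control flow rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def chemistry_step(cell_value, n_sub):
    """Stand-in for a per-cell chemical reaction solve with n_sub sub-iterations."""
    for _ in range(n_sub):
        cell_value *= 0.99
    return cell_value

def solve_hot_loop(cells, substeps, n_threads=4):
    # fork: worker threads share the process memory, so no messages are exchanged
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(chemistry_step, cells, substeps))
    # join: leaving the with-block waits for all workers, then serial code resumes
    return results

cells = [1.0, 2.0, 3.0, 4.0]
substeps = [1, 2, 3, 4]
updated = solve_hot_loop(cells, substeps)
```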
The MPI+OpenMP hybrid parallel scheme is shown in FIG. 2 and is described in detail below:
a) The computational domain is decomposed into n sub-domains so that the flow solution is load balanced (evenly distributed);
b) Each sub-domain is assigned to one MPI process;
c) Each sub-domain exchanges boundary information with other processes through the non-blocking functions MPI_Isend and MPI_Irecv, overlapping communication with computation to hide the communication time;
d) When a hot-spot loop (the chemical reaction) is encountered, OpenMP multi-threaded computation is started; after the hot-spot loop ends, the OpenMP threads are closed and execution returns to process-level computation.
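Step a) can be illustrated with a simple even split. The sketch below assumes the grid cells are indexed along one dimension, which is a simplification; the patent does not say how the domain is decomposed, and `block_partition` is a hypothetical helper.

```python
def block_partition(n_cells, n_procs):
    """Return (start, stop) index ranges whose sizes differ by at most one cell."""
    base, extra = divmod(n_cells, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)   # the first `extra` ranks take one more cell
        ranges.append((start, start + size))
        start += size
    return ranges

ranges = block_partition(10, 4)
# ranges == [(0, 3), (3, 6), (6, 8), (8, 10)]
```

A split like this balances the flow solve, where every cell costs about the same, but not the chemistry, where the cost per cell depends on its sub-iteration count.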
In this algorithm the load is defined as the sum of the chemical reaction iteration steps over all grid cells of each process, as shown in Equation 1. The core idea is to transfer chemical reaction calculations between processes with a greedy algorithm (the most heavily loaded process transfers preferentially to the most lightly loaded one), so that the computational load is evenly distributed across processes, program blocking time is reduced, and computational efficiency is improved.
$$L_i = \sum_{j=1}^{N_i} n_{i,j}, \qquad i = 1, 2, \ldots, P \tag{1}$$
where $L_i$ is the load of the i-th process, $N_i$ is the number of grid cells of the i-th process, $n_{i,j}$ is the number of chemical reaction iteration steps of the j-th grid cell of the i-th process, and $P$ is the number of processes.
The loads of all processes are then collected and the load imbalance is evaluated. The MPI_Allgather function is used to obtain the loads $L_i$ of all processes and the load imbalance is calculated; when it exceeds the load balancing threshold, the dynamic load balancing algorithm is started.
The load imbalance is computed from the maximum load among the processes and the average load of the processes.
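The load definition and the threshold test can be sketched as below. Two things are assumptions and not taken from the patent: the imbalance formula, which is not legible in this text, is replaced here by the common ratio of maximum load to average load, and the threshold value 1.2 is made up. The gathered loads are passed in as a plain list where a real code would obtain them with MPI_Allgather.

```python
def process_loads(substeps_per_process):
    """Load of each process = sum of chemical reaction iteration steps over its grid cells."""
    return [sum(steps) for steps in substeps_per_process]

def imbalance(loads):
    """ASSUMED metric: maximum load divided by average load (1.0 means perfectly balanced)."""
    average = sum(loads) / len(loads)
    return max(loads) / average

def needs_balancing(loads, threshold=1.2):
    """threshold is a hypothetical value; the patent does not state one."""
    return imbalance(loads) > threshold

substeps_per_process = [[1, 1, 2], [40, 55, 5], [3, 2, 1], [1, 1, 1]]
loads = process_loads(substeps_per_process)   # [4, 100, 6, 3]
trigger = needs_balancing(loads)              # True: one process holds most of the work
```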
S2, generating a chemical reaction iteration step transfer list to obtain the sending process, the receiving process, and the number of iteration steps to transfer, and generating a grid transfer list to map the transferred iteration steps onto specific grid data;
First, the chemical reaction iteration step transfer list is generated. Because the load is defined by the number of chemical reaction iteration steps, the number of steps each process must transfer has to be determined, i.e. triples of sending process, receiving process, and transfer iteration steps. The details are as follows:
A1, using priority queues, dividing the loads into a part greater than or equal to the average and a part smaller than the average;
A2, based on a greedy algorithm, repeatedly taking the element with the largest difference from the part greater than or equal to the average and the element with the smallest difference from the part smaller than the average, and transferring load between them until the number of transfers reaches a threshold, which yields the chemical reaction iteration step transfer list;
The transfer operation is performed in a loop. Using the properties of the priority queue, the greedy algorithm repeatedly takes the element with the largest difference from the part greater than or equal to the average and the element with the smallest difference from the part smaller than the average and transfers load between them, which balances the elements efficiently. The time complexity of each transfer operation is $O(\log n)$ and the total time complexity of the transfer operations is $O(n \log n)$;
A3, each entry of the resulting chemical reaction iteration step transfer list associates a sending process, a receiving process, and the number of iteration steps to transfer.
Finally, the list of triples shown in the first three columns of Table 1 is generated (taking 10 processes as an example); each triple gives the sending process, the receiving process, and the number of iteration steps to transfer. In testing, this step accounted for only 0.03% of the total computation time.
Table 1. Transferred chemical reaction iteration steps and transferred grid counts
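Steps A1 to A3 can be sketched with two heaps from Python's standard `heapq` module. This is an illustration under stated assumptions rather than the patented implementation: `build_step_transfer_list` is a hypothetical name, the integer floor of the average is used as the target so that all step counts stay integers, processes exactly at the target are left out because they have nothing to send, and the cap on the number of transfers defaults to the process count.

```python
import heapq

def build_step_transfer_list(loads, max_transfers=None):
    """Greedily pair the most overloaded process with the most underloaded one.

    Returns a list of (sender, receiver, steps) triples.
    """
    n = len(loads)
    target = sum(loads) // n                     # integer stand-in for the average load
    # heapq is a min-heap, so surpluses are stored negated to pop the largest first
    senders = [(-(load - target), rank) for rank, load in enumerate(loads) if load > target]
    receivers = [(load - target, rank) for rank, load in enumerate(loads) if load < target]
    heapq.heapify(senders)
    heapq.heapify(receivers)
    if max_transfers is None:
        max_transfers = n                        # hypothetical cap on the number of transfers
    transfers = []
    while senders and receivers and len(transfers) < max_transfers:
        neg_surplus, src = heapq.heappop(senders)
        deficit, dst = heapq.heappop(receivers)  # most negative difference comes out first
        steps = min(-neg_surplus, -deficit)
        transfers.append((src, dst, steps))
        if -neg_surplus > steps:                 # sender still has surplus: re-queue it
            heapq.heappush(senders, (neg_surplus + steps, src))
        if -deficit > steps:                     # receiver still below target: re-queue it
            heapq.heappush(receivers, (deficit + steps, dst))
    return transfers

loads = [4, 100, 6, 3, 12]
transfers = build_step_transfer_list(loads)
# transfers == [(1, 3, 22), (1, 0, 21), (1, 2, 19), (1, 4, 13)]
```

Each transfer fully settles at least one of the two processes involved, so at most n - 1 transfers are needed, and each costs O(log n) heap work, which is consistent with the O(n log n) total stated above.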
The grid transfer list is then generated, as shown in Algorithm 2. Once the sending process, receiving process, and transfer iteration steps are known, the data transfer itself must be carried out on grid cells; that is, the iteration steps to be transferred must be mapped onto specific grid data. This requires generating, for each pair of sending and receiving processes, the number of grid cells to transfer and their indices (the results must be returned to the original process after the transferred calculation), and packing the data for transfer at the same time. The details are as follows:
B1, taking as input the chemical reaction iteration step transfer list obtained above, together with the ranks of the processes to which each process must send data;
B2, traversing the grid cells: if the list of target processes to send to is empty, exit the loop; otherwise, starting from the first target process, determine whether the chemical reaction iteration steps of the current grid cell can be added to that target's transfer grid; if so, add the current grid cell and its associated data to the corresponding transfer data;
B3, when the chemical reaction iteration steps transferred to one target process meet its requirement, moving on to the next target process; the grid transfer list is obtained once all target processes have been handled.
The final number of transferred grid cells is shown in the fourth column of Table 1.
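Steps B1 to B3 can be sketched for a single sending process as follows. The fitting rule used here is an assumption: a cell is given to the first target whose remaining quota is at least the cell's step count, and a target is dropped once its quota reaches zero. `build_grid_transfer_list` and its arguments are hypothetical names, and each receiver is assumed to appear only once in `targets`.

```python
def build_grid_transfer_list(cell_steps, targets):
    """Map step quotas onto concrete grid cells for one sending process.

    cell_steps: iteration steps of each local cell (index = local cell id)
    targets:    [(receiver_rank, steps_to_send), ...] from the step transfer list
    Returns {receiver_rank: [cell indices]}; the indices allow results to be written back.
    """
    remaining = [[rank, steps] for rank, steps in targets]
    assigned = {rank: [] for rank, _ in targets}
    for cell, steps in enumerate(cell_steps):
        if not remaining:                  # every target has its quota: leave the loop
            break
        for entry in remaining:            # earlier targets are filled first
            if steps <= entry[1]:
                assigned[entry[0]].append(cell)
                entry[1] -= steps
                if entry[1] == 0:
                    remaining.remove(entry)
                break                      # the cell is placed; move on to the next cell
    return assigned

cell_steps = [5, 30, 2, 8, 10, 1]
assigned = build_grid_transfer_list(cell_steps, [(3, 10), (0, 31)])
# assigned == {3: [0, 2, 5], 0: [1]}
```

Because whole cells are moved, a quota may be slightly under-filled, as for both receivers in this example; cells that fit no target simply stay on the sending process.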
S3, the sending process sends the packed data to the corresponding receiving process, which unpacks it; non-blocking communication allows computation to proceed while the data is being sent;
The sending process sends the packed data to the corresponding receiving process with the non-blocking MPI_Isend, and the receiving process receives it with MPI_Recv and unpacks it. Because the send is non-blocking, computation (including the calculation of some parameters and memory release and allocation) proceeds while the data is in transit, which makes the communication efficient. MPI_Recv is used here to ensure that all data has been received before the next operation begins. In testing, this communication accounted for only 0.52% of the total computation time;
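The pack, send, overlap, and unpack sequence can be mimicked without MPI as below. This is only an analogue under stated assumptions: a background thread and a list named `mailbox` stand in for the network, `fake_send` takes the place of MPI_Isend, and waiting on the returned future plays the role of completing the request. None of these names come from the patent.

```python
import time
from array import array
from concurrent.futures import ThreadPoolExecutor

def pack(cell_ids, field):
    """Pack the selected cell values into one contiguous buffer of doubles."""
    return array("d", (field[i] for i in cell_ids)).tobytes()

def unpack(buffer):
    """Recover the list of doubles from a packed buffer."""
    values = array("d")
    values.frombytes(buffer)
    return list(values)

def fake_send(buffer, mailbox):
    """Stand-in for the network transfer; a real code would call MPI_Isend here."""
    time.sleep(0.01)
    mailbox.append(buffer)

field = [0.5, 1.5, 2.5, 3.5]
mailbox = []
with ThreadPoolExecutor(max_workers=1) as pool:
    request = pool.submit(fake_send, pack([1, 3], field), mailbox)  # returns immediately
    local_work = sum(x * x for x in field)                          # overlapped computation
    request.result()                                                # wait for the send to finish
received = unpack(mailbox[0])   # [1.5, 3.5]
```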
S4, after the transferred data has been received and unpacked, solving the chemical reactions of the transferred grid cells.
For the receiving process, as soon as the transferred data has been received and unpacked, the chemical reactions of the transferred grid cells are solved. When the solve is complete, the results are packed and sent back to the original process with MPI_Isend, after which the receiving process starts solving the chemical reactions of its own grid cells; the non-blocking MPI_Isend lets this return communication overlap with that computation. The sending process only solves the chemical reactions of its own grid cells. The chemical reaction calculation uses the MPI+OpenMP hybrid parallelism described above to further accelerate the computation. Finally, the sending process receives the returned data with MPI_Recv, unpacks it, and updates its local data, which completes the calculation of one flow step before the next iteration begins.
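The round trip of S4 can be sketched serially as follows. The sketch assumes that the sending process skips the cells it has handed over and that results are keyed by the original cell index so they can be written back; `solve_cell` is a placeholder update, and all function names are hypothetical. Real code would exchange `package` and `returned` through MPI calls instead of passing them directly.

```python
def solve_cell(value, n_sub):
    """Stand-in chemistry solve: n_sub sub-iterations of a simple decay update."""
    for _ in range(n_sub):
        value *= 0.5
    return value

def sender_side(values, steps, moved):
    """Solve only the cells that stay local; moved cells are handled by the receiver."""
    return {i: solve_cell(values[i], steps[i]) for i in range(len(values)) if i not in moved}

def receiver_side(package):
    """Solve the transferred cells and build the return package keyed by original index."""
    return {i: solve_cell(v, n) for i, v, n in package}

values = [8.0, 8.0, 8.0, 8.0]
steps = [1, 3, 2, 1]
moved = [1, 2]                                        # indices from the grid transfer list
package = [(i, values[i], steps[i]) for i in moved]   # data shipped to the receiver
local = sender_side(values, steps, moved)
returned = receiver_side(package)                     # would travel back via a non-blocking send
merged = {**local, **returned}
updated = [merged[i] for i in range(len(values))]     # [4.0, 1.0, 2.0, 4.0]
```

The merged result matches what a single process would have computed for all four cells, which is the property the write-back by index is meant to preserve.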
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided to facilitate understanding of the method and core ideas of the present invention. Those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the present invention; in view of the above, this description should not be construed as limiting the present invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims (4)

1. The dynamic load balance parallel computing method for the self-adaptive decoupling equation is characterized by comprising the following steps of:
S1, acquiring the load of each process and determining the load imbalance, wherein the load is the sum of the chemical reaction iteration steps of all grid cells of each process, and the load imbalance is calculated as follows:
[equation not preserved in the source text]
where the quantities in the equation are the load imbalance, the maximum load among the processes, and the average load of the processes;
when the load imbalance is greater than a balance threshold, the dynamic load balancing algorithm is started;
S2, generating a chemical reaction iteration step transfer list to obtain the sending process, the receiving process, and the number of iteration steps to transfer, and generating a grid transfer list to map the transferred iteration steps onto specific grid data, wherein the chemical reaction iteration step transfer list is generated as follows:
A1, using priority queues, dividing the loads into a part greater than or equal to the average and a part smaller than the average;
A2, based on a greedy algorithm, repeatedly taking the element with the largest difference from the part greater than or equal to the average and the element with the smallest difference from the part smaller than the average, and transferring load between them until the number of transfers reaches a threshold, which yields the chemical reaction iteration step transfer list;
A3, each entry of the resulting chemical reaction iteration step transfer list associates a sending process, a receiving process, and the number of iteration steps to transfer;
the grid transfer list is generated as follows:
B1, taking as input the chemical reaction iteration step transfer list obtained above, together with the ranks of the processes to which each process must send data;
B2, traversing the grid cells: if the list of target processes to send to is empty, exit the loop; otherwise, starting from the first target process, determine whether the chemical reaction iteration steps of the current grid cell can be added to that target's transfer grid; if so, add the current grid cell and its associated data to the corresponding transfer data; if not, check whether the cell can be added for the next target process, so that target processes are filled in order until each meets its required number of iteration steps;
B3, when the chemical reaction iteration steps transferred to one target process meet its requirement, moving on to the next target process; the grid transfer list is obtained once all target processes have been handled;
S3, the sending process sends the packed data to the corresponding receiving process, which unpacks it; non-blocking communication allows computation to proceed while the data is being sent;
S4, after receiving the transferred data and unpacking it, the receiving process solves the chemical reactions of the transferred grid cells.
2. The dynamic load balancing parallel computing method for adaptive decoupled equations according to claim 1, wherein the sum of the chemical reaction iteration steps in S1 is expressed as:
$$L_i = \sum_{j=1}^{N_i} n_{i,j}, \qquad i = 1, 2, \ldots, P$$
where $L_i$ is the load of the i-th process, $N_i$ is the number of grid cells of the i-th process, $n_{i,j}$ is the number of chemical reaction iteration steps of the j-th grid cell of the i-th process, and $P$ is the number of processes.
3. The adaptive decoupling equation-oriented dynamic load balancing parallel computing method of claim 1, wherein S4 includes a chemical reaction solution of a receiving process and a chemical reaction solution of a transmitting process.
4. The dynamic load balancing parallel computing method for adaptive decoupling equations according to claim 3, wherein the chemical reaction solving by the receiving process comprises the following steps:
S41, receiving the transferred data, completing data unpacking, and immediately starting the chemical reaction solving for the transferred grids;
S42, after the solving is completed, packing the calculation results and updating the data;
S43, sending the updated data back to the original process using non-blocking communication and starting the chemical reaction solving for the grids of the local process, the non-blocking communication allowing the MPI communication time to overlap with the chemical reaction calculation time;
and the chemical reaction solving by the sending process only solves the chemical reactions of the grids of its own process.
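The receiving-side order of operations in S41 to S43 can be sketched as below. To keep the example runnable without an MPI installation, a background thread stands in for non-blocking communication; in an actual MPI code the send would typically be started with a non-blocking call such as `MPI_Isend` and completed with `MPI_Wait`. The solver, the `mailbox` list, and the function names are placeholders invented for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def solve_chemistry(step_counts: List[int]) -> List[int]:
    """Placeholder solver: one result per grid (hypothetical)."""
    return [n * 2 for n in step_counts]


def send_back(results: List[int], mailbox: List[int]) -> None:
    """Stand-in for a non-blocking send to the original process."""
    time.sleep(0.05)          # simulated transfer time
    mailbox.extend(results)   # mailbox plays the sender's receive buffer


def receiver_step(transferred: List[int], own: List[int]) -> Tuple[List[int], List[int]]:
    mailbox: List[int] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # S41/S42: solve the transferred grids first and pack the results.
        transferred_results = solve_chemistry(transferred)
        # S43: start sending the results back without waiting ...
        pending = pool.submit(send_back, transferred_results, mailbox)
        # ... and solve the local grids while the transfer is in flight.
        own_results = solve_chemistry(own)
        pending.result()      # analogous to waiting on the send request
    return own_results, mailbox


own_results, returned = receiver_step([3, 5], [1, 2, 4])
print(own_results, returned)
```

The point of this ordering is that the local solve runs during the transfer, so the communication time is hidden behind computation rather than added to it.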
CN202410054454.0A 2024-01-15 2024-01-15 Dynamic load balance parallel computing method oriented to self-adaptive decoupling equation Active CN117573375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410054454.0A CN117573375B (en) 2024-01-15 2024-01-15 Dynamic load balance parallel computing method oriented to self-adaptive decoupling equation

Publications (2)

Publication Number Publication Date
CN117573375A CN117573375A (en) 2024-02-20
CN117573375B CN117573375B (en) 2024-04-02

Family

ID=89862787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410054454.0A Active CN117573375B (en) 2024-01-15 2024-01-15 Dynamic load balance parallel computing method oriented to self-adaptive decoupling equation

Country Status (1)

Country Link
CN (1) CN117573375B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239586A (en) * 2016-03-29 2017-10-10 南京理工大学 The domain decomposition parallel method effective to unconditionally stable FDTD method
CN108595277A (en) * 2018-04-08 2018-09-28 西安交通大学 A kind of communication optimization method of the CFD simulated programs based on OpenMP/MPI hybrid programmings
CN109885401A (en) * 2019-01-27 2019-06-14 中国人民解放军国防科技大学 Structured grid load balancing method based on LPT local optimization
CN110275732A (en) * 2019-05-28 2019-09-24 上海交通大学 The Parallel Implementation method of particle in cell method on ARMv8 processor
CN112380793A (en) * 2020-11-18 2021-02-19 上海交通大学 Turbulence combustion numerical simulation parallel acceleration implementation method based on GPU
CN113392568A (en) * 2021-08-17 2021-09-14 北京航空航天大学 Load balancing method of dynamic calculation domain in aircraft aerodynamic characteristic parallel simulation
CN113485826A (en) * 2021-06-25 2021-10-08 中国电子科技集团公司第五十四研究所 Load balancing method and system for edge server
CN116390161A (en) * 2023-03-20 2023-07-04 重庆邮电大学 Task migration method based on load balancing in mobile edge calculation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9876539B2 (en) * 2014-06-16 2018-01-23 Ntt Docomo, Inc. Method and apparatus for scalable load balancing across wireless heterogeneous MIMO networks
CN105446979B (en) * 2014-06-27 2019-02-01 华为技术有限公司 Data digging method and node

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
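Harshitha Menon et al. A distributed dynamic load balancer for iterative applications. SC'13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 2013, 1-11. *
Optimization study of turbulent combustion problems based on DGX-2; Wen Minhua (文敏华) et al.; 《计算机科学》 (Computer Science); 20221210; Vol. 48 (No. 12); 43-48 *
Research on key technologies of multi-controller load distribution in software-defined networks; Wang Junxiao (王军晓); 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology); 20180415 (No. 04); I139-162 *
Survey of computation offloading schemes for multi-access edge computing; Zhang Bingjie (张冰洁) et al.; 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology); 20230509; Vol. 17 (No. 09); 2030-2046 *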

Also Published As

Publication number Publication date
CN117573375A (en) 2024-02-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant