CN111090508B - OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices - Google Patents


Info

Publication number
CN111090508B
CN111090508B (application CN201911203540.9A)
Authority
CN
China
Prior art keywords
workload
calculation
curr
block
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911203540.9A
Other languages
Chinese (zh)
Other versions
CN111090508A (en)
Inventor
朱正东
李少辉
李小轩
韩靖雯
王鹏博
李珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201911203540.9A priority Critical patent/CN111090508B/en
Publication of CN111090508A publication Critical patent/CN111090508A/en
Application granted granted Critical
Publication of CN111090508B publication Critical patent/CN111090508B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4812Task transfer initiation or dispatching by interrupt, e.g. masked
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an OpenCL-based dynamic task scheduling method among heterogeneous cooperative parallel computing devices, comprising the following steps: first, a portion of the total workload of a specified computation kernel is taken as the initial block size; during the first execution of the specified computation kernel, the task division ratio of each computing device is obtained from the theoretical peak performance of each device participating in the cooperative parallel computation; thereafter, the size of the next block and each device's task division ratio for the next computation are dynamically adjusted according to the computation speed fed back by each participating device during execution of the kernel. The method achieves feedback-driven dynamic task division and improves the overall performance of multi-device cooperative parallel computing. The invention completes the design details, implementation algorithm, and coding of these functions, and improves the resource utilization of multiple devices in parallel computing.

Description

OpenCL-based dynamic task scheduling method among heterogeneous cooperative parallel computing devices
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to an OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices.
Background
With the rise of parallel programming languages such as OpenCL and CUDA, heterogeneous computing platforms composed of a host CPU and GPU-based acceleration devices have become mainstream computing architectures. Such platforms provide higher performance for computation-intensive applications. On a heterogeneous platform formed by a CPU and various acceleration devices, OpenCL offers portability and cross-platform execution, which has made it popular. However, this programming model lacks an efficient and mature task scheduling framework, so task scheduling between devices becomes especially important for fully utilizing the resources of a heterogeneous system. A static task scheduling strategy in multi-device cooperative parallel computing, given an accurately chosen task division ratio, can effectively achieve load balance among computing devices and incurs no scheduling overhead at runtime. However, obtaining the optimal task division ratio depends on time-consuming and labor-intensive offline training, and once the application program, the problem scale, the type and number of devices participating in the cooperative parallel computation, or the software and hardware configuration of the heterogeneous many-core system changes, the offline training must be performed again. With static task scheduling, a poor task division ratio may cause severe load imbalance among the devices, significantly reducing the overall performance of multi-device cooperative parallel computing.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide a dynamic task scheduling method between devices in heterogeneous cooperative parallel computing based on OpenCL, and the method can improve the overall performance of multi-device cooperative parallel computing.
The invention adopts the following technical scheme:
a dynamic task scheduling method among heterogeneous cooperative parallel computing devices based on OpenCL comprises the following processes:
first, a portion of the total workload of a specified computation kernel is taken as the initial block size and executed; then, during the first execution of the specified computation kernel, the task division ratio of each computing device is obtained from the theoretical peak performance of each device participating in the cooperative parallel computation; thereafter, the size of the next block and each computing device's task division ratio for the next computation are dynamically adjusted according to the computation speed fed back by each participating device during execution of the kernel.
The OpenCL-based dynamic task scheduling method among devices in heterogeneous cooperative parallel computing specifically comprises the following steps:
S1, taking a portion of the total workload of the specified computation kernel as a first block, and cooperatively executing the first block using the computing devices;
S2, judging whether any remaining workload exists; if not, the specified computation kernel has finished executing; if so, cooperatively executing the second block using the computing devices;
S3, repeating S2 until the remaining workload is 0.
S1 comprises the following steps:
S1.1, according to the initial task division ratio R_i, distribute workload W_curr_i to computing device D_i, where W_curr_i = W_curr × R_i and W_curr = W/n, 1 ≤ i ≤ p; p is the total number of computing devices, n is a preset parameter, and W is the total workload of the specified computation kernel;
S1.2, execute on computing device D_i the workload W_curr_i assigned to it;
S1.3, when computing device D_i has completed the workload W_curr_i assigned to it, collect computing device D_i's current execution time T_curr_i and current execution speed V_curr_i;
S1.4, after all computing devices have completed their respective work, compute each device D_i's relative execution speed RV_i, where
RV_i = V_curr_i / (V_curr_1 + V_curr_2 + ... + V_curr_p);
the relative execution speed RV_i is used as the new task division ratio, i.e. the ratios are updated as R_i = RV_i, 1 ≤ i ≤ p;
S1.5, compute the current cooperative parallel execution speed V_curr, where V_curr = W_curr / T_curr and T_curr = max(T_curr_1, T_curr_2, ..., T_curr_p);
S1.6, update the completed total workload W_f and the remaining workload W_r, where W_f = W_f + W_curr and W_r = W - W_f.
The initial task division ratio R_i is either set manually or calculated from the proportional relation of the theoretical peak performance of the computing devices participating in the cooperative parallel computation.
The second block size is 2×W/n. S2 comprises the following steps:
S2.1, distribute the workload of the second block to each computing device participating in the cooperative parallel computation according to the updated task division ratios;
S2.2, execute on each computing device the workload assigned to it;
S2.3, after each computing device finishes executing its workload, collect the execution times, compute each device's relative execution speed, and update the task division ratios according to the obtained relative execution speeds;
S2.4, compute the current cooperative parallel execution speed;
S2.5, adjust the size of the next block according to the current cooperative parallel execution speed obtained in S2.4, i.e. determine the workload to be completed in the next step; by comparing the cooperative parallel execution speed of the previous block with that of the current block, and the size of the previous block with that of the current block, determine whether the next block should be doubled, halved, or kept the same size as the current block;
S2.6, update the completed total workload and the remaining workload; if the remaining workload is 0, the computation task is complete; if not, proceed to S3.
In S3:
in each iteration, if computing device D_i is an accelerator, then before D_i executes the current block, a portion of the current block's data is uploaded from the host to D_i according to the task division ratio, and after D_i finishes executing the current block, the corresponding portion of processed data is downloaded from D_i back to the host; after the current block has been processed, the size of the next block is adjusted according to the dynamic variation of the cooperative parallel execution speed and the workload, and the maximum size of the next block should not exceed the remaining workload.
In S3:
in each iteration, the difference between the size of the next block and the remaining workload after the current block is computed; if the difference is less than or equal to 0.5 times the remaining workload, the remaining workload is taken as the size of the next block; otherwise the size of the next block remains unchanged.
The invention has the following beneficial effects:
the invention discloses a heterogeneous collaborative parallel computing inter-device dynamic task scheduling method based on OpenCL. Therefore, the effect of feedback type dynamic task division is achieved, and meanwhile, the overall performance of multi-device collaborative parallel computing can be improved through the method.
Further, considering that a smaller block may result in underutilization of the computing power of the accelerator, in S3, a difference between the size of the next block and the remaining workload of the current block is calculated in each iteration, and if the difference is less than or equal to 0.5 times the remaining workload of the current block, the remaining workload of the current block is taken as the size of the next block; otherwise the size of the next block remains unchanged.
Drawings
FIG. 1 is a flowchart of a method for scheduling dynamic tasks among devices in OpenCL-based heterogeneous cooperative parallel computing.
FIG. 2 is a flowchart illustrating an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
An extended OpenCL programming framework is designed, with which programmers can use OpenCL to write parallel applications that can be cooperatively executed in parallel by any type and any number of computing devices in a master-slave heterogeneous system. After a program is compiled and linked, its function must be packaged into a data structure called a kernel, i.e. the kernel is created. The runtime system is primarily responsible for dividing a computing task and distributing it fairly across multiple computing devices, then executing a device-specific kernel on each computing device to complete the sub-task assigned to it. Thus, in a master-slave heterogeneous multi-device system, a CPU and multiple accelerators can cooperatively execute data-level parallel applications, but the most critical issue is how to schedule tasks among the computing devices reasonably and efficiently. The OpenCL-based inter-device dynamic task scheduling method provided by the invention effectively solves this problem.
The research idea of the dynamic task scheduling method between devices in heterogeneous collaborative parallel computing based on OpenCL is to dynamically divide the whole iteration space of a computing kernel into a plurality of blocks with different sizes. Specifically, 1/n (namely W/n) of the total workload of a specified computational kernel is taken as an initial block size, wherein a parameter n can be manually set by a programmer, the preferred value range of the parameter n is 32-128, and the default value of the parameter n is 32; and then, dynamically adjusting the size of the next block according to the performance change of the multi-device cooperative parallel computing in the execution process of the specified computing kernel.
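As an illustration of the initial sizing described above, combined with the peak-proportional initial ratios used in step 1.1, the first partition can be sketched as follows. Python is used purely for illustration; the function name and return shape are hypothetical, since the patent specifies only the formulas.

```python
# Sketch of the initial partitioning (hypothetical helper; the patent
# gives formulas, not code).

def initial_partition(W, peaks, n=32):
    """W: total workload of the kernel; peaks: theoretical peak
    performance of each device (e.g. GFLOPS); n: block parameter,
    preferably in [32, 128], default 32.
    Returns (per-device shares of the first block, initial ratios R_i)."""
    assert n >= 1, "n must be a positive integer"
    total_peak = sum(peaks)
    ratios = [p / total_peak for p in peaks]   # R_i from peak proportions
    w_curr = W / n                             # initial block size W/n
    return [w_curr * r for r in ratios], ratios

shares, ratios = initial_partition(3200.0, [300.0, 100.0])
# first block is 100.0; shares split 75.0 / 25.0
```

With a device three times faster in theoretical peak, the first block of 100.0 work items is split 75/25, which the feedback loop then refines from measured speeds.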
Referring to fig. 1 and fig. 2, the method for scheduling dynamic tasks among devices in heterogeneous collaborative parallel computing based on OpenCL specifically includes the following steps:
step 1: cooperatively executing a first block using p computing devices, the first block having a size of W/n; the method specifically comprises the following steps:
1.1 According to the initial task division ratio R_i, distribute part of the first block's workload, W_curr_i, to device D_i, where W_curr_i = W_curr × R_i and W_curr = W/n, 1 ≤ i ≤ p. The initial task division ratio R_i can be set manually by the programmer or calculated from the proportional relation of the theoretical peak performance of the devices participating in the cooperative parallel computation.
1.2 Execute on computing device D_i the workload W_curr_i assigned to it.
1.3 After computing device D_i completes the workload assigned to it, collect computing device D_i's current execution time T_curr_i and current execution speed V_curr_i, where V_curr_i = W_curr_i / T_curr_i.
1.4 When all p computing devices have completed their respective jobs, compute each device D_i's relative execution speed RV_i, where
RV_i = V_curr_i / (V_curr_1 + V_curr_2 + ... + V_curr_p).
Here the relative execution speed RV_i is used as the new task division ratio; the ratios are updated as R_i = RV_i (1 ≤ i ≤ p).
1.5 Compute the current cooperative parallel execution speed V_curr, where V_curr = W_curr / T_curr and T_curr = max(T_curr_1, T_curr_2, ..., T_curr_p).
1.6 Update the completed total workload W_f and the remaining workload W_r, where W_f = W_f + W_curr and W_r = W - W_f.
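The feedback bookkeeping of steps 1.3 through 1.5 can be sketched as a small helper. Python is used for illustration only, and the function name is hypothetical; real code would time OpenCL command queues to obtain T_curr_i.

```python
# Sketch of steps 1.3-1.5: derive new task-division ratios and the
# cooperative execution speed from measured per-device times.

def update_ratios(workloads, times):
    """workloads[i] = W_curr_i, times[i] = T_curr_i.
    Returns (new ratios R_i = RV_i, cooperative speed V_curr)."""
    speeds = [w / t for w, t in zip(workloads, times)]  # V_curr_i = W_curr_i / T_curr_i
    total = sum(speeds)
    ratios = [v / total for v in speeds]                # RV_i = V_curr_i / sum_j V_curr_j
    w_curr = sum(workloads)
    t_curr = max(times)                                 # slowest device finishes last
    return ratios, w_curr / t_curr                      # V_curr = W_curr / T_curr

ratios, v_curr = update_ratios([50.0, 50.0], [0.25, 0.5])
# measured speeds 200 and 100 -> ratios 2/3 and 1/3; V_curr = 100 / 0.5 = 200
```

Because the ratios are normalized speeds, they sum to 1 and can be applied directly as the split for the next block.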
Step 2: judge whether any remaining workload exists; if not, the specified computation kernel has finished executing; if so, cooperatively execute a second block of size 2×W/n using the p computing devices, following the process of step 1.
The method specifically comprises the following steps:
2.1 Distribute the workload of the second block to the computing devices participating in the cooperative parallel computation according to the updated task division ratios R_i.
2.2 Execute on each computing device the workload assigned to it.
2.3 After each computing device finishes its workload, collect the execution times, compute each device's relative execution speed, and update the task division ratios according to the obtained relative execution speeds.
2.4 Compute the current cooperative parallel execution speed.
2.5 Adjust the size of the next block according to the current cooperative parallel execution speed obtained in step 2.4, i.e. determine the workload to be completed in the next step. By comparing the cooperative parallel execution speed V_prev obtained in step 1.5 with the current cooperative parallel execution speed V_curr, and the size W_prev of the previous block (i.e. the amount of work done in the previous step) with the size W_curr of the current block (i.e. the amount of work done in the current step), determine whether the size W_next of the next block should be double, half, or the same as the current block size W_curr.
2.6 Update the completed total workload and the remaining workload; if the remaining workload is 0, the computation task is complete; if not, proceed to step 3.
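The description states that W_next is doubled, halved, or kept relative to W_curr based on the two comparisons in step 2.5, but it does not spell out the exact decision table. One plausible hill-climbing reading is sketched below; this is an assumption for illustration, not the claimed rule.

```python
def next_block_size(w_prev, v_prev, w_curr, v_curr):
    """Hypothetical reading of step 2.5: if the last size change improved
    the cooperative speed V, keep moving the block size in the same
    direction; if it hurt, reverse direction; if V is unchanged, keep
    the current size."""
    if v_curr > v_prev:
        # last change helped: continue in the same direction
        return w_curr * 2 if w_curr >= w_prev else w_curr / 2
    if v_curr < v_prev:
        # last change hurt: reverse it
        return w_curr / 2 if w_curr >= w_prev else w_curr * 2
    return w_curr  # no measurable change: keep the size

# growing the block from 100 to 200 raised V from 150 to 180 -> double again
```

Under this reading the block size oscillates toward the size that maximizes the measured cooperative execution speed, which matches the feedback-driven intent of the method.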
Step 3: repeat step 2 until the remaining workload is 0. In each iteration, if computing device D_i is an accelerator, then before D_i executes the current block, a portion of the current block's data is uploaded from the host to D_i according to the task division ratio R_i, and after D_i finishes executing the current block, the corresponding portion of processed data is downloaded from D_i back to the host. After the current block has been processed, the size of the next block is adjusted according to the dynamic variation of the cooperative parallel execution speed and the workload, and the maximum size of the next block should not exceed W_r, i.e. W_next ≤ W_r. Furthermore, considering that smaller blocks may leave the computing power of the accelerator underutilized, the difference between W_next and W_r is computed in each iteration; if the difference is less than or equal to 0.5 × W_next, then W_next = W_r; otherwise W_next remains unchanged.
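The overall loop of steps 1 through 3 can be sketched end to end on simulated devices. This is a simplified illustration with invented fixed device speeds, so the cooperative speed never changes and the block size simply stays at W/n; the doubling/halving rule of step 2.5 is therefore elided, and only the ratio feedback and the tail-merge rule are exercised. A real implementation would launch OpenCL kernels and time them.

```python
# End-to-end sketch of the feedback scheduler on simulated fixed-speed
# devices (speeds are invented for illustration).

def schedule(W, device_speeds, n=32):
    p = len(device_speeds)
    ratios = [1.0 / p] * p                 # stand-in for peak-based R_i
    w_curr, w_done = W / n, 0.0            # initial block size W/n
    blocks = []
    while w_done < W:
        shares = [w_curr * r for r in ratios]
        times = [s / v for s, v in zip(shares, device_speeds)]  # T_curr_i
        speeds = [s / t for s, t in zip(shares, times)]         # V_curr_i
        total = sum(speeds)
        ratios = [v / total for v in speeds]                    # feedback: R_i = RV_i
        w_done += w_curr
        blocks.append(w_curr)
        w_rem = W - w_done
        if w_rem <= 0:
            break
        w_next = min(w_curr, w_rem)        # resize rule elided; never exceed W_r
        if abs(w_next - w_rem) <= 0.5 * w_next:
            w_next = w_rem                 # merge small tails into one block
        w_curr = w_next
    return blocks

blocks = schedule(3200.0, [300.0, 100.0])
# 32 equal blocks of 100.0 covering the whole workload
```

After the first block, the ratios converge to 0.75/0.25 for the 300/100 speed pair, so each subsequent block is split in that proportion.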
In the OpenCL-based inter-device dynamic task scheduling method, the setting of the parameter n affects the performance of the algorithm. The preferred value range of n is 32 to 128, with a default of 32; experimental results show that this default is reasonable but not necessarily optimal for every test program, and the programmer can tune n for different computation kernels. The performance of the scheduling algorithm is also related to the initial block size, but experiments show that the initial block size has little impact on performance as long as it is neither too large nor too small.
The embodiments of the invention are illustrated by a specific example: a computation task T (matrix-vector mul.cl) is tested on a system containing 1 CPU and 2 GPUs. Programmers use OpenCL to write a parallel application that can be cooperatively executed in parallel by any type and any number of computing devices in a master-slave heterogeneous system. After the program is compiled and linked, the function is packaged into a data structure called a kernel, i.e. the kernel is created. The feedback-driven dynamic task scheduling method dynamically divides the whole iteration space of the computation kernel (i.e. a data-level parallel for loop) into a number of blocks of different sizes, then dynamically adjusts the size of the next block according to performance changes of the multi-device cooperative parallel computation during execution of the specified kernel.
To evaluate the performance of the inter-device dynamic task scheduling method in OpenCL-based heterogeneous cooperative parallel computing, the test programs in Table 1 are selected, and each test program is implemented in the following ways: single-CPU-core serial execution; multi-CPU parallel execution; multi-NVIDIA-GPU parallel execution; multi-AMD-GPU parallel execution; CPU and NVIDIA GPU cooperative parallel execution; CPU and AMD GPU cooperative parallel execution; and CPU, NVIDIA GPU, and AMD GPU cooperative parallel execution. Here, multi-CPU parallel execution refers to implementing a specified test program with OpenMP and running it on an 8-core CPU; multi-NVIDIA-GPU parallel execution refers to implementing a specified test program with CUDA and running it on a specified NVIDIA GPU; multi-AMD-GPU parallel execution refers to implementing a specified test program with the OpenCL programming model, without the feedback-driven dynamic scheduling algorithm, and running it on one specified AMD GPU. The CPU plus NVIDIA GPU plus AMD GPU configuration is compared using static scheduling, the split scheduling policy, the quick scheduling policy, and the feedback-driven dynamic task scheduling policy proposed herein. In addition, considering that the initial block size strongly affects the performance of the quick scheduling policy and the number of blocks strongly affects the performance of the split scheduling policy, for fairness an appropriate initial block size is manually selected for the quick policy, and an appropriate number of blocks for the split policy, for each specified test program and problem scale.
TABLE 1
(Table 1, listing the selected test programs, is an image in the original publication and is not reproduced in this text version.)

Claims (5)

1. An OpenCL-based dynamic task scheduling method among heterogeneous cooperative parallel computing devices, characterized by comprising: firstly, taking a portion of the total workload of a specified computation kernel as the initial block size and executing it; then, during the first execution of the specified computation kernel, obtaining the task division ratio of each computing device from the theoretical peak performance of each device participating in the cooperative parallel computation; and then, during execution of the specified computation kernel, dynamically adjusting the size of the next block and each computing device's task division ratio for the next computation according to the computation speed fed back by each computing device participating in the cooperative parallel computation;
the method comprises the following steps:
S1, taking a portion of the total workload of the specified computation kernel as a first block, and cooperatively executing the first block using the computing devices;
S2, judging whether any remaining workload exists; if not, the specified computation kernel has finished executing; if so, cooperatively executing the second block using the computing devices;
S3, repeating step S2, cooperatively executing the next block using the computing devices, until the remaining workload is 0;
S1 comprises the following steps:
S1.1, according to the initial task division ratio R_i, distributing workload W_curr_i to computing device D_i, where W_curr_i = W_curr × R_i and W_curr = W/n, 1 ≤ i ≤ p; p is the total number of computing devices, n is a preset parameter, and W is the total workload of the specified computation kernel;
S1.2, executing on computing device D_i the workload W_curr_i assigned to it;
S1.3, when computing device D_i has completed the workload W_curr_i assigned to it, collecting computing device D_i's current execution time T_curr_i and current execution speed V_curr_i;
S1.4, after all computing devices have completed their respective work, computing each device D_i's relative execution speed RV_i, where
RV_i = V_curr_i / (V_curr_1 + V_curr_2 + ... + V_curr_p);
the relative execution speed RV_i being used as the new task division ratio, the task division ratios are updated as R_i = RV_i, 1 ≤ i ≤ p;
S1.5, calculating the current cooperative parallel execution speed V_curr, where V_curr = W_curr / T_curr and
T_curr = max(T_curr_1, T_curr_2, ..., T_curr_p);
S1.6, updating the completed total workload W_f and the remaining workload W_r, where W_f = W_f + W_curr and W_r = W - W_f.
2. The method for dynamic task scheduling among devices in OpenCL-based heterogeneous cooperative parallel computing according to claim 1, wherein the initial task division ratio R_i is set manually or calculated according to the proportional relation of the theoretical peak performance of the computing devices participating in the cooperative parallel computation.
3. The method for dynamic task scheduling among devices in OpenCL-based heterogeneous cooperative parallel computing according to claim 1, wherein the second block size is 2×W/n, and S2 comprises the following steps:
S2.1, distributing the workload of the second block to each computing device participating in the cooperative parallel computation according to the updated task division ratios;
S2.2, executing on each computing device the workload assigned to it;
S2.3, after each computing device finishes executing its workload, collecting the execution times, calculating each device's relative execution speed, and updating the task division ratios according to the obtained relative execution speeds;
S2.4, calculating the current cooperative parallel execution speed;
S2.5, adjusting the size of the next block according to the current cooperative parallel execution speed obtained in S2.4, and determining the workload to be completed in the next step; determining whether the size of the next block should be doubled, halved, or kept unchanged relative to the current block size by comparing the cooperative parallel execution speed of the previous block with that of the current block, and the size of the previous block with that of the current block;
S2.6, updating the completed total workload and the remaining workload; if the remaining workload is 0, the computation task is complete; if not, proceeding to S3.
4. The method for dynamically scheduling the tasks among the devices in the OpenCL-based heterogeneous cooperative parallel computing, according to claim 1, wherein in S3:
in each iteration, if computing device D_i is an accelerator, a portion of the current block's data is uploaded from the host to computing device D_i according to the task division ratio before D_i executes the current block, and after computing device D_i finishes executing the current block, the corresponding portion of processed data of the current block is downloaded from computing device D_i to the host; and after the current block has been processed, the size of the next block is adjusted according to the dynamic variation of the cooperative parallel execution speed and the workload, the maximum size of the next block not exceeding the remaining workload.
5. The method for dynamically scheduling the tasks among the devices in the heterogeneous collaborative parallel computing based on the OpenCL as claimed in claim 4, wherein in S3:
calculating a difference between the size of the next block and the remaining workload of the current block in each iteration, and if the difference is less than or equal to 0.5 times the remaining workload of the current block, taking the remaining workload of the current block as the size of the next block; otherwise the size of the next block remains unchanged.
CN201911203540.9A 2019-11-29 2019-11-29 OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices Active CN111090508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911203540.9A CN111090508B (en) 2019-11-29 2019-11-29 OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911203540.9A CN111090508B (en) 2019-11-29 2019-11-29 OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices

Publications (2)

Publication Number Publication Date
CN111090508A CN111090508A (en) 2020-05-01
CN111090508B true CN111090508B (en) 2023-04-14

Family

ID=70393336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911203540.9A Active CN111090508B (en) 2019-11-29 2019-11-29 OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices

Country Status (1)

Country Link
CN (1) CN111090508B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116048742B (en) * 2022-05-30 2023-11-07 荣耀终端有限公司 Data processing method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062249A (en) * 2017-12-11 2018-05-22 成都博睿德科技有限公司 Cloud data allocation and scheduling method based on big data
DE102017109239A1 (en) * 2017-04-28 2018-10-31 Ilnumerics Gmbh Computer-implemented process, computer-readable media and heterogeneous computer system
CN109542596A (en) * 2018-10-22 2019-03-29 西安交通大学 A scheduling framework based on OpenCL kernel tasks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130339978A1 (en) * 2012-06-13 2013-12-19 Advanced Micro Devices, Inc. Load balancing for heterogeneous systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017109239A1 (en) * 2017-04-28 2018-10-31 Ilnumerics Gmbh Computer-implemented process, computer-readable media and heterogeneous computer system
CN108062249A (en) * 2017-12-11 2018-05-22 成都博睿德科技有限公司 Cloud data allocation and scheduling method based on big data
CN109542596A (en) * 2018-10-22 2019-03-29 西安交通大学 A scheduling framework based on OpenCL kernel tasks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孟祥宾; 隋志强; 王修银; 唐祥功; 段疾病. Research on a general framework for multi-core heterogeneous parallel computing in seismic processing. Oil and Gas Geophysics (油气地球物理), 2014(02), full text. *
贾海鹏; 张云泉; 徐建良. Research on OpenCL-based optimization of the image integral algorithm. Computer Science (计算机科学), 2013(02), full text. *

Also Published As

Publication number Publication date
CN111090508A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
CN109814986B (en) Task parallel processing method, storage medium, computer equipment, device and system
Lastovetsky et al. Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing
CN105487838A (en) Task-level parallel scheduling method and system for dynamically reconfigurable processor
Menon et al. Automated load balancing invocation based on application characteristics
CN101639788B (en) Multi-core parallel method for continuous system simulation based on TBB threading building blocks
Shetti et al. Optimization of the HEFT algorithm for a CPU-GPU environment
Bosilca et al. Performance portability of a GPU enabled factorization with the DAGuE framework
Lai et al. Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs
Acosta et al. Towards the dynamic load balancing on heterogeneous multi-GPU systems
Clarke et al. Fupermod: A framework for optimal data partitioning for parallel scientific applications on dedicated heterogeneous hpc platforms
CN108470211B (en) Method and device for realizing convolution calculation and computer storage medium
Agullo et al. Bridging the gap between performance and bounds of cholesky factorization on heterogeneous platforms
CN111090508B (en) OpenCL-based dynamic task scheduling method between heterogeneous cooperative parallel computing devices
Alonso et al. Experimental study of six different implementations of parallel matrix multiplication on heterogeneous computational clusters of multicore processors
Posner et al. Transparent resource elasticity for task-based cluster environments with work stealing
Ciorba et al. Dynamic multi phase scheduling for heterogeneous clusters
CN114138440A (en) Operator execution device, operator scheduling device, method and chip
Ilic et al. Simultaneous multi-level divisible load balancing for heterogeneous desktop systems
Christou et al. Earth system modelling on system-level heterogeneous architectures: EMAC (version 2.42) on the Dynamical Exascale Entry Platform (DEEP)
Kunzman et al. Programming heterogeneous systems
Nesi et al. Communication-aware load balancing of the LU factorization over heterogeneous clusters
Gharajeh et al. Heuristic-based task-to-thread mapping in multi-core processors
Shao et al. Modeling the Cost of Redistribution in Scheduling.
Biswas et al. Portable parallel programming for the dynamic load balancing of unstructured grid applications
CN111221640B (en) GPU-CPU cooperative energy saving method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant