WO2017065629A1 - Task scheduler and method for scheduling a plurality of tasks - Google Patents

Task scheduler and method for scheduling a plurality of tasks

Info

Publication number
WO2017065629A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
cores
tasks
slow
fast
Prior art date
Application number
PCT/RU2015/000664
Other languages
French (fr)
Inventor
Mikhail Petrovich LEVIN
Alexander Vladimirovich SLESARENKO
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201580083785.6A (patent CN108139929B)
Priority to PCT/RU2015/000664 (patent WO2017065629A1)
Publication of WO2017065629A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4887 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/501 Performance criteria
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to a task scheduler for scheduling a plurality of tasks on a multi-core processor and to a method for scheduling a plurality of tasks on a processor.
  • the present invention also relates to a processor and to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the above method.
  • Heterogeneous multi-core computing systems are widely used in networked mobile systems such as mobile phones, tablets and even subnotebook computers. These systems contain two types of processor cores: fast cores intended for high performance operation and low power cores intended for power aware operation.
  • the first set is sometimes also called the hot set, the pool of hot cores or the pool of fast cores.
  • the second set comprises low performance cores with low power consumption and is also called the cold set, the pool of cold cores or the pool of slow cores.
  • Carrying out tasks on the set of slow cores instead of the set of fast cores reduces the overall power consumption. This is of particular importance for mobile systems because it prolongs battery life between charges.
  • the usual system software for operation of HMCCS comprises a compiler and a scheduler.
  • the compiler is responsible for creation of programs running on such devices and the scheduler is responsible for loading of such devices during run-time.
  • the main question in software development for these systems is which kind of core should be used for a program block or task in an HMCCS. In modern compilers this decision is made by the programmer.
  • Another approach consists in automatically changing, at the scheduler level, the affiliation of tasks, processes, threads or program blocks with the sets of cores of different types.
  • a lot of different techniques have been proposed.
  • Various types of approaches for optimizing usage of HMCCS have been proposed.
  • One direction is devoted to maximizing the performance of HMCCS, another is concerned with performance optimization within an established power consumption budget, and so on.
  • the objective of the present invention is to provide a task scheduler and a method for task scheduling, wherein the task scheduler and the method overcome one or more problems of the prior art.
  • an objective of the present invention can include increasing the efficiency of using computational systems with heterogeneous multi-core (HMC) architectures which comprise at least two types of cores.
  • a first aspect of the invention provides a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising:
  • a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks, and
  • a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
  • a slow core runtime of a task is the runtime of the task on a core of the set of slow cores.
  • the slow core runtime can be an estimate of the runtime on a slow core; in particular, it can be an estimated minimum or maximum runtime on a core of the set of slow cores.
  • the fast core runtime can be defined correspondingly.
  • each application is considered as a set of tasks and a special task diagram describes this set of tasks, the hierarchy of tasks in the set and the sequence of task execution.
  • the one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task can comprise the range of tasks on the critical path which are operating in the same time range with the candidate task.
  • the method of the first aspect ensures that the execution of the candidate task does not prolong the runtime of the entire program.
  • the method of the first aspect ensures that tasks are preferably assigned to the slow cores, thus saving energy consumption and leaving the set of fast cores available for the execution of more urgent tasks.
  • the task scheduler further comprises:
  • a graph construction unit configured to construct a task graph of the plurality of tasks
  • a path finding unit configured to determine the critical path of the task graph.
  • the task scheduler can have as an input the program code (which in embodiments can be in source code form or in compiled, binary form) and derive, using the graph construction unit and the path finding unit, the necessary information for scheduling the tasks of the program.
  • the task scheduler of the first implementation can be configured to have as input a program code, which defines a plurality of tasks, and derive (as output) a scheduling for these tasks.
  • a task graph can comprise a set of vertices connected by edges.
  • the edges carry no latencies, because latencies are included in the duration of the corresponding tasks.
  • vertices in the task diagram, in contrast with the task graph, contain multiple data items as follows: $t_1(v)$, $t_2(v)$, $p_1(v)$ and $p_2(v)$.
  • $t_1(v)$ denotes the duration of task v on fast set cores
  • $t_2(v)$ denotes the duration of task v on slow set cores
  • $p_1(v)$ denotes the power consumption of task v on fast set cores
  • $p_2(v)$ denotes the power consumption of task v on slow set cores.
  • the task scheduler can be configured to obtain the task graph and the critical path of the task graph as input from an external unit.
  • the task graph can be determined during compilation of the program.
  • the task scheduler further comprises a power computation unit configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.
  • the task scheduler itself is configured to determine the power consumption gain. This means that the task scheduler can be independent of other devices and places fewer requirements on other units providing information regarding the tasks to be executed.
  • the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
  • the task scheduler further comprises a preliminary execution unit configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
  • the preliminary execution unit is configured to determine the slow core and/or fast core runtime before the execution of a program.
  • the task scheduler can be configured to determine the slow core and/or fast core runtime of the tasks of a program during the installation of the program.
  • a second aspect of the invention relates to a processor comprising a set of fast cores, a set of slow cores and a task scheduler according to the first aspect of the invention or one of its implementations.
  • the task scheduler can be integrated into the processor.
  • the task scheduler can be integrated into the hardware of the processor. This has the advantage that external components do not need to be modified in order to achieve the performance gain.
  • a third aspect of the invention relates to a method for scheduling a plurality of tasks on a processor comprising a set of fast cores and a set of slow cores, the method comprising:
  • if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
  • the methods according to the third aspect of the invention can be performed by the task scheduler according to the first aspect of the invention. Further features or implementations of the method according to the third aspect of the invention can perform the functionality of the task scheduler according to the first aspect of the invention and its different implementation forms.
  • the method further comprises initial steps of:
  • the method further comprises:
  • determining a power consumption gain of assigning the candidate task to the set of slow cores
  • the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
  • the method further comprises an initial step of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
  • the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.
  • the task scheduler can thus determine the required information by carrying out the preliminary runs. This can involve additional computation time, but can still lead to a reduction of overall computation time, in particular for long execution times of a program.
  • a fourth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the third aspect or one of the implementations of the third aspect.
  • FIG 1 is a block diagram illustrating a task scheduler in accordance with an embodiment of the present invention
  • FIG 2 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
  • FIG 3 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
  • FIG. 4 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
  • FIG. 5 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
  • FIG. 6 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
  • FIG. 7 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
  • FIG 1 is a block diagram illustrating a task scheduler 100 in accordance with an embodiment of the present invention.
  • the task scheduler 100 comprises a timing unit 110 and a task assigning unit 120. Further, the task scheduler 100 can optionally, as indicated with dashed lines in FIG 1, comprise a graph construction unit 130, a path finding unit 140, a power computation unit 150, and a preliminary computation unit 160.
  • the task scheduler 100 can be implemented as part of a processor (not shown in FIG 1) or can be implemented in a hardware device that is located outside the processor.
  • FIG 2 is a flow chart illustrating a method 200 for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
  • the method 200 comprises a step 210 of comparing a slow core runtime of a candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.
  • the method comprises a further step 220 of, if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
  • the method optionally further comprises three initial or preliminary steps: a first initial step 202 of constructing a task graph of the plurality of tasks, a second initial step 204 of determining a critical path of the task graph, and a third initial step 206 of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
  • the method steps are carried out in the order as shown in FIG. 2. However, in other embodiments of the invention, the method steps can be carried out in a different order.
  • FIG 3 is a schematic diagram which illustrates the problem addressed by the task scheduler and method of the present invention.
  • Shown in FIG 3 are a plurality of tasks, a first, second and third task 310, 320, 330 that are on a critical path 305, and a candidate task 340.
  • the tasks 310, 320, 330 are allocated to a set of fast cores 302.
  • for the candidate task 340, the task scheduler should decide whether to assign it to the set of fast cores 302 or to a set of slow cores 304.
  • the time of program execution corresponds to the longest path (critical path) through the task graph, evaluated on task execution times.
  • the performance of the program is the inverse value of the program execution time.
  • Maximizing the performance of the program means minimizing the program execution time, i.e. minimizing the critical path of the task diagram.
  • the minimal length of the critical path is obtained when the critical path tasks are executed on cores of the fast set. All other tasks (not on the critical path) can migrate between the sets to minimize power consumption (this option is denoted by the "?" sign in the diagram shown in FIG 3).
  • in FIG. 4, a first task 410, a fourth task 440, a fifth task 450 and a sixth task 460 are located on a critical path, indicated by dashed line 405, wherein these tasks are assigned to a set of fast (hot) cores 402.
  • a second task 420 and a third task 430 are located outside of the critical path, but on the same level as the fourth task 440, indicated as "Level 2" in FIG. 4.
  • Second task 420 and third task 430 are considered as candidate tasks in the following.
  • A, B, C, D, and E denote the first task, the second task, the third task, the fourth task, and the fifth task, respectively.
  • the second task B and the third task C can be affiliated with the set of slow cores, without exceeding the total runtime, and hence, without the loss of performance if
  • according to the task diagram shown in FIG 4, the first inequality applies only to tasks of the same level, namely Level 2, whereas the second inequality applies to tasks in a range of levels, namely Level 2 and Level 3, because the second task 420 (task B) runs across several levels rather than a single one.
  • the second and third tasks 420, 430 should be affiliated with the set of slow cores.
  • FIG 5 an example is shown, where a plurality of tasks 510, 540, 550, and
  • migration of the third task 530 to the slow set of cores is possible, but migration of the second task 520 is not. In this case
  • the power consumption will decrease by the following value
  • $p_{\text{profit}}(\text{Level 2}) = p_1(C) - p_2(C)$.
  • FIG 6 shows a similar example, where a plurality of tasks 610, 640, 650, and
  • $p_{\text{profit}}(\text{Level 2}) = p_1(B) - p_2(B) + p_1(C) - p_2(C)$
  • the order of migration is not essential for obtaining a better result in terms of minimizing power consumption. But, for example, it is better that, if
  • the second task 620 migrates first; otherwise the third task 630 migrates first.
  • FIG 7 is a flow chart of an example method for migrating tasks between the set of fast cores and the set of slow cores, wherein the task B belongs to only one fixed level.
  • a list of candidate tasks L is provided to the task scheduler.
  • the task scheduler sorts the list L in order of decreasing power consumption profit (computed e.g. as $p_1 - p_2$).
  • the result is stored in an ordered list $L_1$.
  • a task D which is on a (e.g. previously determined) critical path is taken from the list and, in step 708, put into the "hot pool", i.e. assigned to the fast set of cores.
  • in step 710, it is checked whether D is the last task on the data layer. If so, there are no more tasks to process and the method stops in step 722.
  • otherwise, the method proceeds to step 712 and takes task B (the first task in the ordered list $L_1$).
  • in step 714, the condition
  • if the condition is fulfilled, the method proceeds with step 716 and puts B into the "cold pool", i.e., assigns it to the set of slow cores. If the condition is not fulfilled, the method proceeds to step 718 and task B is put into the "hot pool", i.e., it is assigned to the set of fast cores.
  • in step 720, it is checked whether task B is the last task in the ordered list $L_1$. If so, the method ends in step 722. Otherwise, the method continues with step 724, taking as task B the next task in the ordered list $L_1$.
  • D, E, ..., S is the range of critical path tasks that operate in the same time range as task B.
  • embodiments of the present invention comprise mapping tasks to sets of cores. Preliminary runs can be performed to collect information on execution times on the different types of cores and on the corresponding power consumption. After that, it is possible to construct the task diagram, determine the critical path of this diagram corresponding to the maximal performance, split the diagram into levels and, on each level, solve the problem of migrating the tasks that do not belong to the critical path. Potentially, these can be assigned to the set of slow cores, thus reducing overall power consumption.
  • the method can comprise further steps.
  • a heterogeneous multi-core computing system consists of cores $c_1, c_2, \ldots, c_k$ of the fast set type (with high energy consumption and high performance) and cores $c_{k+1}, c_{k+2}, \ldots, c_n$ of the slow set type (with low energy consumption and low performance), n cores in total. Now let us consider how to bind tasks in complicated software with processor cores of the different sets:
  • the task diagram is constructed.
  • An evaluation of the power consumption is provided according to the affiliation of the tasks with the sets of cores. A gain in power consumption can be reached even if only one task is affiliated with the slow set cores. If many tasks are affiliated with slow set cores, the power consumption profit will be substantially greater.
  • Effects of a method in accordance with the present invention can include that the performance of an HMCCS is improved and/or its power consumption is decreased.
  • a method can solve an optimization problem in order to minimize total completion time of each particular application. This can include finding an optimal mapping of tasks to cores that will make completion time reach its potential minimum and simultaneously decrease the power consumption of HMCCS as much as it is possible.
  • Embodiments of the present invention can be used in a system with signal processors of SoC type in which the same software is running permanently. Thus, a particularly high power saving is achieved.
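By way of illustration only, the flow described for FIG 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the condition checked in step 714 is not reproduced in the text above, so the sketch assumes it is "the slow core runtime $t_2$ of task B does not exceed the sum of the fast core runtimes $t_1$ of the critical path tasks D, E, ..., S operating in the same time range". The function name `schedule`, the dictionary keys and all numeric values are hypothetical.

```python
def schedule(critical_t1, candidates):
    """Sketch of the FIG 7 flow (assumed step 714 condition, see lead-in).

    critical_t1: dict mapping each critical path task to its fast core runtime t1.
    candidates:  list of dicts with keys
                 name, t2 (slow core runtime), p1, p2 (power on fast/slow cores),
                 overlaps (critical path tasks in the same time range).
    Returns (hot, cold): task names assigned to the fast and slow sets.
    """
    hot = list(critical_t1)   # critical path tasks always go to the fast set
    cold = []
    # Ordered list: decreasing power consumption profit p1 - p2.
    ordered = sorted(candidates, key=lambda c: c["p1"] - c["p2"], reverse=True)
    for c in ordered:
        # ASSUMPTION: time available to the candidate is the sum of t1 over
        # the critical path tasks it overlaps with.
        window = sum(critical_t1[n] for n in c["overlaps"])
        if c["t2"] <= window:
            cold.append(c["name"])   # fits: assign to the set of slow cores
        else:
            hot.append(c["name"])    # would delay the program: fast cores
    return hot, cold


# Hypothetical numbers for a diagram shaped like the one in FIG. 4.
critical_t1 = {"A": 2.0, "D": 3.0, "E": 2.0, "F": 1.0}
candidates = [
    {"name": "B", "t2": 4.0, "p1": 4.0, "p2": 1.5, "overlaps": ["D", "E"]},
    {"name": "C", "t2": 3.5, "p1": 3.0, "p2": 1.0, "overlaps": ["D"]},
]
hot, cold = schedule(critical_t1, candidates)
```

With these hypothetical values, task B fits into the time range of D and E and goes to the slow set, while task C would outlast D and stays on the fast set.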

Abstract

The present invention discloses a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising: a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task, and a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.

Description

TASK SCHEDULER AND METHOD FOR SCHEDULING A PLURALITY OF TASKS
TECHNICAL FIELD
The present invention relates to a task scheduler for scheduling a plurality of tasks on a multi-core processor and to a method for scheduling a plurality of tasks on a processor.
The present invention also relates to a processor and to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the above method.
BACKGROUND
Heterogeneous multi-core computing systems (HMCCS) are widely used in networked mobile systems such as mobile phones, tablets and even subnotebook computers. These systems contain two types of processor cores: fast cores intended for high performance operation and low power cores intended for power aware operation. The first set is sometimes also called the hot set, the pool of hot cores or the pool of fast cores. The second set comprises low performance cores with low power consumption and is also called the cold set, the pool of cold cores or the pool of slow cores.
Carrying out tasks on the set of slow cores instead of the set of fast cores reduces the overall power consumption. This is of particular importance for mobile systems because it prolongs battery life between charges. The usual system software for operation of HMCCS comprises a compiler and a scheduler. The compiler is responsible for creating the programs that run on such devices, and the scheduler is responsible for loading such devices during run-time. The main question in software development for these systems is which kind of core should be used for a program block or task in an HMCCS. In modern compilers this decision is made by the programmer.
Another approach consists in automatically changing, at the scheduler level, the affiliation of tasks, processes, threads or program blocks with the sets of cores of different types. In this context, many different techniques for optimizing the usage of HMCCS have been proposed. One direction is devoted to maximizing the performance of HMCCS, another is concerned with performance optimization within an established power consumption budget, and so on. However, there is still a need for a more efficient execution of programs on HMCCS.
SUMMARY OF THE INVENTION
The objective of the present invention is to provide a task scheduler and a method for task scheduling, wherein the task scheduler and the method overcome one or more problems of the prior art.
In particular, an objective of the present invention can include increasing the efficiency of using computational systems with heterogeneous multi-core (HMC) architectures which comprise at least two types of cores.
A first aspect of the invention provides a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising:
a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks, and
a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
In general, a slow core runtime of a task is the runtime of the task on a core of the set of slow cores. The slow core runtime can be an estimate of the runtime on a slow core; in particular, it can be an estimated minimum or maximum runtime on a core of the set of slow cores. The fast core runtime can be defined correspondingly.
In embodiments of the invention, each application is considered as a set of tasks and a special task diagram describes this set of tasks, the hierarchy of tasks in the set and the sequence of task execution.
Each task diagram is divided into levels in hierarchical order. Each lower level corresponds to tasks which depend only on data of higher level tasks. The runtimes of tasks are compared with each other on a same-level basis. That is, the execution time of a task not belonging to the critical path is compared with the execution times of tasks on the critical path within the same level in the task diagram. In other words, the timing unit is configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.
The one or more critical path tasks whose runtimes are compared with the runtime of the candidate task, which is not on the critical path, are the tasks on the same levels of the critical path as the candidate task. In other words, the one or more critical path tasks are on one or more levels of the critical path, the levels corresponding to the level of the candidate task.
The one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task can comprise the range of tasks on the critical path which are operating in the same time range with the candidate task.
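As a non-limiting illustration, the division of a task diagram into levels can be sketched in Python. The sketch assumes one possible layering rule, which the text does not prescribe: a task without dependencies is on level 1, and any other task is one level below its deepest predecessor. The names `assign_levels`, `group_by_level` and the example diagram are hypothetical, and the diagram is assumed to be acyclic.

```python
from collections import defaultdict


def assign_levels(deps):
    """deps maps each task name to the list of tasks whose data it depends on."""
    levels = {}

    def level_of(task):
        # Level 1 for tasks without predecessors, otherwise one level
        # below the deepest predecessor (assumed layering rule).
        if task not in levels:
            preds = deps.get(task, [])
            levels[task] = 1 + max((level_of(p) for p in preds), default=0)
        return levels[task]

    for task in deps:
        level_of(task)
    return levels


def group_by_level(levels):
    """Collect the task names on each level, levels in ascending order."""
    grouped = defaultdict(list)
    for task, lvl in levels.items():
        grouped[lvl].append(task)
    return {lvl: sorted(tasks) for lvl, tasks in sorted(grouped.items())}


# Hypothetical diagram: B, C and D depend on A; E on D; F on B, C and E.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["D"], "F": ["B", "C", "E"]}
levels = assign_levels(deps)
by_level = group_by_level(levels)
```

In this hypothetical diagram, B, C and D share a level, so a candidate task such as B or C would be compared with the critical path task on that same level.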
By assigning the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, the method of the first aspect ensures that the execution of the candidate task does not prolong the runtime of the entire program.
On the other hand, by assigning the candidate task to the set of slow cores if the slow core runtime of the candidate task is no longer than a fast core runtime of the one or more critical path tasks, the method of the first aspect ensures that tasks are preferably assigned to the slow cores, thus saving energy consumption and leaving the set of fast cores available for the execution of more urgent tasks.
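The assignment rule of the first aspect can be illustrated by the following minimal Python sketch. It assumes, as one possible reading, that the fast core runtime of the "one or more critical path tasks" is the sum of their individual fast core runtimes; the function name `assign_candidate` and the numbers in the usage note are hypothetical.

```python
def assign_candidate(slow_runtime, critical_fast_runtimes):
    """Decide where a candidate task that is not on the critical path should run.

    slow_runtime:           runtime of the candidate on a slow core.
    critical_fast_runtimes: fast core runtimes of the critical path tasks on
                            the level(s) corresponding to the candidate.
    """
    # ASSUMPTION: the runtimes of several critical path tasks are summed.
    window = sum(critical_fast_runtimes)
    # Longer than the critical path tasks: the slow core would prolong the
    # program, so use the fast set. Otherwise save power on the slow set.
    return "fast" if slow_runtime > window else "slow"
```

For example, `assign_candidate(5.0, [3.0, 3.0])` selects the slow set, whereas `assign_candidate(7.0, [3.0, 3.0])` selects the fast set.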
In a first implementation of the apparatus according to the first aspect, the task scheduler further comprises:
a graph construction unit configured to construct a task graph of the plurality of tasks, and
a path finding unit configured to determine the critical path of the task graph.
Thus, the task scheduler can have as an input the program code (which in embodiments can be in source code form or in compiled, binary form) and derive, using the graph construction unit and the path finding unit, the necessary information for scheduling the tasks of the program.
In other words, the task scheduler of the first implementation can be configured to have as input a program code, which defines a plurality of tasks, and derive (as output) a scheduling for these tasks.
A task graph can comprise a set of vertices connected by edges. In a preferred embodiment, the edges carry no latencies, because latencies are included in the duration of the corresponding tasks. Also, vertices in the task diagram, in contrast with the task graph, contain multiple data items as follows: $t_1(v)$, $t_2(v)$, $p_1(v)$ and $p_2(v)$. Here $t_1(v)$ denotes the duration of task v on fast set cores, $t_2(v)$ denotes the duration of task v on slow set cores, $p_1(v)$ denotes the power consumption of task v on fast set cores, and $p_2(v)$ denotes the power consumption of task v on slow set cores.
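For illustration, a vertex of the task diagram and a critical path search can be sketched in Python as follows. This is only one possible realization under assumptions: the critical path is taken as the longest path through the graph weighted by the fast core durations, and the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) is used for ordering. The class `Task`, the function `critical_path` and all numeric values are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    name: str
    t1: float  # duration on fast set cores
    t2: float  # duration on slow set cores
    p1: float  # power consumption on fast set cores
    p2: float  # power consumption on slow set cores
    deps: list = field(default_factory=list)  # names of predecessor tasks


def critical_path(tasks):
    """Longest path through the task graph, weighted by t1 (assumption)."""
    by_name = {t.name: t for t in tasks}
    # static_order() yields every task after all of its predecessors.
    order = TopologicalSorter({t.name: t.deps for t in tasks}).static_order()
    finish, prev = {}, {}
    for name in order:
        task = by_name[name]
        # Predecessor that finishes last determines the start of this task.
        best = max(task.deps, key=lambda d: finish[d], default=None)
        prev[name] = best
        finish[name] = task.t1 + (finish[best] if best is not None else 0.0)
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return path[::-1], length


# Hypothetical task diagram with made-up durations and power values.
tasks = [
    Task("A", 2.0, 4.0, 3.0, 1.0),
    Task("B", 1.0, 4.0, 4.0, 1.5, ["A"]),
    Task("C", 1.0, 3.5, 3.0, 1.0, ["A"]),
    Task("D", 3.0, 6.0, 5.0, 2.0, ["A"]),
    Task("E", 2.0, 4.0, 3.0, 1.0, ["D"]),
    Task("F", 1.0, 2.0, 2.0, 1.0, ["B", "C", "E"]),
]
path, length = critical_path(tasks)
```

Because latencies are included in the task durations, the edges need no weights of their own in this sketch; only the vertex data enter the path length.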
In alternative embodiments, also in accordance with the present invention, the task scheduler can be configured to obtain the task graph and the critical path of the task graph as input from an external unit. For example, the task graph can be determined during compilation of the program.
In a second implementation of the apparatus according to the first aspect, the task scheduler further comprises a power computation unit configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.
Thus, the task scheduler itself is configured to determine the power consumption gain. This means that the task scheduler can be independent of other devices and places fewer requirements on other units providing information regarding the tasks to be executed.
In a third implementation of the apparatus according to the first aspect, the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
This represents a particularly simple and efficient way of computing a power consumption gain.
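A minimal sketch of this computation, together with the ordering by decreasing gain from the second implementation, might look as follows. The function name `power_gain` and the numeric values are hypothetical.

```python
def power_gain(p_fast: float, p_slow: float) -> float:
    """Gain of moving a task to the slow cores: p1 - p2."""
    return p_fast - p_slow


# Hypothetical candidates: name -> (p1, p2)
candidates = {"B": (3.0, 1.0), "C": (4.0, 1.5)}

# Candidates are considered in order of decreasing power consumption gain.
ordered = sorted(
    candidates,
    key=lambda name: power_gain(*candidates[name]),
    reverse=True,
)
```

With these numbers the gain of C (2.5) exceeds that of B (2.0), so C is considered first.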
In a fourth implementation of the apparatus according to the first aspect, the task scheduler further comprises a preliminary execution unit configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
This represents a practical way of computing the power consumption gain. In embodiments of the invention, the preliminary execution unit is configured to determine the slow core and/or fast core runtime before the execution of a program. For example, the task scheduler can be configured to determine the slow core and/or fast core runtime of the tasks of a program during the installation of the program.
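The idea of preliminary runs can be sketched as a simple timing loop. This is an assumption-laden illustration: the helper `measure_runtime` and the workload `busy_work` are hypothetical, and pinning the process to a fast or a slow core (which is platform-specific) is deliberately left out.

```python
import statistics
import time
from typing import Callable


def measure_runtime(task: Callable[[], object], runs: int = 5) -> float:
    """Return the median wall-clock duration of `task` over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def busy_work() -> int:
    # Stand-in workload; a real candidate task would be executed here.
    return sum(i * i for i in range(10_000))


runtime = measure_runtime(busy_work, runs=3)
```

Running the same measurement once on each set of cores would yield the two runtimes per task that the scheduler compares.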
A second aspect of the invention relates to a processor comprising a set of fast cores, a set of slow cores and a task scheduler according to the first aspect of the invention or one of its implementations.
According to this aspect, the task scheduler can be integrated into the processor. For example, the task scheduler can be integrated into the hardware of the processor. This has the advantage that external components do not need to be modified in order to achieve the performance gain.
A third aspect of the invention relates to a method for scheduling a plurality of tasks on a processor comprising a set of fast cores and a set of slow cores, the method comprising:
comparing a slow core runtime of a candidate task that is not on a critical path with a fast core runtime of one or more critical path tasks, and
if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
The methods according to the third aspect of the invention can be performed by the task scheduler according to the first aspect of the invention. Further features or implementations of the method according to the third aspect of the invention can perform the functionality of the task scheduler according to the first aspect of the invention and its different implementation forms.
In a first implementation of the method of the third aspect, the method further comprises initial steps of:
constructing a task graph of the plurality of tasks, and
determining the critical path of the task graph.
Thus, it is possible that the task graph is not determined in advance, but is determined e.g. by the task scheduler. If the structure of the task graph depends e.g. on decisions that are made after compile time, the method can determine the task graph at a later point, e.g. at runtime.
In a second implementation of the method of the third aspect, the method further comprises:
for at least two candidate tasks: determining a power consumption gain of assigning the candidate task to the set of slow cores, and
assigning the at least two tasks in an order of decreasing power consumption gain.
In a third implementation of the method of the third aspect, the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
In a fourth implementation of the method of the third aspect, the method further comprises an initial step of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
In a fifth implementation of the method of the third aspect, the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.
If the information on task execution time and latency is not provided (e.g. by the compiler), the task scheduler can thus determine the required information by carrying out the preliminary runs. This can involve additional computation time, but can still lead to a reduction of overall computation time, in particular for long execution times of a program.
A fourth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the third aspect or one of the implementations of the third aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the technical features of embodiments of the present invention more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present invention, but modifications on these embodiments are possible without departing from the scope of the present invention as defined in the claims.
FIG 1 is a block diagram illustrating a task scheduler in accordance with an embodiment of the present invention,
FIG 2 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG 3 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG. 4 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG. 5 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG. 6 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention, and
FIG. 7 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
FIG 1 is a block diagram illustrating a task scheduler 100 in accordance with an embodiment of the present invention. The task scheduler 100 comprises a timing unit 110 and a task assigning unit 120. Further, the task scheduler 100 can optionally, as indicated with dashed lines in FIG 1, comprise a graph construction unit 130, a path finding unit 140, a power computation unit 150, and a preliminary execution unit 160.
In embodiments of the invention, the task scheduler 100 can be implemented as part of a processor (not shown in FIG 1) or can be implemented in a hardware device that is located outside the processor.
FIG 2 is a flow chart illustrating a method 200 for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
The method 200 comprises a step 210 of comparing a slow core runtime of a candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.
The method comprises a further step 220 of, if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
As shown with dashed lines in FIG. 2, the method optionally further comprises three initial or preliminary steps: a first initial step 202 of constructing a task graph of the plurality of tasks, a second initial step 204 of determining a critical path of the task graph, and a third initial step 206 of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
In embodiments of the invention, the method steps are carried out in the order as shown in FIG. 2. However, in other embodiments of the invention, the method steps can be carried out in a different order.
FIG 3 is a schematic diagram which illustrates the problem addressed by the task scheduler and method of the present invention.
Shown in FIG 3 are a plurality of tasks, a first, second and third task 310, 320, 330 that are on a critical path 305, and a candidate task 340. The tasks 310, 320, 330 are allocated to a set of fast cores 302. For the candidate task 340, the task scheduler should decide whether to assign it to the set of fast cores 302 or a set of slow cores 304.
Here ti is the time of execution on cores of type i and pi is the power consumption on cores of type i (i=1 for fast cores and i=2 for slow cores).
The time of program execution corresponds to the longest path (critical path) through the task graph, evaluated on task execution times. The performance of the program is the inverse of the program execution time. Maximizing the performance of the program therefore means minimizing the program execution time, i.e. minimizing the length of the critical path of the task diagram. The minimal length of the critical path is obtained when the tasks of the critical path are executed on cores of the fast set. All other tasks (not on the critical path) can migrate between the sets to minimize power consumption (this option is denoted by the "?" sign in the diagram shown in FIG 3).
Now let us consider the problem of maximizing performance and minimizing power consumption. We solve this problem in two steps. In the first step we construct the solution with maximal performance; in the second step, keeping the maximal performance, we minimize power consumption.
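The longest-path evaluation described above can be sketched with a standard dynamic-programming pass over a topological order. This is one common way to find a critical path, not necessarily the one used by the path finding unit; the function name `critical_path`, the example graph and all durations are assumptions.

```python
from graphlib import TopologicalSorter


def critical_path(t_fast: dict[str, float],
                  deps: dict[str, list[str]]) -> tuple[float, list[str]]:
    """Longest path through the task graph, using fast-core durations."""
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor on the longest path into each task
    for v in TopologicalSorter(deps).static_order():
        pred = max(deps.get(v, []), key=lambda u: finish[u], default=None)
        finish[v] = t_fast[v] + (finish[pred] if pred is not None else 0.0)
        best_pred[v] = pred
    node = max(finish, key=lambda v: finish[v])
    length = finish[node]
    path = []
    while node is not None:  # walk back from the last task to the first
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]


# Hypothetical graph: task -> list of predecessors, with fast-core times t1.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"],
        "E": ["D", "C"], "F": ["E", "B"]}
t_fast = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 3.0, "E": 2.0, "F": 1.0}

length, path = critical_path(t_fast, deps)
```

For this example the critical path is A, D, E, F with a length of 8.0 time units; B and C remain as candidates for migration.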
Let us assume we found the critical path K, e.g. as determined by a critical path finding unit as described above. After that, all tasks are divided or organized into levels with respect to tasks of the critical path. Each lower level corresponds to tasks which are dependent on data only of higher level tasks. This is illustrated by the example shown in FIG 4.
In FIG. 4, first task 410, fourth task 440, fifth task 450 and sixth task 460 are located on a critical path, indicated by dashed line 405, wherein the tasks are assigned to a set of fast or hot cores, 402. A second task 420 and a third task 430 are located outside of the critical path, but on the same level as the fourth task 440, indicated as "Level 2" in FIG. 4. Second task 420 and third task 430 are considered as candidate tasks in the following.
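One possible way to derive such levels is to place each task one level below its deepest predecessor. The sketch below also computes the range of levels a task may occupy before its first successor starts; both rules are an interpretation of the text, and the names `assign_levels` and `level_span` as well as the example graph are hypothetical.

```python
from graphlib import TopologicalSorter


def assign_levels(deps: dict[str, list[str]]) -> dict[str, int]:
    """Level of a task = 1 + deepest level among its predecessors."""
    level = {}
    for v in TopologicalSorter(deps).static_order():
        level[v] = 1 + max((level[u] for u in deps.get(v, [])), default=0)
    return level


def level_span(task: str, deps: dict[str, list[str]],
               level: dict[str, int]) -> tuple[int, int]:
    """First and last level a task can occupy before a successor needs it."""
    successors = [v for v, preds in deps.items() if task in preds]
    last = min((level[s] for s in successors), default=level[task] + 1) - 1
    return level[task], last


# Hypothetical graph: task -> list of predecessors.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"],
        "E": ["D", "C"], "F": ["E", "B"]}
level = assign_levels(deps)
span_b = level_span("B", deps, level)
span_c = level_span("C", deps, level)
```

In this example B, C and D all land on level 2; C must finish within level 2, while B may extend over levels 2 and 3.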
When searching for the critical path 405, all tasks are evaluated on the set of fast cores, because only in this way can the maximal performance of the heterogeneous multi-core computing system (HMCCS) under consideration be obtained.
Then the critical path is fixed and all tasks belonging to it are affiliated with the set of fast cores. Let us consider tasks on intermediate levels that do not belong to the critical path. Since the goal now is to minimize power consumption, it is checked:
Is it possible to affiliate the second task 420 and the third task 430 with the set of slow cores without extending the total runtime?
In the following inequalities, A, B, C, D, and E denote the first task, the second task, the third task, the fourth task, and the fifth task, respectively.
The second task B and the third task C can be affiliated with the set of slow cores without extending the total runtime, and hence without loss of performance, if
t2(C) ≤ t1(D)
and
t2(B) < t1(D) + t1(E) .
According to the task diagram shown in FIG 4, the first inequality compares tasks of a single level, namely Level 2, whereas the second inequality covers a range of levels, namely Level 2 and Level 3, because the second task 420 (task B) runs across more than one level.
In the case presented in FIG 4, the second and third tasks 420, 430 should be affiliated with the set of slow cores.
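The two conditions can be expressed as a single check: the slow-core runtime of a candidate must not exceed the summed fast-core runtimes of the critical path tasks it runs alongside. The function name `fits_on_slow_cores` and the durations are hypothetical, and the sketch uses a non-strict comparison throughout.

```python
def fits_on_slow_cores(t2_candidate: float, t1_critical: list[float]) -> bool:
    """True if the candidate can run on slow cores without extending runtime."""
    return t2_candidate <= sum(t1_critical)


# Hypothetical durations for a FIG 4 style situation.
t1_D, t1_E = 3.0, 2.0
t2_B, t2_C = 4.5, 2.5

c_to_slow = fits_on_slow_cores(t2_C, [t1_D])        # C spans Level 2 only
b_to_slow = fits_on_slow_cores(t2_B, [t1_D, t1_E])  # B spans Levels 2 and 3
too_long = fits_on_slow_cores(6.0, [t1_D, t1_E])    # would delay the program
```

With these numbers both B and C can be affiliated with the slow cores, while a task needing 6.0 time units on a slow core would have to stay on the fast cores.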
In FIG 5 an example is shown, where a plurality of tasks 510, 540, 550, and 560 are located on a critical path 505 and, for a second and a third task 520, 530, it needs to be decided whether to assign these to the set of slow cores 504 or the set of fast cores 502. The potential placement of the second and third task on the set of slow (cold) cores is indicated with reference numbers 520' and 530'.
In the example of FIG 5, the migration of the third task 530 to the slow set of cores is available, but migration of the second task 520 is not available. In this case
[Inequality available only as an image in the source: Figure imgf000011_0001]
and
t2(C) > t1(D) .
Since the critical path does not change, the performance keeps its maximal value.
At the same time, migrating any task to the slow set decreases power consumption. In this case, the power consumption decreases by the following value:
Pprofit(Level 2) = p1(C) - p2(C) .
FIG 6 shows a similar example, where a plurality of tasks 610, 640, 650, and 660 are located on a critical path 605 and, for a second and a third task 620, 630, it needs to be decided whether to assign these to the set of slow cores 604 or the set of fast cores 602. The potential placement of the second and third task on the set of slow (cold) cores is indicated with reference numbers 620' and 630'. In this case the following inequalities are valid:
t2(B) ≤ t1(D)
and
t2(C) ≤ t1(D) + t1(E) .
Here the second inequality is also valid in the range of levels 2 and 3.
In this case the decrease in power consumption is greater than in the previous example and equals the following value:
Pprofit(Level 2) = p1(B) - p2(B) + p1(C) - p2(C) .
The order of migration is not essential for the resulting reduction of power consumption. However, it is preferable that, if
p1(B) - p2(B) ≥ p1(C) - p2(C) ,
the second task 620 migrates first; otherwise the third task 630 migrates first.
FIG 7 is a flow chart of an example method for migrating tasks between the set of fast cores and the set of slow cores, wherein task B belongs to only one fixed level.
In a first step 702, a list of candidate tasks L is provided to the task scheduler. In a second step 704, the task scheduler sorts the list L in order of decreasing power consumption profit (computed e.g. as p1 - p2). The result is stored in an ordered list L1.
In a third step 706, a task D, which is on a (e.g. previously determined) critical path, is taken from the list and, in step 708, put into the "hot pool", i.e. assigned to the fast set of cores.
In step 710, it is checked whether D is the last task on the data layer. If so, there are no more tasks to process and the method stops in step 722.
If there are more tasks to process, the method proceeds in step 712 and takes task B (the first task in the ordered list L1). In step 714, the condition
t1(D) ≥ t2(B)
is checked. If the condition is fulfilled, the method proceeds with step 716 and puts B into the "cold pool", i.e., assigns it to the set of slow cores. If the condition is not fulfilled, the method proceeds in step 718 and task B is put into the "hot pool", i.e., it is assigned to the set of fast cores.
In step 720 it is checked whether task B is the last task in the ordered list L1. If so, the method ends in step 722. Otherwise, the method continues with step 724, taking the next task in the ordered list L1 as task B.
If task B belongs to several levels, the control inequality in this algorithm should be replaced by the following more general inequality:
t1(D) + t1(E) + ... + t1(S) > t2(B) .
Here D, E, ..., S are the tasks of the critical path that run in the same time range as task B.
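The flow of FIG 7, including the multi-level variant of the control inequality, can be sketched as below. This is a simplified reading of the flow chart rather than a definitive implementation: the names `Candidate`, `window` and `schedule_level` are hypothetical, the step numbers in the comments refer to the description above, and a non-strict comparison is used in both the single-level and the multi-level case.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    t2: float      # slow-core runtime of the candidate
    p1: float      # power consumption on fast cores
    p2: float      # power consumption on slow cores
    window: float  # sum of t1 over critical path tasks running alongside


def schedule_level(critical_names: list[str],
                   candidates: list[Candidate]) -> tuple[list[str], list[str]]:
    """Return (hot, cold): tasks for the fast and for the slow set of cores."""
    hot = list(critical_names)  # steps 706/708: critical tasks stay fast
    cold = []
    # Step 704: order candidates by decreasing power consumption profit.
    ordered = sorted(candidates, key=lambda c: c.p1 - c.p2, reverse=True)
    for c in ordered:           # steps 712 to 724
        if c.window >= c.t2:    # step 714: control inequality
            cold.append(c.name)  # step 716
        else:
            hot.append(c.name)   # step 718
    return hot, cold


# Hypothetical level: D is critical; B overlaps D and E, C overlaps D only.
hot, cold = schedule_level(
    ["D"],
    [Candidate("B", t2=4.5, p1=3.0, p2=1.0, window=5.0),
     Candidate("C", t2=3.5, p1=4.0, p2=1.5, window=3.0)],
)
```

In this sketch each candidate is tested independently, so the sort order affects only the order of processing, not the final assignment, which is in line with the remark that the order of migration is not essential.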
The foregoing descriptions are only implementation manners of the present invention; the scope of protection of the present invention is not limited to them.
Variations or replacements can readily be made by a person skilled in the art.
Therefore, the scope of protection of the present invention is subject to the scope of protection of the attached claims.
To summarize, embodiments of the present invention comprise mapping tasks to sets of cores. Preliminary runs can be performed to collect information on execution times on the different types of cores and on the corresponding power consumption. After that it is possible to construct the task diagram, evaluate the critical path on this diagram corresponding to the maximal performance, split the diagram into levels and, on each level, solve the problem of migrating tasks that do not belong to the critical path. Potentially, these can be assigned to the set of slow cores, thus reducing overall power consumption.
In embodiments of the invention, the method can comprise further steps. Let us consider a heterogeneous multi-core computing system (HMCCS) consisting of cores c1, c2, ..., ck of the fast set type (with high energy consumption and high performance) and cores ck+1, ck+2, ..., cn of the slow set type (with low energy consumption and low performance), n cores in total. Now let us consider how to bind the tasks of complex software to processor cores of the different sets:
1. Static monitoring of the HMCCS is performed. As a result, we obtain the execution times t1 and t2 of all tasks on the different cores and the corresponding values of power consumption.
2. The task diagram is constructed.
3. We evaluate the critical path on the task diagram, assuming that all tasks are evaluated on cores of the fast set. This defines the maximal performance of the HMCCS under consideration.
4. We divide the task diagram into levels, starting from the initial node down to the last node.
5. On all intermediate levels we solve the migration problem for tasks that do not belong to the critical path at the data level. Tasks of the critical path are always affiliated with cores of the fast set.
6. The power consumption is evaluated according to the affiliation of the tasks with the sets of cores. A gain in power consumption is reached even if only one task is affiliated with cores of the slow set. If many tasks are affiliated with cores of the slow set, the power consumption profit is substantially greater.
7. All tasks are executed according to their distribution among the fast and slow sets of cores.
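Step 6 of the list above, the evaluation of power consumption for a given affiliation of tasks, can be illustrated as follows. The function name `total_power`, the pool labels and all power values are assumptions for this sketch.

```python
def total_power(assignment: dict[str, str],
                p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum per-task power: p1 for tasks on fast cores, p2 for slow cores."""
    return sum(p1[name] if pool == "fast" else p2[name]
               for name, pool in assignment.items())


# Hypothetical power values per task on fast (p1) and slow (p2) cores.
p1 = {"A": 4.0, "B": 3.0, "C": 4.0, "D": 5.0}
p2 = {"A": 1.5, "B": 1.0, "C": 1.5, "D": 2.0}

all_fast = {name: "fast" for name in p1}
migrated = {"A": "fast", "B": "slow", "C": "slow", "D": "fast"}

# Profit of the migration relative to running everything on fast cores.
profit = total_power(all_fast, p1, p2) - total_power(migrated, p1, p2)
```

Here the profit equals (p1(B) - p2(B)) + (p1(C) - p2(C)) = 4.5, i.e. it grows with every task that can be affiliated with the slow set.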
Effects of a method in accordance with the present invention can include that the performance of the HMCCS is improved and/or its power consumption is decreased.
Here we consider one of the most common goals, minimal completion time (another one, maximal throughput, is not considered here). A method can solve an optimization problem in order to minimize the total completion time of each particular application. This can include finding an optimal mapping of tasks to cores that makes the completion time reach its potential minimum and simultaneously decreases the power consumption of the HMCCS as much as possible.
Furthermore, with a task scheduler or method in accordance with the present invention, developing parallel applications for heterogeneous hardware requires considerably less effort from the developer. This makes the process of developing parallel applications for HMCCS hardware easier. Finally, it decreases the labor costs of both developing software and effectively porting existing code to a specific architecture.
Embodiments of the present invention can be used in a system with signal processors of SoC type in which the same software is running permanently. Thus, a particularly high power saving is achieved.
Embodiments relate to a system exploiting a heterogeneous multi-core architecture with cores that differ in performance and power consumption. Aspects of the present invention can involve:
• Preliminary static estimation of execution time and power consumption on a set of fast cores and a set of slow cores.
• Usage of a task diagram for designing a performance- and energy-efficient scheduler for heterogeneous multi-core devices.
• Evaluating the critical path on a task diagram.
• Leveling the task diagram to provide the maximal profit in power consumption.
• Evaluation of power consumption on the obtained task distribution among sets of cores according to the task diagram, to minimize power consumption while keeping the maximal performance.

Claims

1. Task scheduler (100) for scheduling a plurality of tasks (310-340; 410-460; 510-560; 610-660) on a multi-core processor comprising a set of slow cores (304; 504; 604) and a set of fast cores (302; 402; 502; 602), the task scheduler comprising:
a timing unit (110) configured to compare a slow core runtime of at least one candidate task (340; 420, 430; 520, 530; 620, 630) that is not on a critical path (305; 405; 505; 605) with a fast core runtime of one or more critical path tasks, and
a task assigning unit (120) configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than the fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
2. The task scheduler of claim 1, further comprising:
a graph construction unit (130) configured to construct a task graph of the plurality of tasks,
a path finding unit (140) configured to determine the critical path of the task graph.
3. The task scheduler of claim 1 or 2, further comprising a power computation unit (150) configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.
4. The task scheduler of one of the previous claims, wherein the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
5. The task scheduler of one of the previous claims, further comprising a preliminary execution unit (160) configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
6. A processor, comprising a set of fast cores (302; 402; 502; 602), a set of slow cores (304; 504; 604) and a task scheduler (100) according to one of claims 1 to 5.
7. Method for scheduling a plurality of tasks (310-340; 410-460; 510-560; 610-660) on a processor comprising a set of fast cores (302; 402; 502; 602) and a set of slow cores (304; 504; 604), the method comprising:
comparing (210) a slow core runtime of a candidate task that is not on a critical path (305; 405; 505; 605) with a fast core runtime of one or more critical path tasks, and
if the slow core runtime of the candidate task is longer than the fast core runtime of the one or more critical path tasks, assigning (220) the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
8. The method of claim 7, further comprising initial steps of:
constructing (202) a task graph of the plurality of tasks, and
determining (204) a critical path of the task graph.
9. The method of claim 7 or 8, further comprising:
for at least two candidate tasks: determining a power consumption gain of assigning the candidate task to the set of slow cores, and
assigning the at least two tasks in an order of decreasing power consumption gain.
10. The method of one of claims 7 to 9, wherein the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
11. The method of one of claims 7 to 10, further comprising an initial step of determining (206) a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
12. The method of claim 11, wherein the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.
13. A computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of one of claims 7 to 12.
PCT/RU2015/000664 2015-10-12 2015-10-12 Task scheduler and method for scheduling a plurality of tasks WO2017065629A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201580083785.6A CN108139929B (en) 2015-10-12 2015-10-12 Task scheduling apparatus and method for scheduling a plurality of tasks
PCT/RU2015/000664 WO2017065629A1 (en) 2015-10-12 2015-10-12 Task scheduler and method for scheduling a plurality of tasks

Publications (1)

Publication Number Publication Date
WO2017065629A1 (en) 2017-04-20

Family

ID=55967386

Country Status (2)

Country Link
CN (1) CN108139929B (en)
WO (1) WO2017065629A1 (en)

Also Published As

Publication number Publication date
CN108139929B (en) 2021-08-20
CN108139929A (en) 2018-06-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15860017

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15860017

Country of ref document: EP

Kind code of ref document: A1