WO2017065629A1

WO2017065629A1 - Task scheduler and method for scheduling a plurality of tasks

Info

Publication number: WO2017065629A1
Application number: PCT/RU2015/000664
Authority: WO
Inventors: Mikhail Petrovich LEVIN; Alexander Vladimirovich SLESARENKO
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2015-10-12
Filing date: 2015-10-12
Publication date: 2017-04-20
Also published as: CN108139929B; CN108139929A

Abstract

The present invention discloses a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising: a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task, and a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.

Description

TASK SCHEDULER AND METHOD FOR SCHEDULING A PLURALITY OF TASKS

TECHNICAL FIELD

The present invention relates to a task scheduler for scheduling a plurality of tasks on a multi-core processor and to a method for scheduling a plurality of tasks on a processor.

The present invention also relates to a processor and to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the above method.

BACKGROUND

Heterogeneous multi-core computing systems (HMCCS) are widely used in networked mobile systems such as mobile phones, tablets and even subnotebook computers. These systems contain two types of processor cores: fast cores intended for high-performance operation and low-power cores intended for power-aware operation. The first set is sometimes also called the hot set, pool of hot cores or pool of fast cores. The second set comprises low-performance cores with low power consumption and is also called the cold set, pool of cold cores or pool of slow cores.

Carrying out tasks on the set of slow cores instead of the set of fast cores reduces the overall power consumption. This is of particular importance for mobile systems because it prolongs battery life between charges. The usual system software for operating an HMCCS comprises a compiler and a scheduler. The compiler is responsible for creating the programs that run on such devices, and the scheduler is responsible for distributing load on such devices at run time. The main question in software development for these systems is which kind of core should be used for a given program block or task. With modern compilers, this decision is made by the programmer.

Another approach consists in automatically changing, at the scheduler level, the affiliation of tasks, processes, threads or program blocks with the sets of cores of different types. Many techniques for optimizing the usage of HMCCS have been proposed in this context. One direction is devoted to maximizing the performance of the HMCCS; another is related to optimizing performance within an established power consumption budget, and so on. However, there is still a need for a more efficient execution of programs on HMCCS.

SUMMARY OF THE INVENTION

The objective of the present invention is to provide a task scheduler and a method for task scheduling, wherein the task scheduler and the method overcome one or more problems of the prior art.

In particular, an objective of the present invention can include increasing the efficiency of using computational systems with heterogeneous multi-core (HMC) architectures which comprise at least two types of cores.

A first aspect of the invention provides a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising:

a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks, and

a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.

In general, a slow core runtime of a task is the runtime of the task on a core of the set of slow cores. The slow core runtime can be an estimate of the runtime on a slow core; in particular, it can be an estimated minimum or maximum runtime on a core of the set of slow cores. The fast core runtime can be defined correspondingly.

In embodiments of the invention, each application is considered as a set of tasks and a special task diagram describes this set of tasks, the hierarchy of tasks in the set and the sequence of task execution.

Each task diagram is divided into levels in hierarchical order. Each lower level corresponds to tasks which are dependent on data only of higher level tasks. The runtimes of tasks are compared with each other on a same level basis. That is, the execution time of a task not belonging to the critical path is compared with the execution times of tasks on the critical path within the same level in the task diagram. In other words, the timing unit is configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.

The one or more critical path tasks whose runtimes are compared with the runtime of the candidate task, which is not on the critical path, are the tasks on the same levels of the critical path as the candidate task. In other words, the one or more critical path tasks are on one or more levels of the critical path, the levels corresponding to the level of the candidate task.

The one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task can comprise the range of tasks on the critical path which operate in the same time range as the candidate task.

By assigning the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, the method of the first aspect ensures that the execution of the candidate task does not prolong the runtime of the entire program.

On the other hand, by assigning the candidate task to the set of slow cores if the slow core runtime of the candidate task is no longer than a fast core runtime of the one or more critical path tasks, the method of the first aspect ensures that tasks are preferably assigned to the slow cores, thus saving energy consumption and leaving the set of fast cores available for the execution of more urgent tasks.
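The assignment rule described above can be illustrated with a minimal Python sketch. The function name `assign_candidate` and the string labels "fast" and "slow" are hypothetical and chosen only for illustration. Summing the fast core runtimes of several critical path tasks reflects the case in which the candidate task corresponds to more than one level.

```python
from typing import Sequence


def assign_candidate(slow_runtime: float,
                     critical_fast_runtimes: Sequence[float]) -> str:
    """Return "fast" or "slow" for a candidate task that is not on the critical path.

    slow_runtime: runtime of the candidate task on a slow core.
    critical_fast_runtimes: fast core runtimes of the critical path tasks on
        the level(s) that correspond to the candidate task.
    """
    # Time window in which the candidate can run without delaying the program.
    window = sum(critical_fast_runtimes)
    # Longer than the window: the candidate would prolong the total runtime,
    # so it goes to the fast cores. Otherwise it goes to the slow cores.
    return "fast" if slow_runtime > window else "slow"


print(assign_candidate(3.0, [4.0]))       # fits into the window of one task
print(assign_candidate(5.0, [4.0]))       # too long for that window
print(assign_candidate(5.0, [4.0, 2.0]))  # fits into a window of two levels
```

Under this rule a candidate whose slow core runtime exactly equals the window is still assigned to the slow cores, which matches the "otherwise" branch of the first aspect.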

In a first implementation of the apparatus according to the first aspect, the task scheduler further comprises:

a graph construction unit configured to construct a task graph of the plurality of tasks, and

a path finding unit configured to determine the critical path of the task graph.

Thus, the task scheduler can have as an input the program code (which in embodiments can be in source code form or in compiled, binary form) and derive, using the graph construction unit and the path finding unit, the necessary information for scheduling the tasks of the program.

In other words, the task scheduler of the first implementation can be configured to have as input a program code, which defines a plurality of tasks, and derive (as output) a scheduling for these tasks.

A task graph can comprise a set of vertices connected by edges. In a preferred embodiment, the edges carry no latencies, because latencies are included in the durations of the corresponding tasks. In contrast to a plain task graph, each vertex in the task diagram also carries multiple data items: t₁(v), t₂(v), p₁(v) and p₂(v). Here t₁(v) denotes the duration of task v on fast set cores, t₂(v) denotes the duration of task v on slow set cores, p₁(v) denotes the power consumption of task v on fast set cores, and p₂(v) denotes the power consumption of task v on slow set cores.
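One possible in-memory representation of such a task diagram is sketched below in Python. The class name `TaskVertex`, the field names and all numeric values are assumptions made for illustration only; the text above prescribes only that each vertex carries the two durations and the two power consumption values, and that the connections between vertices carry no latencies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskVertex:
    """Vertex of a task diagram (hypothetical layout)."""
    name: str
    t1: float  # duration of the task on fast set cores
    t2: float  # duration of the task on slow set cores
    p1: float  # power consumption of the task on fast set cores
    p2: float  # power consumption of the task on slow set cores
    # Outgoing connections; they carry no latency because latencies are
    # already included in the task durations.
    successors: List[str] = field(default_factory=list)


# Hypothetical three-task diagram: A feeds B and D.
diagram: Dict[str, TaskVertex] = {
    "A": TaskVertex("A", t1=2.0, t2=5.0, p1=4.0, p2=1.0, successors=["B", "D"]),
    "B": TaskVertex("B", t1=3.0, t2=6.0, p1=6.0, p2=2.0),
    "D": TaskVertex("D", t1=4.0, t2=9.0, p1=7.0, p2=3.0),
}

for vertex in diagram.values():
    print(vertex.name, vertex.t1, vertex.t2, vertex.p1, vertex.p2, vertex.successors)
```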

In alternative embodiments, also in accordance with the present invention, the task scheduler can be configured to obtain the task graph and the critical path of the task graph as input from an external unit. For example, the task graph can be determined during compilation of the program.

In a second implementation of the apparatus according to the first aspect, the task scheduler further comprises a power computation unit configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.

Thus, the task scheduler itself is configured to determine the power consumption gain. This means that the task scheduler can be independent of other devices and places fewer requirements on other units that provide information regarding the tasks to be executed.

In a third implementation of the apparatus according to the first aspect, the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.

This represents a particularly simple and efficient way of computing a power consumption gain.
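As a small illustration, the power consumption gain and the resulting processing order could be computed as in the following Python sketch. The function name `power_gain`, the task names and the power values are hypothetical.

```python
from typing import Dict, List, Tuple


def power_gain(p_fast: float, p_slow: float) -> float:
    """Gain of running a task on the slow cores instead of the fast cores."""
    return p_fast - p_slow


# Hypothetical candidates: name -> (power on fast cores, power on slow cores).
candidates: Dict[str, Tuple[float, float]] = {
    "B": (6.0, 2.0),
    "C": (5.0, 3.5),
}

# Candidates are handled in order of decreasing power consumption gain.
ordered: List[str] = sorted(
    candidates,
    key=lambda name: power_gain(*candidates[name]),
    reverse=True,
)
print(ordered)
```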

In a fourth implementation of the apparatus according to the first aspect, the task scheduler further comprises a preliminary execution unit configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.

This represents a practical way of determining the runtimes. In embodiments of the invention, the preliminary execution unit is configured to determine the slow core and/or fast core runtime before the execution of a program. For example, the task scheduler can be configured to determine the slow core and/or fast core runtime of the tasks of a program during the installation of the program.

A second aspect of the invention relates to a processor comprising a set of fast cores, a set of slow cores and a task scheduler according to the first aspect of the invention or one of its implementations.

According to this aspect, the task scheduler can be integrated into the processor. For example, the task scheduler can be integrated into the hardware of the processor. This has the advantage that external components do not need to be modified in order to achieve the performance gain.

A third aspect of the invention relates to a method for scheduling a plurality of tasks on a processor comprising a set of fast cores and a set of slow cores, the method comprising:

comparing a slow core runtime of a candidate task that is not on a critical path with a fast core runtime of one or more critical path tasks, and

if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.

The methods according to the third aspect of the invention can be performed by the task scheduler according to the first aspect of the invention. Further features or implementations of the method according to the third aspect of the invention can perform the functionality of the task scheduler according to the first aspect of the invention and its different implementation forms.

In a first implementation of the method of the third aspect, the method further comprises initial steps of:

constructing a task graph of the plurality of tasks, and

determining the critical path of the task graph.

Thus, it is possible that the task graph is not previously determined, but determined e.g. by the task scheduler. If the structure of the task graph depends e.g. on some decisions that are made after compile-time, the method can determine the task graph at a later point, e.g. at runtime.

In a second implementation of the method of the third aspect, the method further comprises:

for at least two candidate tasks: determining a power consumption gain of assigning the candidate task to the set of slow cores, and

assigning the at least two tasks in an order of decreasing power consumption gain.

In a third implementation of the method of the third aspect, the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.

In a fourth implementation of the method of the third aspect, the method further comprises an initial step of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.

In a fifth implementation of the method of the third aspect, the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.

If the information on task execution time and latency are not provided (e.g. by the compiler), the task scheduler can thus determine the required information by carrying out the preliminary runs. This can involve additional computation time, but can still lead to a reduction of overall computation time, in particular for long execution times of a program.
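A preliminary run can be pictured as timing the task several times and deriving an estimate from the samples. The Python sketch below is only an illustration under simplifying assumptions: the helper names `measure_runtimes` and `estimate_runtime` are hypothetical, the task is modelled as a plain callable, and binding the run to a core of the fast set or the slow set is platform-specific and therefore omitted. In a real system the measurement would be repeated once per set of cores.

```python
import time
from typing import Callable, List


def measure_runtimes(task: Callable[[], object], runs: int = 3) -> List[float]:
    """Execute the task `runs` times and return the wall-clock duration of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return samples


def estimate_runtime(samples: List[float], mode: str = "max") -> float:
    """Reduce the samples to an estimated maximum or minimum runtime."""
    return max(samples) if mode == "max" else min(samples)


# Hypothetical task: a small computation standing in for a program block.
samples = measure_runtimes(lambda: sum(range(10_000)), runs=3)
print(len(samples), estimate_runtime(samples, "max") >= estimate_runtime(samples, "min"))
```

Choosing the maximum gives a conservative estimate for the slow core runtime, since an underestimate could place a task on the slow cores that then delays the critical path.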

A fourth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the third aspect or one of the implementations of the third aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the technical features of embodiments of the present invention more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present invention, but modifications on these embodiments are possible without departing from the scope of the present invention as defined in the claims.

FIG. 1 is a block diagram illustrating a task scheduler in accordance with an embodiment of the present invention,

FIG. 2 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,

FIG. 3 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,

FIG. 4 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,

FIG. 5 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,

FIG. 6 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention, and

FIG. 7 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.

Detailed Description of the Embodiments

FIG 1 is a block diagram illustrating a task scheduler 100 in accordance with an embodiment of the present invention. The task scheduler 100 comprises a timing unit 110 and a task assigning unit 120. Further, the task scheduler 100 can optionally, as indicated with dashed lines in FIG 1, comprise a graph construction unit 130, a path finding unit 140, a power computation unit 150, and a preliminary execution unit 160.

In embodiments of the invention, the task scheduler 100 can be implemented as part of a processor (not shown in FIG 1) or can be implemented in a hardware device that is located outside the processor.

FIG 2 is a flow chart illustrating a method 200 for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.

The method 200 comprises a step 210 of comparing a slow core runtime of a candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.

The method comprises a further step 220 of, if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.

As shown with dashed lines in FIG. 2, the method optionally further comprises three initial or preliminary steps: a first initial step 202 of constructing a task graph of the plurality of tasks, a second initial step 204 of determining a critical path of the task graph, and a third initial step 206 of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.

In embodiments of the invention, the method steps are carried out in the order as shown in FIG. 2. However, in other embodiments of the invention, the method steps can be carried out in a different order.

FIG 3 is a schematic diagram which illustrates the problem addressed by the task scheduler and method of the present invention.

Shown in FIG 3 are a plurality of tasks, a first, second and third task 310, 320, 330 that are on a critical path 305, and a candidate task 340. The tasks 310, 320, 330 are allocated to a set of fast cores 302. For the candidate task 340, the task scheduler should decide whether to assign it to the set of fast cores 302 or a set of slow cores 304.

Here tᵢ is the time of execution on cores of i-type (wherein i is 1 or 2) and pᵢ is the power consumption on cores of i-type (i=1 for fast cores and i=2 for slow cores).

The time of program execution corresponds to the longest path (critical path) through the task graph, evaluated on task execution times. The performance of the program is the inverse of the program execution time. Maximizing the performance of the program therefore means minimizing the program execution time, i.e. minimizing the critical path of the task diagram. The minimal length of the critical path is obtained when the tasks of the critical path are executed on cores of the fast set. All other tasks (not included in the critical path) may migrate between the sets in order to minimize power consumption (this option is denoted by the "?" sign in the diagram shown in FIG 3).

Now let us consider the problem of maximizing performance and minimizing power consumption. We solve this problem step by step: in the first step we construct the solution with maximal performance, and in the second step, keeping the maximal performance value, we minimize power consumption.
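The first step, finding the critical path as the longest path through the task graph evaluated on fast core runtimes, can be sketched in Python as follows. This is only one possible way to compute a longest path in an acyclic graph. The function name `critical_path`, the graph layout and all runtimes are assumptions for illustration; the example graph is merely loosely modelled on a diagram with tasks A to F.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def critical_path(t1: Dict[str, float],
                  succ: Dict[str, List[str]]) -> Tuple[float, List[str]]:
    """Return (length, tasks) of the longest path, using fast core runtimes t1.

    The graph is assumed to be acyclic; succ maps a task to its successors.
    """
    @lru_cache(maxsize=None)
    def longest_from(v: str) -> Tuple[float, Tuple[str, ...]]:
        best_len: float = 0.0
        best_path: Tuple[str, ...] = ()
        for w in succ.get(v, []):
            length, path = longest_from(w)
            if length > best_len:
                best_len, best_path = length, path
        # Runtime of v itself plus the longest continuation after v.
        return t1[v] + best_len, (v,) + best_path

    length, path = max(longest_from(v) for v in t1)
    return length, list(path)


# Hypothetical fast core runtimes and dependencies.
t1 = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 3.0, "F": 1.0}
succ = {"A": ["B", "C", "D"], "B": ["F"], "C": ["E"], "D": ["E"], "E": ["F"]}

length, path = critical_path(t1, succ)
print(length, path)  # 10.0 ['A', 'D', 'E', 'F']
```

The reciprocal of the returned length corresponds to the maximal performance referred to above, because every task on the path is evaluated with its fast core runtime.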

Let us assume we found the critical path K, e.g. as determined by a critical path finding unit as described above. After that, all tasks are divided or organized into levels with respect to tasks of the critical path. Each lower level corresponds to tasks which are dependent on data only of higher level tasks. This is illustrated by the example shown in FIG 4.
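One way to organize tasks into levels is to give each task a level one greater than the deepest task it depends on. The Python sketch below follows this reading; it is an assumption about how the leveling could be realized, not a definitive implementation. The names `assign_levels` and `level_span` and the example graph are hypothetical. `level_span` additionally reports the range of levels over which a task may run before its earliest successor needs its data.

```python
from typing import Dict, List


def assign_levels(tasks: List[str], succ: Dict[str, List[str]]) -> Dict[str, int]:
    """Level 1 holds tasks without predecessors; each other task sits one
    level below its deepest predecessor. The graph is assumed to be acyclic."""
    preds: Dict[str, List[str]] = {v: [] for v in tasks}
    for v, ws in succ.items():
        for w in ws:
            preds[w].append(v)

    level: Dict[str, int] = {}

    def level_of(v: str) -> int:
        if v not in level:
            level[v] = 1 + max((level_of(u) for u in preds[v]), default=0)
        return level[v]

    for v in tasks:
        level_of(v)
    return level


def level_span(v: str, succ: Dict[str, List[str]], level: Dict[str, int]) -> List[int]:
    """Levels a task may occupy: from its own level up to the level just
    before its earliest successor."""
    last = min((level[w] for w in succ.get(v, [])), default=level[v] + 1) - 1
    return list(range(level[v], last + 1))


# Hypothetical dependencies between tasks A to F.
tasks = ["A", "B", "C", "D", "E", "F"]
succ = {"A": ["B", "C", "D"], "B": ["F"], "C": ["E"], "D": ["E"], "E": ["F"]}

levels = assign_levels(tasks, succ)
print(levels)                         # {'A': 1, 'B': 2, 'C': 2, 'D': 2, 'E': 3, 'F': 4}
print(level_span("B", succ, levels))  # [2, 3]
print(level_span("C", succ, levels))  # [2]
```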

In FIG. 4, first task 410, fourth task 440, fifth task 450 and sixth task 460 are located on a critical path, indicated by dashed line 405, wherein the tasks are assigned to a set of fast or hot cores, 402. A second task 420 and a third task 430 are located outside of the critical path, but on the same level as the fourth task 440, indicated as "Level 2" in FIG. 4. Second task 420 and third task 430 are considered as candidate tasks in the following.

When searching for the critical path 405, all tasks are assumed to be executed on the set of fast cores, because only in this way can the maximal performance of the HMCCS under consideration be obtained.

Then the critical path is fixed and all tasks belonging to it are affiliated with the set of fast cores. Next, consider the tasks on intermediate levels that do not belong to the critical path. Since the goal now is to minimize power consumption, the following is checked:

Is it possible to affiliate the second task 420 and the third task 430 with the set of slow cores without extending the total runtime?

In the following equations, A, B, C, D, and E denote the first task, the second task, the third task, the fourth task, and the fifth task, respectively.

The second task B and the third task C can be affiliated with the set of slow cores without extending the total runtime, and hence without loss of performance, if

t₂(C) ≤ t₁(D)

and

t₂(B) < t₁(D) + t₁(E) .

According to the task diagram, the first inequality involves only tasks of the same level, namely Level 2. The second inequality, according to the task diagram shown in FIG 4, involves tasks over a range of levels, namely Level 2 and Level 3, because the second task 420 (task B) spans several levels rather than only one.

In the case presented in FIG 4, the second and third tasks 420, 430 should be affiliated with the set of slow cores.

In FIG 5 an example is shown, where a plurality of tasks 510, 540, 550, and 560 are located on a critical path 505 and for a second and third task 520, 530 it needs to be decided whether to assign these to the set of slow cores 504 or the set of fast cores 502. The potential placement of the second and third task on the set of cold cores is indicated with reference numbers 520' and 530'.

In the example of FIG 5, the migration of the third task 530 to the slow set of cores is available, but migration of the second task 520 is not available. In this case

and

t₂(C) > t₁(D) .

Since the critical path does not change, the performance keeps its maximal value.

At the same time, migration of any task into the slow set decreases the power consumption. In this case, the power consumption will decrease by the following value

p_profit(Level2) = p₁(C) - p₂(C) .

FIG 6 shows a similar example, where a plurality of tasks 610, 640, 650, and 660 are located on a critical path 605 and for a second and third task 620, 630 it needs to be decided whether to assign these to the set of slow cores 604 or the set of fast cores 602. The potential placement of the second and third task on the set of cold cores is indicated with reference numbers 620' and 630'. In this case the following inequalities are valid

t₂(B) ≤ t₁(D)

and

t₂(C) ≤ t₁(D) + t₁(E) .

Here the second inequality is also valid in the range of levels 2 and 3.

In this case the decrease in power consumption will be greater than in the previous example and equal to the following value

p_profit(Level2) = p₁(B) - p₂(B) + p₁(C) - p₂(C) .

The order of migration is not essential for obtaining a better result in terms of minimizing power consumption. However, it is preferable, for example, that if

p₁(B) - p₂(B) ≥ p₁(C) - p₂(C) ,

then the second task 620 migrates first; otherwise the third task 630 migrates first.

FIG 7 is a flow chart of an example method for migrating tasks among the set of fast cores and the set of slow cores, wherein the task B belongs to only one fixed level.

In a first step 702, a list of candidate tasks L is provided to the task scheduler. In a second step 704, the task scheduler sorts the list L in order of decreasing power consumption profit (computed e.g. as p₁ - p₂). The result is stored in an ordered list L₁.

In a third step 706, a task D, which is on a (e.g. previously determined) critical path, is taken from the list and, in step 708, put "into the hot pool", i.e. assigned to the fast set of cores.

In step 710, it is checked whether D is the last task on the data layer. If so, there are no more tasks to process and the method stops in step 722.

If there are more tasks to process, the method proceeds to step 712 and takes task B (the first task in the ordered list L₁). In step 714, the condition

t₁(D) ≥ t₂(B)

is checked. If the condition is fulfilled, the method proceeds with step 716 and puts B into the "cold pool", i.e., assigns it to the set of slow cores. If the condition is not fulfilled, the method proceeds to step 718 and task B is put into the "hot pool", i.e., it is assigned to the set of fast cores.

In step 720 it is checked whether task B is the last task in the ordered list L₁. If so, the method ends in step 722. Otherwise, the method continues with step 724, taking task B as the next task in the ordered list L₁.

If task B belongs to several levels, the control inequality in this algorithm should be replaced by the following more general inequality

t₁(D) + t₁(E) + ... + t₁(S) > t₂(B) .

Here D, E, ..., S is the range of tasks of the critical path that operate in the same time range as task B.
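The migration loop of FIG 7, including the generalization to several levels, can be sketched in Python as follows. The function name `schedule_candidates`, the dictionary layout and all numeric values are assumptions for illustration. The sketch uses the non-strict comparison of step 714 for both the single-level and the multi-level case, and it returns the accumulated power consumption profit alongside the assignment.

```python
from typing import Dict, List, Tuple


def schedule_candidates(candidates: List[str],
                        window_tasks: Dict[str, List[str]],
                        t1: Dict[str, float], t2: Dict[str, float],
                        p1: Dict[str, float], p2: Dict[str, float]
                        ) -> Tuple[Dict[str, str], float]:
    """Assign candidate tasks (not on the critical path) to "slow" or "fast".

    window_tasks[b] lists the critical path tasks D, E, ..., S that operate
    in the same time range as candidate b.
    """
    # Step 704: sort candidates by decreasing power consumption profit.
    ordered = sorted(candidates, key=lambda b: p1[b] - p2[b], reverse=True)

    assignment: Dict[str, str] = {}
    profit = 0.0
    for b in ordered:  # steps 712 and 724: take the next task of the ordered list
        window = sum(t1[d] for d in window_tasks[b])
        if window >= t2[b]:          # step 714: does b fit on a slow core?
            assignment[b] = "slow"   # step 716: cold set
            profit += p1[b] - p2[b]
        else:
            assignment[b] = "fast"   # step 718: hot set
    return assignment, profit


# Hypothetical runtimes and power values; D and E lie on the critical path.
t1 = {"D": 4.0, "E": 3.0}
t2 = {"B": 6.0, "C": 5.0}
p1 = {"B": 6.0, "C": 5.0}
p2 = {"B": 2.0, "C": 3.5}
window_tasks = {"B": ["D", "E"], "C": ["D"]}

assignment, profit = schedule_candidates(["B", "C"], window_tasks, t1, t2, p1, p2)
print(assignment, profit)  # {'B': 'slow', 'C': 'fast'} 4.0
```

In this example task B spans two levels and fits into the combined window of D and E, whereas task C is limited to the window of D alone and therefore stays on the fast cores.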

The foregoing descriptions are only implementation manners of the present invention; the scope of protection of the present invention is not limited to them.

Variations or replacements can easily be made by a person skilled in the art.

Therefore, the protection scope of the present invention should be subject to the protection scope of the attached claims.

To summarize, embodiments of the present invention comprise mapping tasks to sets of cores. Preliminary runs can be performed to collect information on execution times on the different types of cores and on the corresponding power consumption. After that, it is possible to construct the task diagram, evaluate the critical path on this diagram corresponding to the maximal performance, split the diagram into levels and, on each level, solve the problem of migrating tasks that do not belong to the critical path. Potentially, these can be assigned to the set of slow cores, thus reducing overall power consumption.

In embodiments of the invention, the method can comprise further steps. Let us consider a heterogeneous multi-core computing system consisting of cores c₁, c₂, ..., cₖ of the fast set type (with high energy consumption and high performance) and cores cₖ₊₁, cₖ₊₂, ..., cₙ of the slow set type (with low energy consumption and low performance), n cores in total. Now let us consider how to bind tasks in complex software to processor cores of the different sets:

1. Static monitoring of the HMCCS is performed. As a result, we evaluate the execution times t₁ and t₂ of all tasks on the different cores and the corresponding power consumption values.

2. The task diagram is constructed.

3. We evaluate the critical path on the task diagram, assuming that all tasks are executed on the fast set cores. This defines the maximal performance on the HMCCS under consideration.

4. We divide the task diagram into levels, starting from the initial node down to the last node.

5. On all intermediate levels we solve the migration problem for the tasks that do not belong to the critical path at that level. Tasks of the critical path are always affiliated with fast set cores.

6. The power consumption is evaluated according to the affiliation of the tasks with the sets of cores. A gain in power consumption is achieved even if only one task is affiliated with the slow set cores. If many tasks are affiliated with slow set cores, the power consumption profit will be substantially greater.

7. All tasks are executed according to their assignment to the fast and slow set cores.

Effects of a method in accordance with the present invention can include that the performance of the HMCCS is improved and/or the power consumption is decreased.

Here we consider one of the most common goals, minimal completion time (another one, maximal throughput, is not considered here). A method can solve an optimization problem in order to minimize the total completion time of each particular application. This can include finding an optimal mapping of tasks to cores that makes the completion time reach its potential minimum and simultaneously decreases the power consumption of the HMCCS as much as possible.

Furthermore, with a task scheduler or method in accordance with the present invention, considerably less effort is required from the developer to develop parallel applications for heterogeneous hardware. This makes the process of developing parallel applications for HMCCS hardware easier. Finally, it decreases the labor costs of both developing software and effectively porting existing code to a specific architecture.

Embodiments of the present invention can be used in a system with signal processors of SoC type in which the same software is running permanently. Thus, a particularly high power saving is achieved.

Embodiments exploit a heterogeneous multi-core architecture whose cores differ in performance and power consumption. Aspects of the present invention can involve:

• Preliminary static estimation of execution time and power consumption on a set of fast cores and a set of slow cores.

• Usage of a task diagram for designing a performance-energy efficient scheduler for heterogeneous multi-core devices.

• Evaluating the critical path on a task diagram.

• Leveling the task diagram to provide the maximal profit in power consumption.

• Evaluation of power consumption on the obtained task distribution among sets of cores according to the task diagram, in order to minimize power consumption while keeping the maximal performance.

Claims

1. Task scheduler (100) for scheduling a plurality of tasks (310-340; 410-460; 510-560; 610-660) on a multi-core processor comprising a set of slow cores (304; 504; 604) and a set of fast cores (302; 402; 502; 602), the task scheduler comprising:

a timing unit (110) configured to compare a slow core runtime of at least one candidate task (340; 420, 430; 520, 530; 620, 630) that is not on a critical path (305; 405; 505; 605) with a fast core runtime of one or more critical path tasks, and

a task assigning unit (120) configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than the fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.

2. The task scheduler of claim 1, further comprising:

a graph construction unit (130) configured to construct a task graph of the plurality of tasks,

a path finding unit (140) configured to determine the critical path of the task graph.

3. The task scheduler of claim 1 or 2, further comprising a power computation unit (150) configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.

4. The task scheduler of one of the previous claims, wherein the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.

5. The task scheduler of one of the previous claims, further comprising a preliminary execution unit (160) configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.

6. A processor, comprising a set of fast cores (302; 402; 502; 602), a set of slow cores (304; 504; 604) and a task scheduler (100) according to one of claims 1 to 5.

7. Method for scheduling a plurality of tasks (310-340; 410-460; 510-560; 610-660) on a processor comprising a set of fast cores (302; 402; 502; 602) and a set of slow cores (304; 504; 604), the method comprising:

comparing (210) a slow core runtime of a candidate task that is not on a critical path (305; 405; 505; 605) with a fast core runtime of one or more critical path tasks, and

if the slow core runtime of the candidate task is longer than the fast core runtime of the one or more critical path tasks, assigning (220) the task to the set of fast cores, otherwise assigning the task to the set of slow cores.

8. The method of claim 7, further comprising initial steps of:

constructing (202) a task graph of the plurality of tasks, and

determining (204) a critical path of the task graph.

9. The method of claim 7 or 8, further comprising:

assigning the at least two tasks in an order of decreasing power consumption gain.

10. The method of one of claims 7 to 9, wherein the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.

11. The method of one of claims 7 to 10, further comprising an initial step of determining (206) a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.

12. The method of claim 11, wherein the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.

13. A computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of one of claims 7 to 12.