WO2008148624A1 - Method and device for providing a schedule for a predictable operation of an algorithm on a multi-core processor - Google Patents

Method and device for providing a schedule for a predictable operation of an algorithm on a multi-core processor

Info

Publication number
WO2008148624A1
Authority
WO
WIPO (PCT)
Prior art keywords
optimization
algorithm
tasks
alg
time
Prior art date
Application number
PCT/EP2008/055907
Other languages
French (fr)
Inventor
Andrey Nechypurenko
Egon Wuchner
Original Assignee
Siemens Aktiengesellschaft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft filed Critical Siemens Aktiengesellschaft
Publication of WO2008148624A1 publication Critical patent/WO2008148624A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Abstract

The invention describes a method for providing a schedule for a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plurality of parallel working cores (P1,...,P8), comprising the steps of - creating a model of the algorithm (ALG), thereby identifying tasks (t1,..., t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (t1,..., t22); - exploiting an optimization method taking account of a first optimization criterion, thereby assigning each identified task (t1,..., t22) according to its at least one characteristic to at least some of the plurality of cores (P1,..., P8) of the multi-core processor and determining a starting point in time for each of the tasks (t1,..., t22) of the algorithm (ALG); - repeatedly exploiting the optimization method taking account of at least one second optimization criterion, thereby outputting the starting point in time for each of the tasks (t1,..., t22), the first and the at least one second optimization criterion; and - analysing each schedule of operation of the algorithm (ALG) with regard to the starting point in time for each of the tasks (t1,..., t22), the first and the at least one second optimization criterion to determine the best combination of the first and the at least one second optimization criteria.

Description

Method and Device for providing a schedule for a predictable operation of an algorithm on a multi-core processor
The invention relates to a method for providing a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores. The invention further relates to a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores. The invention further relates to a computer program product.
Normally, an algorithm comprises a plurality of different tasks, each task having a specific duration. Furthermore, there are task dependencies and data exchange between some of the tasks. These tasks are called interdependent tasks. The operation time of an algorithm depends on many aspects, especially the task dependencies, the task durations and a so-called domain-motivated explicit mapping of tasks or task groups to certain cores of a multi-core processor. Improving the operation time of an algorithm implies a better data throughput or a better task scalability in general. In particular, real-time or live-processing algorithms which are used in medical instruments need feasible run-time predictions with regard to their operation time.
Even in case run-time predictions are not needed, the dynamic scheduling of an algorithm with interdependent tasks has to be based on the task dependencies and durations. The tasks of the algorithm have to be executed in the sequence of their specified dependencies. In addition, there has to be a certain kind of fairness with respect to all dispatched algorithms processed by the multi-core processor at a certain time. When a plurality of algorithms is executed in parallel, it has to be guaranteed that no previously dispatched algorithm starves. Rather, an algorithm should be able to complete before a new or the same algorithm starts. Besides a time-optimized scheduling of the tasks specific to an algorithm, trade-offs sometimes have to be made, because a time-robust task-to-core deployment using as few cores of the multi-core processor as possible can be just as important as a time-optimized task-to-core deployment. Consequently, a general-purpose scheduling policy does not suffice, since there are several dimensions of variability: the number of cores of the multi-core processor; the question whether the multi-core processor comprises asymmetric or symmetric cores; a robust versus a time-optimized deployment of the tasks of an algorithm; and significant time improvements due to further parallelizability of certain tasks.
Currently, when using so-called symmetric multi-core processors (as provided by Intel or AMD), task scheduling is under the control of the operating system. A symmetric multi-core processor means that all cores of the multi-core processor have the same capability. Even real-time operating systems schedule their execution units, i.e. tasks, threads or processes, in a transparent way according to priorities, execution unit states and queues. Executing an algorithm with a real-time operating system therefore makes it necessary to assign a priority to each task of the algorithm. Several approaches are known for mapping tasks to priorities in order to derive a feasible and predictable algorithm duration. Unfortunately, such a mapping has to be done manually by the programmer, leveraging his experience and knowledge. Furthermore, this procedure is hardly applicable to algorithms having interdependent tasks.
Even when using so-called asymmetric multi-core processors or a combination of different multi-core units, like the so-called IBM Cell Broadband Engine (CBE) and graphical processing units (GPUs), the scheduling of the tasks of an algorithm has to be done manually. The scheduling result has to be programmed explicitly. Each scheduling solution needs to consider which tasks are able to run on which cores and needs to specify and trigger task-to-core deployments explicitly.
It is therefore an object of the present invention to provide a method for providing a schedule for a predictable operation of an algorithm on a multi-core processor. It is a further object of the present invention to provide a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor. Furthermore, a computer program product directly loadable into the internal memory of a digital computer has to be provided.
These objects are solved by a method according to claim 1, a computer program product according to claim 11 and a device according to claim 12. Preferred embodiments of the invention are set out in the dependent claims.
A method for providing a schedule for a predictable operation of an algorithm on a multi-core processor, the multi-core processor comprising a plurality of parallel working cores, comprises the steps of creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; exploiting an optimization method taking account of a first optimization criterion, thereby assigning each identified task according to its at least one characteristic to at least some of the plurality of cores of the multi-core processor and determining a starting point in time for each of the tasks of the algorithm; repeatedly exploiting the optimization method taking account of at least one second optimization criterion, thereby outputting the starting point in time for each of the tasks as well as the first and the at least one second optimization criterion; and analysing each schedule of operation of the algorithm with regard to the starting point in time for each of the tasks and the first and the at least one second optimization criterion to determine the best combination of the first and the at least one second optimization criteria.
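The sequence of steps can be illustrated with the following Python sketch. It is an illustration only: the function signature, the passed-in optimizer and the "score" attribute are assumptions made for this example and are not part of the described method.

```python
# Illustrative sketch of the sequence of method steps; names are assumptions.
def provide_schedule(algorithm_model, optimize, first_criterion, second_criteria):
    """Return the schedule judged best over all combinations of criteria."""
    # Exploit the optimization method with the first criterion only.
    candidates = [optimize(algorithm_model, [first_criterion])]

    # Repeatedly exploit the optimization method, each run taking at least
    # one second optimization criterion into account.
    for extra in second_criteria:
        candidates.append(optimize(algorithm_model, [first_criterion, extra]))

    # Analyse every proposed schedule (start time per task plus the criteria
    # considered) and keep the best combination.
    return min(candidates, key=lambda schedule: schedule.score)
```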
Scheduling a predictable operation means that the overall time of the algorithm, the number of cores needed for executing the algorithm, the time-robustness and the resulting utilization of those cores are identical with each execution of the algorithm.
A scheduling policy on multi-core processors must not be dominated by one dimension, as for example its overall time-optimization. Applying the method according to the invention means making a trade-off analysis for complex algorithms so that their task-to-core deployment and scheduling addresses this problem. It takes several dimensions of variability, i.e. optimization criteria, as input parameters. The delivered result is a set of scheduling solutions for each tuple of input values and the impact of each solution, like its time-optimization, the scale of its time-robustness or the specific tasks worth being further parallelized.
Thereby, the power of the multi-core processor can be fully exploited. The schedule for the algorithm furthermore allows a predictable operation. Therefore, the method according to the invention allows providing algorithms which can be used for critical applications, e.g. medical measurements. An advantage over the prior art is that the method according to the invention can be carried out automatically by a computer system instead of a manual approach. Therefore, the method according to the invention may also be used for algorithms with changing tasks, dependencies and durations, because scheduling a new predictable operation of the algorithm can even be done dynamically.
The first optimization criterion may correspond to a time-optimization of the algorithm. The starting point for the method according to the invention is therefore to provide a schedule for a predictable, time-optimized operation of an algorithm.
The at least one second optimization criterion may correspond to a fixed number of cores and/or a discrete range of cores and/or a time-robustness of the schedule and/or parallelization options of certain tasks. According to the at least one second optimization criterion used, an already provided schedule for the operation of the algorithm can be varied to determine the best combination of the first and the at least one second criteria for the schedule of the tasks of the algorithm.
According to an improvement, only some of the second optimization criteria are considered while repeatedly exploiting the optimization method. As a result, one or more solutions for scheduling the tasks of the algorithm can be found. In contrast to using all of the second optimization criteria, the time for finding one or more solutions can be reduced.
According to a further embodiment, different second optimization criteria are considered during each run of the optimization method. This step is done before analysing each schedule of operation of the algorithm with regard to the starting point in time for each of the tasks and the first and the at least one second optimization criteria to determine the best combination of the first and the at least one second criteria.
The method steps of the invention may be carried out by a computer. Alternatively, it is also possible that some of the method steps of the invention are carried out by a programmer.
As an optimization method, an optimization algorithm or a heuristic may be used. Known optimization algorithms and heuristics can be applied, such as the optimization method "Simulated Annealing" or the heuristic "First Fit Decreasing".
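As an illustration of the "First Fit Decreasing" heuristic mentioned above, the following sketch packs tasks onto as few cores as possible given a per-core time budget. It ignores task dependencies, and the durations and the 600 ms budget are assumed example values, not values from the description.

```python
# Minimal First Fit Decreasing sketch for task-to-core packing (illustrative).
def first_fit_decreasing(durations_ms, core_budget_ms):
    """Pack tasks (ignoring dependencies) onto as few cores as possible."""
    cores = []  # each entry: [remaining budget, list of assigned task ids]
    for task, duration in sorted(durations_ms.items(), key=lambda kv: -kv[1]):
        for core in cores:
            if core[0] >= duration:          # first core with enough room
                core[0] -= duration
                core[1].append(task)
                break
        else:                                # no core fits -> open a new one
            cores.append([core_budget_ms - duration, [task]])
    return [tasks for _, tasks in cores]

# Example call with assumed durations and a 600 ms per-core budget.
print(first_fit_decreasing({"t0": 200, "t7": 50, "t9": 400, "t12": 300}, 600))
```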
According to a further embodiment of the invention, the first and/or the at least one second optimization criterion is weighted with a respective weight parameter. A weight parameter may be specified for one, for some or for all of the first and/or the at least one second optimization criteria. As a result, only those schedules (defined by the starting point in time for each of the tasks, the first and the at least one second optimization criterion) are output which meet the specified weighted criteria.
Furthermore, the first optimization criterion may be specified as a range. Accordingly, the at least one second optimization criterion may be specified as a range.
According to a further embodiment, the analysis of each schedule of operation of the algorithm ends up with one or more suggestions of the first and the at least one second optimization criterion. After that, a choice can be made, automatically or by the software engineer, as to which of the proposed schedules is the most practical solution for the algorithm running on a specific multi-core processor.
According to a second aspect of the invention a computer program product directly loadable into the internal memory of a digital computer is suggested, the computer program product comprising software code portions for performing the steps of the method according to the invention when said product is run on a computer.
According to a third aspect of the invention a device for providing a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores is suggested. The device comprises a first means for creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; a second means for exploiting an optimization method taking account of a first optimization criterion, thereby assigning each identified task according to its at least one characteristic to at least some of the plurality of cores of the multi-core processor and for determining a starting point in time for each of the tasks of the algorithm, a third means for repeatedly exploiting the optimization method taking account of at least one second optimization criterion, thereby outputting the starting point in time for each of the tasks, the first and the at least one second optimization criterion; and a fourth means for analysing each schedule of operation of the algorithm with regard to the starting point in time for each of the tasks, the first and the at least one second optimization criterion to determine the best combination of the first and the at least one second optimization criteria.
A device according to the invention has the same advantages pointed out in connection with the described method.
Furthermore, the device according to the invention comprises further means for executing the method steps as set out above.
The invention will be described in more detail by reference to the accompanying figures.
Fig. 1 shows a simplified, already analysed model of an algorithm containing a plurality of tasks, and
Fig. 2 shows a schedule of the tasks of the algorithm of Fig. 1 assigned to a plurality of cores of a multi-core processor.
Fig. 1 shows a simplified model of an algorithm ALG consisting of tasks t0 to t22. Each of the tasks t0,...,t22 has a specific duration which is indicated by the reference numeral "delta". For instance, task t0 has a duration of delta = 200 ms. Task t7 has a duration of delta = 50 ms, and so on. Furthermore, Fig. 1 shows the dependencies between some of the tasks. For example, the dependency between task t9 and task t7 is indicated with d1. This means that between tasks t7 and t9 there is a data exchange and task t7 has to wait for some input information from task t9. Accordingly, tasks t6, t8 and t10 need input information from task t7. These dependencies are outlined with d4, d5 and d6. In an analogous manner, dependencies d1 to d32 between further interdependent tasks are indicated.
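Purely as an illustration, such an analysed model could be held in two dictionaries as sketched below. Only the delta values of t0 and t7 and the edges d1 and d4 to d6 are taken from the description above; all other entries are elided, and the layout itself is an assumption.

```python
# Possible in-memory form of the analysed model of Fig. 1 (illustrative only).
durations_ms = {
    "t0": 200,   # delta = 200 ms
    "t7": 50,    # delta = 50 ms
    # ... durations of the remaining tasks t1..t22
}

dependencies = {                 # dependency id -> (producing task, consuming task)
    "d1": ("t9", "t7"),          # t7 waits for input information from t9
    "d4": ("t7", "t6"),
    "d5": ("t7", "t8"),
    "d6": ("t7", "t10"),
    # ... d2..d32 between the further interdependent tasks
}
```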
Fig. 1 exemplarily shows the algorithm after performing the step of creating a model of the algorithm, thereby identifying the tasks t1,...,t22 of the algorithm ALG and identifying characteristics of each of the tasks. The characteristics used for a first step of optimization and shown in Fig. 1 are the duration of each of the tasks and the dependencies to other tasks. Thereafter, a time-optimized schedule for the algorithm ALG will be determined by an optimization method, e.g. any optimization algorithm or a heuristic. The determination of the time-optimized scheduling can be done automatically. Applying a time-optimization to the algorithm shown in Fig. 1 results in using eight cores of a multi-core processor (which might have more than eight cores in total) and an overall duration of the algorithm of 1850 ms. The resulting task-to-core scheduling is outlined in Fig. 2.
As can be seen from Fig. 2, each of the tasks t0,...,t22 is assigned to a specific core P1,...,P8: task t22 is assigned to core P1. Tasks t1, t19 and t4 are assigned to core P4. Tasks t5, t18 and t11 are assigned to core P5. Tasks t0, t6, t15 and t17 are assigned to core P6. Tasks t7, t8, t9, t14 and t2 are assigned to core P7. Tasks t13 and t21 are assigned to core P2. Tasks t16, t20 and t3 are assigned to core P3. Last, tasks t10 and t12 are assigned to core P8. The assignment to the eight cores is only exemplary. However, another task-to-core deployment could be used as a starting point for the method according to the invention. Furthermore, the dependencies d1,...,d32 which are outlined in Fig. 1 are shown in Fig. 2, too.
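The schedule of Fig. 2 pairs each task with a core and a starting point in time. The sketch below shows one simple way such starting points could be derived from a given task-to-core assignment, assuming an acyclic dependency graph and that a task starts as soon as its core is idle and all of its inputs have finished; the function and the data shapes are illustrative assumptions, not the described method.

```python
# Sketch: derive a starting point in time per task from a given assignment.
# Expects complete durations_ms and dependencies dictionaries (see the sketch
# after Fig. 1) and a mapping such as core_of = {"t22": "P1", ...} as in Fig. 2.
def start_times(durations_ms, dependencies, core_of):
    preds = {t: set() for t in durations_ms}
    for src, dst in dependencies.values():
        preds[dst].add(src)
    start, finish, core_free = {}, {}, {}
    pending = set(durations_ms)
    while pending:
        # pick any task whose predecessors have all finished (simple list schedule)
        task = next(t for t in pending if preds[t].issubset(finish))
        ready = max((finish[p] for p in preds[task]), default=0)
        start[task] = max(ready, core_free.get(core_of[task], 0))
        finish[task] = start[task] + durations_ms[task]
        core_free[core_of[task]] = finish[task]
        pending.remove(task)
    return start
```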
Further criteria which can affect a time-optimization for the schedule for a predictable operation of the algorithm ALG may be data exchanges between the tasks, task durations, core candidates per task and task groups to be deployed to one core.
Further parameters or criteria besides time-optimization for finding an optimized schedule for the algorithm ALG are a fixed number of cores of the multi-core processor, a discrete range of cores, a time-robustness of the schedule and parallelization options of certain tasks. These criteria can be called trade-off criteria. Some or all of these criteria will be used as input information for exploiting optimization methods (algorithms or heuristics) to suggest possible schedules of tasks to certain cores at specific times. The input criteria, which represent further "dimensions of variability" of the algorithm besides time-optimization, can be dealt with by an interactive simulation of deployment suggestions and an impact analysis of alternative deployment options and their trade-offs.
According to this, the optimization method is repeatedly exploited, taking account of some or all of the above-mentioned optimization criteria, thereby outputting the starting point in time for each of the tasks and the optimization criteria considered for each possible schedule. Thereafter, each proposed schedule of operation of the algorithm is analysed with regard to its starting point in time for each of the tasks and the optimization criteria considered, in order to determine the best combination of the optimization criteria.
The impact analysis may encompass the following usage scenarios: A fixed number of cores of the multi-core processor may be specified. As a result, a suggestion about a time-optimal scheduling solution feasible within the bounds of the given cores is provided.
Alternatively, a (discrete) range of cores (e.g. 8, 12 or 16 available cores) can be specified. As a result, deployment suggestions sorted along several criteria, like operation time and needed number of cores, are provided. By selecting one of these scheduling solutions, a simulation of different task duration variations can be executed in order to see the time-robustness of a scheduling solution depending on ranges of task durations. Thus, one is able to simulate several deployment suggestions by using a range of cores and task duration variations as input parameters. The delivered result is the impact of all input values within the specified range on the overall optimized algorithm time and its time-robustness.
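A minimal sketch of such a robustness simulation is given below, assuming a fixed deployment, a +/-20 % variation of every task duration and the start_times() helper from the earlier sketch; all of these choices are assumptions made for illustration.

```python
# Monte-Carlo style robustness check for one fixed task-to-core deployment.
import random

def time_robustness(durations_ms, dependencies, core_of, runs=1000, spread=0.2):
    totals = []
    for _ in range(runs):
        # vary each task duration within the assumed range
        varied = {t: d * random.uniform(1.0 - spread, 1.0 + spread)
                  for t, d in durations_ms.items()}
        starts = start_times(varied, dependencies, core_of)
        totals.append(max(starts[t] + varied[t] for t in varied))
    # a narrow interval indicates a time-robust schedule
    return min(totals), max(totals)
```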
As a further option, a selection criterion may be specified, e.g. a weighted measure of all variability dimensions like time-optimization, time-robustness, needed number of cores etc. As input, a range of available cores, task durations and time-optimization ranges with minimums and maximums can be specified. Another input value is the weight, which expresses the importance of each input dimension and is needed to determine the final scheduling policy. For instance, the needed number of cores should influence the selection of the final scheduling policy by 25%, the time-optimization by 35% and the time-robustness by 40%. As a result, one deployment suggestion according to the specified trade-off criterion is provided.
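For illustration, such a weighted selection could be computed as sketched below with the example weights of 25 %, 35 % and 40 %; the candidate metrics, their units and the min-max normalisation are assumptions, and the robustness value is taken here to be the spread of the overall time (lower is better).

```python
# Weighted trade-off scoring over candidate schedules (illustrative values).
WEIGHTS = {"cores": 0.25, "time_ms": 0.35, "robustness_ms": 0.40}

def weighted_score(candidate, bounds):
    """Lower is better; every dimension is normalised to [0, 1] first."""
    score = 0.0
    for dim, weight in WEIGHTS.items():
        lo, hi = bounds[dim]
        score += weight * ((candidate[dim] - lo) / (hi - lo) if hi > lo else 0.0)
    return score

candidates = [
    {"cores": 8, "time_ms": 1850, "robustness_ms": 120},   # e.g. the Fig. 2 deployment
    {"cores": 6, "time_ms": 2100, "robustness_ms": 80},    # slower but more robust option
]
bounds = {dim: (min(c[dim] for c in candidates), max(c[dim] for c in candidates))
          for dim in WEIGHTS}
print(min(candidates, key=lambda c: weighted_score(c, bounds)))
```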
On the other hand, if the number or the range of available cores is not restricted, the result will be a best-possible time-deployment suggestion. Another option is to focus on the scheduling suggestion estimated as the best according to the results of the analysis of the repeatedly exploited optimization methods. The aim could be to retrieve suggestions about which tasks are worth being split into sub-tasks in order to further improve the overall operation time. For instance, when the rate of incoming data blocks cannot be met by the operation time of the selected algorithm, it is appropriate to split further tasks into subtasks. With the method according to the invention it is possible to gather information about which tasks are worth the effort of splitting them into subtasks.
Trade-off criteria allow certain aspects of the proposed scheduling solutions to be stressed, like a predictable (not necessarily fast) and time-robust execution with a limited number of cores versus the fastest operation time possible with as many cores as needed. The invention provides a powerful mechanism to assist software developers in making an informed and automatic decision on the right scheduling solution.

Claims

Patent Claims
1. A method for providing a schedule for a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plurality of parallel working cores (P1,...,P8), comprising the steps of creating a model of the algorithm (ALG), thereby identifying tasks (t1,...,t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (t1,...,t22); exploiting an optimization method taking account of a first optimization criterion, thereby assigning each identified task (t1,...,t22) according to its at least one characteristic to at least some of the plurality of cores (P1,...,P8) of the multi-core processor and determining a starting point in time for each of the tasks (t1,...,t22) of the algorithm (ALG); repeatedly exploiting the optimization method taking account of at least one second optimization criterion, thereby outputting the starting point in time for each of the tasks (t1,...,t22), the first and the at least one second optimization criterion; and analysing each schedule of operation of the algorithm (ALG) with regard to the starting point in time for each of the tasks (t1,...,t22), the first and the at least one second optimization criterion to determine the best combination of the first and the at least one second optimization criteria.
2. Method according to claim 1, wherein the first optimization criterion corresponds to a time-optimization of the algorithm (ALG).
3. Method according to claim 1 or 2, wherein the at least one second optimization criterion corresponds to a fixed number of cores (P1,...,P8), and/or a discrete range of cores (P1,...,P8), and/or a time-robustness of the schedule, and/or parallelization options of certain tasks (t1,...,t22).
4. Method according to claim 3, wherein only some of the second optimization criteria are considered during exploiting the optimization method repeatedly.
5. Method according to one of the preceding claims, wherein different second optimization criteria are considered during exploiting each run of the optimization method.
6. Method according to one of the preceding claims, wherein an optimization algorithm or a heuristic is used as an optimization method.
7. Method according to one of the preceding claims, wherein the first and/or the at least one second optimization criterion gets weighted with a respective weight parameter.
8. Method according to one of the preceding claims, wherein the first optimization criterion is specified as a range.
9. Method according to one of the preceding claims, wherein the at least one second optimization criterion is specified as a range.
10. Method according to one of the preceding claims, wherein the analysis of each schedule of operation of the algorithm (ALG) ends up with one or more suggestions of the first and the at least one second optimization criterion.
11. Computer program product directly loadable into the internal memory of a digital computer, comprising software code portions for performing the steps of one of the preceding claims when said product is run on a computer.
12. A device for providing a schedule for a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plurality of parallel working cores (P1,...,P8), comprising a first means for creating a model of the algorithm (ALG), thereby identifying tasks (t1,...,t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (t1,...,t22); a second means for exploiting an optimization method taking account of a first optimization criterion, thereby assigning each identified task (t1,...,t22) according to its at least one characteristic to at least some of the plurality of cores (P1,...,P8) of the multi-core processor and for determining a starting point in time for each of the tasks (t1,...,t22) of the algorithm (ALG); a third means for repeatedly exploiting the optimization method taking account of at least one second optimization criterion, thereby outputting the starting point in time for each of the tasks (t1,...,t22), the first and the at least one second optimization criterion; and a fourth means for analysing each schedule of operation of the algorithm (ALG) with regard to the starting point in time for each of the tasks (t1,...,t22), the first and the at least one second optimization criterion to determine the best combination of the first and the at least one second optimization criteria.
13. Device according to claim 12, comprising further means for executing the method steps according to claims 2 to 10.
PCT/EP2008/055907 2007-06-05 2008-05-14 Method and device for providing a schedule for a predictable operation of an algorithm on a multi-core processor WO2008148624A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07011069 2007-06-05
EP07011069.7 2007-06-05

Publications (1)

Publication Number Publication Date
WO2008148624A1 true WO2008148624A1 (en) 2008-12-11

Family

ID=39734909

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/055907 WO2008148624A1 (en) 2007-06-05 2008-05-14 Method and device for providing a schedule for a predictable operation of an algorithm on a multi-core processor

Country Status (1)

Country Link
WO (1) WO2008148624A1 (en)


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
COFFMAN E G ET AL: "AN APPLICATION OF BIN-PACKING TO MULTIPROCESSOR SCHEDULING", SIAM JOURNAL ON COMPUTING, SOCIETY FOR INDUSTRIAL AND APPLIED MATHEMATICS, US, 1 January 1979 (1979-01-01), pages 1 - 17, XP008077029, ISSN: 0097-5397 *
COLES J: "A User Interface for DAG Scheduling Algorithms", MANCHESTER UNIVERSITY, vol. -, no. -, 2 May 2007 (2007-05-02), pages 1 - 31, XP002495491 *
JUAN R PIMENTEL ED - JUAN R PIMENTEL: "An Incremental Approach to Task and Message Scheduling for AUTOSAR Based Distributed Automotive Applications", SOFTWARE ENGINEERING FOR AUTOMOTIVE SYSTEMS, 2007. ICSE WORKSHOPS SEAS '07. FOURTH INTERNATIONAL WORKSHOP ON, IEEE, PI, 1 May 2007 (2007-05-01), pages 1 - 1, XP031175845, ISBN: 978-0-7695-2968-4 *
RONNGREN S ET AL: "Static multiprocessor scheduling of periodic real-time tasks with precedence constraints and communication costs", SYSTEM SCIENCES, 1995. VOL. III,. PROCEEDINGS OF THE TWENTY-EIGHTH HAW AII INTERNATIONAL CONFERENCE ON WAILEA, HI, USA 3-6 JAN. 1995, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, 3 January 1995 (1995-01-03), pages 143 - 152, XP010128247, ISBN: 978-0-8186-6935-4 *
YU-KWONG KWOK ET AL: "Static scheduling algorithms for allocating directed task graphs to multiprocessors", ACM COMPUTING SURVEYS, ACM, NEW YORK, NY, US, US, vol. 31, no. 4, 1 December 1999 (1999-12-01), pages 406 - 471, XP002461554, ISSN: 0360-0300 *
ZHENG W: "Workflow Scheduling Simulation Demo (Screenshots)", MANCHESTER UNIVERSITY, vol. -, no. -, September 2006 (2006-09-01), pages 1 - 12, XP002495492, Retrieved from the Internet <URL:http://www.cs.man.ac.uk/~zhengw/demo/demo.html> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130312001A1 (en) * 2010-10-28 2013-11-21 Noriaki Suzuki Task allocation optimization system, task allocation optimization method, and non-transitory computer readable medium storing task allocation optimization program
US9384053B2 (en) * 2010-10-28 2016-07-05 Nec Corporation Task allocation optimization system, task allocation optimization method, and non-transitory computer readable medium storing task allocation optimization program
CN102193826A (en) * 2011-05-24 2011-09-21 哈尔滨工程大学 Method for high-efficiency task scheduling of heterogeneous multi-core processor
CN102193826B (en) * 2011-05-24 2012-12-19 哈尔滨工程大学 Method for high-efficiency task scheduling of heterogeneous multi-core processor
US20140344825A1 (en) * 2011-12-19 2014-11-20 Nec Corporation Task allocation optimizing system, task allocation optimizing method and task allocation optimizing program
US9535757B2 (en) * 2011-12-19 2017-01-03 Nec Corporation Task allocation optimizing system, task allocation optimizing method and task allocation optimizing program
WO2015004207A1 (en) * 2013-07-10 2015-01-15 Thales Method for optimising the parallel processing of data on a hardware platform
US10120717B2 (en) 2013-07-10 2018-11-06 Thales Method for optimizing the size of a data subset of a processing space for improved execution performance


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08759589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08759589

Country of ref document: EP

Kind code of ref document: A1