WO2022171262A1 - Scheduling tasks for execution by a computer based on a reinforcement learning model - Google Patents

Scheduling tasks for execution by a computer based on a reinforcement learning model

Info

Publication number
WO2022171262A1
Authority
WO
WIPO (PCT)
Prior art keywords
tasks
execution
computer
reinforcement learning
learning model
Application number
PCT/EP2021/052997
Other languages
French (fr)
Inventor
Hitham Ahmed Assem Aly SALAMA
Jose Mora
Peng Hu
MingXue Wang
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to CN202180093346.9A (CN116848508A)
Priority to PCT/EP2021/052997
Publication of WO2022171262A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • a second aspect of the present disclosure provides a computing system comprising a computer for execution of tasks and a task scheduler according to the first aspect of the disclosure configured for scheduling tasks for execution by the computer.
  • a third aspect of the present disclosure provides a method of scheduling tasks for execution by a computer, the method comprising: obtaining a first reinforcement learning model for optimising a task execution time and a second reinforcement learning model for optimising a utilisation of a computing resource of the computer for task execution; determining a measure of an execution time of one or more tasks executed by the computer; and scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
  • the determining a measure of an execution time of one or more of the tasks executed by the computer comprises determining an average execution time of two or more of the tasks.
  • the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the first reinforcement learning model if the determined measure of the execution time is greater than a threshold execution time.
  • the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the second reinforcement learning model if the determined measure of the execution time is less than a threshold execution time.
  • the method further comprises recurrently performing the determining a measure of an execution time of one or more of the tasks executed by the computer, and the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
  • the method further comprises scheduling additional tasks for execution by the computer using the second reinforcement learning model if the determined measure of an execution time is less than a further threshold execution time, wherein the further threshold execution time is less than the threshold execution time.
  • the method further comprises determining a number of tasks for execution for which execution is not complete, and training the first reinforcement learning model for optimising a task execution time using a reward function based on the determined number of tasks.
  • the method further comprises determining a utilisation of the computing resource of the computer, and training the second reinforcement learning model using a reward function based on the determined utilisation.
  • a fourth aspect of the present disclosure provides a computer program comprising instructions which, when executed by a computing device, cause the computing device to carry out the method according to the third aspect of the present disclosure.
  • a fifth aspect of the present disclosure provides a computer-readable data carrier having the computer program of the fourth aspect of the present disclosure stored thereon.
  • Figure 1 shows schematically an example of a computing system embodying aspects of the disclosure, the computing system comprising a computer for executing tasks, and a task scheduler for scheduling tasks for execution by the computer;
  • Figure 2 shows schematically hardware of the task scheduler previously identified with reference to Figure 1;
  • Figure 3 shows schematically virtual modules supported by the hardware of the task scheduler;
  • Figure 4 shows processes involved in a method of executing tasks using the computing system identified previously with reference to Figure 1, which includes a process of scheduling tasks for execution by the computer of the computing system and a process of training the task scheduler;
  • Figure 5 shows processes involved in the method of scheduling tasks for execution by the computer of the computing system, which includes a process of determining magnitudes of an available computational resource of the computer and of computational resource required for executing tasks;
  • Figure 6 shows a visualisation of the process of determining magnitudes of an available computational resource of the computer and of computational resource required for executing tasks;
  • Figure 7 shows processes involved in the method of training the task scheduler; and Figure 8 shows a visualisation of the method of training the task scheduler.
  • a computing system 101 embodying an aspect of the present disclosure comprises a plurality of client devices, for example, two client devices 102 and 103, a plurality of computers, for example, two computers 104 and 105, and a task scheduler 106.
  • the components 102 to 106 are in communication via a communication network, depicted illustratively at 107.
  • the network 107 is the internet.
  • Client devices 102 and 103 are computing devices configured for running application software to perform one or more functions.
  • each of the client devices 102 and 103 could comprise a personal computer, or a mobile computing device, configured to run office-related software, requiring execution of computational tasks.
  • each of the client devices is configured to output, via the network 107, computational tasks for execution by the computers 104 and 105.
  • Computers 104 and 105 each comprise hardware for executing computational tasks and for interfacing with the network 107.
  • each of the computers comprises a central processing unit for executing tasks, memory supporting a buffer for queuing received tasks and for storing information relating to programs run by the central processing unit and operational data generated by the programs during task execution, and a network interface, e.g., network card, for enabling the computer to communicate with the network 107.
  • the computers 104 and 105 are operated as a data center, remote from the client devices 102 and 103, for executing tasks output by the client devices in a client-server relationship.
  • Task scheduler 106 is a computer device, comprising hardware for reading tasks output by the client devices 102 and 103 and scheduling the tasks for execution by the computers 104 and 105.
  • Task scheduler 106 may, for example, be a separate computer device to computers 104 and 105.
  • task scheduler 106 may be provided as a server computer.
  • task scheduler 106 could be incorporated into one or more of client devices 102 and 103 or computers 104 and 105.
  • task scheduler 106 is configured for scheduling tasks using reinforcement learning models. Task scheduler 106 will be described in further detail with particular reference to Figures 2 and 3.
  • the task scheduler 106 comprises central processing unit 201, flash memory 202, random-access memory 203, network card 204, and system bus 205.
  • Central processing unit 201 is configured for executing processes relating to the scheduling of tasks for execution by the computers 104 and 105.
  • Flash memory 202 is configured for non-volatile storage of programs relating to the processes for scheduling tasks performed by the central processing unit 201.
  • Random-access memory 203 is configured as read/write memory for storage of operational data associated with the programs executed by the central processing unit 201, and for storing data relating to tasks output by the client devices 102 and 103.
  • Network interface 204, e.g. a network card, enables the task scheduler 106 to communicate with the client devices and the computers through the network 107.
  • the components 201 to 204 of the task scheduler are in communication via system bus 205.
  • the hardware of the task scheduler 106 is configured to support eight functions used for a method of scheduling tasks output by the client devices 102 and 103.
  • Thresholds identifier 301 is functional to identify service level parameters for execution of tasks by the computers 104 and 105.
  • thresholds identifier 301 is functional to identify threshold execution times, e.g. maximum execution times, for execution of a task by the computers 104 and 105.
  • the thresholds identifier 301 may be functional to read a service-level agreement, stored in the memory 202 of the task scheduler, that identifies threshold times and other parameters for task execution.
  • Task buffer 302 is functional to receive and store tasks output by the client devices 102 and 103, in preparation for scheduling of those tasks by the task scheduler.
  • the computing system 101 may be configured such that the task scheduler receives and stores all tasks output by the client devices 102 and 103, determines a schedule for executing the tasks, and subsequently forwards the tasks to the computers 104 and 105 for execution, along with the determined schedule defining the order of execution of the tasks.
  • tasks output by the client devices 102 and 103 could be sent directly to one or both of the computers 104 and 105, and stored in memory of the computers, and the task scheduler could be configured to read the tasks from the memory of the computer(s) 104 and 105 for the purpose of generating a schedule for the tasks.
  • task buffer 302 may be omitted from task scheduler 106, and substituted by functionality to read tasks from the memory of the computer(s) 104 and 105.
  • Resource monitor 303 is functional to determine a computational resource demanded by each queued task for execution of the task. Resource monitor 303 is further functional to determine a utilisation of the computational resource, e.g., processor capacity, of the computers 104 and 105 for executing tasks. For example, resource monitor 303 may receive reports from computers 104 and 105, e.g., via network 107, identifying unutilised computational resource of the respective computer, or could alternatively be configured to dynamically test for unutilised resource of the computers 104 and 105.
  • Task instructor 304 is functional to instruct the computers 104 and 105 to execute tasks in accordance with a schedule generated by the task scheduler.
  • task instructor 304 may communicate, e.g., via the network 107, execution commands to computers 104 and 105.
  • Time monitor 305 is functional to determine the time taken by the computers 104 and 105 to execute scheduled tasks.
  • the time monitor 305 may receive signals, e.g., via the network 107, from the computers 104 and 105 at start and end times of instances of task execution by the respective computer.
  • the computers 104 and 105 could self-report times taken to execute tasks, and such reports could be communicated to time monitor 305 via network 107.
  • the time monitor 305 is functional to compute an average of execution times for plural tasks.
  • Queue monitor 306 is functional to determine a number of tasks output by the client devices 102 and 103 for which execution by the computers 104 and 105 is not complete.
  • the queue monitor may be configured to include in the count both tasks awaiting execution and partially-executed tasks.
  • the queue monitor could, for example, receive reports from the computers 104 and 105 identifying a number of incomplete tasks.
  • Schedule generator 307 is functional to determine a schedule for tasks output by the client devices 102 and 103 to be executed by the computers 104 and 105, and is further functional to communicate the schedule to the computers 104 and 105, e.g., via network 107. As will be described further with reference to Figure 5 and 8, schedule generator 307 deploys reinforcement learning models to schedule tasks.
  • Model trainer 308 is functional to train the reinforcement learning models of the schedule generator 307 to schedule tasks for execution by the computers 104 and 105.
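  • As an illustrative summary of this functional decomposition (the class and method names below are assumptions made for the sketch, not identifiers used in the disclosure), the eight functions 301 to 308 could be grouped on a single scheduler object:

```python
class TaskSchedulerModules:
    """Sketch of the eight functions 301-308 described above (illustrative only)."""

    def identify_thresholds(self) -> float: ...                  # 301: thresholds identifier
    def buffer_task(self, task) -> None: ...                     # 302: task buffer
    def monitor_resources(self) -> dict: ...                     # 303: resource monitor
    def instruct_execution(self, tasks, schedule) -> None: ...   # 304: task instructor
    def monitor_execution_times(self) -> list: ...               # 305: time monitor
    def count_incomplete_tasks(self) -> int: ...                 # 306: queue monitor
    def generate_schedule(self, tasks) -> list: ...              # 307: schedule generator
    def train_models(self) -> None: ...                          # 308: model trainer
```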
  • a method of operating the computing system 101 to execute tasks comprises six stages.
  • the client devices 102 and 103 output, via the communication network 107, computational tasks for execution by the computers 104 and 105.
  • computational tasks could, for example, comprise computational operations involved in the performance of application software running on the client devices 102 and 103.
  • application software running on client devices 102 and 103 may involve execution of neural network models, and the neural network models may be executed by the computers 104 and 105, whereby the results of execution of the neural network models may be communicated by the computers 104 and 105 to the client devices 102 and 103.
  • the outputting of tasks for execution could, for example, be a process controlled by application software running on the client devices 102 and 103, such that the method is initiated automatically by the application software.
  • the computing system 101 may be configured such that the client devices 102 and 103 output tasks for execution to the task scheduler 106.
  • the task buffer 302 receives and stores the tasks, e.g., in the random-access memory 203.
  • the task scheduler 106 determines a schedule for execution of tasks using one or more reinforcement learning models.
  • the schedule defines an order of execution of tasks by the computers 104 and 105, and defines a split of tasks, or portions of tasks, for execution by the computer 104, and tasks or portions of tasks for execution by the computer 105.
  • the task scheduler 106 sends details of the tasks to the computers 104 and 105, via the communication network 107.
  • stage 403 could involve the task scheduler 106 communicating data defining the one or more tasks to be completed, e.g., the computational operation(s) to be performed and inputs of the computational operation(s), to a respective one, or both, of the computers 104 and 105.
  • details of the tasks are erased from the random-access memory 203 of the task scheduler 106.
  • the task scheduler 106 further sends the schedule generated at stage 402 to the computers 104 and 105. Details of the tasks, and the schedule, may subsequently be stored in internal memory of one or both of the computers 104 and 105.
  • the computers 104 and 105 execute the tasks received at stage 403 in accordance with the schedule also received at stage 403.
  • the computers 104 and 105 return computational results of the tasks, e.g., predictions of neural network model tasks, to one or both of the client devices 102 and 103.
  • the results returned may thereby be utilised by application software running on the client devices 102 and 103.
  • the reinforcement learning models employed by the task scheduler 106 for scheduling tasks are trained.
  • stage 402 for scheduling tasks for execution comprises ten stages.
  • a threshold execution time (TT), e.g. a maximum permissible time, for execution of tasks by the computers 104 and 105 is determined by the thresholds identifier 301 of the task scheduler 106.
  • the thresholds identifier 301 may read a static threshold execution time stored in memory 202 of the task scheduler 106.
  • a service-level agreement from a developer of the computing system 101 includes the threshold execution time and is stored in the memory 202 of the task scheduler 106.
  • the thresholds identifier 301 may be configured to dynamically determine a threshold execution time in accordance with predefined rules.
  • the predefined rules could define threshold execution times as a function of the particular client device by which the tasks are output, e.g., as a function of the client to be serviced, or as a function of a time of day.
  • the task buffer 302 of the task scheduler 106 receives tasks output by the client devices 102 and 103, and stores the received tasks in the random-access memory 203 of the task scheduler.
  • the resource monitor 303 of the task scheduler 106 determines: (a) a magnitude of computational resources, specifically, processor and memory capacity, of each of the computers 104 and 105; and (b) a magnitude of computational resources, specifically again, processor and memory capacity, required for execution of each of the tasks received at stage 502.
  • the task instructor 304 of the task scheduler 106 initially sends one or more tasks received at stage 502 to one or both of the computers 104 and 105 for execution.
  • this stage involves the task instructor sending a small proportion of the tasks, for example, 10% of the tasks, to one or both of the computers 104 and 105, in the order in which they were received by the task buffer from the client devices 102 and 103.
  • the primary purpose of the task instructor 304 sending these initial tasks to the computers 104 and 105 is to allow analysis of the execution times of tasks by the computers, which will inform the later stages of the task scheduling process.
  • the time monitor 305 of the task scheduler 106 measures the time durations (T) for execution of each of the tasks sent for execution by the computers at stage 504.
  • stage 505 could involve the time monitor 305 receiving signals, via the network 107, marking start and end times of task execution instances by the computers 104 and 105, whereby the time monitor 305 may determine the execution time of the task(s).
  • stage 505 could involve the computers 104 and 105 sending reports to time monitor 305 reporting times taken to execute tasks.
  • the time monitor 305 computes an average, e.g., an arithmetic mean, of the execution times of the tasks (TAV) determined at stage 505.
  • the time monitor 305 determines whether the average execution time TAV determined at stage 506 is equal to or less than the threshold execution time TT determined at stage 501. If the determination at stage 507 is answered in the negative, indicating that the average time for execution of the tasks is greater than the threshold execution time, or in other words, unacceptably long, the task scheduler proceeds to stage 508. In the alternative, if the determination at stage 507 is answered in the affirmative, indicating that the average time for execution of the tasks is indeed equal to or less than the threshold execution time, or in other words, acceptably short, the task scheduler proceeds to stage 509.
  • the schedule generator 307 proceeds to schedule other of the tasks received at stage 502, i.e., tasks which were not previously sent for execution at stage 504, using a first reinforcement learning model, configured for optimising, e.g. minimising, task execution times.
  • the first reinforcement learning model could minimise task execution times by scheduling tasks for execution by distributing task workload between the computers 104 and 105. This distributed workload may be expected to reduce the instantaneous demand on computational resources on each of the computers 104 and 105, which may thereby be expected to reduce the time taken by each computer to execute tasks.
  • the schedule generator 307 proceeds to schedule tasks using a second reinforcement learning model, configured for optimising, e.g. maximising, a utilisation of computational resource of one or both of the computers 104 and 105 for executing tasks.
  • a second reinforcement learning model configured for optimising, e.g. maximising, a utilisation of computational resource of one or both of the computers 104 and 105 for executing tasks.
  • the second reinforcement learning model could maximise utilisation of computing resources of one of the computers 104 and 105 by scheduling a relatively large proportion of the tasks, or even all of the tasks, received at stage 502 for execution by that one computer. This consolidation of tasks onto one heavily utilised computer may, in certain circumstances, be expected to reduce the overall power consumption of the computers 104 and 105.
  • the schedule generator 307 generates one or more schedules defining parameters for execution of the tasks received at stage 502, based on the output of stage 508 or stage 509.
  • the schedule generated at stage 510 could define which of computers 104 and 105 should execute each task, along with an order of execution of tasks by the computers 104 and 105.
  • the schedule(s) generated at stage 510 may then be sent to the computers 104 and 105, at stage 403, optionally accompanied by details of the tasks to be executed, as described previously with reference to Figure 4.
  • stages 505 to 510 of the task scheduling process may be repeated following execution of previously scheduled tasks by the computers 104 and 105 at stage 404.
  • the time monitor 305 may monitor execution times of the tasks scheduled in the above described first iteration.
  • the task scheduler 106 may schedule further tasks for execution at stage 508 or stage 509.
  • the task scheduler 106 is configured to schedule tasks for execution in small batches, for example, in batches of fewer than one hundred tasks, prior to sending the batch of tasks to the computers 104 and 105 at stage 403. In each iteration, therefore, the time monitor 305 monitors the execution times of tasks sent in the immediately preceding iteration, and the task scheduler schedules further tasks based on the monitored times.
  • the task scheduler could be configured to schedule tasks in significantly smaller or greater batch numbers, for example, in batch numbers of fewer than ten tasks, or greater than a thousand tasks.
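  • The batched operation of stages 504 to 510 could be sketched roughly as follows; this is a simplified illustration in which the helper callables (rl1_schedule, rl2_schedule, execute) are assumptions standing in for the modules described above, not the disclosed implementation:

```python
def schedule_batch(tasks, threshold_time, rl1_schedule, rl2_schedule, execute,
                   initial_fraction=0.1):
    """Illustrative sketch of stages 504-510 for one batch of received tasks.

    rl1_schedule / rl2_schedule order tasks using the time-optimising and
    utilisation-optimising models; execute(tasks) runs tasks and returns
    their measured execution times.
    """
    # Stage 504: send a small initial portion of the batch in arrival order.
    n_initial = max(1, int(len(tasks) * initial_fraction))
    initial, remaining = tasks[:n_initial], tasks[n_initial:]

    # Stages 505-506: measure execution times and compute their average (TAV).
    times = execute(initial)
    average_time = sum(times) / len(times)

    # Stage 507: compare TAV to the threshold TT and choose a model.
    if average_time > threshold_time:
        ordered = rl1_schedule(remaining)   # stage 508: optimise execution time
    else:
        ordered = rl2_schedule(remaining)   # stage 509: optimise resource utilisation

    # Stage 510: the resulting schedule is then sent to the computers (stage 403).
    return ordered
```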
  • stage 503 for determining a utilisation of processor and memory computational resources of each of the computers 104 and 105, and for determining a processor and memory capacity for execution of each task received by task buffer 302 at stage 502, allows a determination to be made of the ability of the computers 104 and 105 to execute each of the queued tasks.
  • the task scheduler 106 may determine that execution of the example task queued at job slot 1 involves two units of processor capacity of the computers 104 and 105 for two units of time, and one unit of memory of the computers 104 and 105 for two units of time, and that execution of the example task queued at job slot 2 involves one unit of processor capacity of the computers 104 and 105 for one unit of time, and two units of memory of the computers 104 and 105 for one unit of time, et cetera.
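  • Purely as an illustration of how such resource demands might be represented (the data layout and names are assumptions, not part of the disclosure), the two example job slots above could be modelled as follows:

```python
from dataclasses import dataclass

@dataclass
class TaskDemand:
    """Resource units a queued task occupies per unit of time it runs."""
    processor_units: int
    memory_units: int
    time_units: int

@dataclass
class ComputerCapacity:
    """Resource units currently free on a computer."""
    processor_units: int
    memory_units: int

def fits_now(task: TaskDemand, computer: ComputerCapacity) -> bool:
    """Whether the computer could start executing the task immediately."""
    return (task.processor_units <= computer.processor_units
            and task.memory_units <= computer.memory_units)

# The example tasks described above:
job_slot_1 = TaskDemand(processor_units=2, memory_units=1, time_units=2)
job_slot_2 = TaskDemand(processor_units=1, memory_units=2, time_units=1)
```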
  • stage 406 for training the reinforcement learning models RL1, RL2 deployed by the task scheduler 106 at stages 508 and 509 respectively comprises four stages.
  • the queue monitor 306 determines a number of tasks output by the client devices 102 and 103 and received by the task scheduler 106 for which execution by the computers 104 and 105 is not complete. The determination at this stage includes both tasks received by the task scheduler which are awaiting scheduling and tasks scheduled for execution by the computers 104 and 105 which are not yet fully complete.
  • the queue monitor 306 may query the task buffer 302 of the task scheduler 106 to determine a number of tasks awaiting scheduling, i.e. tasks which have not yet been sent to the computers 104 and 105 for execution, and may query the computers 104 and 105 to determine a number of tasks scheduled for execution, i.e. tasks which have already been sent to the computers 104 and 105, and a number of tasks currently undergoing execution by the computers 104 and 105.
  • the resource monitor 303 determines a degree of utilisation of a computational resource, such as processor capacity, of the computers 104 and 105 for executing tasks, i.e. a proportion of a computational resource of the respective computer currently utilised for executing tasks. For example, this stage may involve the task scheduler querying the computers 104 and 105 to establish a proportion of the processor capacity of the respective computer currently utilised for executing tasks.
  • the model trainer 308 generates first and second reward functions for training a respective one of the first and second reinforcement learning models, RL1, RL2 respectively.
  • the first reinforcement learning model, RL1, is trained using a reward function that is defined as a function of the inverse of the number of tasks for which execution is not complete, as determined at stage 701.
  • a reward function defined with reference to the number of tasks for which execution is not complete is considered advantageous for the reason that the function takes into account not only, indirectly, the rate at which the tasks are executed by the computers 104 and 105, i.e. the average execution time, but also the workload of tasks for execution output by the client devices 102 and 103.
  • a reward function for training the second reinforcement learning model, RL2, i.e. for training model RL2 to maximise a utilisation of computational resource, e.g. processor capacity, of the computers 104 and 105, is defined as a function of the utilisation of computational resource, such as processor capacity, of the computers 104 and 105, as determined at stage 702.
  • the parameters of the reinforcement learning models employed by the task scheduler for scheduling tasks for execution by the computers 104 and 105 are updated using a respective one of the two reward functions generated at stage 703.
  • the task scheduler may thereby be trained to schedule tasks to optimally balance task execution time and resource utilisation.
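  • As a concrete but purely illustrative picture of stages 703 and 704, a policy-gradient style update could scale the parameter change by the reward generated for the active model; the disclosure does not fix a particular update rule, so the REINFORCE-like form below is an assumption:

```python
import numpy as np

def training_step(params, grad_log_prob, num_incomplete, utilisation,
                  active_model, learning_rate=0.01):
    """Update model parameters from one scheduling decision (stages 701-704).

    params / grad_log_prob: parameter vector and gradient of the log-probability
    of the scheduling actions taken (same shape). num_incomplete and utilisation
    are the quantities measured at stages 701 and 702.
    """
    # Stage 703: build the reward for whichever model produced the schedule.
    if active_model == "RL1":
        reward = 1.0 / (1.0 + num_incomplete)   # rewards fewer incomplete tasks
    else:
        reward = utilisation                    # rewards higher resource utilisation

    # Stage 704: move the parameters in the direction that makes the rewarded
    # scheduling actions more likely (REINFORCE-style step, assumed here).
    return params + learning_rate * reward * grad_log_prob

# Example: training_step(np.zeros(3), np.array([0.1, -0.2, 0.3]), 5, 0.8, "RL2")
```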
  • Because the method of scheduling tasks at stage 402 gives priority to the first reinforcement learning model configured for minimising task execution times, task execution times may be maintained close to a desired level, for example, close to a minimum time.

Abstract

A task scheduler for scheduling tasks for execution by a computer is disclosed. The task scheduler is configured to obtain a first reinforcement learning model for optimising a task execution time and a second reinforcement learning model for optimising a utilisation of a computing resource of the computer for task execution; determine a measure of an execution time of one or more tasks executed by the computer; and schedule further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.

Description

SCHEDULING TASKS FOR EXECUTION BY A COMPUTER BASED ON A REINFORCEMENT LEARNING MODEL
FIELD OF THE DISCLOSURE
The present disclosure relates to scheduling tasks for execution by a computer. Aspects of the disclosure relate to a task scheduler for scheduling tasks for execution by a computer, and to a method of scheduling tasks for execution by a computer.
BACKGROUND OF THE DISCLOSURE
A major cause of energy inefficiency in a multi-computing device environment, e.g. in a data centre comprising a pool of servers, is the electrical power wasted when computing resources are underutilised for executing tasks. For example, it is often the case that even in very-low-load/idle conditions, the power consumption of a processor of a computing device, such as a server, may still exceed 50% of its peak power consumption. Consolidation of task workloads on fewer, more highly utilised, computing resources thus has the potential to desirably reduce wasted power, and so reduce the overall power consumption of a data centre. However, whilst consolidation of workload within a data centre may often reduce wasted and overall power consumption, overloading of computing resources risks undesirably delaying serving of computing requests, and otherwise increasing the time taken for executing tasks. A particular concern for a data centre operator is the risk of task execution times exceeding a maximum task execution time agreed in a service-level agreement with a data centre user. Task scheduling in a multi-computing device environment, such as a data centre, thus requires careful balancing of the often competing demands of maximising resource utilisation whilst maintaining acceptable task execution times.
SUMMARY OF THE DISCLOSURE
An objective of the present disclosure is to provide task scheduling that optimises a utilisation of computing resources without unacceptably delaying the time taken for executing tasks by the computing resources.
The foregoing and other objectives are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the Figures. A first aspect of the present disclosure provides a task scheduler for scheduling tasks for execution by a computer, the task scheduler being configured to: obtain a first reinforcement learning model for optimising a task execution time and a second reinforcement learning model for optimising a utilisation of a computing resource of the computer for task execution; determine a measure of an execution time of one or more tasks executed by the computer; and schedule further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
The task scheduler may schedule tasks using the second reinforcement learning model to optimise a utilisation of computing resource of the computer, thereby potentially reducing an overall power consumption of the computer for execution of a given number of tasks.
However, because the task scheduler monitors an execution time of tasks, and schedules tasks selectively using the second reinforcement learning model or the first reinforcement learning model based on the execution time of tasks, the task scheduler may ensure that task execution times are not unacceptably delayed.
In the context of the disclosure, ‘optimising’ or similar may be understood to mean generally ‘reaching a target’ value, i.e., a target time or a target utilisation. As described, in some circumstances, ‘optimising’ may specifically mean reaching a ‘minimum’ time or a ‘maximum’ utilisation.
For example, ‘optimising’ a computer resource utilisation may often mean ‘maximising’ a computer resource utilisation. However, a maximal resource utilisation will not always necessarily result in optimised energy-efficiency of the computing resource. This is because the relationship between resource utilisation and energy consumption per task executed is not always linear. For example, the energy consumption of a computer processor will often increase more than linearly with respect to utilisation, such that an optimal overall energy efficiency of a group of computing devices may, in some circumstances, be achieved by spreading load evenly between the computing devices, rather than maximally utilising a subset of the computing devices and minimally utilising other of the computing devices. Thus, in an example alternative scenario, optimising a computer resource utilisation could instead mean aiming for a level of resource utilisation that is between a maximum and a minimum resource utilisation. For example, in one scenario, it may be known that a most energy-efficient utilisation of computing resource of the computer is a utilisation of 75%, and in this circumstance an objective of the second reinforcement learning model could be to achieve a 75% resource utilisation.
Similarly, ‘optimising’ a task execution time may mean ‘minimising’ a task execution time. However, in some scenarios optimal performance of a computing system may instead require that a task execution time is maintained at a level above a possible minimum execution time of the computer. For example, where results of the executed tasks are returned by the computer to a client device via a low-bandwidth communication network, it may be desired that the execution time of the tasks is matched to the transmission rate of the network, to thereby avoid the need for memory-intensive queueing of results transmission.
The task scheduler could be further configured to schedule one or more tasks for execution by the computer, or in other words, instruct the computer to execute one or more tasks, such that the step of determining a measure of an execution time of one or more tasks executed by the computer comprises determining a measure of an execution time by the computer of the one or more tasks scheduled for execution/instructed by the task scheduler. In a simpler alternative, the step of determining a measure of an execution time of one or more tasks executed by the computer could involve the task scheduler determining a measure of an execution time of task(s) scheduled for execution by an external task scheduler.
The determining a measure of an execution time of one or more tasks executed by the computer could, for example, comprise determining an execution time of a single task executed by the computer. In this instance, the step of scheduling further tasks for execution by the computer could be based on the execution time of the single task. For example, in response to the execution time of the single task exceeding a threshold time value, the task scheduler could schedule further tasks using the first reinforcement learning model. In other words, in a simple embodiment, the decision on whether to schedule tasks for execution using the first reinforcement learning model or the second reinforcement learning model could be taken by the task scheduler based on the execution time of only a single task. This mode of operation may be justified by the assumption that, where the execution time of the single task is acceptable, execution times of other of the tasks will likely be similarly acceptable. This mode of operation may advantageously be relatively computationally simple. It will be appreciated however that this approach, whilst computationally simple, is susceptible to the risk of the single selected task having an execution time that is unrepresentative, e.g. significantly shorter or longer, than execution times of other of the tasks, which may thereby result in improper scheduling of the tasks.
In an implementation, the determining a measure of an execution time of one or more of the tasks executed by the computer comprises determining an average execution time of two or more of the tasks. An average execution time of a plurality of tasks executed by the computer may be more representative of the individual execution times of all of the tasks than any single task execution time. In particular, considering an average execution time may reduce the risk of incorrect operation of the scheduler resulting from sampling of a task having an ‘outlier’ execution time, i.e. an execution time that is significantly greater or lesser than execution times of the other tasks. The reliability of the task scheduler may thereby advantageously be improved.
The ‘two or more’ tasks could be a subset of the total number of tasks executed by the computer. Determining execution times of only a subset of the tasks, to thereby determine an average of that subset of execution times, may be computationally less expensive than determining execution times of all of the tasks. However, for reasons of improved reliability, even at the cost of additional computational expense, it may be preferred that the method comprises determining execution times, and an average execution time, of all of the tasks.
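As a small illustration of this trade-off (a sketch under assumed names; the sampling strategy is not specified by the disclosure), the average could be computed over either all measured execution times or a random subset of them:

```python
import random

def average_execution_time(execution_times, sample_size=None):
    """Mean execution time, optionally computed over a random subset of tasks."""
    if sample_size is not None and sample_size < len(execution_times):
        # Cheaper but potentially less representative of the full workload.
        execution_times = random.sample(execution_times, sample_size)
    return sum(execution_times) / len(execution_times)
```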
In an implementation, the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the first reinforcement learning model if the determined measure of an execution time is greater than a threshold execution time.
In other words, the task scheduler may compare the measure of execution time(s) of the task(s) to a threshold execution time value, and proceed to schedule further tasks for execution using the first reinforcement learning model if the measure exceeds the threshold. The threshold execution time may thus be used to define a target, or alternatively a maximum allowable, task execution time. For example, the threshold execution time could be a predefined value set by a developer of the task scheduler, which may reflect a maximum permitted time set by a service-level agreement between an operator of the computer and a user of the computer. Because the task scheduler defaults to scheduling tasks using the first reinforcement learning model where the measure of the execution time(s) is unacceptably high, it may be expected that the (average) execution time of tasks will quickly converge to the ‘optimum’ time.
In an implementation, the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the second reinforcement learning model if the determined measure of an execution time is less than a threshold execution time.
In other words, the task scheduler may compare the measure of execution time of the task(s) to a threshold execution time value, and proceed to schedule further tasks for execution using the second reinforcement learning model if the measure is less than the threshold. Thus, where the threshold execution time defines a target execution time, or a maximum allowable execution time, an execution time of tasks less than the threshold indicates that tasks are being executed more quickly than required. Accordingly, it may be understood that the system can now afford to pursue the further objective of optimising resource utilisation.
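A minimal sketch of this selection rule, combining the two implementations above (the function and model names are illustrative assumptions, not taken from the disclosure), is:

```python
def select_model(average_execution_time, threshold_execution_time):
    """Return which reinforcement learning model should schedule the next tasks."""
    if average_execution_time > threshold_execution_time:
        return "RL1"  # execution times too long: optimise task execution time
    return "RL2"      # execution times acceptable: optimise resource utilisation
```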
Because the task scheduler reaches its decision on which of the reinforcement learning models to employ for scheduling tasks based primarily on the execution time of tasks, the risk of incorrect operation of the system resulting in unacceptable task execution times is reduced. In other words, the system operates to prioritise achieving of a desired task execution time, and only if that desired execution time is achieved does the system pursue the further objective of optimising resource utilisation.
In an implementation, the task scheduler is further configured to recurrently perform the determining a measure of an execution time of one or more tasks executed by the computer, and the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time. In other words, the method of the preceding statements may be repeated continuously during operation of the task scheduler. Repeating the procedures ensures that the system pursues the objective of improved resource utilisation, whilst maintaining task execution times below the threshold time, at least in the medium-term.
In an implementation, the task scheduler is further configured to schedule additional tasks for execution by the computer using the second reinforcement learning model if the determined measure of the execution time is less than a further threshold execution time, wherein the further threshold execution time is less than the threshold execution time.
In other words, following scheduling of tasks using the first (time optimisation) reinforcement learning model, the task scheduler may compare the measure of the execution time, e.g. the average task execution time, to a further, relatively lower, threshold time value, and only change operation to scheduling tasks using the second (utilisation optimisation) reinforcement learning model where the execution time measure is less than the lower threshold. This has the effect that, in operation, the task scheduler is forced to ‘overshoot’ the threshold execution time, thereby reducing the (average) task execution time to significantly beyond, rather than just reaching, the threshold time before switching to scheduling tasks with the objective of resource utilisation. This mode of operation may advantageously reduce the occurrence of ‘flip-flopping’ of the task scheduler when the task execution time is very close to the threshold time, which could undesirably impair the performance of the scheduler, and/or increase the risk of the task execution time exceeding the threshold. The operation of the task scheduler may thereby be further improved.
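A minimal sketch of such a two-threshold (hysteresis) rule is given below, assuming the scheduler remembers which model it last used; the variable names and the behaviour between the two thresholds are illustrative assumptions:

```python
def select_model_with_hysteresis(avg_time, threshold, further_threshold,
                                 current_model, first_model, second_model):
    """Two-threshold selection: only switch to the utilisation-optimising model
    once the measured execution time has dropped well below the main threshold,
    to avoid flip-flopping when the execution time hovers near the threshold."""
    assert further_threshold < threshold
    if avg_time > threshold:
        return first_model       # too slow: optimise execution time
    if avg_time < further_threshold:
        return second_model      # comfortably fast: optimise utilisation
    return current_model         # between the thresholds: keep the current model
```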
In an implementation, the task scheduler is further configured to determine a number of tasks for execution for which execution is not complete, and train the first reinforcement learning model for optimising a task execution time using a reward function based on the determined number of tasks.
In other words, the task scheduler may train the first reinforcement learning model for optimising a task execution time using a function of the number of tasks awaiting execution by the computer. The number of incomplete tasks is indirectly related to the task execution time of the computer, inasmuch as a slower execution time may be expected to increase the number of incomplete tasks, and conversely a quicker execution time may be expected to reduce the number of incomplete tasks. Thus, using such a function, the reinforcement learning model may be trained to optimise a task execution time. Moreover, training the first reinforcement learning model using a function of the number of incomplete tasks may provide more useful training than simply training using the monitored task execution time(s), because the number of incomplete tasks is also a function of the magnitude of the workload addressed by the task scheduler, i.e. the number of tasks requiring scheduling, thus making the task scheduler responsive to the workload condition in addition to the task execution time.
For example, the reward function could be a function of the inverse of the number of incomplete jobs, such that the system is rewarded by a lower number of incomplete jobs. Such a reward function may be desirable where the objective of the first reinforcement learning model is to minimise task execution times.
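One hedged illustration of such a reward function follows; the use of 1.0 as the numerator and the guard against an empty queue are assumptions made only to keep the sketch well defined:

```python
def reward_time_optimisation(num_incomplete_tasks):
    """Reward for the first model: higher when fewer tasks remain incomplete.

    Based on the inverse of the number of incomplete tasks, so that clearing
    the backlog quickly is rewarded and a growing backlog is penalised.
    """
    return 1.0 / max(num_incomplete_tasks, 1)  # guard against division by zero
```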
In an implementation, the task scheduler is further configured to determine a utilisation of a computing resource of the computer, and train the second reinforcement learning model using a reward function based on the determined utilisation.
In other words, the task scheduler may train the second reinforcement learning model for optimising a resource utilisation using a function of the utilisation of a computing resource of the computer. For example, the scheduler could determine the utilisation of processor capacity of the computer, and train the second reinforcement learning model based on that determination. This may reliably reward the model for approaching or reaching a desired utilisation. Determining a utilisation of a computing resource of the computer could, for example, mean measuring utilised resource, or could alternatively comprise measuring unutilised resource.
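An analogous, purely illustrative sketch of a utilisation-based reward is given below; whether utilised or unutilised resource is measured, and how it is normalised, are assumptions:

```python
def reward_utilisation(used_cpu_capacity, total_cpu_capacity):
    """Reward for the second model: higher when a larger fraction of the
    computer's processor capacity is actually used for executing tasks."""
    if total_cpu_capacity <= 0:
        return 0.0
    return used_cpu_capacity / total_cpu_capacity  # fraction in [0, 1]
```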
A second aspect of the present disclosure provides a computing system comprising a computer for execution of tasks and a task scheduler according to the first aspect of the disclosure configured for scheduling tasks for execution by the computer.
A third aspect of the present disclosure provides a method of scheduling tasks for execution by a computer, the method comprising: obtaining a first reinforcement learning model for optimising a task execution time and a second reinforcement learning model for optimising a utilisation of a computing resource of the computer for task execution; determining a measure of an execution time of one or more tasks executed by the computer; and scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
In an implementation, the determining a measure of an execution time of one or more of the tasks executed by the computer comprises determining an average execution time of two or more of the tasks.
In an implementation, the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the first reinforcement learning model if the determined measure of the execution time is greater than a threshold execution time.
In an implementation, the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the second reinforcement learning model if the determined measure of the execution time is less than a threshold execution time.
In an implementation, the method further comprises recurrently performing the determining a measure of an execution time of one or more of the tasks executed by the computer, and the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
In an implementation, the method further comprises scheduling additional tasks for execution by the computer using the second reinforcement learning model if the determined measure of an execution time is less than a further threshold execution time, wherein the further threshold execution time is less than the threshold execution time.
In an implementation, the method further comprises determining a number of tasks for execution for which execution is not complete, and training the first reinforcement learning model for optimising a task execution time using a reward function based on the determined number of tasks.
In an implementation, the method further comprises determining a utilisation of the computing resource of the computer, and training the second reinforcement learning model using a reward function based on the determined utilisation.
A fourth aspect of the present disclosure provides a computer program comprising instructions, which, when executed by a computing device, cause the computing device to carry out the method according to the third aspect of the present disclosure.
A fifth aspect of the present disclosure provides a computer-readable data carrier having the computer program of the fourth aspect of the present disclosure stored thereon.
These and other aspects of the invention will be apparent from the embodiment(s) described below.
BRIEF DESCRIPTION OF THE DRAWINGS
In order that the present invention may be more readily understood, embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows schematically an example of a computing system embodying aspects of the disclosure, the computing system comprising a computer for executing tasks, and a task scheduler for scheduling tasks for execution by the computer;
Figure 2 shows schematically hardware of the task scheduler previously identified with reference to Figure 1;
Figure 3 shows schematically virtual modules supported by the hardware of the task scheduler;
Figure 4 shows processes involved in a method of executing tasks using the computing system identified previously with reference to Figure 1, which includes a process of scheduling tasks for execution by the computer of the computing system and a process of training the task scheduler;
Figure 5 shows processes involved in the method of scheduling tasks for execution by the computer of the computing system, which includes a process of determining magnitudes of an available computational resource of the computer and of computational resource required for executing tasks;
Figure 6 shows a visualisation of the process of determining magnitudes of an available computational resource of the computer and of computational resource required for executing tasks;
Figure 7 shows processes involved in the method of training the task scheduler; and
Figure 8 shows a visualisation of the method of training the task scheduler.
DETAILED DESCRIPTION OF THE DISCLOSURE
Referring firstly to Figure 1, a computing system 101 embodying an aspect of the present disclosure comprises a plurality of client devices, for example, two client devices 102 and 103, a plurality of computers, for example, two computers 104 and 105, and a task scheduler 106. The components 102 to 106 are in communication via a communication network, depicted illustratively at 107. For example, the network 107 is the internet.
Client devices 102 and 103 are computing devices configured for running application software to perform one or more functions. For example, each of the client devices 102 and 103 could comprise a personal computer, or a mobile computing device, configured to run office-related software, requiring execution of computational tasks. In order to reduce the consumption by the application software of internal computational resource, such as processor capacity, of the respective client device 102 and 103, each of the client devices is configured to output, via the network 107, computational tasks for execution by the computers 104 and 105.
Computers 104 and 105 each comprise hardware for executing computational tasks and for interfacing with the network 107. For example, each of the computers comprises a central processing unit for executing tasks, memory supporting a buffer for queuing received tasks and for storing information relating to programs run by the central processing unit and operational data generated by the programs during task execution, and a network interface, e.g., network card, for enabling the computer to communicate with the network 107. For example, the computers 104 and 105 are operated as a data centre, remote from the client devices 102 and 103, for executing tasks output by the client devices in a client-server relationship.
Task scheduler 106 is a computer device, comprising hardware for reading tasks output by the client devices 102 and 103 and scheduling the tasks for execution by the computers 104 and 105. Task scheduler 106 may, for example, be a separate computer device to computers 104 and 105. For example, task scheduler 106 may be provided as a server computer. In other examples, task scheduler 106 could be incorporated into one or more of client devices 102 and 103 or computers 104 and 105. As will be described, task scheduler 106 is configured for scheduling tasks using reinforcement learning models. Task scheduler 106 will be described in further detail with particular reference to Figures 2 and 3.
Referring next to Figure 2, the task scheduler 106 comprises central processing unit 201, memory 202, random access memory 203, network card 204, and system bus 205.
Central processing unit 201 is configured for executing processes relating to the scheduling of tasks for execution by the computers 104 and 105. Memory 202 is flash memory configured for non-volatile storage of programs relating to the processes for scheduling tasks performed by the central processing unit 201. Random-access memory 203 is configured as read/write memory for storage of operational data associated with the programs executed by the central processing unit 201, and for storing data relating to tasks output by the client devices 102 and 103. Network card 204 enables the task scheduler 106 to communicate with the client devices and the computers through the network 107. The components 201 to 204 of the task scheduler are in communication via system bus 205.
Referring next to Figure 3, the hardware of the task scheduler 106 is configured to support eight functions used for a method of scheduling tasks output by the client devices 102 and 103.
Thresholds identifier 301 is functional to identify service-level parameters for execution of tasks by the computers 104 and 105. In particular, thresholds identifier 301 is functional to identify threshold execution times, e.g. maximum execution times, for execution of a task by the computers 104 and 105. For example, the thresholds identifier 301 may be functional to read a service-level agreement, identifying threshold times and other parameters for task execution, that is stored in the memory 202 of the task scheduler.
Task buffer 302 is functional to receive and store tasks output by the client devices 102 and 103, in preparation for scheduling of those tasks by the task scheduler. For example, the computing system 101 may be configured such that the task scheduler receives and stores all tasks output by the client devices 102 and 103, determines a schedule for executing the tasks, and subsequently forwards the tasks to the computers 104 and 105 for execution, along with a determined schedule defining the order of execution of the tasks. In an example alternative, tasks output by the client devices 102 and 103 could be sent directly to one or both of the computers 104 and 105, and stored in memory of the computers, and the task scheduler could be configured to read the tasks from the memory of the computer(s) 104 and 105, for the purpose of generating a schedule for the tasks. In this alternative example, task buffer 302 may be omitted from task scheduler 106, and substituted by functionality to read tasks from the memory of the computer(s) 104 and 105.
Resource monitor 303 is functional to determine a computational resource demanded by each queued task for execution of the task. Resource monitor 303 is further functional to determine a utilisation of the computational resource, e.g., processor capacity, of the computers 104 and 105 for executing tasks. For example, resource monitor 303 may receive reports from computers 104 and 105, e.g., via network 107, identifying unutilised computational resource of the respective computer, or could alternatively be configured to dynamically test for unutilised resource of the computers 104 and 105.
Task instructor 304 is functional to instruct the computers 104 and 105 to execute tasks in accordance with a schedule generated by the task scheduler. For example, task instructor 304 may communicate, e.g., via the network 107, execution commands to computers 104 and 105.
Time monitor 305 is functional to determine the time taken by the computers 104 and 105 to execute scheduled tasks. For example, the time monitor 305 may receive signals, e.g., via the network 107, from the computers 104 and 105 at start and end times of instances of task execution by the respective computer. As an alternative example, the computers 104 and 105 could self-report times taken to execute tasks, and such reports could be communicated to time monitor 305 via network 107. Further, the time monitor 305 is functional to compute an average of execution times for plural tasks.
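By way of a non-authoritative illustration, the averaging performed by the time monitor could be sketched as follows; the representation of execution instances as (start, end) timestamp pairs is an assumption:

```python
from statistics import mean

def average_execution_time(task_events):
    """Compute the arithmetic mean execution time from (start, end) pairs.

    task_events: iterable of (start_timestamp, end_timestamp) tuples, in seconds.
    """
    durations = [end - start for start, end in task_events]
    return mean(durations) if durations else 0.0
```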
Queue monitor 306 is functional to determine a number of tasks output by the client devices 102 and 103 for which execution by the computers 104 and 105 is not complete. For example, the queue monitor may be configured to include in the count both tasks awaiting execution and partially-executed tasks. The queue monitor could, for example, receive reports from the computers 104 and 105 identifying a number of incomplete tasks.
Schedule generator 307 is functional to determine a schedule for tasks output by the client devices 102 and 103 to be executed by the computers 104 and 105, and is further functional to communicate the schedule to the computers 104 and 105, e.g., via network 107. As will be described further with reference to Figures 5 and 8, schedule generator 307 deploys reinforcement learning models to schedule tasks.
Model trainer 308 is functional to train the reinforcement learning models of the schedule generator 307 to schedule tasks for execution by the computers 104 and 105.
Referring to Figure 4, a method of operating the computing system 101 to execute tasks comprises six stages.
At stage 401, the client devices 102 and 103 output, via the communication network 107, computational tasks for execution by the computers 104 and 105. Such computational tasks could, for example, comprise computational operations involved in the performance of application software running on the client devices 102 and 103. For example, application software running on the client devices 102 and 103 may involve execution of neural network models, and the neural network models may be executed by the computers 104 and 105, whereby the results of execution of the neural network models may be communicated by the computers 104 and 105 to the client devices 102 and 103. The outputting of tasks for execution could, for example, be a process controlled by application software running on the client devices 102 and 103, such that the method is initiated automatically by the application software. For example, the computing system 101 may be configured such that the client devices 102 and 103 output tasks for execution to the task scheduler 106, whereby the task buffer 302 receives and stores the tasks, e.g., in the random-access memory 203.

At stage 402, the task scheduler 106 determines a schedule for execution of tasks using one or more reinforcement learning models. The schedule defines an order of execution of tasks by the computers 104 and 105, and defines a split of tasks, or portions of tasks, for execution by the computer 104, and tasks or portions of tasks for execution by the computer 105.
At stage 403, the task scheduler 106 sends details of the tasks to the computers 104 and 105, via the communication network 107. For example, stage 403 could involve the task scheduler 106 communicating data defining the one or more tasks to be completed, e.g., the computational operation(s) to be performed and inputs of the computational operation(s), to a respective one, or both, of the computers 104 and 105. Following sending of the tasks to the computers 104 and 105, details of the tasks are erased from the random-access memory 203 of the task scheduler 106. The task scheduler 106 further sends the schedule generated at stage 402 to the computers 104 and 105. Details of the tasks, and the schedule, may subsequently be stored in internal memory of one or both of the computers 104 and 105.
At stage 404, the computers 104 and 105 execute the tasks received at stage 403 in accordance with the schedule also received at stage 403.
At stage 405, following execution of the tasks at stage 404, the computers 104 and 105 return computational results of the tasks, e.g., predictions of neural network model tasks, to one or both of the client devices 102 and 103. The results returned may thereby be utilised by application software running on the client devices 102 and 103.
At stage 406, the reinforcement learning models employed by the task scheduler 106 for scheduling tasks are trained.
Referring in particular to Figure 5, the method of stage 402 for scheduling tasks for execution comprises ten stages.
At stage 501, a threshold execution time (TT), e.g. a maximum permissible time, for execution of tasks by the computers 104 and 105 is determined by the thresholds identifier 301 of the task scheduler 106. The thresholds identifier 301 may read a static threshold execution time stored in memory 202 of the task scheduler 106. For example, a service-level agreement from a developer of the computing system 101 includes the threshold execution time and is stored in the memory 202 of the task scheduler 106. As an example alternative, the thresholds identifier 301 may be configured to dynamically determine a threshold execution time in accordance with predefined rules. For example, the predefined rules could define threshold execution times as a function of the particular client device by which the tasks are output, e.g., as a function of the client to be serviced, or as a function of a time of day.
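Purely as an illustration of such predefined rules, a threshold lookup might look like the sketch below; the per-client dictionary, the default value and the overnight relaxation are invented examples rather than features of the disclosure:

```python
def threshold_execution_time(client_id, hour_of_day,
                             per_client_thresholds, default_threshold):
    """Return a threshold execution time (in seconds) as a function of the
    client being serviced and of the time of day."""
    threshold = per_client_thresholds.get(client_id, default_threshold)
    if 0 <= hour_of_day < 6:
        threshold *= 2.0  # relax the target overnight, when load may be lower
    return threshold
```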
At stage 502, the task buffer 302 of the task scheduler 106 receives tasks output by the client devices 102 and 103, and stores the received tasks in the random-access memory 203 of the task scheduler.
At stage 503, the resource monitor 303 of the task scheduler 106 determines: (a) a magnitude of computational resources, specifically, processor and memory capacity, of each of the computers 104 and 105; and (b) a magnitude of computational resources, specifically again, processor and memory capacity, required for execution of each of the tasks received at stage 502.
At stage 504, the task instructor 304 of the task scheduler 106 initially sends one or more tasks received at stage 502 to one or both of the computers 104 and 105 for execution. For example, this stage involves the task instructor sending a small plurality of the tasks, for example, 10% of the tasks, to one or both of the computers 104 and 105, in the order in which they were received by the task buffer from the client devices 102 and 103. The primary purpose of the task instructor 304 sending these initial tasks to the computers 104 and 105 is to allow analysis of the execution times of tasks by the computers, which will inform the later stages of the task scheduling process.
At stage 505, the time monitor 305 of the task scheduler 106 measures the time durations (T) for execution of each of the tasks sent for execution by the computers at stage 504. For example, stage 505 could involve the time monitor 305 receiving signals, via the network 107, marking start and end times of task execution instances by the computers 104 and 105, whereby the time monitor 305 may determine the execution time of the task(s). As an example alternative, stage 505 could involve the computers 104 and 105 sending reports to time monitor 305 reporting times taken to execute tasks.

At stage 506, the time monitor 305 computes an average, e.g., an arithmetic mean, of the execution times of the tasks (TAV) determined at stage 505.
At stage 507, the time monitor 305 determines whether the average execution time TAV determined at stage 506 is equal to or less than the threshold execution time TT determined at stage 501. If the determination at stage 507 is answered in the negative, indicating that the average time for execution of the tasks is greater than the threshold execution time, or in other words, unacceptably long, the task scheduler proceeds to stage 508. In the alternative, if the determination at stage 507 is answered in the affirmative, indicating that the average time for execution of the tasks is indeed equal to or less than the threshold execution time, or in other words, acceptably short, the task scheduler proceeds to stage 509.
At stage 508, the schedule generator 307 proceeds to schedule other of the tasks received at stage 502, i.e., tasks which were not previously sent for execution at stage 504, using a first reinforcement learning model, configured for optimising, e.g. minimising, task execution times. For example, the first reinforcement learning model could minimise task execution times by scheduling tasks for execution by distributing task workload between the computers 104 and 105. This distributed workload may be expected to reduce the instantaneous demand on computational resources on each of the computers 104 and 105, which may thereby be expected to reduce the time taken by each computer to execute tasks.
At stage 509, the schedule generator 307 proceeds to schedule tasks using a second reinforcement learning model, configured for optimising, e.g. maximising, a utilisation of computational resource of one or both of the computers 104 and 105 for executing tasks. For example, the second reinforcement learning model could maximise utilisation of computing resources of one of the computers 104 and 105 by scheduling a relatively great proportion of the tasks, or even all of the tasks, received at stage 502 for execution by that one computer. This consolidation of tasks onto one heavily utilised computer may, in certain circumstances, be expected to reduce the overall power consumption of the computers 104 and 105.
At stage 510, the schedule generator 307 generates one or more schedules defining parameters for execution of the tasks received at stage 502, based on the output of stage 508 or stage 509. For example, the schedule generated at stage 510 could define which of computers 104 and 105 should execute each task, along with an order of execution of tasks by the computers 104 and 105.
The schedule(s) generated at stage 510 may then be sent to the computers 104 and 105, at stage 403, optionally accompanied by details of the tasks to be executed, as described previously with reference to Figure 4.
As indicated in Figure 5, stages 505 to 510 of the task scheduling process may be repeated following execution of previously scheduled tasks by the computers 104 and 105 at stage 404. Thus, in a second iteration of stages 505 to 510, at stage 505 the time monitor 305 may monitor execution times of the tasks scheduled in the above-described first iteration. Depending on the (average) execution times of those first iteration tasks, in the second iteration the task scheduler 106 may schedule further tasks for execution at stage 508 or stage 509. In the specific example, the task scheduler 106 is configured to schedule tasks for execution in small batches, for example, in batches of fewer than one hundred tasks, prior to sending the batch of tasks to the computers 104 and 105 at stage 403. In each iteration, therefore, the time monitor 305 monitors the execution times of tasks sent in the immediately preceding iteration, and the task scheduler schedules further tasks based on the monitored times. In alternative embodiments, the task scheduler could be configured to schedule tasks in significantly smaller or greater batch numbers, for example, in batch numbers of fewer than ten tasks, or greater than a thousand tasks.
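The iterative behaviour of stages 505 to 510 might be summarised by the control-loop sketch below; the batching helpers, the schedule() call on the models and the function names passed in are assumptions introduced only to illustrate the flow:

```python
from statistics import mean

def scheduling_loop(task_batches, threshold, first_model, second_model,
                    send_for_execution, measure_execution_times):
    """Repeat stages 505 to 510: after each batch, measure execution times,
    compare the average to the threshold, and pick the model that will
    schedule the next batch of tasks."""
    model = first_model  # initially prioritise execution time
    for batch in task_batches:
        schedule = model.schedule(batch)        # stage 508 or 509, then 510
        send_for_execution(batch, schedule)     # hand over to the computers
        times = measure_execution_times(batch)  # stage 505
        avg_time = mean(times)                  # stage 506
        # Stage 507: select the model to use for the next iteration.
        model = first_model if avg_time > threshold else second_model
```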
Referring next to Figure 6, the process of stage 503 for determining a utilisation of processor and memory computational resources of each of the computers 104 and 105, and for determining a processor and memory capacity required for execution of each task received by task buffer 302 at stage 502, allows a determination to be made of the ability of the computers 104 and 105 to execute each of the queued tasks. Thus, referring to the example of Figure 6, as a result of this stage the task scheduler 106 may determine that execution of the example task queued at job slot 1 involves two units of processor capacity of the computers 104 and 105 for two units of time, and one unit of memory of the computers 104 and 105 for two units of time, and that execution of the example task queued at job slot 2 involves one unit of processor capacity of the computers 104 and 105 for one unit of time, and two units of memory of the computers 104 and 105 for one unit of time, et cetera. Knowledge of the aforementioned utilisation of computational resource of the computers 104 and 105, and of the computational resource involved in executing the tasks, advantageously allows for avoiding over-utilisation, i.e., overloading, of one or both of the computers 104 and 105 with tasks requiring computational resource exceeding an available computational resource of the computers 104 and 105.
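As a hedged illustration of the bookkeeping of stage 503, each task's demand could be represented by processor units, memory units and a duration in time units, as in the Figure 6 example; the data structures below are assumptions:

```python
from dataclasses import dataclass

@dataclass
class TaskDemand:
    cpu_units: int   # processor capacity required by the task
    mem_units: int   # memory required by the task
    duration: int    # time units for which the resources are held

def fits(task: TaskDemand, free_cpu_units: int, free_mem_units: int) -> bool:
    """Return True if the computer can execute the task without being
    overloaded, i.e. without the task's demand exceeding the computer's
    currently available processor and memory capacity."""
    return task.cpu_units <= free_cpu_units and task.mem_units <= free_mem_units

# The task at job slot 1 in the Figure 6 example: 2 processor units and
# 1 memory unit, each held for 2 time units.
slot1 = TaskDemand(cpu_units=2, mem_units=1, duration=2)
```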
Referring next to Figures 7 and 8 collectively, the method of stage 406 for training the reinforcement learning models RL1, RL2 deployed by the task scheduler 106 at stages 508 and 509 respectively comprises four stages.
At stage 701, the queue monitor 306 determines a number of tasks output by the client devices 102 and 103 and received by the task scheduler 106 for which execution by the computers 104 and 105 is not complete. For example, the determination at this stage may include both tasks received by the task scheduler which are awaiting scheduling and tasks scheduled for execution by the computers 104 and 105 which are not yet fully completed. At this stage, the queue monitor 306 may query the task buffer 302 of the task scheduler 106 to determine a number of tasks awaiting scheduling, i.e. tasks which have not yet been sent to the computers 104 and 105 for execution, and may query the computers 104 and 105 to determine a number of tasks scheduled for execution, i.e. tasks which have already been sent to the computers 104 and 105, and a number of tasks currently undergoing execution by the computers 104 and 105.
At stage 702, the resource monitor 303 determines a degree of utilisation of a computational resource, such as processor capacity, of the computers 104 and 105 for executing tasks, i.e. a proportion of a computational resource of the respective computer currently utilised for executing tasks. For example, this stage may involve the task scheduler querying the computers 104 and 105 to establish a proportion of the processor capacity of the respective computer currently utilised for executing tasks.
At stage 703, the model trainer 308 generates first and second reward functions for training a respective one of the first and second reinforcement learning models, RL1, RL2 respectively.
For example, the first reinforcement learning model, RL1, is trained using a reward function that is defined as a function of the inverse of the number of tasks for which execution is not complete, as determined at stage 701. In the context of training the first reinforcement learning model to minimise execution times, a reward function defined with reference to the number of tasks for which execution is not complete is considered advantageous because the function takes into account not only, indirectly, the rate at which the tasks are executed by the computers 104 and 105, i.e. the average execution time, but also the workload of tasks for execution output by the client devices 102 and 103. Further, for example, a reward function for training the second reinforcement learning model, RL2, i.e. for training model RL2 to maximise a utilisation of computational resource, e.g. processor capacity, of the computers 104 and 105, is defined as a function of the utilisation of computational resource, such as processor capacity, of the computers 104 and 105, as determined at stage 702.
At stage 704, the parameters of the reinforcement learning models employed by the task scheduler for scheduling tasks for execution by the computers 104 and 105 are updated using a respective one of the two reward functions generated at stage 703. The task scheduler may thereby be trained to schedule tasks to optimally balance task execution time and resource utilisation. However, because the method of scheduling tasks at stage 402, as described with reference to stage 507 of Figure 5, gives priority to the first reinforcement learning model configured for minimising task execution times, task execution times may be maintained close to a desired level, for example, close to a minimum time.
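Purely for illustration, the parameter update of stage 704 could take the following shape; the disclosure does not prescribe a particular reinforcement learning algorithm, so the tabular Q-learning update, the learning rate and the discount factor used here are all assumptions:

```python
def q_learning_update(q_table, state, action, reward, next_state, action_space,
                      alpha=0.1, gamma=0.9):
    """One illustrative parameter update, driven by the reward computed at
    stage 703 (inverse backlog for RL1, resource utilisation for RL2).

    q_table: dict mapping (state, action) pairs to estimated values.
    """
    best_next = max(q_table.get((next_state, a), 0.0) for a in action_space)
    current = q_table.get((state, action), 0.0)
    q_table[(state, action)] = current + alpha * (reward + gamma * best_next - current)
```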
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. Further, where methods or processes are described by way of example as involving plural steps or stages, it should be understood that in other examples, stages may be omitted or performed in alternative orders to the described example.

Claims

1. A task scheduler for scheduling tasks for execution by a computer, the task scheduler being configured to: obtain a first reinforcement learning model for optimising a task execution time and a second reinforcement learning model for optimising a utilisation of a computing resource of the computer for task execution; determine a measure of an execution time of one or more tasks executed by the computer; and schedule further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
2. The task scheduler of claim 1, wherein the determining a measure of an execution time of one or more of the tasks executed by the computer comprises determining an average execution time of two or more of the tasks.
3. The task scheduler of claim 1 or claim 2, wherein the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the first reinforcement learning model if the determined measure of an execution time is greater than a threshold execution time.
4. The task scheduler of any one of the preceding claims, wherein the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the second reinforcement learning model if the determined measure of an execution time is less than a threshold execution time.
5. The task scheduler of any one of the preceding claims, configured to recurrently perform the determining a measure of an execution time of one or more tasks executed by the computer, and the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
6. The task scheduler of any one of claims 3 to 5, configured to schedule additional tasks for execution by the computer using the second reinforcement learning model if the determined measure of the execution time is less than a further threshold execution time, wherein the further threshold execution time is less than the threshold execution time.
7. The task scheduler of any one of the preceding claims, further configured to: determine a number of tasks for execution for which execution is not complete, and train the first reinforcement learning model for optimising a task execution time using a reward function based on the determined number of tasks.
8. The task scheduler of any one of the preceding claims, further configured to: determine a utilisation of the computing resource of the computer, and train the second reinforcement learning model using a reward function based on the determined utilisation.
9. A computing system comprising a computer for execution of tasks and the task scheduler of any one of the preceding claims configured for scheduling tasks for execution by the computer.
10. A method of scheduling tasks for execution by a computer, the method comprising: obtaining a first reinforcement learning model for optimising a task execution time and a second reinforcement learning model for optimising a utilisation of a computing resource of the computer for task execution; determining a measure of an execution time of one or more tasks executed by the computer; and scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
11. The method of claim 10, wherein the determining a measure of an execution time of one or more of the tasks executed by the computer comprises determining an average execution time of two or more of the tasks.
12. The method of claim 10 or claim 11, wherein the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the first reinforcement learning model if the determined measure of the execution time is greater than a threshold execution time.
13. The method of any one of the preceding claims, wherein the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time comprises scheduling further tasks for execution by the computer using the second reinforcement learning model if the determined measure of the execution time is less than a threshold execution time.
14. The method of any one of the preceding claims, further comprising recurrently performing the determining a measure of an execution time of one or more of the tasks executed by the computer, and the scheduling further tasks for execution by the computer using one of the first reinforcement learning model and the second reinforcement learning model based on the determined measure of the execution time.
15. The method of any one of claims 12 to 14, further comprising scheduling additional tasks for execution by the computer using the second reinforcement learning model if the determined measure of an execution time is less than a further threshold execution time, wherein the further threshold execution time is less than the threshold execution time.
16. The method of any one of the preceding claims, further comprising: determining a number of tasks for execution for which execution is not complete, and training the first reinforcement learning model for optimising a task execution time using a reward function based on the determined number of tasks.
17. The method of any one of the preceding claims, further comprising: determining a utilisation of the computing resource of the computer, and training the second reinforcement learning model using a reward function based on the determined utilisation.
18. A computer program comprising instructions, which, when executed by a computing device, cause the computing device to carry out the method of any one of claims 10 to 17.
19. A computer-readable data carrier having the computer program of claim 18 stored thereon.
PCT/EP2021/052997 2021-02-09 2021-02-09 Scheduling tasks for execution by a computer based on a reinforcement learning model WO2022171262A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180093346.9A CN116848508A (en) 2021-02-09 2021-02-09 Scheduling tasks for computer execution based on reinforcement learning model
PCT/EP2021/052997 WO2022171262A1 (en) 2021-02-09 2021-02-09 Scheduling tasks for execution by a computer based on a reinforcement learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/052997 WO2022171262A1 (en) 2021-02-09 2021-02-09 Scheduling tasks for execution by a computer based on a reinforcement learning model

Publications (1)

Publication Number Publication Date
WO2022171262A1 true WO2022171262A1 (en) 2022-08-18

Family

ID=74587027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/052997 WO2022171262A1 (en) 2021-02-09 2021-02-09 Scheduling tasks for execution by a computer based on a reinforcement learning model

Country Status (2)

Country Link
CN (1) CN116848508A (en)
WO (1) WO2022171262A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331796A (en) * 2022-10-17 2022-11-11 中科厚立信息技术(成都)有限公司 Intensive learning-based sickbed resource allocation optimization method, system and terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASGHARI ALI ET AL: "Online scheduling of dependent tasks of cloud's workflows to enhance resource utilization and reduce the makespan using multiple reinforcement learning-based agents", SOFT COMPUTING, SPRINGER VERLAG, BERLIN, DE, vol. 24, no. 21, 24 April 2020 (2020-04-24), pages 16177 - 16199, XP037265566, ISSN: 1432-7643, [retrieved on 20200424], DOI: 10.1007/S00500-020-04931-7 *
ASGHARI ALI ET AL: "Task scheduling, resource provisioning, and load balancing on scientific workflows using parallel SARSA reinforcement learning agents and genetic algorithm", JOURNAL OF SUPERCOMPUTING, vol. 77, no. 3, 6 July 2020 (2020-07-06), pages 2800 - 2828, XP037366127, ISSN: 0920-8542, DOI: 10.1007/S11227-020-03364-1 *

Also Published As

Publication number Publication date
CN116848508A (en) 2023-10-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21704488; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 202180093346.9; Country of ref document: CN)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21704488; Country of ref document: EP; Kind code of ref document: A1)