CN109936604B - Resource scheduling method, device and system - Google Patents

Resource scheduling method, device and system

Info

Publication number
CN109936604B
CN109936604B
Authority
CN
China
Prior art keywords: gpu, resources, candidate, task, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711362963.6A
Other languages
Chinese (zh)
Other versions
CN109936604A (en)
Inventor
张皓天
苏磊
靳江明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Technology Co Ltd
Original Assignee
Beijing Tusimple Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Technology Co Ltd
Priority to CN201711362963.6A
Publication of CN109936604A
Application granted
Publication of CN109936604B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a resource scheduling method, device and system, aiming to solve the technical problem of low GPU resource utilization in the prior art. The method comprises the following steps: monitoring the allocable resources of each GPU in each host machine; when a new task is received, determining the required resources corresponding to the new task; determining, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources; and allocating resources to the new task from the allocable resources of the target GPU, and dispatching the new task to the host machine where the target GPU is located. The technical scheme of the invention improves the utilization rate of GPU resources as well as the efficiency and speed of task execution.

Description

Resource scheduling method, device and system
Technical Field
The present invention relates to the field of computers, and in particular, to a resource scheduling method, a resource scheduling apparatus, and a resource scheduling system.
Background
At present, distributed computing cluster systems based on the master-worker mode (for example, docker container clusters) are increasingly widely used. Such a system includes a master-end server and a plurality of worker-end host machines. The master-end server receives new tasks, allocates resources to them, and dispatches them to the worker-end host machines; each worker-end host machine receives the tasks dispatched to it and executes them.
In such a distributed computing cluster system, when the master-end server allocates resources to a new task, it assigns all the resources of one or more GPUs (Graphics Processing Units) in a worker-end host machine to that task; that is, one task exclusively occupies all the resources of one or more GPUs.
When the master-end server receives a new task, it judges whether any worker-end host machine has a whole GPU that is not allocated to any task; if not, it waits for an executing task to finish before allocating one or more whole GPUs to the new task. In actual use, however, a task rarely uses 100% of the whole GPU allocated to it; for example, a task may use only 30% or 50% of a GPU's resources for long stretches while the rest of the GPU sits idle. The existing allocation mode therefore cannot make full and reasonable use of whole-GPU resources, and GPU resource utilization is low.
Disclosure of Invention
In view of the above problems, the present invention provides a resource scheduling method, device and system to solve the technical problem of low GPU resource utilization in the prior art.
A first aspect of the embodiments of the invention provides a resource scheduling method, applied to the master-end server in a master-worker-mode distributed computing cluster, comprising the following steps:
monitoring the allocable resources of each GPU in each host machine;
when a new task is received, determining the required resources corresponding to the new task;
determining, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources;
and allocating resources to the new task from the allocable resources of the target GPU, and dispatching the new task to the host machine where the target GPU is located.
A second aspect of the embodiments of the invention provides a resource scheduling method, applicable to a worker-end host machine in a master-worker-mode distributed computing cluster, comprising:
determining the allocable resources of each GPU in the host machine;
sending the allocable resources of each GPU to the master-end server;
and executing the tasks dispatched by the master-end server.
A third aspect of the embodiments of the invention provides a resource scheduling apparatus, disposed at the master-end server in a master-worker-mode distributed computing cluster, comprising:
a monitoring unit, configured to monitor the allocable resources of each GPU in each host machine;
a parsing unit, configured to determine, when a new task is received, the required resources corresponding to the new task;
a determining unit, configured to determine, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources;
and an allocating unit, configured to allocate resources to the new task from the allocable resources of the target GPU and dispatch the new task to the host machine corresponding to the target GPU.
A fourth aspect of the embodiments of the invention provides a resource scheduling apparatus, disposed in a worker-end host machine in a master-worker-mode distributed computing cluster, comprising:
a resource determining unit, configured to determine the allocable resources of each GPU in the host machine;
a communication unit, configured to send the allocable resources of each GPU to the master-end server;
and an execution unit, configured to execute the tasks dispatched by the master-end server.
A fifth aspect of the embodiments of the invention provides a resource scheduling system, comprising a master-end server and a plurality of worker-end host machines respectively connected to the master-end server, wherein:
the master-end server is configured to monitor the allocable resources of each GPU in each host machine; determine, when a new task is received, the required resources corresponding to the new task; determine, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources; allocate resources to the new task from the allocable resources of the target GPU; and dispatch the new task to the host machine corresponding to the target GPU;
and each host machine is configured to determine the allocable resources of its GPUs, send them to the master-end server, and execute the tasks dispatched by the master-end server.
In the embodiments of the invention, for a master-worker-mode distributed computing cluster, the master-end server monitors the allocable resources of each GPU in each host machine; when a new task is received, instead of handing a whole GPU to the new task, it allocates from the GPU's allocable resources exactly the amount that the new task requires. With this scheme, when a GPU whose resources are partly allocated to an executing task still has allocable resources left, those resources can be allocated to other tasks, so multiple tasks share the same GPU. GPU resources are thus fully utilized, solving the prior-art problem of low GPU utilization caused by one task monopolizing a whole GPU. Moreover, with the same amount of GPU resources as the prior art, the scheme can serve more tasks and can allocate resources to a new task promptly when it arrives, improving overall task execution speed and efficiency. An illustrative sketch of the first-aspect flow follows.
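As an illustration only, and not as the claimed implementation, the first-aspect flow can be sketched in a few lines of Python; the GpuState type, the field names, and the numeric amounts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GpuState:
    host: str         # host machine the GPU sits in, e.g. "H1"
    gpu_id: str       # e.g. "H1G1"
    allocable: float  # monitored allocable resource amount (step 101)

def schedule(required: float, gpus: list) -> Optional[GpuState]:
    """Steps 102-104: find a target GPU whose allocable resources meet the
    required resources of the new task, then allocate from that GPU."""
    for gpu in gpus:                   # step 103: scan the monitored resources
        if gpu.allocable >= required:  # allocable resources meet the demand
            gpu.allocable -= required  # step 104: allocate from the target GPU
            return gpu                 # the new task runs on gpu.host
    return None                        # no target GPU found; the task waits

# Usage: three GPUs on host H1; a new task needs 0.4 of one GPU.
pool = [GpuState("H1", "H1G1", 0.2), GpuState("H1", "H1G2", 0.5), GpuState("H1", "H1G3", 1.0)]
target = schedule(0.4, pool)           # picks H1G2; its allocable amount becomes 0.1
```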
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic structural diagram of a resource scheduling system according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a resource scheduling device disposed in a master-side server in an embodiment of the present invention;
FIG. 3 is a diagram illustrating the allocable resource amount of each GPU recorded in the resource pool according to the embodiment of the present invention;
fig. 4 is a second schematic structural diagram of a resource scheduling device disposed in a master server according to an embodiment of the present invention;
fig. 5 is a schematic diagram of task information corresponding to a host machine maintained in a task information maintenance unit according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the task information of FIG. 5 after being updated;
FIG. 7 is a diagram illustrating a structure of a determining unit according to an embodiment of the present invention;
FIG. 8 is a second schematic structural diagram of a determining unit according to an embodiment of the present invention;
FIG. 9 is a third schematic structural diagram of a determining unit according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a resource scheduling apparatus disposed in a worker end host in the embodiment of the present invention;
FIG. 11 is a flowchart of a resource scheduling method implemented in a master server according to an embodiment of the present invention;
FIG. 12 is a first flowchart for implementing step 103 in FIG. 11;
FIG. 13 is a second flowchart for implementing step 103 in FIG. 11;
FIG. 14 is a third flowchart for implementing step 103 in FIG. 11;
FIG. 15 is a fourth flowchart for implementing step 103 in FIG. 11;
FIG. 16 is a fifth flowchart for implementing step 103 of FIG. 11;
fig. 17 is a flowchart of a resource scheduling method set in a worker end host in the embodiment of the present invention.
Detailed Description
To enable those skilled in the art to better understand the technical solutions of the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
The technical scheme of the invention is suitable for all master-worker-mode distributed computing clusters, such as docker container clusters and engine computing clusters; the application does not strictly limit the specific type of distributed computing cluster.
Example one
As shown in fig. 1, which is a schematic structural diagram of a resource scheduling system, the system is a master-worker-mode distributed computing cluster comprising a master-end server and a plurality of worker-end host machines, each in communication connection with the master-end server.
The master-end server realizes the following functions through a master program deployed on it: monitoring, in real time or periodically, the allocable resources of each GPU in each host machine; receiving a new task and parsing the task parameters corresponding to it to obtain the required resources; determining, according to the allocable resources of each GPU in each host machine, a target GPU whose allocable resources meet the required resources of the new task; and allocating resources to the new task from the allocable resources of the target GPU and dispatching the new task to the host machine where the target GPU is located, so that this host machine calls the corresponding worker program to execute the new task.
Each worker-end host machine realizes the following functions through a worker program deployed on it: determining, in real time or periodically, the allocable resources of each GPU on the host machine where the worker program runs; sending the allocable resources of each GPU to the master-end server; and executing the tasks dispatched to the host machine by the master-end server.
In the embodiment of the invention, there are various mechanisms by which a host machine can send the allocable resources of its GPUs to the master-end server, and the application does not strictly limit them. For example, the worker program may periodically and actively synchronize the allocable resources of each GPU on its host machine to the master-end server; or the master-end server may periodically send a resource acquisition request to each host machine, and the worker program on each host machine sends the allocable resources of its GPUs in response; or the master-end server may periodically poll the host machines, and the worker program sends the allocable resources of its GPUs when its host machine is polled. A sketch of the first mechanism follows.
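As a sketch of the first of these mechanisms (the worker-initiated periodic push), the following hypothetical Python fragment shows one shape such a report loop could take; the payload format, the five-second period, and the transport are assumptions, not part of the patent.

```python
import threading
import time

def start_report_loop(host_id, get_allocable, send_to_master, period_s=5.0):
    """Worker-side periodic push: every period_s seconds, synchronize the
    allocable resources of each GPU on this host to the master-end server."""
    def loop():
        while True:
            payload = {"host": host_id, "gpus": get_allocable()}  # e.g. {"H1G1": 0.3, ...}
            send_to_master(payload)   # transport (HTTP, RPC, ...) is left abstract here
            time.sleep(period_s)
    threading.Thread(target=loop, daemon=True).start()

# Usage with stub callbacks:
start_report_loop("H1", lambda: {"H1G1": 0.3, "H1G2": 1.0}, print)
```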
To help those skilled in the art further understand the technical solutions of the invention, the following describes them in detail from the perspectives of the master-end server and the worker-end host machine, respectively.
Example two
The master program in the master-end server can implement the foregoing functions through a scheduler (i.e., a resource scheduling device) that runs as a subprogram of the master. The resource scheduling device may have the structure shown in fig. 2, comprising a monitoring unit 11, a parsing unit 12, a determining unit 13, and an allocating unit 14, where:
and a monitoring unit 11, configured to monitor an allocable resource of each GPU in each host.
And the analysis unit 12 is configured to determine a required resource corresponding to the new task when the new task is received.
In the embodiment of the present invention, when receiving a new task, the parsing unit 12 parses the task parameter corresponding to the new task according to a preset parsing rule, so as to obtain a required resource corresponding to the new task, for example, the task parameter includes identity information (such as a name or an ID) of the new task, and GPU resource information (the GPU resource information includes the number of GPUs and the amount of resources occupied by each GPU) required by the new task.
A determining unit 13, configured to determine, according to the allocable resource of each GPU in the host, a target GPU of which the allocable resource meets the required resource.
And the allocating unit 14 is configured to allocate resources to the new task from the allocable resources of the target GPU, and allocate the new task to a host corresponding to the target GPU.
In the embodiment of the invention, the monitoring unit 11 may monitor the allocable resources of each GPU in each host machine in, but not limited to, the following manner:
the monitoring unit 11 establishes a resource pool and dynamically records in it the allocable resource amount of each GPU in each host machine. As shown in fig. 3, host machine H1 contains 3 GPUs (denoted H1G1, H1G2, and H1G3), whose allocable resource amounts are N11, N12, and N13, respectively. When the monitoring unit 11 receives from a host machine the allocable resources of its GPUs, it updates the allocable resource amount recorded for each GPU in the resource pool accordingly.
Of course, those skilled in the art may also monitor the allocable resources of each GPU in each host machine in other manners, for example, by creating a dynamic list, recording in it the allocable resource amount of each GPU in each host machine, and maintaining the list in real time or periodically.
In the embodiment of the present invention, the determining unit 13 obtains the allocable resources of each GPU in each host from the monitoring unit 11, so as to determine the target GPU of which the allocable resources meet the required resources corresponding to the new task.
Preferably, to keep the allocable resource amounts in the resource pool up to date, after allocating resources to the new task from the allocable resources of the target GPU, the allocating unit 14 synchronizes the target GPU and the amount allocated to the new task to the monitoring unit 11, and the monitoring unit 11 updates the target GPU's allocable resource amount in time. Taking H1G1 shown in fig. 3 as an example: before the allocation, the allocable resource amount of H1G1 is N11; after the allocating unit 14 allocates an amount M1 from the allocable resources of the target GPU, its allocable resources become N11-M1, as in the sketch below.
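A minimal resource-pool sketch, mirroring fig. 3 and the N11-M1 update just described; the class name, method names, and numeric amounts are illustrative assumptions.

```python
class ResourcePool:
    """Hypothetical resource pool: allocable amounts keyed by host and GPU."""

    def __init__(self):
        self.allocable = {}                      # {"H1": {"H1G1": N11, ...}, ...}

    def update_from_report(self, host, gpu_amounts):
        """Monitoring unit 11: replace the record with the host's report."""
        self.allocable[host] = dict(gpu_amounts)

    def allocate(self, host, gpu, amount):
        """Allocation unit 14 sync: N11 becomes N11 - M1 after allocating M1."""
        self.allocable[host][gpu] -= amount

    def release(self, host, gpu, amount):
        """A finished task returns its resources to the pool."""
        self.allocable[host][gpu] += amount

pool = ResourcePool()
pool.update_from_report("H1", {"H1G1": 1.0, "H1G2": 1.0, "H1G3": 1.0})
pool.allocate("H1", "H1G1", 0.4)   # H1G1: 1.0 -> 0.6
```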
Preferably, to keep the task information of each host machine up to date as well, the resource scheduling device may further include a task information maintenance unit 15, as shown in fig. 4, where:
the task information maintenance unit 15 is configured to record the task information corresponding to each host machine. The task information includes all executing tasks on the host machine and the GPU resource information allocated to each of them, where the GPU resource information includes the GPUs corresponding to the executing task and the amount of resources it occupies on each GPU.
As shown in fig. 5, host machine H1 contains 3 GPUs (denoted H1G1, H1G2, and H1G3) and carries two tasks (denoted task A1 and task A2), where: task A1 corresponds to H1G1 and H1G2, with H1G1 allocating an amount M11 and H1G2 an amount M12 to task A1; task A2 corresponds to H1G3, with H1G3 allocating an amount M21 to task A2.
Preferably, to keep the task information of each host machine current, in the embodiment of the invention, after the allocating unit 14 allocates resources to the new task from the allocable resources of the target GPU, it synchronizes the target GPU and the amount allocated to the new task to the task information maintenance unit 15, which updates the task information of the host machine where the target GPU is located. Taking the GPU H1G2 shown in fig. 5 as an example, with the new task denoted task 3: after the allocation, the task information corresponding to host machine H1 is as shown in fig. 6, which additionally records task 3, the GPU H1G2 corresponding to task 3, and the amount M31 that H1G2 allocates to task 3. An illustrative rendering of this structure follows.
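The per-host task information of figs. 5 and 6 could be held in a structure like the following illustrative Python dictionary, where the literal strings stand in for the amounts named in the figures.

```python
# Task information for host H1 as in fig. 5: each executing task maps to the
# GPUs it occupies and the resource amount it holds on each of them.
task_info_h1 = {
    "A1": {"H1G1": "M11", "H1G2": "M12"},   # task A1 spans two GPUs
    "A2": {"H1G3": "M21"},
}

# After the allocation unit places new task 3 on H1G2 with amount M31 (fig. 6):
task_info_h1["3"] = {"H1G2": "M31"}
```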
Preferably, in the embodiment of the invention, after a host machine finishes executing a task, it releases the resources corresponding to that task and synchronizes the finished task's state information and the resource information it occupied to the monitoring unit 11 and the task information maintenance unit 15, so that both units can update their records.
In the embodiment of the invention, the allocable resources of a GPU may be its idle resources, its sharable resources, or both. The idle resources of a GPU are the resources not yet allocated to any executing task; the sharable resources of a GPU are the portion of the resources already allocated to executing tasks that is predicted not to be utilized by those tasks within a period of time. For example, taking H1G1 shown in fig. 5: assume the total resource amount of H1G1 is N1 and H1G1 currently carries task A1 and task A2, with an amount M11 allocated to task A1 and an amount M12 to task A2; if task A1 occupies only an amount M11' for a period of time and task A2 only an amount M12', then the idle resources of H1G1 are N1-M11-M12 and the sharable resources of H1G1 include (M11-M11') and (M12-M12'), as in the numeric sketch below. The three cases are described in example 1, example 2, and example 3, respectively.
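Restating the H1G1 example with assumed concrete numbers (all values are invented for illustration):

```python
N1 = 1.0                         # total resources of H1G1
M11, M12 = 0.4, 0.3              # amounts allocated to executing tasks A1 and A2
M11_used, M12_used = 0.25, 0.1   # amounts A1 and A2 actually use for a period

idle = N1 - M11 - M12                           # 0.3: never allocated to any task
sharable = (M11 - M11_used) + (M12 - M12_used)  # 0.15 + 0.2 = 0.35: allocated but unused
allocable = idle + sharable                     # 0.65 under the combined policy of example 3
```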
Example 1
In example 1, the allocable resources of a GPU are its idle resources, and the determining unit 13 has the structure shown in fig. 7, comprising a judging subunit 131 and a determining subunit 132, where:
a judging subunit 131, configured to judge whether a candidate GPU whose allocable resources are greater than or equal to the required resources exists among the GPUs of the host machines, and to trigger the determining subunit 132 if a candidate GPU exists;
a determining subunit 132, configured to select one GPU from the candidate GPUs as the target GPU.
The determining subunit 132 may select the target GPU at random from the candidate GPUs, or select the candidate GPU with the least allocable resources; the application does not strictly limit this choice.
Preferably, when the judging subunit 131 determines that no candidate GPU exists, then to ensure that a high-priority new task can still be executed in time, the judging subunit 131 is further configured to: if no candidate GPU exists, judge whether the host machines contain a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select a target task from the preemptible tasks, allocate the resources of the target task to the new task, and dispatch the new task to the host machine where the target task is located; if no preemptible task exists, place the new task into a preset blocking pool to wait for resource allocation. A sketch of this selection logic follows.
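Example 1's selection logic, including the preemption fallback, might be sketched as follows; the Gpu and Task types and the least-allocable tie-break are assumptions drawn from the description above.

```python
from dataclasses import dataclass

@dataclass
class Gpu:
    gpu_id: str
    allocable: float          # idle resources, in the sense of example 1

@dataclass
class Task:
    gpu_id: str
    priority: int
    allocated: float

def pick_target(required, gpus, tasks, new_priority):
    """Judging subunit 131 / determining subunit 132, with the preemption fallback."""
    candidates = [g for g in gpus if g.allocable >= required]
    if candidates:
        # One of the allowed policies: the candidate with the least allocable resources.
        return min(candidates, key=lambda g: g.allocable).gpu_id
    preemptible = [t for t in tasks
                   if t.priority < new_priority and t.allocated >= required]
    if preemptible:
        return preemptible[0].gpu_id   # the new task takes over this task's resources
    return None                        # caller places the new task in the blocking pool
```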
Example 2
In example 2, the allocable resources of a GPU are its sharable resources. The structure of the determining unit 13 is as in fig. 7, comprising a judging subunit 131 and a determining subunit 132, whose specific functions are described in example 1 and are not repeated here.
Preferably, since the sharable resources of a GPU are part of the resources already allocated to an executing task on that GPU, that task may need more resources again after a period of time. To ensure that executing tasks can proceed smoothly, the embodiment of the invention only allocates the sharable resources of a GPU to a new task whose priority is lower than that of every executing task on the GPU. Therefore, in example 2, the determining subunit 132 selects the target GPU as follows: select, from the candidate GPUs, one GPU on which every executing task has a higher priority than the new task, and use the selected GPU as the target GPU.
Preferably, when the judging subunit 131 determines that no candidate GPU exists, then to ensure that a high-priority new task can still be executed in time, the judging subunit 131 is further configured to: if no candidate GPU exists, judge whether the host machines contain a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select a target task from the preemptible tasks, allocate the resources of the target task to the new task, and dispatch the new task to the host machine where the target task is located; if no preemptible task exists, place the new task into a preset blocking pool to wait for resource allocation.
In the embodiment of the invention, the master program may periodically select from the blocking pool either the task with the highest priority or the task that has waited in the blocking pool the longest, and send the selected task to the parsing unit 12 as a new task, as in the sketch below.
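A blocking-pool sketch under the same caveat; the entry layout and retry policy are illustrative assumptions.

```python
import time

blocking_pool = []   # entries: (priority, enqueue_time, task)

def dequeue_blocked(by_priority=True):
    """Pick the highest-priority entry, or the longest-waiting one, and hand
    it back to the parsing unit as a new task."""
    if not blocking_pool:
        return None
    key = (lambda e: -e[0]) if by_priority else (lambda e: e[1])
    entry = min(blocking_pool, key=key)
    blocking_pool.remove(entry)
    return entry[2]

blocking_pool.append((2, time.time(), "task-B"))
blocking_pool.append((5, time.time(), "task-C"))
print(dequeue_blocked())   # -> "task-C" (priority 5)
```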
Example 3
In example 3, the allocable resources of a GPU are its idle resources and its sharable resources, and the determining unit 13 has the structure shown in fig. 8, comprising a first judging subunit 133, a first determining subunit 134, a second judging subunit 135, and a second determining subunit 136, where:
a first judging subunit 133, configured to judge whether a first candidate GPU whose idle resources are greater than or equal to the required resources exists among the GPUs of the host machines, to trigger the first determining subunit 134 if a first candidate GPU exists, and to trigger the second judging subunit 135 if not;
a first determining subunit 134, configured to select one GPU from the first candidate GPUs as the target GPU;
a second judging subunit 135, configured to judge whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines, and to trigger the second determining subunit 136 if a second candidate GPU exists;
a second determining subunit 136, configured to select one GPU from the second candidate GPUs as the target GPU.
Preferably, the second determining subunit 136 is specifically configured to: select, from the second candidate GPUs, one GPU on which every executing task has a higher priority than the new task, and use the selected GPU as the target GPU.
Preferably, when the first judging subunit 133 determines that no first candidate GPU exists, then to ensure that a high-priority new task can still be executed in time, the first judging subunit 133 is further configured to: judge whether the host machines contain a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select a target task from the preemptible tasks, allocate the resources of the target task to the new task, and dispatch the new task to the host machine where the target task is located; if no preemptible task exists, trigger the second judging subunit 135.
Preferably, to ensure that the new task can be executed in time, the determining unit 13 shown in fig. 8 may further include a third judging subunit 137 and a third determining subunit 138, as shown in fig. 9, where:
the second judging subunit 135 is further configured to: if no second candidate GPU exists, trigger the third judging subunit 137;
a third judging subunit 137, configured to judge whether a third candidate GPU whose sum of idle resources and sharable resources is greater than or equal to the required resources exists among the GPUs of the host machines, to trigger the third determining subunit 138 if a third candidate GPU exists, and to place the new task into a preset blocking pool to wait for resource allocation if not;
a third determining subunit 138, configured to select one GPU from the third candidate GPUs as the target GPU.
Preferably, in the embodiment of the invention, the third determining subunit 138 is specifically configured to: select, from the third candidate GPUs, one GPU on which every executing task has a higher priority than the new task, and use the selected GPU as the target GPU. A sketch of the whole three-tier selection follows.
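Example 3's three-tier search (idle resources first, then sharable resources, then their sum) could look like the following sketch; the Gpu3 fields, in particular min_task_priority, are assumed bookkeeping, and the priority guard on the second and third tiers follows the rule stated above.

```python
from dataclasses import dataclass

@dataclass
class Gpu3:
    gpu_id: str
    idle: float                # idle resources
    sharable: float            # predicted-unused resources of executing tasks
    min_task_priority: int     # lowest priority among this GPU's executing tasks

def pick_target_3tier(required, gpus, new_priority):
    """First/second/third judging and determining subunits (figs. 8 and 9)."""
    for g in gpus:                                   # tier 1: idle resources
        if g.idle >= required:
            return g.gpu_id
    for g in gpus:                                   # tier 2: sharable resources
        if g.sharable >= required and g.min_task_priority > new_priority:
            return g.gpu_id
    for g in gpus:                                   # tier 3: idle + sharable
        if g.idle + g.sharable >= required and g.min_task_priority > new_priority:
            return g.gpu_id
    return None                                      # blocking pool
```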
Example three
In the third embodiment of the invention, the worker program in a worker-end host machine may be implemented by a resource scheduling device as shown in fig. 10, comprising a resource determining unit 21, a communication unit 22, and an execution unit 23, where:
a resource determining unit 21, configured to determine the allocable resources of each GPU in the host machine;
a communication unit 22, configured to send the allocable resources of each GPU to the master-end server;
and an execution unit 23, configured to execute the tasks dispatched to the host machine by the master-end server.
Preferably, in the third embodiment of the invention, the allocable resources of a GPU may be its idle resources, its sharable resources, or both.
In one example, where the allocable resources of a GPU are its idle resources, the resource determining unit 21 is specifically configured to: monitor, in each GPU of the host machine, the idle resources not allocated to any executing task, and use those idle resources as the allocable resources.
In another example, where the allocable resources of a GPU are its sharable resources, the resource determining unit 21 is specifically configured to: predict, among the resources allocated to executing tasks in each GPU of the host machine, the sharable resources that the executing tasks will not utilize within a period of time, and use those sharable resources as the allocable resources.
In another example, where the allocable resources of a GPU are its idle resources and sharable resources, the resource determining unit 21 is specifically configured to: monitor, in each GPU of the host machine, the idle resources not allocated to any executing task; predict, among the resources allocated to executing tasks, the sharable resources that those tasks will not utilize within a period of time; and use the idle resources together with the sharable resources as the allocable resources.
In the embodiment of the invention, the resource determining unit 21 predicts the sharable resources as follows: monitor the resource utilization of each executing task on each GPU over a historical time period, predict the resource utilization of each executing task over a future time period, and treat the portion of resources predicted not to be used in that future period as sharable resources. For example, suppose a GPU carries an executing task A to which an amount M of GPU resources has been allocated, and monitoring shows that the resource utilization of task A stayed below 50% throughout a period T; it can then be predicted that the utilization of task A will still not exceed 50% in the next period, so 50% of task A's allocated amount M is determined to be sharable resources for the coming period, as in the sketch below.
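The utilization-based prediction can be pictured with the small sketch below; the peak-over-history rule is one possible reading of the 50% example, and the window and numbers are invented.

```python
def predict_sharable(allocated, utilization_history):
    """If a task's utilization stayed below some bound over the past window T,
    assume the same bound for the next window; the rest of its allocation is
    treated as sharable for that coming period."""
    peak = max(utilization_history)        # e.g. never above 0.5 during T
    return allocated * (1.0 - peak)        # the predicted-unused portion

# Task A holds an amount M = 1.0 and never exceeded 50% utilization during T,
# so half of M is predicted to be sharable in the next period.
print(predict_sharable(1.0, [0.31, 0.44, 0.50, 0.28]))   # -> 0.5
```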
Preferably, the execution unit 23 is specifically configured to: when a first instruction to execute a new task using the idle resources of a target GPU is received, execute the new task using the idle resources of the target GPU; and when a second instruction to execute a new task using the sharable resources of a target GPU is received, execute the new task using the sharable resources of the target GPU.
Preferably, the execution unit 23 is further configured to: when it is detected that a high-priority task on a GPU needs to use more resources, stop running a low-priority task on that GPU and allocate the sharable resources that were allocated to the low-priority task back to the high-priority task. A sketch follows.
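A worker-side reclamation sketch under the same illustrative assumptions; the RunningTask fields and the print call stand in for whatever bookkeeping and suspension mechanism the worker program actually uses.

```python
from dataclasses import dataclass

@dataclass
class RunningTask:
    name: str
    priority: int
    held: float     # resources currently held on this GPU
    needed: float   # resources the task now needs

def reclaim_sharable(gpu_tasks):
    """If the high-priority task needs more than it holds, stop low-priority
    borrowers and return their sharable slice to it."""
    high = max(gpu_tasks, key=lambda t: t.priority)
    for t in sorted(gpu_tasks, key=lambda t: t.priority):
        if t is high or high.needed <= high.held:
            break
        print(f"stopping {t.name}")   # stand-in for actually suspending the task
        high.held += t.held           # the borrowed resources go back to the task
        t.held = 0.0

tasks = [RunningTask("low", 1, 0.3, 0.3), RunningTask("high", 9, 0.5, 0.7)]
reclaim_sharable(tasks)   # stops "low"; "high" now holds 0.8 >= 0.7
```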
Example four
Based on the resource scheduling apparatus described in the second embodiment, the fourth embodiment of the invention provides a resource scheduling method, applied to the master-end server in a master-worker-mode distributed computing cluster; its flowchart is shown in fig. 11 and comprises:
step 101, monitoring the allocable resources of each GPU in each host machine;
step 102, when a new task is received, determining the required resources corresponding to the new task;
step 103, determining, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources;
and step 104, allocating resources to the new task from the allocable resources of the target GPU, and dispatching the new task to the host machine where the target GPU is located.
In one specific example, where the allocable resources are the idle resources of a GPU, or the sharable resources of a GPU, step 103 may be implemented as shown in fig. 12, comprising:
step A1, judging whether a candidate GPU whose allocable resources are greater than or equal to the required resources exists among the GPUs of the host machines; if a candidate GPU exists, performing step A2;
step A2, selecting one GPU from the candidate GPUs as the target GPU.
Preferably, if the allocable resources are the sharable resources of a GPU, step A2 specifically comprises: selecting, from the candidate GPUs, one GPU on which every executing task has a higher priority than the new task, and using the selected GPU as the target GPU.
Preferably, step A1 in the flowchart shown in fig. 12 may be followed by the following steps: if no candidate GPU exists, execute steps A3 to A5, as shown in fig. 13:
step A3, judging whether a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources exists in the host machines; if a preemptible task exists, executing step A4, and if not, executing step A5;
step A4, selecting a target task from the preemptible tasks, allocating the resources of the target task to the new task, and dispatching the new task to the host machine where the target task is located;
step A5, placing the new task into a preset blocking pool to wait for resource allocation.
In another example, where the allocable resources include the idle resources and the sharable resources of a GPU, step 103 may be implemented as shown in fig. 14, comprising:
step B1, judging whether a first candidate GPU whose idle resources are greater than or equal to the required resources exists among the GPUs of the host machines; performing step B2 if a first candidate GPU exists, and performing step B3 if not;
step B2, selecting one GPU from the first candidate GPUs as the target GPU;
step B3, judging whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines; if a second candidate GPU exists, performing step B4;
step B4, selecting one GPU from the second candidate GPUs as the target GPU.
Step B4 specifically comprises: selecting, from the second candidate GPUs, one GPU on which every executing task has a higher priority than the new task, and using the selected GPU as the target GPU.
Preferably, to ensure that a high-priority task can be executed in time, before step B3 of the flowchart shown in fig. 14, steps B5 and B6 may be included, as shown in fig. 15:
step B5, judging whether a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources exists in the host machines; if a preemptible task exists, executing step B6, and if not, executing step B3;
step B6, selecting a target task from the preemptible tasks, allocating the resources of the target task to the new task, and dispatching the new task to the host machine where the target task is located.
Preferably, in the flows shown in fig. 14 and fig. 15, if no second candidate GPU exists, the method may further include steps B7 and B8; fig. 16 shows the flow of fig. 15 extended with steps B7 and B8:
step B7, judging whether a third candidate GPU whose sum of idle resources and sharable resources is greater than or equal to the required resources exists among the GPUs of the host machines; if a third candidate GPU exists, executing step B8, and if not, placing the new task into the blocking pool to wait for resource allocation;
step B8, selecting one GPU from the third candidate GPUs as the target GPU.
Preferably, step B8 specifically comprises: selecting, from the third candidate GPUs, one GPU on which every executing task has a higher priority than the new task, and using the selected GPU as the target GPU.
Example five
Based on the same concept as the resource scheduling device provided in the third embodiment, the fifth embodiment of the invention provides a resource scheduling method, applicable to a worker-end host machine in a master-worker-mode distributed computing cluster; as shown in fig. 17, the method comprises:
step 201, determining the allocable resources of each GPU in the host machine;
step 202, sending the allocable resources of each GPU to the master-end server;
and step 203, executing the tasks dispatched to the host machine by the master-end server.
In one example, where the allocable resources of a GPU are its idle resources, step 201 is implemented as follows: monitor, in each GPU of the host machine, the idle resources not allocated to any executing task, and use those idle resources as the allocable resources.
In another example, where the allocable resources of a GPU are its sharable resources, step 201 may be implemented as follows: predict, among the resources allocated to executing tasks in each GPU of the host machine, the sharable resources that the executing tasks will not utilize within a period of time, and use those sharable resources as the allocable resources.
In another example, where the allocable resources of a GPU are its idle resources and sharable resources, step 201 may be implemented as follows: monitor, in each GPU of the host machine, the idle resources not allocated to any executing task; predict, among the resources allocated to executing tasks, the sharable resources that those tasks will not utilize within a period of time; and use the idle resources together with the sharable resources as the allocable resources.
Preferably, step 203 specifically comprises: when a first instruction to execute a new task using the idle resources of a target GPU is received, executing the new task using the idle resources of the target GPU; and when a second instruction to execute a new task using the sharable resources of a target GPU is received, executing the new task using the sharable resources of the target GPU.
Preferably, step 203 further comprises: when it is detected that a high-priority task on a GPU needs to use more resources, stopping a low-priority task on that GPU and allocating the sharable resources that were allocated to the low-priority task to the high-priority task.
While the principles of the invention have been described in connection with specific embodiments thereof, it should be noted that those skilled in the art will understand that all or any of the steps or components of the method and apparatus of the invention may be implemented in hardware, firmware, software, or any combination thereof, in any computing device (including processors, storage media, and the like) or network of computing devices; this can be achieved by those skilled in the art using basic programming skills after reading the description of the invention.
It will be understood by those skilled in the art that all or part of the steps carried out in the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, may comprise one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the above embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the above-described embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A resource scheduling method, applied to a master-end server in a master-worker-mode distributed computing cluster, the method comprising:
monitoring the allocable resources of each graphics processing unit (GPU) in each host machine;
when a new task is received, determining the required resources corresponding to the new task;
determining, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources;
and allocating resources to the new task from the allocable resources of the target GPU, and dispatching the new task to the host machine where the target GPU is located,
wherein the allocable resources comprise idle resources and sharable resources in the GPU, the sharable resources in a GPU being the portion of the resources allocated to executing tasks in the GPU that is predicted not to be utilized by the executing tasks within a period of time;
and wherein determining, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources specifically comprises:
judging whether a first candidate GPU whose idle resources are greater than or equal to the required resources exists among the GPUs of the host machines;
if a first candidate GPU exists, selecting one GPU from the first candidate GPUs as the target GPU;
if no first candidate GPU exists, judging whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines;
if a second candidate GPU exists, selecting one GPU from the second candidate GPUs as the target GPU;
if no second candidate GPU exists: judging whether a third candidate GPU whose sum of idle resources and sharable resources is greater than or equal to the required resources exists among the GPUs of the host machines; and if a third candidate GPU exists, selecting one GPU from the third candidate GPUs as the target GPU.
2. The method according to claim 1, further comprising, before judging whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines:
judging whether a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources exists in the host machines;
if a preemptible task exists, selecting a target task from the preemptible tasks, allocating the resources of the target task to the new task, and dispatching the new task to the host machine where the target task is located;
and if no preemptible task exists, executing the step of judging whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines.
3. The method according to claim 1 or 2, wherein selecting one GPU from the second candidate GPUs as the target GPU specifically comprises: selecting, from the second candidate GPUs, one GPU on which every executing task has a higher priority than the new task, as the target GPU;
and/or selecting one GPU from the third candidate GPUs as the target GPU specifically comprises: selecting, from the third candidate GPUs, one GPU on which every executing task has a higher priority than the new task, as the target GPU.
4. A resource scheduling apparatus, disposed at a master-end server in a master-worker-mode distributed computing cluster, the apparatus comprising:
a monitoring unit, configured to monitor the allocable resources of each GPU in each host machine;
a parsing unit, configured to determine, when a new task is received, the required resources corresponding to the new task;
a determining unit, configured to determine, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources;
and an allocating unit, configured to allocate resources to the new task from the allocable resources of the target GPU and dispatch the new task to the host machine corresponding to the target GPU,
wherein the allocable resources comprise idle resources and sharable resources in the GPU, the sharable resources in a GPU being the portion of the resources allocated to executing tasks in the GPU that is predicted not to be utilized by the executing tasks within a period of time;
and wherein the determining unit specifically comprises:
a first judging subunit, configured to judge whether a first candidate GPU whose idle resources are greater than or equal to the required resources exists among the GPUs of the host machines, to trigger the first determining subunit if a first candidate GPU exists, and to trigger the second judging subunit if not;
a first determining subunit, configured to select one GPU from the first candidate GPUs as the target GPU;
a second judging subunit, configured to judge whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines, to trigger the second determining subunit if a second candidate GPU exists, and to trigger the third judging subunit if not;
a second determining subunit, configured to select one GPU from the second candidate GPUs as the target GPU;
a third judging subunit, configured to judge whether a third candidate GPU whose sum of idle resources and sharable resources is greater than or equal to the required resources exists among the GPUs of the host machines, and to trigger the third determining subunit if a third candidate GPU exists;
and a third determining subunit, configured to select one GPU from the third candidate GPUs as the target GPU.
5. The apparatus of claim 4, wherein the first judging subunit is further configured to, before triggering the second judging subunit: judge whether a preemptible task whose priority is lower than that of the new task and whose allocated resources are greater than or equal to the required resources exists in the host machines; if a preemptible task exists, select a target task from the preemptible tasks, allocate the resources of the target task to the new task, and dispatch the new task to the host machine where the target task is located; and if no preemptible task exists, trigger the second judging subunit.
6. The apparatus according to claim 4 or 5, wherein the second determining subunit is specifically configured to: select, from the second candidate GPUs, one GPU on which every executing task has a higher priority than the new task, as the target GPU;
and/or the third determining subunit is specifically configured to: select, from the third candidate GPUs, one GPU on which every executing task has a higher priority than the new task, as the target GPU.
7. A resource scheduling system, comprising a master-end server and a plurality of worker-end host machines respectively connected to the master-end server, wherein:
the master-end server is configured to monitor the allocable resources of each GPU in each host machine; determine, when a new task is received, the required resources corresponding to the new task; determine, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources; allocate resources to the new task from the allocable resources of the target GPU; and dispatch the new task to the host machine where the target GPU is located;
the allocable resources comprise idle resources and sharable resources in the GPU, the sharable resources in a GPU being the portion of the resources allocated to executing tasks in the GPU that is predicted not to be utilized by the executing tasks within a period of time;
determining, according to the allocable resources of each GPU in the host machines, a target GPU whose allocable resources meet the required resources specifically comprises:
judging whether a first candidate GPU whose idle resources are greater than or equal to the required resources exists among the GPUs of the host machines;
if a first candidate GPU exists, selecting one GPU from the first candidate GPUs as the target GPU;
if no first candidate GPU exists, judging whether a second candidate GPU whose sharable resources are greater than or equal to the required resources exists among the GPUs of the host machines;
if a second candidate GPU exists, selecting one GPU from the second candidate GPUs as the target GPU;
if no second candidate GPU exists: judging whether a third candidate GPU whose sum of idle resources and sharable resources is greater than or equal to the required resources exists among the GPUs of the host machines; and if a third candidate GPU exists, selecting one GPU from the third candidate GPUs as the target GPU;
each host machine is configured to determine the allocable resources of each GPU in the host machine, send the allocable resources to the master-end server, and execute the tasks dispatched by the master-end server;
and determining the allocable resources of each GPU in the host machine comprises: monitoring, in each GPU of the host machine, the idle resources not allocated to any executing task; predicting, among the resources allocated to executing tasks, the sharable resources that those tasks will not utilize within a period of time; and using the idle resources together with the sharable resources as the allocable resources.
CN201711362963.6A 2017-12-18 2017-12-18 Resource scheduling method, device and system Active CN109936604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711362963.6A CN109936604B (en) 2017-12-18 2017-12-18 Resource scheduling method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711362963.6A CN109936604B (en) 2017-12-18 2017-12-18 Resource scheduling method, device and system

Publications (2)

Publication Number Publication Date
CN109936604A (en) 2019-06-25
CN109936604B (en) 2022-07-26

Family

ID=66982307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711362963.6A Active CN109936604B (en) 2017-12-18 2017-12-18 Resource scheduling method, device and system

Country Status (1)

Country Link
CN (1) CN109936604B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148468B (en) * 2019-06-28 2023-10-10 杭州海康威视数字技术股份有限公司 Resource scheduling method and device, electronic equipment and storage medium
CN110362407A (en) * 2019-07-19 2019-10-22 中国工商银行股份有限公司 Computing resource dispatching method and device
CN110399222B (en) * 2019-07-25 2022-01-21 北京邮电大学 GPU cluster deep learning task parallelization method and device and electronic equipment
CN110413393B (en) * 2019-07-26 2022-02-01 广州虎牙科技有限公司 Cluster resource management method and device, computer cluster and readable storage medium
CN110688218B (en) * 2019-09-05 2022-11-04 广东浪潮大数据研究有限公司 Resource scheduling method and device
CN110688223B (en) * 2019-09-11 2022-07-29 深圳云天励飞技术有限公司 Data processing method and related product
CN112751694A (en) * 2019-10-30 2021-05-04 北京金山云网络技术有限公司 Management method and device of exclusive host and electronic equipment
CN110795249A (en) * 2019-10-30 2020-02-14 亚信科技(中国)有限公司 GPU resource scheduling method and device based on MESOS containerized platform
CN112866321A (en) * 2019-11-28 2021-05-28 中兴通讯股份有限公司 Resource scheduling method, device and system
CN111143060B (en) * 2019-12-18 2021-01-26 重庆紫光华山智安科技有限公司 GPU resource scheduling method and device and GPU
CN111190712A (en) * 2019-12-25 2020-05-22 北京推想科技有限公司 Task scheduling method, device, equipment and medium
CN111767134A (en) * 2020-05-18 2020-10-13 鹏城实验室 Multitask dynamic resource scheduling method
CN112035220A (en) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 Processing method, device and equipment for operation task of development machine and storage medium
CN114675976B (en) * 2022-05-26 2022-09-16 深圳前海环融联易信息科技服务有限公司 GPU (graphics processing Unit) sharing method, device, equipment and medium based on kubernets

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9419773B2 (en) * 2009-11-17 2016-08-16 Sony Corporation Resource management method and system thereof
CN107291545A (en) * 2017-08-07 2017-10-24 星环信息科技(上海)有限公司 The method for scheduling task and equipment of multi-user in computing cluster

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2313246T3 (en) * 2005-07-04 2009-03-01 Motorola Inc. APPARATUS AND METHOD FOR SHARING RESOURCES BETWEEN A PLURALITY OF COMMUNICATIONS NETWORKS.
US9396032B2 (en) * 2014-03-27 2016-07-19 Intel Corporation Priority based context preemption
CN106155811B (en) * 2015-04-28 2020-01-07 阿里巴巴集团控股有限公司 Resource service device, resource scheduling method and device
CN107040479B (en) * 2016-02-04 2019-12-17 华为软件技术有限公司 Method and device for adjusting cloud computing resources
US9916636B2 (en) * 2016-04-08 2018-03-13 International Business Machines Corporation Dynamically provisioning and scaling graphic processing units for data analytic workloads in a hardware cloud
CN106293950B (en) * 2016-08-23 2019-08-13 成都卡莱博尔信息技术股份有限公司 A kind of resource optimization management method towards group system
CN107357661B (en) * 2017-07-12 2020-07-10 北京航空航天大学 Fine-grained GPU resource management method for mixed load

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9419773B2 (en) * 2009-11-17 2016-08-16 Sony Corporation Resource management method and system thereof
CN107291545A (en) * 2017-08-07 2017-10-24 星环信息科技(上海)有限公司 The method for scheduling task and equipment of multi-user in computing cluster

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Priority-based cache allocation in throughput processors";Dong Li等;《2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)》;20150309;全文 *
"面向不同优先级网格作业资源重分配问题的专业技术支持";黄大勇;《万方》;20130320;全文 *

Also Published As

Publication number Publication date
CN109936604A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109936604B (en) Resource scheduling method, device and system
US9027028B2 (en) Controlling the use of computing resources in a database as a service
US9183016B2 (en) Adaptive task scheduling of Hadoop in a virtualized environment
US8185905B2 (en) Resource allocation in computing systems according to permissible flexibilities in the recommended resource requirements
CN107515786B (en) Resource allocation method, master device, slave device and distributed computing system
CN107301093B (en) Method and device for managing resources
US10241836B2 (en) Resource management in a virtualized computing environment
CN109564528B (en) System and method for computing resource allocation in distributed computing
WO2016039963A2 (en) Resource sharing between two resource allocation systems
CN108123980B (en) Resource scheduling method and system
CN110389843B (en) Service scheduling method, device, equipment and readable storage medium
JPWO2007072544A1 (en) Information processing apparatus, computer, resource allocation method, and resource allocation program
US9792142B2 (en) Information processing device and resource allocation method
KR102016683B1 (en) Apparatus and method for autonomic scaling of monitoring function resource in software defined network
CN112052068A (en) Method and device for binding CPU (central processing unit) of Kubernetes container platform
CN107168777B (en) Method and device for scheduling resources in distributed system
US20200385726A1 (en) Oversubscription scheduling
US20230418661A1 (en) Decentralized resource scheduling
CN111464331A (en) Control method and system for thread creation and terminal equipment
CN113010309B (en) Cluster resource scheduling method, device, storage medium, equipment and program product
CN112860387A (en) Distributed task scheduling method and device, computer equipment and storage medium
CN114327894A (en) Resource allocation method, device, electronic equipment and storage medium
US9928092B1 (en) Resource management in a virtual machine cluster
CN113626173A (en) Scheduling method, device and storage medium
CN114003369A (en) System and method for scheduling commands based on resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200327

Address after: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District

Applicant after: BEIJING TUSEN ZHITU TECHNOLOGY Co.,Ltd.

Address before: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District

Applicant before: BEIJING TUSEN WEILAI TECHNOLOGY Co.,Ltd.

GR01 Patent grant