CN110413412B - GPU (Graphics Processing Unit) cluster resource allocation method and device - Google Patents


Info

Publication number
CN110413412B
CN110413412B · Application CN201910654395.XA
Authority
CN
China
Prior art keywords
task
gpu
processed
tasks
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910654395.XA
Other languages
Chinese (zh)
Other versions
CN110413412A (en)
Inventor
姬贵阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201910654395.XA
Publication of CN110413412A
Application granted
Publication of CN110413412B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a resource allocation method based on a GPU cluster, the GPU cluster comprising a plurality of GPU cards. The method comprises the following steps: acquiring tasks to be processed, the tasks to be processed comprising large tasks and small tasks, wherein a large task is a task to be processed whose required resource amount is greater than or equal to one GPU card, and a small task is a task to be processed whose required resource amount is less than one GPU card; allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed, each GPU card being allocated at most one large task; acquiring the remaining resource amount of each GPU card that is executing a task; and, for each GPU card with a remaining resource amount, traversing the unexecuted small tasks in priority order, and, if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, allocating the remaining resource amount of the GPU card to that small task and updating the remaining resource amount of the GPU card. The scheme of the invention improves the utilization rate of GPU cluster resources.

Description

GPU (Graphics Processing Unit) cluster resource allocation method and device
Technical Field
The invention relates to the field of cloud computing, in particular to a method and a device for distributing cluster resources based on a GPU.
Background
Currently, resource scheduling software commonly used for GPU clusters (systems that manage large numbers of GPU cards) includes PBS (Portable Batch System) and Slurm (Simple Linux Utility for Resource Management). With this existing resource scheduling software, the utilization rate of the GPU cluster is low: the utilization rate of a GPU (Graphics Processing Unit) card allocated to processing tasks is only about 30% to 60%, so roughly 30% of the resources go unused and are wasted.
The unit in which the scheduler software PBS and Slurm allocate GPU cards is the number of cards: when GPU resources are scheduled, one or more whole GPU cards are allocated to a task, so GPU utilization is not high while the task runs, and part of the GPU resources is wasted. Meanwhile, the resource scheduling of the existing scheduler software PBS and Slurm multiplexes GPU cards, i.e., two or more tasks can run on one GPU card. Although this uses GPU resources more fully, running multiple tasks on one GPU card at the same time reduces task execution speed, makes work efficiency very low, and lengthens the task completion period. In addition, the prior art only implements resource scheduling and lacks a related technology for reasonably allocating resources. In view of these problems in conventional GPU cluster resource scheduling, there is an urgent need for a GPU-cluster-based resource allocation method, and for a method and a device capable of improving GPU cluster utilization.
Disclosure of Invention
In order to solve the technical problem, the invention provides a method and a device for allocating resources based on a GPU cluster, which improve the utilization rate of the GPU cluster.
In order to achieve the aim of the invention, the invention provides a resource allocation method based on a GPU cluster, wherein the GPU cluster comprises a plurality of GPU cards; the method comprises the following steps:
acquiring tasks to be processed, wherein the tasks to be processed comprise large tasks and small tasks; a large task refers to a task to be processed whose required resource amount is greater than or equal to one GPU card; a small task refers to a task to be processed whose required resource amount is less than one GPU card;
allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed, wherein each GPU card is allocated at most one large task;
acquiring the remaining resource amount of each GPU card that is executing a task;
and, for each GPU card with a remaining resource amount, traversing the unexecuted small tasks in priority order; if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, allocating the remaining resource amount of the GPU card to that small task and updating the remaining resource amount of the GPU card.
In an exemplary embodiment, after the acquiring the task to be processed, the method further includes:
calculating the attribute of the task to be processed by adopting a custom rule to obtain a priority weight value;
sorting according to the calculated priority weight values, wherein the higher the weight value is, the higher the priority is;
wherein the attributes of the task to be processed comprise one or more of the following: the time range for executing the task, the number of GPU cards required for executing the task, and the type of GPU card.
In an exemplary embodiment, the allocating of one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed includes:
step 31, according to the priority sequence of the tasks to be processed, determining the first unexecuted big task as the current task to be processed;
step 32, distributing one or more GPU cards for the current task to be processed;
step 33, judging whether large tasks which are not executed exist, and if not, acquiring the resource residual quantity of each GPU card for executing the tasks; if so, determining the next unexecuted big task as the current task to be processed;
step 34, judging whether an idle GPU card exists, if so, performing step 35, and if not, acquiring the resource residual quantity of each GPU card for executing the task;
step 35, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if it is greater than or equal to, performing step 32;
if it is less than, performing step 36;
step 36, judging whether there is a large task that has not been executed;
if there is an unexecuted large task, determining, in priority order, the next unexecuted large task as the current task to be processed, and performing step 35;
if there is no unexecuted large task, the allocation of large tasks is complete.
In an exemplary embodiment, the amount of resources includes a time frame for executing the task and a resource requirement for executing the task.
In an exemplary embodiment, for each GPU card with a resource residual amount, traversing the small tasks that are not executed respectively according to a priority order, and if the resource residual amount of the GPU card is found to satisfy the small tasks that are not executed, allocating the resource residual amount of the GPU card to the small task, and updating the resource residual amount of the GPU card, the method includes:
step 51, traversing the small tasks which are not executed according to the priority sequence, and determining the first small task which is not executed as the current task to be processed;
step 52, judging whether the resource residual quantity of the GPU card is larger than or equal to the resource quantity required by the current task to be processed; if so, go to step 53, if not, go to step 54;
step 53, judging whether the execution time range of the current task to be processed is within the time range of the GPU card executing the large task;
if the current task to be processed is within the time range of executing the large task, the resource residual amount of the GPU card is distributed to the current task to be processed, and the resource residual amount is updated; if not, go to step 54;
step 54, judging whether small tasks which are not executed exist, and if not, ending the process;
if there are small tasks that have not been executed, the next small task that has not been executed is determined to be the current task to be processed according to the priority order, and the process returns to step 52.
In order to solve the above problem, the present invention further provides a device for allocating resources based on a GPU cluster, including: a memory and a processor, the GPU cluster including a plurality of GPU cards;
the memory is used for storing programs for allocating resources based on GPU clusters;
the processor is configured to read and execute the program for allocating resources based on the GPU cluster, and perform the following operations:
acquiring tasks to be processed, wherein the tasks to be processed comprise large tasks and small tasks; a large task refers to a task to be processed whose required resource amount is greater than or equal to one GPU card; a small task refers to a task to be processed whose required resource amount is less than one GPU card;
allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed, wherein each GPU card is allocated at most one large task;
acquiring the remaining resource amount of each GPU card that is executing a task;
and, for each GPU card with a remaining resource amount, traversing the unexecuted small tasks in priority order; if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, allocating the remaining resource amount of the GPU card to that small task and updating the remaining resource amount of the GPU card.
In an exemplary embodiment, after acquiring the task to be processed, the processor reads and executes the program for allocating resources based on the GPU cluster, and further performs the following operations:
calculating the attribute of the task to be processed by adopting a custom rule to obtain a priority weight value;
sorting according to the calculated priority weight values, wherein the higher the weight value is, the higher the priority is;
wherein the attributes of the task to be processed comprise one or more of the following: the time range for executing the task, the number of GPU cards required for executing the task, and the type of GPU card.
In an exemplary embodiment, the processor reads and executes the program for GPU cluster resource based allocation, and further performs the following operations:
the executing of the one or more GPU cards distributed to the big tasks according to the priority sequence of the tasks to be processed comprises the following steps:
step 31, according to the priority sequence of the tasks to be processed, determining the first unexecuted big task as the current task to be processed;
step 32, distributing one or more GPU cards for the current task to be processed;
step 33, judging whether large tasks which are not executed exist, and if not, acquiring the resource residual quantity of each GPU card for executing the tasks; if so, determining the next unexecuted big task as the current task to be processed;
step 34, judging whether an idle GPU card exists, if so, performing step 35, and if not, acquiring the resource residual quantity of each GPU card for executing the task;
step 35, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if it is greater than or equal to, performing step 32;
if it is less than, performing step 36;
step 36, judging whether there is a large task that has not been executed;
if there is an unexecuted large task, determining, in priority order, the next unexecuted large task as the current task to be processed, and performing step 35;
if there is no unexecuted large task, the allocation of large tasks is complete.
In an exemplary embodiment, the amount of resources includes a time frame for executing the task and a resource requirement for executing the task.
In an exemplary embodiment, the processor reads and executes the program for GPU cluster resource based allocation, and further performs the following operations:
for each GPU card with resource residual quantity, respectively traversing the small tasks which are not executed according to the priority sequence, if the resource residual quantity of the GPU card is found to meet the small tasks which are not executed, distributing the resource residual quantity of the GPU card to the small tasks, and updating the resource residual quantity of the GPU card, wherein the method comprises the following steps:
step 51, traversing the small tasks which are not executed according to the priority sequence, and determining the first small task which is not executed as the current task to be processed;
step 52, judging whether the resource residual quantity of the GPU card is larger than or equal to the resource quantity required by the current task to be processed; if so, go to step 53, if not, go to step 54;
step 53, judging whether the execution time range of the current task to be processed is within the time range of the GPU card executing the large task;
if the current task to be processed is within the time range of executing the large task, the resource residual amount of the GPU card is distributed to the current task to be processed, and the resource residual amount is updated; if not, go to step 54;
step 54, judging whether small tasks which are not executed exist, and if not, ending the process;
if there are small tasks that have not been executed, the next small task that has not been executed is determined to be the current task to be processed according to the priority order, and the process returns to step 52.
Compared with the prior art, the invention provides a resource allocation method based on a GPU cluster, the GPU cluster comprising a plurality of GPU cards. Tasks to be processed are acquired, the tasks to be processed comprising large tasks and small tasks, wherein a large task is a task to be processed whose required resource amount is greater than or equal to one GPU card, and a small task is a task to be processed whose required resource amount is less than one GPU card. One or more GPU cards are allocated to execute the large tasks according to the priority order of the tasks to be processed, each GPU card being allocated at most one large task. The remaining resource amount of each GPU card executing a task is acquired, and, for each GPU card with a remaining resource amount, the unexecuted small tasks are traversed in priority order; if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, the remaining resource amount of the GPU card is allocated to that small task, and the remaining resource amount of the GPU card is updated. The scheme of the invention improves the utilization rate of GPU cluster resources.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention, and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flowchart of a GPU-based cluster resource allocation method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a GPU cluster resource allocation device in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
FIG. 1 is a flowchart of a method for allocating resources based on a GPU cluster according to an embodiment of the present invention, where the GPU cluster includes multiple GPU cards; the method comprises the following steps:
step 100, obtaining tasks to be processed, wherein the tasks to be processed comprise large tasks and small tasks.
In this embodiment, a plurality of GPU cards are included in a GPU cluster, each GPU card being an independent processor.
A task to be processed is acquired and classified as a large task or a small task: a large task is a task to be processed whose required resource amount is greater than or equal to one GPU card, and a small task is a task to be processed whose required resource amount is less than one GPU card. A processing task may be, for example, a deep learning task trained in a deep learning platform. By dividing the processing tasks in this way, resources can first be allocated according to the resource amount required by the large tasks, and the small tasks can then be backfilled into the remaining resources. For example, the resource amount of a large task may be 2 GPU cards, while the resource amount of a small task may be 20% of the memory of one GPU card.
In an exemplary embodiment, after a task to be processed is acquired, a custom rule is applied to the attributes of the task to be processed to calculate a priority weight value, and the tasks are sorted by the calculated priority weight values, a higher weight value meaning a higher priority. The attributes of the task to be processed comprise one or more of the following: the expected start time of the task, the expected end time of the task, the time range for executing the task, the number of GPU cards required for executing the task, and the type of GPU card. After the tasks to be processed are sorted by priority, the large tasks come before the small tasks in the priority order. For example, the sorted priority order of the tasks to be processed may be: large task 1, large task 2, large task 3 … …, small task 1, small task 2, small task 3, small task 4 … ….
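The custom weighting rule itself is left unspecified by the embodiment. Purely as an illustration, the sketch below assumes a simple linear rule over the attributes listed above; the `Task` fields, the coefficients, and the function names are hypothetical, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cards_needed: float    # e.g. 2.0 = two whole GPU cards (large), 0.2 = 20% of one card (small)
    duration_hours: float  # length of the time range for executing the task

    @property
    def is_large(self) -> bool:
        # A large task needs at least one whole GPU card.
        return self.cards_needed >= 1.0

def priority_weight(task: Task) -> float:
    # Hypothetical custom rule: every large task is weighted ahead of every
    # small task, then by resource demand, then shorter tasks slightly first.
    base = 1000.0 if task.is_large else 0.0
    return base + 10.0 * task.cards_needed - task.duration_hours

def sort_by_priority(tasks: list[Task]) -> list[Task]:
    # A higher weight value means a higher priority, so sort in descending order.
    return sorted(tasks, key=priority_weight, reverse=True)
```

With this rule all large tasks precede all small tasks after sorting, reproducing the example ordering above (large task 1, large task 2, …, small task 1, small task 2, …).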
Step 101, allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed.
In this embodiment, GPU cards are first allocated to the large tasks according to the priority order of the tasks to be processed: one or more GPU cards are allocated to execute each large task, and during allocation each GPU card is allocated at most one large task. One card runs only one large task, which ensures the speed and efficiency of task execution.
In an exemplary embodiment, the allocation of one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed may be implemented as follows:
step 31, according to the priority sequence of the tasks to be processed, determining the first unexecuted big task as the current task to be processed;
step 32, distributing one or more GPU cards for the current task to be processed;
step 33, judging whether large tasks which are not executed exist, and if not, acquiring the resource residual quantity of each GPU card for executing the tasks; if so, determining the next unexecuted big task as the current task to be processed;
step 34, judging whether an idle GPU card exists, if so, performing step 35, and if not, acquiring the resource residual quantity of each GPU card for executing the task;
step 35, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if it is greater than or equal to, performing step 32;
if it is less than, performing step 36;
step 36, judging whether there is a large task that has not been executed;
if there is an unexecuted large task, determining, in priority order, the next unexecuted large task as the current task to be processed, and performing step 35;
if there is no unexecuted large task, the allocation of large tasks is complete.
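Steps 31 to 36 amount to a single greedy pass over the priority-ordered large tasks: each task either receives the whole idle cards it needs or is skipped so the next large task can be tried. A minimal sketch under that reading (the `(name, cards_needed)` tuple shape and the card identifiers are assumptions, not part of the patent):

```python
def allocate_large_tasks(large_tasks, idle_cards):
    """Greedily allocate whole idle GPU cards to priority-ordered large tasks.

    large_tasks: list of (name, cards_needed) tuples, sorted by descending
                 priority, where cards_needed is an integer >= 1.
    idle_cards:  list of identifiers of currently idle GPU cards.
    Returns (assignments, remaining_idle_cards); each card receives at most
    one large task, so assigned cards are removed from the idle list.
    """
    assignments = {}
    idle = list(idle_cards)
    for name, cards_needed in large_tasks:
        # Step 35: are there enough idle cards for the current task?
        if len(idle) >= cards_needed:
            # Step 32: allocate whole cards, one large task per card.
            assignments[name] = idle[:cards_needed]
            idle = idle[cards_needed:]
        # Steps 35-36 (not enough cards): skip to the next unexecuted
        # large task in priority order.
    return assignments, idle
```

A large task that cannot be placed does not block lower-priority large tasks that still fit, which is what the jump from step 35 to step 36 achieves.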
Step 102, acquiring the remaining resource amount of each GPU card that is executing a task.
In this embodiment, the remaining resource amount of each GPU card after the large tasks are allocated may be obtained using existing techniques; the specific technical means used to obtain it is not limited in this embodiment.
Step 103, for each GPU card with a remaining resource amount, traversing the unexecuted small tasks in priority order; if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, allocating the remaining resource amount of the GPU card to that small task and updating the remaining resource amount of the GPU card.
In this embodiment, for each GPU card with a remaining resource amount obtained in step 102, small tasks may additionally be allocated to the GPU card, which improves the resource utilization rate of the GPU card.
In an exemplary embodiment, the resource amount includes a time range for executing the task and a resource requirement for executing the task.
In an exemplary embodiment, for each GPU card with resource residual, traversing the small tasks that are not executed according to a priority order, if the resource residual of the GPU card is found to satisfy the small tasks that are not executed, allocating the resource residual of the GPU card to the small tasks, and updating the resource residual of the GPU card, includes:
step 51, traversing the small tasks which are not executed according to the priority sequence, and determining the first small task which is not executed as the current task to be processed;
step 52, judging whether the resource residual quantity of the GPU card is larger than or equal to the resource quantity required by the current task to be processed; if so, go to step 53, if not, go to step 54;
step 53, judging whether the execution time range of the current task to be processed is within the time range of the GPU card executing the large task;
if the current task to be processed is within the time range of executing the large task, the resource residual amount of the GPU card is distributed to the current task to be processed, and the resource residual amount is updated; if not, go to step 54;
step 54, judging whether small tasks which are not executed exist, and if not, ending the process;
if there are small tasks that have not been executed, the next small task that has not been executed is determined to be the current task to be processed according to the priority order, and the process returns to step 52.
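Steps 51 to 54 describe a first-fit backfill over the priority-ordered small tasks, gated by both the card's leftover resource fraction (step 52) and the execution window of the large task already on the card (step 53). A sketch under those assumptions (the dict keys `free`, `window`, and `need` are hypothetical names, not from the patent):

```python
def backfill_small_tasks(cards, small_tasks):
    """First-fit backfill of small tasks onto cards with leftover resources.

    cards:       list of dicts like {"name": "g0", "free": 0.4, "window": (0, 10)},
                 where "free" is the unused fraction of the card and "window"
                 is the (start, end) of the large task already running on it.
    small_tasks: list of dicts like {"name": "s1", "need": 0.2, "window": (1, 5)},
                 sorted by descending priority.
    Returns {card_name: [small task names]}; updates each card's "free" in place.
    """
    placed = {c["name"]: [] for c in cards}
    for card in cards:
        for task in small_tasks:
            if task.get("placed"):
                continue  # this small task was already backfilled onto a card
            fits = card["free"] >= task["need"]                # step 52
            start_ok = card["window"][0] <= task["window"][0]  # step 53: task window
            end_ok = task["window"][1] <= card["window"][1]    # inside the large task's window
            if fits and start_ok and end_ok:
                placed[card["name"]].append(task["name"])
                card["free"] -= task["need"]                   # update remaining amount
                task["placed"] = True
    return placed
```

Because the remaining amount is updated after each placement, several small tasks can share one card's leftover resources, as long as each fits within both the remaining fraction and the large task's time range.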
In addition, the present application provides an embodiment of an apparatus based on GPU cluster resource allocation, which corresponds to the embodiment of the method shown in fig. 1.
In order to solve the above problem, the present invention further provides a device for allocating resources based on a GPU cluster, including: a memory and a processor, the GPU cluster including a plurality of GPU cards;
the memory is used for storing programs for allocating resources based on GPU clusters;
the processor is configured to read and execute the program for allocating resources based on the GPU cluster, and perform the following operations:
acquiring tasks to be processed, wherein the tasks to be processed comprise large tasks and small tasks; a large task refers to a task to be processed whose required resource amount is greater than or equal to one GPU card; a small task refers to a task to be processed whose required resource amount is less than one GPU card;
allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed, wherein each GPU card is allocated at most one large task;
acquiring the remaining resource amount of each GPU card that is executing a task;
and, for each GPU card with a remaining resource amount, traversing the unexecuted small tasks in priority order; if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, allocating the remaining resource amount of the GPU card to that small task and updating the remaining resource amount of the GPU card.
In an exemplary embodiment, after acquiring the task to be processed, the processor reads and executes the program for allocating resources based on the GPU cluster, and further performs the following operations:
calculating the attribute of the task to be processed by adopting a custom rule to obtain a priority weight value;
sorting according to the calculated priority weight values, wherein the higher the weight value is, the higher the priority is;
wherein the attributes of the task to be processed comprise one or more of the following: the time range for executing the task, the number of GPU cards required for executing the task, and the type of GPU card.
In an exemplary embodiment, the processor reads and executes the program for GPU cluster resource based allocation, and further performs the following operations:
the executing of the one or more GPU cards distributed to the big tasks according to the priority sequence of the tasks to be processed comprises the following steps:
step 31, according to the priority sequence of the tasks to be processed, determining the first unexecuted big task as the current task to be processed;
step 32, distributing one or more GPU cards for the current task to be processed;
step 33, judging whether large tasks which are not executed exist, and if not, acquiring the resource residual quantity of each GPU card for executing the tasks; if so, determining the next unexecuted big task as the current task to be processed;
step 34, judging whether an idle GPU card exists, if so, performing step 35, and if not, acquiring the resource residual quantity of each GPU card for executing the task;
step 35, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if it is greater than or equal to, performing step 32;
if it is less than, performing step 36;
step 36, judging whether there is a large task that has not been executed;
if there is an unexecuted large task, determining, in priority order, the next unexecuted large task as the current task to be processed, and performing step 35;
if there is no unexecuted large task, the allocation of large tasks is complete.
In an exemplary embodiment, the amount of resources includes a time frame for executing the task and a resource requirement for executing the task.
In an exemplary embodiment, the processor reads and executes the program for GPU cluster resource based allocation, and further performs the following operations:
for each GPU card with resource surplus, traversing the small tasks which are not executed according to the priority sequence, if the resource surplus of the GPU card is found to meet the small tasks which are not executed, distributing the resource surplus of the GPU card to the small tasks, and updating the resource surplus of the GPU card, wherein the method comprises the following steps:
step 51, traversing the unexecuted small tasks in priority order, and determining the first unexecuted small task as the current task to be processed;
step 52, judging whether the remaining resource amount of the GPU card is greater than or equal to the resource amount required by the current task to be processed; if so, go to step 53; if not, go to step 54;
step 53, judging whether the execution time range of the current task to be processed falls within the time range during which the GPU card executes its large task;
if it does, allocating the remaining resource amount of the GPU card to the current task to be processed and updating the remaining resource amount; if not, go to step 54;
step 54, judging whether any unexecuted small tasks remain; if not, ending;
if there are unexecuted small tasks, determining the next unexecuted small task as the current task to be processed according to the priority order, and returning to step 52.
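Steps 51 to 54 above describe a per-card packing loop. As an illustrative sketch only (the patent provides no code; the dictionary field names `residual`, `busy_start`, `demand`, `done`, and `priority` are assumptions):

```python
def fill_card_with_small_tasks(card, small_tasks):
    """Steps 51-54: walk unexecuted small tasks in priority order and pack
    them into this card's remaining resources, subject to the constraint
    that a small task's time range lies within the time range during which
    the card executes its large task.

    `card` is a dict with keys residual, busy_start, busy_end; each task is
    a dict with keys demand, start, end, done, priority (names illustrative).
    Returns the card's remaining resource amount after packing."""
    for task in sorted(small_tasks, key=lambda t: -t["priority"]):
        if task["done"]:
            continue
        # step 52: the remaining resources must cover the task's demand
        if card["residual"] < task["demand"]:
            continue
        # step 53: the task's window must fall inside the card's busy window
        if not (card["busy_start"] <= task["start"]
                and task["end"] <= card["busy_end"]):
            continue
        # allocate the remaining resources to the task and update them
        card["residual"] -= task["demand"]
        task["done"] = True
    return card["residual"]
```

For a card with 0.5 of a card's resources left and a busy window of 0-10, a 0.3-demand task inside that window is packed (leaving roughly 0.2), while a task whose window extends past 10 is skipped.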
In order to solve the above problem, the present invention further provides a specific embodiment of a method for allocating resources based on a GPU cluster, and an implementation process of an exemplary embodiment is as follows:
step 31, sorting the tasks to be processed using a custom rule to obtain their priority order;
step 32, determining, according to the priority order of the tasks to be processed, the first unexecuted large task as the current task to be processed;
in this embodiment, during task allocation, resources are first allocated to the large tasks.
Step 33, allocating one or more GPU cards to the current large task to be processed, wherein each GPU card is allocated to only one large task;
step 34, judging whether there is an unexecuted large task; if not, go to step 38; if so, determining the next unexecuted large task as the current task to be processed;
step 35, judging whether an idle GPU card exists, if so, performing step 36, and if not, performing step 38;
step 36, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if greater than or equal to, go to step 33;
if less than, go to step 37;
step 37, judging whether there is an unexecuted large task;
if there is an unexecuted large task, determining the next unexecuted large task as the current task to be processed according to the priority order, and returning to step 36;
if there is no unexecuted large task, the process ends.
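Steps 31 to 37 amount to a greedy pass that hands whole idle cards to large tasks in priority order, with each card serving exactly one large task. A minimal sketch, under the assumptions that demand is expressed in card units and that the function name and dict fields are illustrative rather than from the disclosure:

```python
import math

def allocate_large_tasks(large_tasks, free_cards):
    """Steps 31-37: assign whole idle GPU cards to large tasks in priority
    order; each card is allocated to exactly one large task. Returns a
    mapping from task name to the list of card ids it received.

    Each task is a dict with 'name', 'demand' (in card units), 'priority';
    free_cards is a list of idle card ids (names are assumptions)."""
    assignment = {}
    for task in sorted(large_tasks, key=lambda t: -t["priority"]):
        needed = math.ceil(task["demand"])  # whole cards required
        # steps 35-37: if the idle pool cannot satisfy the current task,
        # skip it and try the next unexecuted large task in priority order
        if len(free_cards) < needed:
            continue
        assignment[task["name"]] = [free_cards.pop(0) for _ in range(needed)]
    return assignment
```

With three idle cards and tasks A (2 cards), B (3 cards), C (1 card) in descending priority, A takes two cards, B is skipped because only one card remains, and C takes the last card.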
Step 38, acquiring the remaining resource amount of each GPU card.
Step 39, for each GPU card with remaining resources, traversing the unexecuted small tasks in priority order; if the remaining resource amount of the GPU card is found to satisfy an unexecuted small task, allocating the remaining resource amount of the GPU card to that small task and updating the remaining resource amount of the GPU card.
In this embodiment, after the large tasks are allocated, the GPU cards executing them may still have remaining resources; allocating each card's remaining resources to small tasks that satisfy the conditions improves the resource utilization of the GPU cards.
A specific implementation of step 39 may include:
step 391, traversing the unexecuted small tasks in priority order, and determining the first unexecuted small task as the current task to be processed;
step 392, judging whether there is a GPU card whose remaining resource amount satisfies the resource amount required by the current task to be processed; if so, go to step 393; if not, go to step 394;
step 393, judging whether the execution time range of the current task to be processed falls within the time range during which that GPU card executes its large task;
if it does, allocating the remaining resource amount of the GPU card to the current task to be processed and updating the remaining resource amount; if not, go to step 394;
step 394, judging whether any unexecuted small tasks remain; if not, ending;
if there are unexecuted small tasks, determining the next unexecuted small task as the current task to be processed according to the priority order, and returning to step 392.
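Steps 391 to 394 invert the per-card loop: here each small task, in priority order, searches for a card that can host it. A hedged sketch under the same assumed field names as before (none of these identifiers come from the patent):

```python
def place_small_tasks(small_tasks, cards):
    """Steps 391-394: walk unexecuted small tasks in priority order; for
    each, look for a GPU card whose remaining resources cover the task's
    demand AND whose large-task time range contains the task's execution
    range. On success, allocate and update the card's remaining resources.

    Tasks and cards are dicts with illustrative field names: tasks carry
    demand/start/end/done/priority, cards carry residual/busy_start/busy_end."""
    for task in sorted(small_tasks, key=lambda t: -t["priority"]):
        for card in cards:
            fits = card["residual"] >= task["demand"]
            in_window = (card["busy_start"] <= task["start"]
                         and task["end"] <= card["busy_end"])
            if fits and in_window:
                card["residual"] -= task["demand"]  # allocate and update
                task["done"] = True
                break  # step 394: move on to the next small task
    return cards
```

A 0.4-demand task passes over a card with only 0.2 of a card's resources left and lands on the next card that has 0.5 remaining, leaving it with roughly 0.1.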
In this specific example, the tasks to be processed are divided into large tasks and small tasks, and a different resource allocation method is applied to each. After the large tasks are allocated, the GPU cards executing them still have remaining resources, and each card's remaining resources are allocated to small tasks, so that the resource amount of every GPU card is fully used and the utilization rate of the GPU cluster is improved.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media as known to those skilled in the art.

Claims (8)

1. A method for GPU cluster resource allocation, wherein the GPU cluster comprises a plurality of GPU cards, characterized in that the method comprises:
acquiring tasks to be processed, wherein the tasks to be processed comprise large tasks and small tasks; a large task is a task to be processed whose required resource amount is greater than or equal to one GPU card; a small task is a task to be processed whose required resource amount is less than one GPU card;
allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed, wherein each GPU card is allocated to only one large task;
acquiring the resource residual quantity of each GPU card for executing the task;
for each GPU card with the resource residual quantity, traversing the small tasks which are not executed according to the priority sequence, if the resource residual quantity of the GPU card is found to meet the small tasks which are not executed, distributing the resource residual quantity of the GPU card to the small tasks, and updating the resource residual quantity of the GPU card;
after the task to be processed is obtained, the method further comprises the following steps:
calculating the attribute of the task to be processed by adopting a custom rule to obtain a priority weight value;
sorting according to the calculated priority weight values, wherein the higher the weight value is, the higher the priority is;
wherein the attributes of the task to be processed comprise one or more of the following: the time range for executing the task, the number of GPU cards required for executing the task and the type of GPU cards are calculated.
2. The method for GPU cluster resource allocation according to claim 1, wherein the allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed comprises:
step 31, according to the priority sequence of the tasks to be processed, determining the first unexecuted big task as the current task to be processed;
step 32, distributing one or more GPU cards for the current task to be processed;
step 33, judging whether there are unexecuted large tasks; if not, acquiring the resource residual quantity of each GPU card for executing tasks; if so, determining the next unexecuted large task as the current task to be processed;
step 34, judging whether there is an idle GPU card; if so, go to step 35; if not, acquiring the resource residual quantity of each GPU card for executing tasks;
step 35, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if greater than or equal to, go to step 32;
if less than, go to step 36;
step 36, judging whether there is an unexecuted large task;
if there is an unexecuted large task, determining the next unexecuted large task as the current task to be processed according to the priority order, and returning to step 35;
if there is no unexecuted large task, the process ends.
3. The method of claim 1, wherein the amount of resources comprises a time frame for executing a task and a resource requirement for executing a task.
4. The method according to claim 3, wherein for each GPU card with resource residual, traversing the small tasks that are not executed according to the priority order, if the resource residual of the GPU card is found to satisfy the small tasks that are not executed, allocating the resource residual of the GPU card to the small tasks, and updating the resource residual of the GPU card comprises:
step 51, traversing the small tasks which are not executed according to the priority sequence, and determining the first small task which is not executed as the current task to be processed;
step 52, judging whether the resource residual quantity of the GPU card is larger than or equal to the resource quantity required by the current task to be processed; if so, go to step 53, if not, go to step 54;
step 53, judging whether the execution time range of the current task to be processed falls within the time range during which the GPU card executes its large task;
if it does, allocating the resource residual amount of the GPU card to the current task to be processed and updating the resource residual amount; if not, go to step 54;
step 54, judging whether small tasks which are not executed exist, and if not, ending the process;
if there are small tasks that have not been executed, the next small task that has not been executed is determined to be the current task to be processed according to the priority order, and the process returns to step 52.
5. An apparatus for GPU cluster resource allocation, comprising a memory and a processor, the GPU cluster including a plurality of GPU cards, characterized in that:
the memory is used for storing programs for allocating resources based on GPU clusters;
the processor is configured to read and execute the program for allocating resources based on the GPU cluster, and perform the following operations:
acquiring tasks to be processed, wherein the tasks to be processed comprise large tasks and small tasks; the large task refers to a task to be processed, wherein the required resource amount is greater than or equal to one GPU card; the small task is a task to be processed, the required resource amount of which is less than that of one GPU card;
distributing one or more GPU cards to execute the large tasks according to the priority sequence of the tasks to be processed; each GPU card is only distributed with one large task;
acquiring the resource residual quantity of each GPU card for executing the task;
for each GPU card with the resource residual quantity, traversing the small tasks which are not executed according to the priority sequence, if the resource residual quantity of the GPU card is found to meet the small tasks which are not executed, distributing the resource residual quantity of the GPU card to the small tasks, and updating the resource residual quantity of the GPU card;
after the task to be processed is obtained, the following operations are also executed:
calculating the attribute of the task to be processed by adopting a custom rule to obtain a priority weight value;
sorting according to the calculated priority weight values, wherein the higher the weight value is, the higher the priority is;
wherein the attributes of the task to be processed comprise one or more of the following: the time range for executing the task, the number of GPU cards required for executing the task and the type of GPU cards are calculated.
6. The apparatus according to claim 5, wherein the processor reads and executes the program for GPU cluster resource allocation, and further performs the following operations:
the allocating one or more GPU cards to execute the large tasks according to the priority order of the tasks to be processed comprises the following steps:
step 31, according to the priority sequence of the tasks to be processed, determining the first unexecuted big task as the current task to be processed;
step 32, distributing one or more GPU cards for the current task to be processed;
step 33, judging whether there are unexecuted large tasks; if not, acquiring the resource residual quantity of each GPU card for executing tasks; if so, determining the next unexecuted large task as the current task to be processed;
step 34, judging whether there is an idle GPU card; if so, go to step 35; if not, acquiring the resource residual quantity of each GPU card for executing tasks;
step 35, judging whether the resource amount of the currently idle GPU cards is greater than or equal to the resource amount required by the current task to be processed;
if greater than or equal to, go to step 32;
if less than, go to step 36;
step 36, judging whether there is an unexecuted large task;
if there is an unexecuted large task, determining the next unexecuted large task as the current task to be processed according to the priority order, and returning to step 35;
if there is no unexecuted large task, the process ends.
7. The apparatus for GPU cluster resource allocation according to claim 5, wherein
the resource amount includes a time range for executing the task and a resource demand amount for executing the task.
8. The apparatus for GPU cluster resource allocation according to claim 7, wherein the processor reads and executes the program for GPU cluster resource allocation and further performs the following operations:
for each GPU card with a resource residual amount, traversing the unexecuted small tasks in priority order; if the resource residual amount of the GPU card is found to satisfy an unexecuted small task, allocating the resource residual amount of the GPU card to that small task and updating the resource residual amount of the GPU card, comprising the following steps:
step 51, traversing the small tasks which are not executed according to the priority sequence, and determining the first small task which is not executed as the current task to be processed;
step 52, judging whether the resource residual quantity of the GPU card is larger than or equal to the resource quantity required by the current task to be processed; if so, go to step 53, if not, go to step 54;
step 53, judging whether the execution time range of the current task to be processed falls within the time range during which the GPU card executes its large task;
if it does, allocating the resource residual amount of the GPU card to the current task to be processed and updating the resource residual amount; if not, go to step 54;
step 54, judging whether small tasks which are not executed exist, and if not, ending the process;
if there are small tasks that have not been executed, the next small task that has not been executed is determined to be the current task to be processed according to the priority order, and the process returns to step 52.
CN201910654395.XA 2019-07-19 2019-07-19 GPU (graphics processing Unit) cluster resource allocation method and device Active CN110413412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910654395.XA CN110413412B (en) 2019-07-19 2019-07-19 GPU (graphics processing Unit) cluster resource allocation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910654395.XA CN110413412B (en) 2019-07-19 2019-07-19 GPU (graphics processing Unit) cluster resource allocation method and device

Publications (2)

Publication Number Publication Date
CN110413412A CN110413412A (en) 2019-11-05
CN110413412B true CN110413412B (en) 2022-03-25

Family

ID=68362046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910654395.XA Active CN110413412B (en) 2019-07-19 2019-07-19 GPU (graphics processing Unit) cluster resource allocation method and device

Country Status (1)

Country Link
CN (1) CN110413412B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176852B (en) * 2020-01-15 2024-04-16 上海依图网络科技有限公司 Resource allocation method, device, chip and computer readable storage medium
CN111381970B (en) * 2020-03-16 2023-07-25 第四范式(北京)技术有限公司 Cluster task resource allocation method and device, computer device and storage medium
CN111708799B (en) * 2020-04-30 2023-09-05 咪咕文化科技有限公司 Spark task processing method and device, electronic equipment and storage medium
CN111722928A (en) * 2020-06-12 2020-09-29 北京字节跳动网络技术有限公司 Resource scheduling method and device, electronic equipment and storage medium
CN112148481B (en) * 2020-09-10 2022-11-22 苏州浪潮智能科技有限公司 Method, system, equipment and medium for executing simulation test task
CN113742064B (en) * 2021-08-06 2023-08-04 苏州浪潮智能科技有限公司 Resource arrangement method, system, equipment and medium of server cluster
CN115328665B (en) * 2022-10-12 2023-02-28 中瓴智行(成都)科技有限公司 Hypervisor-based GPU virtualization method and device and electronic equipment
CN117950815A (en) * 2022-10-21 2024-04-30 华为技术有限公司 Method for executing tasks and heterogeneous server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107797853A (en) * 2016-09-07 2018-03-13 深圳市中兴微电子技术有限公司 A kind of method for scheduling task, device and polycaryon processor
CN109144710A (en) * 2017-06-16 2019-01-04 中国移动通信有限公司研究院 Resource regulating method, device and computer readable storage medium
CN109819057A (en) * 2019-04-08 2019-05-28 科大讯飞股份有限公司 A kind of load-balancing method and system
CN109992407A (en) * 2018-01-02 2019-07-09 中国移动通信有限公司研究院 A kind of YARN cluster GPU resource dispatching method, device and medium
CN109995862A (en) * 2019-03-29 2019-07-09 北京百度网讯科技有限公司 A kind of resource regulating method and terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018234869A2 (en) * 2017-06-22 2018-12-27 Banuba Limited Improving operation of computing devices by dynamically adaptive distribution of workload between central processing unit(s) and graphics processing unit(s), and computer systems and computer-implemented methods in accordance with thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107797853A (en) * 2016-09-07 2018-03-13 深圳市中兴微电子技术有限公司 A kind of method for scheduling task, device and polycaryon processor
CN109144710A (en) * 2017-06-16 2019-01-04 中国移动通信有限公司研究院 Resource regulating method, device and computer readable storage medium
CN109992407A (en) * 2018-01-02 2019-07-09 中国移动通信有限公司研究院 A kind of YARN cluster GPU resource dispatching method, device and medium
CN109995862A (en) * 2019-03-29 2019-07-09 北京百度网讯科技有限公司 A kind of resource regulating method and terminal
CN109819057A (en) * 2019-04-08 2019-05-28 科大讯飞股份有限公司 A kind of load-balancing method and system

Also Published As

Publication number Publication date
CN110413412A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110413412B (en) GPU (graphics processing Unit) cluster resource allocation method and device
CN109213600B (en) GPU resource scheduling method and device based on AI cloud
US20170255496A1 (en) Method for scheduling data flow task and apparatus
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
CN111400022A (en) Resource scheduling method and device and electronic equipment
CN113448743B (en) Method, electronic device and computer program product for task processing
US10884667B2 (en) Storage controller and IO request processing method
CN112035238A (en) Task scheduling processing method and device, cluster system and readable storage medium
CN107577534A (en) A kind of resource regulating method and device
CN107515781B (en) Deterministic task scheduling and load balancing system based on multiple processors
CN103503412A (en) Method and device for scheduling resources
CN112540841A (en) Task scheduling method and device, processor and electronic equipment
CN112860387A (en) Distributed task scheduling method and device, computer equipment and storage medium
CN106775975B (en) Process scheduling method and device
CN115586961A (en) AI platform computing resource task scheduling method, device and medium
CN111367655B (en) Method, system and storage medium for GPU resource scheduling in cloud computing environment
CN109189581B (en) Job scheduling method and device
CN115878910A (en) Line query method, device and storage medium
CN108429704B (en) Node resource allocation method and device
CN115952054A (en) Simulation task resource management method, device, equipment and medium
CN110928649A (en) Resource scheduling method and device
CN111796934B (en) Task issuing method and device, storage medium and electronic equipment
CN114924848A (en) IO (input/output) scheduling method, device and equipment
US20210271520A1 (en) Application aware resource allocation for deep learning job scheduling
CN110750330A (en) Virtual machine creating method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant