CN111796932A - GPU resource scheduling method

GPU resource scheduling method

Info

Publication number
CN111796932A
CN111796932A
Authority
CN
China
Prior art keywords
gpu
application
scheduling
gpus
cluster
Prior art date
Legal status
Pending
Application number
CN202010576793.7A
Other languages
Chinese (zh)
Inventor
徐山川
王滨
王臣汉
Current Assignee
Beijing Computing Tianjin Information Technology Co ltd
Original Assignee
Beijing Computing Tianjin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Computing Tianjin Information Technology Co ltd filed Critical Beijing Computing Tianjin Information Technology Co ltd
Priority to CN202010576793.7A priority Critical patent/CN111796932A/en
Publication of CN111796932A publication Critical patent/CN111796932A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of communication applications and discloses a GPU resource scheduling method comprising the following steps: S1, collect basic information about the GPUs in the cluster and provide a GPU-usages interface, then proceed to step S2; S2, create a GPU application and send an application request to the Kubernetes scheduler, then proceed to step S3; S3, after receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster, then proceeds to step S4; S4, through the GPU-usages interface, compute a GPU that satisfies the application's scheduling requirement, then proceed to step S5; S5, the GPU manager binds the designated GPU resources to the application according to the machine, recorded on the application, where the GPU is located. The method allows a single GPU to be shared among multiple applications according to percentages of GPU video memory and GPU computing power, which greatly improves the utilization efficiency of a single GPU and reduces the cost of GPU applications.

Description

GPU resource scheduling method
Technical Field
The present invention relates to the technical field of communication applications, and in particular to a GPU resource scheduling method.
Background
With the explosive growth of device performance and the gradual popularization of virtualization technology, dynamically allocating and flexibly scheduling the resources of many virtualized devices on existing physical hardware, and thereby improving resource utilization, has become an urgent need in users' daily work.
Managing enterprise server clusters with Kubernetes greatly reduces an enterprise's operation and maintenance costs and improves resource utilization, but Kubernetes currently manages mainly CPU, memory, storage, and similar hardware on each machine. Because more and more enterprises now use GPUs for machine-learning model training and online services, efficient management of GPU resources is increasingly important.
The prior art has the following defect: resources are allocated in units of whole physical GPU cards, so GPU resources cannot be shared among multiple applications. Even if a single application does not fully use the computing resources allocated to it, those exclusively held resources cannot be given to other applications, and the GPU resources therefore cannot be fully utilized.
Disclosure of Invention
The main object of the present invention is to provide a GPU resource scheduling method that solves the current problem that GPU resources exclusively allocated to a single application cannot be fully utilized.
To achieve the above object, the present invention provides the following technical solution:
A GPU resource scheduling method comprises the following steps:
S1, collect basic information about the GPUs in the cluster and provide a GPU-usages interface, then proceed to step S2;
S2, create a GPU application and send an application request to the Kubernetes scheduler, then proceed to step S3;
S3, after receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster, then proceeds to step S4;
S4, through the GPU-usages interface, compute a GPU that satisfies the application's scheduling requirement, then proceed to step S5;
S5, the GPU manager binds the designated GPU resources to the application according to the machine, recorded on the application, where the GPU is located.
Further, in step S2, when the GPU application is created, the application provides the required video memory value and computing power value.
Further, in step S1, the collected basic GPU information includes the GPU model, video memory, and GPU cores.
Further, in step S4, if no GPU in the cluster meets the application's scheduling requirement, the process proceeds to step S6, isolation of GPU resources.
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution threads and periodically check the program's core utilization of the GPU; if it exceeds the agreed core-utilization value, move the currently executing threads into the waiting state.
Further, in step S2, when the GPU application is created, the GPU model and the number of GPUs required by the application should also be provided.
Further, in step S4, the first GPU that meets the requirement is taken, and the name of the machine where that GPU is located and the GPU's number within the machine are marked on the application.
Further, in step S4, the machines having the required number of idle GPUs are found through the GPU-usages interface, the machine with the fewest idle GPUs among them is selected, and its name is added to the application.
Further, in step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
Further, the method completes GPU resource scheduling for one GPU application or for multiple GPU applications.
Compared with the prior art, the invention provides the following technical effects:
1. A single GPU can be shared among multiple applications according to percentages of GPU video memory and GPU computing power, which greatly improves the utilization efficiency of a single GPU and reduces the cost of GPU applications.
2. Considering the topology between GPUs during scheduling maximizes the communication efficiency between the GPUs assigned to the same application and improves the application's performance on those GPUs.
3. When GPU applications are scheduled in a Kubernetes cluster, centralized resource allocation is supported: GPU applications are packed onto machines that already host more of them, which ensures that subsequent multi-card GPU applications can still be scheduled successfully into the cluster.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention and to enable other features, objects and advantages of the invention to be more fully apparent. The drawings and their description illustrate the invention by way of example and are not intended to limit the invention. In the drawings:
FIG. 1 is a general flow chart of a GPU resource scheduling method of the present invention;
FIG. 2 is a flow diagram comparing the prior-art default scheduling policy with the single-physical-GPU sharing of the present invention;
FIG. 3 is a schematic diagram of the topology of DGX1 in an embodiment of the invention;
FIG. 4 compares prior-art multi-GPU allocation, which ignores the topology among GPUs, with the topology-aware allocation of the present invention;
FIG. 5 is a flowchart comparing multi-GPU application scheduling under the prior-art default uniform scheduling policy and under the centralized scheduling policy of the present invention;
FIG. 6 is a diagram illustrating an example of a topology of a GPU in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description, claims, and drawings of the present invention are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that data so used may be interchanged under appropriate circumstances, so that the embodiments of the invention described herein can be practiced in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
In the present invention, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "center", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate an orientation or positional relationship based on the orientation or positional relationship shown in the drawings. These terms are used primarily to better describe the invention and its embodiments and are not intended to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meanings of these terms in the present invention can be understood by those skilled in the art as appropriate.
In addition, the term "plurality" means two or more.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Example 1
As shown in fig. 1 and 2, for an application that needs only one GPU, the method supports allocating resources according to the required GPU video memory and number of cores, instead of allocating a complete GPU to the application. The default GPU resource manager cannot allocate according to the resources an application actually needs; it simply locks an entire GPU and assigns it to the requesting application.
A GPU resource scheduling method comprises the following steps:
S1, collect basic information about the GPUs in the cluster and provide a GPU-usages interface, then proceed to step S2. In step S1, the collected basic GPU information includes the GPU model, video memory, and GPU cores, which makes it convenient for the scheduler to obtain cluster GPU resource information.
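By way of illustration, step S1 might be realized on each node roughly as follows. This is a minimal sketch that assumes the nvidia-ml-py (pynvml) bindings; the GPUInfo record, its fields, and the gpu_usages helper are illustrative names rather than part of the method itself.

```python
# Sketch of step S1: collect per-GPU basic information on a node and
# expose it as "GPU-usages" records for the scheduler to query.
from dataclasses import dataclass, asdict

import pynvml  # nvidia-ml-py bindings (assumed available on GPU nodes)


@dataclass
class GPUInfo:
    node: str             # name of the machine the GPU lives on
    index: int            # GPU number within the machine
    model: str            # e.g. "Tesla T4"
    memory_total_mb: int  # total video memory
    memory_free_mb: int   # free video memory
    core_util_pct: int    # current core utilization, 0-100


def collect_gpu_info(node_name: str) -> list:
    """Gather model, video memory, and core usage for every local GPU."""
    pynvml.nvmlInit()
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml versions return bytes
                name = name.decode()
            gpus.append(GPUInfo(
                node=node_name,
                index=i,
                model=name,
                memory_total_mb=mem.total // (1024 * 1024),
                memory_free_mb=mem.free // (1024 * 1024),
                core_util_pct=util.gpu,
            ))
        return gpus
    finally:
        pynvml.nvmlShutdown()


def gpu_usages(node_name: str) -> list:
    """The view a GPU-usages interface serves: plain dict records."""
    return [asdict(g) for g in collect_gpu_info(node_name)]
```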
S2, create a GPU application and send an application request to the Kubernetes scheduler, then proceed to step S3. In step S2, when the GPU application is created, the application provides the required video memory value and computing power value. Because the number of cores varies greatly between GPU models and is not known to application developers, the core requirement is expressed directly as a percentage of the cores. For example, a GPU application might request from the cluster: a GPU of model T4, with 4 GB of video memory and 25% of the cores.
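The request an application submits in step S2 might be recorded as follows; the field names are assumptions for illustration, and the multi-GPU form described in the later embodiments is included for completeness.

```python
# Sketch of the application's resource request from step S2.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GPURequest:
    model: str                       # required GPU model, e.g. "T4"
    memory_mb: Optional[int] = None  # shared-GPU case: video memory needed
    core_pct: Optional[int] = None   # shared-GPU case: percentage of cores
    count: int = 1                   # multi-GPU case: whole GPUs needed


# The single-GPU example from the text: a T4, 4 GB memory, 25% of cores.
shared_req = GPURequest(model="T4", memory_mb=4096, core_pct=25)

# The multi-GPU example from the later embodiments: two whole T4 GPUs.
multi_req = GPURequest(model="T4", count=2)
```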
S3, after receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster, then proceeds to step S4.
S4, through the GPU-usages interface, compute a GPU that satisfies the application's scheduling requirement, then proceed to step S5. In step S4, if no GPU in the cluster meets the application's scheduling requirement, the process proceeds to step S6, where GPU resources are isolated. Otherwise the first GPU that meets the requirement is taken, and the name of the machine where it is located and its number within that machine are marked on the application.
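The matching performed in step S4 might then look roughly like the following, assuming the GPU-usages records and GPURequest sketched above; the exact matching rules are inferred from the text.

```python
from typing import Optional


def find_gpu(usages: list, req: GPURequest) -> Optional[dict]:
    """Return the first GPU record that satisfies the request, or None."""
    for gpu in usages:
        if req.model not in gpu["model"]:
            continue  # wrong model
        if req.memory_mb is not None and gpu["memory_free_mb"] < req.memory_mb:
            continue  # not enough free video memory
        if req.core_pct is not None and (100 - gpu["core_util_pct"]) < req.core_pct:
            continue  # not enough idle core share
        # Mark the application with the machine name and GPU number,
        # as the method does with the first match.
        return {"node": gpu["node"], "gpu_index": gpu["index"]}
    return None  # no GPU satisfies the request: proceed to step S6
```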
S5, the GPU manager binds the designated GPU resources to the application according to the machine, recorded on the application, where the GPU is located.
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution threads and periodically check the program's core utilization of the GPU; if it exceeds the agreed core-utilization value, move the currently executing threads into the waiting state. After the shared scheduling of a GPU is completed, the GPU manager allocates GPU video memory and GPU cores according to the GPU application; without a corresponding resource isolation mechanism, however, there is no guarantee that an application will not use more than its agreed GPU resources and prevent other applications from working normally.
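A minimal sketch of the isolation mechanism of step S6 follows, under the assumption that worker threads cooperate by waiting on a gate before launching GPU work; the thresholds, the sampling loop, and the gating scheme are all illustrative.

```python
# Sketch of step S6. S60 rejects impossible memory requests up front;
# S61 periodically samples the program's GPU core usage and parks
# executing threads that exceed the agreed share.
import threading
import time


def check_memory(req_mb: int, preset_limit_mb: int, cluster_max_mb: int) -> bool:
    """S60: fail fast if the requested video memory can never be granted."""
    return req_mb <= preset_limit_mb and req_mb <= cluster_max_mb


class CoreGovernor:
    """S61: wrap execution threads and throttle when usage exceeds quota."""

    def __init__(self, quota_pct: int, sample_core_util, period_s: float = 1.0):
        self.quota_pct = quota_pct
        self.sample_core_util = sample_core_util  # callable returning 0-100
        self.period_s = period_s
        self.gate = threading.Event()
        self.gate.set()  # gate open: threads may run

    def wait_turn(self) -> None:
        """Called by worker threads before each unit of GPU work."""
        self.gate.wait()

    def run(self) -> None:
        """Periodic check: over quota -> move running work to waiting."""
        while True:
            if self.sample_core_util() > self.quota_pct:
                self.gate.clear()  # executing threads become waiting threads
            else:
                self.gate.set()    # allow execution again
            time.sleep(self.period_s)
```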
Further, the method can schedule GPU resources for one GPU application or for multiple GPU applications.
Example 2
As shown in figs. 1, 3, 4, 5 and 6, for applications that require multiple GPUs, allocation is performed according to the GPU group with the highest communication efficiency. GPUs within a machine are connected in different ways, and the communication speed between GPUs differs accordingly. As shown in fig. 3, a DGX-1 machine contains 8 GPUs; GPU0 can connect directly to GPU1, GPU2, GPU3, and GPU4 over NVLink, with a communication bandwidth of up to 40 GB/s, whereas connections between GPU0 and GPUs 5, 6, and 7 must pass through a PCIe switch and QPI, which is far less efficient than NVLink. When multiple GPUs are allocated to one application, the connection structure between them, also called the GPU topology, should therefore be considered. The topology between GPUs can be obtained from the GPU driver, and the communication efficiency between GPUs follows from that topology. An example of a GPU topology is shown in fig. 6.
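Such a topology table might be encoded as follows; the relative link scores are assumptions standing in for what the driver reports (for example, via nvidia-smi topo -m), with NVLink ranked above a PCIe switch and QPI.

```python
# Toy encoding of a GPU topology like the one in fig. 3 / fig. 6.
NVLINK, PCIE, QPI = 3, 2, 1  # illustrative relative link scores

# Pairwise link quality for one 8-GPU machine; the entries shown follow
# the DGX-1 description above, and missing pairs default to QPI.
TOPOLOGY = {
    frozenset({0, 1}): NVLINK,
    frozenset({0, 2}): NVLINK,
    frozenset({0, 3}): NVLINK,
    frozenset({0, 4}): NVLINK,
    frozenset({0, 5}): QPI,
    frozenset({0, 6}): QPI,
    frozenset({0, 7}): QPI,
    # ... remaining pairs filled in from the driver's topology report
}


def pair_score(a: int, b: int) -> int:
    """Communication-efficiency score for a pair of GPUs."""
    return TOPOLOGY.get(frozenset({a, b}), QPI)
```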
The method also supports a centralized placement scheme for GPU applications. The default Kubernetes resource scheduling mode spreads resources evenly: for a given cluster, deployed applications are distributed across the nodes as uniformly as possible, which maximizes availability, since a problem on one machine does not affect the applications on the others. For multi-GPU applications, however, this scheme can leave GPU resources unusable. As shown in fig. 5, path a adopts the default uniform scheduling policy: GPU usage is spread evenly, and a newly arriving multi-GPU request cannot be scheduled. Path b adopts the centralized scheduling policy: GPU applications are packed onto already busy machines as far as possible, and the same multi-GPU request can still be scheduled.
The following is the complete process of one deployment of a multi-GPU application:
S1, collect basic information about the GPUs in the cluster and provide a GPU-usages interface, then proceed to step S2. In step S1, the collected basic GPU information includes the GPU model, video memory, and GPU cores, which makes it convenient for the scheduler to obtain cluster GPU resource information.
S2, create a GPU application and send an application request to the Kubernetes scheduler, then proceed to step S3. In step S2, when the GPU application is created, the application provides the required video memory value and computing power value. Because the number of cores varies greatly between GPU models and is not known to application developers, the core requirement is expressed directly as a percentage of the cores. For example, a GPU application might request from the cluster: a GPU of model T4, with 4 GB of video memory and 25% of the cores. When the GPU application is created, the GPU model and the number of GPUs required by the application should also be provided. If the application is a multi-GPU application, only the GPU model and the number of GPUs need to be provided; for example, a GPU application might request from the cluster: model T4, 2 GPUs.
S3, after receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster, then proceeds to step S4.
S4, through the GPU-usages interface, compute a GPU that satisfies the application's scheduling requirement, then proceed to step S5. In step S4, if no GPU in the cluster meets the application's scheduling requirement, the process proceeds to step S6, where GPU resources are isolated. Otherwise the first GPU that meets the requirement is taken, and the name of the machine where it is located and its number within that machine are marked on the application.
The machines having the required number of idle GPUs are found through the GPU-usages interface, the machine with the fewest idle GPUs among them is selected, and its name is added to the application. For a multi-GPU application, the machines with the required number of idle GPUs must be found through the GPU-usages interface, and the machine with the fewest idle GPUs among them is selected and its name added to the application. For example, if the application requires two T4 GPUs and this step finds 3 idle T4 GPUs on machine 1 and 4 on machine 2, machine 1 is selected as the scheduling machine for the application, and its information is added to the application.
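The machine choice in this step might be sketched as follows; the data shape (idle-GPU counts per machine) is an assumption.

```python
from typing import Optional


def pick_machine(idle_by_machine: dict, needed: int) -> Optional[str]:
    """Centralized placement: among machines with enough idle GPUs of the
    requested model, pick the one with the fewest idle GPUs, keeping the
    larger free blocks for later multi-card applications."""
    candidates = {m: n for m, n in idle_by_machine.items() if n >= needed}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# The example from the text: machine 1 has 3 idle T4s, machine 2 has 4;
# a request for two T4s is placed on machine 1.
assert pick_machine({"machine-1": 3, "machine-2": 4}, needed=2) == "machine-1"
```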
S5, the GPU manager binds the designated GPU resources to the application according to the machine, recorded on the application, where the GPU is located. In step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
Within the machine assigned to the application, the GPU manager uses exhaustive search to find the group of GPUs with the highest connection efficiency and allocates that group to the application, completing the scheduling and binding of GPU resources. For example, for an application requiring two V100 GPUs that has been allocated to a DGX-1 machine on which GPUs 0, 1, and 7 are idle, the combinations (GPU0, GPU1), (GPU0, GPU7), and (GPU1, GPU7) are enumerated, and (GPU0, GPU1) is selected as the finally bound pair.
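The exhaustive search might be sketched as follows, reusing pair_score from the topology sketch above; scoring a group by the sum of its pairwise link scores is an assumption about how "highest connection efficiency" is measured.

```python
from itertools import combinations


def best_group(idle: list, needed: int) -> tuple:
    """Enumerate every group of `needed` idle GPUs and keep the group
    whose pairwise links score highest."""
    def group_score(group) -> int:
        return sum(pair_score(a, b) for a, b in combinations(group, 2))

    return max(combinations(idle, needed), key=group_score)


# The text's example: GPUs 0, 1, and 7 idle, two needed; (0, 1) are
# NVLink peers, so (0, 1) beats (0, 7) and (1, 7).
print(best_group([0, 1, 7], 2))  # -> (0, 1)
```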
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution threads and periodically check the program's core utilization of the GPU; if it exceeds the agreed core-utilization value, move the currently executing threads into the waiting state. After the shared scheduling of a GPU is completed, the GPU manager allocates GPU video memory and GPU cores according to the GPU application; without a corresponding resource isolation mechanism, however, there is no guarantee that an application will not use more than its agreed GPU resources and prevent other applications from working normally.
Further, the method completes GPU resource scheduling for one GPU application or for multiple GPU applications.
Example 3
As shown in figs. 1, 2, 3, 4, 5 and 6, for applications that require multiple GPUs, allocation is performed according to the GPU group with the highest communication efficiency. GPUs within a machine are connected in different ways, and the communication speed between GPUs differs accordingly. As shown in fig. 3, a DGX-1 machine contains 8 GPUs; GPU0 can connect directly to GPU1, GPU2, GPU3, and GPU4 over NVLink, with a communication bandwidth of up to 40 GB/s, whereas connections between GPU0 and GPUs 5, 6, and 7 must pass through a PCIe switch and QPI, which is far less efficient than NVLink. When multiple GPUs are allocated to one application, the connection structure between them, also called the GPU topology, should therefore be considered. The topology between GPUs can be obtained from the GPU driver, and the communication efficiency between GPUs follows from that topology. An example of a GPU topology is shown in fig. 6.
The method also supports a centralized placement scheme for GPU applications. The default Kubernetes resource scheduling mode spreads resources evenly: for a given cluster, deployed applications are distributed across the nodes as uniformly as possible, which maximizes availability, since a problem on one machine does not affect the applications on the others. For multi-GPU applications, however, this scheme can leave GPU resources unusable. As shown in fig. 5, path a adopts the default uniform scheduling policy: GPU usage is spread evenly, and a newly arriving multi-GPU request cannot be scheduled. Path b adopts the centralized scheduling policy: GPU applications are packed onto already busy machines as far as possible, and the same multi-GPU request can still be scheduled.
The following is the complete process of one deployment of a multi-GPU application:
S1, collect basic information about the GPUs in the cluster and provide a GPU-usages interface, then proceed to step S2. In step S1, the collected basic GPU information includes the GPU model, video memory, and GPU cores, which makes it convenient for the scheduler to obtain cluster GPU resource information.
S2, create a GPU application and send an application request to the Kubernetes scheduler, then proceed to step S3. In step S2, when the GPU application is created, the application provides the required video memory value and computing power value. Because the number of cores varies greatly between GPU models and is not known to application developers, the core requirement is expressed directly as a percentage of the cores. For example, a GPU application might request from the cluster: a GPU of model T4, with 4 GB of video memory and 25% of the cores. When the GPU application is created, the GPU model and the number of GPUs required by the application should also be provided. If the application is a multi-GPU application, only the GPU model and the number of GPUs need to be provided; for example, a GPU application might request from the cluster: model T4, 2 GPUs.
S3, after receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster, then proceeds to step S4.
S4, through the GPU-usages interface, compute a GPU that satisfies the application's scheduling requirement, then proceed to step S5. In step S4, if no GPU in the cluster meets the application's scheduling requirement, the process proceeds to step S6, where GPU resources are isolated. Otherwise the first GPU that meets the requirement is taken, and the name of the machine where it is located and its number within that machine are marked on the application.
The machines having the required number of idle GPUs are found through the GPU-usages interface, the machine with the fewest idle GPUs among them is selected, and its name is added to the application. For a multi-GPU application, the machines with the required number of idle GPUs must be found through the GPU-usages interface, and the machine with the fewest idle GPUs among them is selected and its name added to the application. For example, if the application requires two T4 GPUs and this step finds 3 idle T4 GPUs on machine 1 and 4 on machine 2, machine 1 is selected as the scheduling machine for the application, and its information is added to the application.
S5, the GPU manager binds the designated GPU resources to the application according to the machine, recorded on the application, where the GPU is located. In step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
Within the machine assigned to the application, the GPU manager uses exhaustive search to find the group of GPUs with the highest connection efficiency and allocates that group to the application, completing the scheduling and binding of GPU resources. For example, for an application requiring three V100 GPUs that has been allocated to a DGX-1 machine on which GPU0, GPU1, GPU3, GPU5, and GPU7 are idle, the combinations (GPU0, GPU1, GPU3), (GPU1, GPU3, GPU5), (GPU3, GPU5, GPU7), (GPU0, GPU1, GPU5), (GPU0, GPU1, GPU7), (GPU1, GPU3, GPU7), and so on are enumerated, and (GPU0, GPU1, GPU3) is selected as the finally bound group.
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution threads and periodically check the program's core utilization of the GPU; if it exceeds the agreed core-utilization value, move the currently executing threads into the waiting state. After the shared scheduling of a GPU is completed, the GPU manager allocates GPU video memory and GPU cores according to the GPU application; without a corresponding resource isolation mechanism, however, there is no guarantee that an application will not use more than its agreed GPU resources and prevent other applications from working normally.
Further, the method completes GPU resource scheduling for one GPU application or for multiple GPU applications.
Compared with the prior art, the invention provides the following technical effects:
1. A single GPU can be shared among multiple applications according to percentages of GPU video memory and GPU computing power, which greatly improves the utilization efficiency of a single GPU and reduces the cost of GPU applications.
2. Considering the topology between GPUs during scheduling maximizes the communication efficiency between the GPUs assigned to the same application and improves the application's performance on those GPUs.
3. When GPU applications are scheduled in a Kubernetes cluster, centralized resource allocation is supported: GPU applications are packed onto machines that already host more of them, which ensures that subsequent multi-card GPU applications can still be scheduled successfully into the cluster.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A GPU resource scheduling method is characterized by comprising the following steps:
S1, collecting basic information about the GPUs in the cluster and providing a GPU-usages interface, then proceeding to step S2;
S2, creating a GPU application and sending an application request to the Kubernetes scheduler, then proceeding to step S3;
S3, after receiving the application request, the Kubernetes scheduler traversing all GPU applications in the cluster, then proceeding to step S4;
S4, through the GPU-usages interface, computing a GPU that satisfies the application's scheduling requirement, then proceeding to step S5;
S5, the GPU manager binding the designated GPU resources to the application according to the machine, recorded on the application, where the GPU is located.
2. The GPU resource scheduling method as claimed in claim 1, wherein in step S2, when the GPU application is created, the application provides the required video memory value and computing power value.
3. The GPU resource scheduling method as claimed in claim 1 or 2, wherein in step S1, the collected basic GPU information includes the GPU model, video memory, and GPU cores.
4. The GPU resource scheduling method as claimed in claim 3, wherein in step S4, if no GPU in the cluster meets the application's scheduling requirement, the method proceeds to step S6, isolation of GPU resources.
5. The GPU resource scheduling method as claimed in claim 4, wherein S6 comprises steps S60 and S61: S60, if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, returning a video-memory allocation failure; S61, wrapping the execution threads and periodically checking the program's core utilization of the GPU, and if it exceeds the agreed core-utilization value, moving the currently executing threads into the waiting state.
6. The GPU resource scheduling method as claimed in claim 1, 2, 4 or 5, wherein in step S2, the GPU model and the number of GPUs required by the GPU application should be provided when the GPU application is created.
7. The GPU resource scheduling method as claimed in claim 6, wherein in step S4, the first GPU that meets the requirement is taken, and the name of the machine where the GPU is located and the GPU's number within the machine are marked on the application.
8. The GPU resource scheduling method as claimed in claim 1, 2, 4, 5 or 7, wherein in step S4, the machines having the required number of idle GPUs are found through the GPU-usages interface, and the machine with the fewest idle GPUs among them is selected and its name added to the application.
9. The GPU resource scheduling method as claimed in claim 1, 2, 4, 5 or 7, wherein in step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
10. A method for scheduling GPU resources as claimed in any of claims 1 to 9, wherein the method performs scheduling of GPU resources for a GPU application or a plurality of GPU applications.
CN202010576793.7A 2020-06-22 2020-06-22 GPU resource scheduling method Pending CN111796932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010576793.7A CN111796932A (en) 2020-06-22 2020-06-22 GPU resource scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010576793.7A CN111796932A (en) 2020-06-22 2020-06-22 GPU resource scheduling method

Publications (1)

Publication Number Publication Date
CN111796932A true CN111796932A (en) 2020-10-20

Family

ID=72803890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010576793.7A Pending CN111796932A (en) 2020-06-22 2020-06-22 GPU resource scheduling method

Country Status (1)

Country Link
CN (1) CN111796932A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033001A (en) * 2018-07-17 2018-12-18 北京百度网讯科技有限公司 Method and apparatus for distributing GPU
CN110688218A (en) * 2019-09-05 2020-01-14 广东浪潮大数据研究有限公司 Resource scheduling method and device
CN111158879A (en) * 2019-12-31 2020-05-15 上海依图网络科技有限公司 System resource scheduling method, device, machine readable medium and system
CN111190718A (en) * 2020-01-07 2020-05-22 第四范式(北京)技术有限公司 Method, device and system for realizing task scheduling

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12028878B2 (en) 2020-11-12 2024-07-02 Samsung Electronics Co., Ltd. Method and apparatus for allocating GPU to software package
CN114500413A (en) * 2021-12-17 2022-05-13 阿里巴巴(中国)有限公司 Equipment connection method and device and equipment connection chip
CN114500413B (en) * 2021-12-17 2024-04-16 阿里巴巴(中国)有限公司 Device connection method and device, and device connection chip

Similar Documents

Publication Publication Date Title
CN107038069B (en) Dynamic label matching DLMS scheduling method under Hadoop platform
CN103870314B (en) Method and system for simultaneously operating different types of virtual machines by single node
CN102387173B (en) MapReduce system and method and device for scheduling tasks thereof
CN107222531B (en) Container cloud resource scheduling method
CN103534687B (en) Extensible centralized dynamic resource distribution in a clustered data grid
CN104881325A (en) Resource scheduling method and resource scheduling system
CN114356543A (en) Kubernetes-based multi-tenant machine learning task resource scheduling method
CN110221920B (en) Deployment method, device, storage medium and system
CN113946431B (en) Resource scheduling method, system, medium and computing device
CN110990154B (en) Big data application optimization method, device and storage medium
CN109783225B (en) Tenant priority management method and system of multi-tenant big data platform
CN104050043A (en) Share cache perception-based virtual machine scheduling method and device
CN114996018A (en) Resource scheduling method, node, system, device and medium for heterogeneous computing
CN114443263A (en) Video memory management method, device, equipment and system
CN106874115A (en) A kind of resources of virtual machine distribution method and distributed virtual machine resource scheduling system
JP2022539955A (en) Task scheduling method and apparatus
CN115134371A (en) Scheduling method, system, equipment and medium containing edge network computing resources
CN109471725A (en) Resource allocation methods, device and server
CN111796932A (en) GPU resource scheduling method
CN107992351B (en) Hardware resource allocation method and device and electronic equipment
CN114721818A (en) Kubernetes cluster-based GPU time-sharing method and system
JP2023543744A (en) Resource scheduling method, system, electronic device and computer readable storage medium
CN117608760A (en) Cloud application hybrid deployment method applied to Kubernetes
CN105187483B (en) Distribute the method and device of cloud computing resources
WO2017133421A1 (en) Method and device for sharing resources among multiple tenants

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination