CN111796932A - GPU resource scheduling method
- Publication number
- CN111796932A (application CN202010576793.7A)
- Authority
- CN
- China
- Prior art keywords
- gpu
- application
- scheduling
- gpus
- cluster
- Prior art date
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
Abstract
The invention relates to the technical field of communication applications and discloses a GPU resource scheduling method comprising the following steps: S1, collect basic information about the GPUs in the cluster, expose a GPU-usages interface, and proceed to step S2; S2, create a GPU application, send an application request to the Kubernetes scheduler, and proceed to step S3; S3, on receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster and proceeds to step S4; S4, compute, through the GPU-usages interface, a GPU that satisfies the application's scheduling requirement, and proceed to step S5; S5, the GPU manager binds the designated GPU resources to the application according to the machine recorded on the application. The method lets several applications share a single GPU by percentages of GPU video memory and computing power, greatly improving the utilization of a single GPU and reducing the cost of GPU applications.
Description
Technical Field
The invention relates to the technical field of communication applications, and in particular to a GPU resource scheduling method.
Background
With the explosive growth of device performance and the spread of virtualization technology, dynamically allocating and flexibly scheduling the resources of many virtual devices on existing physical hardware, so as to raise resource utilization, has become an urgent need in everyday work.
Managing an enterprise server cluster with Kubernetes greatly reduces operation and maintenance costs and improves resource utilization, but Kubernetes currently manages mainly CPU, memory, storage and similar hardware on each machine. As more and more enterprises use GPUs for machine-learning model training and online serving, efficient management of GPU resources is becoming ever more important.
The prior art has the following defect: resources are allocated per physical GPU card, so a single GPU cannot be shared by multiple applications. Even if an application does not fully use the computing resources allocated to it, those exclusively held resources cannot be given to other applications, and GPU resources are therefore underutilized.
Disclosure of Invention
The main object of the invention is to provide a GPU resource scheduling method that solves the problem that a single application cannot fully utilize GPU resources.
To achieve the above object, the invention provides the following technical solution:
a GPU resource scheduling method comprises the following steps:
S1, collect basic information about the GPUs in the cluster, expose a GPU-usages interface, and proceed to step S2;
S2, create a GPU application, send an application request to the Kubernetes scheduler, and proceed to step S3;
S3, on receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster and proceeds to step S4;
S4, compute, through the GPU-usages interface, a GPU that satisfies the application's scheduling requirement, and proceed to step S5;
S5, the GPU manager binds the designated GPU resources to the application according to the machine recorded on the application.
Further, in step S2, when the GPU application is created, the application supplies the required video memory value and computing-power value.
Further, in step S1, the collected basic GPU information includes the GPU model, video memory and GPU cores.
Further, in step S4, if no GPU in the cluster satisfies the application's scheduling requirement, the process proceeds to step S6, isolation of GPU resources.
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution thread and periodically check the program's GPU core utilization; if it exceeds the set core-utilization value, move the current execution thread into a waiting state.
Further, in step S2, when the GPU application is created, the GPU model and the number of GPUs required by the application should also be provided.
Further, in step S4, the first GPU that meets the requirement is taken, and the name of its machine and its index within the machine are marked on the application.
Further, in step S4, machines with the required number of idle GPUs are found through the GPU-usages interface, and the machine with the fewest idle GPUs is selected and its name is added to the application.
Further, in step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
Further, the method schedules GPU resources for one GPU application or for a plurality of GPU applications.
Compared with the prior art, the invention brings the following technical effects:
1. A single GPU can be shared by several applications according to percentages of GPU video memory and GPU computing power, greatly improving the utilization of a single GPU and reducing the cost of GPU applications.
2. The topology between GPUs is considered during scheduling, maximizing the communication efficiency between the GPUs assigned to the same application and improving the application's performance on those GPUs.
3. When GPU applications are scheduled in a Kubernetes cluster, centralized resource allocation is supported: applications are packed onto machines that already host GPU applications as far as possible, ensuring that later multi-card applications can still be scheduled successfully in the cluster.
Drawings
The accompanying drawings, which form a part of this specification, provide a further understanding of the invention and make its other features, objects and advantages more fully apparent. The drawings and their description illustrate the invention by way of example and are not intended to limit it. In the drawings:
FIG. 1 is a general flow chart of the GPU resource scheduling method of the present invention;
FIG. 2 is a flow diagram contrasting the prior-art default scheduling policy with the single-GPU sharing of the present invention;
FIG. 3 is a schematic diagram of the topology of a DGX-1 in an embodiment of the invention;
FIG. 4 contrasts prior-art multi-GPU allocation, which ignores the topology among GPUs, with the topology-aware allocation of the present invention;
FIG. 5 is a flowchart contrasting multi-GPU application scheduling under the prior-art default uniform scheduling policy and under the centralized scheduling policy of the present invention;
FIG. 6 is a diagram illustrating an example GPU topology in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only a part of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort shall fall within the protection scope of the invention.
It should be noted that the terms "first", "second" and the like in the description, claims and drawings are used to distinguish similar elements and not necessarily to describe a particular sequence or chronological order; data so designated may be interchanged where appropriate. Furthermore, the terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article or apparatus comprising a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to it.
In the present invention, terms such as "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "center", "vertical", "horizontal", "lateral" and "longitudinal" indicate orientations or positional relationships based on those shown in the drawings. They are used mainly to better describe the invention and its embodiments and are not intended to limit the indicated devices, elements or components to a particular orientation or to being constructed and operated in a particular orientation.
Moreover, some of these terms may carry meanings other than orientation or position; for example, "on" may in some cases indicate an attachment or connection relationship. The specific meanings of such terms in the present invention can be understood by those skilled in the art as appropriate.
In addition, the term "plurality" means two or more.
It should be noted that, where no conflict arises, the embodiments and the features of the embodiments may be combined with one another. The present invention will be described in detail below with reference to the embodiments and the accompanying drawings.
Example 1
As shown in FIGS. 1 and 2, for an application that needs only one GPU, resources are allocated according to the required GPU video memory and number of cores, instead of allocating a complete GPU to the application. The default GPU resource manager cannot allocate by the resources an application actually needs; it simply locks a whole GPU and assigns it to the requesting application.
A GPU resource scheduling method comprises the following steps:
S1, collect basic information about the GPUs in the cluster, expose a GPU-usages interface, and proceed to step S2. The collected basic GPU information includes the GPU model, video memory and GPU cores; this makes cluster GPU resource information readily available to the scheduler.
S2, create a GPU application, send an application request to the Kubernetes scheduler, and proceed to step S3. When the GPU application is created, the application supplies the required video memory value and computing-power value. Because the number of cores differs greatly from GPU to GPU and is usually unknown to application developers, the requirement is expressed as a percentage of the cores rather than an absolute count. For example, a GPU application might request from the cluster: a T4-model GPU with 4 GB of video memory and 25% of the cores.
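By way of illustration only, such a request can be represented as a small structure. The following Python sketch is not part of the patent's interface; the GPURequest type and its field names are assumptions for exposition.

```python
from dataclasses import dataclass

@dataclass
class GPURequest:
    """Hypothetical single-GPU resource request (illustrative only)."""
    model: str          # GPU model, e.g. "T4"
    memory_mb: int      # required video memory, in MB
    core_percent: int   # required share of the GPU's cores, 0-100

# The request from the example above: a T4-model GPU with
# 4 GB of video memory and 25% of the cores.
req = GPURequest(model="T4", memory_mb=4096, core_percent=25)
```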
S3, on receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster and proceeds to step S4;
S4, compute, through the GPU-usages interface, a GPU that satisfies the application's scheduling requirement, and proceed to step S5. If no GPU in the cluster satisfies the requirement, the process proceeds to step S6, isolation of GPU resources. Otherwise the first GPU that meets the requirement is taken, and the name of its machine and its index within the machine are marked on the application.
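The first-fit selection of step S4 can be sketched as follows, assuming the GPU-usages interface returns one record per physical GPU; the record fields and the first_fit helper are hypothetical names, not the patent's actual API.

```python
def first_fit(gpus, model, memory_mb, core_percent):
    """Return the first GPU record whose free resources satisfy the request."""
    for gpu in gpus:
        if (gpu["model"] == model
                and gpu["free_memory_mb"] >= memory_mb
                and gpu["free_core_percent"] >= core_percent):
            return gpu
    return None  # no suitable GPU: fall through to step S6

# Hypothetical snapshot from the GPU-usages interface.
gpus = [
    {"machine": "node-1", "index": 0, "model": "T4",
     "free_memory_mb": 2048, "free_core_percent": 30},
    {"machine": "node-2", "index": 1, "model": "T4",
     "free_memory_mb": 10240, "free_core_percent": 60},
]

chosen = first_fit(gpus, model="T4", memory_mb=4096, core_percent=25)
if chosen:
    # As step S4 describes: mark the machine name and GPU index on the application.
    annotation = {"machine": chosen["machine"], "gpu-index": chosen["index"]}
```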
S5, the GPU manager binds the designated GPU resources to the application according to the machine recorded on the application.
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution thread and periodically check the program's GPU core utilization; if it exceeds the set core-utilization value, move the current execution thread into a waiting state. After shared scheduling of a GPU is completed, the GPU manager allocates GPU video memory and GPU cores according to the GPU application; without a corresponding resource-isolation mechanism, however, there is no guarantee that an application will not exceed its agreed GPU resources and prevent other applications from running normally.
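A rough sketch of such an isolation mechanism follows. The thresholds, the usage-sampling callback and all names are illustrative assumptions; a real implementation would read utilization from the GPU driver (for example via NVML) and wrap the application's actual execution thread.

```python
import threading
import time

def allocate_memory(requested_mb, app_quota_mb, max_gpu_memory_mb):
    """S60: refuse allocations that exceed the app's quota or any GPU's capacity."""
    if requested_mb > app_quota_mb or requested_mb > max_gpu_memory_mb:
        return False  # video-memory allocation fails
    return True

def core_watchdog(get_core_usage, core_quota_percent, run_gate, period_s=1.0):
    """S61: periodically sample core utilization; park the worker when over quota."""
    while True:
        if get_core_usage() > core_quota_percent:
            run_gate.clear()   # move the execution thread into a waiting state
        else:
            run_gate.set()     # allow it to run again
        time.sleep(period_s)

# The wrapped execution thread blocks on run_gate.wait() between GPU
# kernels, so clearing the gate effectively suspends the application.
run_gate = threading.Event()
run_gate.set()
```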
Further, the method can schedule GPU resources for one GPU application or for a plurality of GPU applications.
Example 2
As shown in FIGS. 1, 3, 4, 5 and 6, applications that require several GPUs are allocated the GPU group with the highest communication efficiency. GPUs inside a machine are wired differently, so the communication speed between GPU pairs differs. As shown in FIG. 3, a DGX-1 machine contains 8 GPUs; GPU0 is directly connected to GPU1, GPU2, GPU3 and GPU4 over NVLink, with a communication bandwidth of up to 40 GB/s. Communication between GPU0 and GPUs 5, 6 and 7 must instead pass through a PCIe switch and QPI, which is far less efficient than NVLink. When several GPUs are allocated to one application, the connection structure among them, also called the GPU topology, should therefore be considered. The topology can be obtained from the GPU driver, and the communication efficiency between GPUs can be derived from it. An example of a GPU topology is shown in FIG. 6.
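By way of illustration, the pairwise communication efficiency can be held in a small lookup table derived from the driver-reported topology. The bandwidth numbers below are placeholders (40 GB/s for an NVLink pair, a nominal 10 GB/s otherwise), not measured values, and the table lists only the NVLink pairs named above.

```python
# NVLink-connected pairs per the DGX-1 description above: GPU0 links
# directly to GPUs 1-4 (the full hybrid cube-mesh has more links).
NVLINK_PAIRS = {(0, 1), (0, 2), (0, 3), (0, 4)}

def pair_bandwidth(a: int, b: int) -> int:
    """Approximate GB/s between two GPUs from the topology table."""
    key = (min(a, b), max(a, b))
    return 40 if key in NVLINK_PAIRS else 10  # placeholder values

print(pair_bandwidth(0, 1))  # 40: NVLink
print(pair_bandwidth(0, 7))  # 10: PCIe switch + QPI
```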
A centralized placement scheme for GPU applications is also supported. The default Kubernetes resource scheduling mode spreads resources evenly: for a given cluster, deployed applications are distributed across the nodes as uniformly as possible. This maximizes availability, since a problem on one machine does not affect the applications on the others. For multi-GPU applications, however, this scheme can leave GPU resources unusable. As shown in FIG. 5, path a uses the default uniform scheduling policy: GPU resources are consumed evenly, and a newly arriving multi-GPU application can no longer be scheduled. Path b uses the centralized scheduling policy: GPU applications are scheduled onto already-busy machines as far as possible, so scheduling still succeeds when the multi-GPU application arrives.
The following is the complete process for deploying a multi-GPU application:
S1, collect basic information about the GPUs in the cluster, expose a GPU-usages interface, and proceed to step S2. The collected basic GPU information includes the GPU model, video memory and GPU cores; this makes cluster GPU resource information readily available to the scheduler.
S2, create a GPU application, send an application request to the Kubernetes scheduler, and proceed to step S3. When the GPU application is created, the application supplies the required video memory value and computing-power value. Because the number of cores differs greatly from GPU to GPU and is usually unknown to application developers, the requirement is expressed as a percentage of the cores rather than an absolute count. For example, a GPU application might request from the cluster: a T4-model GPU with 4 GB of video memory and 25% of the cores. When the GPU application is created, the GPU model and the number of GPUs required should also be provided; for a multi-GPU application only the model and count are needed, for example: two GPUs of model T4.
S3, on receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster and proceeds to step S4;
S4, compute, through the GPU-usages interface, a GPU that satisfies the application's scheduling requirement, and proceed to step S5. If no GPU in the cluster satisfies the requirement, the process proceeds to step S6, isolation of GPU resources. Otherwise the first GPU that meets the requirement is taken, and the name of its machine and its index within the machine are marked on the application.
For a multi-GPU application, the machines with the required number of idle GPUs are found through the GPU-usages interface; among them, the machine with the fewest idle GPUs is selected and its name is added to the application. For example, if the application requires two T4-model GPUs and, at this step, machine 1 and machine 2 are found to have 3 and 4 idle T4 GPUs respectively, machine 1 is selected as the application's scheduling machine and its information is added to the application.
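A sketch of this machine choice, assuming the GPU-usages interface can report the number of idle GPUs of the requested model per machine; the function and argument names are illustrative.

```python
def pick_machine(idle_gpus_by_machine: dict, needed: int):
    """Pick the machine with the fewest idle GPUs that still has enough.

    Packing onto the tightest machine implements the centralized policy,
    keeping larger idle GPU groups free for later multi-card applications.
    """
    candidates = {m: n for m, n in idle_gpus_by_machine.items() if n >= needed}
    if not candidates:
        return None  # no machine can host the application
    return min(candidates, key=candidates.get)

# The example above: machine 1 has 3 idle T4 GPUs, machine 2 has 4,
# and two are needed, so machine 1 is chosen.
print(pick_machine({"machine-1": 3, "machine-2": 4}, needed=2))  # machine-1
```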
S5, the GPU manager binds the designated GPU resources to the application according to the machine recorded on the application. In step S5 the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
On the assigned machine, the GPU manager exhaustively enumerates groups of GPUs, finds the group with the highest connection efficiency, and allocates it to the application, completing GPU scheduling and binding. For example, for an application requiring two V100-model GPUs placed on a DGX-1 machine on which GPUs 0, 1 and 7 are idle, the combinations (GPU0, GPU1), (GPU0, GPU7) and (GPU1, GPU7) are enumerated, and (GPU0, GPU1) is selected as the finally bound group.
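The exhaustive search itself can be sketched as below, reusing the placeholder bandwidth table from the earlier sketch. Scoring a group by its total pairwise bandwidth is one plausible reading of "highest connection efficiency"; the patent does not fix the exact scoring criterion.

```python
from itertools import combinations

NVLINK_PAIRS = {(0, 1), (0, 2), (0, 3), (0, 4)}

def pair_bandwidth(a: int, b: int) -> int:
    key = (min(a, b), max(a, b))
    return 40 if key in NVLINK_PAIRS else 10  # placeholder GB/s values

def best_gpu_group(idle_gpus, needed):
    """Enumerate every group of `needed` idle GPUs; keep the best-connected one."""
    def score(group):
        return sum(pair_bandwidth(a, b) for a, b in combinations(group, 2))
    return max(combinations(idle_gpus, needed), key=score)

# The example above: GPUs 0, 1 and 7 are idle and two are needed.
# (0, 1) is an NVLink pair and wins over (0, 7) and (1, 7).
print(best_gpu_group([0, 1, 7], needed=2))  # (0, 1)
```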
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution thread and periodically check the program's GPU core utilization; if it exceeds the set core-utilization value, move the current execution thread into a waiting state. After shared scheduling of a GPU is completed, the GPU manager allocates GPU video memory and GPU cores according to the GPU application; without a corresponding resource-isolation mechanism, however, there is no guarantee that an application will not exceed its agreed GPU resources and prevent other applications from running normally.
Further, the method schedules GPU resources for one GPU application or for a plurality of GPU applications.
Example 3
As shown in FIGS. 1, 2, 3, 4, 5 and 6, applications that require several GPUs are allocated the GPU group with the highest communication efficiency. GPUs inside a machine are wired differently, so the communication speed between GPU pairs differs. As shown in FIG. 3, a DGX-1 machine contains 8 GPUs; GPU0 is directly connected to GPU1, GPU2, GPU3 and GPU4 over NVLink, with a communication bandwidth of up to 40 GB/s. Communication between GPU0 and GPUs 5, 6 and 7 must instead pass through a PCIe switch and QPI, which is far less efficient than NVLink. When several GPUs are allocated to one application, the connection structure among them, also called the GPU topology, should therefore be considered. The topology can be obtained from the GPU driver, and the communication efficiency between GPUs can be derived from it. An example of a GPU topology is shown in FIG. 6.
A centralized placement scheme for GPU applications is also supported. The default Kubernetes resource scheduling mode spreads resources evenly: for a given cluster, deployed applications are distributed across the nodes as uniformly as possible. This maximizes availability, since a problem on one machine does not affect the applications on the others. For multi-GPU applications, however, this scheme can leave GPU resources unusable. As shown in FIG. 5, path a uses the default uniform scheduling policy: GPU resources are consumed evenly, and a newly arriving multi-GPU application can no longer be scheduled. Path b uses the centralized scheduling policy: GPU applications are scheduled onto already-busy machines as far as possible, so scheduling still succeeds when the multi-GPU application arrives.
The following is the complete process for deploying a multi-GPU application:
S1, collect basic information about the GPUs in the cluster, expose a GPU-usages interface, and proceed to step S2. The collected basic GPU information includes the GPU model, video memory and GPU cores; this makes cluster GPU resource information readily available to the scheduler.
S2, create a GPU application, send an application request to the Kubernetes scheduler, and proceed to step S3. When the GPU application is created, the application supplies the required video memory value and computing-power value. Because the number of cores differs greatly from GPU to GPU and is usually unknown to application developers, the requirement is expressed as a percentage of the cores rather than an absolute count. For example, a GPU application might request from the cluster: a T4-model GPU with 4 GB of video memory and 25% of the cores. When the GPU application is created, the GPU model and the number of GPUs required should also be provided; for a multi-GPU application only the model and count are needed, for example: two GPUs of model T4.
S3, on receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster and proceeds to step S4;
S4, compute, through the GPU-usages interface, a GPU that satisfies the application's scheduling requirement, and proceed to step S5. If no GPU in the cluster satisfies the requirement, the process proceeds to step S6, isolation of GPU resources. Otherwise the first GPU that meets the requirement is taken, and the name of its machine and its index within the machine are marked on the application.
For a multi-GPU application, the machines with the required number of idle GPUs are found through the GPU-usages interface; among them, the machine with the fewest idle GPUs is selected and its name is added to the application. For example, if the application requires two T4-model GPUs and, at this step, machine 1 and machine 2 are found to have 3 and 4 idle T4 GPUs respectively, machine 1 is selected as the application's scheduling machine and its information is added to the application.
S5, the GPU manager binds the designated GPU resources to the application according to the machine recorded on the application. In step S5 the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
On the assigned machine, the GPU manager exhaustively enumerates groups of GPUs, finds the group with the highest connection efficiency, and allocates it to the application, completing GPU scheduling and binding. For example, for an application requiring three V100-model GPUs placed on a DGX-1 machine on which GPUs 0, 1, 3, 5 and 7 are idle, the combinations (GPU0, GPU1, GPU3), (GPU1, GPU3, GPU5), (GPU3, GPU5, GPU7), (GPU0, GPU1, GPU5), (GPU0, GPU1, GPU7), (GPU1, GPU3, GPU7), … are enumerated, and (GPU0, GPU1, GPU3) is selected as the finally bound group.
Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, return a video-memory allocation failure. S61: wrap the execution thread and periodically check the program's GPU core utilization; if it exceeds the set core-utilization value, move the current execution thread into a waiting state. After shared scheduling of a GPU is completed, the GPU manager allocates GPU video memory and GPU cores according to the GPU application; without a corresponding resource-isolation mechanism, however, there is no guarantee that an application will not exceed its agreed GPU resources and prevent other applications from running normally.
Further, the method schedules GPU resources for one GPU application or for a plurality of GPU applications.
Compared with the prior art, the invention brings the following technical effects:
1. A single GPU can be shared by several applications according to percentages of GPU video memory and GPU computing power, greatly improving the utilization of a single GPU and reducing the cost of GPU applications.
2. The topology between GPUs is considered during scheduling, maximizing the communication efficiency between the GPUs assigned to the same application and improving the application's performance on those GPUs.
3. When GPU applications are scheduled in a Kubernetes cluster, centralized resource allocation is supported: applications are packed onto machines that already host GPU applications as far as possible, ensuring that later multi-card applications can still be scheduled successfully in the cluster.
The above description is only a preferred embodiment of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its protection scope.
Claims (10)
1. A GPU resource scheduling method, characterized by comprising the following steps:
S1, collect basic information about the GPUs in the cluster, expose a GPU-usages interface, and proceed to step S2;
S2, create a GPU application, send an application request to the Kubernetes scheduler, and proceed to step S3;
S3, on receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster and proceeds to step S4;
S4, compute, through the GPU-usages interface, a GPU that satisfies the application's scheduling requirement, and proceed to step S5;
S5, the GPU manager binds the designated GPU resources to the application according to the machine recorded on the application.
2. The GPU resource scheduling method as claimed in claim 1, wherein in step S2, when the GPU application is created, the application supplies the required video memory value and computing-power value.
3. The GPU resource scheduling method as claimed in claim 1 or 2, wherein in step S1, the collected basic GPU information includes the GPU model, video memory and GPU cores.
4. The GPU resource scheduling method as claimed in claim 3, wherein in step S4, if no GPU in the cluster satisfies the application's scheduling requirement, the method proceeds to step S6, isolation of GPU resources.
5. The GPU resource scheduling method as claimed in claim 4, wherein S6 comprises steps S60 and S61: S60, if the video memory required by the application exceeds the preset value or is greater than the video memory of every GPU in the cluster, returning a video-memory allocation failure; S61, wrapping the execution thread and periodically checking the program's GPU core utilization, and, if it exceeds the set core-utilization value, moving the current execution thread into a waiting state.
6. The GPU resource scheduling method as claimed in claim 1, 2, 4 or 5, wherein in step S2, when the GPU application is created, the GPU model and the number of GPUs required by the application should also be provided.
7. The GPU resource scheduling method as claimed in claim 6, wherein in step S4, the first GPU that meets the requirement is taken, and the name of its machine and its index within the machine are marked on the application.
8. The GPU resource scheduling method as claimed in claim 1, 2, 4, 5 or 7, wherein in step S4, machines with the required number of idle GPUs are found through the GPU-usages interface, and the machine with the fewest idle GPUs is selected and its name is added to the application.
9. The GPU resource scheduling method as claimed in claim 1, 2, 4, 5 or 7, wherein in step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
10. The GPU resource scheduling method as claimed in any one of claims 1 to 9, wherein the method schedules GPU resources for one GPU application or for a plurality of GPU applications.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010576793.7A | 2020-06-22 | 2020-06-22 | GPU resource scheduling method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010576793.7A | 2020-06-22 | 2020-06-22 | GPU resource scheduling method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111796932A (en) | 2020-10-20 |
Family
ID=72803890
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010576793.7A Pending CN111796932A (en) | 2020-06-22 | 2020-06-22 | GPU resource scheduling method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111796932A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033001A (en) * | 2018-07-17 | 2018-12-18 | 北京百度网讯科技有限公司 | Method and apparatus for distributing GPU |
CN110688218A (en) * | 2019-09-05 | 2020-01-14 | 广东浪潮大数据研究有限公司 | Resource scheduling method and device |
CN111158879A (en) * | 2019-12-31 | 2020-05-15 | 上海依图网络科技有限公司 | System resource scheduling method, device, machine readable medium and system |
CN111190718A (en) * | 2020-01-07 | 2020-05-22 | 第四范式(北京)技术有限公司 | Method, device and system for realizing task scheduling |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12028878B2 (en) | 2020-11-12 | 2024-07-02 | Samsung Electronics Co., Ltd. | Method and apparatus for allocating GPU to software package |
CN114500413A (en) * | 2021-12-17 | 2022-05-13 | 阿里巴巴(中国)有限公司 | Equipment connection method and device and equipment connection chip |
CN114500413B (en) * | 2021-12-17 | 2024-04-16 | 阿里巴巴(中国)有限公司 | Device connection method and device, and device connection chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |