CN115964152A - GPU resource scheduling method, equipment and storage medium
- Publication number: CN115964152A (application number CN202310060787.XA)
- Authority: CN (China)
- Prior art keywords: gpu, resource, node, scheduler, target node
- Prior art date: 2023-01-13
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a GPU resource scheduling method, equipment and a storage medium, wherein the method comprises the following steps: after a Pod is bound with a target node, querying the GPU resources corresponding to the target node, wherein the GPU resources comprise the GPU video memory capacity and the number of GPU cards; and creating a container corresponding to the Pod based on the GPU resources corresponding to the target node. By virtualizing GPU resources, standard containers supporting the relevant algorithm configuration are allocated to different types of computing tasks, so that the computing tasks run in the containers; matching between algorithm tasks with different characteristics and computing devices is thus achieved flexibly, and the flexibility of resource allocation is improved.
Description
Technical Field
The invention relates to the technical field of computers, and in particular to a GPU resource scheduling method, equipment and a storage medium.
Background
Multi-modal machine translation uses a variety of deep learning frameworks. These frameworks are the core support of the machine translation model and the basis of incremental training, so the various deep learning frameworks must be kept compatible and convenient to manage and train. This requires container cluster management of multiple GPU (graphics processing unit) resources to realize management of computing resources.
Disclosure of Invention
The invention mainly aims to provide a GPU resource scheduling method, equipment and a storage medium, so as to solve the problem of how to allocate GPU resources flexibly.
In order to achieve the above object, the present invention provides a GPU resource scheduling method, which comprises the following steps:
after the Pod is bound with a target node, querying the GPU resources corresponding to the target node, wherein the GPU resources comprise the GPU video memory capacity and the number of GPU cards;
and creating a container corresponding to the Pod based on the GPU resource corresponding to the target node.
Optionally, before the step of querying the GPU resource corresponding to the target node, the method further includes:
acquiring the use state of each working node in the node cluster;
determining the reported resource information of the working node according to the use state;
and sending the resource information to the scheduler.
Optionally, the step of acquiring the use state of each working node in the node cluster includes:
generating a registration request according to the resource request, and sending the registration request to a device plugin, so that the device plugin starts a gRPC service;
and acquiring the use state of the working nodes in the node cluster.
Optionally, the step of acquiring the use state of each working node in the node cluster includes:
after the scheduler receives the Pod-based resource request, acquiring the use state of each working node in the node cluster; or,
acquiring the use state of each working node in the node cluster as reported by the device plugin at startup.
In order to achieve the above object, the present invention provides a GPU resource scheduling method, which comprises the following steps:
receiving a Pod-based resource request, wherein the resource request comprises the target GPU video memory capacity and the target number of GPU cards;
acquiring the resource information of the working nodes reported by the kubelet component;
and determining a target node corresponding to the resource request according to the resource information, and binding the Pod with the target node.
Optionally, the step of determining, according to the resource information, a target node corresponding to the resource request includes:
and controlling a GPU scheduler according to the resource information and the resource request, so that the GPU scheduler determines a target node according to the resource information and the resource request, and sends the target node to the scheduler.
In order to achieve the above object, the present invention provides a GPU resource scheduling method, which comprises the following steps:
acquiring resource information and a resource request sent by a scheduler;
determining a target node according to the resource information and the resource request;
and sending the target node to the scheduler.
Optionally, the step of determining a target node according to the resource information and the resource request includes:
determining the resource quantity required by the Pod according to the resource request, and determining candidate working nodes whose resource quantity is greater than or equal to the required resource quantity;
determining the priority of each candidate working node, and determining the target node according to the priority.
In order to achieve the above object, the present invention further provides a GPU resource scheduling apparatus, which includes a memory, a processor, and a GPU resource scheduler stored in the memory and executable on the processor, and when being executed by the processor, the GPU resource scheduler implements the steps of the GPU resource scheduling method as described above.
To achieve the above object, the present invention further provides a computer readable storage medium storing a GPU resource scheduler, which when executed by a processor implements the steps of the GPU resource scheduling method as described above.
According to the GPU resource scheduling method, equipment and storage medium, after the Pod is bound with the target node, the GPU resources corresponding to the target node are queried, wherein the GPU resources comprise the GPU video memory capacity and the number of GPU cards; and a container corresponding to the Pod is created based on the GPU resources corresponding to the target node. By virtualizing GPU resources, standard containers supporting the relevant algorithm configuration are allocated to different types of computing tasks, so that the computing tasks run in the containers; matching between algorithm tasks with different characteristics and computing devices is thus achieved flexibly, and the flexibility of resource allocation is improved.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of a GPU resource scheduling device according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a GPU resource scheduling method according to a first embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a Kubernetes system of the GPU resource scheduling method of the present invention;
FIG. 4 is a flowchart illustrating a GPU resource scheduling method according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating a GPU resource scheduling method according to a third embodiment of the present invention;
FIG. 6 is a flowchart illustrating a GPU resource scheduling method according to a fourth embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: after the Pod is bound with the target node, the GPU resources corresponding to the target node are queried, wherein the GPU resources comprise the GPU video memory capacity and the number of GPU cards; and a container corresponding to the Pod is created based on the GPU resources corresponding to the target node.
By virtualizing GPU resources, standard containers supporting the relevant algorithm configuration are allocated to different types of computing tasks, so that the computing tasks run in the containers; matching between algorithm tasks with different characteristics and computing devices is thus achieved flexibly, and the flexibility of resource allocation is improved.
As one implementation, the GPU resource scheduling device may be as shown in fig. 1.
The embodiment of the invention relates to a GPU resource scheduling device, which comprises: a processor 101 (e.g., a CPU), a memory 102, and a communication bus 103, wherein the communication bus 103 is used for enabling connection and communication between these components.
The memory 102 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). As shown in FIG. 1, a GPU resource scheduler may be included in memory 102, which is a type of computer-readable storage medium; and the processor 101 may be configured to invoke the GPU resource scheduler stored in the memory 102 and perform the following operations:
after the Pod is bound with a target node, querying the GPU resources corresponding to the target node, wherein the GPU resources comprise the GPU video memory capacity and the number of GPU cards;
and creating a container corresponding to the Pod based on the GPU resource corresponding to the target node.
Alternatively, the processor 101 may be configured to call a GPU resource scheduler stored in the memory 102 and perform the following operations:
acquiring the use state of each working node in the node cluster;
determining the reported resource information of the working node according to the use state;
and sending the resource information to the scheduler.
Alternatively, the processor 101 may be configured to call a GPU resource scheduler stored in the memory 102 and perform the following operations:
generating a registration request according to the resource request, and sending the registration request to a device plugin, so that the device plugin starts a gRPC service;
and acquiring the use state of the working nodes in the node cluster.
Alternatively, the processor 101 may be configured to invoke a GPU resource scheduler stored in the memory 102 and perform the following operations:
after the scheduler receives the Pod-based resource request, acquiring the use state of each working node in the node cluster; or,
acquiring the use state of each working node in the node cluster as reported by the device plugin at startup.
Alternatively, the processor 101 may be configured to call a GPU resource scheduler stored in the memory 102 and perform the following operations:
receiving a Pod-based resource request, wherein the resource request comprises the target GPU video memory capacity and the target number of GPU cards;
acquiring the resource information of the working nodes reported by the kubelet component;
and determining a target node corresponding to the resource request according to the resource information, and binding the Pod with the target node.
Alternatively, the processor 101 may be configured to call a GPU resource scheduler stored in the memory 102 and perform the following operations:
and controlling a GPU scheduler according to the resource information and the resource request so that the GPU scheduler determines a target node according to the resource information and the resource request and sends the target node to the scheduler.
Alternatively, the processor 101 may be configured to call a GPU resource scheduler stored in the memory 102 and perform the following operations:
acquiring resource information and a resource request sent by a scheduler;
determining a target node according to the resource information and the resource request;
and sending the target node to the scheduler.
Alternatively, the processor 101 may be configured to call a GPU resource scheduler stored in the memory 102 and perform the following operations:
determining the resource quantity required by the Pod according to the resource request, and determining candidate working nodes whose resource quantity is greater than or equal to the required resource quantity;
determining the priority of each candidate working node, and determining the target node according to the priority.
Based on the hardware architecture of the GPU resource scheduling equipment, the embodiment of the GPU resource scheduling method is provided.
Referring to fig. 2, fig. 2 is a diagram of a GPU resource scheduling method according to a first embodiment of the present invention, where the GPU resource scheduling method includes the following steps:
and S10, after the Pod is bound with the target node, inquiring GPU resources corresponding to the target node, wherein the GPU resources comprise GPU video memory capacity and GPU card number.
Multi-modal machine translation uses a variety of deep learning frameworks, which are the core support of the machine translation model and the basis of incremental training, so the various deep learning frameworks need to be kept compatible and convenient to manage and train. On one hand, container cluster management is required for GPU (graphics processing unit) resources: a Kubernetes system can be adopted to virtualize hardware computing resources and perform cross-node scheduling and management of computing resources, where the Kubernetes system manages containerized applications across multiple hosts in a cloud platform. On the other hand, the multiple deep learning frameworks are allocated and called as a whole. Kubernetes is adopted as the container cluster management system for the deep learning frameworks; it supports multi-type GPU cluster scheduling, meets the convenience requirements of large-scale container cluster management, and shares and schedules computing resources.
Optionally, the management and training of the deep learning frameworks supports Advanced RISC Machine (ARM) CPUs as well as CPU chips of the x86 or MIPS architecture, and supports AI accelerator cards for deep learning, such as NPUs (neural-network processing units). The method supports containerized deep learning computing service management: it virtualizes computing resources such as CPU, AI, and GPU chips along with storage and network resources, and allocates standard containers supporting the relevant algorithm configuration to different types of computing tasks, the computing tasks running in the containers, so that matching between algorithm tasks with different characteristics and the computing equipment is realized flexibly. The method supports core computing tasks such as machine translation, tests optimal equipment configuration schemes, realizes support for the Transformer architecture on the equipment, and provides stable and efficient computing capability.
Optionally, as shown in fig. 3, the Kubernetes system includes a scheduler, a GPU scheduler, a kubelet component, a node cluster, and the like.
Optionally, this embodiment is applied to the kubelet component. After the scheduler notifies the kubelet component that the Pod, that is, the container group, has been bound with the target node, the kubelet component queries the GPU resources corresponding to the target node, where the GPU resources include the GPU video memory capacity, the number of GPU cards, and the like.
Optionally, before step S10, the method further includes: the kubelet component acquires the use state of each working node in the node cluster; determines the resource information to be reported for the working nodes according to the use state, where the resource information includes the GPU video memory capacity and the number of GPU cards corresponding to each working node; and sends the resource information to the scheduler. Optionally, when the use state of a working node is "in use", the resource information of that working node is not reported; when the use state of a working node is "unused", the resource information of that working node is reported. Optionally, the kubelet component acquires the use state of each working node in the node cluster after determining that the scheduler has received a Pod-based resource request; or the kubelet component acquires the use state of each working node in the node cluster as reported by the device plugin at startup.
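As an illustration of the reporting rule just described, the following Go sketch filters out in-use working nodes and builds the per-node report of video memory and card count that is sent to the scheduler. This is a minimal sketch: WorkerNode, ResourceInfo, and all field names are illustrative assumptions, not the actual Kubernetes API.

```go
package main

import "fmt"

// WorkerNode models the usage state a kubelet component observes per node
// (assumed structure for illustration).
type WorkerNode struct {
	Name      string
	InUse     bool // use state reported by the device plugin
	GPUMemMiB int  // total GPU video memory on the node
	GPUCards  int  // number of GPU cards on the node
}

// ResourceInfo is the per-node report forwarded to the scheduler.
type ResourceInfo struct {
	Node      string
	GPUMemMiB int
	GPUCards  int
}

// collectResourceInfo reports only nodes whose use state is "unused",
// matching the rule that in-use nodes are not reported.
func collectResourceInfo(nodes []WorkerNode) []ResourceInfo {
	var report []ResourceInfo
	for _, n := range nodes {
		if n.InUse {
			continue // in-use nodes are withheld from the report
		}
		report = append(report, ResourceInfo{n.Name, n.GPUMemMiB, n.GPUCards})
	}
	return report
}

func main() {
	cluster := []WorkerNode{
		{"node-a", false, 16384, 2},
		{"node-b", true, 32768, 4}, // in use: not reported
	}
	for _, r := range collectResourceInfo(cluster) {
		fmt.Printf("report %s: gpu-mem=%dMiB gpu-count=%d\n", r.Node, r.GPUMemMiB, r.GPUCards)
	}
}
```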
And S20, creating a container corresponding to the Pod based on the GPU resource corresponding to the target node.
Optionally, the kubelet component creates the container corresponding to the Pod based on the GPU resources corresponding to the target node, and the deep learning framework is run or trained in the container. Based on the GPU resources corresponding to the target node, the kubelet component calls Docker, an application container engine used for managing the life cycle of containers, and the application container engine creates the container corresponding to the Pod according to the GPU resources of the target node.
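The container-creation step can be pictured with the Go sketch below, which shows one way the queried GPU resources might be translated into the configuration handed to the container engine. ContainerConfig and GPU_MEM_LIMIT_MIB are assumed names for illustration; NVIDIA_VISIBLE_DEVICES is the common convention for exposing selected GPU cards to a container.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ContainerConfig is a simplified stand-in for the spec passed to the
// application container engine (e.g. Docker).
type ContainerConfig struct {
	Image string
	Env   []string
}

// configForPod exposes the allocated GPU card IDs and the video-memory
// budget to the container through environment variables.
func configForPod(image string, cardIDs []int, gpuMemMiB int) ContainerConfig {
	ids := make([]string, len(cardIDs))
	for i, id := range cardIDs {
		ids[i] = strconv.Itoa(id)
	}
	return ContainerConfig{
		Image: image,
		Env: []string{
			"NVIDIA_VISIBLE_DEVICES=" + strings.Join(ids, ","), // selected cards
			fmt.Sprintf("GPU_MEM_LIMIT_MIB=%d", gpuMemMiB),     // assumed limit variable
		},
	}
}

func main() {
	cfg := configForPod("translation-trainer:latest", []int{0, 1}, 16384)
	fmt.Printf("%+v\n", cfg)
}
```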
The Kubernetes system manages GPU resources through a plug-in extension mechanism, which has two independent internal mechanisms. The first is Extended Resources, which allows users to define custom resource names. The second is the Device Plugin Framework, which allows third-party device vendors to manage devices over their full life cycle in an out-of-tree manner. Optionally, the Extended Resource report records the GPU type, the GPU video memory capacity, the number of GPU cards, and the like of the corresponding working node in the scheduler of the Kubernetes system.
The device plugin may report resources when the device plugin starts, or may be scheduled and run when a user request arrives, for example when the scheduler receives a Pod-based resource request. Optionally, when the GPU resource information monitored by the device plugin changes, the GPU resource information is reported to the kubelet component. The device plugin defines two new Extended Resources: the first is the GPU video memory, gpu-mem; the second is the GPU card count, gpu-count. Vector resource information is thus described through two scalar resources, and combining the two provides a working mechanism that supports shared GPUs.
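Since gpu-mem and gpu-count are scalar Extended Resources, a device plugin can only advertise them as lists of unit devices. The sketch below enumerates one virtual device per GiB of video memory and one per physical card, which is one common way shared-GPU device plugins express a vector resource through two scalars; the Device struct here is a simplified stand-in for the Device message of the Kubernetes device plugin API, not the real gRPC type.

```go
package main

import "fmt"

// Device is a simplified stand-in for the device plugin API's Device message.
type Device struct {
	ID     string
	Health string
}

// advertise builds the unit-device lists for the two extended resources:
// one virtual device per GiB of video memory, one per physical card.
func advertise(totalMemGiB, cardCount int) (gpuMem, gpuCount []Device) {
	for i := 0; i < totalMemGiB; i++ {
		gpuMem = append(gpuMem, Device{ID: fmt.Sprintf("gpu-mem-%d", i), Health: "Healthy"})
	}
	for i := 0; i < cardCount; i++ {
		gpuCount = append(gpuCount, Device{ID: fmt.Sprintf("gpu-count-%d", i), Health: "Healthy"})
	}
	return gpuMem, gpuCount
}

func main() {
	// A node with 4 cards and 32 GiB of total video memory.
	mem, cards := advertise(32, 4)
	fmt.Printf("gpu-mem: %d units\n", len(mem))
	fmt.Printf("gpu-count: %d units\n", len(cards))
}
```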
In the technical scheme of this embodiment, after the Pod is bound with the target node, the GPU resources corresponding to the target node are queried; and a container corresponding to the Pod is created based on the GPU resources corresponding to the target node. By virtualizing GPU resources, standard containers supporting the relevant algorithm configuration are allocated to different types of computing tasks, so that the computing tasks run in the containers; matching between algorithm tasks with different characteristics and computing devices is thus achieved flexibly, and the flexibility of resource allocation is improved.
Referring to fig. 4, fig. 4 is a second embodiment of a GPU resource scheduling method of the present invention, the method includes:
step S30, receiving a resource request based on Pod, wherein the resource request comprises GPU target video memory capacity and GPU target card number;
s40, acquiring resource information of the working node reported by the kubel component;
and S50, determining a target node corresponding to the resource request according to the resource information, and binding the Pod with the target node.
Optionally, as shown in fig. 3, the Kubernetes system includes a scheduler, a GPU scheduler, a kubelet component, a node cluster, and the like. Optionally, this embodiment is applied to the scheduler.
Optionally, the resource request includes the target GPU video memory capacity gpu-mem and the target GPU card count gpu-count. Optionally, matching is performed according to the resource information and the resource request to obtain the target node, and the Pod is bound to the target node; optionally, a node label and a node selector may be used to schedule the Pod onto the target node.
Optionally, the kubelet component acquires the use state of each working node in the node cluster; determines the resource information to be reported for the working nodes according to the use state, where the resource information includes the GPU video memory capacity and the number of GPU cards corresponding to each working node; and sends the resource information to the scheduler. Optionally, when a working node is in the "in use" state, its resource information is not reported; when the use state of a working node is "unused", its resource information is reported. Optionally, the kubelet component acquires the use state of each working node in the node cluster after determining that the scheduler has received the Pod-based resource request; or the kubelet component acquires the use state of each working node in the node cluster as reported by the device plugin at startup.
Optionally, the scheduler controls the GPU scheduler according to the resource information and the resource request of the working nodes, and the GPU scheduler determines the target node according to the resource information and the resource request, sends the target node to the scheduler, and the Pod is bound with the target node. The GPU card is reported as the scheduling granularity, and a user can request one or more GPUs for each container; a GPU cannot be split, and when GPU resources are allocated to a Pod they can only be allocated as whole cards, so even if a GPU allocated to a Pod is not used, other Pods cannot apply for it, and one Pod can only specify one GPU card.
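A minimal Go sketch of this whole-card matching follows, under the assumption that a node qualifies only when both its card count and its video memory cover the Pod's request. The binding step is reduced to printing the chosen node; in Kubernetes it would be a Binding object or the node label/selector constraint described above. All struct and field names are illustrative assumptions.

```go
package main

import "fmt"

// ResourceInfo is the per-node report the scheduler received from kubelet.
type ResourceInfo struct {
	Node      string
	GPUMemMiB int
	GPUCards  int
}

// PodRequest carries the two extended-resource quantities of the request.
type PodRequest struct {
	Pod          string
	TargetMemMiB int // gpu-mem in the resource request
	TargetCards  int // gpu-count in the resource request
}

// matchNode returns the first node whose whole-card resources cover the
// request; GPUs are never split across Pods.
func matchNode(req PodRequest, nodes []ResourceInfo) (string, bool) {
	for _, n := range nodes {
		if n.GPUCards >= req.TargetCards && n.GPUMemMiB >= req.TargetMemMiB {
			return n.Node, true
		}
	}
	return "", false
}

func main() {
	nodes := []ResourceInfo{{"node-a", 16384, 2}, {"node-b", 32768, 4}}
	req := PodRequest{Pod: "mt-train-0", TargetMemMiB: 24576, TargetCards: 3}
	if node, ok := matchNode(req, nodes); ok {
		fmt.Printf("bind %s -> %s\n", req.Pod, node) // binding records the target node
	}
}
```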
In the technical solution of this embodiment, a Pod-based resource request is received; the resource information of the working nodes reported by the kubelet component is acquired; the target node corresponding to the resource request is determined according to the resource information, and the Pod is bound with the target node. By determining the target node and binding the Pod with it, multi-type GPU cluster scheduling is achieved, that is, different types of GPU resources on different nodes in the node cluster are scheduled, which improves the flexibility and efficiency of GPU resource scheduling.
Referring to fig. 5, fig. 5 is a diagram of a GPU resource scheduling method according to a third embodiment of the present invention, where the method includes:
step S60, acquiring resource information and resource requests sent by a scheduler;
step S70, determining a target node according to the resource information and the resource request;
and step S80, sending the target node to the scheduler.
Optionally, as shown in fig. 3, the Kubernetes system includes a scheduler, a GPU scheduler, a kubelet component, a node cluster, and the like. Optionally, this embodiment is applied to the GPU scheduler.
Optionally, the GPU scheduler obtains the resource information and the resource request sent by the scheduler, determines the target node according to the resource information and the resource request, and sends the target node to the scheduler, and the Pod is bound with the target node. The GPU card is reported as the scheduling granularity, and a user can request one or more GPUs for each container; a GPU cannot be split and GPU resources can only be allocated to Pods as whole cards, so even if a GPU allocated to a Pod is not used, other Pods cannot apply for it, and one Pod can only designate one GPU card.
Optionally, step S70 includes: the GPU scheduler determines the resource quantity required by the Pod according to the resource request, and determines the candidate working nodes whose resource quantity is greater than or equal to the required resource quantity; it then determines the priority of the candidate working nodes, and determines the target node according to the priority.
To determine the priority of the candidate working nodes, the candidates can be scored, and the GPU scheduler may consider overall optimization strategies, for example preferring the working node with the lowest load. The node with the highest score is selected and the binding operation is performed, the result is stored in the etcd data storage system, and the kubelet component executes the container creation operation on the target working node.
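The filter-and-score procedure can be sketched as below: candidate working nodes whose resources cover the requirement are kept, ordered by an assumed lowest-load scoring strategy (one strategy among many), and the first candidate becomes the target node. Field names and the scoring rule are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Node models a working node as seen by the GPU scheduler.
type Node struct {
	Name      string
	GPUMemMiB int
	GPUCards  int
	Load      float64 // fraction of the node's capacity already in use
}

// filterAndScore keeps nodes with enough resources and orders them so the
// preferred target (lowest load, i.e. highest score) comes first.
func filterAndScore(nodes []Node, needMemMiB, needCards int) []Node {
	var candidates []Node
	for _, n := range nodes {
		if n.GPUMemMiB >= needMemMiB && n.GPUCards >= needCards {
			candidates = append(candidates, n)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Load < candidates[j].Load // lowest load first
	})
	return candidates
}

func main() {
	nodes := []Node{
		{"node-a", 16384, 2, 0.70},
		{"node-b", 32768, 4, 0.10},
		{"node-c", 32768, 4, 0.40},
	}
	if ranked := filterAndScore(nodes, 24576, 3); len(ranked) > 0 {
		// The highest-scoring node is bound and the result stored (e.g. in
		// etcd); the kubelet component then creates the container on it.
		fmt.Println("target node:", ranked[0].Name)
	}
}
```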
In the technical scheme of this embodiment, the resource information and the resource request sent by the scheduler are acquired; the target node is determined according to the resource information and the resource request; and the target node is sent to the scheduler. By determining the target node, multi-type GPU cluster scheduling is realized, different types of GPU resources on different nodes in the node cluster are scheduled, and the flexibility and efficiency of GPU resource scheduling are improved.
In one embodiment, referring to fig. 6, the scheduler receives a Pod-based resource request, the resource request including the target GPU video memory capacity and the target number of GPU cards; acquires the resource information of the working nodes reported by the kubelet component; determines the target node corresponding to the resource request according to the resource information, and binds the Pod with the target node. The GPU scheduler acquires the resource information and the resource request sent by the scheduler, determines the target node according to them, and sends the target node to the scheduler. After the Pod is bound with the target node, the kubelet component queries the GPU resources corresponding to the target node, where the GPU resources include the GPU video memory capacity and the number of GPU cards, and creates a container corresponding to the Pod based on the GPU resources corresponding to the target node. By virtualizing GPU resources, standard containers supporting the relevant algorithm configuration are allocated to different types of computing tasks, so that the computing tasks run in the containers; matching between algorithm tasks with different characteristics and computing devices is thus achieved flexibly, and the flexibility of resource allocation is improved.
The invention also provides a GPU resource scheduling device, which comprises a memory, a processor and a GPU resource scheduler stored in the memory and executable on the processor, wherein the GPU resource scheduler implements the steps of the GPU resource scheduling method according to the above embodiments when being executed by the processor.
The present invention also provides a computer readable storage medium storing a GPU resource scheduler, which when executed by a processor implements the steps of the GPU resource scheduling method described in the above embodiments.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, system, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, system, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, system, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the system of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a parking management device, an air conditioner, or a network device) to execute the system according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A GPU resource scheduling method, applied to a kubelet component of a Kubernetes system, the Kubernetes system comprising a scheduler, the kubelet component, and a node cluster, the method comprising the following steps:
after the Pod is bound with a target node, querying the GPU resources corresponding to the target node, wherein the GPU resources comprise the GPU video memory capacity and the number of GPU cards;
and creating a container corresponding to the Pod based on the GPU resource corresponding to the target node.
2. The method for scheduling GPU resources of claim 1, wherein before the step of querying the GPU resources corresponding to the target node, the method further comprises:
acquiring the use state of each working node in the node cluster;
determining the reported resource information of the working node according to the use state;
and sending the resource information to the scheduler.
3. The GPU resource scheduling method of claim 2, wherein the step of acquiring the use state of each working node in the node cluster comprises:
generating a registration request according to the resource request, and sending the registration request to a device plugin, so that the device plugin starts a gRPC service;
and acquiring the use state of the working nodes in the node cluster.
4. The GPU resource scheduling method of claim 2, wherein the step of acquiring the use state of each working node in the node cluster comprises:
after the scheduler receives the Pod-based resource request, acquiring the use state of each working node in the node cluster; or,
acquiring the use state of each working node in the node cluster as reported by the device plugin at startup.
5. A GPU resource scheduling method is applied to a scheduler, and comprises the following steps:
receiving a Pod-based resource request, wherein the resource request comprises the target GPU video memory capacity and the target number of GPU cards;
acquiring the resource information of the working nodes reported by the kubelet component;
and determining a target node corresponding to the resource request according to the resource information, and binding the Pod with the target node.
6. The method for scheduling resources of a GPU as claimed in claim 5, wherein the step of determining the target node corresponding to the resource request according to the resource information comprises:
and controlling a GPU scheduler according to the resource information and the resource request so that the GPU scheduler determines a target node according to the resource information and the resource request and sends the target node to the scheduler.
7. A GPU resource scheduling method is applied to a GPU scheduler, and comprises the following steps:
acquiring resource information and a resource request sent by a scheduler;
determining a target node according to the resource information and the resource request;
and sending the target node to the scheduler.
8. The method for GPU resource scheduling of claim 7 wherein the step of determining a target node based on the resource information and the resource request comprises:
determining the resource quantity required by the Pod according to the resource request, and determining candidate working nodes whose resource quantity is greater than or equal to the required resource quantity;
determining the priority of each candidate working node, and determining the target node according to the priority.
9. A GPU resource scheduling device, comprising a memory, a processor, and a GPU resource scheduler stored in the memory and executable on the processor, the GPU resource scheduler, when executed by the processor, implementing the steps of the GPU resource scheduling method of any of claims 1-8.
10. A computer-readable storage medium, storing a GPU resource scheduler, which when executed by a processor performs the steps of the GPU resource scheduling method of any of claims 1-8.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310060787.XA | 2023-01-13 | 2023-01-13 | GPU resource scheduling method, equipment and storage medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310060787.XA | 2023-01-13 | 2023-01-13 | GPU resource scheduling method, equipment and storage medium |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN115964152A | 2023-04-14 |

Family ID: 87363339
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202310060787.XA (Pending) | GPU resource scheduling method, equipment and storage medium | 2023-01-13 | 2023-01-13 |
Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN115964152A (en) |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN116560861A * | 2023-07-11 | 2023-08-08 | 苏州浪潮智能科技有限公司 | Resource allocation method, device, electronic equipment, storage medium and node |
| CN116560861B * | 2023-07-11 | 2023-09-29 | 苏州浪潮智能科技有限公司 | Resource allocation method, device, electronic equipment, storage medium and node |
Similar Documents

| Publication | Title |
| --- | --- |
| CN110351384B | Big data platform resource management method, device, equipment and readable storage medium |
| CN107241281B | Data processing method and device |
| US10686728B2 | Systems and methods for allocating computing resources in distributed computing |
| CN103150213B | Load balancing method and device |
| CN102469126B | Application scheduling system, method thereof and related device |
| CN104253850A | Distributed task scheduling method and system |
| US9503398B1 | Sysplex signal service protocol converter |
| CN107577534A | Resource scheduling method and device |
| US20230266999A1 | Resource scheduling method, resource scheduling system, and device |
| CN115134371A | Scheduling method, system, equipment and medium containing edge network computing resources |
| CN115964152A | GPU resource scheduling method, equipment and storage medium |
| CN111709723A | RPA business process intelligent processing method, device, computer equipment and storage medium |
| CN111124640A | Task allocation method and system, storage medium and electronic device |
| Shyam et al. | Resource allocation in cloud computing using agents |
| CN112866321B | Resource scheduling method, device and system |
| CN109002364A | Optimization method for interprocess communication, electronic device and readable storage medium |
| WO2020108337A1 | CPU resource scheduling method and electronic equipment |
| CN106775975B | Process scheduling method and device |
| CN109491794A | Resource management method and device, and electronic equipment |
| CN113301087B | Resource scheduling method, device, computing equipment and medium |
| CN111796932A | GPU resource scheduling method |
| CN111866159A | Method, system, device and storage medium for calling artificial intelligence service |
| CN109462663B | Method for limiting system resource occupation, voice interaction system and storage medium |
| CN111435319A | Cluster management method and device |
| CN116166421A | Resource scheduling method and equipment for distributed training task |
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |