CN116680035A - GPU (graphics processing unit) method and device for realizing remote scheduling of kubernetes container - Google Patents
- Publication number
- CN116680035A (application CN202310443295.9A)
- Authority
- CN
- China
- Prior art keywords
- gpu
- kubernetes
- node
- remote
- cuda
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/133—Protocols for remote procedure calls [RPC]
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45575—Starting, stopping, suspending or resuming virtual machine instances
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to the technical field of RPC communication, and in particular to a method and a device for realizing remote scheduling of Kubernetes containers that use a GPU, comprising the following steps: deploying a Kubernetes cluster and labeling GPU nodes and non-GPU nodes using "kubectl label node"; deploying the official CUDA runtime API on the non-GPU nodes; deploying the official GPU driver API, i.e., the GPU driver, on the GPU nodes; and deploying a remote-CUDA-call client service component on the non-GPU nodes. The beneficial effects are as follows: the method and device create virtual GPU devices on nodes without physical GPUs through virtualization technology and register those devices with the Kubernetes nodes, so that Kubernetes can schedule containers that need GPU resources onto nodes without physical GPUs and bind the containers to the virtual GPU devices, thereby sharing GPU resources, improving GPU utilization, and reducing hardware and software costs.
Description
Technical Field
The invention relates to the technical field of RPC (remote procedure call) communication, and in particular to a method and a device for realizing remote scheduling of Kubernetes containers that use a GPU (graphics processing unit).
Background
Kubernetes, abbreviated K8s, is an open-source system for managing containerized applications across multiple hosts in a cloud platform. The goal of Kubernetes is to make deploying containerized applications simple and efficient; it provides mechanisms for application deployment, planning, updating, and maintenance.
In the prior art, on one hand, users of Kubernetes can only use the GPU device resources on the local node, but the number of GPU nodes is limited, and deploying and managing GPU nodes also requires a certain cost and expertise, which limits the flexibility and portability of containers; on the other hand, the utilization rate of GPU resources is low, and GPU resources on some nodes may sit idle, wasting much unused GPU capacity.
Disclosure of Invention
The invention aims to provide a method and a device for realizing remote scheduling of Kubernetes containers that use a GPU (graphics processing unit), so as to solve the problems in the prior art described above.
In order to achieve the above purpose, the present invention provides the following technical solution: a method for implementing remote scheduling of Kubernetes containers using a GPU, the method comprising the following steps:
deploying a Kubernetes cluster, and labeling GPU nodes and non-GPU nodes by using 'kubectl label node';
deploying the official CUDA runtime API on the non-GPU nodes;
deploying the official GPU driver API, i.e., the GPU driver, on the GPU nodes;
deploying a remote-CUDA-call client service component on the non-GPU nodes;
deploying a remote-CUDA-call server-side service component on the GPU nodes;
creating a deployment that uses GPU resources in Kubernetes, configuring the scheduling parameters, and creating the Pod on nodes without GPU resources;
and observing the running state of the Pod on the Kubernetes platform, entering the Pod to observe that the service runs normally, and confirming that the GPU resources are called normally.
Preferably, when labeling GPU nodes and non-GPU nodes, the GPU nodes and the non-GPU nodes are labeled as "GPU" and "non-GPU" respectively in the Kubernetes cluster.
Preferably, the client service component is implemented with RPC technology; by hijacking the CUDA API, it intercepts the access of the CUDA application in the Pod to the GPU and forwards that access over a TCP/IP or RDMA network to a node with GPU resources on which the remote CUDA call server is deployed.
Preferably, the server-side service component receives the CUDA call request sent by the client, forwards the request to the GPU device for execution, and returns the result to the client.
Preferably, when the running state of the Pod is observed on the Kubernetes platform: after the GPU container starts, the invoked CUDA program and its API calls are hijacked and redirected to run on the remote CUDA client, executed on a node with a GPU through communication between the remote CUDA client and server, and the result is returned to the GPU container.
A device for remotely scheduling and using GPUs with Kubernetes containers comprises a Kubernetes management module, a scheduling module, and a GPU node management module;
the Kubernetes management module is responsible for the creation, deletion, and scheduling of Pods; when it detects that no GPU node is available, it sends a scheduling request to the scheduling module. The GPU node management module is responsible for managing the state of the GPU nodes, including GPU resource usage and GPU node health, and provides GPU node state information to the Kubernetes management module.
Preferably, the scheduling module uses the TCP/IP or RDMA network interconnection technology used by the remote CUDA call server and client, including RDMA-aware sockets (RDS) or InfiniBand networks.
Preferably, the communication between the remote CUDA call server and the client is managed through the Kubernetes service discovery mechanism, the Kubernetes load balancing mechanism, Kubernetes network policies, and Kubernetes security mechanisms.
Preferably, the container Pod of the Kubernetes management module uses a shared GPU.
Compared with the prior art, the invention has the beneficial effects that:
according to the method and the device for realizing remote dispatching of Kubernetes containers, virtual GPU equipment is created on the nodes without physical GPUs through a virtualization technology, and the virtual GPU equipment is registered on the Kubernetes nodes, so that the Kubernetes can dispatch the containers needing GPU resources on the nodes without the physical GPUs, and bind the containers to the virtual GPU equipment, thereby realizing sharing of GPU resources, improving the utilization rate of the GPUs and reducing the hardware and software cost.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention;
FIG. 2 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the embodiments of the present invention will be further described in detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are some, but not all, of the embodiments of the present invention; they are intended to be illustrative only and not limiting, and all other embodiments obtained by persons of ordinary skill in the art without inventive effort fall within the scope of the present invention.
Example 1
Referring to fig. 1 to 2, the present invention provides a technical solution: a method for implementing remote scheduling of kubernetes containers using a GPU, the method comprising the steps of:
deploying a Kubernetes cluster, and labeling GPU nodes and non-GPU nodes using "kubectl label node"; in the Kubernetes cluster, the GPU nodes and non-GPU nodes are labeled "GPU" and "non-GPU", respectively;
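As a hedged illustration of this labeling step, the node objects could end up carrying labels along the following lines; the label key `node-type`, its values, and the node names are assumptions for this sketch, not values mandated by the invention:

```yaml
# Illustrative result of "kubectl label node worker-gpu-1 node-type=gpu" and
# "kubectl label node worker-cpu-1 node-type=non-gpu" (key/values assumed).
apiVersion: v1
kind: Node
metadata:
  name: worker-gpu-1
  labels:
    node-type: gpu       # node with a physical GPU
---
apiVersion: v1
kind: Node
metadata:
  name: worker-cpu-1
  labels:
    node-type: non-gpu   # node without a physical GPU
```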
deploying the official CUDA runtime API on the non-GPU nodes;
deploying the official GPU driver API, i.e., the GPU driver, on the GPU nodes;
deploying a remote-CUDA-call client service component on the non-GPU nodes; the client service component is implemented with RPC technology; by hijacking the CUDA API, it intercepts the access of the CUDA application in the Pod to the GPU and forwards that access over a TCP/IP or RDMA network to a node with GPU resources on which the remote CUDA call server is deployed;
deploying a remote-CUDA-call server-side service component on the GPU nodes; the server-side service component receives the CUDA call request sent by the client, forwards the request to the GPU device for execution, and returns the result to the client;
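The client–server round trip described in these two steps can be sketched as a minimal RPC loop. This sketch uses Python's standard `xmlrpc` machinery over plain TCP/IP and simulates the GPU-side execution with an ordinary CPU function; the function name `cuda_vector_add` and the overall shape are illustrative assumptions, not the actual protocol of the invention.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def cuda_vector_add(a, b):
    # Stand-in for a kernel that, on a real GPU node, would be launched
    # on the GPU device; here it runs on the CPU for illustration.
    return [x + y for x, y in zip(a, b)]

# Server side (GPU node): receives forwarded CUDA calls and executes them.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(cuda_vector_add, "cuda_vector_add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side (non-GPU node): the hijacked CUDA API call is forwarded
# over TCP/IP, and the result is returned to the calling container.
gpu_node = ServerProxy(f"http://127.0.0.1:{port}")
result = gpu_node.cuda_vector_add([1, 2, 3], [4, 5, 6])
print(result)  # [5, 7, 9]
server.shutdown()
```

In a real deployment the interception would happen below the CUDA runtime API (e.g., via a shim library) and the transport could be RDMA rather than TCP/IP, as the text notes.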
creating a deployment that uses GPU resources in Kubernetes, configuring the scheduling parameters, and creating the Pod on nodes without GPU resources;
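A deployment of this kind might look roughly as follows; the extended resource name `example.com/vgpu`, the label key/values, and the image name are assumptions for this sketch (the invention does not specify them), with the node selector steering the Pod onto a node without a physical GPU:

```yaml
# Hedged sketch: a Deployment that requests a virtualized GPU resource while
# being scheduled onto a non-GPU node. Resource and label names are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cuda-workload
  template:
    metadata:
      labels:
        app: cuda-workload
    spec:
      nodeSelector:
        node-type: non-gpu        # schedule onto a node without a physical GPU
      containers:
      - name: app
        image: example/cuda-app:latest
        resources:
          limits:
            example.com/vgpu: 1   # virtual GPU device registered on the node
```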
observing the running state of the Pod on the Kubernetes platform, entering the Pod to observe that the service runs normally, and confirming that GPU resources are called normally: after the GPU container starts, the invoked CUDA program and its API calls are hijacked and redirected to run on the remote CUDA client, executed on a node with a GPU through communication between the remote CUDA client and server, and the result is returned to the GPU container.
Example 2
On the basis of Example 1, a device for remotely scheduling and using GPUs with Kubernetes containers comprises a Kubernetes management module, a scheduling module, and a GPU node management module;
the Kubernetes management module is responsible for the creation, deletion, and scheduling of Pods; when it detects that no GPU node is available, it sends a scheduling request to the scheduling module. The GPU node management module is responsible for managing the state of the GPU nodes, including GPU resource usage and GPU node health, and provides GPU node state information to the Kubernetes management module;
the scheduling module uses the TCP/IP or RDMA network interconnection technology used by the remote CUDA call server and client, including RDMA-aware sockets (RDS) or InfiniBand networks; the communication between the remote CUDA call server and the client is managed through the Kubernetes service discovery mechanism, the Kubernetes load balancing mechanism, Kubernetes network policies, and Kubernetes security mechanisms; and the container Pod of the Kubernetes management module uses a shared GPU.
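The Kubernetes service discovery mentioned here could, as a hedged sketch, expose the remote CUDA call server behind a stable DNS name so that client shims on non-GPU nodes can reach it; the service name, selector label, and port below are illustrative assumptions:

```yaml
# Sketch: exposing the remote CUDA call server via Kubernetes service
# discovery and load balancing. All names and the port are assumed.
apiVersion: v1
kind: Service
metadata:
  name: remote-cuda-server
spec:
  selector:
    app: remote-cuda-server   # matches Pods running the server component
  ports:
  - port: 9999                # RPC port the client shim connects to (assumed)
    targetPort: 9999
```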
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A GPU method for realizing remote scheduling of Kubernetes containers, characterized in that the GPU remote scheduling method comprises the following steps:
deploying a Kubernetes cluster, and labeling GPU nodes and non-GPU nodes by using 'kubectl label node';
deploying the official CUDA runtime API on the non-GPU nodes;
deploying the official GPU driver API, i.e., the GPU driver, on the GPU nodes;
deploying a remote-CUDA-call client service component on the non-GPU nodes;
deploying a remote-CUDA-call server-side service component on the GPU nodes;
creating a deployment that uses GPU resources in Kubernetes, configuring the scheduling parameters, and creating the Pod on nodes without GPU resources;
and observing the running state of the Pod on the Kubernetes platform, entering the Pod to observe that the service runs normally, and confirming that the GPU resources are called normally.
2. The GPU method for realizing remote scheduling of Kubernetes containers according to claim 1, characterized in that: when labeling the GPU nodes and non-GPU nodes, the GPU nodes and the non-GPU nodes are labeled "GPU" and "non-GPU", respectively, in the Kubernetes cluster.
3. The GPU method for realizing remote scheduling of Kubernetes containers according to claim 1, characterized in that: the client service component is implemented with RPC technology; by hijacking the CUDA API, it intercepts the access of the CUDA application in the Pod to the GPU and forwards that access over a TCP/IP or RDMA network to a node with GPU resources on which the remote CUDA call server is deployed.
4. The GPU method for realizing remote scheduling of Kubernetes containers according to claim 1, characterized in that: the server-side service component receives the CUDA call request sent by the client, forwards the request to the GPU device for execution, and returns the result to the client.
5. The GPU method for realizing remote scheduling of Kubernetes containers according to claim 1, characterized in that: when the running state of the Pod is observed on the Kubernetes platform: after the GPU container starts, the invoked CUDA program and its API calls are hijacked and redirected to run on the remote CUDA client, executed on a node with a GPU through communication between the remote CUDA client and server, and the result is returned to the GPU container.
6. A Kubernetes container remote scheduling GPU apparatus for implementing the GPU method for realizing remote scheduling of Kubernetes containers according to any one of claims 1-5, characterized in that: the remote scheduling GPU apparatus comprises a Kubernetes management module, a scheduling module, and a GPU node management module;
the Kubernetes management module is responsible for the creation, deletion, and scheduling of Pods; when it detects that no GPU node is available, it sends a scheduling request to the scheduling module. The GPU node management module is responsible for managing the state of the GPU nodes, including GPU resource usage and GPU node health, and provides GPU node state information to the Kubernetes management module.
7. The Kubernetes container remote scheduling GPU apparatus according to claim 6, characterized in that: the scheduling module uses the TCP/IP or RDMA network interconnection technology used by the remote CUDA call server and client, including RDMA-aware sockets (RDS) or InfiniBand networks.
8. The Kubernetes container remote scheduling GPU apparatus according to claim 7, characterized in that: the communication between the remote CUDA call server and the client is managed through the Kubernetes service discovery mechanism, the Kubernetes load balancing mechanism, Kubernetes network policies, and Kubernetes security mechanisms.
9. The Kubernetes container remote scheduling GPU apparatus according to claim 6, characterized in that: the container Pod of the Kubernetes management module uses a shared GPU.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310443295.9A CN116680035A (en) | 2023-04-24 | 2023-04-24 | GPU (graphics processing unit) method and device for realizing remote scheduling of kubernetes container |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310443295.9A CN116680035A (en) | 2023-04-24 | 2023-04-24 | GPU (graphics processing unit) method and device for realizing remote scheduling of kubernetes container |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116680035A (en) | 2023-09-01 |
Family
ID=87782603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310443295.9A Pending CN116680035A (en) | 2023-04-24 | 2023-04-24 | GPU (graphics processing unit) method and device for realizing remote scheduling of kubernetes container |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116680035A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118041704A (en) * | 2024-04-12 | 2024-05-14 | 清华大学 | Kubernetes container access method, device, computing equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |