CN114565502A - GPU resource management method, scheduling method, device, electronic equipment and storage medium - Google Patents

GPU resource management method, scheduling method, device, electronic equipment and storage medium Download PDF

Info

Publication number
CN114565502A
Authority
CN
China
Prior art keywords
gpu
node
pod
management
working
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210219719.9A
Other languages
Chinese (zh)
Inventor
万江凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202210219719.9A priority Critical patent/CN114565502A/en
Publication of CN114565502A publication Critical patent/CN114565502A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention relates to the technical field of GPU resource management, and provides a GPU resource management method, a scheduling method, a device, electronic equipment and a storage medium. The GPU resource management method comprises the following steps: detecting a GPU card on a working node by using a management Pod; if a GPU card is detected, acquiring the GPU resources of the working node by using the management Pod; and sending the GPU resources of the working node to the master node so that the master node schedules the GPU resources. According to the embodiment of the invention, the GPU card on the working node is automatically detected by using a management Pod with a built-in GPU card driver component, and the detected GPU resources of the GPU card are sent to the master node so that the master node can schedule them, which simplifies the management and use of GPU card resources and realizes their automatic management and scheduling.

Description

GPU resource management method, scheduling method, device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of GPU resource management, and in particular to a GPU resource management method, a scheduling method, a device, electronic equipment and a storage medium.
Background
Graphics Processing Unit (GPU) resources are expensive computing resources in the field of cloud computing, and are widely used in work scenarios such as deep learning, graphics and image processing, video encoding and decoding, graph databases, high-performance computing, molecular modeling, and genomics.
In a cloud platform based on a Kubernetes cluster, to detect and use the GPU resources of the GPU cards on each working node in the cluster, the driver for the corresponding GPU card must be installed on each working node, which makes the management and use of GPU resources overly cumbersome.
Disclosure of Invention
The invention aims to provide a GPU resource management method, a scheduling method, a device, electronic equipment and a storage medium, which can simplify the management and use of GPU card resources and realize automatic management and scheduling of GPU card resources.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a GPU resource management method, which is applied to a working node in a Kubernetes cluster, where the Kubernetes cluster further includes a master node, the master node is in communication connection with the working node, the working node runs a management Pod, and a GPU card driver component is built into the management Pod. The method includes: detecting a GPU card on the working node by using the management Pod; if a GPU card is detected, acquiring the GPU resources of the working node by using the management Pod; and sending the GPU resources of the working node to the master node by using the management Pod so that the master node schedules the GPU resources.
Further, the method further comprises:
when the working state of the GPU card of the working node changes, the GPU card on the working node is detected again by using the management Pod;
counting the GPU resources of the re-detected GPU cards to obtain the current GPU resources of the working node;
and synchronizing the current GPU resources of the working node to the master node so as to instruct the master node to update the GPU resources of the working node.
Further, the working node runs a plurality of created application Pods, and the method further includes:
acquiring GPU resource allocation of each application Pod;
and isolating the GPU resources of the working node according to the GPU resource configuration of each application Pod, so that each application Pod can obtain the GPU resources required by its GPU resource configuration.
Further, there are a plurality of detected GPU cards, and the step of acquiring the GPU resources of the working node by using the management Pod includes:
acquiring the GPU resources of each detected GPU card;
and counting the GPU resources of all the detected GPU cards to obtain the GPU resources of the working node.
In a second aspect, an embodiment of the present invention provides a GPU resource scheduling method, which is applied to a master node in a Kubernetes cluster, where the Kubernetes cluster further includes a working node, the working node is in communication connection with the master node, the working node runs a management Pod, and a GPU card driver component is built into the management Pod. The method includes:
receiving the GPU resources of the working node sent by the working node, where the working node runs the management Pod, and the GPU resources of the working node are obtained by the working node using the management Pod after the management Pod detects a GPU card installed on the working node; and scheduling the GPU resources.
Further, there are a plurality of working nodes, and the step of scheduling the GPU resources includes:
receiving an application Pod creation command, wherein the application Pod creation command comprises GPU resource configuration of an application Pod to be created;
acquiring GPU resources of each working node based on the application Pod creating command;
determining target GPU resources meeting the GPU resource configuration from GPU resources of all the working nodes;
and scheduling the application Pod to the target working node to which the target GPU resources belong, so that the target working node creates and runs the application Pod to be created.
Further, the method further comprises:
receiving a join request, reported by a new working node, for applying to join the Kubernetes cluster;
and scheduling the management Pod to the new working node based on the join request, so that the new working node creates and runs the management Pod and sends the GPU resources of the new working node, acquired through the management Pod, to the master node.
In a third aspect, an embodiment of the present invention provides a GPU resource management device, which is applied to a working node in a Kubernetes cluster, where the Kubernetes cluster further includes a master node, the master node is in communication connection with the working node, the working node runs a management Pod, and a GPU card driver component is built into the management Pod. The device includes: a management module, configured to detect a GPU card on the working node by using the management Pod; the management module is further configured to acquire, if a GPU card is detected, the GPU resources of the working node by using the management Pod; and a sending module, configured to send the GPU resources of the working node to the master node by using the management Pod so that the master node schedules the GPU resources.
In a fourth aspect, an embodiment of the present invention provides a GPU resource scheduling apparatus, which is applied to a master node in a Kubernetes cluster, where the Kubernetes cluster further includes a working node, the working node is in communication connection with the master node, the working node runs a management Pod, and a GPU card driver component is built into the management Pod. The apparatus includes: a receiving module, configured to receive the GPU resources of the working node sent by the working node, where the GPU resources of the working node are obtained by the working node using the management Pod after the management Pod detects a GPU card installed on the working node; and a scheduling module, configured to schedule the GPU resources.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor is configured to implement the GPU resource management method in the first aspect or implement the GPU resource scheduling method in the second aspect when executing the program.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the GPU resource management method in the first aspect, or implements the GPU resource scheduling method in the second aspect.
Compared with the prior art, the embodiments of the present invention provide a GPU resource management method, a scheduling method, a device, electronic equipment and a storage medium. The GPU resource management method is applied to a working node in a Kubernetes cluster; the working node runs a management Pod, and a GPU card driver component is built into the management Pod. The working node detects the GPU cards on itself by using the management Pod, acquires the GPU resources of the working node by using the management Pod if a GPU card is detected, and sends the GPU resources of the working node to the master node so that the master node schedules them. In this way, the embodiments of the present invention automatically detect the GPU cards on the working node by using a management Pod with a built-in GPU card driver component and send the detected GPU resources to the master node for scheduling, which simplifies the management and use of GPU card resources and realizes their automatic management and scheduling.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments will be briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is an exemplary diagram of a Kubernetes cluster according to an embodiment of the present invention.
Fig. 2 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a GPU resource management method according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating another GPU resource management method according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating another GPU resource management method according to an embodiment of the present invention.
Fig. 6 is a flowchart illustrating another GPU resource management method according to an embodiment of the present invention.
Fig. 7 is a flowchart illustrating a method for scheduling GPU resources according to an embodiment of the present invention.
Fig. 8 is a flowchart illustrating another GPU resource scheduling method according to an embodiment of the present invention.
Fig. 9 is a flowchart illustrating another GPU resource scheduling method according to an embodiment of the present invention.
Fig. 10 is a diagram illustrating an example of a specific Kubernetes cluster provided by an embodiment of the present invention.
Fig. 11 is a diagram illustrating an example of GPU resources when a GPU card on node1 fails according to an embodiment of the present invention.
Fig. 12 is a diagram illustrating an example of GPU resources when there are two GPU cards on node1 according to an embodiment of the present invention.
Fig. 13 shows an exemplary diagram of GPU resources on node4 provided by an embodiment of the present invention.
Fig. 14 is a block diagram illustrating a GPU resource management apparatus according to an embodiment of the present invention.
Fig. 15 is a block diagram illustrating a GPU resource scheduling apparatus according to an embodiment of the present invention.
Icon: 10-a master node; 20-a working node; 30-an electronic device; 31-a processor; 32-a memory; 33-a bus; 100-GPU resource management means; 110-a management module; 120-a sending module; 200-GPU resource scheduling device; 210-a receiving module; 220-scheduling module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
A Kubernetes cluster is a computer cluster on which Kubernetes is deployed. Kubernetes is an open-source system for managing containerized applications on multiple computers in a cloud platform, and aims to make the deployment of containerized applications simple and efficient. Referring to fig. 1, fig. 1 is an exemplary diagram of a Kubernetes cluster provided by an embodiment of the present invention. In fig. 1, the Kubernetes cluster includes a master node 10 and multiple working nodes 20. The master node 10 schedules and deploys applications on the Kubernetes cluster; specifically, the master node 10 is responsible for maintaining all relevant information about the containers running on the working nodes. The working nodes 20 run the applications scheduled and deployed by the master node 10; specifically, the working nodes 20 actually run the containers managed by the Kubernetes cluster. In a Kubernetes cluster, a Pod is the smallest deployable unit of computing that can be created and managed by Kubernetes; applications that need to run in the cluster are usually created and run in the form of Pods, and one Pod represents one process running in the Kubernetes cluster. The master node 10 collects information about the working nodes 20 and the Pods running on them in order to manage the working nodes 20 and their Pods, while the working nodes 20 are responsible for running the Pods. One Pod can only run on one working node, but a Pod can have multiple replicas, and each replica runs on a different working node.
The master node 10 and the working nodes 20 may be physical devices or virtual machines. The master node 10 may be independent of the working nodes 20, or it may share the same physical device or virtual machine with any working node 20 in the Kubernetes cluster.
In order to enable an application Pod deployed in a Kubernetes cluster to use GPU resources normally, a GPU card is usually installed on the working node 20. A GPU card usually refers to a graphics card with a GPU soldered onto it; the GPU is the heart of the graphics card and determines most of its performance. Because too much human intervention is required, the management and use of GPU resources is overly cumbersome.
In view of the above, embodiments of the present invention provide a GPU resource management method, a scheduling method, an apparatus, an electronic device, and a storage medium, which are used to simplify the management and use process of GPU resources, and will be described in detail below.
Referring to fig. 2, fig. 2 is a block schematic diagram of an electronic device 30 according to an embodiment of the present invention, where the electronic device 30 may be the master node 10 or the work node 20 in fig. 1, and the electronic device 30 includes a processor 31, a memory 32, and a bus 33. The processor 31 and the memory 32 communicate via a bus 33.
The processor 31 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits in hardware or by instructions in the form of software in the processor 31. The processor 31 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The memory 32 is configured to store a program, for example, the GPU resource management apparatus or the GPU resource scheduling apparatus in the embodiments of the present invention. The GPU resource management apparatus or GPU resource scheduling apparatus includes at least one software functional module that can be stored in the memory 32 in the form of software or firmware, and the processor 31 executes the program after receiving an execution instruction, so as to implement the GPU resource management method or the GPU resource scheduling method in the embodiments of the present invention.
The memory 32 may include Random Access Memory (RAM) and may also include non-volatile memory. Alternatively, the memory 32 may be a storage device built into the processor 31, or a storage device independent of the processor 31.
The bus 33 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus is represented by only one double-headed arrow in fig. 2, but this does not mean that there is only one bus or one type of bus.
On the basis of fig. 1 and fig. 2, an embodiment of the present invention further provides a GPU resource management method applied to the working node 20 in fig. 1 and the electronic device 30 in fig. 2, please refer to fig. 3, and fig. 3 is a flowchart illustrating a GPU resource management method according to an embodiment of the present invention, where the method includes the following steps:
and S101, detecting a GPU card on the working node by using the management Pod.
In this embodiment, the management Pod is a dedicated Pod for managing the GPU resources of the GPU cards on the working node 20. Each working node 20 may be instructed by the master node 10 to automatically create a management Pod when it newly joins the Kubernetes cluster, so that the management Pod manages the GPU resources on that working node 20. The management Pod is used to detect GPU cards, acquire the GPU resources of the working node when a GPU card is detected, and finally send the GPU resources to the master node.
In this embodiment, there may be one or more GPU cards on one work node 20, and the management Pod detects all the GPU cards on the work node 20.
And step S102, if the GPU card is detected, utilizing the management Pod to acquire GPU resources of the working node.
In this embodiment, when a GPU card is detected by the management Pod, the GPU resources of the working node are obtained. If there is only one GPU card on the working node 20, the GPU resources of the working node 20 are the GPU resources of that GPU card; if there are multiple GPU cards on the working node 20, the GPU resources of the working node 20 are the sum of the GPU resources of those GPU cards. The GPU resources include, but are not limited to, computing power resources and video memory resources: the computing power resources represent the GPU computing capability of the GPU card, and the video memory resources represent the amount of video memory the GPU card can use when performing computation.
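As an illustrative sketch (not part of the patented embodiment), the detection and statistics of steps S101 and S102 inside the management Pod could look like the following Python snippet. It assumes the pynvml bindings and the NVIDIA driver libraries are available in the Pod image, and the fixed per-card computing power value is a hypothetical abstraction.

```python
# Hypothetical sketch of steps S101-S102: detect GPU cards and sum their resources.
# Assumes the management Pod image bundles the NVIDIA driver libraries and pynvml.
import pynvml

def detect_node_gpu_resources():
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No usable driver / no GPU card detected on this working node.
        return {"compute": 0, "memory_gb": 0}

    compute_total = 0
    memory_total_gb = 0
    for index in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        memory_total_gb += mem.total // (1024 ** 3)
        # The patent abstracts "computing power" as a number (e.g. 100 per card);
        # a fixed per-card value is used here purely for illustration.
        compute_total += 100

    pynvml.nvmlShutdown()
    return {"compute": compute_total, "memory_gb": memory_total_gb}

if __name__ == "__main__":
    print(detect_node_gpu_resources())
```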
And S103, sending the GPU resources of the working nodes to the master node by using the management Pod so that the master node schedules the GPU resources.
In this embodiment, the management Pod of the working node 20 is responsible for sending the GPU resources of the working node to the master node 10, and the master node 10 is responsible for unified scheduling of the GPU resources, that is, allocating the required GPU resources to an application Pod that needs them, so that the application Pod is created on an appropriate working node 20.
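One way such a report could reach the master node (a sketch only; the patent does not prescribe the transport) is to advertise the statistics as extended resources on the Node object through the Kubernetes API, roughly as follows. The resource names unisinsight.com/gpu-compute and unisinsight.com/gpu-memory, the API server address, and the token paths are assumptions for illustration.

```python
# Hypothetical sketch of step S103: report the node's GPU resources to the master
# by patching the Node status with extended resources (resource names are made up).
import json
import requests

API_SERVER = "https://kubernetes.default.svc"  # assumed in-cluster address
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

def report_gpu_resources(node_name, resources):
    with open(TOKEN_PATH) as f:
        token = f.read()
    # "~1" is the JSON-patch escape for "/" inside the extended resource name.
    patch = [
        {"op": "add",
         "path": "/status/capacity/unisinsight.com~1gpu-compute",
         "value": str(resources["compute"])},
        {"op": "add",
         "path": "/status/capacity/unisinsight.com~1gpu-memory",
         "value": str(resources["memory_gb"])},
    ]
    resp = requests.patch(
        f"{API_SERVER}/api/v1/nodes/{node_name}/status",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json-patch+json"},
        data=json.dumps(patch),
        verify=CA_PATH,
    )
    resp.raise_for_status()
```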
According to the method provided by the embodiment of the invention, the GPU cards on the working node are automatically detected by a management Pod with a built-in GPU card driver component, and the GPU resources of the detected GPU cards are sent to the master node so that the master node can schedule them, which simplifies the management and use of GPU card resources and realizes their automatic management and scheduling.
On the basis of fig. 3, an embodiment of the present invention further provides an implementation for acquiring the GPU resources of a working node when there are multiple GPU cards. Referring to fig. 4, fig. 4 is a flowchart illustrating another GPU resource management method provided by an embodiment of the present invention, in which step S102 includes the following sub-steps:
and a substep S1021, acquiring GPU resources of each detected GPU card.
And a substep S1022, performing statistics on the GPU resources of all the detected GPU cards to obtain the GPU resources of the working node.
In this embodiment, the GPU resources may include at least one of computing power resources and video memory resources. If the GPU resources include both computing power resources and video memory resources, the management Pod separately counts the computing power resources and the video memory resources of the GPU cards on the working node, and the GPU resources of the working node include the computing power resources and the video memory resources obtained through the statistics.
In this embodiment, in order to support hot plugging of GPU cards, that is, to ensure that the application Pods running on a working node are not interrupted when the GPU cards on that node change, so that the normal services on those application Pods are not affected, an embodiment of the present invention further provides an implementation for automatically updating GPU resources when the working state of a GPU card on the working node changes. Referring to fig. 5, fig. 5 is a flowchart of another GPU resource management method provided by an embodiment of the present invention, and the method further includes the following steps:
and step S111, when the working state of the GPU card of the working node changes, the GPU card on the working node is detected again by using the management Pod.
In this embodiment, a change in the working state of a GPU card of the working node 20 may include: a normal GPU card being pulled out, so that its state becomes unavailable; a normal GPU card failing, so that its state becomes unavailable; a new GPU card being inserted into the working node, so that the new card becomes available; a failed GPU card recovering, so that it becomes available again; and so on. In other words, the GPU resources of the working node 20 change as the working state of its GPU cards changes, and at this time all the GPU cards of the working node 20 need to be re-detected by using the management Pod.
And step S112, counting the GPU resources of the re-detected GPU card to obtain the current GPU resources of the working nodes.
In this embodiment, the current GPU resources of the working node 20 are the statistical result of the GPU resources of the currently detected GPU cards of the working node. For example, if 3 GPU cards are currently detected on the working node 20 and the video memory resource of each GPU card is 28 GB, then the current video memory resource of the working node 20 is 28 GB × 3 = 84 GB.
Step S113, synchronizing the current GPU resources of the working nodes to the master node so as to instruct the master node to update the GPU resources of the working nodes.
In this embodiment, the master node records the GPU resources of the working node in a database or locally, and when the GPU resources of the working node change, it updates the previously recorded GPU resources of the working node according to the received current GPU resources.
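A minimal sketch of this resynchronization loop, assuming the detect_node_gpu_resources and report_gpu_resources helpers from the earlier sketches and a fixed polling interval (the patent does not specify how state changes are detected), could be:

```python
# Hypothetical sketch of steps S111-S113: re-detect GPU cards when their state
# changes and synchronize the new totals to the master node.
import time

def watch_and_sync(node_name, poll_seconds=10):
    last_reported = None
    while True:
        current = detect_node_gpu_resources()  # re-run detection (steps S111/S112)
        if current != last_reported:
            # A card was pulled, failed, recovered or newly inserted: push the
            # fresh totals so the master node can update its records (step S113).
            report_gpu_resources(node_name, current)
            last_reported = current
        time.sleep(poll_seconds)
```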
The method provided by the embodiment of the invention can support hot plugging of GPU cards. In addition, when the GPU card driver needs to be upgraded, only the GPU card driver component in the management Pod needs to be updated, and the management Pod running on each working node is automatically updated by the master node, which significantly improves the maintenance efficiency of GPU cards and saves maintenance costs.
In this embodiment, multiple application Pods may run on the same working node and share the GPU resources of that node. In order that these application Pods do not affect each other when using the GPU resources of the same working node, and that the performance cost of context switching when sharing GPU resources does not affect their operating efficiency, an embodiment of the present invention further provides another GPU resource management method. Referring to fig. 6, fig. 6 is a flowchart of another GPU resource management method provided by an embodiment of the present invention, and the method further includes the following steps:
step S121, obtain GPU resource allocation of each application Pod.
In this embodiment, an application Pod may be a Pod corresponding to an application deployed by a user in the Kubernetes cluster. An application Pod may use the GPU resources of a working node, one working node 20 may run a plurality of different application Pods simultaneously, and these application Pods may share the GPU resources on the working node 20.
In this embodiment, the GPU resource configuration of an application Pod is used to characterize the GPU resources required by that application Pod, and may be specified by the user according to the actual application scenario when initiating the request to create the application Pod.
Step S122, isolating the GPU resources of the working node according to the GPU resource configuration of each application Pod, so that each application Pod can obtain the GPU resources required by its GPU resource configuration.
In this embodiment, the GPU resources may be isolated in different ways, including soft isolation and hard isolation. Taking hard isolation as an example, for NVIDIA GPU cards, hard isolation of the computing power resources in the GPU resources may be implemented by means including, but not limited to, the NVIDIA Multi-Process Service (MPS). Hard isolation of the video memory resources may be implemented by techniques including, but not limited to, virtual memory address mapping and interception of Compute Unified Device Architecture (CUDA) requests.
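As a rough illustration of what per-Pod hard isolation settings can look like in practice (a sketch under assumptions, not the patented isolation mechanism itself), the MPS active-thread-percentage knob can cap a client's share of the streaming multiprocessors; the memory figure below is only carried along as metadata, since real video memory isolation requires the interception techniques mentioned above.

```python
# Hypothetical sketch: derive per-Pod environment settings from its GPU resource
# configuration. CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is a real MPS control knob that
# limits the fraction of SMs an MPS client may use; the rest is illustrative only.
def isolation_env_for_pod(pod_gpu_config, node_compute_total):
    # e.g. pod_gpu_config = {"compute": 30, "memory_gb": 4}
    share = int(100 * pod_gpu_config["compute"] / node_compute_total)
    return {
        "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(max(1, min(share, 100))),
        # Placeholder consumed by an (assumed) CUDA-intercept library that would
        # enforce the video memory quota for this Pod.
        "GPU_MEMORY_LIMIT_GB": str(pod_gpu_config["memory_gb"]),
    }

print(isolation_env_for_pod({"compute": 30, "memory_gb": 4}, node_compute_total=100))
```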
It should be noted that the method provided by the embodiment of the present invention can be used when allocating GPU resources to an application Pod as it runs.
In this embodiment, in order to improve the utilization efficiency of the GPU resources of all working nodes and to schedule more reasonable GPU resources for application Pods in the Kubernetes cluster, an embodiment of the present invention further provides a GPU resource scheduling method applied to the master node 10 of the Kubernetes cluster. Referring to fig. 7, fig. 7 is a flowchart of a GPU resource scheduling method provided by an embodiment of the present invention, and the method includes the following steps:
step S201, receiving a GPU resource of a working node sent by the working node, where the working node runs with a management Pod, and the GPU resource of the working node is obtained by the working node detecting a GPU card installed on the working node by using the management Pod and using the management Pod when the GPU card is detected.
In this embodiment, the working node 20 may periodically send the GPU resources of the local node to the master node 10, or send the latest detected GPU resources to the master node 10 when it detects that the GPU resources have changed. The way in which the working node 20 obtains the GPU resources through the management Pod has been described in detail in the foregoing embodiments and is not repeated here.
Step S202, GPU resources are scheduled.
In this embodiment, the scheduling of GPU resources by the master node 10 may mean allocating the required GPU resources to an application Pod to be created, so as to create the application Pod on a suitable working node 20; or, when a GPU card on a working node 20 changes and can no longer meet the requirements of an application Pod running on it, automatically scheduling GPU resources that do meet those requirements, so that the application Pod is automatically transferred to another working node 20 to run; or, when an application Pod is newly created on a working node 20, updating the available and used GPU resources of that working node 20.
On the basis of fig. 7, an embodiment of the present invention further provides a specific implementation manner for scheduling GPU resources, please refer to fig. 8, where fig. 8 is a flowchart illustrating another method for scheduling GPU resources according to an embodiment of the present invention, and step S202 includes the following sub-steps:
in sub-step S2021, an application Pod creation command is received, where the application Pod creation command includes a GPU resource configuration of the application Pod to be created.
In this embodiment, when a user needs to create an application Pod, an application Pod creation command needs to be sent to the master node 10, where the application Pod creation command includes GPU resource configuration of the application Pod to be created.
In the substep S2022, GPU resources of each work node are acquired based on the application Pod creation command.
In this embodiment, the GPU resources of each working node may include the available GPU resources of each working node, or the total GPU resources and the used GPU resources of each working node. The master node 10 may store the GPU resources sent by a working node 20 into a local or preset database when receiving them, and may obtain the GPU resources of each working node from the local or preset database when receiving the application Pod creation command.
In the sub-step S2023, a target GPU resource satisfying the GPU resource configuration is determined from the GPU resources of all the work nodes.
In this embodiment, as a specific implementation, the master node 10 stores not only the GPU resources of each working node but also the usage of those GPU resources; according to the GPU resources of a working node and the GPU resources already used, it can be determined whether the available GPU resources of that working node meet the GPU resource configuration of the application Pod to be created.
In this embodiment, if the GPU resources of multiple working nodes satisfy the GPU resource configuration, the target GPU resources may be determined from the GPU resources of those working nodes according to a preset principle. The preset principle may be random selection, or the working node with the largest or smallest available GPU resources may be selected.
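A toy sketch of sub-steps S2022 and S2023 (pure illustration; the record field names are assumptions) that filters the nodes whose free resources cover the requested configuration and then applies a "smallest fit" preset principle:

```python
# Hypothetical sketch of selecting a target working node for an application Pod.
# Each node record carries total and used compute/memory figures (assumed schema).
def pick_target_node(nodes, requested):
    candidates = []
    for node in nodes:
        free_compute = node["compute_total"] - node["compute_used"]
        free_memory = node["memory_total"] - node["memory_used"]
        if free_compute >= requested["compute"] and free_memory >= requested["memory"]:
            candidates.append((free_compute, free_memory, node["name"]))
    if not candidates:
        return None  # no node satisfies the GPU resource configuration
    # "Smallest fit" preset principle: the node with the least spare capacity wins,
    # leaving larger nodes free for bigger requests. A random choice would also do.
    return min(candidates)[2]

nodes = [
    {"name": "node1", "compute_total": 200, "compute_used": 160,
     "memory_total": 28, "memory_used": 20},
    {"name": "node4", "compute_total": 100, "compute_used": 0,
     "memory_total": 14, "memory_used": 0},
]
print(pick_target_node(nodes, {"compute": 50, "memory": 8}))  # -> "node4"
```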
And substep S2024, scheduling the application Pod to a target work node to which the target GPU resource belongs, so that the target work node creates and runs the application Pod to be created.
In this embodiment, as a specific implementation, after determining the target working node, the master node writes the configuration items related to the application Pod to be created (including the GPU resource configuration, the number of application Pods to be created, and other configuration related to creating and running the application Pod) into a database accessible to both the master node and the target working node. The target working node may poll the database to detect whether any Pod needs to be created and run; once it finds the configuration items of an application Pod to be created in the database, it locally creates and runs that application Pod according to those configuration items.
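The polling side of this hand-off could look roughly like the toy loop below (a sketch only; fetch_pending_pod_configs and create_and_run_pod are made-up stand-ins for the shared database access and the container runtime calls, which the patent leaves open):

```python
# Hypothetical sketch: the target working node polls the shared database for
# application Pod configuration items written by the master node (sub-step S2024).
import time

def poll_for_pod_work(node_name, fetch_pending_pod_configs, create_and_run_pod,
                      poll_seconds=5):
    while True:
        for config in fetch_pending_pod_configs(node_name):
            # config is assumed to carry the GPU resource configuration and the
            # replica count written by the master node for this working node.
            create_and_run_pod(config)
        time.sleep(poll_seconds)
```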
The method provided by the embodiment of the invention can automatically determine the target working node meeting the GPU resource configuration according to the GPU resources of all the working nodes when the application Pod is created.
In this embodiment, if a new working node is added to the Kubernetes cluster, in order to enable the GPU cards on it to quickly enter the ready state and provide GPU resource services in time, an embodiment of the present invention further provides another GPU resource scheduling method. Referring to fig. 9, fig. 9 is a flowchart of another GPU resource scheduling method provided by an embodiment of the present invention, and the method further includes the following steps:
step S211, receiving a join request for applying to join to the kubernets cluster, which is reported by the new working node.
Step S212, scheduling the management Pod to the new working node based on the join request, so that the new working node creates and runs the management Pod and sends the GPU resources of the new working node, acquired through the management Pod, to the master node.
In this embodiment, in order to realize automatic management of the GPU resources of working nodes, a program for automatic GPU resource management may be deployed in the Kubernetes cluster in advance, so that this program runs in the cluster in the form of a Pod. The Pod that realizes automatic management of GPU resources is the management Pod. Through configuration, when the Kubernetes cluster detects that a new working node has joined, the management Pod can be automatically scheduled to the new working node, so that the new working node creates and runs the management Pod, thereby realizing automatic management of the GPU resources of the new working node.
According to the method provided by this embodiment, a management Pod is automatically created for each newly added working node, so that the GPU resources of the newly added working node are automatically discovered and automatically sent to the master node for management. Working nodes with GPU resources can therefore be added to or removed from the topology quickly to match the current working-node topology, which greatly reduces enterprise costs.
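In stock Kubernetes, the usual way to land one management Pod on every working node, including nodes that join later, is a DaemonSet. The sketch below creates one with the official Python client under the assumption that a container image bundling the GPU card driver component exists at the made-up name example.com/gpu-manager:latest; the image, labels and namespace are all illustrative, and the patent does not mandate a DaemonSet specifically.

```python
# Hypothetical sketch: deploy the management Pod to every working node (including
# nodes that join later) as a DaemonSet, using the Kubernetes Python client.
from kubernetes import client, config

def deploy_management_pod_daemonset():
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    body = {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "gpu-management-pod", "namespace": "kube-system"},
        "spec": {
            "selector": {"matchLabels": {"app": "gpu-management-pod"}},
            "template": {
                "metadata": {"labels": {"app": "gpu-management-pod"}},
                "spec": {
                    "containers": [{
                        "name": "gpu-manager",
                        # Assumed image that bundles the GPU card driver component.
                        "image": "example.com/gpu-manager:latest",
                    }],
                },
            },
        },
    }
    client.AppsV1Api().create_namespaced_daemon_set(namespace="kube-system", body=body)

if __name__ == "__main__":
    deploy_management_pod_daemonset()
```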
To illustrate the GPU resource management method in the above embodiments more clearly, the embodiment of the present invention is described by taking a specific implementation in a specific application scenario as an example. As shown in fig. 10, fig. 10 shows an exemplary diagram of a specific Kubernetes cluster provided by an embodiment of the present invention. The Kubernetes cluster in fig. 10 includes a master node 10 and 3 working nodes 20: node1 to node3, which are implemented in the same way. Taking one of the working nodes 20 (node1) as an example, a graphics card is inserted into this working node 20; its computing power resource is 100 and its video memory resource is 14. The management Pod running on this working node 20 includes several functional components: U-GPU-Device, U-Work-Que, U-SM, U-Memory, Pre-hook, runC, NVIDIA Driver, and libnvidia-container CLI. When the management Pod runs, these functional components are started.
The U-GPU-Device component detects whether the working node 20 has a GPU card. If a GPU card is detected, the GPU resources are reported to the master node 10 of the Kubernetes cluster along the two dimensions of computing power and video memory. As shown in fig. 10, the master node obtains the correct GPU resources of node1: the computing power resource is 100 and the video memory resource is 14.
After receiving a creation command for creating an application Pod, the master node 10 schedules GPU resources for the application Pod according to the GPU resources of each working node and sends the creation command to the target working node. After receiving the creation command, the U-GPU-Device component passes the GPU application specification of the application Pod (namely its GPU resource configuration) as parameters to the U-SM component and the U-Memory component, and creates the Pod by calling the runC component; of course, components such as dockershim, containerd-shim, cri-o and conmon can be used instead of runC. If the Pod is created successfully, the Pod name and a timestamp are used as an identifier and added to the preset queue of the U-Work-Que component, and a pre-hook is used to inject a calling script that invokes the libnvidia-container CLI to mark the created Pod with flags. The flags include, but are not limited to: NVIDIA_DRIVER_CAPABILITIES, which determines which NVIDIA driver libraries need to be mapped into the container; NVIDIA_VISIBLE_DEVICES, which determines whether a GPU card will be allocated and the ID of the mounted GPU card; and NVIDIA-DRIVER-PATH, which indicates the driver path of the GPU card.
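For reference, the first two flags are real environment variables understood by libnvidia-container and the NVIDIA container toolkit; a minimal sketch of how such flags might be assembled for the created container (the pre-hook injection mechanism itself is specific to this embodiment and only mimicked here, and the driver-path flag and function name are assumptions) is:

```python
# Hypothetical sketch: build the environment flags injected into a created Pod's
# container. NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES are real
# variables used by the NVIDIA container toolkit; the driver-path flag and the
# function itself are illustrative.
def container_gpu_flags(gpu_ids, driver_path="/usr/lib/nvidia"):
    return {
        # Comma-separated GPU IDs (or "none" to allocate no GPU card at all).
        "NVIDIA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids) or "none",
        # Driver libraries the container needs mapped in (compute + utility tools).
        "NVIDIA_DRIVER_CAPABILITIES": "compute,utility",
        # Assumed flag from this embodiment indicating the GPU driver path.
        "NVIDIA-DRIVER-PATH": driver_path,
    }

print(container_gpu_flags([0]))
```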
The U-SM component and the U-Memory component perform hard isolation of the computing power resources and the video memory resources in the GPU resources according to the received parameter information. Hard isolation of computing power is implemented by methods including, but not limited to, NVIDIA MPS; hard isolation of video memory is implemented by technical means including, but not limited to, virtual memory address mapping and CUDA request interception.
The U-Work-Que component is used to manage the multiple application Pods that need to share the GPU resources of node1, so that these Pods share the GPU resources of node1 and can access the GPU card concurrently and in safe isolation, while avoiding the performance cost of GPU context switching.
When the GPU card on node1 fails, node1 synchronizes its latest GPU resources to the master node in time through the management Pod. Based on the application scenario in fig. 10, please refer to fig. 11; fig. 11 shows an exemplary diagram of the GPU resources when the GPU card on node1 fails. As can be seen from fig. 11, the GPU resources of node1 known to the master node have been zeroed.
When the GPU card on node1 recovers from the failure and another new GPU card is inserted, node1 synchronizes its latest GPU resources to the master node through the management Pod. Based on the application scenario of fig. 10, please refer to fig. 12; fig. 12 shows an exemplary diagram of the GPU resources when there are two GPU cards on node1 provided by an embodiment of the present invention. As can be seen from fig. 12, the GPU resources of node1 obtained by the master node are the sum of the resources of the two GPU cards: the computing power resource is 200 and the video memory resource is 28.
When a new node, node4, joins the Kubernetes cluster, the master node may instruct node4 to create a management Pod, and the management Pod sends the detected resources of the GPU card on node4 to the master node 10. Based on the application scenario in fig. 10, please refer to fig. 13; fig. 13 shows an exemplary diagram of the GPU resources on node4 provided by an embodiment of the present invention. As can be seen from fig. 13, there is only one GPU card on node4, and the master node obtains the GPU resources of node4: the computing power resource is 100 and the video memory resource is 14.
It should be noted that fig. 10 to 13 only show the GPU resources of working nodes that have a GPU card. A management Pod still runs on working nodes without a GPU card, so that when a GPU card is inserted it can be automatically detected in time, its GPU resources can be obtained and then synchronized to the master node 10; this is not shown in fig. 10 to 13.
In order to perform the corresponding steps in the above embodiments and various possible implementations, an implementation of the GPU resource management apparatus 100 is given below. Referring to fig. 14, fig. 14 is a block diagram illustrating a GPU resource management apparatus 100 according to an embodiment of the present invention. It should be noted that the GPU resource management apparatus 100 provided in this embodiment is applied to the working node 20 in the Kubernetes cluster in fig. 1 and the electronic device 30 in fig. 2; its basic principle and technical effects are the same as those of the above embodiments, and for brevity, reference may be made to the above embodiments for anything not mentioned in this embodiment.
The GPU resource management apparatus 100 includes a management module 110 and a transmission module 120.
And the management module 110 is configured to detect a GPU card on the working node by using the management Pod.
The management module 110 is further configured to, if the GPU card is detected, obtain GPU resources of the working node by using the management Pod.
Further, the management module 110 is specifically further configured to: when the working state of the GPU card of the working node changes, the GPU card on the working node is detected again by using the management Pod; counting the GPU resources of the re-detected GPU card to obtain the current GPU resources of the working nodes; and synchronizing the current GPU resources of the working nodes to the master node so as to instruct the master node to update the GPU resources of the working nodes.
Further, the working node runs a plurality of created application Pods, and the management module 110 is further specifically configured to: acquire the GPU resource configuration of each application Pod; and isolate the GPU resources of the working node according to the GPU resource configuration of each application Pod, so that each application Pod acquires the GPU resources required by its GPU resource configuration.
Further, the number of detected GPU cards is multiple, and the management module 110 is further specifically configured to: acquiring the detected GPU resources of each GPU card; and counting GPU resources of all the detected GPU cards to obtain the GPU resources of the working nodes.
And a sending module 120, configured to send the GPU resources of the working node to the master node by using the management Pod, so that the master node schedules the GPU resources.
In order to perform the corresponding steps in the above embodiments and various possible implementations, an implementation of the GPU resource scheduling apparatus 200 is given below. Referring to fig. 15, fig. 15 is a block diagram illustrating a GPU resource scheduling apparatus 200 according to an embodiment of the present invention. It should be noted that the GPU resource scheduling apparatus 200 provided in this embodiment is applied to the master node 10 in the Kubernetes cluster in fig. 1 and the electronic device 30 in fig. 2; its basic principle and technical effects are the same as those of the above embodiments, and for brevity, reference may be made to the above embodiments for anything not mentioned in this embodiment.
The GPU resource scheduling apparatus 200 includes a receiving module 210 and a scheduling module 220.
A receiving module 210, configured to receive a join request, which is reported by a new working node and used to apply for joining the Kubernetes cluster.
And the scheduling module 220 is configured to send a management Pod creation command to the new working node based on the join request, so that the new working node executes the management Pod creation command to create and run the management Pod, and send the GPU resource of the new working node, which is acquired by the new working node through the management Pod, to the master node.
Further, there are a plurality of working nodes, and the scheduling module 220 is specifically configured to: receiving an application Pod creation command, wherein the application Pod creation command comprises GPU resource configuration of an application Pod to be created; acquiring GPU resources of each working node based on the application Pod creation command; determining target GPU resources meeting GPU resource configuration from GPU resources of all the working nodes; and scheduling the application Pod to a target work node to which the target GPU resource belongs so as to enable the target work node to create and run the application Pod to be created.
Further, the scheduling module 220 is specifically further configured to: receive a join request, reported by a new working node, for applying to join the Kubernetes cluster; and schedule the management Pod to the new working node based on the join request, so that the new working node creates and runs the management Pod and sends the GPU resources of the new working node, acquired through the management Pod, to the master node.
Embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the GPU resource management method as described above, or implements the GPU resource scheduling method as described above.
In summary, embodiments of the present invention provide a GPU resource management method, a scheduling method, a device, electronic equipment and a storage medium, where the GPU resource management method includes: detecting a GPU card on a working node by using a management Pod; if a GPU card is detected, acquiring the GPU resources of the working node by using the management Pod; and sending the GPU resources of the working node to the master node by using the management Pod so that the master node schedules the GPU resources. Compared with the prior art, the embodiments of the present invention automatically detect the GPU cards on the working node by using a management Pod with a built-in GPU card driver component, and send the detected GPU resources of the GPU cards to the master node so that the master node schedules them, thereby simplifying the management and use of GPU card resources and realizing their automatic management and scheduling.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (11)

1. A GPU resource management method, applied to a working node in a Kubernetes cluster, wherein the Kubernetes cluster further comprises a master node, the master node is in communication connection with the working node, a management Pod runs on the working node, and a GPU card driver component is built into the management Pod, the method comprising:
detecting a GPU card on the working node by using the management Pod;
if the GPU card is detected, acquiring GPU resources of the working node by using the management Pod;
and sending the GPU resources of the working node to the master node by using the management Pod so that the master node schedules the GPU resources.
2. A GPU resource management method as recited in claim 1, wherein the method further comprises:
when the working state of the GPU card of the working node changes, the management Pod is used for detecting the GPU card on the working node again;
counting the GPU resources of the re-detected GPU cards to obtain the current GPU resources of the working node;
and synchronizing the current GPU resources of the working node to the master node so as to instruct the master node to update the GPU resources of the working node.
3. The GPU resource management method of claim 1, wherein the working node runs a plurality of created application Pods, the method further comprising:
acquiring GPU resource allocation of each application Pod;
and isolating the GPU resources of the working node according to the GPU resource configuration of each application Pod, so that each application Pod can acquire the GPU resources required by its GPU resource configuration.
4. The GPU resource management method of claim 1, wherein the number of detected GPU cards is multiple, and the step of obtaining GPU resources of the working node using the management Pod comprises:
acquiring the detected GPU resources of each GPU card;
and counting the GPU resources of all the detected GPU cards to obtain the GPU resources of the working node.
5. A GPU resource scheduling method, applied to a master node in a Kubernetes cluster, wherein the Kubernetes cluster further comprises a working node, the working node is in communication connection with the master node, a management Pod runs on the working node, and a GPU card driver component is built into the management Pod, the method comprising:
receiving the GPU resources of the working node sent by the working node, wherein the working node runs the management Pod, and the GPU resources of the working node are obtained by the working node using the management Pod after the management Pod detects a GPU card installed on the working node;
and scheduling the GPU resources.
6. The GPU resource scheduling method of claim 5, wherein there are a plurality of working nodes, and the step of scheduling the GPU resources comprises:
receiving an application Pod creation command, wherein the application Pod creation command comprises GPU resource configuration of an application Pod to be created;
acquiring GPU resources of each working node based on the application Pod creating command;
determining target GPU resources meeting the GPU resource configuration from GPU resources of all the working nodes;
and scheduling the application Pod to the target working node to which the target GPU resources belong, so that the target working node creates and runs the application Pod to be created.
7. A GPU resource scheduling method as defined in claim 5, further comprising:
receiving a join request, reported by a new working node, for applying to join the Kubernetes cluster;
and scheduling the management Pod to the new working node based on the join request, so that the new working node creates and runs the management Pod and sends the GPU resources of the new working node, acquired through the management Pod, to the master node.
8. A GPU resource management device, applied to a working node in a Kubernetes cluster, wherein the Kubernetes cluster further comprises a master node, the master node is in communication connection with the working node, a management Pod runs on the working node, and a GPU card driver component is built into the management Pod, the device comprising:
the management module is used for detecting the GPU card on the working node by using the management Pod;
the management module is further configured to acquire, if a GPU card is detected, GPU resources of the working node by using the management Pod;
and the sending module is used for sending the GPU resources of the working node to the master node by using the management Pod so that the master node schedules the GPU resources.
9. A GPU resource scheduling device, applied to a master node in a Kubernetes cluster, wherein the Kubernetes cluster further comprises a working node, the working node is in communication connection with the master node, a management Pod runs on the working node, and a GPU card driver component is built into the management Pod, the device comprising:
the receiving module is used for receiving the GPU resources of the working node sent by the working node, wherein the working node runs the management Pod, and the GPU resources of the working node are obtained by the working node using the management Pod after the management Pod detects a GPU card installed on the working node;
and the scheduling module is used for scheduling the GPU resources.
10. An electronic device comprising a processor and a memory; the memory is used for storing programs; the processor is configured to implement the GPU resource management method of any of claims 1-4, or the GPU resource scheduling method of any of claims 5-7, when executing the program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the GPU resource management method of any of claims 1-4, or the GPU resource scheduling method of any of claims 5-7.
CN202210219719.9A 2022-03-08 2022-03-08 GPU resource management method, scheduling method, device, electronic equipment and storage medium Pending CN114565502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210219719.9A CN114565502A (en) 2022-03-08 2022-03-08 GPU resource management method, scheduling method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210219719.9A CN114565502A (en) 2022-03-08 2022-03-08 GPU resource management method, scheduling method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114565502A true CN114565502A (en) 2022-05-31

Family

ID=81717820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210219719.9A Pending CN114565502A (en) 2022-03-08 2022-03-08 GPU resource management method, scheduling method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114565502A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115423675A (en) * 2022-11-03 2022-12-02 北京蔚领时代科技有限公司 GPU hot plug method, device, electronic equipment and storage medium
CN115562878A (en) * 2022-12-06 2023-01-03 苏州浪潮智能科技有限公司 Management method and device of GPU (graphics processing Unit) computing resources, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US20200081731A1 (en) Method, system and apparatus for creating virtual machine
CN115328663B (en) Method, device, equipment and storage medium for scheduling resources based on PaaS platform
CN110737442A (en) edge application management method and system
CN109150987B (en) Two-layer container cluster elastic expansion method based on host layer and container layer
CN109376197B (en) Data synchronization method, server and computer storage medium
CN114565502A (en) GPU resource management method, scheduling method, device, electronic equipment and storage medium
CN113067850B (en) Cluster arrangement system under multi-cloud scene
CN111338774A (en) Distributed timing task scheduling system and computing device
CN111858045A (en) Multitask GPU resource scheduling method, device, equipment and readable medium
CN105357042A (en) High-availability cluster system, master node and slave node
CN114706690B (en) Method and system for sharing GPU (graphics processing Unit) by Kubernetes container
CN112698838A (en) Multi-cloud container deployment system and container deployment method thereof
CN116800616B (en) Management method and related device of virtualized network equipment
CN116339927B (en) Equipment determining method, device, storage medium and electronic device
CN104657240B (en) The Failure Control method and device of more kernel operating systems
CN112306640A (en) Container dispensing method, apparatus, device and medium therefor
CN115048190B (en) Virtual machine snapshot management method, system and computer readable medium
CN112527561B (en) Data backup method and device based on Internet of things cloud storage
CN110471767B (en) Equipment scheduling method
CN110572276B (en) Deployment method, device, equipment and storage medium
CN114615268B (en) Service network, monitoring node, container node and equipment based on Kubernetes cluster
CN117931379A (en) Cluster deployment method, device, system, equipment and medium for edge computing
CN116846944A (en) Data acquisition method, device, equipment and storage medium
CN109413011B (en) Public service resource application method, related equipment and system
CN117806674A (en) Distributed storage cluster deployment method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination