CN111930522A - GPU virtualization and resource scheduling method and device

Info

Publication number: CN111930522A
Application number: CN202011012220.8A
Authority: CN (China)
Prior art keywords: resource, gpu, task, node, processed
Legal status: Pending
Other languages: Chinese (zh)
Inventors: Zhang Zhao (张昭), Li Qiang (李强)
Current and original assignee: Changzhou Weiyizhi Technology Co Ltd
Application filed by Changzhou Weiyizhi Technology Co Ltd
Priority date / filing date: 2020-09-24
Publication date: 2020-11-13

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems


Abstract

The invention provides a GPU virtualization and resource scheduling method and device, wherein the method comprises the following steps: S1, loading and registering GPUs on a k8s cluster, virtualizing the GPUs at the granularity of CUDA cores and video memory, and constructing a node resource list from the k8s cluster node information obtained through virtualization; S2, after a task to be processed is submitted, allocating GPU resources according to the node resource list and the resources required by the task, wherein a corresponding scheduler is selected according to the size of the required resources; and S3, after GPU resource allocation is completed, monitoring the resource occupancy of each node of the current k8s cluster, and displaying and alarming according to the occupancy. The method has the advantages of flexibility, high efficiency, resource saving, robust operation, and the like.

Description

GPU virtualization and resource scheduling method and device
Technical Field
The invention relates to the technical field of GPU virtualization, and in particular to a GPU virtualization and resource scheduling method, a GPU virtualization and resource scheduling apparatus, a computer device, a non-transitory computer-readable storage medium, and a computer program product.
Background
A convolutional neural network in deep learning is, mathematically, a collection of convolution and matrix operations, and a convolution can be completed by converting it into a matrix operation. These operations are similar to the graphics operations that a GPU (Graphics Processing Unit) routinely performs, so deep learning algorithms are preferably executed by the GPU.
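For intuition, the following is a minimal numpy sketch (not part of the patent) of that conversion: image patches are unrolled into rows of a matrix (the im2col trick), after which the convolution becomes a single matrix multiplication. As in deep learning frameworks, "convolution" here means cross-correlation without kernel flipping.

import numpy as np

def conv_as_matmul(image, kernel):
    # Lower a 2-D convolution to one matrix multiplication via im2col.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    # Each row of `cols` is one flattened receptive field of the image.
    cols = np.stack([image[i:i + kh, j:j + kw].ravel()
                     for i in range(oh) for j in range(ow)])
    return (cols @ kernel.ravel()).reshape(oh, ow)

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))                # a 2x2 box filter
print(conv_as_matmul(image, kernel))    # each output is the sum of a 2x2 patch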
Large-scale distributed machine learning involves many related algorithms and an enormous amount of computation, so the general GPU virtualization and resource scheduling strategies available today struggle to meet the requirements of large distributed machine learning tasks, such as distributed machine learning model training.
There is therefore an urgent need for a GPU virtualization and resource scheduling strategy for large distributed machine learning tasks that is flexible, efficient, resource-saving, and robust in operation.
Disclosure of Invention
To solve the above technical problem, the invention provides a GPU virtualization and resource scheduling method that has the advantages of flexibility, high efficiency, resource saving, robust operation, and the like.
The invention also proposes a GPU virtualization and resource scheduling apparatus, a computer device, a non-transitory computer-readable storage medium and a computer program product.
The technical scheme adopted by the invention is as follows:
a GPU virtualization and resource scheduling method comprises the following steps: S1, loading and registering GPUs on a k8s (Kubernetes, an open-source container orchestration engine) cluster, virtualizing the GPUs at the granularity of CUDA (Compute Unified Device Architecture) cores and video memory, and constructing a node resource list from the k8s cluster node information obtained through virtualization; S2, after a task to be processed is submitted, allocating GPU resources according to the node resource list and the resources required by the task, wherein a corresponding scheduler is selected according to the size of the required resources; and S3, after GPU resource allocation is completed, monitoring the resource occupancy of each node of the current k8s cluster, and displaying and alarming according to the occupancy.
The task to be processed is a training task of the distributed machine learning model.
Step S1 specifically includes: declaring and starting the k8s device-plugin and initiating a GPU resource request to obtain a node resource information array containing CUDA core information and video memory information; parsing the node resource information array; and constructing a node resource list containing a GPU resource-node mapping from the parsing result.
Step S2 specifically includes: judging the GPU resource margin from the resources required by the task to be processed and the node resource list; if the margin is sufficient, returning the available node, container, and resource configuration information to the task controller; if the margin is insufficient, placing the task back into the task submission queue and, when GPU resources later become idle, pulling the task up according to its required resources and submission time and returning the currently available node, container, and resource configuration information to the task controller; the task controller then marks task execution candidate containers according to the returned information and selects a corresponding scheduler according to the size of the required resources to complete the GPU resource allocation.
The resource occupancy includes Perf profiling data, video memory usage, utilization, and processes, and step S3 specifically includes: sampling the resource occupancy; constructing and displaying a task-node resource occupancy view and a task running-state view from the samples; and issuing an alarm according to the occupancy and a preset alarm rule.
The GPU virtualization and resource scheduling method further comprises: after a task finishes, unloading the GPU resources of the processed task, updating the available node, container, and resource configuration information, and updating the node resource list.
A GPU virtualization and resource scheduling apparatus comprises: a virtualization module for loading and registering GPUs on the k8s cluster, virtualizing the GPUs at the granularity of CUDA cores and video memory, and constructing a node resource list from the k8s cluster node information obtained through virtualization; an allocation module for allocating GPU resources, after a task to be processed is submitted, according to the node resource list and the resources required by the task, wherein a corresponding scheduler is selected according to the size of the required resources; and a monitoring module for monitoring, after GPU resource allocation is completed, the resource occupancy of each node of the current k8s cluster, and displaying and alarming according to the occupancy.
A computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the above GPU virtualization and resource scheduling method is implemented.
A non-transitory computer-readable storage medium has a computer program stored thereon which, when executed by a processor, implements the above GPU virtualization and resource scheduling method.
A computer program product comprises instructions which, when executed by a processor, perform the above GPU virtualization and resource scheduling method.
The invention has the beneficial effects that:
by virtualizing the GPU at the granularity of CUDA cores and video memory, the invention divides GPU resources at a finer granularity, so that algorithms with large task volumes can call the underlying computing power resources more finely and flexibly; by selecting a corresponding scheduler according to the size of the resources required by each task during GPU resource allocation, tasks with different resource consumption can execute efficiently, with high resource utilization and without conflict; and by displaying and alarming on the resource occupancy, GPU resources can be flexibly controlled and effectively monitored. The method therefore has the advantages of flexibility, high efficiency, resource saving, robust operation, and the like.
Drawings
FIG. 1 is a flowchart of a GPU virtualization and resource scheduling method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a process of executing a training task of a distributed machine learning model according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a GPU virtualization and resource scheduling apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the GPU virtualization and resource scheduling method according to the embodiment of the present invention includes the following steps:
S1, loading and registering GPUs on the k8s cluster, virtualizing the GPUs at the granularity of CUDA cores and video memory, and constructing a node resource list from the k8s cluster node information obtained through virtualization.
In the embodiment of the invention, the GPU performs service registration at the resource granularity of the CUDA core (cuda-core) and video memory (gpu-memory) level. Specifically, the k8s device-plugin can be declared and started and a GPU resource request initiated to obtain a node resource information array containing CUDA core information and video memory information. A node resource information array of a specific embodiment of the invention is as follows:
{'node': 'node_name', 'pod': 'pod_name', 'container': 'container_name', 'cluster_ip': 'virtual_ip_host', 'cpu_resource': [{'id': 0, 'core': '8'}, {'id': 1, 'core': '8'}], 'gpu_resource': [{'id': 0, 'cuda_core': 500, 'cuda_memory': 500}, {'id': 1, 'cuda_core': 1200, 'cuda_memory': 1200}], 'memory': '108Gi', 'volume': '500Gi'}
wherein the node resource information array comprises: the k8s cluster node (node), given by node name (node_name); the k8s cluster pod resource (pod), given by pod name (pod_name); the k8s cluster container resource (container), given by container name (container_name); the intranet ip corresponding to the pod in the k8s cluster (cluster_ip), given by virtual ip (virtual_ip_host); the CPU resource (cpu_resource), given by CPU number (id) and CPU core count (core); the GPU resource (gpu_resource), given by GPU number (id), CUDA core count (cuda_core), and CUDA memory (cuda_memory); memory (memory), given by size (e.g., 108Gi); and volume storage (volume), given by size (e.g., 500Gi).
Then, the node resource information array may be parsed to obtain the specific content it contains. It should be noted that, for resource information related to the CPU (Central Processing Unit) and the GPU, if the information is in yaml format, it may be parsed into json format.
Finally, a node resource list containing a GPU resource-node mapping is constructed from the parsing result; the node resource list includes the nodes, the container information, the CUDA core count of each GPU available for subsequent allocation, the CUDA memory information, and the like.
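To make this concrete, the following Python sketch parses node resource information entries of the above form and flattens them into a node resource list keyed by GPU. The field names follow the example array; the function names and list layout are illustrative assumptions, not the patent's implementation.

import json

def parse_node_resource_info(raw):
    # Accept either a dict or a JSON string for one node's resource information.
    return json.loads(raw) if isinstance(raw, str) else raw

def build_node_resource_list(node_infos):
    # Flatten per-node GPU entries into a GPU resource -> node mapping.
    resource_list = []
    for info in map(parse_node_resource_info, node_infos):
        for gpu in info['gpu_resource']:
            resource_list.append({
                'node': info['node'],
                'pod': info['pod'],
                'container': info['container'],
                'cluster_ip': info['cluster_ip'],
                'gpu_id': gpu['id'],
                'cuda_core': gpu['cuda_core'],      # allocatable CUDA cores
                'cuda_memory': gpu['cuda_memory'],  # allocatable CUDA memory
            })
    return resource_list

node_info = {'node': 'node_name', 'pod': 'pod_name', 'container': 'container_name',
             'cluster_ip': 'virtual_ip_host',
             'gpu_resource': [{'id': 0, 'cuda_core': 500, 'cuda_memory': 500},
                              {'id': 1, 'cuda_core': 1200, 'cuda_memory': 1200}]}
print(build_node_resource_list([node_info]))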
S2, after the task to be processed is submitted, GPU resources are allocated according to the node resource list and the resources required by the task, wherein a corresponding scheduler is selected according to the size of the required resources.
The GPU virtualization and resource scheduling method of the embodiment of the invention is suitable for tasks with relatively large workloads, so the task to be processed may be, for example, a distributed machine learning model training task.
Specifically, after the task to be processed is submitted, the GPU resource margin can first be judged from the resources required by the task and the node resource list; that is, the total GPU resources of the k8s cluster are calculated and compared with the GPU resources required by the task, and the comparison result is returned to the task submitting end via a callback.
If the GPU resource margin is sufficient, that is, both cluster_cuda_core (the cluster-wide CUDA core count) > task_cuda_core (the CUDA core count required by the task) and cluster_cuda_memory (the cluster-wide CUDA memory) > task_cuda_memory (the CUDA memory required by the task) hold, the callback returns "allowed" and the available node, container, and resource configuration information are returned to the task controller. Two sets of node, container, and resource configuration information of a specific embodiment of the invention are as follows:
"resource1":{"resource_node1":"worker-node1","resource_pod1": "pod-a1","resource_container1":"containter-c1","resource_cuda_memory": 80,"resource_cuda_core": 60,"resource_node_ip": "192.**.**.*1"}
"resource2":{"resource_node1":"worker-node2","resource_pod1": "pod-a2","resource_container1":"containter-c2","resource_cuda_memory": 80,"resource_cuda_core": 60,"resource_node_ip": "192.**.**.*2"}
wherein the node, container, and resource configuration information comprise the resource name (resource), the resource node (resource_node), the resource pod (resource_pod), the resource node container (resource_container), the CUDA memory corresponding to the resource node container (resource_cuda_memory), the CUDA core count corresponding to the resource node container (resource_cuda_core), and the internal ip of the resource node (resource_node_ip).
If the GPU resource margin is insufficient, the callback returns "rejected", the task to be processed is placed back into the task submission queue, and its submission version is incremented (task_version + 1). The resources required by the task are marked at the same time; when GPU resources later become idle, the task is pulled up according to its required resources and submission time, and the currently available node, container, and resource configuration information are returned to the task controller. The conditions for pulling a task up again are: first, task_cuda_memory < cluster_cuda_memory and task_cuda_core < cluster_cuda_core must be satisfied; the candidates are then sorted by task_version, with a larger task_version executing with higher priority. That is, a pending task previously placed back into the task submission queue because of an insufficient GPU resource margin is pulled up for execution if its required resources are less than the current margin and it was submitted earliest.
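The following Python sketch illustrates the margin check and the re-pull ordering just described; the data structures, field names, and function names are assumptions for illustration, not the patent's implementation.

def check_margin(task, cluster):
    # The cluster-wide CUDA margin must cover the task on both axes.
    return (cluster['cluster_cuda_core'] > task['task_cuda_core'] and
            cluster['cluster_cuda_memory'] > task['task_cuda_memory'])

def requeue(task, queue):
    # A rejected task goes back into the queue with its version incremented.
    task['task_version'] += 1
    queue.append(task)

def pull_up_next(queue, cluster):
    # Among fitting tasks, prefer the largest task_version (the longest waiter),
    # breaking ties by earliest submission time.
    fitting = [t for t in queue if check_margin(t, cluster)]
    if not fitting:
        return None
    fitting.sort(key=lambda t: (-t['task_version'], t['submit_time']))
    task = fitting[0]
    queue.remove(task)
    return task

cluster = {'cluster_cuda_core': 1700, 'cluster_cuda_memory': 1700}
queue = []
requeue({'task_cuda_core': 600, 'task_cuda_memory': 600,
         'task_version': 1, 'submit_time': 100.0}, queue)
print(pull_up_next(queue, cluster))   # the requeued task now fits and is pulled up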
Then, the task controller may mark task execution candidate containers according to the returned available node, container, and resource configuration information, and select a corresponding scheduler according to the size of the resources required by the task. For example, if the required resources are greater than 1cps (cuda_memory = 100, cuda_core = 100), a link-scheduler (custom link scheduler) may be used; if they equal 1cps, a fragment-scheduler (custom fragment scheduler) may be used; and if they are less than 1cps, a share-scheduler (custom shared scheduler) may be used, different schedulers executing different scheduling logic. When allocating resources, the scheduler performs a bilateral check with the worker nodes; after the check completes, the scheduler sends the GPU resource configuration information to overwrite the environment configuration of the target worker nodes and containers, and the overwritten nodes and containers are marked resource-label-finished.
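A sketch of the size-based scheduler dispatch follows. Per the text, 1cps is taken to mean cuda_memory = 100 and cuda_core = 100; the scheduler names mirror the text, while the dispatch function itself and the use of the larger of the two resource ratios as the task's size are assumptions.

CPS_CORE, CPS_MEMORY = 100, 100   # 1cps, per the definition above

def select_scheduler(task_cuda_core, task_cuda_memory):
    # Size the task by the more demanding of its two resource ratios.
    cps = max(task_cuda_core / CPS_CORE, task_cuda_memory / CPS_MEMORY)
    if cps > 1:
        return 'link-scheduler'      # custom link scheduler for large tasks
    if cps == 1:
        return 'fragment-scheduler'  # custom fragment scheduler
    return 'share-scheduler'         # custom shared scheduler for small tasks

print(select_scheduler(600, 600))    # -> link-scheduler
print(select_scheduler(100, 100))    # -> fragment-scheduler
print(select_scheduler(40, 60))      # -> share-scheduler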
S3, after GPU resource allocation is completed, the resource occupancy of each node of the current k8s cluster is monitored, and display and alarming are performed according to the occupancy.
Specifically, the resource occupancy can be sampled, chiefly the Perf profiling data, the video memory usage (Memory-Usage), the utilization (Volatile GPU-Util), the processes, and the like, and a task-node resource occupancy view and a task running-state view are constructed and displayed from the samples. Meanwhile, an alarm can be issued according to the occupancy and a preset alarm rule; for example, the rule may be Memory-Usage < 15% over a 3-minute window, so that when the video memory usage stays below 15% for 3 minutes, an alarm mail or alarm message is sent to a preset receiving address.
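A minimal sketch of that example rule follows, assuming a periodic sampler (the samples could come, for instance, from nvidia-smi --query-gpu=memory.used,memory.total --format=csv) and treating the mail/message sender as a stand-in:

from collections import deque

WINDOW_SEC, THRESHOLD = 180, 0.15    # 3-minute window, 15% video memory usage
samples = deque()                    # (timestamp, memory_usage) pairs

def record_sample(timestamp, memory_usage):
    # Append one sample and evict samples that fell out of the window.
    samples.append((timestamp, memory_usage))
    while samples and samples[0][0] < timestamp - WINDOW_SEC:
        samples.popleft()

def should_alarm():
    # Alarm only when the window is fully covered and every sample is under threshold.
    return (len(samples) > 1 and
            samples[-1][0] - samples[0][0] >= WINDOW_SEC and
            all(usage < THRESHOLD for _, usage in samples))

for t in range(0, 181, 30):          # one sample every 30 s for 3 minutes
    record_sample(float(t), 0.10)    # video memory usage stuck at 10%
print(should_alarm())                # -> True: send the alarm mail/message here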
In one embodiment of the invention, if a single node alarms frequently, for example more times than a set value within a defined period, the node can be marked unavailable and the operators it executes migrated to other substitute nodes.
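This escalation could be sketched as follows; the limit, period, and migration callback are assumed values and names, not figures from the patent.

from collections import defaultdict, deque

ALARM_LIMIT, PERIOD_SEC = 5, 600      # assumed: more than 5 alarms in 10 minutes
alarm_log = defaultdict(deque)        # node name -> timestamps of its alarms
unavailable = set()

def record_alarm(node, timestamp, migrate):
    # Count the node's alarms inside the period; past the limit, mark the node
    # unavailable and hand its operators to substitute nodes via `migrate`.
    log = alarm_log[node]
    log.append(timestamp)
    while log and log[0] < timestamp - PERIOD_SEC:
        log.popleft()
    if len(log) > ALARM_LIMIT and node not in unavailable:
        unavailable.add(node)
        migrate(node)

for t in range(6):                    # six alarms in quick succession
    record_alarm('worker-node1', float(t),
                 migrate=lambda n: print('migrating operators off', n))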
In addition, after the task to be processed is completed, a hook function (Hook_function) may be executed. The hook function unloads the GPU resources corresponding to the processed task in a signal-triggered manner (the trigger signal is the algorithm execution result or a file output, and may also be user-defined; once the hook function recognizes the corresponding signal, it executes the resource unloading process), updates the available node, container, and resource configuration information, and updates the node resource list.
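A minimal sketch of the hook-based unload, assuming the in-memory node resource list of the earlier snippet and illustrative signal names:

TRIGGERS = {'algorithm_result', 'file_output'}   # plus any user-defined signals

def hook_function(signal, task, resource_list):
    # On a recognized trigger, return the task's CUDA cores and memory to the
    # node resource list so they are available for later allocations.
    if signal not in TRIGGERS:
        return
    for res in task['allocated_resources']:
        for entry in resource_list:
            if entry['node'] == res['node'] and entry['gpu_id'] == res['gpu_id']:
                entry['cuda_core'] += res['cuda_core']
                entry['cuda_memory'] += res['cuda_memory']

resource_list = [{'node': 'worker-node1', 'gpu_id': 0,
                  'cuda_core': 440, 'cuda_memory': 420}]
task = {'allocated_resources': [{'node': 'worker-node1', 'gpu_id': 0,
                                 'cuda_core': 60, 'cuda_memory': 80}]}
hook_function('algorithm_result', task, resource_list)
print(resource_list)   # the GPU's cores/memory are restored to 500/500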
In an embodiment of the invention, the process of executing a distributed machine learning model training task with the GPU virtualization and resource scheduling method is shown in fig. 2. After the training task is submitted, whether the resources for the task are sufficient is judged. If not, the task is placed into the submission queue to wait. If sufficient, the available node, pod, and container are labeled, node resource information is synchronized, the corresponding node is covered by the resource calibration, and the node resource information is returned to the task controller. Next, a GPU scheduler is selected, and the selected scheduler distributes the training task to the nodes for execution. During execution, the usage details of the GPU resources are sampled periodically and displayed graphically; meanwhile, whether the GPU resource usage data exceed the alarm threshold is judged: if so, a mail alarm is raised; if not, the task continues to execute normally. After the training task finishes, the GPU resources are returned to the resource pool, and the returned resources are recalibrated.
According to the GPU virtualization and resource scheduling method of the embodiment of the invention, virtualizing the GPU at the granularity of CUDA cores and video memory divides GPU resources at a finer granularity, so that algorithms with large task volumes can call the underlying computing power resources more finely and flexibly; selecting a corresponding scheduler according to the size of the resources required by each task during GPU resource allocation lets tasks with different resource consumption execute efficiently, with high resource utilization and without conflict; and displaying and alarming on the resource occupancy allows GPU resources to be flexibly controlled and effectively monitored. The method therefore has the advantages of flexibility, high efficiency, resource saving, robust operation, and the like.
Corresponding to the GPU virtualization and resource scheduling method of the above embodiment, the present invention further provides a GPU virtualization and resource scheduling device.
As shown in fig. 3, the GPU virtualization and resource scheduling apparatus according to the embodiment of the present invention includes: a virtualization module 10, an allocation module 20, and a monitoring module 30. The virtualization module 10 is configured to load and register GPUs on the k8s cluster, virtualize the GPUs at the granularity of CUDA cores and video memory, and construct a node resource list from the k8s cluster node information obtained through virtualization; the allocation module 20 is configured to allocate, after a task to be processed is submitted, GPU resources according to the node resource list and the resources required by the task, a corresponding scheduler being selected according to the size of the required resources; and the monitoring module 30 is configured to monitor, after GPU resource allocation is completed, the resource occupancy of each node of the current k8s cluster, and to display and alarm according to the occupancy.
In the embodiment of the invention, the GPU performs service registration at the resource granularity of the CUDA core (cuda-core) and video memory (gpu-memory) level.
Specifically, after declaring and starting the k8s device-plugin and initiating a GPU resource request, the virtualization module 10 may obtain a node resource information array containing CUDA core information and video memory information. A node resource information array of a specific embodiment of the invention is as follows:
{'node': 'node_name', 'pod': 'pod_name', 'container': 'container_name', 'cluster_ip': 'virtual_ip_host', 'cpu_resource': [{'id': 0, 'core': '8'}, {'id': 1, 'core': '8'}], 'gpu_resource': [{'id': 0, 'cuda_core': 500, 'cuda_memory': 500}, {'id': 1, 'cuda_core': 1200, 'cuda_memory': 1200}], 'memory': '108Gi', 'volume': '500Gi'}
wherein the node resource information array comprises: the k8s cluster node (node), given by node name (node_name); the k8s cluster pod resource (pod), given by pod name (pod_name); the k8s cluster container resource (container), given by container name (container_name); the intranet ip corresponding to the pod in the k8s cluster (cluster_ip), given by virtual ip (virtual_ip_host); the CPU resource (cpu_resource), given by CPU number (id) and CPU core count (core); the GPU resource (gpu_resource), given by GPU number (id), CUDA core count (cuda_core), and CUDA memory (cuda_memory); memory (memory), given by size (e.g., 108Gi); and volume storage (volume), given by size (e.g., 500Gi).
Then, the virtualization module 10 may parse the node resource information array to obtain the specific content it contains. It should be noted that, for resource information related to the CPU and the GPU, if the information is in yaml format, it may be parsed into json format.
Finally, the virtualization module 10 may construct a node resource list containing a GPU resource-node mapping from the parsing result; the node resource list includes the nodes, the container information, the CUDA core count of each GPU available for subsequent allocation, the CUDA memory information, and the like.
The GPU virtualization and resource scheduling apparatus of the embodiment of the invention is suitable for tasks with relatively large workloads, so the task to be processed may be, for example, a distributed machine learning model training task.
After the task to be processed is submitted, the allocation module 20 may first judge the GPU resource margin from the resources required by the task and the node resource list; that is, the total GPU resources of the k8s cluster are calculated and compared with the GPU resources required by the task, and the comparison result is returned to the task submitting end via a callback.
If the GPU resource margin is sufficient, that is, both cluster_cuda_core (the cluster-wide CUDA core count) > task_cuda_core (the CUDA core count required by the task) and cluster_cuda_memory (the cluster-wide CUDA memory) > task_cuda_memory (the CUDA memory required by the task) hold, the callback returns "allowed" and the available node, container, and resource configuration information are returned to the task controller. Two sets of node, container, and resource configuration information of a specific embodiment of the invention are as follows:
"resource1":{"resource_node1":"worker-node1","resource_pod1":"pod-a1","resource_container1":"container-c1","resource_cuda_memory":80,"resource_cuda_core":60,"resource_node_ip":"192.**.**.*1"}
"resource2":{"resource_node1":"worker-node2","resource_pod1":"pod-a2","resource_container1":"container-c2","resource_cuda_memory":80,"resource_cuda_core":60,"resource_node_ip":"192.**.**.*2"}
wherein the node, container, and resource configuration information comprise the resource name (resource), the resource node (resource_node), the resource pod (resource_pod), the resource node container (resource_container), the CUDA memory corresponding to the resource node container (resource_cuda_memory), the CUDA core count corresponding to the resource node container (resource_cuda_core), and the internal ip of the resource node (resource_node_ip).
If the GPU resource margin is insufficient, the callback returns "rejected", the task to be processed is placed back into the task submission queue, and its submission version is incremented (task_version + 1). The resources required by the task are marked at the same time; when GPU resources later become idle, the task is pulled up according to its required resources and submission time, and the currently available node, container, and resource configuration information are returned to the task controller. The conditions for pulling a task up again are: first, task_cuda_memory < cluster_cuda_memory and task_cuda_core < cluster_cuda_core must be satisfied; the candidates are then sorted by task_version, with a larger task_version executing with higher priority. That is, a pending task previously placed back into the task submission queue because of an insufficient GPU resource margin is pulled up for execution if its required resources are less than the current margin and it was submitted earliest.
Then, the task controller may mark task execution candidate containers according to the returned available node, container, and resource configuration information, and select a corresponding scheduler according to the size of the resources required by the task. For example, if the required resources are greater than 1cps (cuda_memory = 100, cuda_core = 100), a link-scheduler (custom link scheduler) may be used; if they equal 1cps, a fragment-scheduler (custom fragment scheduler) may be used; and if they are less than 1cps, a share-scheduler (custom shared scheduler) may be used, different schedulers executing different scheduling logic. When allocating resources, the scheduler performs a bilateral check with the worker nodes; after the check completes, the scheduler sends the GPU resource configuration information to overwrite the environment configuration of the target worker nodes and containers, and the overwritten nodes and containers are marked resource-label-finished.
The monitoring module 30 may specifically sample the resource occupancy, chiefly the Perf profiling data, the video memory usage (Memory-Usage), the utilization (Volatile GPU-Util), the processes, and the like, and construct and display a task-node resource occupancy view and a task running-state view from the samples. Meanwhile, an alarm can be issued according to the occupancy and a preset alarm rule; for example, the rule may be Memory-Usage < 15% over a 3-minute window, so that when the video memory usage stays below 15% for 3 minutes, an alarm mail or alarm message is sent to a preset receiving address.
In one embodiment of the invention, if a single node alarms frequently, for example more times than a set value within a defined period, the node can be marked unavailable and the operators it executes migrated to other substitute nodes.
In addition, after the task to be processed is completed, an update module may execute a hook function (Hook_function). The hook function unloads the GPU resources corresponding to the processed task in a signal-triggered manner (the trigger signal is the algorithm execution result or a file output, and may also be user-defined; once the hook function recognizes the corresponding signal, it executes the resource unloading process), updates the available node, container, and resource configuration information, and updates the node resource list.
According to the GPU virtualization and resource scheduling apparatus of the embodiment of the invention, virtualizing the GPU at the granularity of CUDA cores and video memory divides GPU resources at a finer granularity, so that algorithms with large task volumes can call the underlying computing power resources more finely and flexibly; selecting a corresponding scheduler according to the size of the resources required by each task during GPU resource allocation lets tasks with different resource consumption execute efficiently, with high resource utilization and without conflict; and displaying and alarming on the resource occupancy allows GPU resources to be flexibly controlled and effectively monitored. The apparatus therefore has the advantages of flexibility, high efficiency, resource saving, robust operation, and the like.
The invention further provides a computer device corresponding to the embodiment.
The computer device of the embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the GPU virtualization and resource scheduling method according to the above embodiment of the present invention is implemented.
According to the computer device of the embodiment of the invention, when the processor executes the computer program stored in the memory, the GPU is virtualized at the granularity of CUDA cores and video memory, dividing GPU resources at a finer granularity so that algorithms with large task volumes can call the underlying computing power resources more finely and flexibly; a corresponding scheduler is selected according to the size of the resources required by each task during GPU resource allocation, so that tasks with different resource consumption execute efficiently, with high resource utilization and without conflict; and display and alarming are performed on the resource occupancy, so that GPU resources can be flexibly controlled and effectively monitored. The GPU virtualization and resource scheduling method thus realized is flexible, efficient, resource-saving, and robust in operation.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
A non-transitory computer readable storage medium of an embodiment of the present invention stores thereon a computer program, and when the computer program is executed by a processor, the computer program can implement the GPU virtualization and resource scheduling method according to the above embodiment of the present invention.
According to the non-transitory computer-readable storage medium of the embodiment of the invention, when the processor executes the computer program stored thereon, the GPU is virtualized at the granularity of CUDA cores and video memory, dividing GPU resources at a finer granularity so that algorithms with large task volumes can call the underlying computing power resources more finely and flexibly; a corresponding scheduler is selected according to the size of the resources required by each task during GPU resource allocation, so that tasks with different resource consumption execute efficiently, with high resource utilization and without conflict; and display and alarming are performed on the resource occupancy, so that GPU resources can be flexibly controlled and effectively monitored. The GPU virtualization and resource scheduling method thus realized is flexible, efficient, resource-saving, and robust in operation.
The present invention also provides a computer program product corresponding to the above embodiments.
When the instructions in the computer program product of the embodiment of the present invention are executed by the processor, the GPU virtualization and resource scheduling method according to the above-described embodiment of the present invention may be executed.
According to the computer program product of the embodiment of the invention, when the processor executes the instructions, the GPU is virtualized at the granularity of CUDA cores and video memory, dividing GPU resources at a finer granularity so that algorithms with large task volumes can call the underlying computing power resources more finely and flexibly; a corresponding scheduler is selected according to the size of the resources required by each task during GPU resource allocation, so that tasks with different resource consumption execute efficiently, with high resource utilization and without conflict; and display and alarming are performed on the resource occupancy, so that GPU resources can be flexibly controlled and effectively monitored. The GPU virtualization and resource scheduling method thus realized is flexible, efficient, resource-saving, and robust in operation.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature being "on" or "under" the second feature may mean that the first and second features are in direct contact, or in indirect contact through an intermediary. Also, the first feature being "on," "over," or "above" the second feature may mean that the first feature is directly or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. The first feature being "under," "below," or "beneath" the second feature may mean that the first feature is directly or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention; variations, modifications, substitutions, and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A GPU virtualization and resource scheduling method is characterized by comprising the following steps:
S1, loading and registering GPUs on the k8s cluster, virtualizing the GPUs at the granularity of CUDA cores and video memory, and constructing a node resource list from the k8s cluster node information obtained through virtualization;
S2, after the task to be processed is submitted, allocating GPU resources according to the node resource list and the resources required by the task to be processed, wherein a corresponding scheduler is selected according to the size of the resources required by the task to be processed;
and S3, after GPU resource allocation is completed, monitoring the resource occupancy of each node of the current k8s cluster, and displaying and alarming according to the occupancy.
2. The GPU virtualization and resource scheduling method of claim 1, wherein the task to be processed is a training task of a distributed machine learning model.
3. The GPU virtualization and resource scheduling method according to claim 1 or 2, wherein step S1 specifically includes:
declaring and starting k8s device-plugin, and initiating a GPU resource request to obtain a node resource information array containing cuda core information and video memory information;
analyzing the node resource information array;
and constructing a node resource list containing GPU resource-node mapping according to the analysis result.
4. The GPU virtualization and resource scheduling method of claim 3, wherein step S2 specifically comprises:
judging the GPU resource allowance according to the resources required by the task to be processed and the node resource list;
if the GPU resource allowance is sufficient, returning available nodes, containers and resource configuration information to the task controller;
if the GPU resource allowance is insufficient, the task to be processed is placed back to a task submission queue, and when the subsequent GPU resource is idle, the task to be processed is pulled up according to the resource required by the task to be processed and the submission time, and the currently available node, container and resource configuration information are returned to a task controller;
and the task controller marks task execution candidate containers according to the returned available nodes, containers and resource configuration information, and selects a corresponding scheduler according to the size of the resources required by the task to be processed to realize the distribution of GPU resources.
5. The GPU virtualization and resource scheduling method of claim 4, wherein the resource occupation status includes Perf analysis data, video memory usage amount, utilization rate, and process, and step S3 specifically includes:
sampling the resource occupation condition;
constructing and displaying a task node resource occupation view and a task running state view according to the resource occupation condition;
and sending an alarm prompt according to the resource occupation condition and a preset alarm rule.
6. A GPU virtualization and resource scheduling method according to claim 5, further comprising:
and after the task to be processed is processed, unloading GPU resources of the processed task, updating available nodes, containers and resource configuration information, and updating the node resource list.
7. A GPU virtualization and resource scheduling apparatus, comprising:
the virtualization module is used for loading and registering the GPU on the k8s cluster, virtualizing the GPU on the granularity of the cuda core and the video memory, and constructing a node resource list according to k8s cluster node information obtained through virtualization;
the distribution module is used for distributing GPU resources according to the node resource list and the resources required by the tasks to be processed after the tasks to be processed are submitted, wherein a corresponding scheduler is selected according to the size of the resources required by the tasks to be processed;
and the monitoring module is used for monitoring the resource occupation condition of each node of the current k8s cluster after the GPU resource allocation is finished, and displaying and alarming according to the resource occupation condition.
8. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the GPU virtualization and resource scheduling method of any of claims 1-6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the GPU virtualization and resource scheduling method of any of claims 1-6.
10. A computer program product, characterized in that instructions in the computer program product, when executed by a processor, perform the GPU virtualization and resource scheduling method according to any of claims 1-6.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011012220.8A | 2020-09-24 | 2020-09-24 | GPU virtualization and resource scheduling method and device

Publications (1)

Publication Number | Publication Date
CN111930522A | 2020-11-13

Family: ID=73335130

Country Status (1)

Country | Link
CN | CN111930522A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090288086A1 (en) * 2008-05-16 2009-11-19 Microsoft Corporation Local collections of tasks in a scheduler
CN105808328A (en) * 2014-12-31 2016-07-27 杭州华为数字技术有限公司 Task scheduling method, device and system
CN105869105A (en) * 2016-03-28 2016-08-17 上海交通大学 Method for accelerating GPU directed at A+ super-resolution technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
必嘫: "Open-source tool GPU Sharing: fine-grained support for Kubernetes clusters", https://developer.aliyun.com/article/690623 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131011A (en) * 2020-11-26 2020-12-25 北京壁仞科技开发有限公司 Method, computing device, and computer-readable storage medium for managing resources
CN112131011B (en) * 2020-11-26 2021-02-26 北京壁仞科技开发有限公司 Method, computing device, and computer-readable storage medium for managing resources
CN112860396A (en) * 2021-01-28 2021-05-28 福建紫辰信息科技有限公司 GPU (graphics processing Unit) scheduling method and system based on distributed deep learning
CN112860396B (en) * 2021-01-28 2024-07-05 福建紫辰信息科技有限公司 GPU scheduling method and system based on distributed deep learning
CN113391598A (en) * 2021-06-28 2021-09-14 哈尔滨工业大学 Virtual assembly simulation method and system
CN117032937A (en) * 2023-09-28 2023-11-10 之江实验室 Task scheduling method based on GPU, electronic device and storage medium
CN117032937B (en) * 2023-09-28 2024-01-09 之江实验室 Task scheduling method based on GPU, electronic device and storage medium
CN118426977A (en) * 2024-07-05 2024-08-02 济南浪潮数据技术有限公司 Resource quota control method, device and product
CN118426977B (en) * 2024-07-05 2024-10-18 济南浪潮数据技术有限公司 Resource quota control method, device and product

Similar Documents

Publication Publication Date Title
CN111930522A (en) GPU virtualization and resource scheduling method and device
CN112671830B (en) Resource scheduling method, system, device, computer equipment and storage medium
CN107885595B (en) Resource allocation method, related equipment and system
CN111176852A (en) Resource allocation method, device, chip and computer readable storage medium
US8893133B2 (en) Dynamic test scheduling by ordering tasks for performance based on similarities between the tasks
CN110119876B (en) Work order processing method and device
CN111104210A (en) Task processing method and device and computer system
CN113986534A (en) Task scheduling method and device, computer equipment and computer readable storage medium
CN111796933B (en) Resource scheduling method, device, storage medium and electronic equipment
CN116643893B (en) Method and device for scheduling computing task, storage medium and server
CN111858010A (en) Batch data processing method and device
CN114816709A (en) Task scheduling method, device, server and readable storage medium
CN111123728A (en) Unmanned vehicle simulation method, device, equipment and computer readable medium
CN107992951A (en) Capacity alarm method, system, memory and the electronic equipment of cloud management platform
CN117311973A (en) Computing device scheduling method and device, nonvolatile storage medium and electronic device
CN111063432A (en) Ultrasonic data processing method and device, ultrasonic equipment and storage medium
CN117149382A (en) Virtual machine scheduling method, device, computer equipment and storage medium
CN114780245A (en) Server resource allocation method and device
CN113742059B (en) Task allocation method, device, computer equipment and storage medium
CN112214286B (en) Container starting method and device and electronic equipment
CN115495234A (en) Resource detection method and device
CN111782688B (en) Request processing method, device, equipment and storage medium based on big data analysis
CN118227270B (en) Method, device, equipment and medium for adjusting memory of virtual machine
CN117579626B (en) Optimization method and system based on distributed realization of edge calculation
CN111444008B (en) Inter-cluster service migration method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2020-11-13)