CN114840344A - GPU equipment resource allocation method and system based on kubernetes - Google Patents

GPU equipment resource allocation method and system based on kubernetes

Info

Publication number
CN114840344A
Authority
CN
China
Prior art keywords
equipment
gpu
real
logic
target logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210547160.2A
Other languages
Chinese (zh)
Inventor
马春雨
吴春光
张远航
李钰磊
张里阳
刘晓敏
张玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Galaxy Qilin Software Changsha Co ltd
Original Assignee
Galaxy Qilin Software Changsha Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Galaxy Qilin Software Changsha Co ltd filed Critical Galaxy Qilin Software Changsha Co ltd
Priority to CN202210547160.2A priority Critical patent/CN114840344A/en
Publication of CN114840344A publication Critical patent/CN114840344A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a GPU device resource allocation method and system based on Kubernetes. The method comprises the following steps: the GPU device plugin generates at least two logic devices for each real GPU device; the GPU device plugin reports the device IDs of the logic devices and the health condition of the corresponding real GPU devices; if a GPU application container is scheduled to a GPU compute node, the kubelet device manager module acquires the Pod ID, container ID, and binding flag of the GPU application container and sends them to the GPU device plugin; and the GPU device plugin either selects the real GPU device with the lowest load rate for the logic device, or matches a real GPU device according to the binding flag and Pod ID. The invention enables GPU application containers to share GPU devices and improves the utilization rate of GPU devices.

Description

GPU equipment resource allocation method and system based on kubernetes
Technical Field
The invention relates to the field of computer software, in particular to a Kubernetes-based GPU equipment resource allocation method and system.
Background
Kubernetes is currently the most widely used container orchestration system. Containerized applications are increasingly complex and diverse, and some application containers need devices such as GPUs, FPGAs, and network cards, so Kubernetes provides a device resource management mechanism to manage these devices. Device resources are registered and allocated through interaction between the kubelet device manager module and a device plugin, which supplies the device list of a compute node and serves device resource requests; the device resources are then synchronized to the API Server in Kubernetes, and the scheduler module allocates them to the relevant compute node when an application that uses the device is deployed. The API Server is a core module of Kubernetes and is responsible for accounting for and recording device resources; the scheduler module, also part of Kubernetes, is mainly responsible for scheduling Pods.
A Pod is the basic unit of scheduling in Kubernetes, and one Pod may contain multiple GPU application containers. However, the current allocation of GPU device resources by Kubernetes does not allow GPU application containers belonging to the same Pod to share a GPU device: the GPU device plugin in Kubernetes only binds each GPU application container to a single GPU device. On a compute node with multiple GPU devices that deploys a large number of GPU applications, this results in a low load rate on each GPU device and wastes GPU device resources.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the above problems in the prior art, the invention provides a Kubernetes-based GPU device resource allocation method and system in which GPU devices can be shared and bound, thereby improving the utilization rate of GPU devices.
In order to solve the above technical problem, the invention adopts the following technical scheme:
A Kubernetes-based GPU device resource allocation method is applied to GPU compute nodes in Kubernetes. Each GPU compute node comprises at least two real GPU devices, a GPU device plugin, and a kubelet device manager module, the kubelet device manager module being connected with each real GPU device through the GPU device plugin. The method comprises the following steps:
S1) the GPU device plugin obtains the device information of each real GPU device and generates at least two logic devices for each real GPU device according to the size of its video memory;
S2) the GPU device plugin obtains the load condition and health condition of each real GPU device and reports the device IDs of all logic devices, together with the health condition of the corresponding real GPU devices, to the kubelet device manager module; the kubelet device manager module then sends the real GPU device and logic device information to the API Server in Kubernetes;
S3) if a Pod containing a GPU application container is scheduled to the GPU compute node, the kubelet device manager module acquires the corresponding Pod ID, container ID, and binding flag, together with the device ID of the allocated target logic device, and sends them to the GPU device plugin;
S4) if there is one device ID of the target logic device and the binding flag is a first value, the GPU device plugin selects the real GPU device with the lowest load rate, allocates the target logic device to the selected real GPU device and then to the corresponding GPU application container, adjusts the correspondence between the remaining logic devices and each real GPU device, and returns to step S2) until the end; otherwise, step S5) is executed;
S5) if there is one device ID of the target logic device and the binding flag is a second value, the GPU device plugin matches a real GPU device according to the Pod ID; if a match exists, the target logic device is allocated to the matched real GPU device and then to the corresponding GPU application container, the correspondence between the remaining logic devices and each real GPU device is adjusted, and the method returns to step S2) until the end.
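The generation of logic devices in step S1) can be sketched as follows (a Python simulation with hypothetical helper and parameter names; the patent gives no code). Each real GPU device is split into equally sized logic devices according to its video memory, with at least two per device:

```python
def generate_logic_devices(real_gpus, slice_mib=1024):
    """Split each real GPU device into logic devices by video memory size.

    real_gpus: dict mapping a real device name to its video memory in MiB.
    Returns a dict mapping each logic device ID to the real device it
    currently corresponds to. Each real device yields at least two logic
    devices, as step S1) requires.
    """
    mapping = {}
    next_id = 0
    for name, mem_mib in real_gpus.items():
        slices = max(2, mem_mib // slice_mib)  # at least two logic devices
        for _ in range(slices):
            mapping[f"GPU{next_id}"] = name
            next_id += 1
    return mapping

# Illustrative node: two 2 GiB cards and one 4 GiB card.
mapping = generate_logic_devices({"A": 2048, "B": 2048, "C": 4096})
```

The correspondence produced here is only provisional: as the description later explains, a logic device is truly bound to a real GPU device only when an allocation strategy runs.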
Further, step S1) is preceded by a step of configuring the kubelet device manager module and the GPU device plugin, which specifically comprises: adding fields for the device ID, Pod ID, container ID, and binding flag to the device management protocol of Kubernetes; configuring the kubelet device manager module and the GPU device plugin according to the modified device management protocol; deploying the GPU device plugin into Kubernetes in containerized form; and mapping all GPU devices of the host into the container of the GPU device plugin.
Further, step S4) specifically includes the following steps:
S41) matching the device ID of the target logic device to its original real GPU device and obtaining the load rate of that device; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, performing step S42); if it is the lowest among all spare real GPU devices, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, and returning to step S2). Here a spare real GPU device is a real GPU device that still has unallocated logic devices;
S42) selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, selecting one unallocated logic device of the current real GPU device and reassigning it to the original real GPU device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module.
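Steps S41)–S42) amount to a load-balancing trade of correspondences, which can be sketched as follows (a simplified Python simulation with hypothetical data structures, not the patent's implementation):

```python
def allocate_unbound(target_id, mapping, load, allocated):
    """Steps S41)-S42): allocate one unbound target logic device.

    mapping:   logic device ID -> real GPU device it currently corresponds to
    load:      real GPU device -> current load rate (0.0-1.0)
    allocated: set of logic device IDs already handed to containers
    Returns the real GPU device that ends up backing the target logic device.
    """
    original = mapping[target_id]
    # Spare real GPU devices: those that still have unallocated logic devices.
    spare = {mapping[d] for d in mapping if d not in allocated}
    best = min(spare, key=lambda g: load[g])
    if best != original:
        # S42): trade one unallocated logic device of the least-loaded
        # device back to the original device, then move the target over.
        swap = next(d for d in mapping
                    if mapping[d] == best and d not in allocated and d != target_id)
        mapping[swap] = original
        mapping[target_id] = best
    allocated.add(target_id)
    return mapping[target_id]

# Example: three cards, card C least loaded (illustrative load rates).
mapping = {"GPU0": "A", "GPU1": "A", "GPU2": "B",
           "GPU3": "B", "GPU4": "C", "GPU5": "C"}
load = {"A": 0.25, "B": 0.20, "C": 0.10}
backing = allocate_unbound("GPU0", mapping, load, set())
```

After the call, the target logic device is backed by the least-loaded card, and one of that card's logic devices has taken over the correspondence with the original card, keeping every card's logic-device count unchanged.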
Further, step S5) specifically includes the following steps:
S51) matching all real GPU devices against the Pod ID corresponding to the device ID of the target logic device; if a match exists and the matched real GPU device still has unallocated logic devices, taking the matched real GPU device as the current real GPU device;
S52) matching the device ID of the target logic device to its original real GPU device; if the original real GPU device is not the current real GPU device, selecting one unallocated logic device of the current real GPU device and reassigning it to the original real GPU device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module; if the original real GPU device is the current real GPU device, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module.
Further, step S51) also includes processing for the case where no match exists, or the matched real GPU device has no unallocated logic devices, which specifically comprises:
S51a) if no match exists, or the matched real GPU device has no unallocated logic devices, matching the device ID of the target logic device to its original real GPU device and obtaining its load rate; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, performing step S51b); if it is the lowest among all spare real GPU devices, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, saving the Pod ID corresponding to the device ID of the target logic device as the Pod ID of the original real GPU device, and returning to step S2). Here a spare real GPU device is a real GPU device that still has unallocated logic devices;
S51b) selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, selecting one unallocated logic device of the current real GPU device and reassigning it to the original real GPU device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module, and saving the Pod ID corresponding to the device ID of the target logic device as the Pod ID of the current real GPU device.
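The Pod-bound path of steps S51)–S52), with the fallback of S51a)/S51b), can be sketched in the same style (a hedged Python simulation; the data structures and names are illustrative, not from the patent):

```python
def allocate_bound(target_id, pod_id, mapping, load, allocated, pod_of_gpu):
    """Steps S51)-S52) with the fallback of S51a)/S51b) (sketch).

    pod_of_gpu: real GPU device -> Pod ID previously bound to it, so that
    containers of the same Pod land on the same real GPU device.
    """
    def spares_of(gpu):
        return [d for d in mapping
                if mapping[d] == gpu and d not in allocated and d != target_id]

    # S51): try a real GPU device already bound to this Pod ID.
    current = next((g for g, p in pod_of_gpu.items()
                    if p == pod_id and spares_of(g)), None)
    if current is None:
        # S51a)/S51b): no usable match -> least-loaded spare device.
        spare = {mapping[d] for d in mapping if d not in allocated}
        current = min(spare, key=lambda g: load[g])
    original = mapping[target_id]
    if current != original:
        swap = spares_of(current)[0]     # trade correspondences, as in S42)
        mapping[swap] = original
        mapping[target_id] = current
    allocated.add(target_id)
    pod_of_gpu[current] = pod_id         # remember the Pod binding
    return current

mapping = {"GPU0": "A", "GPU1": "A", "GPU2": "B", "GPU3": "B"}
load = {"A": 0.30, "B": 0.10}
allocated, pod_of_gpu = set(), {}
first = allocate_bound("GPU0", "Pod X", mapping, load, allocated, pod_of_gpu)
second = allocate_bound("GPU1", "Pod X", mapping, load, allocated, pod_of_gpu)
```

Two containers of the same Pod thus end up on the same real GPU device: the first request falls through to the load-balancing fallback and records the binding, and the second request matches it.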
Further, step S5) is followed by a step of processing the case where there are multiple device IDs of target logic devices, which specifically comprises:
S501) selecting the device ID of the current target logic device;
S502) matching the device ID of the current target logic device to its original real GPU device and obtaining its load rate; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, performing step S503); if it is the lowest among all spare real GPU devices, allocating the current target logic device to the original real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, and returning to step S501) until the device IDs of all target logic devices have been processed, then returning to step S2). Here a spare real GPU device is a real GPU device that still has unallocated logic devices;
S503) selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, selecting one unallocated logic device of the current real GPU device and reassigning it to the original real GPU device, allocating the current target logic device to the current real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module, and returning to step S501) until the device IDs of all target logic devices have been processed.
Further, step S3) also includes: if the GPU resources applied for by the GPU application container exceed the number of all real GPU devices, returning a failure message and ending.
The invention also provides a Kubernetes-based GPU device resource allocation system, which comprises GPU compute nodes arranged in Kubernetes. Each GPU compute node comprises at least two real GPU devices, a GPU device plugin, and a kubelet device manager module, the kubelet device manager module being connected with each real GPU device through the GPU device plugin, wherein:
the kubelet device manager module is used for acquiring, after a Pod containing a GPU application container is scheduled to the GPU compute node, the corresponding Pod ID, container ID, and binding flag, together with the device ID of the allocated target logic device, and sending them to the GPU device plugin;
the GPU device plugin is used for acquiring the device information of each real GPU device and generating at least two logic devices for each real GPU device according to the size of its video memory; for reporting the device ID of each logic device and the health condition of the corresponding real GPU device to the kubelet device manager module; and for receiving the device ID of the target logic device, the Pod ID, the container ID, and the binding flag sent by the kubelet device manager module. If there is one device ID of the target logic device and the binding flag is a first value, the plugin selects the real GPU device with the lowest load rate, allocates the target logic device to the selected real GPU device and then to the corresponding GPU application container, and reallocates the remaining logic devices among the real GPU devices. If there is one device ID of the target logic device and the binding flag is a second value, the plugin matches a real GPU device according to the Pod ID; if a match exists, it allocates the target logic device to the matched real GPU device and then to the corresponding GPU application container, adjusts the correspondence between the remaining logic devices and each real GPU device, and continues to report the device ID of each logic device and the health condition of the corresponding real GPU device to the kubelet device manager module until the end.
Further, if there is one device ID of the target logic device and the binding flag is the first value, the GPU device plugin is configured to perform:
matching the device ID of the target logic device to its original real GPU device and obtaining its load rate; if the load rate of the original real GPU device is the lowest among all spare real GPU devices, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, where a spare real GPU device is a real GPU device that still has unallocated logic devices;
if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, selecting one unallocated logic device of the current real GPU device and reassigning it to the original real GPU device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module.
Further, if there is one device ID of the target logic device and the binding flag is the second value, the GPU device plugin is configured to perform:
matching all real GPU devices against the Pod ID corresponding to the device ID of the target logic device, and if a match exists and the matched real GPU device still has unallocated logic devices, taking the matched real GPU device as the current real GPU device;
matching the device ID of the target logic device to its original real GPU device; if the original real GPU device is not the current real GPU device, selecting one unallocated logic device of the current real GPU device and reassigning it to the original real GPU device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module; if the original real GPU device is the current real GPU device, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module.
Compared with the prior art, the invention has the following advantages:
according to the method and the device, when the container applies for the device resource, the device ID of the logic device is sent, and the Pod ID, the binding mark and the container ID which belong to the logic device are sent, so that whether the GPU device is bound or not is determined according to the binding mark, the corresponding GPU device is provided for the GPU application container according to the load rate of the GPU device under the condition of no binding, and the same GPU device is distributed for the GPU application container with the same Pod ID under the condition of binding, so that the resource sharing of the GPU device is realized, and the resource utilization rate of the GPU device is improved. Meanwhile, according to the load condition of the GPU equipment, the GPU equipment with lower load rate is selected and allocated to the request container, and the GPU equipment resources are allocated by adopting a real-time load balancing strategy, so that the conditions that the GPU equipment resources are idle and are unevenly allocated are avoided.
Drawings
FIG. 1 is a timing diagram of the interaction between the kubelet device manager module and the GPU device plugin in Kubernetes.
Fig. 2 is an overall configuration diagram of the system of the embodiment of the present invention.
Fig. 3 is a functional diagram of the kubelet device manager module of an embodiment of the present invention.
Fig. 4 is a schematic diagram of the operation of the GPU device plug-in according to the embodiment of the present invention.
Fig. 5 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
The process of interaction between the kubelet device manager module and the GPU device plugin is shown in fig. 1. The GPU device plugin first sends a registration request to the kubelet device manager module; the registration request contains Version, Endpoint, and ResourceName, where Version is the plugin protocol version number, Endpoint is the Unix socket file path the plugin listens on, and ResourceName is the device resource name. The kubelet device manager module then continually sends ListAndWatch requests to the GPU device plugin to obtain the device IDs and health conditions. When a container applies for GPU device resources, the kubelet device manager module sends a device allocation request to the GPU device plugin. By default this request contains only the device ID of the logic device; the GPU device plugin applies its allocation strategy to give the resources of the corresponding GPU device to the applying container according to the device ID, and returns the actual path of the GPU device on the GPU compute node and the granted permissions to the kubelet device manager module. Because the default request contains only the device ID of the logic device, and that device ID comes from the device information of a single GPU device, each GPU application container must be bound to a single GPU device, so a GPU device cannot be shared by multiple containers.
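The extended allocation request described in this embodiment can be pictured schematically as follows (a Python dataclass sketch, not the actual gRPC message of the kubelet device plugin API; the stock protocol carries only device IDs, and the extra fields are the ones this method adds):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AllocateRequest:
    """Schematic of the extended allocation request of this embodiment.

    The stock kubelet device plugin protocol sends only device_ids; the
    Pod ID, container ID, and binding flag are the fields added here.
    """
    device_ids: List[str]   # device IDs of the target logic devices
    pod_id: str = ""        # Pod the requesting container belongs to
    container_id: str = ""  # requesting container
    bind: bool = False      # binding flag: share one real GPU per Pod

# A request matching the worked example later in the description.
req = AllocateRequest(device_ids=["GPU0"], pod_id="Pod A",
                      container_id="1", bind=False)
```

The plugin's reply keeps the shape of the standard protocol: the actual device path on the compute node and the granted permissions.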
With the continuous improvement of GPU device performance, a GPU device is fully capable of providing the required resources to multiple GPU application containers. To realize sharing of GPU devices and increase their utilization rate, this embodiment improves the existing kubelet device manager module and GPU device plugin and proposes a Kubernetes-based GPU device resource allocation system. As shown in fig. 2, the system includes a GPU compute node disposed in Kubernetes; the GPU compute node includes at least 2 real GPU devices, a GPU device plugin, and a kubelet device manager module, the kubelet device manager module being connected with each real GPU device through the GPU device plugin, wherein:
as shown in fig. 3, the kubel device manager module is configured to perform:
after the Pod containing the GPU application container is dispatched to a GPU computing node, acquiring a corresponding Pod ID, a container ID and a binding mark, and an API Server of Kubelet and a device ID of a target logic device distributed by a dispatcher, and sending the information to a GPU device plug-in as an allocation request;
obtaining a GPU equipment list through a List AndWatch interface at regular time;
as shown in fig. 4, the GPU device plug-in is configured to perform:
acquiring the device information of each real GPU device and, at initialization, generating at least two logic devices for each real GPU device according to the size of its video memory;
acquiring the health condition and load condition of each real GPU device, forming a GPU device list from the device ID of each logic device and the health condition of the corresponding real GPU device, and reporting it to the kubelet device manager module through the ListAndWatch interface; the kubelet device manager module then sends the real GPU device and logic device information to the API Server in Kubernetes, providing a basis for the API Server and the scheduler to allocate target logic devices;
receiving the device ID of the target logic device, the Pod ID, the container ID, and the binding flag in the allocation request sent by the kubelet device manager module, selecting the corresponding GPU device allocation strategy according to the number of device IDs of target logic devices and the content of the binding flag, and continuing to report the device ID of each logic device and the health condition of the corresponding real GPU device to the kubelet device manager module after allocation is finished, until the end.
In this embodiment, the GPU device allocation policy includes:
A. Applying for a single GPU resource without binding: each GPU application container in the Pod applies for at most one real GPU device; that is, there is one device ID of the target logic device for each GPU application container, and the binding flag is a first value indicating that binding is not needed, such as false or 0. The plugin selects the real GPU device with the lowest load rate, allocates the target logic device to the selected real GPU device and then to the corresponding GPU application container, and reallocates the remaining logic devices among the real GPU devices;
B. Applying for a single GPU resource with binding: multiple GPU application containers in the same Pod must be bound to the same GPU device; that is, there is one device ID of the target logic device for each GPU application container, and the binding flag is a second value indicating that binding is required, such as true or 1. When GPU resources are allocated to another container belonging to the same Pod, the target logic device is allocated to the bound real GPU device and then to the corresponding GPU application container, and the remaining logic devices are reallocated among the real GPU devices;
C. Applying for multiple GPU resources: a GPU application container may need multiple real GPU devices, so there are two or more device IDs of target logic devices and the content of the binding flag is ignored. For each device ID, the GPU device with the lowest load rate is selected, the logic device corresponding to that device ID is allocated to the selected GPU device and then to the corresponding GPU application container, and the correspondence between the remaining logic devices and each GPU device is adjusted; in effect, strategy A is executed multiple times;
D. Applying for more GPU resources than actual devices: if the GPU resources applied for by the GPU application container exceed the number of real GPU devices, allocation is refused and failure is returned directly.
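The choice among strategies A–D follows directly from the request contents, and can be summed up in a few lines (an illustrative sketch; the function and parameter names are not from the patent):

```python
def choose_policy(device_ids, bind, num_real_gpus):
    """Pick GPU device allocation strategy A-D from the request (sketch).

    device_ids:    device IDs of the target logic devices in the request
    bind:          the binding flag
    num_real_gpus: number of real GPU devices on the compute node
    """
    if len(device_ids) > num_real_gpus:
        return "D"  # more GPU resources than real devices: refuse, fail
    if len(device_ids) >= 2:
        return "C"  # multiple GPU resources: binding flag is ignored
    return "B" if bind else "A"  # single resource, bound or unbound
```

A dispatcher like this would sit at the front of the plugin's Allocate handler, with each branch delegating to the corresponding strategy described above.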
As shown in fig. 2, when the GPU device plugin is initialized, according to the preset video memory slice size, the logic devices generated for GPU card A are GPU0, GPU1, and GPU2; those generated for GPU card B are GPU3, GPU4, and GPU5; and those generated for GPU card C are GPU6, GPU7, GPU8, GPU9, and GPU10. The logic devices GPU0 to GPU10 all occupy the same amount of video memory. At this time, although the logic devices GPU0 to GPU10 correspond to GPU cards A to C, they are not yet actually allocated to the corresponding GPU cards; allocation happens only when an allocation strategy is executed.
According to the above, the GPU device plug-in maintains a relationship table between real GPU devices and logic devices, and a logic device allocation table, as shown in Table 1 and Table 2, respectively.
TABLE 1 Relationship table between real GPU devices and logic devices before adjustment

[Table 1 was rendered as an image in the original. Per the description above: GPU card A holds unallocated logic devices GPU0, GPU1, GPU2; GPU card B holds GPU3, GPU4, GPU5; GPU card C holds GPU6 to GPU10, and card C currently has the lowest real-time load.]
TABLE 2 Logic device allocation table before adjustment

Assigned logic device ID | Assigned to Pod ID | Container ID | Corresponding real GPU device
(empty — no logic device has been allocated yet)
If a Pod containing a GPU application container is scheduled to the GPU compute node, the kubelet device manager module obtains device ID GPU0, Pod ID Pod A, container ID 1, and a binding flag indicating no binding, so allocation policy A is executed. At this moment the load of GPU card A, to which GPU0 corresponds, is not the lowest, while the load of GPU card C is the lowest. GPU0 is therefore exchanged with any one of card C's logic devices GPU6 to GPU10, for example GPU6: after the exchange, the logic devices corresponding to GPU card A are GPU6, GPU1, and GPU2, and those corresponding to GPU card C are GPU0, GPU7, GPU8, GPU9, and GPU10. GPU0 is then allocated to GPU card C and to the corresponding GPU application container. The adjusted relationship table and logic device allocation table are shown in Table 3 and Table 4, respectively.
TABLE 3 Relationship table between real GPU devices and logic devices after adjustment

Real GPU device         | Unallocated logic devices | Real-time load
GPU A (video memory 2G) | GPU6, GPU1, GPU2          | 25%
GPU B (video memory 2G) | GPU3, GPU4, GPU5          | 20%
GPU C (video memory 4G) | GPU7, GPU8, GPU9, GPU10   | 25%
TABLE 4 Logic device allocation table after adjustment

Assigned logic device ID | Assigned to Pod ID | Container ID | Corresponding real GPU device
GPU0                     | Pod A              | 1            | GPU C
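The adjustment walkthrough above (exchanging GPU0 with GPU6 so the allocation lands on the least-loaded card C) can be reproduced with a small sketch. The data structures mirror Tables 1–4, but the function name is illustrative and the load figures are assumed values chosen so that card C is least loaded:

```python
def allocate_least_loaded(unallocated, loads, target, alloc_table, pod_id, container_id):
    """Policy A sketch: place `target` on the least-loaded card, swapping
    logic devices between cards if `target` currently maps to a busier card."""
    origin = next(c for c, devs in unallocated.items() if target in devs)
    # consider only spare cards (cards that still have unallocated logic devices)
    best = min((c for c in unallocated if unallocated[c]), key=lambda c: loads[c])
    if best != origin:
        # exchange `target` with one unallocated logic device of the best card
        spare = unallocated[best][0]
        unallocated[origin][unallocated[origin].index(target)] = spare
        unallocated[best][unallocated[best].index(spare)] = target
    unallocated[best].remove(target)            # now actually allocated
    alloc_table[target] = (pod_id, container_id, best)
    return best

unallocated = {"A": ["GPU0", "GPU1", "GPU2"],
               "B": ["GPU3", "GPU4", "GPU5"],
               "C": ["GPU6", "GPU7", "GPU8", "GPU9", "GPU10"]}
loads = {"A": 0.25, "B": 0.20, "C": 0.15}   # illustrative: card C least loaded
alloc = {}
card = allocate_least_loaded(unallocated, loads, "GPU0", alloc, "Pod A", 1)
# card A now holds GPU6, GPU1, GPU2 and GPU0 is allocated on card C,
# matching Tables 3 and 4
```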
Therefore, in this embodiment an allocation policy is selected according to the device ID, Pod ID, container ID, and binding flag of the target logic device, and GPU device resources are allocated under a real-time load balancing policy. Multiple GPU application containers thus share the real GPU devices, and containers with the same Pod ID can be assigned the same GPU device, which improves the resource utilization of the GPU devices and avoids idle GPU resources and unbalanced allocation.
The specific configuration process of the kubelet device manager module and the GPU device plug-in in this embodiment is as follows:
first, fields for the logic device's device ID, Pod ID, container ID, and binding flag are added to the device management protocol of Kubernetes; the modified device management protocol is used by both the kubelet device manager module and the GPU device plug-in;
then, the kubelet is recompiled and deployed according to the modified device management protocol, and a corresponding GPU application manifest is configured for the GPU application container to be scheduled, as follows:
[The manifest was rendered as an image in the original; its relevant fields are described below.]
In the application manifest, the Pod UID, container ID, and Pod Label are added. The Pod Label can specify whether the GPU application containers of the same Pod are bound to one GPU device. As shown above, the containers section of the manifest includes two GPU application containers; the requires part of each GPU application container requests one real GPU device, and the binding field in the Label is true, indicating that the GPU application containers in the Pod are bound to the same GPU device;
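Based on that description, such a manifest might look as follows. This is a hypothetical illustration only: the requires field and the GPU resource name follow the patent's wording for its modified protocol and are not part of the stock Kubernetes Pod schema, and all names are assumed.

```yaml
# Hypothetical sketch of the modified GPU application manifest described above.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-app                # illustrative name
  labels:
    binding: "true"            # containers in this Pod bind to the same GPU device
spec:
  containers:
  - name: gpu-app-container-1
    image: gpu-app:latest      # illustrative image
    resources:
      requires:
        example.com/gpu: 1     # requests one real GPU device (resource name assumed)
  - name: gpu-app-container-2
    image: gpu-app:latest
    resources:
      requires:
        example.com/gpu: 1
```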
finally, the GPU device plug-in is built according to the modified device management protocol, containerized, and deployed into Kubernetes as a DaemonSet, with all real GPU devices of the host mapped into the plug-in's container. The GPU device plug-in obtains the device information of all real GPU devices of the host by reading the PCI bus, thereby indirectly obtaining the GPU vendor and model, and obtains the health and load conditions of the real GPU devices through other related interfaces.
In this embodiment, when allocation policy A is executed, the GPU device plug-in is configured to execute the following process:
the corresponding original real GPU device is matched according to the device ID of the target logic device and its load rate is obtained; if the load rate of the original real GPU device is the lowest among all spare real GPU devices, the target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module, where a spare real GPU device is a real GPU device that still has unallocated logic devices;
if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, the real GPU device with the lowest load rate among them is selected as the current real GPU device, one of its unallocated logic devices is exchanged with the target logic device, the target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module.
In this embodiment, when the allocation policy B is executed, the GPU device plug-in is configured to execute the following process:
all real GPU devices are matched against the Pod ID corresponding to the device ID of the target logic device. If there is a match and the matched real GPU device still has unallocated logic devices, the matched real GPU device becomes the current real GPU device, and the original real GPU device is matched according to the device ID of the target logic device. If the original real GPU device is not the current real GPU device, one unallocated logic device of the current real GPU device is exchanged with the target logic device, the target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module. If the original real GPU device is the current real GPU device, the target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module;
if there is no match, or the matched real GPU device has no unallocated logic devices, the corresponding original real GPU device is matched according to the device ID of the target logic device and its load rate is obtained. If the load rate of the original real GPU device is the lowest among all spare real GPU devices, the target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module, and the Pod ID corresponding to the device ID of the target logic device is saved as the Pod ID of the original real GPU device, where a spare real GPU device is a real GPU device that still has unallocated logic devices;
if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, the real GPU device with the lowest load rate among them is selected as the current real GPU device, one of its unallocated logic devices is exchanged with the target logic device, the target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module, and the Pod ID corresponding to the device ID of the target logic device is saved as the Pod ID of the current real GPU device.
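The policy-B flow above can be condensed into a sketch. The data structures and function name are illustrative assumptions: `pod_of_card` stands in for the saved Pod-ID-per-card record the text describes, and load figures are invented for the example:

```python
def allocate_bound(unallocated, loads, pod_of_card, target, pod_id):
    """Policy B sketch: reuse the real GPU card already serving this Pod when
    it still has spare logic devices; otherwise fall back to the least-loaded
    spare card (policy-A behaviour) and remember the Pod -> card binding."""
    origin = next(c for c, devs in unallocated.items() if target in devs)
    # match a card already bound to this Pod that still has spare logic devices
    matched = next((c for c, p in pod_of_card.items()
                    if p == pod_id and unallocated[c]), None)
    if matched is not None:
        best = matched
    else:
        spare_cards = [c for c in unallocated if unallocated[c]]
        best = min(spare_cards, key=lambda c: loads[c])
        pod_of_card[best] = pod_id        # save the Pod ID for this card
    if best != origin:
        # exchange `target` with one unallocated logic device of `best`
        spare = unallocated[best][0]
        unallocated[origin][unallocated[origin].index(target)] = spare
        unallocated[best][unallocated[best].index(spare)] = target
    unallocated[best].remove(target)
    return best
```

With card A already serving the Pod, a target logic device that currently maps to card B is swapped over so the container lands on card A alongside its Pod siblings.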
In this embodiment, when the allocation policy C is executed, the GPU device plug-in is configured to execute the following process:
the device ID of a current target logic device is selected, the corresponding original real GPU device is matched according to it, and its load rate is obtained. If the load rate of the original real GPU device is the lowest among all spare real GPU devices, the current target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module, where a spare real GPU device is a real GPU device that still has unallocated logic devices. If the load rate of the original real GPU device is not the lowest among all spare real GPU devices, the real GPU device with the lowest load rate among them is selected as the current real GPU device, one of its unallocated logic devices is exchanged with the current target logic device, the current target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module. This process repeats until the device IDs of all target logic devices have been handled.
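Policy C repeats the policy-A placement once per device ID. A compact sketch follows; the function name is illustrative, and the per-placement load increment is a crude assumption standing in for the real-time load the plug-in would re-read between placements:

```python
def allocate_multi(unallocated, loads, targets):
    """Policy C sketch: place each target logic device on the currently
    least-loaded spare card, swapping logic devices when needed."""
    placed = {}
    for target in targets:                       # one pass per device ID
        origin = next(c for c, d in unallocated.items() if target in d)
        spare_cards = [c for c in unallocated if unallocated[c]]
        best = min(spare_cards, key=lambda c: loads[c])
        if best != origin:
            swap = unallocated[best][0]
            unallocated[origin][unallocated[origin].index(target)] = swap
            unallocated[best][unallocated[best].index(swap)] = target
        unallocated[best].remove(target)
        placed[target] = best
        loads[best] += 0.05   # assumption: load rises with each placement
    return placed
```

Because the load is re-evaluated between placements, two requested devices tend to spread across different cards rather than piling onto one.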
In this embodiment, when the allocation policy D is executed, the GPU device plug-in is configured to execute the following process:
the quantity of GPU resources applied for by the GPU application containers is obtained, and if it is greater than the number of real GPU devices, a failure message is returned and the process ends.
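Policy D is a straightforward capacity check; a minimal sketch with an assumed function name and return shape:

```python
def check_capacity(requested, num_real_gpus):
    """Policy D sketch: refuse outright when more GPU resources are
    requested than there are real GPU devices on the node."""
    if requested > num_real_gpus:
        return {"ok": False, "reason": "requested GPUs exceed real devices"}
    return {"ok": True}
```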
Based on the foregoing system, this embodiment further provides a Kubernetes-based GPU device resource allocation method, applied to a GPU compute node in Kubernetes. The GPU compute node includes at least two real GPU devices, a GPU device plug-in, and a kubelet device manager module, the kubelet device manager module being connected to each real GPU device through the GPU device plug-in. As shown in Fig. 5, the method includes the following steps:
first, the kubelet device manager module and the GPU device plug-in are constructed in turn, specifically: fields for the device ID, Pod ID, container ID, and binding flag are added to the device management protocol of Kubernetes; the kubelet device manager module and the GPU device plug-in are configured according to the modified protocol; and the GPU device plug-in is containerized and deployed into Kubernetes as a DaemonSet, with all GPU devices of the host mapped into the plug-in's container.
The following steps are then performed:
S1) the GPU device plug-in obtains the device information of each real GPU device and generates at least two logic devices for each real GPU device according to its video memory size;
S2) the GPU device plug-in obtains the load and health conditions of each real GPU device and reports the device ID of each logic device together with the health condition of the corresponding real GPU device to the kubelet device manager module, which then sends the real GPU device and logic device information to the API Server in Kubernetes;
S3) if a Pod containing a GPU application container is scheduled to the GPU compute node, the kubelet device manager module obtains the corresponding Pod ID, container ID, and binding flag, together with the device ID of the allocated target logic device, and sends them to the GPU device plug-in;
S4) if the device ID corresponding to the current GPU application container is one and the binding flag is a first value, the GPU device plug-in selects the real GPU device with the lowest load rate, allocates the corresponding target logic device to it and then to the corresponding GPU application container, adjusts the correspondence between the remaining logic devices and the real GPU devices, and returns to step S2) until the end; otherwise step S5) is executed;
S5) if the device ID corresponding to the current GPU application container is one and the binding flag is a second value, the GPU device plug-in matches a real GPU device according to the Pod ID; if there is a match, the corresponding target logic device is allocated to the matched real GPU device and then to the corresponding GPU application container, the correspondence between the remaining logic devices and the real GPU devices is adjusted, and the process returns to step S2) until the end.
Step S4) of this embodiment implements the foregoing allocation policy A and specifically includes the following steps:
S41) the corresponding original real GPU device is matched according to the device ID of the target logic device and its load rate is obtained; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, step S42) is executed; if it is the lowest, the target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module, and the process returns to step S2); a spare real GPU device is a real GPU device that still has unallocated logic devices;
S42) the real GPU device with the lowest load rate among all spare real GPU devices is selected as the current real GPU device, one of its unallocated logic devices is exchanged with the target logic device, the target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module.
Step S5) of this embodiment implements the foregoing allocation policy B, and specifically includes the following steps:
S51) all real GPU devices are matched against the Pod ID corresponding to the device ID of the target logic device; if there is a match and the matched real GPU device still has unallocated logic devices, the matched real GPU device becomes the current real GPU device;
S52) the corresponding original real GPU device is matched according to the device ID of the target logic device; if the original real GPU device is not the current real GPU device, one unallocated logic device of the current real GPU device is exchanged with the target logic device, the target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module; if the original real GPU device is the current real GPU device, the target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, and the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module.
Step S51) in this embodiment further includes a processing step for when there is no match, or the matched real GPU device has no unallocated logic devices, specifically:
S51a) if there is no match, or the matched real GPU device has no unallocated logic devices, the corresponding original real GPU device is matched according to the device ID of the target logic device and its load rate is obtained; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, step S51b) is executed; if it is the lowest, the target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module, the Pod ID corresponding to the device ID of the target logic device is saved as the Pod ID of the original real GPU device, and the process returns to step S2); a spare real GPU device is a real GPU device that still has unallocated logic devices;
S51b) the real GPU device with the lowest load rate among all spare real GPU devices is selected as the current real GPU device, one of its unallocated logic devices is exchanged with the target logic device, the target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module, and the Pod ID corresponding to the device ID of the target logic device is saved as the Pod ID of the current real GPU device.
Step S5) of this embodiment further includes a processing step for when the current GPU application container corresponds to multiple target logic device IDs, i.e., the steps implementing the foregoing allocation policy C, specifically:
S501) the device ID of a current target logic device is selected;
S502) the corresponding original real GPU device is matched according to the device ID of the current target logic device and its load rate is obtained; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, step S503) is executed; if it is the lowest, the current target logic device is allocated to the original real GPU device and then to the corresponding GPU application container, the actual path and permissions of the original real GPU device are fed back to the kubelet device manager module, and the process returns to step S501) until the device IDs of all target logic devices have been selected, after which it returns to step S2); a spare real GPU device is a real GPU device that still has unallocated logic devices;
S503) the real GPU device with the lowest load rate among all spare real GPU devices is selected as the current real GPU device, one of its unallocated logic devices is exchanged with the current target logic device, the current target logic device is allocated to the current real GPU device and then to the corresponding GPU application container, the actual path and permissions of the current real GPU device are fed back to the kubelet device manager module, and the process returns to step S501) until the device IDs of all target logic devices have been selected.
After step S5), the method further includes a step implementing the foregoing allocation policy D, specifically: the quantity of GPU resources applied for by the GPU application containers is obtained, and if it is greater than the number of real GPU devices, a failure message is returned and the process ends.
In addition, the step implementing allocation policy D may instead be performed in step S3) of this embodiment, so as to screen out unsatisfiable Pods in advance and avoid wasting computing resources.
The foregoing is considered as illustrative of the preferred embodiments of the invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical spirit of the present invention should fall within the protection scope of the technical scheme of the present invention, unless the technical spirit of the present invention departs from the content of the technical scheme of the present invention.

Claims (10)

1. A Kubernetes-based GPU device resource allocation method, applied to a GPU compute node in Kubernetes, the GPU compute node comprising at least 2 real GPU devices, a GPU device plug-in, and a kubelet device manager module, the kubelet device manager module being connected to each real GPU device through the GPU device plug-in, the method comprising the following steps:
S1) the GPU device plug-in obtains the device information of each real GPU device and generates at least two logic devices for each real GPU device according to its video memory size;
S2) the GPU device plug-in obtains the load and health conditions of each real GPU device and reports the device ID of each logic device together with the health condition of the corresponding real GPU device to the kubelet device manager module, which then sends the real GPU device and logic device information to the API Server in Kubernetes;
S3) if a Pod containing a GPU application container is scheduled to the GPU compute node, the kubelet device manager module obtains the corresponding Pod ID, container ID, and binding flag, together with the device ID of the allocated target logic device, and sends them to the GPU device plug-in;
S4) if the device ID of the target logic device is one and the binding flag is a first value, the GPU device plug-in selects the real GPU device with the lowest load rate, allocates the target logic device to it and then to the corresponding GPU application container, adjusts the correspondence between the remaining logic devices and the real GPU devices, and returns to step S2) until the end; otherwise step S5) is executed;
S5) if the device ID of the target logic device is one and the binding flag is a second value, the GPU device plug-in matches a real GPU device according to the Pod ID; if there is a match, the target logic device is allocated to the matched real GPU device and then to the corresponding GPU application container, the correspondence between the remaining logic devices and the real GPU devices is adjusted, and the process returns to step S2) until the end.
2. The Kubernetes-based GPU device resource allocation method according to claim 1, wherein step S1) is preceded by a step of configuring the kubelet device manager module and the GPU device plug-in, specifically comprising: adding fields for the device ID, Pod ID, container ID, and binding flag to the device management protocol of Kubernetes; configuring the kubelet device manager module and the GPU device plug-in according to the modified device management protocol; deploying the GPU device plug-in into Kubernetes in containerized form; and mapping all GPU devices of the host into the container of the GPU device plug-in.
3. The Kubernetes-based GPU device resource allocation method according to claim 1, wherein step S4) specifically comprises the following steps:
S41) matching the corresponding original real GPU device according to the device ID of the target logic device and obtaining its load rate; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, executing step S42); if it is the lowest, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, and returning to step S2), wherein a spare real GPU device is a real GPU device that still has unallocated logic devices;
S42) selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, exchanging one of its unallocated logic devices with the target logic device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module.
4. The Kubernetes-based GPU device resource allocation method according to claim 1, wherein step S5) specifically comprises the following steps:
S51) matching all real GPU devices against the Pod ID corresponding to the device ID of the target logic device; if there is a match and the matched real GPU device still has unallocated logic devices, taking the matched real GPU device as the current real GPU device;
S52) matching the corresponding original real GPU device according to the device ID of the target logic device; if the original real GPU device is not the current real GPU device, exchanging one unallocated logic device of the current real GPU device with the target logic device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module; if the original real GPU device is the current real GPU device, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module.
5. The Kubernetes-based GPU device resource allocation method according to claim 4, wherein step S51) further includes a processing step for when there is no match, or the matched real GPU device has no unallocated logic devices, specifically comprising:
S51a) if there is no match, or the matched real GPU device has no unallocated logic devices, matching the corresponding original real GPU device according to the device ID of the target logic device and obtaining its load rate; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, executing step S51b); if it is the lowest, allocating the target logic device to the original real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, saving the Pod ID corresponding to the device ID of the target logic device as the Pod ID of the original real GPU device, and returning to step S2), wherein a spare real GPU device is a real GPU device that still has unallocated logic devices;
S51b) selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, exchanging one of its unallocated logic devices with the target logic device, allocating the target logic device to the current real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module, and saving the Pod ID corresponding to the device ID of the target logic device as the Pod ID of the current real GPU device.
6. The Kubernetes-based GPU device resource allocation method according to claim 1, wherein after step S5), the method further includes a processing step for multiple target logic device IDs, specifically comprising:
S501) selecting the device ID of a current target logic device;
S502) matching the corresponding original real GPU device according to the device ID of the current target logic device and obtaining its load rate; if the load rate of the original real GPU device is not the lowest among all spare real GPU devices, executing step S503); if it is the lowest, allocating the current target logic device to the original real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, and returning to step S501) until the device IDs of all target logic devices have been selected, then returning to step S2), wherein a spare real GPU device is a real GPU device that still has unallocated logic devices;
S503) selecting the real GPU device with the lowest load rate among all spare real GPU devices as the current real GPU device, exchanging one unallocated logic device of the current real GPU device with the current target logic device, allocating the current target logic device to the current real GPU device and then to the corresponding GPU application container, feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module, and returning to step S501) until the device IDs of all target logic devices have been selected.
7. The kubernetes-based GPU device resource allocation method according to claim 1, wherein step S3) further includes: if the GPU resources requested by the GPU application container exceed the number of all real GPU devices, returning a failure message and ending.
8. A kubernetes-based GPU device resource allocation system, comprising a GPU compute node disposed in kubernetes, the GPU compute node including at least two real GPU devices, a GPU device plug-in, and a kubelet device manager module, the kubelet device manager module being connected with each real GPU device through the GPU device plug-in, wherein:
the kubelet device manager module is used for acquiring, after a Pod containing a GPU application container is scheduled to the GPU compute node, the corresponding Pod ID, container ID, binding flag, and device ID of the allocated target logical device, and sending them together to the GPU device plug-in;
the GPU device plug-in is used for acquiring device information of each real GPU device and generating at least two logical devices for each real GPU device according to its video memory size; for sending the device ID of each logical device and the health condition of the corresponding real GPU device to the kubelet device manager module; and for receiving the device ID of the target logical device, the Pod ID, the container ID, and the binding flag sent by the kubelet device manager module; if there is one device ID of the target logical device and the binding flag is a first value, selecting the real GPU device with the lowest load rate, allocating the target logical device to the selected real GPU device and then to the corresponding GPU application container, and redistributing the remaining logical devices among the real GPU devices; if there is one device ID of the target logical device and the binding flag is a second value, matching a real GPU device according to the Pod ID; if a match exists, allocating the target logical device to the matched real GPU device and then to the corresponding GPU application container, adjusting the correspondence between the remaining logical devices and each real GPU device, and continuing to report the device ID of each logical device and the health condition of the corresponding real GPU device to the kubelet device manager module until the end.
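The plug-in's generation of at least two logical devices per real GPU by video-memory size might look like the sketch below. The fixed slice size and the ID scheme are assumptions, since the claim does not specify how the logical-device count is derived from the memory size:

```python
def make_logical_devices(real_gpus, slice_mib=4096):
    """real_gpus: {name: {"memory_mib": int, "healthy": bool}}.
    Returns a list of (logical_id, healthy) pairs, i.e. the device list
    the plug-in would report to the kubelet device manager, with each
    logical device inheriting the health of its real GPU."""
    report = []
    for name, info in real_gpus.items():
        # one logical device per memory slice, but never fewer than two
        count = max(2, info["memory_mib"] // slice_mib)
        for i in range(count):
            report.append((f"{name}-logical-{i}", info["healthy"]))
    return report
```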
9. The system of claim 8, wherein, if there is one device ID of the target logical device and the binding flag is a first value, the GPU device plug-in is configured to perform:
matching the corresponding original real GPU device according to the device ID of the target logical device and acquiring the load rate of the original real GPU device; if the load rate of the original real GPU device is the minimum among all spare real GPU devices, allocating the target logical device to the original real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module, wherein a spare real GPU device is a real GPU device that still has unallocated logical devices;
if the load rate of the original real GPU device is not the minimum among all spare real GPU devices, selecting the real GPU device with the minimum load rate among all spare real GPU devices as the current real GPU device, selecting one unallocated logical device of the current real GPU device and exchanging it with the target logical device, allocating the target logical device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module.
10. The system of claim 8, wherein, if there is one device ID of the target logical device and the binding flag is a second value, the GPU device plug-in is configured to perform:
matching all real GPU devices according to the Pod ID corresponding to the device ID of the target logical device; if a match exists and the matched real GPU device has an unallocated logical device, taking the matched real GPU device as the current real GPU device;
matching the corresponding original real GPU device according to the device ID of the target logical device; if the original real GPU device is not the current real GPU device, selecting one unallocated logical device of the current real GPU device and exchanging it with the target logical device, allocating the target logical device to the current real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the current real GPU device to the kubelet device manager module; if the original real GPU device is the current real GPU device, allocating the target logical device to the original real GPU device and then to the corresponding GPU application container, and feeding back the actual path and permissions of the original real GPU device to the kubelet device manager module.
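The bound-Pod path of claim 10 can be illustrated as follows. The `pod_to_gpu` mapping, the function name, and the dict structures are assumptions made for the sketch:

```python
def allocate_bound(target_id, pod_id, pod_to_gpu, gpus):
    """Re-attach a Pod's logical device to the real GPU it was bound to.

    pod_to_gpu: {pod_id: real_gpu_name} recorded at first allocation;
    gpus: {name: {"logical": set, "allocated": set}}.  Returns the real
    GPU name to hand to the container, or None when there is no match
    or the bound GPU has no free logical device.
    """
    current = pod_to_gpu.get(pod_id)
    if current is None or not (gpus[current]["logical"] - gpus[current]["allocated"]):
        return None
    origin = next(n for n, g in gpus.items() if target_id in g["logical"])
    if origin != current:                       # exchange logical IDs between the two GPUs
        victim = next(iter(gpus[current]["logical"] - gpus[current]["allocated"]))
        gpus[current]["logical"].discard(victim)
        gpus[current]["logical"].add(target_id)
        gpus[origin]["logical"].discard(target_id)
        gpus[origin]["logical"].add(victim)
    gpus[current]["allocated"].add(target_id)
    return current
```

Keeping the Pod on the real GPU it previously used preserves any device-local state (e.g. warmed video memory) across container restarts, which is presumably why the second binding-flag value exists.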
CN202210547160.2A 2022-05-19 2022-05-19 GPU equipment resource allocation method and system based on kubernetes Pending CN114840344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210547160.2A CN114840344A (en) 2022-05-19 2022-05-19 GPU equipment resource allocation method and system based on kubernetes


Publications (1)

Publication Number Publication Date
CN114840344A true CN114840344A (en) 2022-08-02

Family

ID=82569188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210547160.2A Pending CN114840344A (en) 2022-05-19 2022-05-19 GPU equipment resource allocation method and system based on kubernetes

Country Status (1)

Country Link
CN (1) CN114840344A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116089009A (en) * 2023-02-01 2023-05-09 华院计算技术(上海)股份有限公司 GPU resource management method, system, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination