CN112988383A - Resource allocation method, device, equipment and storage medium

Resource allocation method, device, equipment and storage medium

Info

Publication number
CN112988383A
CN112988383A (application number CN202110270221.0A)
Authority
CN
China
Prior art keywords
gpu
task request
cards
card
node
Prior art date
Legal status
Pending
Application number
CN202110270221.0A
Other languages
Chinese (zh)
Inventor
李景韬
蒋佳峻
何礼祺
成杰峰
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202110270221.0A priority Critical patent/CN112988383A/en
Publication of CN112988383A publication Critical patent/CN112988383A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/45558: Hypervisor-specific management and integration aspects
    • G06F9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G06F2009/45595: Network integration; enabling network access in virtual machine instances

Abstract

The application is applicable to the technical field of computers, and provides a resource allocation method, apparatus, device, and storage medium. The resource allocation method comprises the following steps: when a task request requiring GPU resource scheduling is obtained, obtaining the available GPU nodes and the number of used GPU cards in each available GPU node; determining a weight value corresponding to each idle GPU card based on the available GPU nodes, the number of GPU cards, and the topology corresponding to each GPU card; and determining an allocation result corresponding to the task request based on the number of needed GPU resources and the weight value corresponding to each idle GPU card, and allocating GPU cards to the task request based on the allocation result. Allocating GPU resources by dynamically adjusting weights packs the GPU cards more compactly across different task requests; GPU resources are used fully and reasonably, and GPU resource utilization is maximized.

Description

Resource allocation method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a resource allocation method, a resource allocation apparatus, a resource allocation device, and a storage medium.
Background
With the development of various industries, machine learning applications for various scenarios need to be built. To train better-performing machine learning models from massive data, large numbers of Graphics Processing Units (GPUs) must be invoked and managed efficiently. To meet such demands, cloud-native technologies based on Kubernetes have emerged.
Kubernetes, abbreviated K8s (the digit 8 replaces the eight letters "ubernete"), is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and efficient, and it provides mechanisms for application deployment, planning, updating, and maintenance.
However, current Kubernetes generally allocates GPU cards randomly when scheduling GPUs, which results in low overall GPU utilization and wasted resources.
Disclosure of Invention
In view of this, embodiments of the present application provide a resource allocation method, a resource allocation apparatus, a resource allocation device, and a storage medium, to solve the problems of low overall GPU utilization and resource waste caused by randomly allocating GPU cards in conventional resource allocation methods.
A first aspect of an embodiment of the present application provides a resource allocation method, including:
when a task request needing to schedule GPU resources is obtained, obtaining available GPU nodes and the number of used GPU cards in each available GPU node; the task request carries the number of GPU resources required for completing the task request;
determining a weight value corresponding to each idle GPU card in each available GPU node based on the available GPU nodes, the number of the GPU cards and the corresponding topological structure of each GPU card;
determining an allocation result corresponding to the task request based on the needed GPU resource quantity and a weight value corresponding to each idle GPU card;
and allocating a GPU card for the task request based on the allocation result.
A second aspect of the embodiments of the present application provides a resource allocation apparatus, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring available GPU nodes and the number of used GPU cards in each available GPU node when a task request needing GPU resource scheduling is acquired; the task request carries the number of GPU resources required for completing the task request;
a first determining unit, configured to determine, based on the available GPU nodes, the number of GPU cards, and a topology corresponding to each GPU card, a weight value corresponding to each idle GPU card in each available GPU node;
a second determining unit, configured to determine, based on the required GPU resource amount and a weight value corresponding to each idle GPU card, an allocation result corresponding to the task request;
and the distribution unit is used for distributing the GPU cards for the task requests based on the distribution result.
A third aspect of embodiments of the present application provides a resource allocation device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the resource allocation method according to the first aspect.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, which when executed by a processor implements the steps of the resource allocation method according to the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product, which, when running on a resource allocation apparatus, causes the resource allocation apparatus to execute the steps of the resource allocation method according to the first aspect.
The resource allocation method, apparatus, device, and storage medium provided by the embodiments of the present application have the following beneficial effects:
in the resource allocation method provided by the application, when a task request needing to schedule GPU resources is obtained, available GPU nodes and the number of used GPU cards in each available GPU node are obtained; the task request carries the number of GPU resources required for completing the task request; determining a weight value corresponding to each idle GPU card in each available GPU node based on the number of available GPU nodes and GPU cards and the topological structure corresponding to each GPU card; and determining an allocation result corresponding to the task request based on the number of the needed GPU resources and the weight value corresponding to each idle GPU card, and allocating the GPU card to the task request based on the allocation result. In the above manner, the resource allocation device determines a weight value corresponding to each available GPU card according to the topology structure and the use condition corresponding to each GPU card, and allocates the most appropriate GPU node and GPU card to the task request based on the weight value and different requirements of the task request. The method for dynamically adjusting the weight to distribute the GPU resources enables the GPU cards to be more compactly distributed to different task requests, idle GPU resources can better serve the task requests with larger demand, the GPU resources are fully and reasonably utilized, and the GPU resource utilization rate is maximized.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a resource allocation method provided in an embodiment of the present application;
Fig. 2 is a schematic flow chart of a resource allocation method according to another embodiment of the present application;
Fig. 3 is a schematic flow chart of a resource allocation method according to still another embodiment of the present application;
Fig. 4 is a schematic flow chart of a resource allocation method according to another embodiment of the present application;
Fig. 5 is a schematic diagram of a resource allocation apparatus according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a resource allocation device according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The traditional resource allocation method randomly allocates GPU cards, so that the overall GPU utilization is low and resources are wasted. In view of this, the present application provides a resource allocation method in which, when a task request requiring GPU resource scheduling is obtained, the available GPU nodes and the number of used GPU cards in each available GPU node are obtained; the task request carries the number of GPU resources required to complete it. A weight value corresponding to each idle GPU card in each available GPU node is determined based on the available GPU nodes, the number of GPU cards, and the topology corresponding to each GPU card. An allocation result corresponding to the task request is then determined based on the number of needed GPU resources and the weight value corresponding to each idle GPU card, and GPU cards are allocated to the task request based on this result. In this manner, the resource allocation device determines a weight value for each available GPU card according to the topology and usage corresponding to each GPU card, and allocates the most appropriate GPU node and GPU cards to the task request based on these weight values and the differing demands of task requests. Allocating GPU resources by dynamically adjusting weights packs the GPU cards more compactly across different task requests, lets idle GPU resources better serve task requests with larger demands, uses GPU resources fully and reasonably, and maximizes GPU resource utilization.
Referring to fig. 1, fig. 1 is a schematic flow chart of a resource allocation method according to an embodiment of the present application. The execution body of the resource allocation method in this embodiment is a resource allocation device, which includes, but is not limited to, an independent server, a distributed server, a server cluster, a cloud server, a smartphone, a tablet computer, a personal digital assistant (PDA), a notebook computer, an ultra-mobile personal computer (UMPC), and the like. The resource allocation method shown in fig. 1 may include S101 to S104; the specific implementation principle of each step is as follows.
S101: when a task request needing to schedule GPU resources is obtained, obtaining available GPU nodes and the number of used GPU cards in each available GPU node; the task request carries the number of GPU resources required for completing the task request.
When an application program needs to schedule GPU resources, it initiates a task request requiring GPU resource scheduling; the resource allocation device receives the task request and acquires the available GPU nodes and the number of used GPU cards in each available GPU node. The task request carries the number of GPU resources required to complete it. Alternatively, the resource allocation device monitors whether there is currently a task request requiring GPU resource scheduling, and acquires the task request when one is detected. The number of GPU resources carried in the task request may be understood as the number of GPU cards required to complete the task request.
For convenience of understanding, some terms referred to in this embodiment are explained below.
GPU: graphics Processing Unit (Graphics Processing Unit).
Kubernetes: the system is used for running and coordinating containerized application programs on a resource distribution device and provides a mechanism for deploying, planning, updating and maintaining the application programs.
Pod: is the fundamental unit of the kubernets system, is the smallest component created or deployed by a user, is a combination of one or more containers, and is also a resource object for running containerized applications on the kubernets system.
Api server: belonging to one component in kubernets, the rest of the components can share state interactions through the component.
PCI-Express (PCI-E): are the latest bus and interface standards that may include a display interface as well as a variety of application interfaces.
Kubelet: the method belongs to a component in kubernets and mainly has the functions of acquiring the operation information of a GPU node from a certain place at regular time and reporting the operation information to an Api server at regular time.
Illustratively, Kubernetes is deployed and running in the resource allocation device in advance. Task requests requiring GPU resource scheduling are received through the API server, and a Pod object is created correspondingly when a task request is received. One Pod object corresponds to one task request; that is, each time a task request is received, a Pod object is created for it.
The resource allocation device parses the encapsulated information through the API server to obtain the identification information corresponding to all current GPU nodes, the used and idle GPU cards corresponding to each GPU node, the identification information corresponding to each GPU card, and the topology corresponding to each GPU card.
The available GPU nodes, and the number of used GPU cards in each of them, are determined among all current GPU nodes and GPU cards based on the number of GPU resources required to complete the task request, which is carried in the task request. For example, the number of used GPU cards and the number of idle GPU cards corresponding to each GPU node may be determined from the used and idle GPU cards corresponding to each GPU node; GPU nodes whose number of idle GPU cards is greater than or equal to the required number of GPU resources are selected as the available GPU nodes, and the number of used GPU cards in these available GPU nodes is obtained.
Optionally, when multiple PCI-Es are configured in a GPU node, acquiring the number of used GPU cards in an available GPU node includes acquiring the number of all used GPU cards in the node and the number of used GPU cards installed on the same PCI-E. For example, when 2 PCI-Es are set in a GPU node, the number of all used GPU cards in that node is acquired, as well as the number of used GPU cards on each PCI-E.
Optionally, before S101, the method may further include: based on a preset plug-in, acquiring the topology corresponding to each GPU card on each GPU node, the identification information corresponding to each GPU node, the identification information corresponding to each GPU card, the usage of each GPU card, and the like, and encapsulating this information in the form of extended resources, i.e., encoding the obtained information into data that a computer can understand. The Kubelet process acquires the encapsulated information at a preset period and uploads it to the API server. The preset period is set by the user and is not limited here. For example, the Kubelet process fetches the latest encapsulated information every 10 seconds and uploads it to the API server.
The preset plug-in is mainly used to acquire the information corresponding to each GPU card on each GPU node; it may be a user-defined device plugin. The topology corresponding to a GPU card represents the physical layout of that card in the resource allocation device. For example, the resource allocation device is provided with two PCI-Es and 8 GPU cards, with every four GPU cards installed on one PCI-E in a preset order. For any one of the GPU cards, its relationship to the two PCI-Es and to the remaining 7 GPU cards is the physical layout corresponding to that card, i.e., its topology.
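To make the reported information concrete, here is a minimal sketch, in Python, of the kind of per-node inventory such a preset plugin might collect: card identifiers, the PCI-E each card sits on, and usage flags. The class and field names (GPUCard, GPUNode, pcie_id, and so on) are illustrative assumptions, not identifiers from the patent or from Kubernetes.

```python
from dataclasses import dataclass, field

@dataclass
class GPUCard:
    card_id: str          # second identification information of a card (assumed name)
    pcie_id: str          # which PCI-E the card is installed on (topology)
    in_use: bool = False  # usage state of the card

@dataclass
class GPUNode:
    node_id: str                               # first identification information (assumed name)
    cards: list = field(default_factory=list)

    def used_on_node(self):                    # "used1": used cards on the whole node
        return sum(1 for c in self.cards if c.in_use)

    def used_on_pcie(self, pcie_id):           # "used2": used cards on one PCI-E
        return sum(1 for c in self.cards if c.pcie_id == pcie_id and c.in_use)

# The example node from the text: two PCI-Es, four GPU cards installed on each
# in a preset order.
node = GPUNode("node-0", [GPUCard(str(i), "pcie-0" if i < 4 else "pcie-1")
                          for i in range(8)])
```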
In the embodiment of the present application, the resource allocation device can acquire, in a timely and accurate manner, the identification information corresponding to all current GPU nodes and the topology and identification information corresponding to each GPU card, so that GPU resources can subsequently be allocated to task requests in a timely and reasonable manner.
As shown in fig. 2, fig. 2 is a schematic flowchart of a resource allocation method according to another embodiment of the present application, and optionally, in a possible implementation manner, the foregoing S101 includes S1011 to S1013, specifically as follows:
s1011: and acquiring the GPU nodes in the login state and the number of idle GPU cards in each GPU node.
Acquiring the GPU nodes in a login state means acquiring the currently available GPU nodes. Specifically, the resource allocation device parses the encapsulated information through the API server to obtain all current GPU nodes and the used and idle GPU cards corresponding to each of them. On this basis, the number of idle GPU cards in each of the current GPU nodes is counted.
S1012: determining available GPU nodes based on the number of idle GPU cards in each GPU node and the number of needed GPU resources.
The resource allocation device may filter out, in advance, GPU nodes whose number of idle GPU cards does not meet the number of GPU resources required by the task request; the filtered-out GPU nodes do not enter the subsequent allocation flow, which improves the speed and accuracy of subsequent GPU resource allocation.
The resource allocation device may also mark GPU nodes whose number of idle GPU cards is greater than or equal to the required number of GPU resources as available GPU nodes. Likewise, only GPU nodes marked as available enter the subsequent allocation flow, improving the speed and accuracy of subsequent GPU resource allocation.
For example, a task request requiring GPU resource scheduling carries a required GPU resource quantity of 3. The number of idle GPU cards on all GPU nodes is obtained; GPU nodes with 3 or more idle GPU cards are reserved, and GPU nodes with fewer than 3 are filtered out. Thus, a GPU node with 1 idle GPU card is filtered out; a node with 2 is filtered out; a node with 4 is reserved; a node with 3 is reserved. These reserved GPU nodes are the available GPU nodes, and they and the GPU cards on them enter the subsequent allocation flow, as in the sketch below.
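A minimal sketch of this filtering step (S1011 and S1012), reusing the GPUNode type from the earlier sketch; the function name is an assumption.

```python
def filter_available_nodes(nodes, requested):
    """Keep only the nodes whose idle-card count can satisfy the request."""
    available = []
    for node in nodes:
        idle = len(node.cards) - node.used_on_node()
        if idle >= requested:       # nodes with too few idle cards are filtered out
            available.append(node)
    return available

# With requested = 3, nodes holding 1 or 2 idle cards are filtered out,
# while nodes holding 3 or 4 idle cards are reserved.
```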
S1013: the number of used GPU cards in each available GPU node is obtained.
The available GPU nodes here may be understood as the GPU nodes that can subsequently be allocated to the task request, or as the GPU nodes corresponding to the GPU cards that can subsequently be allocated to it. After the available GPU nodes are determined, the number of used GPU cards on each of them is obtained, which facilitates subsequently determining the weight value corresponding to each idle GPU card in each available GPU node.
S102: and determining a weight value corresponding to each idle GPU card in each available GPU node based on the available GPU nodes, the number of GPU cards and the corresponding topological structure of each GPU card.
The resource allocation device determines the weight value corresponding to each idle GPU card in each available GPU node based on the available GPU nodes, the number of used GPU cards in each available GPU node, and the topology corresponding to each used GPU card in each available GPU node. For example, after the available GPU nodes are determined, the number of used GPU cards and the number of idle GPU cards on each available GPU node are counted, and the topology corresponding to each used GPU card on each available GPU node is obtained. The number of used GPU cards on each available GPU node comprises both the number of all used GPU cards in the node and the number of used GPU cards installed on the same PCI-E.
Determining a weight value corresponding to each idle GPU card in each available GPU node through a preset formula, wherein the preset formula is as follows:
[Formula (1), rendered as an image in the original publication]
In formula (1), g represents the weight value corresponding to an idle GPU card; constant is a constant that can be adjusted according to the topology corresponding to each GPU card; all1 represents the number of all GPU cards in the GPU node; all2 represents the number of GPU cards installed on the same PCI-E; used1 represents the number of used GPU cards in the GPU node; used2 represents the number of used GPU cards installed on the same PCI-E; requested represents the number of GPU resources required by the task request.
For example, the preset formula may be divided into three parts. The first part is the constant 100, representing the initial weight value assigned to each GPU card; the value 100 here is merely an example and is not limiting.
The second part is
[Second part of formula (1), rendered as an image in the original publication]
This part updates the weight value based on the topology of each GPU card: under the current topology, the fewer the remaining idle GPU cards, the higher the weight value, and the more the remaining idle GPU cards, the lower the weight value. It can be understood that cards in topologies with fewer remaining idle GPU cards are allocated preferentially. For example, a GPU node is provided with 2 PCI-Es and 8 GPU cards, with 4 GPU cards installed on each PCI-E in a preset order, and the two GPU cards installed first on the first PCI-E are in use. In this case, constant may take 120, all1 is 8, and used1 is 2.
The third part is
[Third part of formula (1), rendered as an image in the original publication]
Suppose the weight value of idle GPU card 3 on the first PCI-E is calculated and the number of GPU resources required by the task request is 2. Then all2 takes 4, and used2 and requested are each set to 2. The third part considers the completeness of the GPU card count and preferentially allocates a complete topology to the task request, improving the data-processing efficiency of the task request. As in the above example, with 2 GPU resources required, the weight calculation leads the resource allocation device to allocate the two remaining idle GPU cards on the same PCI-E to the task request. This ensures maximum utilization of GPU card resources; it also places the task's GPU cards on the same PCI-E, which facilitates data interaction inside the task request and improves its data-processing speed; and it leaves 4 idle GPU cards on the other PCI-E, so that when a new task request needs 3 or 4 GPU cards, the GPU cards on the other PCI-E can be allocated directly without waiting.
It should be noted that if the third part of the formula evaluates to a negative value, the calculation result of the third part is discarded.
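Because formula (1) is rendered as an image in the original publication, its exact closed form is not reproduced here. The sketch below implements one plausible reading consistent with the description: an initial weight of 100, a second term that rises as more of the node's cards are used, and a third term that rewards a request exactly consuming the idle cards on a PCI-E and is discarded when negative. The arithmetic of the second and third terms is an assumption, not the patent's verbatim formula.

```python
def gpu_weight(constant, all1, used1, all2, used2, requested):
    """Plausible reconstruction of formula (1): weight of one idle GPU card.

    constant   -- tunable constant (e.g. 120), adjusted per topology
    all1/used1 -- total / used GPU cards on the whole node
    all2/used2 -- total / used GPU cards on the card's PCI-E
    requested  -- number of GPU cards the task request needs
    """
    g = 100.0                                  # part 1: initial weight
    g += constant * used1 / all1               # part 2: fewer idle cards -> higher weight (assumed form)
    idle_on_pcie = all2 - used2
    part3 = constant * (1 - (idle_on_pcie - requested))  # part 3: rewards an exact fit (assumed form)
    if part3 >= 0:                             # a negative third part is discarded
        g += part3
    return g

# Worked numbers from the text: 8-card node, 2 cards on the first PCI-E in use,
# request for 2 cards. A card on the first PCI-E scores 100 + 30 + 120 = 250;
# a card on the second PCI-E scores only 100 + 30, since its third part is negative.
print(gpu_weight(constant=120, all1=8, used1=2, all2=4, used2=2, requested=2))  # 250.0
```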
S103: and determining an allocation result corresponding to the task request based on the number of the needed GPU resources and the weight value corresponding to each idle GPU card.
The allocation result comprises first identification information and second identification information. The first identification information identifies the target GPU node to be allocated to the task request, and the second identification information identifies the target GPU cards to be allocated to it. The target GPU node is the GPU node selected from the available GPU nodes to be allocated to the task request, and the target GPU cards are the GPU cards selected from the target GPU node to be allocated to it. It should be noted that the quantities of first and second identification information are not limited. For example, when multiple GPU cards need to be allocated to a task request, there are correspondingly multiple pieces of second identification information, each identifying one target GPU card.
Based on the weight value corresponding to each idle GPU card, the GPU cards with the largest weight values, in a quantity equal to the number of needed GPU resources, are selected as the target GPU cards to be allocated, and the GPU node corresponding to the target GPU cards is obtained as the target GPU node. The first identification information corresponding to the target GPU node and the second identification information corresponding to the target GPU cards are acquired and taken as the allocation result.
Optionally, in a possible implementation manner, the foregoing S103 may include S1031 to S1032, which are specifically as follows:
s1031: and sequencing the weighted values to obtain a sequencing result.
For example, the GPU cards may be sorted in descending order of weight value to obtain one sorting result, or in ascending order of weight value to obtain another.
S1032: determining the allocation result based on the sorting result and the required GPU resource amount; the distribution result comprises first identification information and second identification information; the first identification information is used for identifying a target GPU node to be allocated to the task request, and the second identification information is used for identifying a target GPU card to be allocated to the task request.
When the GPU cards are sorted in descending order of weight value, GPU cards matching the number of GPU resources required in the task request are selected from the front of the sorting result and taken as the target GPU cards to be allocated to the task request, and the target GPU node corresponding to the target GPU cards is obtained at the same time. When the GPU cards are sorted in ascending order of weight value, the selection instead proceeds from the back of the sorting result. If GPU cards with the same weight value appear, the selection may follow the order in the sorting result or be made randomly among the cards with equal weight; this is not limited.
The first identification information corresponding to the target GPU node and the second identification information corresponding to the target GPU cards are acquired and taken as the allocation result.
For example, take sorting the GPU cards in descending order of weight value. If the number of GPU resources carried in the task request is 3, the first 3 GPU cards in the sorting result are selected, and the second identification information corresponding to these 3 GPU cards and the first identification information of the GPU node corresponding to any one of them are obtained. The first identification information and the 3 pieces of second identification information are taken as the allocation result corresponding to the task request, and the resource allocation device allocates the corresponding GPU node and GPU cards to the task request according to this result, as in the sketch below.
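Putting S1031 and S1032 together, the following sketch scores every idle card with the gpu_weight function above, sorts in descending order, and takes the top cards as the allocation result. Function and key names are assumptions, and for brevity the sketch assumes the top-weighted cards share one node, as they do in the patent's examples.

```python
def build_allocation_result(nodes, requested, constant=120):
    """Score every idle card, sort descending, pick the top `requested` cards."""
    scored = []
    for node in nodes:
        all1 = len(node.cards)
        per_pcie = {}                                  # cards installed per PCI-E
        for c in node.cards:
            per_pcie[c.pcie_id] = per_pcie.get(c.pcie_id, 0) + 1
        for c in node.cards:
            if c.in_use:
                continue
            g = gpu_weight(constant, all1, node.used_on_node(),
                           per_pcie[c.pcie_id], node.used_on_pcie(c.pcie_id), requested)
            scored.append((g, node.node_id, c.card_id))
    scored.sort(reverse=True)                          # descending by weight value
    picked = scored[:requested]
    if len(picked) < requested:
        return None                                    # no result; see the random fallback (S303)
    return {"node": picked[0][1],                      # first identification information
            "cards": [card_id for _, _, card_id in picked]}  # second identification information
```

Taking the node of the top-ranked card mirrors the text's step of obtaining the GPU node corresponding to the target GPU cards.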
S104: and allocating the GPU card for the task request based on the allocation result.
Specifically, the target GPU node may be found among the available GPU nodes through the first identification information, the target GPU cards may be found in the target GPU node based on the second identification information, and the found target GPU cards are allocated to the task request.
Optionally, in a possible implementation manner, the S104 may include S1041 to S1042, which are specifically as follows:
s1041: and searching for the target GPU node from the available GPU nodes based on the first identification information.
The first identification information may uniquely identify one GPU node, and the resource allocation device may find the target GPU node among the available GPU nodes according to the first identification information.
S1042: and searching the target GPU card in the target GPU node based on the second identification information, and distributing the target GPU card to the task request.
Each piece of second identification information may uniquely identify one GPU card. The target GPU cards are further searched for in the found target GPU node according to the second identification information and allocated to the task request.
For example, the allocation result may be stored in the annotation information of the Pod. The Kubelet obtains the allocation result from the Pod corresponding to the current task request and, according to the first identification information and the second identification information contained in it, calls the corresponding target GPU node and target GPU cards and sets them in the environment variables of the Pod, thereby implementing GPU resource allocation for the task request, as in the sketch below.
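Sketched with a plain dictionary standing in for the Pod object, the handoff might look as follows; the annotation key gpu/allocation and the environment-variable names are invented for illustration and are not from the patent or from Kubernetes.

```python
pod = {"metadata": {"annotations": {}}, "spec": {"env": {}}}

def write_allocation(pod, result):
    """Scheduler side: store the allocation result in the Pod's annotations."""
    pod["metadata"]["annotations"]["gpu/allocation"] = result

def apply_allocation(pod):
    """Kubelet side: read the annotation back and expose it via environment variables."""
    result = pod["metadata"]["annotations"].get("gpu/allocation")
    if result is None:
        return False    # no matching result found; fall back to random allocation (S303)
    pod["spec"]["env"]["GPU_NODE"] = result["node"]              # first identification information
    pod["spec"]["env"]["GPU_CARDS"] = ",".join(result["cards"])  # second identification information
    return True
```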
In the embodiment of the present application, the resource allocation device determines the weight value corresponding to each available GPU card according to the topology and usage corresponding to each GPU card, and allocates the most appropriate GPU node and GPU cards to the task request based on these weight values and the differing demands of task requests. Allocating GPU resources by dynamically adjusting weights packs the GPU cards more compactly across different task requests, lets idle GPU resources better serve task requests with larger demands, uses GPU resources fully and reasonably, and maximizes GPU resource utilization.
As shown in fig. 3, fig. 3 is a schematic flowchart of a resource allocation method according to still another embodiment of the present application, and optionally, in a possible implementation manner, the resource allocation method shown in fig. 3 may include S201 to S206. For reference, the steps S201 to S204 shown in fig. 3 may refer to the above description of S101 to S104, and for brevity, the description is omitted here. The following will specifically explain steps S205 to S206.
S205: the available GPU nodes are updated based on the target GPU node.
The available GPU nodes are updated in real time based on the target GPU node. For example, the target GPU node is now a GPU node already allocated to a task request, and its usage state needs to be recorded as a reference for determining the allocation result of the next task request.
S206: the number of used GPU cards in each available GPU node is updated based on the target GPU card.
The number of used GPU cards in each available GPU node is updated in real time based on the target GPU cards. Illustratively, the usage state of the target GPU cards in the target GPU node is recorded, and the numbers of used and idle GPU cards in the target GPU node are determined anew. For example, the state of each target GPU card is updated to in-use, which helps accurately allocate the corresponding GPU node and GPU cards for the next task request, as in the sketch below.
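A minimal sketch of S205 and S206, again using the types from the earlier sketches; it simply marks the allocated cards as in use so the next request is scored against fresh counts.

```python
def update_state(nodes, result):
    """Record the allocation so the next request sees up-to-date usage (S205/S206)."""
    for node in nodes:
        if node.node_id != result["node"]:
            continue
        for card in node.cards:
            if card.card_id in result["cards"]:
                card.in_use = True      # card state updated to "in use"
```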
In the embodiment, the number of the available GPU nodes and the number of the used GPU cards in each available GPU node are updated in real time, so that the corresponding GPU nodes and the corresponding GPU cards can be accurately allocated for the next task request, and meanwhile, the maximization of GPU resource utilization is also guaranteed.
As shown in fig. 4, fig. 4 is a schematic flowchart of a resource allocation method provided in another embodiment of the present application, and optionally, in a possible implementation manner, the resource allocation method shown in fig. 4 may include S301 to S303. For reference, the steps S301 to S302 shown in fig. 4 may refer to the above description of S101 to S102, and for brevity, the description is omitted here. Step S303 will be specifically described below.
S303: and when the distribution result corresponding to the task request is not determined based on the number of the needed GPU resources and the weight value corresponding to each idle GPU card, distributing the GPU card to the task request randomly.
When the allocation result corresponding to the task request is not determined based on the number of needed GPU resources and the weight value corresponding to each idle GPU card, that is, when suitable GPU nodes and GPU cards cannot be selected for the task request, currently idle GPU cards are selected at random and allocated to the task request.
Illustratively, the Kubelet searches for an allocation result in the annotation information of the Pod corresponding to the current task request. If no allocation result matching the task request is found, the allocation result corresponding to the task request cannot be determined; in this case, the identification information corresponding to currently idle GPU cards is selected at random, the GPU node corresponding to those GPU cards is selected, and the selected GPU cards and GPU node are set into the environment variables of the Pod corresponding to the current task request, thereby implementing GPU resource allocation for the current task request.
According to this embodiment of the present application, when no appropriate GPU resources can be allocated to the task request, or when the allocation result corresponding to the task request cannot be determined in time because the resource allocation device is blocked or the like, corresponding GPU resources can still be allocated to the task request randomly, ensuring that the task corresponding to the task request completes smoothly.
In the prior art, suppose a GPU node (which may be understood as a server) is provided with two PCI-Es and 8 GPU cards: 4 GPU cards with identification information 0, 1, 2, and 3 are installed on one PCI-E in a preset order, and 4 GPU cards with identification information 4, 5, 6, and 7 are installed on the other. Suppose the two GPU cards with identification information 0 and 1 are in use, and the number of GPU resources required to complete a task request is 2. The prior-art approach would assign GPU cards on the other PCI-E to the task request. The resource allocation method provided by this application instead determines the weight value of each idle GPU card based on the topology and usage of the GPU cards; the computed weight values of the cards with identification information 2 and 3 are larger than those of the other cards, so the cards 2 and 3 on the former PCI-E are allocated to the task request. When another task request then requires 3 or 4 GPU resources, the GPU cards on the other PCI-E can be allocated to it immediately, whereas in the prior art those GPU resources could only be allocated after the previous task finished. Allocating GPU resources by dynamically adjusting weight values packs the GPU cards more compactly across different task requests, lets idle GPU resources better serve task requests with larger demands, uses GPU resources fully and reasonably, and maximizes GPU resource utilization.
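Chaining the earlier sketches reproduces this comparison; recall that the weight arithmetic is an assumed reading of formula (1), not the patent's verbatim formula.

```python
nodes = [node]                               # the 2-PCI-E, 8-card node built earlier
update_state(nodes, {"node": "node-0", "cards": ["0", "1"]})   # cards 0 and 1 in use
result = build_allocation_result(filter_available_nodes(nodes, 2), 2)
print(result)   # {'node': 'node-0', 'cards': ['3', '2']}: cards 2 and 3, same PCI-E
```

The four cards on the second PCI-E stay free for a later 3- or 4-card request.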
Referring to fig. 5, fig. 5 is a schematic diagram of a resource allocation apparatus according to an embodiment of the present application. The resource allocation apparatus includes units for executing steps in the embodiments corresponding to fig. 1, fig. 2, fig. 3, and fig. 4. Please refer to the related descriptions in the embodiments corresponding to fig. 1, fig. 2, fig. 3, and fig. 4, respectively. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 5, it includes:
the obtaining unit 410 is configured to obtain, when a task request that GPU resources need to be scheduled is obtained, available GPU nodes and the number of used GPU cards in each available GPU node; the task request carries the number of GPU resources required for completing the task request;
a first determining unit 420, configured to determine, based on the available GPU nodes, the number of GPU cards, and a topology corresponding to each GPU card, a weight value corresponding to each idle GPU card in each available GPU node;
a second determining unit 430, configured to determine, based on the required GPU resource amount and a weight value corresponding to each idle GPU card, an allocation result corresponding to the task request;
an allocating unit 440, configured to allocate a GPU card to the task request based on the allocation result.
Optionally, the obtaining unit 410 is specifically configured to:
acquiring GPU nodes in a login state and the number of idle GPU cards in each GPU node;
determining the available GPU nodes based on the number of idle GPU cards in each GPU node and the number of needed GPU resources;
the number of used GPU cards in each available GPU node is obtained.
Optionally, the second determining unit 430 is specifically configured to:
sorting the weighted values to obtain a sorting result;
determining the allocation result based on the sorting result and the required GPU resource amount; the distribution result comprises first identification information and second identification information; the first identification information is used for identifying a target GPU node to be allocated to the task request, and the second identification information is used for identifying a target GPU card to be allocated to the task request.
Optionally, the allocating unit 440 is specifically configured to:
searching the target GPU node in the available GPU nodes based on the first identification information;
and searching the target GPU card in the target GPU node based on the second identification information, and distributing the target GPU card to the task request.
Optionally, the resource allocation apparatus further includes:
a first update unit to update the available GPU node based on the target GPU node;
a second updating unit for updating the number of used GPU cards in each available GPU node based on the target GPU card.
Optionally, the resource allocation apparatus further includes:
and the random allocation unit is used for randomly allocating the GPU cards to the task requests when the allocation result corresponding to the task requests is not determined based on the required GPU resource quantity and the weight value corresponding to each idle GPU card.
Referring to fig. 6, fig. 6 is a schematic diagram of a resource allocation device according to another embodiment of the present application. As shown in fig. 6, the resource allocation device 5 of this embodiment includes: a processor 50, a memory 51, and computer readable instructions 52 stored in the memory 51 and executable on the processor 50. The processor 50, when executing the computer readable instructions 52, implements the steps in the above resource allocation method embodiments, such as S101 to S104 shown in fig. 1. Alternatively, the processor 50, when executing the computer readable instructions 52, implements the functions of the units in the above embodiments, such as the units 410 to 440 shown in fig. 5.
Illustratively, the computer readable instructions 52 may be divided into one or more units, which are stored in the memory 51 and executed by the processor 50 to accomplish the present application. The one or more units may be a series of computer-readable instruction segments capable of performing specific functions, which are used to describe the execution of the computer-readable instructions 52 in the resource allocation device 5. For example, the computer readable instructions 52 may be divided into an acquisition unit, a first determination unit, a second determination unit, and an allocation unit, each unit functioning specifically as described above.
The resource allocation device may include, but is not limited to, a processor 50, a memory 51. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the resource allocation device 5, and does not constitute a limitation of the resource allocation device, and may include more or less components than those shown, or combine some components, or different components, for example, the resource allocation device may also include input-output terminals, network access terminals, buses, etc.
The Processor 50 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the resource allocation device, such as a hard disk or memory of the device. The memory 51 may also be an external storage device of the resource allocation device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the resource allocation device. Further, the memory 51 may include both an internal storage unit and an external storage device of the resource allocation device. The memory 51 is used to store the computer readable instructions and the other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (10)

1. A method for resource allocation, comprising:
when a task request needing to schedule GPU resources is obtained, obtaining available GPU nodes and the number of used GPU cards in each available GPU node; the task request carries the number of GPU resources required for completing the task request;
determining a weight value corresponding to each idle GPU card in each available GPU node based on the available GPU nodes, the number of the GPU cards and the corresponding topological structure of each GPU card;
determining an allocation result corresponding to the task request based on the needed GPU resource quantity and a weight value corresponding to each idle GPU card;
and allocating a GPU card for the task request based on the allocation result.
2. The method for allocating resources according to claim 1, wherein the obtaining available GPU nodes and the number of used GPU cards in each available GPU node when obtaining the task request that needs to schedule GPU resources comprises:
acquiring GPU nodes in a login state and the number of idle GPU cards in each GPU node;
determining the available GPU nodes based on the number of idle GPU cards in each GPU node and the number of needed GPU resources;
the number of used GPU cards in each available GPU node is obtained.
3. The method of claim 1, wherein the determining an allocation result corresponding to the task request based on the number of the needed GPU resources and a weight value corresponding to each idle GPU card comprises:
sorting the weighted values to obtain a sorting result;
determining the allocation result based on the sorting result and the required GPU resource amount; the distribution result comprises first identification information and second identification information; the first identification information is used for identifying a target GPU node to be allocated to the task request, and the second identification information is used for identifying a target GPU card to be allocated to the task request.
4. The method of claim 3, wherein the allocating GPU cards for the task requests based on the allocation results comprises:
searching the target GPU node in the available GPU nodes based on the first identification information;
and searching the target GPU card in the target GPU node based on the second identification information, and distributing the target GPU card to the task request.
5. The method of claim 3, wherein after allocating the GPU card for the task request based on the allocation result, the method further comprises:
updating the available GPU node based on the target GPU node;
updating the number of GPU cards used in each available GPU node based on the target GPU card.
6. The resource allocation method according to any one of claims 1 to 5, wherein when the allocation result corresponding to the task request is not determined based on the required GPU resource amount and the weight value corresponding to each idle GPU card, a GPU card is randomly allocated to the task request.
7. A resource allocation apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring available GPU nodes and the number of used GPU cards in each available GPU node when a task request needing GPU resource scheduling is acquired; the task request carries the number of GPU resources required for completing the task request;
a first determining unit, configured to determine, based on the available GPU nodes, the number of GPU cards, and a topology corresponding to each GPU card, a weight value corresponding to each idle GPU card in each available GPU node;
a second determining unit, configured to determine, based on the required GPU resource amount and a weight value corresponding to each idle GPU card, an allocation result corresponding to the task request;
and the distribution unit is used for distributing the GPU cards for the task requests based on the distribution result.
8. The resource allocation apparatus according to claim 7, wherein the obtaining unit is specifically configured to:
acquiring GPU nodes in a login state and the number of idle GPU cards in each GPU node;
determining the available GPU nodes based on the number of idle GPU cards in each GPU node and the number of needed GPU resources;
the number of used GPU cards in each available GPU node is obtained.
9. A resource allocation device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the resource allocation method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the resource allocation method according to any one of claims 1 to 6.
CN202110270221.0A 2021-03-12 2021-03-12 Resource allocation method, device, equipment and storage medium Pending CN112988383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110270221.0A CN112988383A (en) 2021-03-12 2021-03-12 Resource allocation method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112988383A true CN112988383A (en) 2021-06-18

Family

ID=76336439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110270221.0A Pending CN112988383A (en) 2021-03-12 2021-03-12 Resource allocation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112988383A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377520A (en) * 2021-07-07 2021-09-10 北京百度网讯科技有限公司 Resource scheduling method, device, equipment and storage medium
CN113377520B (en) * 2021-07-07 2023-03-24 北京百度网讯科技有限公司 Resource scheduling method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US11704144B2 (en) Creating virtual machine groups based on request
CN110837410B (en) Task scheduling method and device, electronic equipment and computer readable storage medium
CN109582433B (en) Resource scheduling method and device, cloud computing system and storage medium
CN105808328B (en) The methods, devices and systems of task schedule
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
WO2017166643A1 (en) Method and device for quantifying task resources
CN112148468B (en) Resource scheduling method and device, electronic equipment and storage medium
CN111324427B (en) Task scheduling method and device based on DSP
CN112068957B (en) Resource allocation method, device, computer equipment and storage medium
EP3238102A1 (en) Techniques to generate a graph model for cloud infrastructure elements
CN114416352A (en) Computing resource allocation method and device, electronic equipment and storage medium
WO2022095358A1 (en) Task scheduling method and apparatus, electronic device, and readable storage medium
CN104580194A (en) Virtual resource management method and device oriented to video applications
CN109471725A (en) Resource allocation methods, device and server
CN110275760A (en) Process based on fictitious host computer processor hangs up method and its relevant device
CN113890712A (en) Data transmission method and device, electronic equipment and readable storage medium
CN113674131A (en) Hardware accelerator equipment management method and device, electronic equipment and storage medium
CN113946431B (en) Resource scheduling method, system, medium and computing device
CN108833592A (en) Cloud host schedules device optimization method, device, equipment and storage medium
CN107992351B (en) Hardware resource allocation method and device and electronic equipment
CN112988383A (en) Resource allocation method, device, equipment and storage medium
CN114237894A (en) Container scheduling method, device, equipment and readable storage medium
CN116089477B (en) Distributed training method and system
CN115373826B (en) Task scheduling method and device based on cloud computing
CN116647560A (en) Method, device, equipment and medium for coordinated optimization control of Internet of things computer clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination