CN112698947A - GPU resource flexible scheduling method based on heterogeneous application platform - Google Patents

GPU resource flexible scheduling method based on heterogeneous application platform

Info

Publication number
CN112698947A
CN112698947A (application CN202011617125.0A)
Authority
CN
China
Prior art keywords
gpu
platform
application platform
node
application
Prior art date
Legal status
Granted
Application number
CN202011617125.0A
Other languages
Chinese (zh)
Other versions
CN112698947B (en)
Inventor
王继彬
刘鑫
郭莹
杨美红
Current Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date
Filing date
Publication date
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202011617125.0A priority Critical patent/CN112698947B/en
Publication of CN112698947A publication Critical patent/CN112698947A/en
Application granted granted Critical
Publication of CN112698947B publication Critical patent/CN112698947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 9/5011: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/4403: Bootstrapping; processor initialisation
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/546: Interprogram communication; message passing systems or structures, e.g. queues
    • G06F 2209/5012: Indexing scheme relating to G06F 9/50; processor sets
    • G06F 2209/548: Indexing scheme relating to G06F 9/54; queue

Abstract

The GPU resource elastic scheduling method based on a heterogeneous application platform comprises the following steps: a) acquiring GPU resource utilization information; b) setting trigger thresholds and trigger counts; c) screening and sorting the capacity-reduction platform queue; d) screening and sorting the capacity-expansion platform queue; 1) selecting the platform to be reduced; 2) building a GPU node list; 3) processing nodes in the locked state; 4) taking the nodes to be migrated offline; 5) adding them to the resource queue; 6) judging whether the capacity reduction is finished. The GPU resource elastic scheduling method can adjust flexibly according to the GPU load of the whole platform, so that platform GPU resources are utilized to the maximum extent. Platform scheduling is mainly realized by the existing scheduling components of the underlying adapted platforms, and dynamic resource monitoring, information collection and issuing of execution operations are realized through interface calls, which satisfies rapid and elastic deployment of cloud computing, big data, artificial intelligence and high-performance computing platforms.

Description

GPU resource flexible scheduling method based on heterogeneous application platform
Technical Field
The invention relates to a GPU resource flexible scheduling method, in particular to a GPU resource flexible scheduling method based on a heterogeneous application platform.
Background
Graphics Processing Unit (GPU) resources have in recent years been used more and more widely in cloud computing, artificial intelligence, high-performance computing and other fields because of their excellent parallel computing capability, high bandwidth and high clock frequency. At the same time, GPUs are generally more expensive than CPUs, so GPU resources are scarce in the various computing application scenarios, and their utilization is usually improved by means of resource scheduling.
GPU resource scheduling can generally be divided into task-level scheduling, hardware-level scheduling and node-level scheduling. Task-level scheduling of GPU resources mainly exploits the characteristics of the GPU resources and of the application services, and uses resource preemption and resource isolation strategies to achieve sharing and multiplexing of resources at the task level; it is suitable for homogeneous application scenarios, i.e. environments in which the usage patterns of and requirements on GPU resources are relatively uniform. Hardware-level scheduling of GPU resources is divided into whole-GPU scheduling and vGPU virtualization scheduling; it is the most widely applied form of scheduling at present and can bind a physical or virtual computing environment to a GPU card or a vGPU virtual card. In the high-performance computing field it is mainly supported by scheduling platforms such as Slurm, PBS and LSF, and in the cloud computing, artificial intelligence and big data analysis fields it is mainly supported by scheduling platforms or scheduling service components such as Mesos, Kubernetes (K8s) and Nova. However, this approach requires that a single platform use a unique scheduling system; otherwise the scheduling information for the GPU resources becomes inconsistent and resource occupation conflicts arise. The scheduling platforms commonly used for physical node-level scheduling are the same as those for hardware-level scheduling: every GPU computing node belongs to one resource management platform, sharing and multiplexing are not involved, elastic scaling is limited to the platform itself, heterogeneous resources and heterogeneous application scaling are not supported, and the overall GPU resource utilization of the platform is therefore hard to improve. In view of the above, the present invention provides a GPU resource scheduling system based on a heterogeneous application platform and an implementation method thereof.
Disclosure of Invention
In order to overcome the above technical problems, the invention provides a GPU resource elastic scheduling system based on a heterogeneous application platform and a corresponding implementation method, so that GPU resource scheduling across heterogeneous platforms, including cloud computing platforms, artificial intelligence platforms and high-performance computing platforms, and automatic elastic scaling of nodes can be realized, effectively improving the GPU resource utilization of the whole platform.
The invention discloses a GPU resource elastic scheduling method based on a heterogeneous application platform, which comprises the steps of obtaining and detecting GPU resources of the application platform, reducing the capacity of the GPU resources of the application platform and expanding the capacity of the GPU resources of the application platform, and is characterized in that: the application platform GPU resource acquisition and detection are realized through steps a) to d):
a) acquiring GPU resource utilization information: periodically collect the GPU resource utilization information of each application platform and express it as an array MS = [MS_1 MS_2 ... MS_n], where n denotes the number of application platforms; the resource utilization information of the i-th application platform MS_i is represented by the following six-tuple:

MS_i = <N_i, RU_i, SRU_i, CU_i, G_i, TP_i>   (1)

where N_i denotes the schedulable GPU node list of application platform i, RU_i denotes the overall nominal resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, with RU_i ≤ SRU_i, CU_i denotes the current resource utilization of application platform i, G_i denotes the total number of GPU cards of application platform i, and TP_i denotes the type of application platform i;

the scheduling unit between different application platforms is the GPU node level; all GPU nodes of the different application platforms are homogeneous nodes, and each node is configured with c GPU cards;
b) setting trigger thresholds and counts: set the capacity-reduction continuous-trigger scheduling threshold of an application platform to k_1 and the number of consecutive triggers to t_1; set the capacity-expansion continuous-trigger scheduling threshold to k_2 and the number of consecutive triggers to t_2;
c) screening and sorting the capacity-reduction platform queue: periodically detect whether the current resource utilization of each application platform is lower than the capacity-reduction continuous-trigger scheduling threshold k_1 and has stayed below k_1 for t_1 consecutive detections; if so, the application platform needs capacity reduction and is placed into the capacity-reduction application platform queue PS; the application platforms needing capacity reduction are sorted from high to low by formula (2):

[Formula (2) is published as an image (BDA0002875173140000031) and is not reproduced here]

where RU_i denotes the overall nominal resource utilization of application platform i, CU_i denotes the current resource utilization of application platform i, and c is the number of GPU cards configured on each GPU node;
d) screening and sorting the capacity-expansion platform queue: periodically detect whether the current resource utilization of each application platform is higher than the capacity-expansion continuous-trigger scheduling threshold k_2 and has stayed above k_2 for t_2 consecutive detections; if so, the application platform needs capacity expansion and is placed into the capacity-expansion application platform queue PD; the application platforms needing capacity expansion are sorted from high to low by formula (3):

[Formula (3) is published as an image (BDA0002875173140000032) and is not reproduced here]

where CU_i denotes the current resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, and c is the number of GPU cards configured on each GPU node; an illustrative code sketch of this screening and sorting logic is given below.
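The following is a minimal, non-authoritative Python sketch of steps a) to d). The six-tuple is modelled as a dataclass, and the PS/PD queues are built with the continuous-trigger counting described above. The names (PlatformStats, build_queues) and the sort keys (RU_i - CU_i and CU_i - SRU_i, used only as stand-ins for the unreproduced formulas (2) and (3)) are illustrative assumptions, not the patented formulas.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PlatformStats:
        """Six-tuple MS_i = <N_i, RU_i, SRU_i, CU_i, G_i, TP_i> (illustrative)."""
        nodes: List[str]        # N_i: schedulable GPU node list
        ru: float               # RU_i: nominal overall utilization (capacity-reduction reference)
        sru: float              # SRU_i: utilization that triggers capacity expansion (RU_i <= SRU_i)
        cu: float               # CU_i: current overall utilization
        gpu_cards: int          # G_i: total number of GPU cards
        platform_type: str      # TP_i: platform type (e.g. "hpc", "cloud", "container")
        below_count: int = 0    # consecutive detections below k1
        above_count: int = 0    # consecutive detections above k2

    def build_queues(ms, k1, t1, k2, t2):
        """One detection cycle: update trigger counters and return the (PS, PD) queues."""
        ps, pd = [], []
        for p in ms:
            p.below_count = p.below_count + 1 if p.cu < k1 else 0
            p.above_count = p.above_count + 1 if p.cu > k2 else 0
            if p.below_count >= t1:
                ps.append(p)
            if p.above_count >= t2:
                pd.append(p)
        # Stand-in sort keys; the patented formulas (2) and (3) are published as images.
        ps.sort(key=lambda p: p.ru - p.cu, reverse=True)
        pd.sort(key=lambda p: p.cu - p.sru, reverse=True)
        return ps, pd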
GPU nodes of the top-s application platforms in the platform queue PS need to be reduced and recovered; the application platform GPU resource capacity reduction is realized through steps 1) to 6):
1) selecting platforms to be scaled down, selecting the application platform to be scaled down which is the first in the current sequence from the top-s application platforms, and executing the step 2);
2) establishing a GPU node list, and sequencing GPU nodes in the selected application platform from low to high according to the GPU resource utilization rate to establish the GPU node list; setting the first GPU node in the GPU node list to be in a locking state, and executing the step 3);
3) locking state node processing, namely judging whether the GPU node which is set to be in the locking state currently has application service, and if the GPU node does not have the application service, executing the step 4); if the service exists, executing service migration operation according to the platform type, if the application service migration is successful, executing step 4), if the migration is failed, terminating the current operation, removing the GPU node from the GPU node list, and skipping to execute step 1);
4) taking the nodes to be migrated offline: the GPU node in the locked state is regarded as a GPU node to be migrated; take it offline from its original application platform, mark its state type as offline, remove it from the GPU node list established in step 2), and execute step 5);
5) adding the node inherent information of the GPU node to be migrated into a resource queue RQ, wherein the node inherent information comprises an operating system, a kernel version and a compatible application platform; judging whether the capacity reduction of the current application platform is finished or not, if so, executing the step 6), and if not, executing the step 2);
6) judging whether capacity reduction is finished: judge whether all of the top-s application platforms selected in step 1) have been reduced; if not, execute step 1) and continue the loop; if so, the application platform GPU resource capacity reduction ends. An illustrative code sketch of this capacity-reduction loop is given below.
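A minimal Python sketch of the capacity-reduction loop, steps 1) to 6), follows. The helper functions (has_services, migrate_services, offline_node, node_profile) and the stop condition stand for the platform-specific operations described in the detailed description and are assumptions introduced only for illustration.

    def reduce_capacity(ps_top_s, resource_queue,
                        has_services, migrate_services, offline_node, node_profile):
        """Steps 1)-6): recover lightly loaded GPU nodes from the top-s platforms in PS."""
        for platform in ps_top_s:                                   # step 1)
            # step 2): per-platform candidate list, sorted by GPU utilization (low to high)
            candidates = sorted(platform.nodes, key=lambda n: n.gpu_util)
            while candidates and platform_needs_reduction(platform):
                node = candidates[0]
                node.state = "locked"
                # step 3): migrate running services according to the platform type
                if has_services(node) and not migrate_services(platform.platform_type, node):
                    node.state = "unlocked"
                    candidates.remove(node)      # drop the failed node from this round's list
                    break                        # skip to step 1): try the next platform
                # step 4): take the node offline and remove it from the platform
                offline_node(platform, node)
                node.state = "offline"
                candidates.remove(node)
                platform.nodes.remove(node)
                # step 5): record OS, kernel version and compatible platforms in RQ
                resource_queue.append(node_profile(node))
            # step 6): the outer for-loop continues with the next top-s platform

    def platform_needs_reduction(platform):
        """Illustrative stop condition: capacity reduction finishes once utilization recovers."""
        return platform.cu < platform.ru and len(platform.nodes) > 1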
The GPU resource elastic scheduling method based on heterogeneous application platforms requires elastic expansion of the GPU nodes of the top-d application platforms in the platform queue PD; the application platform GPU resource capacity expansion is realized through the following steps:
A) selecting the platform to be expanded: from the top-d application platforms, select the application platform to be expanded that ranks first in the current order, and execute step B);
B) traversing the resource queue RQ and acquiring the inherent information of the nodes in RQ; if the queue is not empty, match the selected source node information against the application platform to be expanded one by one; if a GPU node meeting the expansion requirement exists in RQ, its node environment can be reused, so go to step D); if the whole resource queue RQ has been traversed without a match, go to step C);
C) selecting the GPU node ranked first in the resource queue RQ, marking its state as locked and its initialization state as to-be-initialized, then performing the initialization operation on it (its node environment cannot be reused and must be reset); after the initialization operation completes, go to step D);
D) initializing the GPU node to be brought online, including application platform system initialization and application software initialization; after initialization completes, go to step E);
E) adding the initialized GPU node to the application platform to be expanded, realizing the GPU resource node capacity expansion in that platform, completing the node online operation and updating the resource queue RQ information; judging whether the expansion of the application platform to be expanded is complete; if complete, execute step F), otherwise execute step B);
F) if an application platform that can still be expanded remains in the top-d, go to step A) and continue the loop; otherwise the application platform GPU resource capacity expansion ends. An illustrative code sketch of this capacity-expansion loop is given below.
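A minimal Python sketch of the capacity-expansion loop, steps A) to F), follows. The helpers (environment_matches, reinitialize_node, initialize_for_platform, online_node) and the stop condition stand for the environment matching, bare-metal reinstallation and platform online operations described later; their names are illustrative assumptions.

    def expand_capacity(pd_top_d, resource_queue,
                        environment_matches, reinitialize_node,
                        initialize_for_platform, online_node):
        """Steps A)-F): allocate GPU nodes from the resource pool queue RQ to the top-d platforms in PD."""
        for platform in pd_top_d:                                        # step A)
            while platform_needs_expansion(platform) and resource_queue:
                # step B): look for a node whose environment can be reused directly
                node = next((n for n in resource_queue
                             if environment_matches(n, platform)), None)
                if node is None:
                    # step C): no match, so take the first node in RQ and reset it completely
                    node = resource_queue[0]
                    node.state = "locked"
                    node.init_state = "to-be-initialized"
                    reinitialize_node(node)          # OS reinstall via the bare-metal service
                # step D): platform system and application software initialization
                initialize_for_platform(node, platform)
                # step E): bring the node online in the target platform and update RQ
                online_node(platform, node)
                resource_queue.remove(node)
        # step F): remaining expandable platforms are handled by the outer for-loop

    def platform_needs_expansion(platform):
        """Illustrative stop condition: expansion finishes once utilization drops below SRU_i."""
        return platform.cu > platform.sru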
According to the GPU resource elastic scheduling method based on a heterogeneous application platform, "the GPU node environment can be reused" in step B) means that the node only needs software environment configuration and does not need the operating system to be reinstalled; the resetting and initialization in step C) mean that the node must have its operating system installed by automated means and the software configuration issued, so as to initialize the node.
The invention has the following beneficial effects: the GPU resource elastic scheduling method based on a heterogeneous application platform solves the problem of node-level elastic scaling of GPU resources across different application platforms and can adjust flexibly according to the GPU load of the whole platform, so that platform GPU resources are utilized to the maximum extent. Platform scheduling is mainly realized by the existing scheduling components of the underlying adapted platforms, and dynamic resource monitoring, information collection and issuing of execution operations are realized through interface calls, which satisfies rapid and elastic deployment of cloud computing, big data, artificial intelligence and high-performance computing platforms.
Drawings
FIG. 1 is a diagram illustrating a general flexible scheduling of GPU resources according to the present invention;
FIG. 2 is a diagram of an example of a network architecture of an elastic capacity expansion node according to the present invention;
FIG. 3 is a schematic diagram of an elastic scheduling method for a multi-configuration GPU node resource pool according to the present invention;
FIG. 4 is a flowchart of the application platform GPU resource reduction of the present invention;
FIG. 5 is a flowchart illustrating expansion of GPU resources of the application platform according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
As shown in fig. 1, a schematic diagram of the general flexible scheduling of GPU resources in the present invention is given, where the three application platforms are a high performance computing application platform, a cloud computing application platform, and a container application platform, and their identification IDs are 1,2, and 3, respectively. Meanwhile, the platform also has a public GPU node resource pool which is used for flexible scheduling and dynamic flexible use. The resource flexible expansion core platform mainly comprises a configuration management module, a flexible scheduling module, a resource allocation module, a resource recovery module, an initialization module and an acquisition module. The specific responsible functions of the individual modules are as follows:
The configuration management module configures the scheduling information and related parameter settings of the managed platforms. The flexible scheduling module mainly completes the scheduling strategy and issues requests; in the example of fig. 1 this module is the abstract realization of the upper-layer scheduling platform, its bottom layer interfaces with the schedulers of the three application platforms, and the interfacing is implemented through API calls. The resource allocation module is the resource allocation executor: it receives commands from the scheduling module and then executes GPU node resource allocation operations according to the collected resource queue PD information and the scheduling strategy. The resource recovery module recovers idle GPU nodes of the application platforms; its driving information comes from the capacity-reduction queue PS, the recovered node information is synchronously updated into the RQ queue, and the corresponding node resources logically return to the GPU node resource pool. The initialization module is realized by two components, automatic software configuration and node initialization: automatic software configuration is realized with Ansible, and node initialization mainly realizes automatic reinstallation and configuration of the operating system by means of the bare-metal service of the cloud computing platform. The acquisition module periodically collects the resource utilization information of each application platform's nodes, mainly through command lines and APIs; for example, the cloud computing application platform obtains monitoring data through components such as Gnocchi and Panko.
In the specific implementation of the example of fig. 1, the concrete execution of the resource recovery and allocation modules is implemented by interfacing with the API or the command line of the corresponding application platform; for example, for a high-performance computing application platform adopting the Slurm scheduler this is implemented by modifying the slurm.conf and gres.conf files through commands, and for an OpenStack cloud computing platform adopting the Nova component it is implemented by calling the "add host" and "remove host" API interfaces. An illustrative sketch of such a platform adapter is given below.
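As one hedged illustration of how the recovery and allocation modules might drive the underlying platforms, the Python sketch below wraps two well-known command-line interfaces: Slurm's scontrol (draining or resuming a node, used here as a stand-in for the slurm.conf/gres.conf changes described above) and the OpenStack client's aggregate commands (assuming the GPU nodes are grouped in a host aggregate). The adapter class names and the choice of commands are assumptions for illustration, not the exact interfaces used by the patent.

    import subprocess

    class SlurmAdapter:
        """Recovery/allocation against a Slurm-scheduled HPC platform (illustrative)."""
        def offline_node(self, node: str) -> None:
            # Drain the node so no new jobs are scheduled on it before it is recovered.
            subprocess.run(["scontrol", "update", f"NodeName={node}",
                            "State=DRAIN", "Reason=elastic-scale-down"], check=True)

        def online_node(self, node: str) -> None:
            # Return the node to service after it has been (re)added to the partition.
            subprocess.run(["scontrol", "update", f"NodeName={node}",
                            "State=RESUME"], check=True)

    class OpenStackAdapter:
        """Recovery/allocation against an OpenStack/Nova cloud platform (illustrative),
        assuming the GPU compute nodes are grouped in a host aggregate."""
        def __init__(self, aggregate: str = "gpu-pool"):
            self.aggregate = aggregate

        def offline_node(self, host: str) -> None:
            subprocess.run(["openstack", "aggregate", "remove", "host",
                            self.aggregate, host], check=True)

        def online_node(self, host: str) -> None:
            subprocess.run(["openstack", "aggregate", "add", "host",
                            self.aggregate, host], check=True)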
The information stored in the capacity-reduction queue PS and the capacity-expansion queue PD in the example of fig. 1 is written by the platform's acquisition module and is computed with formulas (2) and (3) respectively (published as images and not reproduced here); the entries are the corresponding application platform identifiers, sorted according to the urgency of the required capacity expansion or reduction and written into the PS and PD queues.
In order to guarantee the service continuity and performance of the application platforms, the system only selects the top-s and top-d platforms for node recovery and allocation from the PS and PD queues respectively; the period of this operation can be set in the configuration management module, for example once per hour or once per day, mainly to prevent frequent elastic scheduling operations across the whole platform from affecting system performance.
In the resource recovery shown in fig. 1, 2 GPU nodes are selected for recovery from the container application platform and 3 GPU nodes are allocated to the high-performance computing application platform; the node information is stored into the resource pool queue RQ after successful recovery and removed from RQ after allocation completes, which ensures that the node information stored in the RQ queue always stays consistent with the GPU node resource pool.
As shown in the example of fig. 2, the network requirements of GPU nodes are relatively high. If elastic scheduling between the high-performance computing application platform and the cloud computing application platform is to be supported, the network configuration of a GPU resource pool node needs three types of network card: an InfiniBand network card, an Ethernet operation network card and a virtualization service network card; and if the software environments of the two platforms cannot be shared, a baseboard management control (BMC) network card also needs to be configured to support the bare-metal service, so that the application software environment can be reinstalled. If GPU nodes are to be shared and elastically scheduled among more application platforms, the network hardware support of the GPU nodes must be considered and the corresponding network cards configured in advance to realize automatic elastic scaling.
In order to reduce the complexity of GPU node elastic scheduling, the invention requires the GPU node configurations to be consistent. For the case of inconsistent configurations, fig. 3 gives a multi-configuration GPU node elastic scheduling mode in which GPU nodes are shared only among specific application platforms: as shown in the figure, configuration-1 GPU nodes are elastically scheduled only between the high-performance computing application platform and the cloud computing application platform, and configuration-2 nodes are elastically scaled only between the cloud computing application platform and the container application platform. That is, the GPU resource pool in fig. 3 is managed and scheduled as multiple resource pools, resource recovery and allocation are performed within a specific pool, and in the concrete implementation the scheduling of each resource pool is independent and does not affect the others. An illustrative sketch of such a pool-to-platform mapping is given below.
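The multi-pool constraint can be represented as a simple mapping from pool (node configuration) to the set of platforms allowed to share it; the sketch below, with assumed pool and platform names, filters scheduling decisions so that a node is only moved between platforms attached to its own pool.

    # Which application platforms may share each GPU node configuration (illustrative names).
    POOL_PLATFORMS = {
        "config-1": {"hpc", "cloud"},        # e.g. InfiniBand-equipped nodes
        "config-2": {"cloud", "container"},  # e.g. Ethernet/virtualization nodes
    }

    def can_schedule(node_pool: str, source_platform: str, target_platform: str) -> bool:
        """A node may move only between platforms attached to its own resource pool."""
        allowed = POOL_PLATFORMS.get(node_pool, set())
        return source_platform in allowed and target_platform in allowed

    # Example: can_schedule("config-1", "hpc", "cloud") is True,
    #          can_schedule("config-1", "hpc", "container") is False.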
The following is the flow of GPU node elastic scheduling between a cloud computing platform and a high-performance computing application platform, described as two main flows: elastic capacity reduction and elastic capacity expansion. The node capacity-reduction flow shown in fig. 4 is implemented in the following steps:
step 1) the acquisition module periodically obtains the overall GPU resource utilization of each application platform and compares it with the capacity-reduction trigger scheduling threshold k_1, then goes to step 2);
step 2) if the threshold has been triggered more than t_1 consecutive times, go to step 3); otherwise go to step 1);
step 3) select the top-s application platforms needing capacity reduction from the PS queue, choose from them the platform to be reduced that ranks first in the current order, and go to step 4);
step 4) obtain the node list of the selected application platform needing capacity reduction, sort it from low to high by utilization, select the GPU node with the lowest current utilization and set it to the locked state; for example, on the high-performance computing application platform using the Slurm scheduler the state locking is realized through queue management commands, while on the cloud computing platform it is realized through the host management interface of the Nova component; then go to step 5);
step 5) if the locked GPU node to be migrated currently carries application services: for the high-performance computing platform, because there is no migration operation for running jobs, triggering the migration of a compute node would directly terminate the jobs, so the high-performance computing platform adopts direct termination, checkpointing, or migration after the job finishes; for the cloud computing platform, the online live migration of the Nova component is called. If the service migration in this step fails, terminate the current operation, reset the state of the GPU node to be migrated, and go to step 4); otherwise go to step 6);
step 6) the flexible scheduling module triggers a command and sends it to the resource recovery module, which calls the corresponding platform scheduler to take the GPU node to be migrated offline from its original application platform and mark its state type as offline, then goes to step 7); in the concrete implementation the high-performance computing and cloud computing application platforms call a queue management command and a scheduling component respectively;
step 7), if the capacity reduction of the currently selected capacity reduction application platform is finished, turning to step 8), otherwise, turning to step 4);
step 8) update the recovered GPU node information into the RQ queue; the updated information includes the operating system, the kernel version and the compatible application platform types. Then go to step 9);
step 9) if an application platform that can still be reduced remains in the top-s, go to step 3); otherwise the capacity-reduction flow ends. An illustrative sketch of the platform-specific migration decision in step 5) is given below.
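The sketch below illustrates the platform-type-dependent handling of a busy node from step 5): a hypothetical policy choice for HPC jobs (terminate, checkpoint, or wait for completion) and a live-migration call for cloud instances. The function names and the callables hpc_handler and live_migrate are assumptions standing in for the real scheduler commands and the Nova live-migration interface; the error handling follows the step's rule of unlocking the node and retrying with another candidate.

    from enum import Enum

    class HpcPolicy(Enum):
        TERMINATE = "terminate"          # kill running jobs immediately
        CHECKPOINT = "checkpoint"        # checkpoint jobs, then release the node
        WAIT_FOR_COMPLETION = "wait"     # migrate only after the current jobs finish

    def migrate_busy_node(node, platform_type, hpc_policy, hpc_handler, live_migrate):
        """Step 5): platform-specific handling of a locked node that still runs services.

        hpc_handler(node, policy) and live_migrate(node) are stand-ins for the
        scheduler command / Nova live-migration calls of the real platforms.
        Returns True if the node can proceed to the offline step, False otherwise.
        """
        try:
            if platform_type == "hpc":
                # HPC jobs cannot be live-migrated; apply the configured policy instead.
                hpc_handler(node, hpc_policy)
            elif platform_type == "cloud":
                # Cloud instances are moved away with the platform's live migration.
                live_migrate(node)
            else:
                return False
            return True
        except Exception:
            # Migration failed: unlock the node so another candidate can be tried (step 4).
            node.state = "unlocked"
            return False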
Contrary to the capacity reduction operation, the resource capacity expansion is mainly completed from the GPU resource pool to the corresponding application resource platform, as shown in fig. 5, the specific implementation steps are as follows:
step 1) the acquisition module periodically obtains the overall GPU resource utilization of each application platform and compares it with the capacity-expansion trigger scheduling threshold k_2, then goes to step 2);
step 2) if the threshold has been triggered more than t_2 consecutive times, go to step 3); otherwise go to step 1);
step 3) selecting top-d application platforms needing capacity expansion from the PD queue, selecting the first application platform to be subjected to capacity expansion in the current sequence (sequence from high to low according to the utilization rate) in the top-d, and turning to the step 4);
step 4), traversing a GPU node resource queue RQ, selecting a current node in the queue, and then turning to the step 5);
step 5) judge whether the currently selected node information matches the environment parameters of the application platform to be expanded. The matching content mainly includes the operating system version, the kernel, application software compatibility and so on; for example, when matching a cloud computing environment against a high-performance computing environment, besides the operating system and the kernel, the cloud service component environment and the HPC application software environment conflict with each other. If the node environment matches, it can be reused by both application platform environments, so go to step 7); otherwise go to step 6);
step 6), if all queues RQ are traversed and the selected platform is not matched, turning to the step 3);
step 7), the resource allocation module selects a node with the first sequence in the queue RQ, marks the state of the node as a locking state, sets the initialization state as to-be-initialized, and then goes to step 8);
step 8) the initialization module calls bare metal services to carry out initialization operation of the operating system, the operation mainly realizes automatic installation and deployment of the operating system through protocols such as IPMI, PXE and the like, and then the step 9) is carried out;
step 9) initializing a node environment to be online, including application platform system initialization and application software initialization, setting the initialization state to be complete, and turning to step 10);
and step 10), the flexible scheduling module of the platform calls a scheduler of the capacity expansion application platform to complete the on-line operation of the node and update RQ queue information. The steps are realized by respectively calling an API (application programming interface) provided by a nova service component and a queue management command of a Slurm scheduler aiming at the cloud computing and high-performance computing application platform. And then to step 11).
Step 11), if the expansion of the application platform to be expanded is completed, turning to step 12), otherwise, turning to step 4);
step 12) if an application platform that can still be expanded remains in the top-d, go to step 1); otherwise the capacity-expansion flow ends. An illustrative sketch of the environment matching used in step 5) is given below.
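The environment match of step 5) can be expressed as a comparison of the node's inherent information (operating system, kernel version, compatible platform types) with the target platform's requirements; the record fields and function below are illustrative assumptions consistent with the information said to be stored in the RQ queue.

    from dataclasses import dataclass
    from typing import FrozenSet

    @dataclass(frozen=True)
    class NodeProfile:
        """Inherent node information kept in the RQ queue (illustrative fields)."""
        os_version: str                       # e.g. "CentOS 7.9"
        kernel: str                           # e.g. "3.10.0-1160"
        compatible_platforms: FrozenSet[str]  # platform types whose software stack is present

    def environment_matches(profile: NodeProfile, platform_type: str,
                            required_os: str, required_kernel: str) -> bool:
        """True if the node can be reused without an OS reinstall (step 5 -> step 7)."""
        return (profile.os_version == required_os
                and profile.kernel == required_kernel
                and platform_type in profile.compatible_platforms)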
After the two flexible scheduling steps, the system platform can ensure that GPU node resources are fully shared and utilized, so that the resource utilization rate of the whole platform is improved.

Claims (3)

1. A GPU resource flexible scheduling method based on a heterogeneous application platform comprises application platform GPU resource acquisition and detection, application platform GPU resource capacity reduction and application platform GPU resource capacity expansion, and is characterized in that: the application platform GPU resource acquisition and detection are realized through steps a) to d):
a) acquiring GPU resource utilization information: periodically collecting the GPU resource utilization information of each application platform and expressing it as an array MS = [MS_1 MS_2 ... MS_n], where n denotes the number of application platforms; the resource utilization information of the i-th application platform MS_i is represented by the following six-tuple:

MS_i = <N_i, RU_i, SRU_i, CU_i, G_i, TP_i>   (1)

where N_i denotes the schedulable GPU node list of application platform i, RU_i denotes the overall nominal resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, with RU_i ≤ SRU_i, CU_i denotes the current resource utilization of application platform i, G_i denotes the total number of GPU cards of application platform i, and TP_i denotes the type of application platform i;
the scheduling unit between different application platforms is the GPU node level; all GPU nodes of the different application platforms are homogeneous nodes, and each node is configured with c GPU cards;
b) setting trigger thresholds and counts: setting the capacity-reduction continuous-trigger scheduling threshold of an application platform to k_1 and the number of consecutive triggers to t_1; setting the capacity-expansion continuous-trigger scheduling threshold to k_2 and the number of consecutive triggers to t_2;
c) screening and sorting the capacity-reduction platform queue: periodically detecting whether the current resource utilization of each application platform is lower than the capacity-reduction continuous-trigger scheduling threshold k_1 and has stayed below k_1 for t_1 consecutive detections; if so, the application platform needs capacity reduction and is placed into the capacity-reduction application platform queue PS; and sorting the application platforms needing capacity reduction from high to low by formula (2):

[Formula (2) is published as an image (FDA0002875173130000011) and is not reproduced here]

where RU_i denotes the overall nominal resource utilization of application platform i, CU_i denotes the current resource utilization of application platform i, and c is the number of GPU cards configured on each GPU node;
d) screening and sorting the capacity-expansion platform queue: periodically detecting whether the current resource utilization of each application platform is higher than the capacity-expansion continuous-trigger scheduling threshold k_2 and has stayed above k_2 for t_2 consecutive detections; if so, the application platform needs capacity expansion and is placed into the capacity-expansion application platform queue PD; and sorting the application platforms needing capacity expansion from high to low by formula (3):

[Formula (3) is published as an image (FDA0002875173130000021) and is not reproduced here]

where CU_i denotes the current resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, and c is the number of GPU cards configured on each GPU node;
capacity reduction and recovery are required to be carried out on GPU nodes of top-s application platforms in a platform queue PS, and the capacity reduction of GPU resources of the application platforms is realized through steps 1) to 6):
1) selecting platforms to be scaled down, selecting the application platform to be scaled down which is the first in the current sequence from the top-s application platforms, and executing the step 2);
2) establishing a GPU node list, and sequencing GPU nodes in the selected application platform from low to high according to the GPU resource utilization rate to establish the GPU node list; setting the first GPU node in the GPU node list to be in a locking state, and executing the step 3);
3) locking state node processing, namely judging whether the GPU node which is set to be in the locking state currently has application service, and if the GPU node does not have the application service, executing the step 4); if the service exists, executing service migration operation according to the platform type, if the application service migration is successful, executing step 4), if the migration is failed, terminating the current operation, removing the GPU node from the GPU node list, and skipping to execute step 1);
4) taking the nodes to be migrated offline: the GPU node in the locked state is regarded as a GPU node to be migrated; taking the GPU node to be migrated offline from its original application platform, marking its state type as offline, removing it from the GPU node list established in step 2), and executing step 5);
5) adding the node inherent information of the GPU node to be migrated into a resource queue RQ, wherein the node inherent information comprises an operating system, a kernel version and a compatible application platform; judging whether the capacity reduction of the current application platform is finished or not, if so, executing the step 6), and if not, executing the step 2);
6) judging whether the top-s application platforms selected in the step 1) are all subjected to capacity reduction, and if not, executing the step 1) to perform cycle traversal; and if the capacity reduction is finished, ending the capacity reduction of the GPU resources of the application platform.
2. The elastic scheduling method of GPU resources based on heterogeneous application platforms according to claim 1, wherein elastic expansion of GPU nodes of top-d application platforms in a platform queue PD is required, and the expansion of GPU resources of the application platforms is implemented by the following steps:
A) selecting platforms to be expanded, selecting the application platform to be expanded which is the first application platform to be expanded in the current sequence from the top-d application platforms, and executing the step B);
B) traversing the resource queue RQ, acquiring inherent information of nodes in the resource queue RQ, sequentially and circularly matching the selected source node information with the application platform to be expanded if the queue is not empty, if GPU nodes meeting the expansion requirement exist in the resource queue RQ, the GPU node environment can be multiplexed, turning to the step D), and if all the resource queue RQ is traversed and the GPU nodes are not matched, turning to the step C);
C) selecting the GPU node ranked first in the resource queue RQ, marking its state as locked, setting its initialization state as to-be-initialized, then performing the initialization operation on the GPU node, that is, the node environment cannot be reused and needs to be reset, and turning to step D) after the initialization operation is completed;
D) initializing GPU nodes to be online, including application platform system initialization and application software initialization, and turning to step E) after initialization is completed;
E) adding the initialized GPU node to be online into the application platform to be expanded, realizing the capacity expansion of the GPU resource nodes in that platform, completing the online operation of the GPU node, and updating the information of the resource queue RQ; judging whether the expansion of the application platform to be expanded is completed, if so, executing the step F), and if not, executing the step B);
F) if an application platform capable of expanding capacity still exists in the top-d, turning to the step A), circulating and traversing, and if the application platform capable of expanding capacity does not exist, ending the expansion of the GPU resources of the application platform.
3. The GPU resource elastic scheduling method based on the heterogeneous application platform according to claim 2, characterized in that: "the GPU node environment can be reused" in step B) means that the GPU node only needs software environment configuration and does not need the operating system to be reinstalled, and the resetting and initialization in step C) mean that the node must have its operating system installed by automated means and the software configuration issued, so as to initialize the node.
CN202011617125.0A 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform Active CN112698947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011617125.0A CN112698947B (en) 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011617125.0A CN112698947B (en) 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform

Publications (2)

Publication Number Publication Date
CN112698947A true CN112698947A (en) 2021-04-23
CN112698947B CN112698947B (en) 2022-03-29

Family

ID=75512804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011617125.0A Active CN112698947B (en) 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform

Country Status (1)

Country Link
CN (1) CN112698947B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915460A (en) * 2022-04-28 2022-08-16 中国人民解放军战略支援部队信息工程大学 Heterogeneous dynamic expansion and contraction capacity device and method for container cloud
CN116069481A (en) * 2023-04-06 2023-05-05 山东省计算中心(国家超级计算济南中心) Container scheduling system and scheduling method for sharing GPU resources

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977252A (en) * 2016-10-21 2018-05-01 中兴通讯股份有限公司 A kind of capacity reduction method, device and the cloud platform of cloud platform business
WO2020017847A1 (en) * 2018-07-19 2020-01-23 나무기술 주식회사 Method for provisioning and managing multi-cluster on cloud platform
CN110825520A (en) * 2019-10-18 2020-02-21 山东省计算中心(国家超级计算济南中心) Cluster top-speed elastic expansion method for realizing efficient resource utilization
CN110968414A (en) * 2018-09-28 2020-04-07 阿里巴巴集团控股有限公司 Resource scaling method and device
CN111309440A (en) * 2020-02-16 2020-06-19 苏州浪潮智能科技有限公司 Method and equipment for managing and scheduling multiple types of GPUs

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977252A (en) * 2016-10-21 2018-05-01 中兴通讯股份有限公司 A kind of capacity reduction method, device and the cloud platform of cloud platform business
WO2020017847A1 (en) * 2018-07-19 2020-01-23 나무기술 주식회사 Method for provisioning and managing multi-cluster on cloud platform
CN110968414A (en) * 2018-09-28 2020-04-07 阿里巴巴集团控股有限公司 Resource scaling method and device
CN110825520A (en) * 2019-10-18 2020-02-21 山东省计算中心(国家超级计算济南中心) Cluster top-speed elastic expansion method for realizing efficient resource utilization
CN111309440A (en) * 2020-02-16 2020-06-19 苏州浪潮智能科技有限公司 Method and equipment for managing and scheduling multiple types of GPUs

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHULING YANG ET AL.: "Research on Heterogeneous Cloud Test Platform Based on Elastic Scaling Mechanism", 《2019 IEEE 19TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY COMPANION (QRS-C)》 *
单朋荣 et al.: "Design and Implementation of an Elastic Scaling Scheme Based on the Kubernetes Cloud Platform", Computer Engineering *
邹祝平: "Research and Implementation of Container Resource Load Prediction Methods", China Master's Theses Full-text Database, Information Science and Technology *
马航: "Research and Implementation of Dynamic Resource Scheduling for a Kubernetes-based Container Cloud Platform", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915460A (en) * 2022-04-28 2022-08-16 中国人民解放军战略支援部队信息工程大学 Heterogeneous dynamic expansion and contraction capacity device and method for container cloud
CN114915460B (en) * 2022-04-28 2023-05-05 中国人民解放军战略支援部队信息工程大学 Heterogeneous dynamic capacity expansion and contraction device and method for container cloud
CN116069481A (en) * 2023-04-06 2023-05-05 山东省计算中心(国家超级计算济南中心) Container scheduling system and scheduling method for sharing GPU resources

Also Published As

Publication number Publication date
CN112698947B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN109783218B (en) Kubernetes container cluster-based time-associated container scheduling method
CN107025205B (en) Method and equipment for training model in distributed system
CN112698947B (en) GPU resource flexible scheduling method based on heterogeneous application platform
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
US10417062B2 (en) Method and apparatus of unloading out of memory processing flow to user space
CN111694675B (en) Task scheduling method and device and storage medium
CN112486642B (en) Resource scheduling method, device, electronic equipment and computer readable storage medium
CN113867959A (en) Training task resource scheduling method, device, equipment and medium
CN112416585A (en) GPU resource management and intelligent scheduling method for deep learning
CN110738156B (en) Face recognition system and method based on message middleware
CN109976873B (en) Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework
CN113672391B (en) Parallel computing task scheduling method and system based on Kubernetes
CN114691372A (en) Group intelligent control method of multimedia end edge cloud system
CN111796933A (en) Resource scheduling method, device, storage medium and electronic equipment
CN109408230B (en) Docker container deployment method and system based on energy consumption optimization
CN114489970A (en) Method and system for realizing queue sequencing by using scheduling plug-in Kubernetes
CN115774614A (en) Resource regulation and control method, terminal and storage medium
CN113254143A (en) Virtual network function network element arranging and scheduling method, device and system
CN117332881B (en) Distributed training method and electronic equipment
CN112506658B (en) Dynamic resource allocation and task scheduling method in service chain
CN111726251B (en) Networking method, system and device for SDS (sodium dodecyl sulfate) storage domain in virtualized system
US20230195546A1 (en) Message Management Method and Apparatus, and Serverless System
CN116643890A (en) Cluster resource scheduling method, device, equipment and medium
CN116643860A (en) Priority scheduling method, system, electronic device and computer program product for weather machine learning algorithm operation
CN114816696A (en) Method, system and equipment for automatically executing Kubernetes task

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230418

Address after: 250014 No. 19, ASTRI Road, Lixia District, Shandong, Ji'nan

Patentee after: SHANDONG COMPUTER SCIENCE CENTER(NATIONAL SUPERCOMPUTER CENTER IN JINAN)

Patentee after: Qilu University of Technology (Shandong Academy of Sciences)

Address before: Shandong computing center, No.19 Keyuan Road, Lixia District, Jinan City, Shandong Province 250014

Patentee before: SHANDONG COMPUTER SCIENCE CENTER(NATIONAL SUPERCOMPUTER CENTER IN JINAN)

TR01 Transfer of patent right