CN112698947A - GPU resource flexible scheduling method based on heterogeneous application platform - Google Patents

GPU resource flexible scheduling method based on heterogeneous application platform

Info

Publication number
CN112698947A
CN112698947A (application CN202011617125.0A)
Authority
CN
China
Prior art keywords
gpu
platform
application platform
node
application
Prior art date
Legal status
Granted
Application number
CN202011617125.0A
Other languages
Chinese (zh)
Other versions
CN112698947B (en)
Inventor
王继彬
刘鑫
郭莹
杨美红
Current Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date
Filing date
Publication date
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202011617125.0A priority Critical patent/CN112698947B/en
Publication of CN112698947A publication Critical patent/CN112698947A/en
Application granted granted Critical
Publication of CN112698947B publication Critical patent/CN112698947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 9/5011: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/4403: Bootstrapping; processor initialisation
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/546: Interprogram communication; message passing systems or structures, e.g. queues
    • G06F 2209/5012: Indexing scheme relating to G06F 9/50; processor sets
    • G06F 2209/548: Indexing scheme relating to G06F 9/54; queue

Abstract

The GPU resource elastic scheduling method based on a heterogeneous application platform comprises the following steps: a) acquiring GPU resource utilization information; b) setting trigger thresholds and trigger counts; c) screening and sorting the capacity-reduction platform queue; d) screening and sorting the capacity-expansion platform queue; 1) selecting the platform to be reduced; 2) building a GPU node list; 3) processing nodes in the locked state; 4) taking the nodes to be migrated offline; 5) adding them to the resource queue; 6) judging whether the capacity reduction is finished. The GPU resource elastic scheduling method can adjust flexibly according to the GPU load of the whole platform, so that platform GPU resources are utilized to the maximum extent. Platform scheduling is mainly realized by the existing scheduling components of the underlying adapted platforms, and dynamic resource monitoring, information collection and issuing of execution operations are realized through interface calls, which satisfies rapid and elastic deployment of cloud computing, big data, artificial intelligence and high-performance computing platforms.

Description

GPU resource flexible scheduling method based on heterogeneous application platform
Technical Field
The invention relates to a GPU resource flexible scheduling method, in particular to a GPU resource flexible scheduling method based on a heterogeneous application platform.
Background
Graphics Processing Unit (GPU) resources have in recent years been used more and more widely in cloud computing, artificial intelligence, high-performance computing and other fields because of their excellent parallel computing capability, high bandwidth and high clock frequency. At the same time, GPUs are generally more expensive than CPUs, so GPU resources are scarce in the various computing application scenarios, and their utilization is usually improved by means of resource scheduling.
GPU resource scheduling can generally be divided into task-level scheduling, hardware-level scheduling and node-level scheduling. Task-level scheduling of GPU resources mainly exploits the characteristics of the GPU resources and of the application services, and uses resource preemption and resource isolation strategies to achieve sharing and multiplexing of resources at the task level; it is suitable for homogeneous application scenarios, i.e. environments in which the usage patterns of and requirements on GPU resources are relatively uniform. Hardware-level scheduling of GPU resources is divided into whole-GPU scheduling and vGPU virtualization scheduling; it is the most widely applied form of scheduling at present and can bind a physical or virtual computing environment to a GPU card or a vGPU virtual card. In the high-performance computing field it is mainly supported by scheduling platforms such as Slurm, PBS and LSF, and in the cloud computing, artificial intelligence and big data analysis fields it is mainly supported by scheduling platforms or scheduling service components such as Mesos, Kubernetes (K8s) and Nova. However, this approach requires that a single platform use a unique scheduling system; otherwise the scheduling information for the GPU resources becomes inconsistent and resource occupation conflicts arise. The scheduling platforms commonly used for physical node-level scheduling are the same as those for hardware-level scheduling: every GPU computing node belongs to one resource management platform, sharing and multiplexing are not involved, elastic scaling is limited to the platform itself, heterogeneous resources and heterogeneous application scaling are not supported, and the overall GPU resource utilization of the platform is therefore hard to improve. In view of the above, the present invention provides a GPU resource scheduling system based on a heterogeneous application platform and an implementation method thereof.
Disclosure of Invention
In order to overcome the above technical problems, the invention provides a GPU resource elastic scheduling system based on a heterogeneous application platform and a corresponding implementation method, so that GPU resource scheduling across heterogeneous platforms, including cloud computing platforms, artificial intelligence platforms and high-performance computing platforms, and automatic elastic scaling of nodes can be realized, effectively improving the GPU resource utilization of the whole platform.
The invention discloses a GPU resource elastic scheduling method based on a heterogeneous application platform, which comprises the steps of obtaining and detecting GPU resources of the application platform, reducing the capacity of the GPU resources of the application platform and expanding the capacity of the GPU resources of the application platform, and is characterized in that: the application platform GPU resource acquisition and detection are realized through steps a) to d):
a) acquiring GPU resource utilization information: periodically collect the GPU resource utilization information of each application platform and express it as an array MS = [MS_1 MS_2 ... MS_n], where n denotes the number of application platforms; the resource utilization information of the i-th application platform MS_i is represented by the following six-tuple:

MS_i = <N_i, RU_i, SRU_i, CU_i, G_i, TP_i>   (1)

where N_i denotes the schedulable GPU node list of application platform i, RU_i denotes the overall nominal resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, with RU_i ≤ SRU_i, CU_i denotes the current resource utilization of application platform i, G_i denotes the total number of GPU cards of application platform i, and TP_i denotes the type of application platform i;

the scheduling unit between different application platforms is the GPU node level; all GPU nodes of the different application platforms are homogeneous nodes, and each node is configured with c GPU cards;
b) setting trigger thresholds and counts: set the capacity-reduction continuous-trigger scheduling threshold of an application platform to k_1 and the number of consecutive triggers to t_1; set the capacity-expansion continuous-trigger scheduling threshold to k_2 and the number of consecutive triggers to t_2;
c) screening and sorting the capacity-reduction platform queue: periodically detect whether the current resource utilization of each application platform is lower than the capacity-reduction continuous-trigger scheduling threshold k_1 and has stayed below k_1 for t_1 consecutive detections; if so, the application platform needs capacity reduction and is placed into the capacity-reduction application platform queue PS; the application platforms needing capacity reduction are sorted from high to low by formula (2):

[Formula (2) is published as an image (BDA0002875173140000031) and is not reproduced here]

where RU_i denotes the overall nominal resource utilization of application platform i, CU_i denotes the current resource utilization of application platform i, and c is the number of GPU cards configured on each GPU node;
d) screening and sorting the capacity-expansion platform queue: periodically detect whether the current resource utilization of each application platform is higher than the capacity-expansion continuous-trigger scheduling threshold k_2 and has stayed above k_2 for t_2 consecutive detections; if so, the application platform needs capacity expansion and is placed into the capacity-expansion application platform queue PD; the application platforms needing capacity expansion are sorted from high to low by formula (3):

[Formula (3) is published as an image (BDA0002875173140000032) and is not reproduced here]

where CU_i denotes the current resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, and c is the number of GPU cards configured on each GPU node; an illustrative code sketch of this screening and sorting logic is given below.
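The following is a minimal, non-authoritative Python sketch of steps a) to d). The six-tuple is modelled as a dataclass, and the PS/PD queues are built with the continuous-trigger counting described above. The names (PlatformStats, build_queues) and the sort keys (RU_i - CU_i and CU_i - SRU_i, used only as stand-ins for the unreproduced formulas (2) and (3)) are illustrative assumptions, not the patented formulas.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PlatformStats:
        """Six-tuple MS_i = <N_i, RU_i, SRU_i, CU_i, G_i, TP_i> (illustrative)."""
        nodes: List[str]        # N_i: schedulable GPU node list
        ru: float               # RU_i: nominal overall utilization (capacity-reduction reference)
        sru: float              # SRU_i: utilization that triggers capacity expansion (RU_i <= SRU_i)
        cu: float               # CU_i: current overall utilization
        gpu_cards: int          # G_i: total number of GPU cards
        platform_type: str      # TP_i: platform type (e.g. "hpc", "cloud", "container")
        below_count: int = 0    # consecutive detections below k1
        above_count: int = 0    # consecutive detections above k2

    def build_queues(ms, k1, t1, k2, t2):
        """One detection cycle: update trigger counters and return the (PS, PD) queues."""
        ps, pd = [], []
        for p in ms:
            p.below_count = p.below_count + 1 if p.cu < k1 else 0
            p.above_count = p.above_count + 1 if p.cu > k2 else 0
            if p.below_count >= t1:
                ps.append(p)
            if p.above_count >= t2:
                pd.append(p)
        # Stand-in sort keys; the patented formulas (2) and (3) are published as images.
        ps.sort(key=lambda p: p.ru - p.cu, reverse=True)
        pd.sort(key=lambda p: p.cu - p.sru, reverse=True)
        return ps, pd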
GPU nodes of the top-s application platforms in the platform queue PS need to be reduced and recovered; the application platform GPU resource capacity reduction is realized through steps 1) to 6):
1) selecting platforms to be scaled down, selecting the application platform to be scaled down which is the first in the current sequence from the top-s application platforms, and executing the step 2);
2) establishing a GPU node list, and sequencing GPU nodes in the selected application platform from low to high according to the GPU resource utilization rate to establish the GPU node list; setting the first GPU node in the GPU node list to be in a locking state, and executing the step 3);
3) locking state node processing, namely judging whether the GPU node which is set to be in the locking state currently has application service, and if the GPU node does not have the application service, executing the step 4); if the service exists, executing service migration operation according to the platform type, if the application service migration is successful, executing step 4), if the migration is failed, terminating the current operation, removing the GPU node from the GPU node list, and skipping to execute step 1);
4) taking the nodes to be migrated offline: the GPU node in the locked state is regarded as a GPU node to be migrated; take it offline from its original application platform, mark its state type as offline, remove it from the GPU node list established in step 2), and execute step 5);
5) adding the node inherent information of the GPU node to be migrated into a resource queue RQ, wherein the node inherent information comprises an operating system, a kernel version and a compatible application platform; judging whether the capacity reduction of the current application platform is finished or not, if so, executing the step 6), and if not, executing the step 2);
6) judging whether capacity reduction is finished: judge whether all of the top-s application platforms selected in step 1) have been reduced; if not, execute step 1) and continue the loop; if so, the application platform GPU resource capacity reduction ends. An illustrative code sketch of this capacity-reduction loop is given below.
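A minimal Python sketch of the capacity-reduction loop, steps 1) to 6), follows. The helper functions (has_services, migrate_services, offline_node, node_profile) and the stop condition stand for the platform-specific operations described in the detailed description and are assumptions introduced only for illustration.

    def reduce_capacity(ps_top_s, resource_queue,
                        has_services, migrate_services, offline_node, node_profile):
        """Steps 1)-6): recover lightly loaded GPU nodes from the top-s platforms in PS."""
        for platform in ps_top_s:                                   # step 1)
            # step 2): per-platform candidate list, sorted by GPU utilization (low to high)
            candidates = sorted(platform.nodes, key=lambda n: n.gpu_util)
            while candidates and platform_needs_reduction(platform):
                node = candidates[0]
                node.state = "locked"
                # step 3): migrate running services according to the platform type
                if has_services(node) and not migrate_services(platform.platform_type, node):
                    node.state = "unlocked"
                    candidates.remove(node)      # drop the failed node from this round's list
                    break                        # skip to step 1): try the next platform
                # step 4): take the node offline and remove it from the platform
                offline_node(platform, node)
                node.state = "offline"
                candidates.remove(node)
                platform.nodes.remove(node)
                # step 5): record OS, kernel version and compatible platforms in RQ
                resource_queue.append(node_profile(node))
            # step 6): the outer for-loop continues with the next top-s platform

    def platform_needs_reduction(platform):
        """Illustrative stop condition: capacity reduction finishes once utilization recovers."""
        return platform.cu < platform.ru and len(platform.nodes) > 1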
The GPU resource elastic scheduling method based on heterogeneous application platforms requires elastic expansion of the GPU nodes of the top-d application platforms in the platform queue PD; the application platform GPU resource capacity expansion is realized through the following steps:
A) selecting the platform to be expanded: from the top-d application platforms, select the application platform to be expanded that ranks first in the current order, and execute step B);
B) traversing the resource queue RQ and acquiring the inherent information of the nodes in RQ; if the queue is not empty, match the selected source node information against the application platform to be expanded one by one; if a GPU node meeting the expansion requirement exists in RQ, its node environment can be reused, so go to step D); if the whole resource queue RQ has been traversed without a match, go to step C);
C) selecting the GPU node ranked first in the resource queue RQ, marking its state as locked and its initialization state as to-be-initialized, then performing the initialization operation on it (its node environment cannot be reused and must be reset); after the initialization operation completes, go to step D);
D) initializing the GPU node to be brought online, including application platform system initialization and application software initialization; after initialization completes, go to step E);
E) adding the initialized GPU node to the application platform to be expanded, realizing the GPU resource node capacity expansion in that platform, completing the node online operation and updating the resource queue RQ information; judging whether the expansion of the application platform to be expanded is complete; if complete, execute step F), otherwise execute step B);
F) if an application platform that can still be expanded remains in the top-d, go to step A) and continue the loop; otherwise the application platform GPU resource capacity expansion ends. An illustrative code sketch of this capacity-expansion loop is given below.
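A minimal Python sketch of the capacity-expansion loop, steps A) to F), follows. The helpers (environment_matches, reinitialize_node, initialize_for_platform, online_node) and the stop condition stand for the environment matching, bare-metal reinstallation and platform online operations described later; their names are illustrative assumptions.

    def expand_capacity(pd_top_d, resource_queue,
                        environment_matches, reinitialize_node,
                        initialize_for_platform, online_node):
        """Steps A)-F): allocate GPU nodes from the resource pool queue RQ to the top-d platforms in PD."""
        for platform in pd_top_d:                                        # step A)
            while platform_needs_expansion(platform) and resource_queue:
                # step B): look for a node whose environment can be reused directly
                node = next((n for n in resource_queue
                             if environment_matches(n, platform)), None)
                if node is None:
                    # step C): no match, so take the first node in RQ and reset it completely
                    node = resource_queue[0]
                    node.state = "locked"
                    node.init_state = "to-be-initialized"
                    reinitialize_node(node)          # OS reinstall via the bare-metal service
                # step D): platform system and application software initialization
                initialize_for_platform(node, platform)
                # step E): bring the node online in the target platform and update RQ
                online_node(platform, node)
                resource_queue.remove(node)
        # step F): remaining expandable platforms are handled by the outer for-loop

    def platform_needs_expansion(platform):
        """Illustrative stop condition: expansion finishes once utilization drops below SRU_i."""
        return platform.cu > platform.sru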
According to the GPU resource elastic scheduling method based on a heterogeneous application platform, "the GPU node environment can be reused" in step B) means that the node only needs software environment configuration and does not need the operating system to be reinstalled; the resetting and initialization in step C) mean that the node must have its operating system installed by automated means and the software configuration issued, so as to initialize the node.
The invention has the following beneficial effects: the GPU resource elastic scheduling method based on a heterogeneous application platform solves the problem of node-level elastic scaling of GPU resources across different application platforms and can adjust flexibly according to the GPU load of the whole platform, so that platform GPU resources are utilized to the maximum extent. Platform scheduling is mainly realized by the existing scheduling components of the underlying adapted platforms, and dynamic resource monitoring, information collection and issuing of execution operations are realized through interface calls, which satisfies rapid and elastic deployment of cloud computing, big data, artificial intelligence and high-performance computing platforms.
Drawings
FIG. 1 is a diagram illustrating a general flexible scheduling of GPU resources according to the present invention;
FIG. 2 is a diagram of an example of a network architecture of an elastic capacity expansion node according to the present invention;
FIG. 3 is a schematic diagram of an elastic scheduling method for a multi-configuration GPU node resource pool according to the present invention;
FIG. 4 is a flowchart of the application platform GPU resource reduction of the present invention;
FIG. 5 is a flowchart illustrating expansion of GPU resources of the application platform according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
As shown in fig. 1, a schematic diagram of the general flexible scheduling of GPU resources in the present invention is given, where the three application platforms are a high performance computing application platform, a cloud computing application platform, and a container application platform, and their identification IDs are 1,2, and 3, respectively. Meanwhile, the platform also has a public GPU node resource pool which is used for flexible scheduling and dynamic flexible use. The resource flexible expansion core platform mainly comprises a configuration management module, a flexible scheduling module, a resource allocation module, a resource recovery module, an initialization module and an acquisition module. The specific responsible functions of the individual modules are as follows:
The configuration management module configures the scheduling information and related parameter settings of the managed platforms. The flexible scheduling module mainly completes the scheduling strategy and issues requests; in the example of fig. 1 this module is the abstract realization of the upper-layer scheduling platform, its bottom layer interfaces with the schedulers of the three application platforms, and the interfacing is implemented through API calls. The resource allocation module is the resource allocation executor: it receives commands from the scheduling module and then executes GPU node resource allocation operations according to the collected resource queue PD information and the scheduling strategy. The resource recovery module recovers idle GPU nodes of the application platforms; its driving information comes from the capacity-reduction queue PS, the recovered node information is synchronously updated into the RQ queue, and the corresponding node resources logically return to the GPU node resource pool. The initialization module is realized by two components, automatic software configuration and node initialization: automatic software configuration is realized with Ansible, and node initialization mainly realizes automatic reinstallation and configuration of the operating system by means of the bare-metal service of the cloud computing platform. The acquisition module periodically collects the resource utilization information of each application platform's nodes, mainly through command lines and APIs; for example, the cloud computing application platform obtains monitoring data through components such as Gnocchi and Panko.
In the specific implementation of the example of fig. 1, the concrete execution of the resource recovery and allocation modules is implemented by interfacing with the API or the command line of the corresponding application platform; for example, for a high-performance computing application platform adopting the Slurm scheduler this is implemented by modifying the slurm.conf and gres.conf files through commands, and for an OpenStack cloud computing platform adopting the Nova component it is implemented by calling the "add host" and "remove host" API interfaces. An illustrative sketch of such a platform adapter is given below.
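As one hedged illustration of how the recovery and allocation modules might drive the underlying platforms, the Python sketch below wraps two well-known command-line interfaces: Slurm's scontrol (draining or resuming a node, used here as a stand-in for the slurm.conf/gres.conf changes described above) and the OpenStack client's aggregate commands (assuming the GPU nodes are grouped in a host aggregate). The adapter class names and the choice of commands are assumptions for illustration, not the exact interfaces used by the patent.

    import subprocess

    class SlurmAdapter:
        """Recovery/allocation against a Slurm-scheduled HPC platform (illustrative)."""
        def offline_node(self, node: str) -> None:
            # Drain the node so no new jobs are scheduled on it before it is recovered.
            subprocess.run(["scontrol", "update", f"NodeName={node}",
                            "State=DRAIN", "Reason=elastic-scale-down"], check=True)

        def online_node(self, node: str) -> None:
            # Return the node to service after it has been (re)added to the partition.
            subprocess.run(["scontrol", "update", f"NodeName={node}",
                            "State=RESUME"], check=True)

    class OpenStackAdapter:
        """Recovery/allocation against an OpenStack/Nova cloud platform (illustrative),
        assuming the GPU compute nodes are grouped in a host aggregate."""
        def __init__(self, aggregate: str = "gpu-pool"):
            self.aggregate = aggregate

        def offline_node(self, host: str) -> None:
            subprocess.run(["openstack", "aggregate", "remove", "host",
                            self.aggregate, host], check=True)

        def online_node(self, host: str) -> None:
            subprocess.run(["openstack", "aggregate", "add", "host",
                            self.aggregate, host], check=True)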
The information stored in the capacity-reduction queue PS and the capacity-expansion queue PD in the example of fig. 1 is written by the platform's acquisition module and is computed with formulas (2) and (3) respectively (published as images and not reproduced here); the entries are the corresponding application platform identifiers, sorted according to the urgency of the required capacity expansion or reduction and written into the PS and PD queues.
In order to guarantee the service continuity and performance of the application platforms, the system only selects the top-s and top-d platforms for node recovery and allocation from the PS and PD queues respectively; the period of this operation can be set in the configuration management module, for example once per hour or once per day, mainly to prevent frequent elastic scheduling operations across the whole platform from affecting system performance.
In the resource recovery shown in fig. 1, 2 GPU nodes are selected for recovery from the container application platform and 3 GPU nodes are allocated to the high-performance computing application platform; the node information is stored into the resource pool queue RQ after successful recovery and removed from RQ after allocation completes, which ensures that the node information stored in the RQ queue always stays consistent with the GPU node resource pool.
As shown in the example of fig. 2, the network requirements of GPU nodes are relatively high. If elastic scheduling between the high-performance computing application platform and the cloud computing application platform is to be supported, the network configuration of a GPU resource pool node needs three types of network card: an InfiniBand network card, an Ethernet operation network card and a virtualization service network card; and if the software environments of the two platforms cannot be shared, a baseboard management control (BMC) network card also needs to be configured to support the bare-metal service, so that the application software environment can be reinstalled. If GPU nodes are to be shared and elastically scheduled among more application platforms, the network hardware support of the GPU nodes must be considered and the corresponding network cards configured in advance to realize automatic elastic scaling.
In order to reduce the complexity of GPU node elastic scheduling, the invention requires the GPU node configurations to be consistent. For the case of inconsistent configurations, fig. 3 gives a multi-configuration GPU node elastic scheduling mode in which GPU nodes are shared only among specific application platforms: as shown in the figure, configuration-1 GPU nodes are elastically scheduled only between the high-performance computing application platform and the cloud computing application platform, and configuration-2 nodes are elastically scaled only between the cloud computing application platform and the container application platform. That is, the GPU resource pool in fig. 3 is managed and scheduled as multiple resource pools, resource recovery and allocation are performed within a specific pool, and in the concrete implementation the scheduling of each resource pool is independent and does not affect the others. An illustrative sketch of such a pool-to-platform mapping is given below.
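The multi-pool constraint can be represented as a simple mapping from pool (node configuration) to the set of platforms allowed to share it; the sketch below, with assumed pool and platform names, filters scheduling decisions so that a node is only moved between platforms attached to its own pool.

    # Which application platforms may share each GPU node configuration (illustrative names).
    POOL_PLATFORMS = {
        "config-1": {"hpc", "cloud"},        # e.g. InfiniBand-equipped nodes
        "config-2": {"cloud", "container"},  # e.g. Ethernet/virtualization nodes
    }

    def can_schedule(node_pool: str, source_platform: str, target_platform: str) -> bool:
        """A node may move only between platforms attached to its own resource pool."""
        allowed = POOL_PLATFORMS.get(node_pool, set())
        return source_platform in allowed and target_platform in allowed

    # Example: can_schedule("config-1", "hpc", "cloud") is True,
    #          can_schedule("config-1", "hpc", "container") is False.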
The following is the flow of GPU node elastic scheduling between a cloud computing platform and a high-performance computing application platform, described as two main flows: elastic capacity reduction and elastic capacity expansion. The node capacity-reduction flow shown in fig. 4 is implemented in the following steps:
step 1) the acquisition module periodically obtains the overall GPU resource utilization of each application platform and compares it with the capacity-reduction trigger scheduling threshold k_1, then goes to step 2);
step 2) if the threshold has been triggered more than t_1 consecutive times, go to step 3); otherwise go to step 1);
step 3) select the top-s application platforms needing capacity reduction from the PS queue, choose from them the platform to be reduced that ranks first in the current order, and go to step 4);
step 4) obtain the node list of the selected application platform needing capacity reduction, sort it from low to high by utilization, select the GPU node with the lowest current utilization and set it to the locked state; for example, on the high-performance computing application platform using the Slurm scheduler the state locking is realized through queue management commands, while on the cloud computing platform it is realized through the host management interface of the Nova component; then go to step 5);
step 5) if the locked GPU node to be migrated currently carries application services: for the high-performance computing platform, because there is no migration operation for running jobs, triggering the migration of a compute node would directly terminate the jobs, so the high-performance computing platform adopts direct termination, checkpointing, or migration after the job finishes; for the cloud computing platform, the online live migration of the Nova component is called. If the service migration in this step fails, terminate the current operation, reset the state of the GPU node to be migrated, and go to step 4); otherwise go to step 6);
step 6) the flexible scheduling module triggers a command and sends it to the resource recovery module, which calls the corresponding platform scheduler to take the GPU node to be migrated offline from its original application platform and mark its state type as offline, then goes to step 7); in the concrete implementation the high-performance computing and cloud computing application platforms call a queue management command and a scheduling component respectively;
step 7), if the capacity reduction of the currently selected capacity reduction application platform is finished, turning to step 8), otherwise, turning to step 4);
step 8) update the recovered GPU node information into the RQ queue; the updated information includes the operating system, the kernel version and the compatible application platform types. Then go to step 9);
step 9) if an application platform that can still be reduced remains in the top-s, go to step 3); otherwise the capacity-reduction flow ends. An illustrative sketch of the platform-specific migration decision in step 5) is given below.
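The sketch below illustrates the platform-type-dependent handling of a busy node from step 5): a hypothetical policy choice for HPC jobs (terminate, checkpoint, or wait for completion) and a live-migration call for cloud instances. The function names and the callables hpc_handler and live_migrate are assumptions standing in for the real scheduler commands and the Nova live-migration interface; the error handling follows the step's rule of unlocking the node and retrying with another candidate.

    from enum import Enum

    class HpcPolicy(Enum):
        TERMINATE = "terminate"          # kill running jobs immediately
        CHECKPOINT = "checkpoint"        # checkpoint jobs, then release the node
        WAIT_FOR_COMPLETION = "wait"     # migrate only after the current jobs finish

    def migrate_busy_node(node, platform_type, hpc_policy, hpc_handler, live_migrate):
        """Step 5): platform-specific handling of a locked node that still runs services.

        hpc_handler(node, policy) and live_migrate(node) are stand-ins for the
        scheduler command / Nova live-migration calls of the real platforms.
        Returns True if the node can proceed to the offline step, False otherwise.
        """
        try:
            if platform_type == "hpc":
                # HPC jobs cannot be live-migrated; apply the configured policy instead.
                hpc_handler(node, hpc_policy)
            elif platform_type == "cloud":
                # Cloud instances are moved away with the platform's live migration.
                live_migrate(node)
            else:
                return False
            return True
        except Exception:
            # Migration failed: unlock the node so another candidate can be tried (step 4).
            node.state = "unlocked"
            return False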
Contrary to the capacity reduction operation, the resource capacity expansion is mainly completed from the GPU resource pool to the corresponding application resource platform, as shown in fig. 5, the specific implementation steps are as follows:
step 1) the acquisition module periodically obtains the overall GPU resource utilization of each application platform and compares it with the capacity-expansion trigger scheduling threshold k_2, then goes to step 2);
step 2) if the threshold has been triggered more than t_2 consecutive times, go to step 3); otherwise go to step 1);
step 3) selecting top-d application platforms needing capacity expansion from the PD queue, selecting the first application platform to be subjected to capacity expansion in the current sequence (sequence from high to low according to the utilization rate) in the top-d, and turning to the step 4);
step 4), traversing a GPU node resource queue RQ, selecting a current node in the queue, and then turning to the step 5);
step 5) judge whether the currently selected node information matches the environment parameters of the application platform to be expanded. The matching content mainly includes the operating system version, the kernel, application software compatibility and so on; for example, when matching a cloud computing environment against a high-performance computing environment, besides the operating system and the kernel, the cloud service component environment and the HPC application software environment conflict with each other. If the node environment matches, it can be reused by both application platform environments, so go to step 7); otherwise go to step 6);
step 6), if all queues RQ are traversed and the selected platform is not matched, turning to the step 3);
step 7), the resource allocation module selects a node with the first sequence in the queue RQ, marks the state of the node as a locking state, sets the initialization state as to-be-initialized, and then goes to step 8);
step 8) the initialization module calls bare metal services to carry out initialization operation of the operating system, the operation mainly realizes automatic installation and deployment of the operating system through protocols such as IPMI, PXE and the like, and then the step 9) is carried out;
step 9) initializing a node environment to be online, including application platform system initialization and application software initialization, setting the initialization state to be complete, and turning to step 10);
and step 10), the flexible scheduling module of the platform calls a scheduler of the capacity expansion application platform to complete the on-line operation of the node and update RQ queue information. The steps are realized by respectively calling an API (application programming interface) provided by a nova service component and a queue management command of a Slurm scheduler aiming at the cloud computing and high-performance computing application platform. And then to step 11).
Step 11), if the expansion of the application platform to be expanded is completed, turning to step 12), otherwise, turning to step 4);
step 12) if an application platform that can still be expanded remains in the top-d, go to step 1); otherwise the capacity-expansion flow ends. An illustrative sketch of the environment matching used in step 5) is given below.
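The environment match of step 5) can be expressed as a comparison of the node's inherent information (operating system, kernel version, compatible platform types) with the target platform's requirements; the record fields and function below are illustrative assumptions consistent with the information said to be stored in the RQ queue.

    from dataclasses import dataclass
    from typing import FrozenSet

    @dataclass(frozen=True)
    class NodeProfile:
        """Inherent node information kept in the RQ queue (illustrative fields)."""
        os_version: str                       # e.g. "CentOS 7.9"
        kernel: str                           # e.g. "3.10.0-1160"
        compatible_platforms: FrozenSet[str]  # platform types whose software stack is present

    def environment_matches(profile: NodeProfile, platform_type: str,
                            required_os: str, required_kernel: str) -> bool:
        """True if the node can be reused without an OS reinstall (step 5 -> step 7)."""
        return (profile.os_version == required_os
                and profile.kernel == required_kernel
                and platform_type in profile.compatible_platforms)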
After the two flexible scheduling steps, the system platform can ensure that GPU node resources are fully shared and utilized, so that the resource utilization rate of the whole platform is improved.

Claims (3)

1. A GPU resource flexible scheduling method based on a heterogeneous application platform comprises application platform GPU resource acquisition and detection, application platform GPU resource capacity reduction and application platform GPU resource capacity expansion, and is characterized in that: the application platform GPU resource acquisition and detection are realized through steps a) to d):
a) acquiring GPU resource utilization information: periodically collecting the GPU resource utilization information of each application platform and expressing it as an array MS = [MS_1 MS_2 ... MS_n], where n denotes the number of application platforms; the resource utilization information of the i-th application platform MS_i is represented by the following six-tuple:

MS_i = <N_i, RU_i, SRU_i, CU_i, G_i, TP_i>   (1)

where N_i denotes the schedulable GPU node list of application platform i, RU_i denotes the overall nominal resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, with RU_i ≤ SRU_i, CU_i denotes the current resource utilization of application platform i, G_i denotes the total number of GPU cards of application platform i, and TP_i denotes the type of application platform i;
the scheduling unit between different application platforms is the GPU node level; all GPU nodes of the different application platforms are homogeneous nodes, and each node is configured with c GPU cards;
b) setting trigger thresholds and counts: setting the capacity-reduction continuous-trigger scheduling threshold of an application platform to k_1 and the number of consecutive triggers to t_1; setting the capacity-expansion continuous-trigger scheduling threshold to k_2 and the number of consecutive triggers to t_2;
c) screening and sorting the capacity-reduction platform queue: periodically detecting whether the current resource utilization of each application platform is lower than the capacity-reduction continuous-trigger scheduling threshold k_1 and has stayed below k_1 for t_1 consecutive detections; if so, the application platform needs capacity reduction and is placed into the capacity-reduction application platform queue PS; and sorting the application platforms needing capacity reduction from high to low by formula (2):

[Formula (2) is published as an image (FDA0002875173130000011) and is not reproduced here]

where RU_i denotes the overall nominal resource utilization of application platform i, CU_i denotes the current resource utilization of application platform i, and c is the number of GPU cards configured on each GPU node;
d) screening and sorting the capacity-expansion platform queue: periodically detecting whether the current resource utilization of each application platform is higher than the capacity-expansion continuous-trigger scheduling threshold k_2 and has stayed above k_2 for t_2 consecutive detections; if so, the application platform needs capacity expansion and is placed into the capacity-expansion application platform queue PD; and sorting the application platforms needing capacity expansion from high to low by formula (3):

[Formula (3) is published as an image (FDA0002875173130000021) and is not reproduced here]

where CU_i denotes the current resource utilization of application platform i, SRU_i denotes the overall resource utilization at which application platform i triggers resource expansion, and c is the number of GPU cards configured on each GPU node;
capacity reduction and recovery are required to be carried out on GPU nodes of top-s application platforms in a platform queue PS, and the capacity reduction of GPU resources of the application platforms is realized through steps 1) to 6):
1) selecting platforms to be scaled down, selecting the application platform to be scaled down which is the first in the current sequence from the top-s application platforms, and executing the step 2);
2) establishing a GPU node list, and sequencing GPU nodes in the selected application platform from low to high according to the GPU resource utilization rate to establish the GPU node list; setting the first GPU node in the GPU node list to be in a locking state, and executing the step 3);
3) locking state node processing, namely judging whether the GPU node which is set to be in the locking state currently has application service, and if the GPU node does not have the application service, executing the step 4); if the service exists, executing service migration operation according to the platform type, if the application service migration is successful, executing step 4), if the migration is failed, terminating the current operation, removing the GPU node from the GPU node list, and skipping to execute step 1);
4) taking the nodes to be migrated offline: the GPU node in the locked state is regarded as a GPU node to be migrated; taking the GPU node to be migrated offline from its original application platform, marking its state type as offline, removing it from the GPU node list established in step 2), and executing step 5);
5) adding the node inherent information of the GPU node to be migrated into a resource queue RQ, wherein the node inherent information comprises an operating system, a kernel version and a compatible application platform; judging whether the capacity reduction of the current application platform is finished or not, if so, executing the step 6), and if not, executing the step 2);
6) judging whether the top-s application platforms selected in the step 1) are all subjected to capacity reduction, and if not, executing the step 1) to perform cycle traversal; and if the capacity reduction is finished, ending the capacity reduction of the GPU resources of the application platform.
2. The elastic scheduling method of GPU resources based on heterogeneous application platforms according to claim 1, wherein elastic expansion of GPU nodes of top-d application platforms in a platform queue PD is required, and the expansion of GPU resources of the application platforms is implemented by the following steps:
A) selecting platforms to be expanded, selecting the application platform to be expanded which is the first application platform to be expanded in the current sequence from the top-d application platforms, and executing the step B);
B) traversing the resource queue RQ, acquiring inherent information of nodes in the resource queue RQ, sequentially and circularly matching the selected source node information with the application platform to be expanded if the queue is not empty, if GPU nodes meeting the expansion requirement exist in the resource queue RQ, the GPU node environment can be multiplexed, turning to the step D), and if all the resource queue RQ is traversed and the GPU nodes are not matched, turning to the step C);
C) selecting the GPU node ranked first in the resource queue RQ, marking its state as locked, setting its initialization state as to-be-initialized, then performing the initialization operation on the GPU node, that is, the node environment cannot be reused and needs to be reset, and turning to step D) after the initialization operation is completed;
D) initializing GPU nodes to be online, including application platform system initialization and application software initialization, and turning to step E) after initialization is completed;
E) adding the initialized GPU node to be online into the application platform to be expanded, realizing the capacity expansion of the GPU resource nodes in that platform, completing the online operation of the GPU node, and updating the information of the resource queue RQ; judging whether the expansion of the application platform to be expanded is completed, if so, executing the step F), and if not, executing the step B);
F) if an application platform capable of expanding capacity still exists in the top-d, turning to the step A), circulating and traversing, and if the application platform capable of expanding capacity does not exist, ending the expansion of the GPU resources of the application platform.
3. The GPU resource elastic scheduling method based on the heterogeneous application platform according to claim 2, characterized in that: "the GPU node environment can be reused" in step B) means that the GPU node only needs software environment configuration and does not need the operating system to be reinstalled, and the resetting and initialization in step C) mean that the node must have its operating system installed by automated means and the software configuration issued, so as to initialize the node.
CN202011617125.0A 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform Active CN112698947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011617125.0A CN112698947B (en) 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011617125.0A CN112698947B (en) 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform

Publications (2)

Publication Number Publication Date
CN112698947A true CN112698947A (en) 2021-04-23
CN112698947B CN112698947B (en) 2022-03-29

Family

ID=75512804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011617125.0A Active CN112698947B (en) 2020-12-31 2020-12-31 GPU resource flexible scheduling method based on heterogeneous application platform

Country Status (1)

Country Link
CN (1) CN112698947B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915460A (en) * 2022-04-28 2022-08-16 中国人民解放军战略支援部队信息工程大学 Heterogeneous dynamic expansion and contraction capacity device and method for container cloud
CN116069481A (en) * 2023-04-06 2023-05-05 山东省计算中心(国家超级计算济南中心) Container scheduling system and scheduling method for sharing GPU resources

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977252A (en) * 2016-10-21 2018-05-01 中兴通讯股份有限公司 A kind of capacity reduction method, device and the cloud platform of cloud platform business
WO2020017847A1 (en) * 2018-07-19 2020-01-23 나무기술 주식회사 Method for provisioning and managing multi-cluster on cloud platform
CN110825520A (en) * 2019-10-18 2020-02-21 山东省计算中心(国家超级计算济南中心) Cluster top-speed elastic expansion method for realizing efficient resource utilization
CN110968414A (en) * 2018-09-28 2020-04-07 阿里巴巴集团控股有限公司 Resource scaling method and device
CN111309440A (en) * 2020-02-16 2020-06-19 苏州浪潮智能科技有限公司 Method and equipment for managing and scheduling multiple types of GPUs

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977252A (en) * 2016-10-21 2018-05-01 中兴通讯股份有限公司 A kind of capacity reduction method, device and the cloud platform of cloud platform business
WO2020017847A1 (en) * 2018-07-19 2020-01-23 나무기술 주식회사 Method for provisioning and managing multi-cluster on cloud platform
CN110968414A (en) * 2018-09-28 2020-04-07 阿里巴巴集团控股有限公司 Resource scaling method and device
CN110825520A (en) * 2019-10-18 2020-02-21 山东省计算中心(国家超级计算济南中心) Cluster top-speed elastic expansion method for realizing efficient resource utilization
CN111309440A (en) * 2020-02-16 2020-06-19 苏州浪潮智能科技有限公司 Method and equipment for managing and scheduling multiple types of GPUs

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHULING YANG ET AL.: "Research on Heterogeneous Cloud Test Platform Based on Elastic Scaling Mechanism", 《2019 IEEE 19TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY COMPANION (QRS-C)》 *
单朋荣 et al.: "Design and Implementation of an Elastic Scaling Scheme Based on the Kubernetes Cloud Platform", Computer Engineering *
邹祝平: "Research and Implementation of Container Resource Load Prediction Methods", China Master's Theses Full-text Database, Information Science and Technology *
马航: "Research and Implementation of Dynamic Resource Scheduling for a Kubernetes-based Container Cloud Platform", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915460A (en) * 2022-04-28 2022-08-16 中国人民解放军战略支援部队信息工程大学 Heterogeneous dynamic expansion and contraction capacity device and method for container cloud
CN114915460B (en) * 2022-04-28 2023-05-05 中国人民解放军战略支援部队信息工程大学 Heterogeneous dynamic capacity expansion and contraction device and method for container cloud
CN116069481A (en) * 2023-04-06 2023-05-05 山东省计算中心(国家超级计算济南中心) Container scheduling system and scheduling method for sharing GPU resources

Also Published As

Publication number Publication date
CN112698947B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN109783218B (en) Kubernetes container cluster-based time-associated container scheduling method
CN107025205B (en) Method and equipment for training model in distributed system
CN112698947B (en) GPU resource flexible scheduling method based on heterogeneous application platform
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
US10417062B2 (en) Method and apparatus of unloading out of memory processing flow to user space
CN111694675B (en) Task scheduling method and device and storage medium
CN112486642B (en) Resource scheduling method, device, electronic equipment and computer readable storage medium
CN113867959A (en) Training task resource scheduling method, device, equipment and medium
CN112416585A (en) GPU resource management and intelligent scheduling method for deep learning
CN110738156B (en) Face recognition system and method based on message middleware
CN109976873B (en) Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework
CN113672391B (en) Parallel computing task scheduling method and system based on Kubernetes
CN114691372A (en) Group intelligent control method of multimedia end edge cloud system
CN111796933A (en) Resource scheduling method, device, storage medium and electronic equipment
CN109408230B (en) Docker container deployment method and system based on energy consumption optimization
CN114489970A (en) Method and system for realizing queue sequencing by using scheduling plug-in Kubernetes
CN115774614A (en) Resource regulation and control method, terminal and storage medium
CN113254143A (en) Virtual network function network element arranging and scheduling method, device and system
CN117332881B (en) Distributed training method and electronic equipment
CN112506658B (en) Dynamic resource allocation and task scheduling method in service chain
CN111726251B (en) Networking method, system and device for SDS (sodium dodecyl sulfate) storage domain in virtualized system
US20230195546A1 (en) Message Management Method and Apparatus, and Serverless System
CN116643890A (en) Cluster resource scheduling method, device, equipment and medium
CN116643860A (en) Priority scheduling method, system, electronic device and computer program product for weather machine learning algorithm operation
CN114816696A (en) Method, system and equipment for automatically executing Kubernetes task

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230418

Address after: 250014 No. 19, ASTRI Road, Lixia District, Shandong, Ji'nan

Patentee after: SHANDONG COMPUTER SCIENCE CENTER(NATIONAL SUPERCOMPUTER CENTER IN JINAN)

Patentee after: Qilu University of Technology (Shandong Academy of Sciences)

Address before: Shandong computing center, No.19 Keyuan Road, Lixia District, Jinan City, Shandong Province 250014

Patentee before: SHANDONG COMPUTER SCIENCE CENTER(NATIONAL SUPERCOMPUTER CENTER IN JINAN)

TR01 Transfer of patent right