CN114510319A - Kubernetes cluster GPU space sharing method - Google Patents

Kubernetes cluster GPU space sharing method

Info

Publication number
CN114510319A
CN114510319A
Authority
CN
China
Prior art keywords
pod
working node
node
information
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111635865.1A
Other languages
Chinese (zh)
Inventor
刘万涛 (Liu Wantao)
虎嵩林 (Hu Songlin)
韩冀中 (Han Jizhong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202111635865.1A priority Critical patent/CN114510319A/en
Publication of CN114510319A publication Critical patent/CN114510319A/en
Pending legal-status Critical Current


Classifications

    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/5038 Allocation of resources to service a request, the resource being a machine, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life, based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • G06F2209/5021 Priority (indexing scheme relating to G06F9/50)


Abstract

The invention discloses a Kubernetes cluster GPU space sharing method in the field of computer technology. Aiming at the low GPU resource utilization of current data-processing clusters, it proposes a scoring rule that combines GPU video memory with the active-thread ratio.

Description

Kubernetes cluster GPU space sharing method
Technical Field
The invention relates to the technical field of computers, in particular to a Kubernetes cluster GPU resource sharing method.
Background
With the development and progress of network technology, the volume of data on the network has grown explosively, and enormous value is often hidden behind that data, so how to process it quickly is a current hot problem. Solutions fall roughly into two categories:
one is to utilize a GPU. Previously, people constructed the CPU into a human brain-like model, which was suitable for logic processing and serial computing, and also for multi-task parallel processing, so people were also used to compute data using the CPU. However, in the architecture of the GPU, the GPU has a much larger number of processing parallel computing units SM than the number of CPUs. During subsequent processing of data, engineers are also increasingly aware of the powerful capabilities of GPUs in the direction of processing parallel data. In 2006, the NVIDIA corporation released CUDA, which is a general parallel computing platform and programming model built on CPUs of NVIDIA, and based on CUDA programming, parallel computing engines of GPUs can be used to more efficiently solve complex computing problems.
The other is the cloud computing cluster. In recent years, cloud computing systems and services have grown and spread exponentially. Cloud computing services, an emerging hybrid computing paradigm typically deployed on a distributed virtualization infrastructure, are becoming the primary computing power for business, personal, and mobile applications. With the development of virtualization and container technology, using cloud services has become simpler and less troublesome, and because containers are lighter weight, they are gradually replacing the original virtual-machine technology. Many container technologies exist, but Docker is currently the near-absolute mainstream. Kubernetes is a container orchestration tool developed by Google; its predecessor is the Borg system used inside Google for years, so Kubernetes' architecture and technology were already mature when first made public. It drew broad attention from the cloud computing industry and quickly became the mainstream of the orchestration-tool market. Using a distributed cloud computing cluster platform based on Kubernetes speeds up data processing, and compared with a personal computer or a traditional cluster, a cloud cluster improves the deployment and execution efficiency of tasks to a certain extent.
Therefore, combining the two is a popular choice today, but a major problem confronts users and enterprises: in most clusters, one Docker container completely monopolizes one GPU card, and as GPU generations iterate, GPU resources grow ever more expensive, so resources are wasted and usage costs rise sharply. How to realize GPU sharing has thus become a hot topic.
Patent publication No. CN111506419A discloses a GPU resource sharing method and device: the GPU physical-resource demand of a virtual machine is compared with the unallocated resources in a resource pool; if the unallocated resources do not meet the virtual machine's demand, the GPU physical resources already allocated to virtual machines are adjusted according to their utilization rates, and GPU physical resources are then allocated to the virtual machine from the adjusted pool; otherwise, GPU physical resources are allocated to the virtual machine according to its demand. The method is impractical because the type of GPU physical resource is left unclear, and the virtual-machine technology it still relies on occupies considerable system-level resources in use.
Patent publication No. CN111475303A, entitled "GPU shared scheduling and single-machine multi-card method, system and apparatus", proposes a GPU shared-scheduling method applied to the scheduler of a central control host, comprising: using Pods' resource-occupation marks and update marks to query the GPU information in the environment variables of the not-yet-updated Pods on each controlled host, a not-yet-updated Pod being one that is already running but whose GPU information has not been updated; writing that GPU information into the annotations of those Pods and adding the update mark to them; screening out, from the controlled hosts, the schedulable ones with no unmarked Pods, an unmarked Pod being one without the update mark; using the state information of the GPUs in the schedulable controlled hosts to select a target controlled host meeting a first preset condition and, within it, a target GPU meeting a second preset condition; writing the target GPU's information into the annotation of the Pod to be allocated; and dispatching that Pod to the target controlled host. This sharing mode does not actually solve GPU sharing within each card, so the problem of wasted single-card resources remains.
Most current GPU sharing methods have problems to a greater or lesser degree, and some are not even applicable to Kubernetes cloud clusters. A single container exclusively occupying one GPU card wastes a great deal of resources, and many users face prohibitively high usage costs.
Disclosure of Invention
The invention aims to provide a Kubernetes cluster GPU space sharing method, which improves the GPU utilization rate in a cluster and reduces the use cost.
In order to achieve the purpose, the invention adopts the following technical scheme:
a Kubernetes cluster GPU space sharing method comprises the following steps:
1) a Kubernetes cluster is constructed from a preset number of nodes, the nodes comprising a master node (Master) and worker nodes that execute the tasks the Master assigns, the Master's components including the data-bus API Server, a controller and a scheduler; a scheduler plug-in (Scheduler Extender) is attached to the scheduler, a Device Plugin is deployed on each worker node, and the Kubernetes cluster caches the worker nodes' GPU models, GPU video memory resources and multi-process service (MPS) information reported by the Device Plugin in the Scheduler Extender;
2) when a user submits Pod information through a client, the Pod being the smallest scheduling unit of the cluster, the cluster receives the Pod information and judges whether the format of its content conforms to the cluster characteristics; if not, this is fed back to the user;
3) if the format of the Pod information content conforms to the cluster characteristics, the API Server stores the Pod information in the distributed storage database Etcd, and the Pod information becomes queryable from the client;
4) the scheduler filters out the worker nodes that do not meet the conditions according to the scheduling resources the Pod requires, then sends the qualifying worker-node information, together with the GPU video memory and active-thread-ratio information allocated for the Pod, to the Scheduler Extender;
5) the Scheduler Extender scores the worker nodes, then feeds the scoring results and the information of the Pods to be scheduled into the ant colony algorithm for computation; if a solution is obtained it is sent to the scheduler; otherwise a worker node in the closed state is planned to be opened, that node's information is brought into the worker-node set for the Pods to be scheduled, and the iterative computation is repeated from this step;
6) if the obtained solution contains a worker node planned to be opened, that worker node is opened; the Pods to be executed are then bound to their worker nodes through the scheduler, and the binding information is written into Etcd;
7) each worker node verifies whether the bound Pod can run on it; if a Pod fails to run, the nodes are re-scored, the worker node on which the run failed is removed from the deployable worker-node set, and allocation is performed again;
8) the above steps are repeated until all Pods to be scheduled have been placed;
9) the scheduler sends the binding information of Pods and worker nodes, the worker-node state information and the Pods' scheduling information to Etcd, and Etcd collects the worker nodes' switch-state information and running-state information in the cluster at a fixed period and updates the stored corresponding information.
Furthermore, each worker node runs a Kubelet service process that listens on a port, receives and executes the instructions sent by the master node, and manages the Pods and the containers within them; each Kubelet service process registers its worker node's information with the API Server, periodically reports resource usage to the master node, and monitors the resources of the worker node and of the containers in the Pods.
Further, the Device Plugin connects, as a client, to the Device Plugin Manager in the Kubelet via gRPC remote procedure calls to obtain the worker node's GPU model and GPU video memory resources for reporting and monitoring.
Further, the user submits the Pod information through a client-installed command line tool Kubectl.
Further, the Scheduler Extender scores according to a scoring rule, wherein the scoring rule is as follows:
$$S_{gpum}=\frac{T_{gpum}-U_{gpum}-R_{gpum}}{T_{gpum}}$$

$$S_{mps}=\frac{T_{mps}-U_{mps}-R_{mps}}{T_{mps}}$$

$$G=\alpha\cdot S_{gpum}+\beta\cdot S_{mps}$$

where $T_{gpum}$ and $T_{mps}$ represent the total GPU video memory and total active-thread ratio of the current worker node; $U_{gpum}$ and $U_{mps}$ represent the GPU video memory and active-thread ratio consumed by the tasks already deployed on the current worker node; $R_{gpum}$ and $R_{mps}$ represent the GPU video memory and active-thread ratio requested by the current task to be scheduled; $S_{gpum}$ and $S_{mps}$ represent the idle GPU video memory and idle active-thread ratio of the worker node; $\alpha$ and $\beta$ represent weights; $G$ represents the worker node's resource priority.
Further, the ant colony algorithm is composed of the following three formulas:
$$P_{ij}^{k}(n)=\frac{\left[\tau_{ij}(n)\right]^{\alpha}\left[\eta_{ij}(n)\right]^{\beta}}{\sum_{s\in\Lambda}\left[\tau_{is}(n)\right]^{\alpha}\left[\eta_{is}(n)\right]^{\beta}}$$

$$\tau_{ij}(n+1)=(1-\rho)\,\tau_{ij}(n)+\sum_{k=1}^{m}\Delta\tau_{ij}^{k}(n)$$

$$\Delta\tau_{ij}^{k}=\frac{R}{L_{k}}$$

wherein $i$ represents the starting point of the current ant, $j$ represents an end point the current ant can reach, $\Lambda$ represents the set of end points the ant can reach, $\eta_{ij}$ represents the heuristic function, $\tau_{ij}$ represents the pheromone concentration on the path from starting point $i$ to end point $j$, $\alpha$ is the pheromone weight factor, $\beta$ is the heuristic-function weight factor, $n$ represents the number of iterations, $m$ represents the total number of ants, $\Delta\tau_{ij}^{k}$ represents the amount of pheromone secreted by ant $k$ from starting point $i$ to end point $j$, $\rho$ represents the volatilization factor of the pheromone from starting point $i$ to end point $j$, $R$ represents the pheromone risk evaluation coefficient, $L_{k}$ represents the distance of ant $k$ from starting point $i$ to end point $j$ in the current iteration, and $P_{ij}^{k}$ represents the probability that ant $k$ chooses to crawl from starting point $i$ to end point $j$.
Further, a heuristic function $\eta_{ij}$ is used (its formula appears in the original document only as an image and is not reproduced here).
further, taking an objective function of the optimal ant path in the solution of the ant colony algorithm as a volatilization factor rho of the pheromone, and calculating an objective function value; the smaller the objective function value is, the better the scheduling scheme sought by the ant is.
Further, the formula of the objective function is as follows:
$$f(q)=\rho_{1}\,\frac{1}{q}\sum_{j=1}^{q}\frac{T_{gpum}^{j}-U_{gpum}^{j}-\sum_{i=1}^{c}R_{gpum}^{i}}{T_{gpum}^{j}}+\rho_{2}\,\frac{1}{q}\sum_{j=1}^{q}\frac{T_{mps}^{j}-U_{mps}^{j}-\sum_{i=1}^{c}R_{mps}^{i}}{T_{mps}^{j}}$$

wherein $f(q)$ represents the objective function, $q$ represents the number of worker nodes used by the current scheduling, $T_{gpum}^{j}$ and $T_{mps}^{j}$ represent the total GPU video memory and total active-thread ratio of the $j$-th worker node, $U_{gpum}^{j}$ and $U_{mps}^{j}$ represent the GPU video memory and active-thread ratio consumed by the tasks already deployed on the $j$-th worker node, $R_{gpum}^{i}$ and $R_{mps}^{i}$ represent the GPU video memory and active-thread ratio requested by the $i$-th task to be scheduled, $\rho_{1}$ and $\rho_{2}$ represent weights, $m$ represents the number of worker nodes used by the current scheduling, $x$ represents the $x$ Pods to be scheduled, $y$ represents the $y$ worker nodes, and $c$ represents the number of tasks on the current worker node.
Further, the constraint conditions of the optimal ant path in the ant colony algorithm solution are as follows:
$$\sum_{i=1}^{c}R_{gpum}^{i}+U_{gpum}^{j}\le T_{gpum}^{j}$$

$$\sum_{i=1}^{c}R_{mps}^{i}+U_{mps}^{j}\le T_{mps}^{j}$$

$$q\le y$$

The constraints state that the GPU video memory and active-thread ratios used by all tasks allocated to the $j$-th worker node must sum to no more than the current worker node's totals.
Further, the initialization pheromone concentration value $\tau_{ij}(0)$ of the ant colony algorithm is set to the percentage of idle resources of the cluster worker nodes:

$$\tau_{ij}(0)=\frac{\text{idle resource quantity of worker node } j}{\text{total resource quantity of worker node } j}\times 100\%$$
furthermore, in the solving process of the ant colony algorithm, a path is selected by a roulette mechanism; after path selection is carried out for a single ant, local pheromones are updated through a global pheromone formula; and after all ants finish the path selection this time, updating the global pheromone through a global pheromone formula.
Further, the local pheromone: $\tau_{ij}(t+1)=\rho\,\Delta\tau_{ij}(t)+G_{ij}\,\tau_{ij}(t)$;

the global pheromone: $\tau_{ij}(t+1)=\rho\,\Delta\tau_{ij}(t)+(1-f(q))\,\tau_{ij}(t)$;

where $\rho$ is the volatilization factor and $\Delta\tau_{ij}(t)$ represents the pheromone increment left by the ant colony from starting point $i$ to end point $j$ in each iteration, the pheromone on every path being zero in the initial stage;

$$\Delta\tau_{ij}(t)=\sum_{k=1}^{m}\Delta\tau_{ij}^{k}(t)$$

represents the total amount of pheromone secreted by the $k$ ants from task $i$ to worker node $j$ in $t$ iterative tasks; $G_{ij}$ represents the resource priority of the worker node.
The invention has the following advantages: aiming at the low utilization of GPU resources in current data-processing clusters, the invention provides a scoring rule combining GPU video memory and active threads; to schedule GPU resources effectively, a Scheduler Extender and Device Plugins are added to the cluster, changing the static scheduling scheme of conventional scheduling strategies; and using the improved ant colony algorithm effectively raises cluster resource utilization, reduces enterprises' usage costs, and improves returns.
Drawings
FIG. 1 is a schematic diagram of a cluster architecture.
FIG. 2 is a diagram of an improved Kubernetes cluster architecture according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
The embodiment of the invention improves on the Kubernetes cloud cluster architecture shown in FIG. 1; the improved architecture is shown in FIG. 2. First, the cluster architecture is briefly introduced:
Cluster master node (Master): the core node of the whole cluster. All command operations on the Kubernetes cluster are executed by it; it is responsible for scheduling and managing the whole cluster and is usually an independent server in the cluster. The Master includes many components, chiefly the API Server, the Controller Manager, and the Scheduler. The API Server is the data bus and data center of the whole system: it provides data interaction and communication among the cluster system's resource objects, the REST API interfaces for cluster management (covering authentication and authorization, data verification, and cluster state changes), and interaction with other modules. Other modules query or modify data through the API Server, and only the API Server operates Etcd directly. The controller is the cluster's internal management and control center, responsible for acquiring the parameters the user requires, such as development dependency libraries and system configuration, and sending the tasks generated from those parameters to the scheduler. The scheduler receives the tasks sent by the controller, monitors the validity of Pods, finds a suitable worker node for each Pod to run on according to the scheduling policy, and sends the binding information through the API Server to Etcd for storage.
Worker node (Node): every node in the Kubernetes cluster other than the cluster master node (Master) is called a worker node (Node). Each Node is assigned a reasonable workload by the Master according to the scheduling rules and is the executor of tasks. Each Node runs a Kubelet service process that listens on a port, receives and executes instructions sent by the Master, and manages the Pods and the containers within them. Each Kubelet process registers its Node's information with the API Server, periodically reports its resource usage to the Master, and monitors the Node's and containers' resources through cAdvisor, a module for monitoring resource usage.
Command-line tool Kubectl: Kubectl is typically installed on clients; it can manage the cluster itself and install and deploy containerized applications on the cluster. As the command-line tool of Kubernetes, Kubectl's main responsibility is operating on the resource objects in the cluster, including their creation, deletion, and viewing, thereby realizing management of the Kubernetes cluster.
Distributed storage database (Etcd): the Etcd is a very important component in a kubernets cluster, and is used for storing all network configurations and state information of all objects in the cluster, that is, the state of the whole cluster is stored.
Controller Manager: and the system is responsible for maintaining the state of the cluster, such as fault detection, automatic expansion, rolling update and the like.
Scheduler: responsible for resource scheduling; it schedules Pods onto the appropriate machines according to a predetermined scheduling policy.
Service process Kubelet: runs on the Node nodes; it is responsible for maintaining container life cycles as well as for volume (CVI) and network (CNI) management, and it provides the Master with the Node's current state information.
In the present invention, the Pod is the smallest scheduling unit of the Kubernetes cluster. A Pod can be viewed as the working form of a container or a group of containers in the cluster, a container being the running state of a Docker image.
In recent years, the GPU's unified architecture and development platforms programmable in high-level languages (such as C++) have promoted general-purpose GPU computing and its rapid development. In current cloud clusters, most jobs submitted by users or enterprises are data-parallel computing tasks, so the GPU is an important cluster resource. Among GPU resources, GPU video memory and the streaming multiprocessor (SM) are the two core resources currently of most interest. Video memory is an important space for storing temporary data, whether in data computation or in deep-learning training; however, the Kubernetes cluster does not support scheduling video memory resources. As for the SM, it can be regarded as the heart of the GPU: it is an essential component of the GPU architecture and the key hardware for executing CUDA kernels, yet it cannot directly serve as a reference resource to assist sharing. Addressing these two resources, the invention provides a method for sharing GPU resources.
The Multi-Process Service (MPS) is a binary-compatible alternative implementation of the CUDA Application Programming Interface (API). The MPS runtime architecture transparently enables multi-process CUDA applications (typically MPI jobs) to utilize the Hyper-Q capability on modern NVIDIA (Kepler-based) GPUs; Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU. MPS is a client-server runtime implementation of the CUDA API and provides a way for multi-process CUDA applications to share an NVIDIA GPU. Different active-thread ratios can be set for different MPS jobs, so different jobs can be distinguished in their GPU operation. Briefly, MPS can schedule as a whole the work that individual CUDA contexts cannot coordinate, conceptually realizing the assignment of SMs. The reason for setting different active-thread ratios is as follows: with the multi-process service, a user can adjust computation speed by setting the program's available active-thread-ratio variable. If all threads in a streaming multiprocessor work at the same time, its efficiency or occupancy is 100%, in which case the program runs fastest and most efficiently. In practice, resource limits mean that not all threads work simultaneously; the threads that do run are the active threads. The multi-process service can set the active-thread ratio and divide it among different application processes. Although the multi-process service is an independent system that realizes multi-process GPU sharing on the host, a container running on a Node can be treated as a process on the host, so the active-thread ratio can be set when the container starts, and MPS can be regarded as a partitionable fine-grained resource.
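To make the active-thread ratio concrete, here is a minimal sketch of launching a workload under MPS with a capped thread percentage. The `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable is the standard MPS knob for this; the script name `train.py` and the assumption that the MPS control daemon is already running on the node are illustrative, not taken from the patent.

```python
import os
import subprocess

# Cap a CUDA job at an active-thread ratio of 30 (in the units of 10 used above).
# Assumes the MPS control daemon (nvidia-cuda-mps-control) is already running
# on the node; "train.py" stands in for an arbitrary CUDA workload.
env = dict(os.environ)
env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = "30"

subprocess.run(["python", "train.py"], env=env, check=True)
```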
In order to improve cluster resource utilization, the invention designs a Node scoring rule that combines the two factors to evaluate whether a Node is suitable for scheduling a Pod. GPU video memory is counted in units of 1 GB. MPS is treated as a resource: the active-thread ratio is a divisible fine-grained resource, at most 100 and at least 0, counted in units of 10, and the active-thread ratios of all tasks on a Node must together be less than or equal to 100. The scoring rule is as follows:
$$S_{gpum}=\frac{T_{gpum}-U_{gpum}-R_{gpum}}{T_{gpum}}$$

$$S_{mps}=\frac{T_{mps}-U_{mps}-R_{mps}}{T_{mps}}$$

$$G=\alpha\cdot S_{gpum}+\beta\cdot S_{mps}$$

$T_{gpum}$ and $T_{mps}$ respectively represent the current Node's total GPU video memory and total active-thread ratio; $U_{gpum}$ and $U_{mps}$ respectively represent the GPU video memory and active-thread ratio consumed by the Node's already-deployed tasks; $R_{gpum}$ and $R_{mps}$ respectively represent the GPU video memory and active-thread ratio requested by the current task to be scheduled; $S_{gpum}$ and $S_{mps}$ respectively represent the Node's idle GPU video memory and idle active-thread ratio; $\alpha$ and $\beta$ represent weights, here $\alpha=0.6$ and $\beta=0.4$; $G$ represents the Node resource priority, and the larger its value, the higher the Node's score.
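A minimal Python sketch of this scoring rule, using the weights given above ($\alpha=0.6$, $\beta=0.4$). The closed forms of $S_{gpum}$ and $S_{mps}$ follow the reconstruction above, and the example figures are made up:

```python
def node_score(t_gpum, t_mps, u_gpum, u_mps, r_gpum, r_mps,
               alpha=0.6, beta=0.4):
    """Score one Node for one pending task.

    t_*: Node totals, u_*: consumed by already-deployed tasks,
    r_*: requested by the task to be scheduled.
    GPU memory is counted in GB, MPS as an active-thread ratio out of 100.
    Returns -inf when the Node cannot fit the request.
    """
    if u_gpum + r_gpum > t_gpum or u_mps + r_mps > t_mps:
        return float("-inf")  # violates the capacity constraints
    s_gpum = (t_gpum - u_gpum - r_gpum) / t_gpum  # idle GPU-memory fraction
    s_mps = (t_mps - u_mps - r_mps) / t_mps       # idle active-thread fraction
    return alpha * s_gpum + beta * s_mps

# Example: a 16 GB Node with 4 GB / 30 already in use; the task wants 2 GB / 20.
print(node_score(16, 100, 4, 30, 2, 20))  # larger G means higher priority
```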
The ant colony algorithm is a swarm-intelligence algorithm: a group of individuals with little or no intelligence exhibit intelligent behavior through mutual cooperation, offering a new possibility for solving complex problems. The ant colony algorithm was first proposed by the Italian scholars Colorni A., Dorigo M., et al. in 1991. Through more than 20 years of development, it has advanced greatly in both theory and applied research. It is a bionic algorithm inspired by the foraging behavior of ants in nature: as ants seek food, the colony is always able to find an optimal path between the nest and the food source. Applied as artificial ants that exchange information and cooperate with one another, the method can solve the traveling salesman problem and performs well on combinatorial optimization problems such as scheduling and network routing.
The conventional ant colony algorithm model is constructed by the following three formulas:
$$P_{ij}^{k}(n)=\frac{\left[\tau_{ij}(n)\right]^{\alpha}\left[\eta_{ij}(n)\right]^{\beta}}{\sum_{s\in\Lambda}\left[\tau_{is}(n)\right]^{\alpha}\left[\eta_{is}(n)\right]^{\beta}}$$

$$\tau_{ij}(n+1)=(1-\rho)\,\tau_{ij}(n)+\sum_{k=1}^{m}\Delta\tau_{ij}^{k}(n)$$

$$\Delta\tau_{ij}^{k}=\frac{R}{L_{k}}$$

Here $i$ represents the starting point of the current ant; $j$ represents an end point the current ant can reach; $\Lambda$ represents the set of end points the ant can reach; $\eta_{ij}$ represents the heuristic function; $\tau_{ij}$ represents the pheromone concentration on the route from starting point $i$ to end point $j$; $\alpha$ is the pheromone weight factor; $\beta$ is the heuristic-function weight factor; $n$ represents how many iterations have been made; $m$ represents the total number of ants; $\Delta\tau_{ij}^{k}$ represents the amount of pheromone secreted by ant $k$ from starting point $i$ to end point $j$; $\rho$ represents the volatilization factor of pheromone on the route; $R$ is the pheromone risk evaluation coefficient; $L_{k}$ represents the distance covered by ant $k$ from start to end in this iteration; and $P_{ij}^{k}$ represents the probability that ant $k$ chooses to crawl from starting point $i$ to end point $j$. The weight factor $\beta$ expresses how strongly the heuristic function, versus the pheromone, influences ants during foraging, and it directly affects whether the optimal solution can be found in the solution space: a larger $\beta$ biases selection toward the current local optimum. The volatilization factor $\rho$ keeps the heuristic function and the pheromone under a certain mutual constraint so that they reach a dynamic balance.
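As an illustration of the transition rule, the sketch below picks the next Node for a task by roulette-wheel selection over $\tau^{\alpha}\eta^{\beta}$; the dictionary-based data layout and the parameter values are illustrative assumptions, not the patent's:

```python
import random

def choose_node(i, candidates, tau, eta, alpha=1.0, beta=2.0):
    """Roulette-wheel choice of the next Node j for task i.

    tau[(i, j)]: pheromone on edge i -> j; eta[(i, j)]: heuristic value;
    alpha / beta: the pheromone and heuristic weight factors.
    """
    weights = [tau[(i, j)] ** alpha * eta[(i, j)] ** beta for j in candidates]
    total = sum(weights)
    pick = random.uniform(0.0, total)
    acc = 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if acc >= pick:
            return j
    return candidates[-1]  # guard against floating-point round-off
```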
The objective function of the optimal ant path in the ant colony algorithm solution:
$$f(q)=\rho_{1}\,\frac{1}{q}\sum_{j=1}^{q}\frac{T_{gpum}^{j}-U_{gpum}^{j}-\sum_{i=1}^{c}R_{gpum}^{i}}{T_{gpum}^{j}}+\rho_{2}\,\frac{1}{q}\sum_{j=1}^{q}\frac{T_{mps}^{j}-U_{mps}^{j}-\sum_{i=1}^{c}R_{mps}^{i}}{T_{mps}^{j}}$$

In the formula, $T_{gpum}^{j}$ and $T_{mps}^{j}$ respectively represent the total GPU video memory and total active-thread ratio of the $j$-th Node; $U_{gpum}^{j}$ and $U_{mps}^{j}$ respectively represent the GPU video memory and active-thread ratio consumed by the tasks deployed on the $j$-th Node; $R_{gpum}^{i}$ and $R_{mps}^{i}$ respectively represent the GPU video memory and active-thread ratio requested by the $i$-th task to be scheduled; $\rho_{1}$ and $\rho_{2}$ represent weights; $m$ represents the number of Nodes used in this scheduling; $x$ represents the $x$ Pods to be scheduled and $y$ the $y$ Nodes; $c$ represents the number of tasks on the current Node. $f(q)$ represents the objective function of the optimal ant path in the ant colony solution; after suitable processing it is added to the pheromone-concentration update formula (the second of the three formulas above) in place of the volatilization factor $\rho$. The smaller the objective value $f(q)$, the better the scheduling scheme the ant has found, and the higher the resulting pheromone concentration, the better the allocation scheme. $q$ represents the number of Nodes used by the current scheduling.
Constraint conditions of the optimal ant path in the ant colony algorithm solution are as follows:
$$\sum_{i=1}^{c}R_{gpum}^{i}+U_{gpum}^{j}\le T_{gpum}^{j}$$

$$\sum_{i=1}^{c}R_{mps}^{i}+U_{mps}^{j}\le T_{mps}^{j}$$

$$q\le y$$

The constraints state that the GPU video memory and active-thread ratios used by all tasks allocated to the $j$-th Node must sum to no more than the current Node's totals, and that at most $y$ Nodes may be used.
In the conventional ant colony algorithm, the initial pheromone concentration is set randomly; here it is assigned according to the cluster's information environment. Therefore, the percentage of idle resources on the cluster's Node nodes is used as the pheromone concentration value when initializing the algorithm model:

$$\tau_{ij}(0)=\frac{\text{idle resource quantity of Node } j}{\text{total resource quantity of Node } j}\times 100\%$$
a heuristic function:
Figure BDA0003442090220000101
when the scheduling policy is applied, the heuristic function and the pheromone are initially set. Meanwhile, the relevant weighting factor is set according to the effect obtained by the previous experiment. In the solving process, the way of selecting the path mainly depends on a roulette mechanism, and influences the next round of iteration together with pheromone generated by the current selected path. After Node nodes are selected for a single ant, local pheromone is updated by using a global pheromone formula, and global pheromone is updated by using the global pheromone formula after all ants complete path selection.
Local pheromone:

$$\tau_{ij}(t+1)=\rho\,\Delta\tau_{ij}(t)+G_{ij}\,\tau_{ij}(t)$$

Global pheromone:

$$\tau_{ij}(t+1)=\rho\,\Delta\tau_{ij}(t)+(1-f(q))\,\tau_{ij}(t)$$

$\rho$ is the volatilization factor, and $\Delta\tau_{ij}(t)$ is the pheromone increment left by the ant colony on the path from starting point $i$ to end point $j$ in each iteration; in the initial stage the pheromone on every path is zero. The increment is the total amount of pheromone secreted by the $k$ ants moving from task $i$ to Node $j$ in $t$ iterations:

$$\Delta\tau_{ij}(t)=\sum_{k=1}^{m}\Delta\tau_{ij}^{k}(t)$$

The larger the total amount of pheromone, the better the scheduling scheme. $G_{ij}$ represents the Node's resource priority; it is added to the local pheromone-concentration update formula as an influence factor, and the higher its value, the more likely deployment to that Node becomes. $f(q)$ represents the objective value of the optimal ant in the colony's solution; after suitable processing it is added to the pheromone-concentration update formula as an influence factor, and the smaller its value, the better the scheduling scheme found, the larger the pheromone concentration, and the easier the optimal allocation scheme is to find.
Kubernetes is an open-source container management tool. For the two resources this invention concerns, GPU video memory and MPS, the Extended Resource mechanism can be used: Kubernetes can conveniently declare a resource so that the cluster recognizes it. To achieve better resource scheduling, the invention places a Scheduler Extender and Device Plugins between the client and the Kubernetes cluster, following the Scheduler Extender mechanism and the Device Plugin mechanism respectively. The Scheduler Extender exploits the extensibility of the scheduler component and is responsible for scheduling Pods whose containers use Extended Resource objects. Since the Scheduler Extender's extension mechanism is implemented over HTTP, scheduling for Pods that need the extended resources is provided through multiple schedulers so as not to affect the performance of the cluster's default scheduler; this approach is also portable. The Scheduler Extender is responsible for judging, during the global scheduler's filtering and binding, whether a single GPU device on a Node can provide enough GPU video memory, and for recording the GPU allocation result in the Pod Spec annotation so that subsequent filtering can be performed at binding time. The default scheduler issues an HTTP request to the Scheduler Extender, which makes its decision from its own resource data and the intermediate scheduling result passed by the scheduler.
The resources and state of each Node in the Kubernetes cluster are collected by its Kubelet and sent to the Master. When a Pod is created, a request is sent to the scheduler, the scheduler selects the optimal Node according to the Node states, and finally that Node's Kubelet creates the Pod. To manage and allocate the Nodes' Extended Resource resources, a dedicated program, the Device Plugin, must be created between the Node's Kubelet and the GPU. These Device Plugins connect, as clients, to the Device Plugin Manager in the Kubelet by means of gRPC remote procedure calls. The Device Plugin can report and monitor the Node's GPU video memory resources and assist Pod scheduling. To monitor MPS resources, a data structure must be maintained in the Scheduler Extender: when the cluster is initialized, the MPS total of every Node in the cluster is set to 100, and during subsequent operation the stored MPS total of each Node is updated according to the Pods to be deployed on it and the active-thread-ratio information in the Pod information, once their operations on the Node complete. For scheduling MPS resources, the Scheduler Extender must judge whether the active-thread ratio is suitable for scheduling the corresponding Pod at the same time as it judges whether the video memory is.
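As an illustration of the MPS bookkeeping described here, a minimal sketch of the data structure the Scheduler Extender could keep; the class and method names are illustrative, not the patent's:

```python
class MpsLedger:
    """Tracks the remaining active-thread ratio per Node.

    Every Node starts at 100 when the cluster is initialized; deploying
    a Pod subtracts its requested active-thread ratio, and deleting the
    Pod returns that ratio to the pool.
    """

    def __init__(self, node_names):
        self.remaining = {name: 100 for name in node_names}

    def can_fit(self, node, ratio):
        return self.remaining[node] >= ratio

    def deploy(self, node, ratio):
        if not self.can_fit(node, ratio):
            raise ValueError(f"{node}: only {self.remaining[node]} MPS left")
        self.remaining[node] -= ratio

    def release(self, node, ratio):
        self.remaining[node] = min(100, self.remaining[node] + ratio)
```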
The invention provides a Kubernetes cluster GPU space sharing method, which comprises the following steps:
the first step is as follows: and (5) initializing. And constructing a cluster by using a proper number of nodes, and caching the video card information (including GPU models and GPU video memory resources) and MPS information of the Node nodes into a Scheduler Extender.
The second step: submit the Pods. The user submits Pod information to the cluster through Kubectl using YAML files. On submission, the cluster determines whether the format of the submitted content conforms to the cluster characteristics, such as the Pod version information. If not, this is fed back to the user; if so, proceed to the next step.
The third step: after receiving the information, the API Server stores it in Etcd, and the Pod information can then be queried at the client with Kubectl. Preparation is then made to schedule Nodes for these Pods.
The fourth step: after the default scheduler filters out the Nodes that do not meet the conditions according to the other scheduling resources the Pod requires, the qualifying Nodes are sent to the Scheduler Extender together with the GPU video memory and active-thread-ratio information allocated for the Pod.
The fifth step: Node scoring. The Scheduler Extender scores according to the self-defined scoring rule and stores the scoring results in the scheduler.
The sixth step: pre-scheduling with the scheduler. The Scheduler Extender adds the scoring results and the information of the Pods to be scheduled into the ant colony algorithm for computation. If no solution is obtained, plan to open a Node in the closed state, incorporate that Node's information into the Node set for the Pods to be scheduled, and re-enter the fifth and sixth steps for iterative computation. If a solution is obtained, proceed to the next step.
The seventh step: the Scheduler Extender sends the solution obtained by the operation back to the default Scheduler.
Eighth step: if the obtained solution, namely the scheduling scheme contains the Node which is planned to be started, the Node is opened; otherwise, go to the next step.
The ninth step: execute according to the predetermined allocation scheme and bind the Pods to be executed to their Nodes through the scheduler. Binding here means updating the cached Pod and Node information in the cluster and writing the binding information into Etcd; finally, verify on the Node whether the bound Pod can run there.
The tenth step: if a Pod does not run successfully on its Node, re-enter the fifth step and proceed in order, removing the Node on which the run failed from the deployable Node set and allocating again.
The eleventh step: repeat the above steps until the list of tasks to be scheduled is empty.
The twelfth step: the scheduler then sends the binding information of the Pods and Nodes, the Node state information (i.e., whether each Node is open or closed), and the Pods' scheduling information to Etcd.
The thirteenth step: the information caching module collects the Nodes' switch-state information and the Pods' running-state information (i.e., which Pods are running and which have finished) from the cluster at a fixed period, and updates the stored records.
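Taken together, the thirteen steps amount to the control loop sketched below. The sketch is structural only: every cluster interaction is injected as a callable, and none of the names are the patent's actual interfaces.

```python
def schedule_all(pending_pods, open_nodes, closed_nodes,
                 filter_nodes, score_node, ant_colony_solve,
                 power_on, bind, verify_running):
    """Structural sketch of steps four through eleven (hypothetical helpers)."""
    while pending_pods:
        candidates = filter_nodes(open_nodes, pending_pods)            # step 4
        scores = {n: score_node(n, pending_pods) for n in candidates}  # step 5
        solution = ant_colony_solve(scores, pending_pods)              # step 6
        if solution is None:                                           # no feasible placement
            if not closed_nodes:
                raise RuntimeError("no solution and no closed Node left to open")
            open_nodes.append(closed_nodes.pop())                      # plan to open a Node
            continue                                                   # re-score and retry
        for pod, node in solution.items():                             # steps 7-9
            power_on(node)                                             # no-op if already open
            bind(pod, node)                                            # binding written to Etcd
            if verify_running(pod, node):                              # step 10
                pending_pods.remove(pod)
            else:
                open_nodes.remove(node)                                # drop failed Node, retry
    # step 12: report bindings and Node states back to Etcd (not shown)
```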
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A space sharing method based on a Kubernetes cluster GPU is characterized by comprising the following steps:
1) a Kubernetes cluster is constructed from a preset number of nodes, the nodes comprising a master node (Master) and worker nodes that execute the tasks the Master assigns, the Master's components including the data-bus API Server, a controller and a scheduler; a scheduler plug-in (Scheduler Extender) is attached to the scheduler, a Device Plugin is deployed on each worker node, and the Kubernetes cluster caches the worker nodes' GPU models, GPU video memory resources and multi-process service (MPS) information reported by the Device Plugin in the Scheduler Extender;
2) when a user submits Pod information through a client, the Pod being the smallest scheduling unit of the cluster, the cluster receives the Pod information and judges whether the format of its content conforms to the cluster characteristics; if not, this is fed back to the user;
3) if the format of the Pod information content conforms to the cluster characteristics, the API Server stores the Pod information in the distributed storage database Etcd, and the Pod information becomes queryable from the client;
4) the scheduler filters out the worker nodes that do not meet the conditions according to the scheduling resources the Pod requires, then sends the qualifying worker-node information, together with the GPU video memory and active-thread-ratio information allocated for the Pod, to the Scheduler Extender;
5) the Scheduler Extender scores the worker nodes, then feeds the scoring results and the information of the Pods to be scheduled into the ant colony algorithm for computation; if a solution is obtained it is sent to the scheduler; otherwise a worker node in the closed state is planned to be opened, that node's information is brought into the worker-node set for the Pods to be scheduled, and the iterative computation is repeated from this step;
6) if the obtained solution contains a worker node planned to be opened, that worker node is opened; the Pods to be executed are then bound to their worker nodes through the scheduler, and the binding information is written into Etcd;
7) each worker node verifies whether the bound Pod can run on it; if a Pod fails to run, the nodes are re-scored, the worker node on which the run failed is removed from the deployable worker-node set, and allocation is performed again;
8) the above steps are repeated until all Pods to be scheduled have been placed;
9) the scheduler sends the binding information of Pods and worker nodes, the worker-node state information and the Pods' scheduling information to Etcd, and Etcd collects the worker nodes' switch-state information and running-state information in the cluster at a fixed period and updates the stored corresponding information.
2. The method of claim 1, wherein each worker node runs a Kubelet service process that listens on a port, receives and executes the instructions sent by the master node, and manages the Pods and the containers within them; and each Kubelet service process registers its worker node's information with the API Server, periodically reports resource usage to the master node, and monitors the resources of the worker node and of the containers in the Pods.
3. The method as claimed in claim 2, wherein the Device Plugin connects, as a client, to the Device Plugin Manager in the Kubelet via gRPC remote procedure calls to obtain the worker node's GPU model and GPU video memory resources for reporting and monitoring.
4. The method of claim 1, wherein the Scheduler Extender scores points according to a scoring rule that is:
$$S_{gpum}=\frac{T_{gpum}-U_{gpum}-R_{gpum}}{T_{gpum}}$$

$$S_{mps}=\frac{T_{mps}-U_{mps}-R_{mps}}{T_{mps}}$$

$$G=\alpha\cdot S_{gpum}+\beta\cdot S_{mps}$$

wherein $T_{gpum}$ and $T_{mps}$ represent the total GPU video memory and total active-thread ratio of the current worker node; $U_{gpum}$ and $U_{mps}$ represent the GPU video memory and active-thread ratio consumed by the tasks already deployed on the current worker node; $R_{gpum}$ and $R_{mps}$ represent the GPU video memory and active-thread ratio requested by the current task to be scheduled; $S_{gpum}$ and $S_{mps}$ represent the idle GPU video memory and idle active-thread ratio of the worker node; $\alpha$ and $\beta$ represent weights; and $G$ represents the priority of the worker node's resources.
5. The method of claim 4, wherein the ant colony algorithm consists of three formulas:
$$P_{ij}^{k}(n)=\frac{\left[\tau_{ij}(n)\right]^{\alpha}\left[\eta_{ij}(n)\right]^{\beta}}{\sum_{s\in\Lambda}\left[\tau_{is}(n)\right]^{\alpha}\left[\eta_{is}(n)\right]^{\beta}}$$

$$\tau_{ij}(n+1)=(1-\rho)\,\tau_{ij}(n)+\sum_{k=1}^{m}\Delta\tau_{ij}^{k}(n)$$

$$\Delta\tau_{ij}^{k}=\frac{R}{L_{k}}$$

wherein $i$ represents the starting point of the current ant, $j$ represents an end point the current ant can reach, $\Lambda$ represents the set of end points the ant can reach, $\eta_{ij}$ represents the heuristic function, $\tau_{ij}$ represents the pheromone concentration on the path from starting point $i$ to end point $j$, $\alpha$ is the pheromone weight factor, $\beta$ is the heuristic-function weight factor, $n$ represents the number of iterations, $m$ represents the total number of ants, $\Delta\tau_{ij}^{k}$ represents the amount of pheromone secreted by ant $k$ from starting point $i$ to end point $j$, $\rho$ represents the volatilization factor of the pheromone from starting point $i$ to end point $j$, $R$ represents the pheromone risk evaluation coefficient, $L_{k}$ represents the distance of ant $k$ from starting point $i$ to end point $j$ in the current iteration, and $P_{ij}^{k}$ represents the probability that ant $k$ chooses to crawl from starting point $i$ to end point $j$.
6. The method as claimed in claim 5, wherein the objective function of the optimal ant path in the ant colony algorithm solution is taken as the volatilization factor $\rho$ of the pheromone and the objective function value is calculated; the smaller the objective function value, the better the scheduling scheme found by the ants; the formula of the objective function is:

$$f(q)=\rho_{1}\,\frac{1}{q}\sum_{j=1}^{q}\frac{T_{gpum}^{j}-U_{gpum}^{j}-\sum_{i=1}^{c}R_{gpum}^{i}}{T_{gpum}^{j}}+\rho_{2}\,\frac{1}{q}\sum_{j=1}^{q}\frac{T_{mps}^{j}-U_{mps}^{j}-\sum_{i=1}^{c}R_{mps}^{i}}{T_{mps}^{j}}$$

wherein $f(q)$ represents the objective function, $q$ represents the number of worker nodes used by the current scheduling, $T_{gpum}^{j}$ and $T_{mps}^{j}$ represent the total GPU video memory and total active-thread ratio of the $j$-th worker node, $U_{gpum}^{j}$ and $U_{mps}^{j}$ represent the GPU video memory and active-thread ratio consumed by the tasks already deployed on the $j$-th worker node, $R_{gpum}^{i}$ and $R_{mps}^{i}$ represent the GPU video memory and active-thread ratio requested by the $i$-th task to be scheduled, $\rho_{1}$ and $\rho_{2}$ represent weights, $m$ represents the number of worker nodes used by the current scheduling, $x$ represents the $x$ Pods to be scheduled, $y$ represents the $y$ worker nodes, and $c$ represents the number of tasks on the current worker node.
7. The method of claim 6, wherein the heuristic function formula appears in the original document only as an image and is not reproduced here.
8. the method as claimed in claim 6, wherein the constraint condition of the optimal ant path in the ant colony algorithm solution is:
$$\sum_{i=1}^{c}R_{gpum}^{i}+U_{gpum}^{j}\le T_{gpum}^{j}$$

$$\sum_{i=1}^{c}R_{mps}^{i}+U_{mps}^{j}\le T_{mps}^{j}$$

$$q\le y$$

the constraints stating that the GPU video memory and active-thread ratios used by all tasks allocated to the $j$-th worker node must sum to no more than the current worker node's totals.
9. The method of claim 6, wherein the initialization pheromone concentration value $\tau_{ij}(0)$ of the ant colony algorithm is set to the percentage of idle resources of the cluster worker nodes:

$$\tau_{ij}(0)=\frac{\text{idle resource quantity of worker node } j}{\text{total resource quantity of worker node } j}\times 100\%$$
10. the method of claim 6, wherein the ant colony algorithm solution process relies on a roulette mechanism to select a path; after path selection is carried out for a single ant, local pheromones are updated through a global pheromone formula; after all ants finish the path selection, updating the global pheromone through a global pheromone formula;
wherein, local pheromone: tau isij(t+1)=ρΔτij(t)+Gijτij(t);
Global pheromones: tau isij(t+1)=ρΔτij(t)+(1-f(q))τij(t);
Where ρ is a volatility factor and Δ τij(t) the increment of pheromones left by each iteration ant colony from the starting point i to the end point j is represented, and the pheromones on each path in the initial stage are zero;
Figure FDA0003442090210000041
Figure FDA0003442090210000042
representing the total amount of pheromones secreted by k ants from the task i to the working node j in t iterative tasks; gijIndicating the resource priority of the worker node.
CN202111635865.1A 2021-12-29 2021-12-29 Kubernetes cluster GPU space sharing method Pending CN114510319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111635865.1A CN114510319A (en) 2021-12-29 2021-12-29 Kubernetes cluster GPU space sharing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111635865.1A CN114510319A (en) 2021-12-29 2021-12-29 Kubernetes cluster GPU space sharing method

Publications (1)

Publication Number Publication Date
CN114510319A (en) 2022-05-17

Family

ID=81548181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111635865.1A Pending CN114510319A (en) 2021-12-29 2021-12-29 Kubernetes cluster GPU space sharing method

Country Status (1)

Country Link
CN (1) CN114510319A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971502A (en) * 2024-03-29 2024-05-03 南京认知物联网研究院有限公司 Method and device for carrying out online optimization scheduling on AI reasoning cluster



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination