WO2021203805A1 - GPU sharing scheduling and single-machine multi-card method, system and device - Google Patents

GPU sharing scheduling and single-machine multi-card method, system and device Download PDF

Info

Publication number
WO2021203805A1
WO2021203805A1 (PCT/CN2021/073784)
Authority
WO
WIPO (PCT)
Prior art keywords
pod
gpu
target
information
controlled host
Prior art date
Application number
PCT/CN2021/073784
Other languages
English (en)
French (fr)
Inventor
王德奎
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Priority to US17/917,860 priority Critical patent/US11768703B2/en
Publication of WO2021203805A1 publication Critical patent/WO2021203805A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877 Concurrent instruction execution using a slave processor, e.g. coprocessor
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources to service a request, considering hardware capabilities
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources

Definitions

  • The present invention relates to the field of computer resource management, and in particular to a GPU (Graphics Processing Unit) sharing scheduling and single-machine multi-card method, system and device.
  • The purpose of the present invention is to provide a GPU sharing scheduling and single-machine multi-card method, system and device, which can determine the resource usage of pods before scheduling and support single-machine multi-card scheduling.
  • The specific solution is as follows:
  • A GPU sharing scheduling method, applied to the scheduler of a central control host, includes: using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, where a non-updated pod is a pod that is already running but whose GPU information has not been updated; updating the GPU information into the annotations of the non-updated pods and adding the update mark to them; filtering out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, where an unmarked pod is a pod without the update mark; using the state information of the GPUs in the schedulable controlled hosts to select a target controlled host that meets a first preset condition and, from it, a target GPU that meets a second preset condition; writing the GPU information of the target GPU into the annotation of the pod to be allocated; and allocating the pod to be allocated to the target controlled host.
  • Optionally, the process of querying the GPU information in the environment variables of the non-updated pods includes: querying the container ID of the non-updated pod in each controlled host, and using that container ID to query the GPU information in the pod's environment variables.
  • Optionally, the process of selecting the target controlled host and target GPU includes: using the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all their pods to select the target controlled host meeting the first preset condition and the target GPU in it meeting the second preset condition.
  • Optionally, this selection screens out the target controlled host with the most idle GPU resources and, within it, the target GPU with the most free video memory.
  • Optionally, the process of allocating the pod to be allocated to the target controlled host includes: allocating the pod to be allocated to the target controlled host and binding it.
  • The invention also discloses a GPU sharing single-machine multi-card method, applied to a target controlled host, which includes: receiving a resource division request sent by a pod bound to the target controlled host; traversing the pods bound to the target controlled host and judging whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the request; if only one pod matches, taking it as the target pod, and if several match, screening out a target pod without the resource occupation mark; writing the GPU information of the target GPU, written by the scheduler of the central control host into the target pod's annotation, into the target pod's environment variables; using the resource division information in the target pod to register the corresponding one or more virtual graphics cards and dividing the free video memory of the target GPU into each virtual graphics card accordingly; and setting the resource occupation mark for the target pod, indicating that the target GPU recorded in the target pod's annotation is occupied.
  • The invention also discloses a GPU sharing scheduling system, applied to the scheduler of a central control host, including:
  • a GPU information query module, configured to use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, where a non-updated pod is a pod that is already running but whose GPU information has not been updated;
  • a GPU information update module, configured to update the GPU information into the annotations of the non-updated pods and add the update mark to them;
  • an unmarked pod screening module, configured to filter out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, where an unmarked pod is a pod without the update mark;
  • a scheduling module, configured to use the state information of the GPUs in the schedulable controlled hosts to select a target controlled host that meets a first preset condition and, from it, a target GPU that meets a second preset condition;
  • a GPU information writing module, configured to write the GPU information of the target GPU into the annotation of the pod to be allocated;
  • a pod allocation module, configured to allocate the pod to be allocated to the target controlled host.
  • The invention also discloses a GPU sharing single-machine multi-card system, applied to a target controlled host, including:
  • a division request receiving module, configured to receive a resource division request sent by a pod bound to the target controlled host;
  • an information consistency judging module, configured to traverse the pods bound to the target controlled host and judge whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
  • a target pod determining module, configured to take the matching pod as the target pod if the information consistency judging module determines that only one pod's recorded resource division information is consistent;
  • a target pod screening module, configured to screen out a target pod without the resource occupation mark if the information consistency judging module determines that multiple pods' recorded resource division information is consistent;
  • an environment variable writing module, configured to write the GPU information of the target GPU, written by the scheduler of the central control host into the target pod's annotation, into the target pod's environment variables;
  • a video memory division module, configured to use the resource division information in the target pod to register the corresponding one or more virtual graphics cards and divide the free video memory of the target GPU into each virtual graphics card accordingly;
  • a resource occupation marking module, configured to set the resource occupation mark for the target pod, indicating that the target GPU recorded in the target pod's annotation is occupied.
  • The invention also discloses a GPU sharing scheduling device, including: a memory, configured to store a computer program; and a processor, configured to execute the computer program to implement the aforementioned GPU sharing scheduling method.
  • The invention also discloses a GPU sharing single-machine multi-card device, including: a memory, configured to store a computer program; and a processor, configured to execute the computer program to implement the aforementioned GPU sharing single-machine multi-card method.
  • In the present invention, the GPU sharing scheduling method is applied to the scheduler of a central control host and includes: using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, where a non-updated pod is a pod that is already running but whose GPU information has not been updated; updating the GPU information into the non-updated pods and adding the update mark to them; filtering out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, where an unmarked pod is a pod without the update mark; using the state information of the GPUs in the schedulable controlled hosts to select a target controlled host meeting a first preset condition and, from it, a target GPU meeting a second preset condition; writing the GPU information of the target GPU into the pod to be allocated; and allocating the pod to be allocated to the target controlled host.
  • The present invention adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so that the resource usage of every pod can be determined at scheduling time.
  • The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling, ensuring that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables. This prevents the actual resource usage from diverging from the annotation, which would cause scheduling failures and bugs, and lays the foundation for pods to later divide multiple GPUs in multi-card fashion.
  • Meanwhile, when allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and thus the bugs that such an allocation would cause.
  • FIG. 1 is a schematic flowchart of a GPU sharing scheduling method disclosed in an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of another GPU sharing scheduling method disclosed in an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of a GPU sharing single-machine multi-card method disclosed in an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a GPU sharing scheduling system disclosed in an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of a GPU sharing single-machine multi-card system disclosed in an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of a GPU sharing scheduling device disclosed in an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of a GPU sharing single-machine multi-card device disclosed in an embodiment of the present invention.
  • The embodiment of the present invention discloses a GPU sharing scheduling method, as shown in FIG. 1, applied to the scheduler of a central control host. The method includes:
  • S11: Use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host.
  • A resource occupation mark is set in a pod's annotation at runtime; it identifies the GPU information the running pod uses and proves that the pod is running. Because pod resources can end up swapped, that is, a pod is allocated a specified resource but actually uses another one, the pod resource usage recorded in the pod annotation can be inconsistent with the resource the pod actually uses.
  • Since pod annotations are easy to access, the scheduler schedules based on the information in them; if the recorded usage does not match the actual usage, the scheduler's scheduling function is disturbed and scheduling fails. For this reason, an update mark is set in the pod: after the pod's resource information is updated, the mark is added to indicate that the pod's actual resource situation is consistent with what the annotation records.
  • Before scheduling, the resource occupation mark and update mark are first used to query, in each controlled host, the non-updated pods that are running but whose GPU information has not been updated, i.e., pods without the update mark. Such a pod may have been started during a previous scheduling round and not yet updated, so no update mark exists, or communication with the controlled host was interrupted during a previous update. To update these pods, the resource occupation mark is first used to filter out the pods that are already running, the update mark is then used to filter out the running pods that lack it, and the container IDs of the non-updated pods are then looked up.
  • After a non-updated pod is found, it is accessed on its controlled host and the GPU information recorded in its environment variables is queried to determine the GPU it actually uses. The GPU information may be the UUID of the GPU, and may also include GPU-related information such as total video memory.
  • S12: Update the GPU information into the annotations of the non-updated pods, and add the update mark to them.
  • Marks, GPU information and similar data can be recorded in pod annotations, so the scheduler can determine a pod's information directly from its annotations instead of accessing the pod at the bottom layer through its ID; once recorded there, the information is easy for the scheduler to obtain.
  • The pod annotation also contains the GPU information of the allocated target GPU, written in advance by the scheduler at allocation time; hence the GPU described in the annotation may not match the GPU actually used.
  • Once the GPU information of the actually used GPU is obtained, it is updated into the annotation of the non-updated pod, refreshing that pod's recorded resource usage; the update mark is then added to the annotation to indicate that the pod's actual resource usage is consistent with what the scheduler records.
  • If the GPU actually used already matches the GPU in the annotation, the update can still be performed; the result after the update is simply unchanged.
  • S13: Filter out, from the multiple controlled hosts, the schedulable controlled hosts without unmarked pods; an unmarked pod is a pod without the update mark. Owing to situations such as communication failures, non-updated pods may remain even after one round of updating; to avoid pod failures caused by incorrect pod resource records, a host that is to run a new pod must contain no unmarked pods.
  • S14: Use the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host that meets a first preset condition and, from the target controlled host, a target GPU that meets a second preset condition.
  • The state information of the GPUs in a schedulable controlled host can include the GPU information of each GPU in the host and the virtual GPU information in all of its pods.
  • With this state information, the GPU usage of each schedulable controlled host can be determined; according to the first preset condition, the target controlled host is selected from the multiple schedulable controlled hosts, and then, according to the second preset condition, the target GPU is selected from the target controlled host, so that the pod to be allocated runs on the target controlled host using the resources of the target GPU.
  • The first preset condition may be the controlled host with the most idle GPU resources, i.e., the target controlled host with the most idle GPU resources is screened out from the schedulable controlled hosts.
  • The second preset condition may be the GPU with the most free video memory, i.e., the target GPU with the most free video memory is screened out from the target controlled host.
  • S15: Write the GPU information of the target GPU into the annotation of the pod to be allocated, so that when the pod runs on the target controlled host it uses the GPU information recorded in its annotation to obtain the resources of the target GPU.
  • S16: Allocate the pod to be allocated to the target controlled host.
  • After allocation, the pod can run on the target controlled host and carry out the division of video memory.
  • It can be seen that the embodiment of the present invention adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so the resource usage of every pod can be determined at scheduling time.
  • The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling to ensure that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables.
  • When allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and thus the bugs that such an allocation would cause.
  • The embodiment of the present invention also discloses another GPU sharing scheduling method, as shown in FIG. 2. The method includes:
  • S21: Use the resource occupation mark and update mark of pods to query, in each controlled host, the container IDs of the non-updated pods that are already running but whose GPU information has not been updated.
  • S22: Use the container ID of a non-updated pod to query the GPU information in the environment variables of the non-updated pod.
  • Specifically, the API (Application Programming Interface) of the application container engine docker (an open-source application container engine) can be used to access the non-updated pod according to its container ID, thereby obtaining the GPU information of the GPU from the environment variables of the non-updated pod.
  • S23: Update the GPU information into the annotations of the non-updated pods, and add the update mark to them.
  • S24: Filter out, from the multiple controlled hosts, the schedulable controlled hosts without unmarked pods; an unmarked pod is a pod without the update mark.
  • S25: Use the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods to screen out, from the schedulable controlled hosts, the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in the target controlled host.
  • Specifically, from the GPU information of the GPUs in a schedulable controlled host and the virtual GPU information in all of its pods, it can be learned which GPUs' video memory is already used by pods and which is not, so that the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in it can be screened out from the schedulable controlled hosts.
  • S26: Write the GPU information of the target GPU into the annotation of the pod to be allocated.
  • S27: Allocate the pod to be allocated to the target controlled host and bind it; binding prevents the pod from being moved or modified.
  • The embodiment of the present invention also discloses a GPU sharing single-machine multi-card method, as shown in FIG. 3, applied to a target controlled host. The method includes:
  • S31: Receive a resource division request sent by a pod bound to the target controlled host.
  • Specifically, after the pod is allocated to the target controlled host, it starts running and sends a resource division request. Using the concept of virtual resources, multiple virtual graphics cards are registered to achieve single-machine multi-card division of video memory. The resource division request includes resource division information, which records how many virtual GPUs are to be divided and how much video memory each virtual GPU gets. For example, resources are registered in the form inspur.com/gpu, and 5 resource digits are set as the ID of the virtual graphics card, such as 99999, ensuring the capacity is large enough that the number of registrations cannot exceed the upper limit.
  • The first 3 of the 5 resource digits are the number of virtual graphics cards to register, and the last 2 are the video memory value each virtual graphics card divides; for example, 00203 means two virtual graphics cards are to be divided, each with 3 GiB (gibibytes) of video memory.
  • S32: Traverse the pods bound to the target controlled host, and determine whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request.
  • Specifically, since the resource division information in the request uses the virtual graphics card ID, it is necessary to determine which pod issued the request, so that the pod's annotation can later be used to obtain video memory from the corresponding GPU; the target controlled host may also be running multiple pods whose recorded resource division information matches the request, so a suitable pod must be picked from them to apply the resource division.
  • S33: If only one pod's recorded resource division information is consistent, take it as the target pod.
  • S34: If multiple pods' recorded resource division information is consistent, screen out from them a target pod without the resource occupation mark.
  • Specifically, the resource occupation mark serves as the criterion: it is written into a pod's annotation while the pod runs, to indicate that the GPU recorded in the annotation is already occupied, so a pod without the resource occupation mark is selected as the target pod to perform the new video memory division.
  • S35: Write the GPU information of the target GPU, which the scheduler of the central control host wrote into the target pod's annotation, into the environment variables of the target pod.
  • Specifically, this determines the target GPU to be divided. What is written into the environment variables of the target pod may be the UUID of the GPU, through which the target pod can designate the GPU.
  • S36: Use the resource division information in the target pod to register the corresponding one or more virtual graphics cards, and divide the free video memory of the target GPU into each virtual graphics card accordingly.
  • Specifically, the one or more virtual graphics cards are registered and the free video memory of the target GPU is divided among them accordingly, completing the single-machine multi-card division and enabling one pod on a single controlled host to divide the graphics card multiple times.
  • S37: Set the resource occupation mark for the target pod, indicating that the target GPU recorded in the target pod's annotation is occupied, paving the way for the scheduler to correctly schedule pods later.
  • It can be seen that the embodiment of the present invention uses the concept of registering virtual resources to let one pod on a controlled host divide the graphics card multiple times. After a pod runs, the resource occupation mark is added to it, so that among multiple pods with identical resource division requests the one not yet running can be selected, allowing multiple pods with the same resource division request to run; at the same time, setting the resource occupation mark paves the way for the scheduler to accurately schedule the pods to be allocated later.
  • Correspondingly, the embodiment of the present invention also discloses a GPU sharing scheduling system, as shown in FIG. 4, applied to the scheduler of a central control host, including:
  • a GPU information query module 11, configured to use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, where a non-updated pod is a pod that is already running but whose GPU information has not been updated;
  • a GPU information update module 12, configured to update the GPU information into the annotations of the non-updated pods and add the update mark to the non-updated pods;
  • an unmarked pod screening module 13, configured to filter out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, where an unmarked pod is a pod without the update mark;
  • a scheduling module 14, configured to use the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition;
  • a GPU information writing module 15, configured to write the GPU information of the target GPU into the annotation of the pod to be allocated;
  • a pod allocation module 16, configured to allocate the pod to be allocated to the target controlled host.
  • It can be seen that the embodiment of the present invention adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so the resource usage of every pod can be determined at scheduling time. The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling, ensuring that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables; when allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and the bugs such allocation would cause.
  • Specifically, the GPU information query module 11 may include an update query unit and a GPU information query unit, where:
  • the update query unit is configured to use the resource occupation mark and update mark of pods to query the container IDs of the non-updated pods in each controlled host;
  • the GPU information query unit is configured to use the container ID of a non-updated pod to query the GPU information in the environment variables of the non-updated pod.
  • Specifically, the scheduling module 14 may be configured to use the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods of the schedulable controlled hosts to select the target controlled host meeting the first preset condition and the target GPU in it meeting the second preset condition.
  • Further, the scheduling module 14 may also be configured to screen out, from the schedulable controlled hosts, the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in the target controlled host.
  • Specifically, the pod allocation module 16 may be configured to allocate the pod to be allocated to the target controlled host and bind it.
  • Specifically, the API (Application Programming Interface) of the application container engine docker can be used to access a non-updated pod according to its container ID, thereby obtaining the GPU information of the GPU from the environment variables of the non-updated pod.
  • The embodiment of the present invention also discloses a GPU sharing single-machine multi-card system, as shown in FIG. 5, applied to a target controlled host, including:
  • a division request receiving module 21, configured to receive a resource division request sent by a pod bound to the target controlled host;
  • an information consistency judging module 22, configured to traverse the pods bound to the target controlled host and judge whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
  • a target pod determining module 23, configured to take the pod as the target pod if the information consistency judging module 22 determines that only one pod's recorded resource division information is consistent;
  • a target pod screening module 24, configured to screen out a target pod without the resource occupation mark if the information consistency judging module 22 determines that multiple pods' recorded resource division information is consistent;
  • an environment variable writing module 25, configured to write the GPU information of the target GPU, written by the scheduler of the central control host into the target pod's annotation, into the environment variables of the target pod;
  • a video memory division module 26, configured to use the resource division information in the target pod to register the corresponding one or more virtual graphics cards and divide the free video memory of the target GPU into each virtual graphics card accordingly;
  • a resource occupation marking module 27, configured to set the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
  • It can be seen that the embodiment of the present invention uses the concept of registering virtual resources to let one pod on a controlled host divide the graphics card multiple times. After a pod runs, the resource occupation mark is added to it, so that among multiple pods with identical resource division requests the one not yet running can be selected, allowing multiple pods with the same resource division request to run; at the same time, setting the resource occupation mark paves the way for the scheduler to accurately schedule the pods to be allocated later.
  • In addition, the embodiment of the present invention also discloses a GPU sharing scheduling device 400, as shown in FIG. 6, including:
  • a memory 402, configured to store a computer program;
  • a processor 401, configured to execute the computer program to implement the aforementioned GPU sharing scheduling method.
  • The embodiment of the present invention also discloses a GPU sharing single-machine multi-card device 500, as shown in FIG. 7, including:
  • a memory 502, configured to store a computer program;
  • a processor 501, configured to execute the computer program to implement the aforementioned GPU sharing single-machine multi-card method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a GPU sharing scheduling and single-machine multi-card method, system and device, applied to the scheduler of a central control host. The present application adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so that the resource usage of every pod can be determined at scheduling time. The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling, ensuring that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables; this prevents the actual resource usage from diverging from the annotation, which would cause scheduling failures and bugs, and lays the foundation for pods to later divide multiple GPUs in multi-card fashion. Meanwhile, when allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and thus the bugs that allocation would otherwise cause.

Description

GPU sharing scheduling and single-machine multi-card method, system and device
This application claims priority to Chinese patent application No. 202010277708.7, entitled "GPU sharing scheduling and single-machine multi-card method, system and device", filed with the China National Intellectual Property Administration on April 8, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer resource management, and in particular to a GPU (Graphics Processing Unit) sharing scheduling and single-machine multi-card method, system and device.
Background Art
When a user starts a pod (the smallest management element in kubernetes) on kubernetes (K8s) as a development platform, the GPU is also needed during development, so the resource requirements for starting this pod must also involve the GPU. Without fine-grained GPU partitioning, at least one whole GPU is used, which wastes resources enormously. Considering scenarios in which multiple users develop on GPUs, each user's GPU utilization during development is generally low (the opposite of training tasks), so GPU sharing among users is very necessary.
In the prior art, although fine-grained scheduling in a Kubernetes cluster is supported, only single-machine single-card scheduling is supported, i.e., only the UUID (Universally Unique Identifier) of one GPU can be written into the environment variables of all containers of a pod, and certain bugs exist: when the scheduler schedules a pod, the resources the pod is recorded as using may differ from the resources it actually uses, which easily causes scheduling failures.
For this reason, a GPU sharing scheduling method that can support single-machine multi-card is needed.
Summary of the Invention
In view of this, the purpose of the present invention is to provide a GPU sharing scheduling and single-machine multi-card method, system and device that can determine the resource usage of pods before scheduling and support single-machine multi-card scheduling. The specific solution is as follows:
A GPU sharing scheduling method, applied to the scheduler of a central control host, including:
using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, wherein a non-updated pod is a pod that is already running but whose GPU information has not been updated;
updating the GPU information into the non-updated pods, and adding the update mark to the non-updated pods;
filtering out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, wherein an unmarked pod is a pod without the update mark;
using the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition;
writing the GPU information of the target GPU into the pod to be allocated;
allocating the pod to be allocated to the target controlled host.
Optionally, the process of using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host includes:
using the resource occupation mark and update mark of pods to query the container ID (identity) of the non-updated pod in each controlled host;
using the container ID of the non-updated pod to query the GPU information in the environment variables of the non-updated pod.
Optionally, the process of using the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and, from the target controlled host, the target GPU meeting the second preset condition includes:
using the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods of the schedulable controlled hosts to select, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and the target GPU in the target controlled host meeting the second preset condition.
Optionally, the process of selecting, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and the target GPU in the target controlled host meeting the second preset condition includes:
screening out, from the schedulable controlled hosts, the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in the target controlled host.
Optionally, the process of allocating the pod to be allocated to the target controlled host includes:
allocating the pod to be allocated to the target controlled host and binding it.
The present invention also discloses a GPU sharing single-machine multi-card method, applied to a target controlled host, including:
receiving a resource division request sent by a pod bound to the target controlled host;
traversing the pods bound to the target controlled host, and judging whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
if only one pod's recorded resource division information is consistent, taking it as the target pod;
if multiple pods' recorded resource division information is consistent, screening out from them a target pod without the resource occupation mark;
writing the GPU information of the target GPU, written by the scheduler of the central control host into the annotation of the target pod, into the environment variables of the target pod;
using the resource division information in the target pod to register the corresponding one or more virtual graphics cards, and dividing the free video memory of the target GPU into each virtual graphics card accordingly;
setting the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
The present invention also discloses a GPU sharing scheduling system, applied to the scheduler of a central control host, including:
a GPU information query module, configured to use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, wherein a non-updated pod is a pod that is already running but whose GPU information has not been updated;
a GPU information update module, configured to update the GPU information into the annotations of the non-updated pods and add the update mark to the non-updated pods;
an unmarked pod screening module, configured to filter out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, wherein an unmarked pod is a pod without the update mark;
a scheduling module, configured to use the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition;
a GPU information writing module, configured to write the GPU information of the target GPU into the annotation of the pod to be allocated;
a pod allocation module, configured to allocate the pod to be allocated to the target controlled host.
The present invention also discloses a GPU sharing single-machine multi-card system, applied to a target controlled host, including:
a division request receiving module, configured to receive a resource division request sent by a pod bound to the target controlled host;
an information consistency judging module, configured to traverse the pods bound to the target controlled host and judge whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
a target pod determining module, configured to take the pod as the target pod if the information consistency judging module determines that only one pod's recorded resource division information is consistent;
a target pod screening module, configured to screen out a target pod without the resource occupation mark if the information consistency judging module determines that multiple pods' recorded resource division information is consistent;
an environment variable writing module, configured to write the GPU information of the target GPU, written by the scheduler of the central control host into the annotation of the target pod, into the environment variables of the target pod;
a video memory division module, configured to use the resource division information in the target pod to register the corresponding one or more virtual graphics cards and divide the free video memory of the target GPU into each virtual graphics card accordingly;
a resource occupation marking module, configured to set the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
The present invention also discloses a GPU sharing scheduling device, including:
a memory, configured to store a computer program;
a processor, configured to execute the computer program to implement the aforementioned GPU sharing scheduling method.
The present invention also discloses a GPU sharing single-machine multi-card device, including:
a memory, configured to store a computer program;
a processor, configured to execute the computer program to implement the aforementioned GPU sharing single-machine multi-card method.
In the present invention, the GPU sharing scheduling method is applied to the scheduler of a central control host and includes: using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, wherein a non-updated pod is a pod that is already running but whose GPU information has not been updated; updating the GPU information into the non-updated pods and adding the update mark to the non-updated pods; filtering out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, wherein an unmarked pod is a pod without the update mark; using the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition; writing the GPU information of the target GPU into the pod to be allocated; and allocating the pod to be allocated to the target controlled host.
The present invention adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so that the resource usage of every pod can be determined at scheduling time. The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling, ensuring that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables; this prevents the actual resource usage from diverging from the annotation, which would cause scheduling failures and bugs, and lays the foundation for pods to later divide multiple GPUs in multi-card fashion. Meanwhile, when allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and thus the bugs that such allocation would cause.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative work.
FIG. 1 is a schematic flowchart of a GPU sharing scheduling method disclosed in an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another GPU sharing scheduling method disclosed in an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a GPU sharing single-machine multi-card method disclosed in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a GPU sharing scheduling system disclosed in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a GPU sharing single-machine multi-card system disclosed in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a GPU sharing scheduling device disclosed in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a GPU sharing single-machine multi-card device disclosed in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
The embodiment of the present invention discloses a GPU sharing scheduling method, as shown in FIG. 1, applied to the scheduler of a central control host. The method includes:
S11: Use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host.
Specifically, a resource occupation mark is set in a pod's annotation at runtime to identify the GPU information the pod currently uses and to prove that the pod is running. Since pod resources can be swapped, that is, a pod is allocated a specified resource but actually uses a different one, the pod resource usage recorded in the pod annotation may then be inconsistent with the resource the pod actually uses. Because pod annotations are easy to access, the scheduler schedules using the information in them; if the recorded usage does not match the actual usage, the scheduler's scheduling function is disturbed and scheduling fails. For this reason, an update mark is set in the pod: after the pod's resource information is updated, the update mark is added to the pod to indicate that the pod's actual resource situation is consistent with what the annotation records.
Specifically, before scheduling, the resource occupation mark and update mark of pods are first used to query, in each controlled host, the non-updated pods that are already running but whose GPU information has not been updated, i.e., the pods without the update mark. Such a pod may have been started during the previous scheduling round and not yet updated, so no update mark exists, or communication with the controlled host was interrupted during a previous update and the pod was left non-updated. Therefore, to update the non-updated pods, the resource occupation mark is first used to filter out the pods that are already running, then the update mark is used to filter out the running pods without it, and then the container IDs of the non-updated pods are looked up.
Specifically, after a non-updated pod is found, it is accessed on its controlled host and the GPU information of the currently used GPU recorded in its environment variables is queried, thereby determining the GPU the non-updated pod actually uses. The GPU information may be the UUID of the GPU, and may also include GPU-related information such as the total video memory of the GPU.
S12: Update the GPU information into the annotations of the non-updated pods, and add the update mark to the non-updated pods.
Specifically, marks, GPU information and similar data can be recorded in a pod's annotations, which lets the scheduler determine the pod's information directly from the annotations instead of accessing the pod at the bottom layer through its ID; once the marks and information are recorded in the annotations, the scheduler can obtain them easily.
Specifically, the pod's annotation records the GPU information of the allocated target GPU, written in advance by the scheduler at allocation time; therefore, the GPU recorded in the annotation may be inconsistent with the GPU actually used.
Specifically, after the GPU information of the GPU is obtained, it is updated into the annotation of the non-updated pod, updating the recorded resource usage of the non-updated pod; after the update, the update mark is added to the annotation to indicate that the pod's actual resource usage is consistent with what the scheduler records.
Understandably, if the GPU actually used is consistent with the GPU in the annotation, the update can still be performed; the result after the update is simply unchanged from before.
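To make S12 concrete, the following is a minimal sketch of the annotation update using the official Kubernetes Python client. The annotation keys inspur.com/gpu-info and inspur.com/updated are invented for illustration; the patent does not name the actual keys it uses.

```python
# Minimal sketch of S12 (assumed annotation keys): write the GPU
# information read from the container back into the pod's annotations
# and add the update mark. patch_namespaced_pod applies a merge patch,
# so only the listed annotations are changed.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def mark_pod_updated(name: str, namespace: str, gpu_uuid: str) -> None:
    patch = {
        "metadata": {
            "annotations": {
                "inspur.com/gpu-info": gpu_uuid,  # GPU actually used, from the pod env
                "inspur.com/updated": "true",     # the update mark
            }
        }
    }
    v1.patch_namespaced_pod(name=name, namespace=namespace, body=patch)
```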
S13: Filter out, from the multiple controlled hosts, the schedulable controlled hosts without unmarked pods; an unmarked pod is a pod without the update mark.
Specifically, owing to situations such as communication failures, non-updated pods may still exist after the round of updating in S11 to S13 above. To avoid pod failures caused by incorrect pod resource records, a schedulable controlled host that is to run a new pod must be ensured to contain no unmarked pods, so the schedulable controlled hosts without unmarked pods must be filtered out from the multiple controlled hosts.
S14: Use the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host that meets a first preset condition and, from the target controlled host, a target GPU that meets a second preset condition.
Specifically, in order to choose the relatively optimal schedulable controlled host from the multiple filtered hosts, the state information of the GPUs in each schedulable controlled host is needed. This state information can include the GPU information of every GPU in the host and the virtual GPU information in all of its pods; with it, the GPU usage of each schedulable controlled host can be determined. According to the preset first condition, the target controlled host meeting the first preset condition is selected from the multiple schedulable controlled hosts; then, according to the second preset condition, the qualifying target GPU is selected from the target controlled host, and the pod to be allocated runs on the target controlled host using the resources of the target GPU.
Specifically, the first preset condition may be the controlled host with the most idle GPU resources, i.e., the target controlled host with the most idle GPU resources is screened out from the schedulable controlled hosts; the second preset condition may be the GPU with the most free video memory, i.e., the target GPU with the most free video memory is screened out from the target controlled host.
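The two preset conditions reduce to two nested maximum selections. Below is a minimal sketch over an assumed in-memory view of GPU state (host name mapped to GPU UUIDs with total and used video memory in MiB); it illustrates the selection rule only, not the patent's actual data structures.

```python
# Sketch of S14: pick the host with the most idle GPU resources (first
# preset condition), then the GPU with the most free video memory on
# that host (second preset condition).
def pick_target(hosts: dict) -> tuple:
    """hosts: {host_name: {gpu_uuid: {"total": MiB, "used": MiB}}}"""
    def host_free(gpus: dict) -> int:
        return sum(g["total"] - g["used"] for g in gpus.values())

    target_host = max(hosts, key=lambda h: host_free(hosts[h]))
    gpus = hosts[target_host]
    target_gpu = max(gpus, key=lambda u: gpus[u]["total"] - gpus[u]["used"])
    return target_host, target_gpu

host, gpu = pick_target({
    "node-1": {"GPU-aaa": {"total": 16384, "used": 4096}},
    "node-2": {"GPU-bbb": {"total": 16384, "used": 0},
               "GPU-ccc": {"total": 16384, "used": 8192}},
})
# host == "node-2" (24576 MiB free in total), gpu == "GPU-bbb" (16384 MiB free)
```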
S15: Write the GPU information of the target GPU into the annotation of the pod to be allocated.
Specifically, the GPU information of the target GPU is written into the annotation of the pod to be allocated so that, when the pod runs on the target controlled host, it uses the GPU information recorded in its annotation to obtain the resources of the target GPU.
S16: Allocate the pod to be allocated to the target controlled host.
Specifically, after the pod to be allocated is allocated to the target controlled host, it can run there and carry out the division of video memory.
It can be seen that the embodiment of the present invention adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so that the resource usage of every pod can be determined at scheduling time. The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling, ensuring that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables; this prevents the actual resource usage from diverging from the annotation, which would cause scheduling failures and bugs, and lays the foundation for pods to later divide multiple GPUs in multi-card fashion. Meanwhile, when allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and thus the bugs that allocation would otherwise cause.
Correspondingly, the embodiment of the present invention also discloses another GPU sharing scheduling method, as shown in FIG. 2. The method includes:
S21: Use the resource occupation mark and update mark of pods to query, in each controlled host, the container IDs of the non-updated pods that are already running but whose GPU information has not been updated.
S22: Use the container ID of a non-updated pod to query the GPU information in the environment variables of the non-updated pod.
Specifically, the API (Application Programming Interface) of the application container engine docker (an open-source application container engine) can be used to access the non-updated pod according to its container ID, thereby obtaining the GPU information of the GPU from the environment variables of the non-updated pod.
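For example, with the Docker SDK for Python the environment of a container can be read from its attributes by container ID. The variable name NVIDIA_VISIBLE_DEVICES below is an assumption used for illustration; the patent only states that the GPU UUID is recorded in an environment variable of the pod.

```python
# Sketch of S22: read the GPU UUID out of a container's environment
# via the docker API, given the container ID of the non-updated pod.
import docker

def gpu_info_from_container(container_id: str):
    client = docker.from_env()
    container = client.containers.get(container_id)
    for item in container.attrs["Config"]["Env"]:  # entries look like "KEY=VALUE"
        key, _, value = item.partition("=")
        if key == "NVIDIA_VISIBLE_DEVICES":        # assumed variable name
            return value                           # e.g. the GPU UUID "GPU-9f2c..."
    return None
```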
S23: Update the GPU information into the annotations of the non-updated pods, and add the update mark to the non-updated pods.
S24: Filter out, from the multiple controlled hosts, the schedulable controlled hosts without unmarked pods; an unmarked pod is a pod without the update mark.
S25: Use the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods to screen out, from the schedulable controlled hosts, the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in the target controlled host.
Specifically, from the GPU information of the GPUs in a schedulable controlled host and the virtual GPU information in all of its pods, it can be learned which GPUs' video memory is already used by pods and which is not, so that the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in it can be screened out from the schedulable controlled hosts.
S26: Write the GPU information of the target GPU into the annotation of the pod to be allocated.
S27: Allocate the pod to be allocated to the target controlled host and bind it.
Specifically, the pod is bound to the target controlled host to prevent the pod from being moved or modified.
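As an illustration of the binding in S27, a custom scheduler typically binds a pod to its chosen node through the Kubernetes binding subresource. The sketch below uses the official Python client and is an assumption about the implementation, which the patent does not spell out.

```python
# Sketch of S27: bind the pod to be allocated to the target host (node).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def bind_pod(pod_name: str, namespace: str, node_name: str) -> None:
    body = client.V1Binding(
        metadata=client.V1ObjectMeta(name=pod_name),
        target=client.V1ObjectReference(api_version="v1", kind="Node",
                                        name=node_name),
    )
    # _preload_content=False sidesteps a known response-deserialization
    # quirk of create_namespaced_binding in the Python client
    v1.create_namespaced_binding(namespace=namespace, body=body,
                                 _preload_content=False)
```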
In addition, the embodiment of the present invention also discloses a GPU sharing single-machine multi-card method, as shown in FIG. 3, applied to a target controlled host. The method includes:
S31: Receive a resource division request sent by a pod bound to the target controlled host.
Specifically, after the pod is allocated to the target controlled host, it starts running and sends a resource division request. Using the concept of virtual resources, multiple virtual graphics cards are registered to achieve single-machine multi-card division of video memory. The resource division request includes resource division information, which records how many virtual GPUs are to be divided and how much video memory each virtual GPU gets. For example, resources are registered in the form inspur.com/gpu, and 5 resource digits are set as the ID of the virtual graphics card, such as 99999, ensuring the capacity is large enough that the number of registrations cannot exceed the upper limit; the first 3 of the 5 resource digits are the number of virtual graphics cards to register, and the last 2 are the video memory value each virtual graphics card divides. For example, 00203 means that two virtual graphics cards are to be divided, each with 3 GiB (gibibytes) of video memory.
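A minimal sketch of this 5-digit encoding follows; the helper names are ours, but the layout (first 3 digits the number of virtual cards, last 2 digits the video memory per card in GiB) is exactly the one described above.

```python
# Encode/parse the 5 resource digits: "00203" -> 2 virtual cards, 3 GiB each.
def parse_resource_bits(bits: str) -> tuple:
    assert len(bits) == 5 and bits.isdigit()
    return int(bits[:3]), int(bits[3:])   # (number of cards, GiB per card)

def encode_resource_bits(cards: int, gib_each: int) -> str:
    assert 0 <= cards < 1000 and 0 <= gib_each < 100
    return f"{cards:03d}{gib_each:02d}"

assert parse_resource_bits("00203") == (2, 3)
assert encode_resource_bits(2, 3) == "00203"
```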
S32: Traverse the pods bound to the target controlled host, and judge whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request.
Specifically, since the resource division information in the resource division request uses the virtual graphics card ID, it is necessary to determine which pod issued the request, so that the pod's annotation can later be used to obtain video memory from the corresponding GPU; the target controlled host may also be running multiple pods whose recorded resource division information is consistent with that in the request, so a suitable pod must be picked from them to apply for the resource division.
S33: If only one pod's recorded resource division information is consistent, take it as the target pod.
Understandably, if only one pod's recorded resource division information is consistent, that pod is taken as the target pod for the subsequent steps.
S34: If multiple pods' recorded resource division information is consistent, screen out from them a target pod without the resource occupation mark.
Specifically, if there are multiple pods whose recorded resource division information is consistent, the one not running must be selected; the resource occupation mark serves as the criterion. The resource occupation mark is written into a pod's annotation when the pod runs, to indicate that the GPU recorded in the pod's annotation is already occupied, so a pod without the resource occupation mark is screened out as the target pod to perform the new video memory division.
It should be noted that if there are multiple pods without the resource occupation mark, any one of them may be chosen, because the resource division requests recorded in these pods are identical.
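Steps S32 to S34 amount to a match-then-filter pass over the pods bound to the host. The sketch below assumes a plain dict representation of a pod; the field names division_info and occupied are invented for the example.

```python
# Sketch of S32-S34: find the pods whose recorded division info matches
# the request; with several matches, take one without the occupation mark.
def find_target_pod(bound_pods: list, requested_division: str):
    matches = [p for p in bound_pods
               if p["division_info"] == requested_division]
    if len(matches) == 1:
        return matches[0]
    for pod in matches:                     # several matches: any pod that is
        if not pod.get("occupied", False):  # not running (no occupation mark)
            return pod                      # can serve as the target pod
    return None
```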
S35: Write the GPU information of the target GPU, written by the scheduler of the central control host into the annotation of the target pod, into the environment variables of the target pod.
Specifically, after the target pod is determined, the GPU information of the target GPU written by the scheduler of the central control host into the target pod's annotation is written into the target pod's environment variables, to determine the target GPU that is about to be divided.
What is written into the environment variables of the target pod may be the UUID of the GPU; through the GPU's UUID, the target pod can designate the GPU.
S36: Use the resource division information in the target pod to register the corresponding one or more virtual graphics cards, and divide the free video memory of the target GPU into each virtual graphics card accordingly.
Specifically, according to the virtual graphics card ID in the resource division information of the target pod's resource division request, the corresponding one or more virtual graphics cards are registered and the free video memory of the target GPU is divided into each virtual graphics card accordingly, completing the single-machine multi-card division and enabling one pod on a single controlled host to divide the graphics card multiple times.
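As a sketch of the division itself, under an assumed in-memory model rather than the actual device registration: the number of cards and the per-card memory come from the parsed resource division information, and the division fails if the target GPU lacks enough free video memory.

```python
# Sketch of S36: carve the target GPU's free video memory into the
# requested virtual graphics cards.
def divide_memory(free_gib: int, cards: int, gib_each: int) -> list:
    need = cards * gib_each
    if need > free_gib:
        raise RuntimeError(f"need {need} GiB but only {free_gib} GiB free")
    # one entry per virtual card, each holding its memory quota
    return [{"vgpu_id": i, "gib": gib_each} for i in range(cards)]

vgpus = divide_memory(free_gib=16, cards=2, gib_each=3)
# -> [{"vgpu_id": 0, "gib": 3}, {"vgpu_id": 1, "gib": 3}]
```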
S37: Set the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
Specifically, setting the resource occupation mark for the target pod indicates that the target GPU recorded in the target pod's annotation is occupied, paving the way for the scheduler to schedule pods correctly later.
It can be seen that the embodiment of the present invention uses the concept of registering virtual resources to let one pod on a controlled host divide the graphics card multiple times. After a pod runs, the resource occupation mark is added to it, so that among multiple pods with identical resource division requests the one not yet running can be selected, allowing multiple pods with the same resource division request to run; at the same time, setting the resource occupation mark paves the way for the scheduler to accurately schedule the pods to be allocated later.
Correspondingly, the embodiment of the present invention also discloses a GPU sharing scheduling system, as shown in FIG. 4, applied to the scheduler of a central control host, including:
a GPU information query module 11, configured to use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, wherein a non-updated pod is a pod that is already running but whose GPU information has not been updated;
a GPU information update module 12, configured to update the GPU information into the annotations of the non-updated pods and add the update mark to the non-updated pods;
an unmarked pod screening module 13, configured to filter out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, wherein an unmarked pod is a pod without the update mark;
a scheduling module 14, configured to use the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition;
a GPU information writing module 15, configured to write the GPU information of the target GPU into the annotation of the pod to be allocated;
a pod allocation module 16, configured to allocate the pod to be allocated to the target controlled host.
It can be seen that the embodiment of the present invention adds a resource occupation mark and an update mark to pods: the resource occupation mark proves that a pod is running, and the update mark indicates that the resource usage information in the pod's annotation is consistent with the actual resource usage, so that the resource usage of every pod can be determined at scheduling time. The resource usage information in the pod annotation, that is, the GPU information of the GPU actually used, is updated before scheduling, ensuring that the GPU information in the annotation is consistent with the GPU information actually recorded in the pod's environment variables; this prevents the actual resource usage from diverging from the annotation, which would cause scheduling failures and bugs, and lays the foundation for pods to later divide multiple GPUs in multi-card fashion. Meanwhile, when allocating a pod to be allocated, controlled hosts with unmarked pods are avoided, preventing errors and thus the bugs that allocation would otherwise cause.
Specifically, the above GPU information query module 11 may include an update query unit and a GPU information query unit, wherein:
the update query unit is configured to use the resource occupation mark and update mark of pods to query the container IDs of the non-updated pods in each controlled host;
the GPU information query unit is configured to use the container ID of a non-updated pod to query the GPU information in the environment variables of the non-updated pod.
Specifically, the above scheduling module 14 may be configured to use the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods of the schedulable controlled hosts to select, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and the target GPU in the target controlled host meeting the second preset condition.
Further, the above scheduling module 14 may also be configured to use the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods to screen out, from the schedulable controlled hosts, the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in the target controlled host.
Specifically, the above pod allocation module 16 may be configured to allocate the pod to be allocated to the target controlled host and bind it.
Specifically, the API (Application Programming Interface) of the application container engine docker can be used to access the non-updated pod according to its container ID, thereby obtaining the GPU information of the GPU from the environment variables of the non-updated pod.
Correspondingly, the embodiment of the present invention also discloses a GPU sharing single-machine multi-card system, as shown in FIG. 5, applied to a target controlled host, including:
a division request receiving module 21, configured to receive a resource division request sent by a pod bound to the target controlled host;
an information consistency judging module 22, configured to traverse the pods bound to the target controlled host and judge whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
a target pod determining module 23, configured to take the pod as the target pod if the information consistency judging module 22 determines that only one pod's recorded resource division information is consistent;
a target pod screening module 24, configured to screen out a target pod without the resource occupation mark if the information consistency judging module 22 determines that multiple pods' recorded resource division information is consistent;
an environment variable writing module 25, configured to write the GPU information of the target GPU, written by the scheduler of the central control host into the target pod's annotation, into the environment variables of the target pod;
a video memory division module 26, configured to use the resource division information in the target pod to register the corresponding one or more virtual graphics cards and divide the free video memory of the target GPU into each virtual graphics card accordingly;
a resource occupation marking module 27, configured to set the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
It can be seen that the embodiment of the present invention uses the concept of registering virtual resources to let one pod on a controlled host divide the graphics card multiple times. After a pod runs, the resource occupation mark is added to it, so that among multiple pods with identical resource division requests the one not yet running can be selected, allowing multiple pods with the same resource division request to run; at the same time, setting the resource occupation mark paves the way for the scheduler to accurately schedule the pods to be allocated later.
In addition, the embodiment of the present invention also discloses a GPU sharing scheduling device 400, as shown in FIG. 6, including:
a memory 402, configured to store a computer program;
a processor 401, configured to execute the computer program to implement the aforementioned GPU sharing scheduling method.
The embodiment of the present invention also discloses a GPU sharing single-machine multi-card device 500, as shown in FIG. 7, including:
a memory 502, configured to store a computer program;
a processor 501, configured to execute the computer program to implement the aforementioned GPU sharing single-machine multi-card method.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described generally by function in the above description. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The technical content provided by the present invention has been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea; meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

  1. A GPU sharing scheduling method, applied to the scheduler of a central control host, comprising:
    using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, wherein a non-updated pod is a pod that is already running but whose GPU information has not been updated;
    updating the GPU information into the annotations of the non-updated pods, and adding the update mark to the non-updated pods;
    filtering out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, wherein an unmarked pod is a pod without the update mark;
    using the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition;
    writing the GPU information of the target GPU into the annotation of the pod to be allocated;
    allocating the pod to be allocated to the target controlled host.
  2. The GPU sharing scheduling method according to claim 1, wherein the process of using the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host comprises:
    using the resource occupation mark and update mark of pods to query the container serial number of the non-updated pod in each controlled host;
    using the container serial number of the non-updated pod to query the GPU information in the environment variables of the non-updated pod.
  3. The GPU sharing scheduling method according to claim 2, wherein the process of using the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and, from the target controlled host, the target GPU meeting the second preset condition comprises:
    using the GPU information of the GPUs in the schedulable controlled hosts and the virtual GPU information in all pods of the schedulable controlled hosts to select, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and the target GPU in the target controlled host meeting the second preset condition.
  4. The GPU sharing scheduling method according to claim 3, wherein the process of selecting, from the schedulable controlled hosts, the target controlled host meeting the first preset condition and the target GPU in the target controlled host meeting the second preset condition comprises:
    screening out, from the schedulable controlled hosts, the target controlled host with the most idle GPU resources and the target GPU with the most free video memory in the target controlled host.
  5. The GPU sharing scheduling method according to any one of claims 1 to 4, wherein the process of allocating the pod to be allocated to the target controlled host comprises:
    allocating the pod to be allocated to the target controlled host and binding it.
  6. A GPU sharing single-machine multi-card method, applied to a target controlled host, comprising:
    receiving a resource division request sent by a pod bound to the target controlled host;
    traversing the pods bound to the target controlled host, and judging whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
    if only one pod's recorded resource division information is consistent, taking it as the target pod;
    if multiple pods' recorded resource division information is consistent, screening out from them a target pod without the resource occupation mark;
    writing the GPU information of the target GPU, written by the scheduler of the central control host into the annotation of the target pod, into the environment variables of the target pod;
    using the resource division information in the target pod to register the corresponding one or more virtual graphics cards, and dividing the free video memory of the target GPU into each virtual graphics card accordingly;
    setting the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
  7. A GPU sharing scheduling system, applied to the scheduler of a central control host, comprising:
    a GPU information query module, configured to use the resource occupation mark and update mark of pods to query the GPU information in the environment variables of the non-updated pods in each controlled host, wherein a non-updated pod is a pod that is already running but whose GPU information has not been updated;
    a GPU information update module, configured to update the GPU information into the annotations of the non-updated pods and add the update mark to the non-updated pods;
    an unmarked pod screening module, configured to filter out, from multiple controlled hosts, the schedulable controlled hosts without unmarked pods, wherein an unmarked pod is a pod without the update mark;
    a scheduling module, configured to use the state information of the GPUs in the schedulable controlled hosts to select, from the schedulable controlled hosts, a target controlled host meeting a first preset condition and, from the target controlled host, a target GPU meeting a second preset condition;
    a GPU information writing module, configured to write the GPU information of the target GPU into the annotation of the pod to be allocated;
    a pod allocation module, configured to allocate the pod to be allocated to the target controlled host.
  8. A GPU sharing single-machine multi-card system, applied to a target controlled host, comprising:
    a division request receiving module, configured to receive a resource division request sent by a pod bound to the target controlled host;
    an information consistency judging module, configured to traverse the pods bound to the target controlled host and judge whether there are multiple pods whose recorded resource division information is consistent with the resource division information recorded in the resource division request;
    a target pod determining module, configured to take the pod as the target pod if the information consistency judging module determines that only one pod's recorded resource division information is consistent;
    a target pod screening module, configured to screen out a target pod without the resource occupation mark if the information consistency judging module determines that multiple pods' recorded resource division information is consistent;
    an environment variable writing module, configured to write the GPU information of the target GPU, written by the scheduler of the central control host into the annotation of the target pod, into the environment variables of the target pod;
    a video memory division module, configured to use the resource division information in the target pod to register the corresponding one or more virtual graphics cards and divide the free video memory of the target GPU into each virtual graphics card accordingly;
    a resource occupation marking module, configured to set the resource occupation mark for the target pod, indicating that the target GPU recorded in the annotation of the target pod is occupied.
  9. A GPU sharing scheduling device, comprising:
    a memory, configured to store a computer program;
    a processor, configured to execute the computer program to implement the GPU sharing scheduling method according to any one of claims 1 to 5.
  10. A GPU sharing single-machine multi-card device, comprising:
    a memory, configured to store a computer program;
    a processor, configured to execute the computer program to implement the GPU sharing single-machine multi-card method according to claim 6.
PCT/CN2021/073784 2020-04-08 2021-01-26 GPU sharing scheduling and single-machine multi-card method, system and device WO2021203805A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/917,860 US11768703B2 (en) 2020-04-08 2021-01-26 GPU-shared dispatching and single-machine multi-card methods, systems, and devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010277708.7A 2020-04-08 2020-04-08 GPU sharing scheduling and single-machine multi-card method, system and device
CN202010277708.7 2020-04-08

Publications (1)

Publication Number Publication Date
WO2021203805A1 (zh)

Family

ID=71751473

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073784 GPU sharing scheduling and single-machine multi-card method, system and device WO2021203805A1 (zh)

Country Status (3)

Country Link
US (1) US11768703B2 (zh)
CN (1) CN111475303B (zh)
WO (1) WO2021203805A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114706690A (zh) * 2022-06-06 2022-07-05 浪潮通信技术有限公司 Method and system for Kubernetes containers to share a GPU
CN115827253A (zh) * 2023-02-06 2023-03-21 青软创新科技集团股份有限公司 Chip resource computing power allocation method, apparatus, device and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475303B (zh) 2020-04-08 2022-11-25 苏州浪潮智能科技有限公司 GPU sharing scheduling and single-machine multi-card method, system and device
CN113127192B (zh) 2021-03-12 2023-02-28 山东英信计算机技术有限公司 Method, system, device and medium for multiple services to share the same GPU
CN113207116B (zh) 2021-04-07 2022-11-11 上海微波技术研究所(中国电子科技集团公司第五十研究所) Virtual card system and adaptive virtual card method
CN115658332A (zh) 2022-12-28 2023-01-31 摩尔线程智能科技(北京)有限责任公司 GPU sharing method and apparatus, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180052709A1 (en) * 2016-08-19 2018-02-22 International Business Machines Corporation Dynamic usage balance of central processing units and accelerators
CN109067828A (zh) * 2018-06-22 2018-12-21 杭州才云科技有限公司 Multi-cluster construction method, medium and device for container cloud platforms based on Kubernetes and OpenStack
CN109885389A (zh) * 2019-02-19 2019-06-14 山东浪潮云信息技术有限公司 Container-based parallel deep learning scheduling and training method and system
CN110930291A (zh) * 2019-11-15 2020-03-27 山东英信计算机技术有限公司 GPU video memory management control method and related apparatus
CN111475303A (zh) * 2020-04-08 2020-07-31 苏州浪潮智能科技有限公司 GPU sharing scheduling and single-machine multi-card method, system and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457135A (zh) * 2019-08-09 2019-11-15 重庆紫光华山智安科技有限公司 Resource scheduling method and apparatus, and method for sharing GPU video memory
US11182196B2 (en) * 2019-11-13 2021-11-23 Vmware, Inc. Unified resource management for containers and virtual machines
CN110888743B (zh) * 2019-11-27 2022-12-20 中科曙光国际信息产业有限公司 GPU resource usage method, apparatus and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114706690A (zh) * 2022-06-06 2022-07-05 浪潮通信技术有限公司 Method and system for Kubernetes containers to share a GPU
CN114706690B (zh) * 2022-06-06 2022-09-16 浪潮通信技术有限公司 Method and system for Kubernetes containers to share a GPU
CN115827253A (zh) * 2023-02-06 2023-03-21 青软创新科技集团股份有限公司 Chip resource computing power allocation method, apparatus, device and storage medium

Also Published As

Publication number Publication date
US11768703B2 (en) 2023-09-26
CN111475303A (zh) 2020-07-31
US20230153151A1 (en) 2023-05-18
CN111475303B (zh) 2022-11-25

Similar Documents

Publication Publication Date Title
WO2021203805A1 (zh) GPU sharing scheduling and single-machine multi-card method, system and device
CN103530170B (zh) System and method for providing hardware virtualization in a virtual machine environment
CN100399300C (zh) System and method for data processing and system and method for allocating resources
US9459849B2 (en) Adaptive cloud aware just-in-time (JIT) compilation
US20090210464A1 (en) Storage management system and method thereof
US20080162865A1 (en) Partitioning memory mapped device configuration space
US8255431B2 (en) Managing memory
WO2023000673A1 (zh) Hardware accelerator device management method and apparatus, electronic device and storage medium
CN101159596B (zh) Method and device for deploying servers
CN1786927A (zh) System and method for application-layer cache image awareness and reallocation
CN112052068A (zh) Method and apparatus for CPU core binding on a Kubernetes container platform
CN106445398A (zh) Embedded file system based on novel memory and implementation method thereof
KR20210088657A (ko) Apparatus and method for handling address decoding in a system-on-chip
JP2005208999A5 (zh)
US7793051B1 (en) Global shared memory subsystem
US20140289739A1 (en) Allocating and sharing a data object among program instances
CN113535087A (zh) Data processing method, server and storage system during data migration
CN109933358B (zh) Control method for reducing the program upgrade volume of metering devices
CN115757260B (zh) Data interaction method, graphics processor and graphics processing system
WO2023169161A1 (zh) Dynamic, discrete, fragmented NAND flash usage method for embedded systems
WO2017142525A1 (en) Allocating a zone of a shared memory region
CN111475277A (zh) Resource allocation method, system, device and machine-readable storage medium
US11483205B1 (en) Defragmentation of licensed resources in a provider network
CN115061813A (zh) Cluster resource management method, apparatus, device and medium
CN110851181B (zh) Data processing method, apparatus and computing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21784454

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21784454

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 21784454

Country of ref document: EP

Kind code of ref document: A1