CN107357661B - Fine-grained GPU resource management method for mixed load

Info

Publication number: CN107357661B (application number CN201710563834.7A)
Authority: CN (China)
Prior art keywords: task, GPU, resource, thread
Original language: Chinese (zh)
Other versions: CN107357661A
Inventors: 杨海龙, 禹超, 白跃彬, 栾钟治, 顾育豪
Original assignee: Beihang University
Current assignee: Kaixi (Beijing) Information Technology Co., Ltd.
Priority and filing date: 2017-07-12
Publication date (CN107357661A): 2017-11-17
Grant date (CN107357661B): 2020-07-10
Legal status: Expired - Fee Related

Classifications

    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU] (G06F: electric digital data processing; G06F9/46: multiprogramming arrangements)
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 The resources being hardware resources other than CPUs, servers and terminals
    • G06F9/5022 Mechanisms to release resources
    • G06F9/5027 The resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/505 Considering the load


Abstract

The invention discloses a fine-grained GPU resource management method for mixed loads and proposes a capacity-based streaming multiprocessor abstraction model, CapSM, which serves as the basic unit of resource management. When mixed loads (comprising online tasks and offline tasks) share GPU resources, the method manages the use of GPU resources by different types of tasks in a fine-grained manner, supports task resource quotas and online adjustment of resources, and guarantees the quality of service of online tasks while sharing GPU resources. The method determines the resources finally allocated to a task according to the task type, the resource request, and the current GPU resource state of the system. When resources are sufficient, it satisfies the offline tasks' demand for GPU resources; when GPU resources are insufficient, it dynamically adjusts the resource usage of offline tasks and preferentially satisfies the resource demands of online tasks, so that the performance of online tasks is guaranteed and GPU resources are fully utilized when mixed loads run simultaneously.

Description

Fine-grained GPU resource management method for mixed load
Technical Field
The invention relates to the field of resource management and task scheduling in heterogeneous computing, in particular to a fine-grained GPU resource management method for mixed load.
Background
A Graphics Processing Unit (GPU) has become an indispensable component of high-performance computing, cloud computing, and data centers due to its enormous peak computing capability, and more and more organizations are adopting GPUs to accelerate key services. To increase GPU utilization, infrastructure providers often let multiple tasks of different types (online tasks and offline tasks) share GPU resources, i.e., they adopt a mixed-load operation mode. However, when mixed loads share a GPU, the performance of online tasks is severely disturbed because multiple tasks compete for GPU resources. The fundamental reason is that once a task is submitted to the GPU for execution, it releases its resources only after execution finishes; if an offline task occupies too much of the GPU or runs too long, online tasks cannot obtain enough GPU resources in time, and their quality-of-service targets cannot be met.
In recent years, to solve the problem of performance interference when mixed loads run on a GPU, researchers have approached the problem from multiple directions. The existing results mainly fall into the following categories:
(1) Hardware-based methods
This approach requires modifying the existing GPU hardware structure and adding corresponding control components. Because GPU vendors protect their designs, a complete and detailed understanding of the GPU hardware architecture is difficult to obtain, and modifying GPU hardware in a real system is impossible. Hardware-based methods are therefore implemented only in simulators; they have research value in academia but no practical significance.
(2) Software-based methods
This approach requires no modification of existing GPU hardware; different applications only need to be controlled at the software level, so it is operable in practice and therefore of practical significance. Specifically, software-based methods can be further classified into the following categories:
a) Methods based on priority scheduling
With this method, different types of GPU tasks are given different priorities: online tasks have higher priority and offline tasks lower priority, and when online and offline tasks need to be scheduled at the same time, the higher-priority online tasks run first. However, only one task can run on the GPU at any moment, so GPU utilization is low.
b) Methods based on kernel reordering
This method is similar to priority-based scheduling, except that the priority of each task is dynamic: when a kernel task arrives, its priority is calculated dynamically according to its quality-of-service requirement, and the submission order of kernel tasks is then adjusted according to the calculated dynamic priorities.
c) Methods based on GPU preemption
This method is also similar to priority-based scheduling in that each task has a fixed priority, but it supports priority-based preemption: when a task is running on the GPU and a higher-priority task arrives, the latter can preempt the running task instead of waiting for it to finish. Although GPU preemption can reduce task waiting time to some extent, the time overhead of preemption is related to the execution time of the running GPU kernel task.
In summary, hardware-based methods solve the performance problem of mixed loads by modifying the GPU hardware structure and thus have low operability and poor practicality on existing GPU devices; software-based methods can let online tasks run first as much as possible, but they cannot guarantee that an online task obtains the corresponding resources in time when it needs additional resources. A fine-grained GPU resource management method is therefore needed to effectively control how different types of tasks use GPU resources under mixed loads, and in particular to support task resource quotas and online resource adjustment so as to meet quality-of-service requirements; no related technique has been reported so far.
Disclosure of Invention
The invention solves these problems: it overcomes the defects of the prior art and provides a fine-grained GPU resource management system and method for mixed loads. When an online task needs additional resources, the resources used by offline tasks are adjusted online, so that the online task does not have to wait a long time for offline tasks to release resources, which would otherwise cause its quality-of-service target to be missed.
The present invention is based on the Multi-Process Service (MPS) technology of the Compute Unified Device Architecture (CUDA). MPS is a Hyper-Q-based GPU resource management technique proposed by NVIDIA; with MPS, kernel tasks from multiple applications can execute concurrently when the GPU is not fully utilized, improving GPU utilization. Furthermore, MPS is transparent to applications and automatically merges kernels from different CUDA contexts into the same CUDA context so that they can run on the GPU simultaneously. Because MPS treats all kernels equally, when a kernel starts running, all the resources needed by its threads are allocated. Therefore, when online and offline tasks run on a GPU in a mixed manner, a mechanism is needed to limit the resources used by offline tasks and reduce the performance interference on online tasks caused by resource contention.
The technical scheme of the invention is as follows: a fine-grained GPU resource management method for mixed loads, in which the mixed load divides tasks into online tasks and offline tasks. When online tasks and offline tasks share GPU resources, a capacity-based SM abstraction model is used as the basic unit of resource management to manage the use of GPU resources by different types of tasks in a fine-grained manner; task resource quotas and online resource adjustment are supported, and the quality of service of online tasks is guaranteed while GPU resources are shared. The method comprises the following steps:
(1) when a user submits a task (tasks comprise online tasks and offline tasks) to the GPU through the resource management API, the resource request information of the task is set: if the task is an offline task, the upper resource limit of the task, namely the quota, is set; if the task is an online task, the minimum resource amount of the task, namely the reservation, is set;
(2) analyzing the submission information of the tasks through a resource management API, wherein the submission information comprises a kernel function, the number of task blocks, the size of the task block and the resource request of the tasks;
(3) calculating the number of active thread blocks which can be accommodated on a GPU SM according to the kernel function of the task and the size of the task block;
(4) calculating the residual available resource amount on the GPU according to the running condition of the application on the current GPU;
(5) if the remaining resources of the current GPU are not less than the resource request of the task acquired in step (2), executing step (6), otherwise executing step (8);
(6) setting the resource configuration of the task as a resource request of the task;
(7) according to the resource configuration of the task and the number of the active thread blocks determined in the step (3), calculating the number of thread blocks which are to be created when the task is submitted to the GPU for operation and the number of task blocks allocated to each thread block, and then executing a step (11);
(8) if the current task is an offline task, executing step (9), and if the current task is an online task, executing step (10);
(9) setting the resource residual amount of the current GPU as the resource configuration of the task, and then turning to the step (7) to execute;
(10) calculating a resource difference according to the resource surplus of the current GPU and the resource request of the task, then sending a resource release command to the offline task running on the current GPU to enable the offline task to release the resource amount specified by the resource difference, and then turning to the step (6);
(11) submitting the tasks to a GPU according to the calculated number of thread blocks of the GPU tasks, creating threads and starting to run;
(12) if the task receives a resource release command in the process of running on the GPU, executing the step (13), otherwise, executing the step (14);
(13) if the task running on the GPU receives a command of releasing the resources, the resources in the specified range are released, and if task blocks on the released resources are not executed, the task blocks which are not executed are remapped to the rest resources to be continuously executed;
(14) after the task is executed, the task exits the GPU.
The resource management basic unit used is the capacity-based SM abstraction model, hereinafter referred to as CapSM, which is implemented as follows:
(1-1) given a GPU, the capacity of each SM is set to 1 capacity unit; given a kernel task K, assume that one SM on the GPU can hold M thread blocks of task K in the active state;
(1-2) according to M, each SM is abstracted into M small fragments, each with a capacity of 1/M capacity units, and each small fragment can hold exactly one thread block of task K;
(1-3) after all SMs of the GPU are divided into small fragments in this way, any N small fragments whose total capacity equals that of one physical SM are considered to form one CapSM;
(1-4) for task K, any M small fragments therefore make up one CapSM.
In the resource management unit CapSM:
(1-1) the M small fragments that make up a CapSM may come from the same SM or from multiple different SMs;
(1-2) each small fragment corresponds to one thread block, so a CapSM can be regarded as a set of thread blocks; in implementation, the management of CapSMs can thus be converted into the management of the number of thread blocks;
(1-3) the capacity-based SM abstraction model CapSM does not depend on a particular GPU architecture or GPU parallel programming language, and the concept of CapSM can readily be applied to other GPU architectures and GPU parallel programming languages.
The original kernel function of a task needs to be converted so that the threads of the task running on the GPU are persistent threads. The specific conversion process is as follows (a code sketch follows this list):
(1-1) a loop control structure is inserted into the original kernel function, and the original kernel function body serves as the loop body of this structure;
(1-2) the loop body traverses the task blocks assigned to each persistent thread and executes them in turn, setting a variable taskIdx to the number of the task block currently being executed;
(1-3) the variable blockIdx, which in the original kernel function body denotes the index of the thread block a thread belongs to, is replaced by the variable taskIdx denoting the task block the thread currently serves.
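For illustration, a minimal sketch of this conversion for a simple vector-add kernel follows; the kernel names and the taskBlockNumber and taskBlocksPerPBlock parameters are hypothetical, and the computation of the per-block task range follows step (11) described later.

    // Original kernel: each thread block processes exactly one task block.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // Persistent-thread form: each persistent thread block loops over the
    // task blocks assigned to it; blockIdx is replaced by taskIdx in the body.
    __global__ void vecAddPersistent(const float* a, const float* b, float* c,
                                     int n, int taskBlockNumber,
                                     int taskBlocksPerPBlock) {
        int pBlockId = blockIdx.y * gridDim.x + blockIdx.x;  // persistent block id
        int startTaskId = pBlockId * taskBlocksPerPBlock;
        for (int taskIdx = startTaskId;
             taskIdx < startTaskId + taskBlocksPerPBlock && taskIdx < taskBlockNumber;
             ++taskIdx) {
            int i = taskIdx * blockDim.x + threadIdx.x;      // blockIdx.x -> taskIdx
            if (i < n) c[i] = a[i] + b[i];
        }
    }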
In step (1), when the user submits a task to the GPU through the resource management API, the API provides the following two task submission modes:
(1-1) running a task in resource quota mode, which is aimed mainly at offline tasks and limits the amount of resources an offline task may use; when a task is submitted in this mode, the resource quota amount quota, the kernel function to run, the task block number TaskBlockNumber, and the task block size TaskBlockSize must be provided;
(1-2) running a task in resource reservation mode, which is aimed mainly at online tasks; when a task is submitted in this mode, the resource amount reservation reserved for the task, the kernel function to run, the task block number TaskBlockNumber, and the task block size TaskBlockSize, i.e., the number of threads in each task block, must be provided.
In step (3), the number of active thread blocks that one GPU SM can accommodate is calculated according to the kernel function of the task and the task block size TaskBlockSize_i; the maximum number of active thread blocks MaxActivePBlock_i that each SM or CapSM can hold can be calculated through the cudaOccupancyMaxActiveBlocksPerMultiprocessor API provided by CUDA.
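A minimal host-side sketch of this query is given below; myKernel and taskBlockSize are placeholders for the task's converted kernel and its block size, and dynamic shared memory is assumed to be zero.

    int maxActivePBlock = 0;
    // Ask the CUDA runtime how many thread blocks of this kernel one SM can hold.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &maxActivePBlock,  // out: active thread blocks per SM
        myKernel,          // the task's (persistent-thread) kernel function
        taskBlockSize,     // threads per task block (block_size)
        0);                // dynamic shared memory per block, assumed 0 here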
In step (7), according to the resource configuration CapSMQuota_i of the task and the number of active thread blocks MaxActivePBlock_i determined in step (3), the number of thread blocks PBlockNumber_i to be created when the task is submitted to the GPU and the number of task blocks TaskBlocksPerPBlock_i allocated to each thread block are calculated as follows (a code sketch follows):
(7-1) combining the resource configuration CapSMQuota_i of the task with MaxActivePBlock_i, compute the number of thread blocks of the task: PBlockNumber_i = CapSMQuota_i * MaxActivePBlock_i;
(7-2) from the task block number TaskBlockNumber_i and the thread block number PBlockNumber_i, compute the number of task blocks allocated to each thread block: TaskBlocksPerPBlock_i = ⌈TaskBlockNumber_i / PBlockNumber_i⌉.
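The corresponding host-side computation, using integer ceiling division, can be sketched as follows; capSMQuota, taskBlockNumber, and maxActivePBlock stand in for the quantities above.

    int pBlockNumber = capSMQuota * maxActivePBlock;              // (7-1)
    int taskBlocksPerPBlock =
        (taskBlockNumber + pBlockNumber - 1) / pBlockNumber;      // (7-2), ceiling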
In step (11), the task is submitted to the GPU, threads are created and start running; the specific execution steps of each thread are as follows (a device-side sketch follows this list):
(11-1) calculate the number CapSMId of the CapSM to which the current thread belongs, which comprises the following sub-processes:
(11-1-1) calculate the thread block number PBlockId to which the current thread belongs: PBlockId = blockIdx.y * gridDim.x + blockIdx.x, where gridDim.x, blockIdx.y, and blockIdx.x are built-in variables that CUDA provides to each thread and can be used directly at run time;
(11-1-2) from the maximum number of active thread blocks MaxActivePBlock_i held by each SM or CapSM and the thread block number PBlockId, calculate the number of the CapSM it belongs to: CapSMId = ⌊PBlockId / MaxActivePBlock_i⌋;
(11-2) calculate the task block range processed by each persistent thread block, which comprises the following sub-processes:
(11-2-1) calculate the assigned task block start value StartTaskId: StartTaskId = PBlockId * TaskBlocksPerPBlock_i;
(11-2-2) calculate the assigned task block end value StopTaskId: StopTaskId = StartTaskId + TaskBlocksPerPBlock_i;
(11-3) calculate the number PBIdInCapSM of the current thread's thread block within its CapSM: PBIdInCapSM = PBlockId % MaxActivePBlock_i;
(11-4) according to the task block range obtained above, enter the loop control structure and execute in turn the corresponding tasks in all the assigned task blocks.
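The per-thread index computation at kernel entry can be sketched in device code as follows, assuming maxActivePBlock and taskBlocksPerPBlock are passed as kernel arguments.

    int pBlockId = blockIdx.y * gridDim.x + blockIdx.x;   // (11-1-1)
    int capSMId = pBlockId / maxActivePBlock;             // (11-1-2)
    int startTaskId = pBlockId * taskBlocksPerPBlock;     // (11-2-1)
    int stopTaskId = startTaskId + taskBlocksPerPBlock;   // (11-2-2)
    int pbIdInCapSM = pBlockId % maxActivePBlock;         // (11-3)
    for (int taskIdx = startTaskId; taskIdx < stopTaskId; ++taskIdx) {
        // (11-4) original kernel body, indexed by taskIdx instead of blockIdx
    }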
In step (10), a resource release command is sent to the offline tasks running on the current GPU so that they release the amount of resources specified by the resource difference; this comprises the following two stages (a sketch of the CPU-GPU signaling follows this list):
(10-1) on the CPU side, the value of the resource release flag evictCapSMNum_i is changed; evictCapSMNum_i can be synchronized between the CPU and the GPU, and its value indicates the number of CapSMs that need to be released;
(10-2) in the loop control structure of each persistent thread of a task running on the GPU, evictCapSMNum_i is checked before each execution of the loop body; all threads whose CapSMId is less than evictCapSMNum_i exit, and thereby a number of CapSM resources equal to evictCapSMNum_i are released.
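One way to realize such a CPU-to-GPU flag is zero-copy mapped host memory, which the CPU can update while the kernel is running; the sketch below makes that assumption (the device must support mapped host memory, and error checking is omitted).

    // Host side: allocate the release flag in mapped (zero-copy) host memory.
    int *hostEvict, *devEvict;
    cudaHostAlloc(&hostEvict, sizeof(int), cudaHostAllocMapped);
    *hostEvict = 0;
    cudaHostGetDevicePointer(&devEvict, hostEvict, 0);
    // ... launch the offline task's kernel with devEvict as its evictCapSMNum ...
    *hostEvict = k;  // later: ask the offline task to release k CapSMs

    // Device side, checked before each execution of the loop body (10-2):
    //     if (capSMId < *(volatile int *)evictCapSMNum) return;  // CapSM released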
In step (13), if there are unexecuted task blocks on the released resources, these task blocks are remapped onto the remaining resources for continued execution. The specific processing of task block remapping in each persistent thread block is as follows (a device-side sketch follows step (14) below):
(13-1) calculate the number NumberPerCapSM_i of released CapSMs that each remaining CapSM is responsible for mapping: NumberPerCapSM_i = ⌈evictCapSMNum_i / (CapSMQuota_i - evictCapSMNum_i)⌉, where CapSMQuota_i is the resource allocation of the GPU task, namely the number of CapSMs allocated when the task was submitted;
(13-2) calculate the range of released CapSMs each remaining CapSM is responsible for mapping, comprising the following two sub-processes:
(13-2-1) calculate the remapping start value: CapSMRemapStart = (CapSMId - evictCapSMNum_i) * NumberPerCapSM_i;
(13-2-2) calculate the remapping end value: CapSMRemapEnd = CapSMRemapStart + NumberPerCapSM_i;
(13-3) select each CapSM in the remapping range in turn, denoting the currently selected CapSM as CapSMRemapCur; if the task blocks in all CapSMs in the mapped range have finished executing, jump to step (14);
(13-4) modify the variable PBlockId to the number of the corresponding thread block in the CapSMRemapCur currently being executed: PBlockId = CapSMRemapCur * MaxActivePBlock_i + PBIdInCapSM;
(13-5) according to the current thread block number PBlockId, execute all unexecuted task blocks under this PBlockId; if all unexecuted tasks in the currently mapped CapSMRemapCur have finished, jump to step (13-3);
(13-6) otherwise go to step (13-5) and continue execution.
(14) After the task is executed, it exits the GPU.
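A device-side sketch of this remapping, consistent with the formulas above (all names are placeholders, and taskDone is an assumed per-task-block completion flag used to skip finished task blocks):

    int remaining = capSMQuota - evictNum;                        // surviving CapSMs
    int numberPerCapSM = (evictNum + remaining - 1) / remaining;  // (13-1), ceiling
    int remapStart = (capSMId - evictNum) * numberPerCapSM;       // (13-2-1)
    int remapEnd = remapStart + numberPerCapSM;                   // (13-2-2)
    for (int cur = remapStart; cur < remapEnd && cur < evictNum; ++cur) {  // (13-3)
        int newPBlockId = cur * maxActivePBlock + pbIdInCapSM;    // (13-4)
        int s = newPBlockId * taskBlocksPerPBlock;
        for (int taskIdx = s; taskIdx < s + taskBlocksPerPBlock; ++taskIdx)  // (13-5)
            if (!taskDone[taskIdx]) { /* execute task block taskIdx */ }
    }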
Compared with the prior art, the innovation of the invention is as follows: a capacity-based SM abstraction model is proposed, and the limit on the number of SMs is converted into a limit on SM capacity, so that a GPU resource reservation and online adjustment mechanism can be realized flexibly without depending on specific hardware or a programming model, and the performance interference of offline tasks on online tasks is effectively controlled when loads are mixed. Concretely:
(1) The present invention abstracts the notion of capacity from the physical SM. Each SM on the same GPU has the same capacity, and a certain number of SMs corresponds to a certain amount of capacity, so the use of SM resources can be converted into the use of SM capacity; this notion of capacity makes the invention very easy to apply to GPUs of other vendors, such as AMD.
(2) The invention eliminates performance interference through resource reservation at the software level and can flexibly limit GPU resource usage by limiting SM capacity. By limiting the GPU resources that offline tasks may use, enough GPU resources are reserved for online tasks, eliminating resource competition as much as possible and guaranteeing the quality-of-service targets of online tasks.
Drawings
FIG. 1 is a diagram of a fine-grained GPU resource management method of the present invention;
FIG. 2 is a flowchart of a fine-grained GPU resource management method for mixed loads according to the present invention;
FIG. 3 is a schematic diagram of the capacity-based SM resource abstraction model CapSM;
FIG. 4 is a diagram of a generic thread block versus a persistent thread block;
FIG. 5 is a diagram illustrating the relationship between the CapSM, persistent thread blocks, and task block maps;
FIG. 6 is a diagram of dynamic resource reclamation and task remapping.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict.
The basic idea of the present invention is to design, at the software level and on top of the existing MPS mechanism, a mixed-load resource management method based on resource quotas and reservations. When online and offline tasks run together on a GPU, the offline tasks' use of GPU resources is limited through a resource quota mechanism so that enough resources are reserved for the online tasks and they can be processed in time; meanwhile, when the resource demand of an online task is hard to meet, the method supports online adjustment of the offline tasks' GPU resource usage, reclaims resources used by offline tasks, and preferentially satisfies the online task's resource demand, thereby guaranteeing its quality of service.
An application example of the invention is shown in FIG. 1. After an offline or online application submits a GPU task through the resource management API, the resource management module parses the resource request information provided by the API and then determines the resources finally allocated to the task according to the task type, the resource request, and the current GPU resource state of the system. If the current GPU resources are sufficient, the task is allocated the resources it requests. If they cannot satisfy the task's resource request, further processing depends on the task type: an offline task is allocated the currently available GPU resources, while for an online task part of the resources of the offline tasks running on the current GPU are reclaimed to satisfy its request, so that the performance of online tasks is guaranteed and GPU resources are fully utilized. This process requires coordinated control between the CPU and the GPU; a resource management module on the GPU side cooperates with the CPU to complete the fine-grained management of GPU resources.
As shown in FIG. 2, the fine-grained GPU resource management method for mixed loads of the present invention includes the following steps:
(1) When the user submits a task to the GPU through the resource management API, the resource request information CapSMRequest_i of the task is set; if the task is an offline task, the upper resource limit of the task, namely the quota, is set, and if the task is an online task, the minimum resource amount of the task, namely the reservation, is set.
In order to manage GPU resources flexibly and effectively, the invention uses a capacity-based SM resource abstraction model, CapSM, which is the basis for realizing the fine-grained resource management method at the software level and serves as the basic unit of resource management.
As shown in FIG. 3, the resource management unit CapSM is defined as follows:
(1-1) given a GPU, the capacity of each SM is set to 1 capacity unit; given a kernel task K, assume that one SM on the GPU can hold M thread blocks of task K in the active state;
(1-2) according to M, each SM is abstracted into M small fragments, each with a capacity of 1/M capacity units, and each small fragment can hold exactly one thread block of task K;
(1-3) after all SMs of the GPU are divided into small fragments in this way, any N small fragments whose total capacity equals that of one physical SM are considered to form one CapSM;
(1-4) for task K, any M small fragments therefore make up one CapSM;
the resource management unit caps has the following characteristics:
(1-1) the M small fragments that make up each CapSM may be from the same SM or from multiple different SMs;
(1-2) each capacity slice corresponds to one thread block, and one cap sm can be regarded as a set of thread blocks, so that in implementation, the management of the cap sm can be converted into the management of the number of thread blocks;
(1-3) abstraction of GPU resources by the caps is not dependent on a particular GPU architecture and GPU parallel programming language, the concept of caps can be easily applied to other GPU architectures and GPU parallel programming languages;
the invention provides two resource management APIs for a user to submit tasks to a GPU, wherein the two APIs comprise the following two APIs:
(1-1)Launch_kernel_with_quota(quota,kernel,grid_size,block_size,kernel_arg_list):
running a kernel task in a resource quota mode, mainly aiming at an offline task, wherein quota is the amount of requested resource quota, kernel is a kernel function to be run, grid _ size is the number of task blocks, block _ size is the size of the task blocks, namely the number of threads in each task block, and kernel _ arg _ list is a parameter transferred to the kernel function;
(1-2)Launch_kernel_with_reservation(reservation,kernel,grid_size,block_size,kernel_arg_list):
running tasks in a resource reservation mode, mainly aiming at online tasks, wherein reservation is the amount of resources expected to be reserved for the tasks, kernel is a kernel function to be run, grid _ size is the number of task blocks, block _ size is the size of the task blocks, namely the number of threads in each task block, and kernel _ arg _ list is a parameter transferred to the kernel function;
in the two APIs, quota and reservation are the resource requests of the tasks, grid _ size is the number of task blocks of the tasks, and the tasks executed in a common thread block are called a task block in the invention.
The fine-grained GPU resource management method for mixed loads is a process in which the CPU and the GPU cooperate. Besides providing the resource management API on the CPU side, the kernel function executed on the GPU side must be modified so that the threads of the original GPU kernel task running on the GPU become persistent threads; each persistent thread block can then execute the work of several original thread blocks, i.e., execute several task blocks in turn, as shown in FIG. 4. The specific conversion process is as follows:
(1-1) a loop control structure is inserted into the original kernel function, and the original kernel function body serves as the loop body of this structure;
(1-2) the loop body traverses the task blocks assigned to each persistent thread and executes them in turn, setting a variable taskIdx to the number of the task block currently being executed;
(1-3) the variable blockIdx, which in the original kernel function body denotes the index of the thread block a thread belongs to, is replaced by the variable taskIdx denoting the task block the thread currently serves;
Unless otherwise specified, the tasks managed by the invention refer to tasks after the above conversion.
(2) The submission information of the task is parsed through the resource management API, including the kernel function, the task block number TaskBlockNumber_i, the task block size TaskBlockSize_i, and the task's resource request CapSMRequest_i, where TaskBlockNumber_i is obtained from the grid_size parameter of the resource management API, TaskBlockSize_i is the block_size parameter, and CapSMRequest_i is the quota or reservation parameter.
(3) According to the kernel function of the task and the task block size TaskBlockSize_i, the number of active thread blocks MaxActivePBlock_i that one GPU SM can accommodate is calculated through the cudaOccupancyMaxActiveBlocksPerMultiprocessor API provided by CUDA.
(4) The remaining available resource amount Remain_GPU on the GPU is calculated according to the running condition of the applications on the current GPU.
(5) If the current remaining GPU resources Remain_GPU are not less than the task's resource request CapSMRequest_i, step (6) is executed; otherwise step (8) is executed.
(6) The resource configuration CapSMQuota_i of the task is set to the task's resource request CapSMRequest_i.
(7) From the task's resource configuration CapSMQuota_i and the number of active thread blocks MaxActivePBlock_i determined in step (3), the number of thread blocks PBlockNumber_i to create when the task is submitted to the GPU and the number of task blocks TaskBlocksPerPBlock_i allocated to each thread block are calculated, after which step (11) is executed; this comprises the following sub-processes:
(7-1) combining the task's resource configuration CapSMQuota_i and MaxActivePBlock_i, compute the task's thread block number: PBlockNumber_i = CapSMQuota_i * MaxActivePBlock_i;
(7-2) from the task block number TaskBlockNumber_i and the thread block number PBlockNumber_i, compute the number of task blocks allocated to each thread block: TaskBlocksPerPBlock_i = ⌈TaskBlockNumber_i / PBlockNumber_i⌉.
(8) If the current task is an offline task, step (9) is executed; if it is an online task, step (10) is executed.
(9) The current remaining GPU resources Remain_GPU are set as the task's resource configuration CapSMQuota_i, and then step (7) is executed.
(10) From the current remaining GPU resources Remain_GPU and the task's resource request CapSMRequest_i, the resource difference Gap_i is calculated; a resource release command is then sent to the offline tasks running on the current GPU so that they release the amount of resources specified by the resource difference, after which step (6) is executed. Executing the resource release command requires cooperation between the CPU and the GPU and comprises the following two stages:
(10-1) on the CPU side, the value of the resource release flag evictCapSMNum_i is changed; evictCapSMNum_i can be synchronized between the CPU and the GPU, and its value indicates the number of CapSMs that need to be released;
(10-2) in the loop control structure of each persistent thread of a task running on the GPU, evictCapSMNum_i is checked before each execution of the loop body; all threads whose CapSMId is less than evictCapSMNum_i exit, and thereby a number of CapSM resources equal to evictCapSMNum_i are released;
(11) According to the calculated thread block number PBlockNumber_i of the GPU task, the task is submitted to the GPU, threads are created on the GPU, and the task starts running; the specific execution steps of each thread are:
(11-1) calculate the number CapSMId of the CapSM to which the current thread belongs, comprising the following sub-processes:
(11-1-1) calculate the thread block number PBlockId to which the current thread belongs: PBlockId = blockIdx.y * gridDim.x + blockIdx.x, where gridDim.x, blockIdx.y, and blockIdx.x are built-in variables that CUDA provides to each thread and can be used directly at run time;
(11-1-2) from the maximum number of active thread blocks MaxActivePBlock_i held by each SM or CapSM and the thread block number PBlockId, calculate the number of the CapSM it belongs to: CapSMId = ⌊PBlockId / MaxActivePBlock_i⌋;
(11-2) calculate the task block range processed by each persistent thread block, comprising the following sub-processes:
(11-2-1) calculate the assigned task block start value StartTaskId: StartTaskId = PBlockId * TaskBlocksPerPBlock_i;
(11-2-2) calculate the assigned task block end value StopTaskId: StopTaskId = StartTaskId + TaskBlocksPerPBlock_i;
FIG. 5 shows the relationship among CapSMs, persistent thread blocks, and task blocks: every 3 task blocks are allocated to one persistent thread block, and every 3 persistent thread blocks are allocated to one CapSM.
(11-3) calculate the number PBIdInCapSM of the current thread's thread block within its CapSM: PBIdInCapSM = PBlockId % MaxActivePBlock_i;
(11-4) according to the task block range obtained above, enter the loop control structure and execute in turn the corresponding tasks in all the assigned task blocks;
(12) if the task receives a resource release command in the process of running on the GPU, executing the step (13), otherwise, executing the step (14);
(13) If the task running on the GPU receives a resource release command, it releases the resources within the specified range. As shown in FIG. 6, if there are unexecuted task blocks on the released resources, these task blocks are remapped onto the remaining resources for continued execution; the specific execution steps of this task block remapping in each persistent thread block are:
(13-1) calculate the number NumberPerCapSM_i of released CapSMs that each remaining CapSM is responsible for mapping: NumberPerCapSM_i = ⌈evictCapSMNum_i / (CapSMQuota_i - evictCapSMNum_i)⌉, where CapSMQuota_i is the resource allocation of the GPU task, namely the number of CapSMs allocated when the task was submitted;
(13-2) calculate the range of released CapSMs each remaining CapSM is responsible for mapping, comprising the following two sub-processes:
(13-2-1) calculate the remapping start value: CapSMRemapStart = (CapSMId - evictCapSMNum_i) * NumberPerCapSM_i;
(13-2-2) calculate the remapping end value: CapSMRemapEnd = CapSMRemapStart + NumberPerCapSM_i;
(13-3) select each CapSM in the remapping range in turn, denoting the currently selected CapSM as CapSMRemapCur; if the task blocks in all CapSMs in the mapped range have finished executing, jump to step (14);
(13-4) modify the variable PBlockId to the number of the corresponding thread block in the CapSMRemapCur currently being executed: PBlockId = CapSMRemapCur * MaxActivePBlock_i + PBIdInCapSM;
(13-5) according to the current thread block number PBlockId, execute all unexecuted task blocks under this PBlockId; if all unexecuted tasks in the currently mapped CapSMRemapCur have finished, jump to step (13-3);
(13-6) otherwise go to step (13-5) and continue execution;
(14) After the task is executed, it exits the GPU.
In a word, when mixed loads (comprising online tasks and offline tasks) share GPU resources, the method manages the use of GPU resources by different types of tasks in a fine-grained manner, supports task resource quotas and online resource adjustment, and guarantees the quality of service of online tasks while sharing GPU resources. A capacity-based Streaming Multiprocessor (SM) abstraction model, CapSM, is proposed; CapSM serves as the basic unit of resource management, and one CapSM is equivalent in capacity to one SM, i.e., the maximum number of active thread blocks on one CapSM equals that of an original SM. When offline and online applications submit GPU tasks through the resource management API, the resource request information provided is first parsed from the API, and the resources finally allocated to a task are then determined according to the task type, the resource request, and the current GPU resource state of the system. If the current GPU resources are sufficient, the task is allocated the resources it requests; if not, further processing depends on the task type: an offline task is allocated the currently available GPU resources, while for an online task part of the resources of the offline tasks running on the current GPU are released through the resource reclamation mechanism to satisfy its request, so that the performance of online tasks is guaranteed and GPU resources are fully utilized.
Details not described in the present invention belong to the common knowledge of those skilled in the art.
The above description covers only some embodiments of the present invention, but the scope of the invention is not limited thereto; any changes or substitutions that can easily be conceived by those skilled in the art within the technical scope of the invention fall within the scope of the invention.

Claims (9)

1. A fine-grained GPU resource management method for mixed loads, characterized in that the mixed load divides tasks into online tasks and offline tasks; when the online tasks and the offline tasks share GPU resources, a capacity-based SM abstraction model is used as the basic unit of resource management to manage GPU resources in a fine-grained manner, task resource quotas and online resource adjustment are supported, and the quality of service of the online tasks is guaranteed while GPU resources are shared; the method comprises the following steps:
(1) when a user submits a task to a GPU through an Application Programming Interface (API), resource request information of the task is set, wherein the task comprises an online task and an offline task, if the task is the offline task, the upper limit of the resource of the task, namely quota, is set, and if the task is the online task, the lowest resource amount of the task, namely the reserved amount, is set;
(2) analyzing the submission information of the tasks through a resource management API, wherein the submission information comprises a kernel function, the number of task blocks, the size of the task block and the resource request of the tasks;
(3) calculating the number of active thread blocks which can be accommodated on a GPU SM according to the kernel function of the task and the size of the task block;
(4) calculating the residual available resource amount on the GPU according to the running condition of the application on the current GPU;
(5) if the remaining resources of the current GPU are not less than the resource request of the task acquired in step (2), executing step (6), otherwise executing step (8);
(6) setting the resource configuration of the task as a resource request of the task;
(7) according to the resource configuration of the task and the number of the active thread blocks determined in the step (3), calculating the number of thread blocks which are to be created when the task is submitted to the GPU for operation and the number of task blocks allocated to each thread block, and then executing a step (11);
(8) if the current task is an offline task, executing step (9), and if the current task is an online task, executing step (10);
(9) setting the resource residual amount of the current GPU as the resource configuration of the task, and then turning to the step (7) to execute;
(10) calculating a resource difference according to the resource surplus of the current GPU and the resource request of the task, then sending a resource release command to the offline task running on the current GPU to enable the offline task to release the resource amount specified by the resource difference, and then turning to the step (6);
(11) submitting the tasks to a GPU according to the calculated number of thread blocks of the GPU tasks, creating threads and starting to run;
(12) if the task receives a resource release command in the process of running on the GPU, executing the step (13), otherwise, executing the step (14);
(13) if the task running on the GPU receives a command of releasing the resources, the resources in the specified range are released, and if task blocks on the released resources are not executed, the task blocks which are not executed are remapped to the rest resources to be continuously executed;
(14) after the task is executed, quitting the GPU;
the resource management basic unit used is the capacity-based SM abstraction model, hereinafter referred to as CapSM, which is implemented as follows:
(1-1) given a GPU, the capacity of each SM is set to 1 capacity unit; given a kernel task K, assume that one SM on the GPU can hold M thread blocks of task K in the active state;
(1-2) according to M, each SM is abstracted into M small fragments, each with a capacity of 1/M capacity units, and each small fragment can hold exactly one thread block of task K;
(1-3) after all SMs of the GPU are divided into small fragments in this way, any N small fragments whose total capacity equals that of one physical SM are considered to form one CapSM;
(1-4) for task K, any M small fragments make up one CapSM.
2. The fine-grained GPU resource management method for mixed loads according to claim 1, wherein, for the resource management basic unit CapSM:
(1-1) the M small fragments that make up a CapSM may come from the same SM or from multiple different SMs;
(1-2) each small fragment corresponds to one thread block, so a CapSM can be regarded as a set of thread blocks; in implementation, the management of CapSMs can thus be converted into the management of the number of thread blocks;
(1-3) the capacity-based SM abstraction model CapSM does not depend on a particular GPU architecture or GPU parallel programming language, and the concept of CapSM can be applied to other GPU architectures and GPU parallel programming languages.
3. The fine-grained GPU resource management method for mixed loads according to claim 1, wherein the original kernel function of the task needs to be converted so that the threads of the task running on the GPU are persistent threads, the specific conversion process being:
(1-1) a loop control structure is inserted into the original kernel function, with the original kernel function body serving as the loop body of the structure;
(1-2) the loop body traverses the task blocks assigned to each persistent thread and executes them in turn, setting a variable taskIdx to the number of the task block currently being executed;
(1-3) the variable blockIdx, which in the original kernel function body denotes the index of the thread block a thread belongs to, is replaced by the variable taskIdx denoting the task block the thread currently serves.
4. The fine-grained GPU resource management method for mixed loads according to claim 1, wherein in step (1), when the user submits a task to the GPU through the resource management API, the API provides the following two task submission modes:
(1-1) running a task in resource quota mode, aimed mainly at offline tasks and limiting the amount of resources used by offline tasks; when a task is submitted in this mode, the resource quota amount quota, the kernel function of the task to run, the task block number TaskBlockNumber of the task, and the task block size TaskBlockSize, i.e., the number of threads in each task block, must be provided;
(1-2) running a task in resource reservation mode, aimed mainly at online tasks; when a task is submitted in this mode, the resource amount reservation reserved for the task, the kernel function to run, the task block number TaskBlockNumber of the task, and the task block size TaskBlockSize, i.e., the number of threads in each task block, must be provided.
5. The fine-grained GPU resource management method for mixed loads according to claim 1, wherein in step (3), the number of active thread blocks that one GPU SM can accommodate is calculated according to the kernel function of the task and the size of the task block; in this process, the maximum number of active thread blocks MaxActivePBlock_i that each SM or CapSM can hold can be calculated through the API provided by the Compute Unified Device Architecture (CUDA) for obtaining the maximum number of active thread blocks in each SM.
6. The fine-grained GPU resource management method for mixed loads according to claim 1, wherein in step (7), according to the task's resource configuration CapSMQuota_i and the number of active thread blocks MaxActivePBlock_i determined in step (3), the number of thread blocks PBlockNumber_i to be created when the task is submitted to the GPU and the number of task blocks TaskBlocksPerPBlock_i allocated to each thread block are calculated, the specific process comprising:
(7-1) combining the task's resource configuration CapSMQuota_i and MaxActivePBlock_i, computing the task's thread block number: PBlockNumber_i = CapSMQuota_i * MaxActivePBlock_i;
(7-2) from the task block number TaskBlockNumber_i and the thread block number PBlockNumber_i, computing the number of task blocks allocated to each thread block: TaskBlocksPerPBlock_i = ⌈TaskBlockNumber_i / PBlockNumber_i⌉.
7. The fine-grained GPU resource management method for mixed loads according to claim 6, wherein in step (11) the task is submitted to the GPU, threads are created and start running, the specific execution steps of each thread comprising:
(11-1) calculating the number CapSMId of the CapSM to which the current thread belongs, comprising the following sub-processes:
(11-1-1) calculating the thread block number PBlockId to which the current thread belongs: PBlockId = blockIdx.y * gridDim.x + blockIdx.x, where gridDim.x, blockIdx.y, and blockIdx.x are built-in variables that CUDA provides to each thread and can be used directly at run time;
(11-1-2) from the maximum number of active thread blocks MaxActivePBlock_i held by each SM or CapSM and the thread block number PBlockId, calculating the number of the CapSM it belongs to: CapSMId = ⌊PBlockId / MaxActivePBlock_i⌋;
(11-2) calculating the task block range processed by each persistent thread block, comprising the following sub-processes:
(11-2-1) calculating the assigned task block start value StartTaskId: StartTaskId = PBlockId * TaskBlocksPerPBlock_i;
(11-2-2) calculating the assigned task block end value StopTaskId: StopTaskId = StartTaskId + TaskBlocksPerPBlock_i;
(11-3) calculating the number PBIdInCapSM of the current thread's thread block within its CapSM: PBIdInCapSM = PBlockId % MaxActivePBlock_i;
(11-4) according to the task block range obtained above, entering the loop control structure and executing in turn the corresponding tasks in all the assigned task blocks.
8. The fine-grained GPU resource management method for mixed loads according to claim 7, wherein in step (10), sending a resource release command to the offline tasks running on the current GPU so that they release the amount of resources specified by the resource difference comprises the following two stages:
(10-1) on the CPU side, changing the value of the resource release flag evictCapSMNum_i; evictCapSMNum_i can be synchronized between the CPU and the GPU, and its value indicates the number of CapSMs that need to be released;
(10-2) in the loop control structure of each persistent thread of a task running on the GPU, checking evictCapSMNum_i before each execution of the loop body; all threads whose CapSMId is less than evictCapSMNum_i exit, and thereby a number of CapSM resources equal to evictCapSMNum_i are released.
9. The fine-grained GPU resource management method for mixed loads according to claim 8, wherein in step (13), if there are unexecuted task blocks on the released resources, the unexecuted task blocks are remapped onto the remaining resources for continued execution, the specific processing of task block remapping in a persistent thread block being:
(13-1) calculating the number NumberPerCapSM_i of released CapSMs that each remaining CapSM is responsible for mapping: NumberPerCapSM_i = ⌈evictCapSMNum_i / (CapSMQuota_i - evictCapSMNum_i)⌉, where CapSMQuota_i is the resource allocation of the GPU task, namely the number of CapSMs allocated when the task was submitted;
(13-2) calculating the range of released CapSMs each remaining CapSM is responsible for mapping, comprising the following two sub-processes:
(13-2-1) calculating the remapping start value:
CapSMRemapStart = (CapSMId - evictCapSMNum_i) * NumberPerCapSM_i;
(13-2-2) calculating the remapping end value:
CapSMRemapEnd = CapSMRemapStart + NumberPerCapSM_i;
(13-3) selecting each CapSM in the remapping range in turn, denoting the currently selected CapSM as CapSMRemapCur; if the task blocks in all CapSMs in the mapped range have finished executing, jumping to step (14);
(13-4) modifying the variable PBlockId to the number of the corresponding thread block in the CapSMRemapCur currently being executed: PBlockId = CapSMRemapCur * MaxActivePBlock_i + PBIdInCapSM;
(13-5) according to the current thread block number PBlockId, executing all unexecuted task blocks under this PBlockId; if all unexecuted tasks in the currently mapped CapSMRemapCur have finished, jumping to step (13-3);
(13-6) otherwise going to step (13-5) to continue execution.




Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
TR01: Transfer of patent right
    Effective date of registration: 2021-04-20
    Address after: 100160, No. 4, Building 12, No. 128, South Fourth Ring Road, Fengtai District, Beijing (rooms 1515-1516)
    Patentee after: Kaixi (Beijing) Information Technology Co., Ltd.
    Address before: 100191, No. 37, Xueyuan Road, Haidian District, Beijing
    Patentee before: Beihang University
CF01: Termination of patent right due to non-payment of annual fee
    Granted publication date: 2020-07-10
    Termination date: 2021-07-12