CN107357661A - Fine-grained GPU resource management method for mixed loads - Google Patents

Fine-grained GPU resource management method for mixed loads

Info

Publication number
CN107357661A
CN107357661A
Authority
CN
China
Prior art keywords
task
resource
gpu
capsm
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710563834.7A
Other languages
Chinese (zh)
Other versions
CN107357661B (en)
Inventor
杨海龙
禹超
白跃彬
栾钟治
顾育豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kaixi Beijing Information Technology Co., Ltd.
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201710563834.7A
Publication of CN107357661A
Application granted
Publication of CN107357661B
Expired - Fee Related
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fine-grained GPU resource management method for mixed workloads. It proposes CapSM, a capacity-based abstraction of the streaming multiprocessor (SM), and uses the CapSM as the basic unit of resource management. When a mixed workload (including online tasks and offline tasks) shares GPU resources, the method manages each task type's use of the GPU at fine granularity, supporting per-task resource quotas and online resource adjustment, so that the quality of service of online tasks is guaranteed while GPU resources are shared. The resources finally allocated to a task are determined by the task's type, its resource request, and the current GPU resource state: when resources are plentiful, offline tasks receive the GPU resources they request; when GPU resources run short, the resources used by offline tasks are adjusted dynamically so that the resource demands of online tasks are satisfied first. As a result, when a mixed workload runs, the performance of online tasks is guaranteed and the GPU is still fully utilized.

Description

Fine-grained GPU resource management method for mixed loads
Technical field
The present invention relates to resource management and task scheduling in heterogeneous computing, and more particularly to a fine-grained GPU resource management method for mixed workloads.
Background art
Thanks to its enormous peak compute capability, the graphics processing unit (Graphics Processing Unit, hereinafter GPU) has become an indispensable part of high-performance computing, cloud computing, and data centers, and more and more institutions and organizations use GPUs to accelerate their key business workloads. To improve GPU utilization, infrastructure providers usually also let several different types of tasks (online tasks and offline tasks) share GPU resources, i.e., they adopt a mixed-workload mode of operation. However, when a mixed workload shares a GPU, the tasks compete for GPU resources and severely interfere with the performance of the online tasks. The root cause is that once a task has been submitted to the GPU for execution, the resources it occupies are released only after it finishes; if an offline task occupies too much of the GPU, or occupies it for too long, online tasks cannot obtain enough GPU resources in time, and their quality-of-service targets are missed.
In recent years, researchers have attacked the performance-interference problem of mixed workloads running on GPUs from several directions. Existing results fall mainly into the following categories:
(1) Hardware-based methods
These methods require modifying the existing GPU hardware architecture and adding corresponding control units. Because GPU vendors protect their designs, a completely detailed understanding of the GPU hardware architecture is hard to obtain, and modifying the GPU hardware of a production system is practically impossible. Hardware-based methods are therefore realized only in simulators; they have academic research value but little practical significance.
(2) Software-based methods
These methods do not modify the existing GPU hardware; they only control the different applications at the software level, so they are operable in practice and have real-world significance. Software-based methods can be further divided into the following classes:
a) Methods based on priority scheduling
Different types of GPU tasks are assigned different priorities: online tasks get higher priority and offline tasks get lower priority. When an online task and an offline task need to be scheduled at the same time, the higher-priority online task runs first. With this approach only one task can run on the GPU at any moment, so GPU utilization stays low.
b) Methods based on kernel reordering
These are similar to priority scheduling, except that each task's priority is dynamic: when a kernel task arrives, its priority is computed from its quality-of-service requirement, and the submission order of kernel tasks is then adjusted according to the computed dynamic priorities.
c) Methods based on GPU preemption
These are also similar to priority scheduling in that every kind of task has a fixed priority, but preemption based on priority is supported: when a task is running on the GPU and a higher-priority task arrives, the newcomer preempts the task currently running instead of having to wait for it to finish before it can use the GPU. Preemption can reduce waiting time to some degree, but the time overhead of a preemption is correlated with the execution time of the preempted GPU kernel task.
In summary, hardware-based methods try to solve the performance problem of mixed workloads by changing the GPU hardware architecture; on existing GPU devices they have low operability and poor practicality. Software-based methods can let online tasks run first as far as possible, but they cannot guarantee that an online task obtains the corresponding resources promptly when it needs extra resources. A fine-grained GPU resource management method is therefore needed that effectively controls how different task types use GPU resources under a mixed workload, and in particular supports per-task resource quotas and online resource adjustment, so that quality-of-service requirements can be met. No related technique has been reported so far.
Summary of the invention
The technical problem solved by the present invention: overcoming the deficiencies of the prior art by providing a fine-grained GPU resource management method for mixed workloads. While a mixed workload runs on a GPU, a resource quota limits the resources an offline task may use, preventing offline tasks from occupying excessive resources. Meanwhile, when an online task needs extra resources, the resources used by offline tasks can be adjusted online, so the online task never has to wait a long time for an offline task to release resources, a situation in which its quality-of-service target could not be met.
The present invention builds on the Multi-Process Service (Multi-Process Service, hereinafter MPS) technology of the Compute Unified Device Architecture (Compute Unified Device Architecture, hereinafter CUDA). MPS is a Hyper-Q-based GPU resource management technology proposed by NVIDIA; when the GPU is not fully utilized, MPS lets kernel tasks from multiple applications execute concurrently, improving GPU utilization. Moreover, MPS is transparent to use: it automatically merges kernels from different CUDA contexts into the same CUDA context so that they can run on the GPU simultaneously. Because MPS treats all kernels equally, each kernel is given all the resources its threads need when it starts running. Therefore, when online and offline tasks run mixed on a GPU, a mechanism is needed to limit the resources used by offline tasks and reduce the performance interference that resource contention inflicts on online tasks.
The technical solution of the present invention: a fine-grained GPU resource management method for mixed workloads. The mixed workload divides tasks into online tasks and offline tasks; when online and offline tasks share GPU resources, a capacity-based SM abstraction serves as the basic unit of resource management for managing, at fine granularity, how each task type uses GPU resources, supporting per-task resource quotas and online resource adjustment, and guaranteeing the quality of service of online tasks while GPU resources are shared. The method comprises the following steps:
(1) when a user submits a task to the GPU through the resource management API (unless stated otherwise, "task" covers both online and offline tasks), the task's resource request information is set: for an offline task, the resource upper bound, i.e., the quota, is given; for an online task, the minimum resource amount, i.e., the reservation, is given;
(2) the task's submission information is parsed from the resource management API, including the kernel function, the number of task blocks, the task block size, and the task's resource request;
(3) from the task's kernel function and its task block size, the number of active thread blocks that one SM of the GPU can accommodate is computed;
(4) from the running state of the applications currently on the GPU, the remaining available resources on the GPU are computed;
(5) if the current GPU resource surplus is no less than the resource request obtained in step (2), step (6) is executed; otherwise step (8) is executed;
(6) the task's resource configuration is set to the task's resource request;
(7) from the task's resource configuration and the number of active thread blocks determined in step (3), the number of thread blocks to create when the task is submitted to the GPU and the number of task blocks assigned to each thread block are computed; then step (11) is executed;
(8) if the current task is an offline task, step (9) is executed; if it is an online task, step (10) is executed;
(9) the current GPU resource surplus is set as the task's resource configuration; then go to step (7);
(10) from the current GPU resource surplus and the task's resource request, the resource deficit is computed, and a resource release command is sent to the offline tasks running on the GPU, making them release resources equal to the deficit; then go to step (6);
(11) with the computed number of thread blocks, the task is submitted to the GPU, its threads are created, and it begins to run;
(12) if the task receives a resource release command while running on the GPU, step (13) is executed; otherwise step (14) is executed;
(13) upon receiving the release command while running on the GPU, the task releases the resources of the specified range; if task blocks in the released resources have not executed, the unexecuted task blocks are remapped onto the remaining resources and continue to execute;
(14) the task finishes execution and exits the GPU.
The basic unit of resource management is a capacity-based SM abstraction, hereinafter CapSM, realized as follows:
(1-1) given a GPU, the capacity of each SM is set to 1 capacity unit; given a kernel task K, assume that one SM of the GPU can accommodate M active thread blocks of task K;
(1-2) according to M, each SM is abstracted as M small slices; each slice has capacity 1/M capacity unit and can accommodate exactly one thread block of task K;
(1-3) after all SMs of the GPU have been divided into slices in this way, any N slices whose total capacity equals that of a physical SM are considered to form one CapSM;
(1-4) for task K, any M slices form one CapSM.
The resource management unit CapSM has the following properties:
(1-1) the M slices that form a CapSM may come from the same SM or from several different SMs;
(1-2) each slice corresponds to one thread block, so a CapSM can be regarded as a set of thread blocks; in the implementation, managing CapSMs therefore reduces to managing the number of thread blocks;
(1-3) the capacity-based SM abstraction CapSM does not depend on a specific GPU architecture or GPU parallel programming language; the CapSM concept can readily be applied to other GPU architectures and GPU parallel programming languages.
The task's original kernel function must be transformed so that the threads the task runs on the GPU become persistent threads. The transformation is as follows:
(1-1) a loop control structure is inserted into the original kernel function, with the original kernel function body as the loop body;
(1-2) the loop body iterates over the task blocks assigned to the persistent thread, executing them in turn, and sets the variable taskIdx to the task block number of the task block currently being executed;
(1-3) the variable blockIdx, which denotes the thread block index in the original kernel function body, is replaced by the variable taskIdx, which denotes the current task block.
In step (1), when the user submits a task to the GPU through the resource management API, the API offers the following two submission modes:
(1-1) running a task under a resource quota, mainly for offline tasks, limits the amount of resources the offline task may use; submitting a task this way requires the resource quota amount quota, the kernel function to run, the task's number of task blocks TaskBlockNumber, and the task block size TaskBlockSize, i.e., the number of threads in each task block;
(1-2) running a task with a resource reservation, mainly for online tasks; submitting a task this way requires the amount of resources reservation to be reserved for the task, the kernel function to run, the task's number of task blocks TaskBlockNumber, and the task block size TaskBlockSize, i.e., the number of threads in each task block.
In step (3), from the task's kernel function and its task block size TaskBlockSize_i, the number of active thread blocks that one SM of the GPU can accommodate is computed; the cudaOccupancyMaxActiveBlocksPerMultiprocessor API provided by CUDA can be used to compute the maximum number of active thread blocks MaxActivePBlock_i that each SM or CapSM can accommodate.
In step (7), from the task's resource configuration CapSMQuota_i and the number of active thread blocks MaxActivePBlock_i determined in step (3), the number of thread blocks PBlockNumber_i to create when the task is submitted to the GPU and the number of task blocks TaskBlocksPerPBlock_i assigned to each thread block are computed as follows:
(7-1) combining the task's resource configuration CapSMQuota_i with MaxActivePBlock_i, the task's thread block count is PBlockNumber_i = CapSMQuota_i * MaxActivePBlock_i;
(7-2) from the task's task block count TaskBlockNumber_i and thread block count PBlockNumber_i, the number of task blocks assigned to each thread block is TaskBlocksPerPBlock_i = ceil(TaskBlockNumber_i / PBlockNumber_i).
In step (11), the task is submitted to the GPU, its threads are created, and it begins to run; each thread executes the following steps:
(11-1) compute the number CapSMId of the CapSM to which the current thread belongs, via the following sub-steps:
(11-1-1) compute the thread block number PBlockId of the current thread: PBlockId = gridDim.x * blockIdx.y + blockIdx.x, where gridDim.x, blockIdx.y, and blockIdx.x are built-in variables that CUDA provides to every thread and that a thread can use directly while running;
(11-1-2) from the maximum number of active thread blocks MaxActivePBlock_i accommodated in each SM or CapSM and the thread block number PBlockId, compute the owning CapSM number: CapSMId = floor(PBlockId / MaxActivePBlock_i);
(11-2) compute the range of task blocks processed by each persistent thread block, via the following sub-steps:
(11-2-1) compute the first task block assigned, StartTaskId: StartTaskId = PBlockId * TaskBlocksPerPBlock_i;
(11-2-2) compute the last task block assigned, StopTaskId: StopTaskId = StartTaskId + TaskBlocksPerPBlock_i;
(11-3) compute the number PBIdInCapSM of the current thread block within its CapSM: PBIdInCapSM = PBlockId % MaxActivePBlock_i;
(11-4) with the task block range obtained above, enter the loop control structure and execute the tasks in all assigned task blocks in turn.
In step (10), a resource release command is sent to the offline tasks running on the GPU, making them release the amount of resources specified by the resource deficit, in the following two stages:
(10-1) on the CPU side, the value of the resource release flag evictCapSMNum_i is modified; evictCapSMNum_i can be synchronized between the CPU and the GPU, and its value denotes the number of CapSMs to release;
(10-2) the loop control structure of each persistent thread of the tasks running on the GPU checks the value of evictCapSMNum_i before each execution of the loop body; every thread whose CapSMId is less than evictCapSMNum_i exits, so that evictCapSMNum_i CapSMs' worth of resources are released.
In step (13), if task blocks in the released resources have not executed, the unexecuted task blocks are remapped onto the remaining resources and continue to execute; a persistent thread block performs the remapping as follows:
(13-1) compute the number NumberPerCapSM_i of released CapSMs that each surviving CapSM is responsible for mapping: NumberPerCapSM_i = ceil(evictCapSMNum_i / (CapSMQuota_i - evictCapSMNum_i)),
where CapSMQuota_i is the GPU task's resource configuration, i.e., the number of CapSMs configured when the task was submitted;
(13-2) compute the range of released CapSMs that each CapSM is responsible for mapping, via the following two sub-steps:
(13-2-1) compute the first CapSM of the mapping:
CapSMRemapStart = (CapSMId - evictCapSMNum_i) * NumberPerCapSM_i;
(13-2-2) compute the last CapSM of the mapping:
CapSMRemapEnd = CapSMRemapStart + NumberPerCapSM_i;
(13-3) select each CapSM within the mapped range in turn, letting CapSMRemapCur denote the currently selected CapSM; if the task blocks in all CapSMs within the mapped range have finished executing, jump to step (14);
(13-4) set the variable PBlockId to the corresponding thread block number within the CapSMRemapCur currently being executed: PBlockId = CapSMRemapCur * MaxActivePBlock_i + PBIdInCapSM;
(13-5) using the current thread block number PBlockId, execute all unexecuted task blocks of PBlockId; if all unexecuted task blocks in the currently mapped CapSMRemapCur have finished, jump to step (13-3);
(13-6) go to step (13-5) and continue.
(14) The task finishes execution and exits the GPU.
Compared with the prior art, the innovation of the invention is this: a capacity-based SM abstraction is proposed that converts limits on the number of SMs into limits on SM capacity, so that GPU resource reservation and online adjustment mechanisms can be realized flexibly, without depending on specific hardware or a programming model, and the performance interference of offline tasks on online tasks under a mixed workload is effectively controlled. Concretely:
(1) The invention abstracts the concept of capacity out of the physical SM. Every SM on a given GPU has the same capacity, and a given number of SMs has a corresponding amount of capacity; limits on SM resources are thereby converted into limits on SM capacity usage. The capacity concept lets the invention be applied easily to GPUs of other vendors, such as AMD.
(2) The invention eliminates performance interference at the software level through resource reservation: limits on GPU resource usage are realized flexibly by limiting SM capacity. By limiting the GPU resources that offline tasks can use, enough GPU resources are reserved for online tasks, eliminating resource contention as far as possible and guaranteeing the quality-of-service targets of online tasks.
Brief description of the drawings
Fig. 1 is a scenario diagram of the fine-grained GPU resource management method of the invention;
Fig. 2 is a flow chart of the fine-grained GPU resource management method for mixed workloads of the invention;
Fig. 3 is a schematic diagram of the capacity-based SM resource abstraction CapSM;
Fig. 4 shows the relationship between ordinary thread blocks and persistent thread blocks;
Fig. 5 is a schematic diagram of the relationship among CapSMs, persistent thread blocks, and task blocks;
Fig. 6 is a schematic diagram of dynamic resource reclamation and task remapping.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the invention clearer, the invention is further elaborated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. Moreover, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict.
The basic idea of the invention is to design, at the software level and on top of the existing MPS mechanism, a mixed-workload resource management method based on resource quotas and reservations. When online and offline tasks run mixed on a GPU, the resource quota mechanism limits the offline tasks' use of GPU resources so that enough resources are reserved for online tasks and they are processed promptly. Meanwhile, when an online task's resource demand is hard to satisfy, the offline tasks' use of GPU resources can be adjusted online: the resources used by offline tasks are reclaimed and the resource demands of online tasks are met first, guaranteeing the quality of service of online tasks.
An application scenario of the invention is shown in Fig. 1. After an offline application or an online application submits a GPU task to the resource management module through the resource management API, the module parses the supplied resource request information from the API, then determines the resources finally allocated to the task from the task's type, its resource request, and the current GPU resource state. If the current GPU resources are sufficient, the task is allocated the resources it requests. If the current GPU resources cannot satisfy the task's request, further processing depends on the task type: if the task is an offline task, the currently available GPU resources are allocated to it; if the task is an online task, part of the resources of the offline tasks running on the GPU is reclaimed to satisfy the online task's request. This both guarantees the performance of online tasks and fully utilizes GPU resources. The process requires coordinated control between the CPU and GPU sides; a resource management module is also needed on the GPU side to cooperate with the CPU in completing the fine-grained management of GPU resources.
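For illustration only, the allocation decision just described can be sketched in a few lines of host-side C++; every name here (TaskRequest, remainGPU, sendEvictCommand) is hypothetical and not part of the patented interface:

// Illustrative host-side allocation policy, mirroring steps (5)-(10):
// grant the request if it fits; otherwise an offline task gets whatever
// is left, while an online task triggers eviction of offline CapSMs.
enum class TaskType { Online, Offline };

struct TaskRequest {
    TaskType type;
    int capSMRequest;  // quota (offline) or reservation (online), in CapSMs
};

static int remainGPU = 0;  // CapSMs currently unallocated (manager-tracked)

static void sendEvictCommand(int capSMs) {
    // Signal running offline tasks to shed this many CapSMs;
    // see the eviction-flag sketch under step (10) below.
}

static int decideAllocation(const TaskRequest& t) {
    if (remainGPU >= t.capSMRequest)        // step (5) -> step (6)
        return t.capSMRequest;
    if (t.type == TaskType::Offline)        // step (8) -> step (9)
        return remainGPU;                   // offline tasks take what is left
    int gap = t.capSMRequest - remainGPU;   // step (10): resource deficit
    sendEvictCommand(gap);                  // reclaim CapSMs from offline tasks
    return t.capSMRequest;                  // step (6): grant the full reservation
}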
As shown in Fig. 2, the fine-grained GPU resource management method for mixed workloads of the invention comprises the following steps:
(1) when a user submits a task task_i to the GPU through the resource management API, the task's resource request information CapSMRequest_i is set: for an offline task, the resource upper bound, i.e., the quota, is given; for an online task, the minimum resource amount, i.e., the reservation, is given;
To manage GPU resources flexibly and effectively, the invention uses a capacity-based SM resource abstraction, CapSM. CapSM is the basis on which this method realizes fine-grained resource management at the software level, and it is the basic unit of resource management.
As shown in Fig. 3, the resource management unit CapSM is defined as follows:
(1-1) given a GPU, the capacity of each SM is set to 1 capacity unit; given a kernel task K, assume that one SM of the GPU can accommodate M active thread blocks of task K;
(1-2) according to M, each SM is abstracted as M small slices; each slice has capacity 1/M capacity unit and can accommodate exactly one thread block of task K;
(1-3) after all SMs of the GPU have been divided into slices in this way, any N slices whose total capacity equals that of a physical SM are considered to form one CapSM;
(1-4) for task K, any M slices form one CapSM;
The resource management unit CapSM has the following features:
(1-1) the M slices that form a CapSM may come from the same SM or from several different SMs;
(1-2) each capacity slice corresponds to one thread block, so a CapSM can be regarded as a set of thread blocks; in the implementation, managing CapSMs therefore reduces to managing the number of thread blocks;
(1-3) CapSM abstracts GPU resources without depending on a specific GPU architecture or GPU parallel programming language; the CapSM concept can readily be applied to other GPU architectures and GPU parallel programming languages;
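As a purely numeric illustration of the CapSM bookkeeping (the GPU size and occupancy figure below are assumptions, not measurements):

#include <cstdio>

// With M active thread blocks of task K per SM, each SM splits into M
// slices of capacity 1/M; any M slices form one CapSM, so managing CapSMs
// reduces to counting thread blocks.
int main() {
    const int numSM = 20;  // hypothetical GPU with 20 SMs
    const int M     = 8;   // active thread blocks of K per SM (occupancy)

    const int slices          = numSM * M;   // 160 slices, 1/M capacity each
    const int capSMsAvailable = slices / M;  // 20 CapSMs == 20 SM-equivalents

    // A quota of 5 CapSMs for an offline task therefore allows at most
    // 5 * M = 40 concurrently active thread blocks of that task.
    const int quota = 5;
    printf("CapSMs: %d, thread blocks allowed under quota: %d\n",
           capSMsAvailable, quota * M);
    return 0;
}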
The invention provides users with two resource management APIs for submitting tasks to the GPU:
(1-1) Launch_kernel_with_quota(quota, kernel, grid_size, block_size, kernel_arg_list):
runs a kernel task under a resource quota, mainly for offline tasks, where quota is the requested resource quota amount, kernel is the kernel function to run, grid_size is the number of task blocks, block_size is the task block size, i.e., the number of threads in each task block, and kernel_arg_list is the argument list passed to the kernel function;
(1-2) Launch_kernel_with_reservation(reservation, kernel, grid_size, block_size, kernel_arg_list):
runs a task with a resource reservation, mainly for online tasks, where reservation is the amount of resources to reserve for the task, kernel is the kernel function to run, grid_size is the number of task blocks, block_size is the task block size, i.e., the number of threads in each task block, and kernel_arg_list is the argument list passed to the kernel function;
In the two APIs above, quota and reservation are the task's resource request, and grid_size is the task's number of task blocks; in the invention, the work executed by one ordinary thread block of a task is called a task block.
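The patent names these two entry points without giving source-level signatures; one plausible C++ rendering, with the kernel argument list flattened to a void* array for brevity (an assumption of this sketch), is:

#include <cuda_runtime.h>

// Assumed declarations for the two submission APIs. quota/reservation are
// in CapSMs; kernel is the persistent-thread kernel; grid_size is the task
// block count; block_size is threads per task block.
using KernelFunc = void (*)();

cudaError_t Launch_kernel_with_quota(int quota, KernelFunc kernel,
                                     int grid_size, int block_size,
                                     void** kernel_arg_list);

cudaError_t Launch_kernel_with_reservation(int reservation, KernelFunc kernel,
                                           int grid_size, int block_size,
                                           void** kernel_arg_list);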
The fine-grained GPU resource management method for mixed workloads is a cooperative CPU-GPU process. Besides providing the resource management API on the CPU side, the kernel functions executed on the GPU side must also be modified so that the threads the original GPU kernel task runs on the GPU become persistent threads; each persistent thread block can then execute the work of several original thread blocks, i.e., it can execute several task blocks in turn, as shown in Fig. 4. The transformation is as follows:
(1-1) a loop control structure is inserted into the original kernel function, with the original kernel function body as the loop body;
(1-2) the loop body iterates over the task blocks assigned to the persistent thread, executing them in turn, and sets the variable taskIdx to the task block number of the task block currently being executed;
(1-3) the variable blockIdx, which denotes the thread block index in the original kernel function body, is replaced by the variable taskIdx, which denotes the current task block.
Unless stated otherwise, all tasks managed by the invention refer to tasks transformed as above.
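As a hedged illustration of transformation steps (1-1)-(1-3), suppose the original kernel scales a vector; the persistent-thread version wraps the body in a loop over the task blocks assigned to the thread block. Here the range is passed as parameters for simplicity; in the method it is computed on the device as in step (11-2), and the loop also checks the eviction flag of step (10):

#include <cuda_runtime.h>

// Original kernel: one thread block per task block.
__global__ void scale_orig(float* a, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

// Persistent-thread version (illustrative): the loop body is the original
// kernel body with blockIdx.x replaced by taskIdx.
__global__ void scale_persistent(float* a, float s, int n,
                                 int startTaskId, int stopTaskId) {
    for (int taskIdx = startTaskId; taskIdx < stopTaskId; ++taskIdx) {
        int i = taskIdx * blockDim.x + threadIdx.x;  // blockIdx.x -> taskIdx
        if (i < n) a[i] *= s;
    }
}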
(2) the task's submission information is parsed from the resource management API, including the kernel function, the task block count TaskBlockNumber_i, the task block size TaskBlockSize_i, and the task's resource request CapSMRequest_i, where the value of TaskBlockNumber_i comes from the API parameter grid_size, TaskBlockSize_i is the parameter block_size, and CapSMRequest_i is the parameter quota or reservation.
(3) from the task's kernel function and its task block size TaskBlockSize_i, the number of active thread blocks MaxActivePBlock_i that one SM of the GPU can accommodate is computed with the cudaOccupancyMaxActiveBlocksPerMultiprocessor API provided by CUDA.
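A minimal sketch of this occupancy query follows; cudaOccupancyMaxActiveBlocksPerMultiprocessor is the standard CUDA runtime API, while the kernel and block size are placeholders:

#include <cuda_runtime.h>
#include <cstdio>

__global__ void someKernel(float* a) { /* task body */ }

int main() {
    int maxActivePBlock = 0;
    int taskBlockSize   = 256;  // TaskBlockSize_i: threads per task block
    // Active thread blocks of this kernel that fit on one SM (step (3)).
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &maxActivePBlock, someKernel, taskBlockSize, /*dynamicSMemSize=*/0);
    printf("MaxActivePBlock = %d\n", maxActivePBlock);
    return 0;
}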
(4) from the running state of the applications currently on the GPU, the remaining available resources Remain_GPU on the GPU are computed;
(5) if the current GPU resource surplus Remain_GPU is no less than the task's CapSMRequest_i, step (6) is executed; otherwise step (8) is executed;
(6) the task's resource configuration CapSMQuota_i is set to the task's resource request CapSMRequest_i;
(7) from the task's CapSMQuota_i and the number of active thread blocks MaxActivePBlock_i determined in step (3), the number of thread blocks PBlockNumber_i to create when the task is submitted to the GPU and the number of task blocks TaskBlocksPerPBlock_i assigned to each thread block are computed, after which step (11) is executed; this comprises the following sub-steps:
(7-1) combining the task's resource configuration CapSMQuota_i with MaxActivePBlock_i, the task's thread block count is PBlockNumber_i = CapSMQuota_i * MaxActivePBlock_i;
(7-2) from the task's task block count TaskBlockNumber_i and thread block count PBlockNumber_i, the number of task blocks assigned to each thread block is TaskBlocksPerPBlock_i = ceil(TaskBlockNumber_i / PBlockNumber_i).
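Steps (7-1) and (7-2) as straight-line code (a sketch with assumed inputs; the ceiling division in (7-2) follows the reconstruction above):

// Thread-block and task-block sizing for one task (illustrative values).
int capSMQuota      = 4;     // CapSMQuota_i, in CapSMs
int maxActivePBlock = 8;     // MaxActivePBlock_i, from the occupancy query
int taskBlockNumber = 1000;  // TaskBlockNumber_i (grid_size)

// (7-1) persistent thread blocks to launch
int pBlockNumber = capSMQuota * maxActivePBlock;  // 32

// (7-2) task blocks per persistent thread block, rounded up so that all
// task blocks are covered
int taskBlocksPerPBlock =
    (taskBlockNumber + pBlockNumber - 1) / pBlockNumber;  // 32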
(8) if the current task is an offline task, step (9) is executed; if it is an online task, step (10) is executed;
(9) the current GPU resource surplus Remain_GPU is set as the task's resource configuration CapSMQuota_i; then go to step (7);
(10) from the current GPU resource surplus Remain_GPU and the task's resource request CapSMRequest_i, compute the resource deficit Gap_i, then send a resource release command to the offline tasks running on the GPU, making them release the amount of resources specified by the deficit, and then go to step (6); executing the resource release command requires CPU-GPU cooperation and comprises the following two stages:
(10-1) on the CPU side, the value of the resource release flag evictCapSMNum_i is modified; evictCapSMNum_i can be synchronized between the CPU and the GPU, and its value denotes the number of CapSMs to release;
(10-2) the loop control structure of each persistent thread of the tasks running on the GPU checks the value of evictCapSMNum_i before each execution of the loop body; every thread whose CapSMId is less than evictCapSMNum_i exits, so that evictCapSMNum_i CapSMs' worth of resources are released;
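One way to realize the CPU-GPU-synchronized flag of (10-1)-(10-2) is managed (or zero-copy mapped) memory; the sketch below assumes cudaMallocManaged and a volatile read before each loop-body execution. The variable names follow the text, but the choice of memory mechanism is an assumption:

#include <cuda_runtime.h>

// Host side (10-1): allocate the flag once, then raise it to evict.
//   int* evictCapSMNum;
//   cudaMallocManaged(&evictCapSMNum, sizeof(int));
//   *evictCapSMNum = 0;               // nothing to release initially
//   ... later, to reclaim 2 CapSMs:  *evictCapSMNum = 2;

// Device side (10-2): checked before each execution of the loop body.
// Persistent thread blocks whose CapSM id is below the flag value exit,
// releasing that many CapSMs' worth of resources.
__device__ bool shouldExit(const volatile int* evictCapSMNum, int capSMId) {
    return capSMId < *evictCapSMNum;
}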
(11) with the computed thread block count PBlockNumber_i, the task is submitted to the GPU; after the task's threads have been created on the GPU and started running, each thread executes the following steps:
(11-1) compute the number CapSMId of the CapSM to which the current thread belongs, via the following sub-steps:
(11-1-1) compute the thread block number PBlockId of the current thread: PBlockId = gridDim.x * blockIdx.y + blockIdx.x, where gridDim.x, blockIdx.y, and blockIdx.x are built-in variables that CUDA provides to every thread and that a thread can use directly while running;
(11-1-2) from the maximum number of active thread blocks MaxActivePBlock_i accommodated in each SM or CapSM and the thread block number PBlockId, compute the owning CapSM number: CapSMId = floor(PBlockId / MaxActivePBlock_i);
(11-2) compute the range of task blocks processed by each persistent thread block, via the following sub-steps:
(11-2-1) compute the first task block assigned, StartTaskId: StartTaskId = PBlockId * TaskBlocksPerPBlock_i;
(11-2-2) compute the last task block assigned, StopTaskId: StopTaskId = StartTaskId + TaskBlocksPerPBlock_i;
Fig. 5 shows the relationship among CapSMs, persistent thread blocks, and task blocks: every 3 task blocks are assigned to one persistent thread block, and every 3 persistent thread blocks are in turn assigned to one CapSM.
(11-3) compute the number PBIdInCapSM of the current thread block within its CapSM: PBIdInCapSM = PBlockId % MaxActivePBlock_i;
(11-4) with the task block range obtained above, enter the loop control structure and execute the tasks in all assigned task blocks in turn;
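The index arithmetic of (11-1)-(11-3) gathered into one device function (a sketch; the linear block id assumes the 2D grid of (11-1-1), and the modulo in (11-3) uses MaxActivePBlock_i, consistent with the inverse mapping of step (13-4)):

struct PTBlockInfo {
    int pBlockId;     // linear persistent-thread-block id
    int capSMId;      // CapSM this thread block belongs to
    int pbIdInCapSM;  // index of the thread block within its CapSM
    int startTaskId;  // first task block assigned
    int stopTaskId;   // one past the last task block assigned
};

__device__ PTBlockInfo computeIndices(int maxActivePBlock,
                                      int taskBlocksPerPBlock) {
    PTBlockInfo info;
    info.pBlockId    = gridDim.x * blockIdx.y + blockIdx.x;    // (11-1-1)
    info.capSMId     = info.pBlockId / maxActivePBlock;        // (11-1-2)
    info.pbIdInCapSM = info.pBlockId % maxActivePBlock;        // (11-3)
    info.startTaskId = info.pBlockId * taskBlocksPerPBlock;    // (11-2-1)
    info.stopTaskId  = info.startTaskId + taskBlocksPerPBlock; // (11-2-2)
    return info;
}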
(12) if the task receives a resource release command while running on the GPU, step (13) is executed; otherwise step (14) is executed;
(13) upon receiving the release command while running on the GPU, the task releases the resources of the specified range. As shown in Fig. 6, if task blocks in the released resources have not yet executed, the unexecuted task blocks are remapped onto the remaining resources and continue to execute; the remapping performed inside each persistent thread block comprises the following steps:
(13-1) compute the number NumberPerCapSM_i of released CapSMs that each surviving CapSM is responsible for mapping: NumberPerCapSM_i = ceil(evictCapSMNum_i / (CapSMQuota_i - evictCapSMNum_i)),
where CapSMQuota_i is the GPU task's resource configuration, i.e., the number of CapSMs configured when the task was submitted;
(13-2) compute the range of released CapSMs that each CapSM is responsible for mapping, via the following two sub-steps:
(13-2-1) compute the first CapSM of the mapping: CapSMRemapStart = (CapSMId - evictCapSMNum_i) * NumberPerCapSM_i;
(13-2-2) compute the last CapSM of the mapping: CapSMRemapEnd = CapSMRemapStart + NumberPerCapSM_i;
(13-3) select each CapSM within the mapped range in turn, letting CapSMRemapCur denote the currently selected CapSM; if the task blocks in all CapSMs within the mapped range have finished executing, jump to step (14);
(13-4) set the variable PBlockId to the corresponding thread block number within the CapSMRemapCur currently being executed: PBlockId = CapSMRemapCur * MaxActivePBlock_i + PBIdInCapSM;
(13-5) using the current thread block number PBlockId, execute all unexecuted task blocks of PBlockId; if all unexecuted task blocks in the currently mapped CapSMRemapCur have finished, jump to step (13-3);
(13-6) go to step (13-5) and continue;
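Steps (13-1)-(13-6) assembled into one device-side sketch; executeTaskBlock stands in for the original loop body, the ceiling division in (13-1) follows the reconstruction above, and a real implementation would additionally skip task blocks that had already completed before the eviction:

// Remapping of evicted CapSMs onto survivors (illustrative). Only thread
// blocks with capSMId >= evictCapSMNum, i.e., survivors, call this.
__device__ void executeTaskBlock(int taskIdx) {
    // original kernel body, indexed by taskIdx
}

__device__ void remapEvicted(int capSMId, int pbIdInCapSM,
                             int capSMQuota, int evictCapSMNum,
                             int maxActivePBlock, int taskBlocksPerPBlock) {
    int survivors = capSMQuota - evictCapSMNum;
    // (13-1) evicted CapSMs each survivor takes over (rounded up)
    int numberPerCapSM = (evictCapSMNum + survivors - 1) / survivors;
    // (13-2) range of evicted CapSMs mapped to this survivor
    int remapStart = (capSMId - evictCapSMNum) * numberPerCapSM;
    int remapEnd   = remapStart + numberPerCapSM;

    // (13-3)..(13-6) walk the mapped CapSMs and replay their task blocks
    for (int cur = remapStart; cur < remapEnd && cur < evictCapSMNum; ++cur) {
        // (13-4) thread block this CapSM/slot pair corresponds to
        int pBlockId    = cur * maxActivePBlock + pbIdInCapSM;
        int startTaskId = pBlockId * taskBlocksPerPBlock;    // as in (11-2-1)
        int stopTaskId  = startTaskId + taskBlocksPerPBlock;
        // (13-5) execute the task blocks that never ran
        for (int taskIdx = startTaskId; taskIdx < stopTaskId; ++taskIdx)
            executeTaskBlock(taskIdx);
    }
}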
(14) the task finishes execution and exits the GPU.
In short, when a mixed workload (including online tasks and offline tasks) shares GPU resources, the invention manages each task type's use of GPU resources at fine granularity, supporting per-task resource quotas and online resource adjustment, and guaranteeing the quality of service of online tasks while GPU resources are shared. It proposes CapSM, a capacity-based abstraction of the streaming multiprocessor (Streaming Multiprocessor, hereinafter SM), and uses the CapSM as the basic unit of resource management; a CapSM is equivalent to an SM in capacity, i.e., the maximum number of active thread blocks on one CapSM equals that of an original SM. When an offline or online application submits a GPU task through the resource management API, the supplied resource request information is first parsed from the API, and the resources finally allocated to the task are determined from the task's type, its resource request, and the current GPU resource state. If the current GPU resources are sufficient, the task is allocated the resources it requests; otherwise further processing depends on the task type: if the task is an offline task, the currently available GPU resources are allocated to it; if the task is an online task, part of the resources of the offline tasks running on the GPU is released through the resource reclamation mechanism to satisfy the online task's request. This guarantees the performance of online tasks while fully utilizing GPU resources.
Parts of the invention not elaborated here belong to techniques well known in the art.
The above describes only some embodiments of the invention, and the protection scope of the invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall be covered by the protection scope of the invention.

Claims (10)

1. A fine-grained GPU resource management method for mixed workloads, characterized in that the mixed workload divides tasks into online tasks and offline tasks; when online and offline tasks share GPU resources, a capacity-based SM abstraction is used as the basic unit of resource management to manage, at fine granularity, how each task type uses GPU resources, supporting per-task resource quotas and online resource adjustment, and guaranteeing the quality of service of online tasks while GPU resources are shared, comprising the following steps:
(1) when a user submits a task to the GPU through the resource management application programming interface (Application Programming Interface, hereinafter API) (unless stated otherwise, "task" covers both online and offline tasks), the task's resource request information is set: for an offline task, the resource upper bound, i.e., the quota, is given; for an online task, the minimum resource amount, i.e., the reservation, is given;
(2) the task's submission information is parsed from the resource management API, including the kernel function, the number of task blocks, the task block size, and the task's resource request;
(3) from the task's kernel function and its task block size, the number of active thread blocks that one SM of the GPU can accommodate is computed;
(4) from the running state of the applications currently on the GPU, the remaining available resources on the GPU are computed;
(5) if the current GPU resource surplus is no less than the resource request obtained in step (2), step (6) is executed; otherwise step (8) is executed;
(6) the task's resource configuration is set to the task's resource request;
(7) from the task's resource configuration and the number of active thread blocks determined in step (3), the number of thread blocks to create when the task is submitted to the GPU and the number of task blocks assigned to each thread block are computed; then step (11) is executed;
(8) if the current task is an offline task, step (9) is executed; if it is an online task, step (10) is executed;
(9) the current GPU resource surplus is set as the task's resource configuration; then go to step (7);
(10) from the current GPU resource surplus and the task's resource request, the resource deficit is computed, and a resource release command is sent to the offline tasks running on the GPU, making them release resources equal to the deficit; then go to step (6);
(11) with the computed number of thread blocks, the task is submitted to the GPU, its threads are created, and it begins to run;
(12) if the task receives a resource release command while running on the GPU, step (13) is executed; otherwise step (14) is executed;
(13) upon receiving the release command while running on the GPU, the task releases the resources of the specified range; if task blocks in the released resources have not executed, the unexecuted task blocks are remapped onto the remaining resources and continue to execute;
(14) the task finishes execution and exits the GPU.
2. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: the basic unit of resource management is a capacity-based SM abstraction, hereinafter CapSM, realized as follows:
(1-1) given a GPU, the capacity of each SM is set to 1 capacity unit; given a kernel task K, assume that one SM of the GPU can accommodate M active thread blocks of task K;
(1-2) according to M, each SM is abstracted as M small slices; each slice has capacity 1/M capacity unit and can accommodate exactly one thread block of task K;
(1-3) after all SMs of the GPU have been divided into slices in this way, any N slices whose total capacity equals that of a physical SM are considered to form one CapSM;
(1-4) for task K, any M slices form one CapSM.
3. The fine-grained GPU resource management method for mixed workloads according to claim 2, characterized in that the resource management unit CapSM has the following properties:
(1-1) the M slices that form a CapSM may come from the same SM or from several different SMs;
(1-2) each slice corresponds to one thread block, and a CapSM can be regarded as a set of thread blocks; in the implementation, managing CapSMs therefore reduces to managing the number of thread blocks;
(1-3) the capacity-based SM abstraction CapSM does not depend on a specific GPU architecture or GPU parallel programming language; the CapSM concept can readily be applied to other GPU architectures and GPU parallel programming languages.
4. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: the task's original kernel function must be transformed so that the threads the task runs on the GPU become persistent threads; the transformation is as follows:
(1-1) a loop control structure is inserted into the original kernel function, with the original kernel function body as the loop body;
(1-2) the loop body iterates over the task blocks assigned to the persistent thread, executing them in turn, and sets the variable taskIdx to the task block number of the task block currently being executed;
(1-3) the variable blockIdx, which denotes the thread block index in the original kernel function body, is replaced by the variable taskIdx, which denotes the current task block.
5. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: in step (1), when the user submits a task to the GPU through the resource management API, the API offers the following two submission modes:
(1-1) running a task under a resource quota, mainly for offline tasks, limits the amount of resources the offline task may use; submitting a task this way requires the resource quota amount quota, the kernel function to run, the task's number of task blocks TaskBlockNumber, and the task block size TaskBlockSize, i.e., the number of threads in each task block;
(1-2) running a task with a resource reservation, mainly for online tasks; submitting a task this way requires the amount of resources reservation to be reserved for the task, the kernel function to run, the task's number of task blocks TaskBlockNumber, and the task block size TaskBlockSize, i.e., the number of threads in each task block.
6. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: in step (3), from the task's kernel function and its task block size, the number of active thread blocks that one SM of the GPU can accommodate is computed; this can be done with the API provided by the Compute Unified Device Architecture (Compute Unified Device Architecture, hereinafter CUDA) for obtaining the maximum number of active thread blocks per SM, which computes the maximum number of active thread blocks MaxActivePBlock_i that each SM or CapSM can accommodate.
7. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: in step (7), from the task's resource configuration CapSMQuota_i and the number of active thread blocks MaxActivePBlock_i determined in step (3), the number of thread blocks PBlockNumber_i to create when the task is submitted to the GPU and the number of task blocks TaskBlocksPerPBlock_i assigned to each thread block are computed as follows:
(7-1) combining the task's resource configuration CapSMQuota_i with MaxActivePBlock_i, the task's thread block count is PBlockNumber_i = CapSMQuota_i * MaxActivePBlock_i;
(7-2) from the task's task block count TaskBlockNumber_i and thread block count PBlockNumber_i, the number of task blocks assigned to each thread block is TaskBlocksPerPBlock_i = ceil(TaskBlockNumber_i / PBlockNumber_i).
8. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: in step (11), the task is submitted to the GPU, its threads are created, and it begins to run; each thread executes the following steps:
(11-1) compute the number CapSMId of the CapSM to which the current thread belongs, via the following sub-steps:
(11-1-1) compute the thread block number PBlockId of the current thread: PBlockId = gridDim.x * blockIdx.y + blockIdx.x, where gridDim.x, blockIdx.y, and blockIdx.x are built-in variables that CUDA provides to every thread and that a thread can use directly while running;
(11-1-2) from the maximum number of active thread blocks MaxActivePBlock_i accommodated in each SM or CapSM and the thread block number PBlockId, compute the owning CapSM number: CapSMId = floor(PBlockId / MaxActivePBlock_i);
(11-2) compute the range of task blocks processed by each persistent thread block, via the following sub-steps:
(11-2-1) compute the first task block assigned, StartTaskId: StartTaskId = PBlockId * TaskBlocksPerPBlock_i;
(11-2-2) compute the last task block assigned, StopTaskId: StopTaskId = StartTaskId + TaskBlocksPerPBlock_i;
(11-3) compute the number PBIdInCapSM of the current thread block within its CapSM: PBIdInCapSM = PBlockId % MaxActivePBlock_i;
(11-4) with the task block range obtained above, enter the loop control structure and execute the tasks in all assigned task blocks in turn.
9. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: in step (10), a resource release command is sent to the offline tasks running on the GPU, making them release the amount of resources specified by the resource deficit, in the following two stages:
(10-1) on the CPU side, the value of the resource release flag evictCapSMNum_i is modified; evictCapSMNum_i can be synchronized between the CPU and the GPU, and its value denotes the number of CapSMs to release;
(10-2) the loop control structure of each persistent thread of the tasks running on the GPU checks the value of evictCapSMNum_i before each execution of the loop body; every thread whose CapSMId is less than evictCapSMNum_i exits, so that evictCapSMNum_i CapSMs' worth of resources are released.
10. The fine-grained GPU resource management method for mixed workloads according to claim 1, characterized in that: in step (13), if task blocks in the released resources have not executed, the unexecuted task blocks are remapped onto the remaining resources and continue to execute; a persistent thread block performs the remapping as follows:
(13-1) compute the number NumberPerCapSM_i of released CapSMs that each surviving CapSM is responsible for mapping: NumberPerCapSM_i = ceil(evictCapSMNum_i / (CapSMQuota_i - evictCapSMNum_i)),
where CapSMQuota_i is the GPU task's resource configuration, i.e., the number of CapSMs configured when the task was submitted;
(13-2) compute the range of released CapSMs that each CapSM is responsible for mapping, via the following two sub-steps:
(13-2-1) compute the first CapSM of the mapping:
CapSMRemapStart = (CapSMId - evictCapSMNum_i) * NumberPerCapSM_i;
(13-2-2) compute the last CapSM of the mapping:
CapSMRemapEnd = CapSMRemapStart + NumberPerCapSM_i;
(13-3) select each CapSM within the mapped range in turn, letting CapSMRemapCur denote the currently selected CapSM; if the task blocks in all CapSMs within the mapped range have finished executing, jump to step (14);
(13-4) set the variable PBlockId to the corresponding thread block number within the CapSMRemapCur currently being executed: PBlockId = CapSMRemapCur * MaxActivePBlock_i + PBIdInCapSM;
(13-5) using the current thread block number PBlockId, execute all unexecuted task blocks of PBlockId; if all unexecuted task blocks in the currently mapped CapSMRemapCur have finished, jump to step (13-3);
(13-6) go to step (13-5) and continue.
CN201710563834.7A 2017-07-12 2017-07-12 Fine-grained GPU resource management method for mixed load Expired - Fee Related CN107357661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710563834.7A CN107357661B (en) 2017-07-12 2017-07-12 Fine-grained GPU resource management method for mixed load

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710563834.7A CN107357661B (en) 2017-07-12 2017-07-12 Fine-grained GPU resource management method for mixed load

Publications (2)

Publication Number Publication Date
CN107357661A 2017-11-17
CN107357661B CN107357661B (en) 2020-07-10

Family

ID=60292105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710563834.7A Expired - Fee Related CN107357661B (en) 2017-07-12 2017-07-12 Fine-grained GPU resource management method for mixed load

Country Status (1)

Country Link
CN (1) CN107357661B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710536A * 2018-04-02 2018-10-26 Shanghai Jiao Tong University Multi-level fine-grained virtualized GPU scheduling optimization method
CN109298936A * 2018-09-11 2019-02-01 Huawei Technologies Co., Ltd. Resource scheduling method and device
CN109412874A * 2018-12-21 2019-03-01 Tencent Technology (Shenzhen) Co., Ltd. Device resource configuration method, device, server and storage medium
CN109445565A * 2018-11-08 2019-03-08 Beihang University GPU quality-of-service guarantee method based on exclusive and reserved streaming multiprocessor cores
CN109840877A * 2017-11-24 2019-06-04 Huawei Technologies Co., Ltd. Graphics processor and resource scheduling method and device thereof
CN109936604A * 2017-12-18 2019-06-25 Beijing Tusen Weilai Technology Co., Ltd. Resource scheduling method, device and system
CN110289990A * 2019-05-29 2019-09-27 Tsinghua University GPU-based network function virtualization system, method and storage medium
CN110415162A * 2019-07-22 2019-11-05 Renmin University of China Adaptive graph partitioning method for heterogeneous fused processors in big data
CN110781007A * 2019-10-31 2020-02-11 Guangzhou Wangxing Information Technology Co., Ltd. Task processing method, device, server, client, system and storage medium
CN111597045A * 2020-05-15 2020-08-28 Shanghai Jiao Tong University Shared resource management method, system and server system for managing mixed deployment
CN111597034A * 2019-02-21 2020-08-28 Alibaba Group Holding Ltd. Processor resource scheduling method and device, terminal device and computer storage medium
CN111712793A * 2018-02-14 2020-09-25 Huawei Technologies Co., Ltd. Thread processing method and graphics processor
CN111736987A * 2020-05-29 2020-10-02 Shandong University Task scheduling method based on GPU spatial resource sharing
WO2021104083A1 * 2019-11-28 2021-06-03 ZTE Corporation GPU operating method, apparatus, device, and storage medium
WO2021128079A1 * 2019-12-25 2021-07-01 Alibaba Group Holding Ltd. Data processing method, image recognition method, processing server, system, and electronic device
CN113296921A * 2020-04-07 2021-08-24 Alibaba Group Holding Ltd. Cloud resource scheduling method, node, system and storage medium
CN113407333A * 2020-12-18 2021-09-17 Shanghai Jiao Tong University Task scheduling method, system, GPU and device for warp-level scheduling
CN113411230A * 2021-06-09 2021-09-17 Guangzhou Huya Technology Co., Ltd. Container-based bandwidth control method and device, distributed system and storage medium
CN113590317A * 2021-07-27 2021-11-02 Hangzhou Langhe Technology Co., Ltd. Scheduling method, device, medium and computing device for offline services
CN114035935A * 2021-10-13 2022-02-11 Shanghai Jiao Tong University High-throughput heterogeneous resource management method and device for multi-stage AI cloud services
CN114579284A * 2022-03-30 2022-06-03 Alibaba (China) Co., Ltd. Task scheduling method and device
CN116893854A * 2023-09-11 2023-10-17 Tencent Technology (Shenzhen) Co., Ltd. Method, device, equipment and storage medium for detecting instruction resource conflicts

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110169840A1 * 2006-12-31 2011-07-14 Lucid Information Technology, Ltd. Computing system employing a multi-GPU graphics processing and display subsystem supporting single-GPU non-parallel (multi-threading) and multi-GPU application-division parallel modes of graphics processing operation
CN102958166A * 2011-08-29 2013-03-06 Huawei Technologies Co., Ltd. Resource allocation method and resource management platform
CN103365726B * 2013-07-08 2016-05-25 Huazhong University of Science and Technology GPU-cluster-oriented resource management method and system
CN104243617A * 2014-10-14 2014-12-24 Institute of Information Engineering, Chinese Academy of Sciences Task scheduling method and system for mixed loads in heterogeneous clusters

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Alexey Tumanov et al.: "JamaisVu: Robust Scheduling with Auto-Estimated Job Runtimes", Parallel Data Laboratory *
Chung-Sheng Li et al.: "Disaggregated Architecture for at Scale Computing", in Proceedings of the 2nd International Workshop on Emerging Software as a Service and Analytics (ESaaSA 2015) *
Lyu Xiangwen et al.: "Research on Multi-GPU Resource Scheduling Mechanisms in Cloud Computing Environments", Journal of Chinese Computer Systems *
Chen Wenbin et al.: "Research on Multi-Granularity Partitioning and Scheduling of Stream Programs on Hybrid GPU/CPU Architectures", Computer Engineering & Science *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840877B * 2017-11-24 2023-08-22 Huawei Technologies Co., Ltd. Graphics processor and resource scheduling method and device thereof
CN109840877A * 2017-11-24 2019-06-04 Huawei Technologies Co., Ltd. Graphics processor and resource scheduling method and device therefor
CN109936604A * 2017-12-18 2019-06-25 Beijing Tusen Weilai Technology Co., Ltd. Resource scheduling method, apparatus, and system
CN111712793B * 2018-02-14 2023-10-20 Huawei Technologies Co., Ltd. Thread processing method and graphics processor
CN111712793A * 2018-02-14 2020-09-25 Huawei Technologies Co., Ltd. Thread processing method and graphics processor
CN108710536B * 2018-04-02 2021-08-06 Shanghai Jiao Tong University Multilevel fine-grained virtualized GPU scheduling optimization method
CN108710536A * 2018-04-02 2018-10-26 Shanghai Jiao Tong University Multilevel fine-grained virtualized GPU scheduling optimization method
CN109298936A * 2018-09-11 2019-02-01 Huawei Technologies Co., Ltd. Resource scheduling method and device
CN109298936B * 2018-09-11 2021-05-18 Huawei Technologies Co., Ltd. Resource scheduling method and device
CN109445565A * 2018-11-08 2019-03-08 Beihang University GPU quality-of-service guarantee method based on exclusive use and reservation of streaming multiprocessor cores
CN109412874A * 2018-12-21 2019-03-01 Tencent Technology (Shenzhen) Co., Ltd. Device resource configuration method, apparatus, server, and storage medium
CN109412874B * 2018-12-21 2021-11-02 Tencent Technology (Shenzhen) Co., Ltd. Device resource configuration method, apparatus, server, and storage medium
CN111597034A * 2019-02-21 2020-08-28 Alibaba Group Holding Ltd. Processor resource scheduling method and device, terminal device, and computer storage medium
CN111597034B * 2019-02-21 2023-04-28 Alibaba Group Holding Ltd. Processor resource scheduling method and device, terminal device, and computer storage medium
CN110289990A * 2019-05-29 2019-09-27 Tsinghua University GPU-based network function virtualization system, method, and storage medium
CN110415162A * 2019-07-22 2019-11-05 Renmin University of China Adaptive graph partitioning method for heterogeneous fused processors in big data
CN110781007A * 2019-10-31 2020-02-11 Guangzhou Wangxing Information Technology Co., Ltd. Task processing method, device, server, client, system, and storage medium
CN110781007B * 2019-10-31 2023-12-26 Guangzhou Wangxing Information Technology Co., Ltd. Task processing method, device, server, client, system, and storage medium
WO2021104083A1 * 2019-11-28 2021-06-03 ZTE Corporation GPU operating method, apparatus, device, and storage medium
WO2021128079A1 * 2019-12-25 2021-07-01 Alibaba Group Holding Ltd. Data processing method, image recognition method, processing server, system, and electronic device
CN113296921A * 2020-04-07 2021-08-24 Alibaba Group Holding Ltd. Cloud resource scheduling method, node, system, and storage medium
CN111597045A * 2020-05-15 2020-08-28 Shanghai Jiao Tong University Shared resource management method and system for mixed deployments, and server system
CN111597045B * 2020-05-15 2023-04-07 Shanghai Jiao Tong University Shared resource management method and system for mixed deployments, and server system
CN111736987B * 2020-05-29 2023-08-04 Shandong University Task scheduling method based on GPU space resource sharing
CN111736987A * 2020-05-29 2020-10-02 Shandong University Task scheduling method based on GPU space resource sharing
CN113407333B * 2020-12-18 2023-05-26 Shanghai Jiao Tong University Warp-level task scheduling method, system, GPU, and device
CN113407333A * 2020-12-18 2021-09-17 Shanghai Jiao Tong University Warp-level task scheduling method, system, GPU, and device
CN113411230A * 2021-06-09 2021-09-17 Guangzhou Huya Technology Co., Ltd. Container-based bandwidth control method and device, distributed system, and storage medium
CN113590317A * 2021-07-27 2021-11-02 Hangzhou Langhe Technology Co., Ltd. Offline service scheduling method, apparatus, medium, and computing device
CN114035935A * 2021-10-13 2022-02-11 Shanghai Jiao Tong University High-throughput heterogeneous resource management method and device for multi-stage AI cloud services
CN114579284A * 2022-03-30 2022-06-03 Alibaba (China) Co., Ltd. Task scheduling method and device
CN116893854B * 2023-09-11 2023-11-14 Tencent Technology (Shenzhen) Co., Ltd. Instruction resource conflict detection method, apparatus, device, and storage medium
CN116893854A * 2023-09-11 2023-10-17 Tencent Technology (Shenzhen) Co., Ltd. Instruction resource conflict detection method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN107357661B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN107357661A Fine-grained GPU resource management method for mixed loads
CN104021040B Cloud computing correlated-task scheduling method and device under time constraints
CN108876702A Training method and device for accelerating a distributed deep neural network
CN104461467B Method for increasing the computation speed of SMP cluster systems through hybrid MPI/OpenMP parallelism
CN104331321B Cloud computing task scheduling method based on tabu search and load balancing
CN104731657B Resource scheduling method and system
CN105389206B Rapid virtual machine resource configuration method for cloud computing data centers
CN104619029B Baseband pool resource allocation method and device under a centralized cellular network architecture
WO2024021489A1 Task scheduling method and apparatus, and Kubernetes scheduler
CN105094751B Memory management method for parallel stream-data processing
CN105373426B Hadoop-based memory-aware real-time job scheduling method for the Internet of Vehicles
CN103401939A Load balancing method using a hybrid scheduling strategy
CN103731372A Resource provisioning method for service providers in a hybrid cloud environment
CN102968344A Migration scheduling method for multiple virtual machines
CN105872114A Video surveillance cloud platform resource scheduling method and device
CN108170517A Container allocation method, apparatus, server, and medium
CN113672391B Parallel computing task scheduling method and system based on Kubernetes
CN109522090A Resource scheduling method and device
TW202220469A Resource management system and resource management method
CN106998340B Load balancing method and device for board resources
CN116401055A Resource-efficiency-oriented serverless computing workflow orchestration method
CN102325054A Adaptive adjustment method for hierarchical management of distributed computing management platform clusters
CN111404818A Routing protocol optimization method for a general-purpose multi-core network processor
CN106775975A Process scheduling method and device
US20230161620A1 Pull mode and push mode combined resource management and job scheduling method and system, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210420

Address after: Floor 12 (1515-1516), Building 4, No. 128 South Fourth Ring Road, Fengtai District, Beijing 100160, China

Patentee after: Kaixi (Beijing) Information Technology Co., Ltd.

Address before: No. 37 Xueyuan Road, Haidian District, Beijing 100191

Patentee before: Beihang University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200710

Termination date: 20210712