WO2017185285A1 - Method and apparatus for allocating graphics processor tasks - Google Patents

Method and apparatus for allocating graphics processor tasks

Info

Publication number
WO2017185285A1
Authority
WO
WIPO (PCT)
Prior art keywords
gpu
parameter data
target
task
processed
Prior art date
Application number
PCT/CN2016/080478
Other languages
English (en)
French (fr)
Inventor
邓利群 (Deng Liqun)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2016/080478 priority Critical patent/WO2017185285A1/zh
Priority to CN201680084996.6A priority patent/CN109074281B/zh
Publication of WO2017185285A1 publication Critical patent/WO2017185285A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • The present invention relates to the field of information technology, and in particular to a method and apparatus for allocating graphics processing unit (GPU) tasks.
  • Because the graphics processing unit (GPU) offers massive parallel threads and high memory bandwidth, it is well suited to multi-threaded, computation-intensive tasks. Beyond traditional graphics and image processing, GPUs have been applied in many other general-purpose computing fields, such as databases, data compression, deep learning, and biocomputing. For example, in the deoxyribonucleic acid (DNA) sequence-alignment problem in biocomputing, large numbers of DNA fragments under test can be computed in parallel by GPU threads, greatly improving alignment throughput.
  • However, the data required for GPU computation must be transferred from central processing unit (CPU) memory to GPU memory over the PCIe (PCI-Express) bus interface, and PCIe bandwidth is far smaller than GPU memory bandwidth. This bandwidth mismatch makes it difficult to fully utilize the GPU's computational threads.
  • Taking DNA sequence alignment as an example, the data transferred for each scheduled execution of such a task includes not only the set of DNA fragments to be aligned but also the reference DNA sequence data. Experiments show that an alignment task over a set of about 200 MB of DNA fragments executes in roughly 41 seconds on an Nvidia K40 GPU, of which the PCIe transfer preparation time for the reference DNA sequence data alone is as much as 20 seconds. Data transmission over PCIe has therefore become a major bottleneck in GPU computing.
  • The present application provides a method and apparatus for allocating GPU tasks, which can improve GPU efficiency.
  • In a first aspect, a method for allocating a GPU task is provided, comprising: determining, from a GPU cluster, a target GPU for a GPU task to be processed, where the GPU cluster contains at least two GPUs and each GPU in the cluster stores at least one piece of parameter data; sending to the target GPU the target parameter data required to process the pending GPU task, the target GPU being used to process GPU tasks corresponding to the target parameter data; and assigning the pending GPU task to the target GPU for processing.
  • With this method, a target GPU for the pending GPU task is determined within the GPU cluster and the target parameter data needed to process that task is sent to it, so that the target GPU holds the target parameter data and processes the pending task. Tasks that use the same parameter data can thus be processed by the same GPU, implementing parameter-data reuse, which greatly reduces the overhead of data initialization and of transferring parameter data between the CPU and the GPU, thereby improving GPU utilization.
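  • As an illustration of the flow above, the following is a minimal sketch in Python, assuming hypothetical Gpu and GpuCluster abstractions; the names (Gpu, GpuCluster, allocate, determine_target_gpu) are illustrative and not taken from the patent.

      from dataclasses import dataclass, field

      @dataclass
      class Gpu:
          gpu_id: int
          memory: int                                 # total memory in bytes
          params: dict = field(default_factory=dict)  # param_id -> size in bytes
          queue: list = field(default_factory=list)   # tasks awaiting execution

      class GpuCluster:
          def __init__(self, gpus):
              assert len(gpus) >= 2, "the cluster contains at least two GPUs"
              self.gpus = {g.gpu_id: g for g in gpus}

          def determine_target_gpu(self, param_id):
              # Reuse a GPU that already stores the target parameter data, if any;
              # otherwise fall back to the reallocation strategies described later.
              for g in self.gpus.values():
                  if param_id in g.params:
                      return g
              return None

          def allocate(self, task, target_param_id, target_param):
              gpu = self.determine_target_gpu(target_param_id)
              if gpu is None:
                  raise RuntimeError("task request failed: no target GPU")
              if target_param_id not in gpu.params:
                  gpu.params[target_param_id] = len(target_param)  # send parameter data once
              gpu.queue.append(task)                               # assign the pending task
              return gpu.gpu_id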
  • It should be understood that GPU tasks can be divided into multiple types according to the parameter data they require when processed. After a pending GPU task is received, its target parameter data can therefore be determined, and the target GPU that will process the pending task is determined from that target parameter data.
  • The GPU cluster may contain a GPU that has not yet processed any GPU task, in which case the pending GPU task can be assigned to it directly; the embodiments here, however, take the case where every GPU has already processed GPU tasks as the example.
  • After processing a GPU task, a GPU saves the parameter data that task required, implementing data reuse: when the GPU later processes a task of the same type, the saved parameter data can be reused, saving transfer time and improving efficiency. Since the embodiments assume every GPU has already processed GPU tasks, every GPU in the cluster has saved parameter data.
  • Specifically, each GPU in the cluster may hold one or more pieces of parameter data; that is, at least one GPU in the cluster can process multiple kinds of tasks, each requiring its own parameter data, so that GPU stores multiple pieces of parameter data. Alternatively, at least two GPUs in the cluster may hold the same parameter data, i.e., multiple GPUs process GPU tasks corresponding to the same type of parameter data.
  • The cluster may already contain a GPU that has processed tasks of the same type as the pending GPU task, i.e., a GPU that stores the parameter data required to process that type of task (the target parameter data); alternatively, no such GPU may exist, i.e., the pending GPU task is a new type of task.
  • The embodiments of the present invention take as an example the case where no target GPU corresponding to the target parameter data exists in the cluster, that is, the cluster receives this type of pending GPU task for the first time; since the task uses the target parameter data when processed, the cluster first determines a target GPU.
  • In one implementation of the first aspect, determining the target GPU for the pending GPU task from the GPU cluster includes: determining a first GPU and a second GPU from the cluster, where the first GPU and the second GPU store the same first parameter data; and determining the first GPU to be the target GPU.
  • In this way, at least one of several GPUs that redundantly process the same type of task can be determined as the target GPU and used to process a new type of task, namely the pending GPU task corresponding to the target parameter data, in addition to other tasks, improving GPU utilization. The first GPU and the second GPU may each refer to one GPU or to multiple GPUs, and the first GPU may be determined as the target GPU; accordingly, the target GPU may be one GPU or multiple GPUs.
  • After the first GPU is determined to be the target GPU, the method further includes: sending a first delete instruction to the first GPU, where the first delete instruction is used to instruct the first GPU to delete the first parameter data.
  • Specifically, the target GPU may currently have a GPU task being processed and unprocessed GPU tasks, all corresponding to the first parameter data. The target GPU may finish the task currently being processed, then delete the originally saved first parameter data and hand the other unprocessed tasks over to the second GPU; alternatively, both the task currently being processed and the other unprocessed tasks may be handed over to the second GPU, and the first parameter data originally saved on the target GPU deleted.
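  • A sketch of this repurposing step, building on the Gpu class above; it implements the second variant just described, in which all of the first GPU's tasks are handed over to the second GPU before the duplicated data is deleted.

      def repurpose_duplicate_gpu(first: Gpu, second: Gpu, first_param_id: int) -> Gpu:
          second.queue.extend(first.queue)   # hand all unprocessed tasks to the second GPU
          first.queue.clear()
          del first.params[first_param_id]   # first delete instruction: drop the first parameter data
          return first                       # the first GPU is now the target GPU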
  • It should be understood that several GPUs in the cluster may store the same parameter data, so more than one GPU may be selected as the target GPU. For example, at least k GPUs can be determined in the GPU cluster as target GPUs, where k satisfies formula (1), in which n denotes the number of GPUs in the cluster, N denotes the number of parameter-data types currently handled by the cluster other than the target parameter data, and a floor (round-down) operation is applied; formula (1) itself appears only as an image in the original publication.
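  • One plausible reconstruction of formula (1), consistent with those definitions under the added assumption that the n GPUs are divided evenly among the N existing parameter-data types plus the new target type, would be:

      k = \left\lfloor \frac{n}{N + 1} \right\rfloor \qquad (1)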
  • In another implementation, when no first GPU and second GPU storing the same first parameter data exist in the cluster, determining the target GPU for the pending GPU task includes: determining a third GPU from the cluster, where the third GPU stores second parameter data and the sum of the sizes of the target parameter data and the second parameter data is not greater than the memory size of the third GPU; and determining the third GPU to be the target GPU, used to process the GPU tasks corresponding to the target parameter data and to the second parameter data.
  • Further, the time required to process the unprocessed GPU tasks on the third GPU is less than or equal to a first duration, and the time required to process the pending GPU task is less than or equal to a second duration. The first and second durations may be set according to the actual situation: for example, if the processing time of the third GPU's original tasks or of the pending GPU task is not limited, the first and second durations may be set to infinity; or the first duration may be set to the tolerance value for the pending GPU task's waiting time and, correspondingly, the second duration to the tolerance value for the waiting time of the third GPU's original tasks.
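  • A sketch of this first preset condition, building on the Gpu class above; estimate_duration assumes each task carries its own processing-time estimate, which the patent does not specify.

      def estimate_duration(task) -> float:
          return task.get("est_seconds", 0.0)   # assumed per-task time estimate

      def estimate_backlog(queue) -> float:
          return sum(estimate_duration(t) for t in queue)

      def third_gpu_eligible(gpu: Gpu, target_param_size: int, pending_task,
                             first_duration: float, second_duration: float) -> bool:
          # Both parameter-data sets must fit in memory together, and both the
          # existing backlog and the pending task must meet the duration bounds.
          fits = sum(gpu.params.values()) + target_param_size <= gpu.memory
          backlog_ok = estimate_backlog(gpu.queue) <= first_duration
          pending_ok = estimate_duration(pending_task) <= second_duration
          return fits and backlog_ok and pending_ok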
  • In another implementation, when no first GPU and second GPU storing the same first parameter data exist in the cluster, determining the target GPU for the pending GPU task includes: determining a fourth GPU and a fifth GPU from the cluster, where the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU; sending the fourth parameter data to the fourth GPU, which is used to process the GPU tasks corresponding to the third and fourth parameter data; and determining the fifth GPU to be the target GPU.
  • In this way, the tasks of GPUs in the cluster that meet a preset condition are merged; for example, when the parameter data stored by two GPUs fits within the memory of at least one of them, the tasks of the two GPUs are merged, improving GPU utilization.
  • After the fifth GPU is determined to be the target GPU, the method further includes: sending a second delete instruction to the fifth GPU, where the second delete instruction is used to instruct the fifth GPU to delete the fourth parameter data.
  • Further, the time required to process the unprocessed GPU tasks corresponding to the third parameter data on the fourth GPU is less than or equal to a third duration, and the time required to process the unprocessed GPU tasks corresponding to the fourth parameter data on the fifth GPU is less than or equal to a fourth duration. The third and fourth durations may be set according to the actual situation: for example, if the processing times of the corresponding tasks are not limited, the third and fourth durations may be set to infinity; or the third duration may be set to the tolerance value for the waiting time of the tasks corresponding to the fourth parameter data and, correspondingly, the fourth duration to the tolerance value for the waiting time of the tasks corresponding to the third parameter data, though the embodiments of the invention are not limited to this.
  • After the tasks of the fourth GPU are merged with those of the fifth GPU, either the fourth or the fifth GPU may be selected to process the merged tasks. Specifically, when the sum of the sizes of the third and fourth parameter data is less than or equal to the memory size of the fourth GPU but greater than that of the fifth GPU, the merged tasks are processed by the fourth GPU, and the fifth GPU is determined to be the target GPU for processing the pending GPU task corresponding to the target parameter data; when the sum is greater than the memory size of the fourth GPU but less than or equal to that of the fifth GPU, the merged tasks are processed by the fifth GPU, and the fourth GPU is determined to be the target GPU; and when the sum is less than or equal to the memory size of both GPUs, the merged tasks may be processed by either of the two GPUs, with the other determined to be the target GPU for processing the pending GPU task corresponding to the target parameter data.
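  • A sketch of this decision, again using the Gpu class above; it returns which GPU hosts the merged tasks and which becomes the target GPU, or None when the combined parameter data fits in neither GPU.

      def decide_merge(fourth: Gpu, fifth: Gpu, combined_size: int):
          fits_fourth = combined_size <= fourth.memory
          fits_fifth = combined_size <= fifth.memory
          if fits_fourth and not fits_fifth:
              return fourth, fifth       # (host of merged tasks, target GPU)
          if fits_fifth and not fits_fourth:
              return fifth, fourth
          if fits_fourth and fits_fifth:
              return fourth, fifth       # either works; the fourth is picked arbitrarily
          return None                    # second preset condition not met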
  • After it is determined that no first and second GPU storing the same first parameter data exist in the cluster, it may first be determined whether the above third GPU exists and, if not, whether the above fourth and fifth GPUs exist; alternatively, it may first be determined whether the fourth and fifth GPUs exist and, if not, whether the third GPU exists.
  • Optionally, when the target GPU has processed a task immediately preceding the pending GPU task, the task type of that previous task may be the same as or different from that of the pending task. When the types are the same, the parameter data the target GPU required to process the previous task is identical to the target parameter data of the pending task; the GPU therefore already holds the target parameter data and can process the pending GPU task with it, that is, reuse the target parameter data. This greatly reduces data initialization and the overhead of transferring parameter data between the CPU and the GPU, improving GPU usage efficiency.
  • When the task type of the previous task differs from that of the pending GPU task, the tasks processed by the target GPU have been merged. If the previous task belonged to the original third GPU, that is, the target GPU is the third GPU and the previous task corresponds to the second parameter data, the second parameter data of the previous task may be retained, a new processing-class instance created for the pending GPU task, and the required target parameter data passed in. The target GPU can then process both the tasks corresponding to the second parameter data and the tasks corresponding to the target parameter data, likewise implementing parameter-data reuse.
  • The previous task may instead have belonged to the original fourth GPU or the original fifth GPU, that is, after the tasks of the fourth and fifth GPUs were merged, the original fourth or fifth GPU was determined to be the target GPU. In that case, after finishing the previous task, the target GPU deletes the saved parameter data used to process it, creates a new processing-class instance for the pending GPU task, and passes in the required target parameter data so that it can process the pending GPU task.
  • If no target GPU is determined for processing the pending GPU task, the task request fails. For example, after the CPU determines that the request for the pending GPU task has failed, it may reassign the task or leave the request unprocessed.
  • In a second aspect, a GPU task allocation apparatus is provided for performing the method of the first aspect or any of its possible implementations. Specifically, the apparatus comprises units for performing that method.
  • In a third aspect, a GPU task allocation apparatus is provided, comprising a storage unit and a processor, where the storage unit is configured to store instructions and the processor is configured to execute the instructions stored in the storage unit; when the processor executes those instructions, the execution causes the processor to perform the method of the first aspect or any of its possible implementations.
  • In a fourth aspect, a computer-readable medium is provided for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any of its possible implementations.
  • FIG. 1 is a schematic flowchart of a method for allocating a GPU task according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of an application scenario of a method for allocating a GPU task according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for allocating a GPU task according to another embodiment of the present invention.
  • FIG. 4 is a schematic block diagram of a device for allocating GPU tasks according to an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of a device for allocating GPU tasks according to another embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a method 100 for allocating a GPU task according to an embodiment of the present invention.
  • the method may be executed by a processor, for example, by a CPU.
  • the method 100 includes:
  • the client may include one or more applications, and each application may run one or more threads.
  • The GPUs included in a computing node or a computing-node cluster may be regarded as a GPU cluster, and the computing node or node cluster may be managed and allocated in a unified manner, for example by the CPU.
  • Each thread on the application side may send a GPU task request to the computing node or node cluster when it encounters a task that needs GPU processing, and the CPU in the node or cluster dispatches the request to a specific GPU, based on the current computing environment, to wait for that GPU to schedule its execution.
  • Specifically, the client may send a task request to the computing node or node cluster, where the task request is used to allocate, in the GPU cluster, a target GPU for the pending GPU task so that this GPU processes the pending task.
  • After receiving the task request, the CPU may first determine the target parameter data of the pending GPU task. Specifically, a GPU task uses parameter data while being processed, and that parameter data does not change during the processing of the task.
  • For example, the parameter data may be a calculation function; for a DNA sequence-alignment task, the parameter data may be a reference DNA sequence.
  • The CPU can divide tasks into multiple types according to their parameter data; that is, tasks that access the same parameter data can be classified into the same task type. For the pending GPU task, the target parameter data it requires when processed may therefore be determined first, and the target GPU that will process it is determined from that target parameter data.
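  • A sketch of this classification, assuming each task request carries the parameter data it accesses; the request layout and the param_id_of helper are illustrative, not from the patent.

      from collections import defaultdict

      def param_id_of(req) -> int:
          return hash(req["param_data"])   # assumed: requests name their parameter data

      def classify_by_param(task_requests):
          types = defaultdict(list)
          for req in task_requests:
              types[param_id_of(req)].append(req)   # same parameter data -> same type
          return types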
  • The GPU cluster may contain a GPU that has not yet processed any GPU task, in which case the pending GPU task may be assigned to it directly; however, every GPU in the cluster may also already have processed GPU tasks, and the embodiment is described taking that case as an example.
  • Optionally, after processing a GPU task, a GPU saves the parameter data required by that task, implementing data reuse, so that when it processes the same type of task again the saved parameter data can be reused, saving transfer time and improving efficiency. Since each GPU is assumed to have processed GPU tasks, each GPU in the cluster has saved parameter data. Each GPU may hold one or more pieces of parameter data, i.e., at least one GPU in the cluster can process multiple kinds of tasks, each requiring its own parameter data; alternatively, at least two GPUs may hold the same parameter data, i.e., multiple GPUs in the cluster process GPU tasks corresponding to the same type of parameter data.
  • The cluster may contain a GPU that has processed tasks of the same type as the pending GPU task, i.e., a GPU storing the parameter data required to process that task (the target parameter data); however, no such GPU may exist in the cluster, i.e., the pending GPU task is a new type of task.
  • Specifically, the target GPU corresponding to the target parameter data of the pending GPU task may be determined by looking up a mapping table between parameter data and GPUs. Optionally, the mapping table may be stored in the CPU's memory.
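  • A minimal sketch of such a mapping table kept in CPU memory, corresponding to the search described below.

      param_to_gpu: dict[int, list[int]] = {}    # param_id -> ids of GPUs storing it

      def lookup_target_gpus(param_id: int) -> list[int]:
          return param_to_gpu.get(param_id, [])  # empty list: reallocation is needed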
  • When it is determined that a target GPU corresponding to the target parameter data exists in the cluster, that GPU is determined to be the target GPU: it already stores the target parameter data and has processed tasks using the same target parameter data as the pending GPU task, so it can go on to process the pending task without the target parameter data being transmitted again.
  • Otherwise, a target GPU needs to be determined for the pending GPU task in the GPU cluster, and that target GPU processes the pending task.
  • The embodiment takes the case where no target GPU corresponding to the target parameter data exists in the cluster as an example, that is, the cluster receives this type of pending GPU task for the first time; since the task uses the target parameter data when processed, the cluster determines the target GPU and proceeds to S120.
  • After receiving the target parameter data, the target GPU first saves the target parameter data of the pending GPU task, so that other tasks of the same type can later be processed with it, implementing data reuse.
  • Since the target GPU saves the target parameter data and the pending GPU task is processed using that data, the CPU may assign the task to the target GPU for processing.
  • The GPU task allocation method of the embodiment thus determines, in the GPU cluster, the target GPU that processes the pending GPU task and sends it the target parameter data of that task. The target GPU saves this data and can be used both to process the pending GPU task and to process other GPU tasks corresponding to the target parameter data, so that tasks using the same parameter data are processed by the same GPU. This parameter-data reuse greatly reduces data initialization and the overhead of transferring parameter data between the CPU and the GPU, thereby improving GPU utilization.
  • FIG. 3 is a schematic flowchart of a method 200 for allocating a GPU task according to another embodiment of the present invention.
  • S210: Receive a task request, where the task request is used to request processing of a pending GPU task.
  • Specifically, the task request may be received by the CPU, and the CPU allocates the pending GPU task whose processing is requested.
  • The CPU performs unified management and allocation of the GPU cluster, in which each GPU may store one or more pieces of parameter data. According to the target parameter data required by the pending GPU task when processed, the CPU searches the parameter-data-to-GPU mapping table to determine whether a GPU corresponding to the target parameter data exists, i.e., a GPU that stores the target parameter data. If so, S240 is executed; if not, S250 is executed.
  • S240: The GPU in the cluster corresponding to the target parameter data of the pending GPU task is determined to be the target GPU; it stores the target parameter data and can be used to process the pending GPU task. The target GPU may be one or more GPUs, and when there are multiple target GPUs, execution continues with S260.
  • S250: Reallocate the GPUs in the GPU cluster, determine the target GPU, and continue with S260.
  • Specifically, the target GPU determined here is used to save the target parameter data for processing the pending GPU task; that is, it can be used to process the pending GPU task as well as other tasks of the same type that use the same target parameter data.
  • Reallocating the GPUs in the cluster includes: selecting, from several GPUs that store the same parameter data, one or more GPUs as the target GPU, which then stores the target parameter data instead; or determining a GPU that stores other parameter data as the target GPU, so that it holds the other parameter data and the target parameter data at the same time; or merging two mergeable GPUs in the cluster, so that one GPU processes all the tasks of the original two while the other is determined to be the target GPU, used to save the target parameter data and process the pending GPU task.
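  • The following sketch ties these three strategies together in the order just listed, reusing the helpers sketched earlier (repurpose_duplicate_gpu, third_gpu_eligible, decide_merge); the order of the checks and the default infinite duration bounds are assumptions, and as noted further below the patent allows the order to vary.

      def find_duplicate_pair(cluster: GpuCluster):
          # Two GPUs storing the same parameter data (the first/second GPU case).
          seen = {}
          for g in cluster.gpus.values():
              for pid in g.params:
                  if pid in seen:
                      return g, seen[pid], pid
                  seen[pid] = g
          return None

      def find_mergeable_pair(cluster: GpuCluster):
          # Two GPUs whose combined parameter data fits in one of them
          # (the fourth/fifth GPU case), decided by decide_merge above.
          gpus = list(cluster.gpus.values())
          for i, a in enumerate(gpus):
              for b in gpus[i + 1:]:
                  combined = sum(a.params.values()) + sum(b.params.values())
                  decision = decide_merge(a, b, combined)
                  if decision is not None:
                      return decision
          return None

      def reallocate(cluster: GpuCluster, target_size: int, pending_task,
                     first_duration=float("inf"), second_duration=float("inf")):
          dup = find_duplicate_pair(cluster)
          if dup is not None:                    # strategy 1: repurpose a duplicate
              first, second, shared_id = dup
              return repurpose_duplicate_gpu(first, second, shared_id)
          for g in cluster.gpus.values():        # strategy 2: co-locate on a third GPU
              if third_gpu_eligible(g, target_size, pending_task,
                                    first_duration, second_duration):
                  return g
          pair = find_mergeable_pair(cluster)    # strategy 3: merge two GPUs, free one
          if pair is not None:
              host, target = pair
              host.params.update(target.params)  # send the freed GPU's parameter data over
              host.queue.extend(target.queue)    # merged tasks run on the host GPU
              target.queue.clear()
              target.params.clear()              # second delete instruction
              return target
          return None                            # no target GPU: the request fails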
  • For example, when multiple GPUs in the cluster store the same parameter data, say a first GPU and a second GPU both storing the same first parameter data, the first GPU may be determined to be the target GPU. The first GPU and the second GPU may each refer to one GPU or to multiple GPUs; accordingly, the target GPU may be one GPU or multiple GPUs.
  • After the target GPU is determined, a first delete instruction may be sent to it, instructing the target GPU to delete the originally saved first parameter data.
  • The target GPU may currently have a GPU task being processed and unprocessed GPU tasks, all corresponding to the first parameter data. It may finish the task currently being processed, delete the originally saved first parameter data, and hand the other unprocessed tasks over to the second GPU; or the task currently being processed and the other unprocessed tasks may all be handed over to the second GPU and the first parameter data originally saved on the target GPU deleted.
  • It should be understood that several GPUs in the cluster may store the same parameter data, so more than one GPU may be selected as the target GPU.
  • For example, at least k GPUs can be determined in the GPU cluster as target GPUs, where k satisfies formula (1) above, n is the number of GPUs in the cluster, N is the number of parameter-data types currently handled by the cluster other than the target parameter data, and the floor operation denotes rounding down.
  • When no GPUs storing the same parameter data exist in the cluster, a third GPU meeting a first preset condition may be determined from the cluster to be the target GPU, where the third GPU stores second parameter data. The target GPU can then process the pending GPU task corresponding to the target parameter data and can also continue to process the tasks corresponding to the second parameter data.
  • The first preset condition met by the third GPU includes: the sum of the sizes of the second parameter data stored in the third GPU and the target parameter data of the pending GPU task is less than or equal to the memory size of the third GPU. The second parameter data is the parameter data previously saved on the third GPU and may be one or more pieces of parameter data; the embodiments of the present invention are not limited in this regard.
  • Optionally, the first preset condition may further include: the time required to process the existing tasks on the third GPU is less than or equal to a first duration, and the time required to process the pending GPU task is less than or equal to a second duration. The first and second durations may be set according to the actual situation: for example, if the processing time of the third GPU's original tasks or of the pending GPU task is not limited, the durations may be set to infinity; or the first duration may be set to the tolerance value for the pending GPU task's waiting time and the second duration to the tolerance value for the waiting time of the third GPU's original tasks, though the embodiments of the present invention are not limited to this.
  • When a fourth GPU and a fifth GPU meeting a second preset condition exist in the cluster, where the fourth GPU stores third parameter data and the fifth GPU stores fourth parameter data, the fourth parameter data is sent to the fourth GPU so that the fourth GPU holds both the third and the fourth parameter data and can process the tasks corresponding to both; the fifth GPU is determined to be the target GPU. A second delete instruction may then be sent to the target GPU, i.e., the fifth GPU, instructing it to delete the previously saved fourth parameter data, and the target parameter data is sent to the fifth GPU, which is used to process the pending GPU task corresponding to the target parameter data and other tasks of the same type.
  • The second preset condition met by the fourth and fifth GPUs includes: the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU. Optionally, the condition may further include: the time required to process the unprocessed GPU tasks corresponding to the third parameter data on the fourth GPU is less than or equal to a third duration, and the time required to process the unprocessed GPU tasks corresponding to the fourth parameter data on the fifth GPU is less than or equal to a fourth duration.
  • The third and fourth durations may be set according to the actual situation: for example, the third and fourth durations may be set to infinity; or the third duration may be set to the tolerance value for the waiting time of the tasks corresponding to the fourth parameter data and, correspondingly, the fourth duration to the tolerance value for the waiting time of the tasks corresponding to the third parameter data, though the embodiments of the invention are not limited to this.
  • After the tasks of the fourth GPU are merged with those of the fifth GPU, either the fourth or the fifth GPU may be selected to process the merged tasks. When the sum of the sizes of the third and fourth parameter data is less than or equal to the memory size of the fourth GPU but greater than that of the fifth GPU, the merged tasks are processed by the fourth GPU and the fifth GPU is determined to be the target GPU, used to process the pending GPU task corresponding to the target parameter data; when the sum is greater than the memory size of the fourth GPU but less than or equal to that of the fifth GPU, the merged tasks are processed by the fifth GPU and the fourth GPU is determined to be the target GPU; and when the sum is less than or equal to the memory size of both, the merged tasks may be processed by either GPU, with the other determined to be the target GPU.
  • After it is determined that no first and second GPU storing the same first parameter data exist in the cluster, it may first be determined whether the above third GPU exists and, if not, whether the above fourth and fifth GPUs exist; alternatively, it may first be determined whether the fourth and fifth GPUs exist and, if not, whether the third GPU exists. The embodiments of the present invention are not limited in this regard.
  • In the above manner, the target GPU that processes the pending GPU task is determined in the GPU cluster, and execution continues with S260; if no target GPU can be determined, S280 may be executed directly.
  • The target GPU determined by the foregoing method may be one or more GPUs. When multiple target GPUs are determined, one of them may be selected to process the pending GPU task, in which case S270 is performed; when no target GPU can be determined, S280 is performed.
  • Specifically, the multiple target GPUs may be referred to as candidate GPUs. Among the candidate GPUs, the target GPU used to process the pending GPU task is determined according to each candidate's task load; that is, the total workload of the tasks awaiting execution on each candidate GPU is computed and the candidate with the smallest load is identified.
  • The task load of each candidate GPU can be measured by the time needed to process its tasks: before the pending GPU task is processed, a GPU may have other unprocessed tasks, so the processing time of each of those tasks is estimated and their sum is taken as the candidate GPU's task load.
  • The candidate GPU with the smallest task load is determined from the task loads of all candidates. If that smallest load is less than or equal to a preset value, the candidate is the target GPU and processes the pending GPU task, i.e., execution continues with S270; if the smallest load is greater than the preset value, no target GPU satisfying the condition exists to process the pending task, and execution proceeds to S280. The preset value may be set according to the actual situation, for example according to the tolerance value of the pending GPU task: if there is no limit on how long the pending task may wait, the preset value may be set to infinity, in which case the candidate GPU with the smallest task load can be determined to be the target GPU for processing the pending GPU task.
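  • A sketch of this candidate selection, reusing estimate_duration from the earlier sketch; the preset value defaults to infinity, matching the unlimited-waiting case described above.

      def select_candidate(candidates, preset_value=float("inf")):
          if not candidates:
              return None
          def backlog(gpu):
              return sum(estimate_duration(t) for t in gpu.queue)
          best = min(candidates, key=backlog)            # smallest task load
          return best if backlog(best) <= preset_value else None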
  • Before processing the pending GPU task, the target GPU may still have unprocessed tasks of one or more task types, and the pending GPU task may be placed in the queue corresponding to the target GPU to wait for processing.
  • When the target GPU processed a task immediately preceding the pending GPU task, the task type of that previous task may be the same as or different from that of the pending task. When the types are the same, the parameter data the target GPU required to process the previous task is identical to the target parameter data of the pending task, so the GPU already holds the target parameter data and can process the pending GPU task with it, reusing the target parameter data. This greatly reduces data initialization and the overhead of transferring parameter data between the CPU and the GPU, improving GPU usage efficiency.
  • When the task type of the previous task differs from that of the pending GPU task, the tasks processed by the target GPU have been merged. If the previous task belonged to the original third GPU, that is, the target GPU is the third GPU and the previous task corresponds to the second parameter data, the second parameter data of the previous task may be retained, a new processing-class instance created for the pending GPU task, and the required target parameter data passed in so that the target GPU can process the pending task. The target GPU can then process both the tasks corresponding to the second parameter data and the tasks corresponding to the target parameter data, likewise implementing parameter-data reuse.
  • The previous task may instead have belonged to the original fourth GPU or the original fifth GPU, that is, after the tasks of the fourth and fifth GPUs were merged, the original fourth or fifth GPU was determined to be the target GPU. In that case, after finishing the previous task, the target GPU deletes the saved parameter data used to process it, creates a new processing-class instance for the pending GPU task, and passes in the required target parameter data so that it can process the pending GPU task.
  • For example, when the merged tasks are processed by the fourth GPU and the pending GPU task corresponding to the target parameter data is to be processed by the fifth GPU, the fifth GPU is determined to be the target GPU. The task preceding the pending GPU task on the target GPU is then a task corresponding to the fourth parameter data, so after processing that task the target GPU deletes the saved fourth parameter data, creates a processing-class instance for the pending GPU task, and passes in the required target parameter data so that it can process the pending task. The target GPU saves the target parameter data, so that when tasks corresponding to it are received again the saved data can be reused, reducing the overhead of transferring parameter data between the CPU and the GPU and improving GPU usage efficiency.
  • In implementation, CPU-side and GPU-side data interfaces may be defined for the parameter data each task requires at execution time; when a task is instantiated, the data already cached in the GPU environment is passed in by pointer through these interfaces, achieving data reuse.
  • In addition, the specific operational logic of a GPU task can be encapsulated into a corresponding processing-function interface, such as a compute interface.
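  • A sketch of such an encapsulation for the DNA-alignment example, with the parameter data passed by reference at instantiation so that data already resident on the GPU is reused; the class and the align_on_gpu kernel wrapper are illustrative, not from the patent.

      def align_on_gpu(fragments, ref_handle):
          raise NotImplementedError   # placeholder for the actual kernel launch

      class DnaAlignTask:
          def __init__(self, fragments, ref_seq_handle):
              # ref_seq_handle refers to reference data already cached on the GPU;
              # only the new fragment set still has to cross PCIe.
              self.fragments = fragments
              self.ref = ref_seq_handle

          def compute(self):
              # The processing-function interface: run the alignment against
              # the cached reference sequence.
              return align_on_gpu(self.fragments, self.ref)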
  • If no target GPU is determined, the task request fails. For example, after determining that the task request has failed, the CPU may reassign the task or leave the request unprocessed; the embodiments of the present invention are not limited in this regard.
  • It should be understood that the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
  • The GPU task allocation method of the embodiments thus determines a target GPU for the pending GPU task in the GPU cluster and sends the target parameter data to it; the target GPU saves the target parameter data so as to process the pending GPU task corresponding to that data as well as other tasks of the same type. Tasks of the same type can therefore be processed by the same GPU, implementing parameter-data reuse and greatly reducing data initialization and the overhead of transferring parameter data between the CPU and the GPU, improving GPU efficiency. Moreover, the parameter data saved by a GPU can be dynamically configured and changed according to processing requirements, further improving GPU utilization and the flexibility of task processing. Applying the GPU task allocation method of the embodiments to the computation of human gene-sequence alignment tasks can increase throughput by 80% to 100%.
  • the GPU task allocation apparatus 300 includes:
  • a determining unit 310, configured to determine, from a GPU cluster, a target GPU for a GPU task to be processed, where the GPU cluster contains at least two GPUs and each GPU in the cluster stores at least one piece of parameter data;
  • a sending unit 320, configured to send, to the target GPU, the target parameter data required to process the pending GPU task, where the target GPU is configured to process GPU tasks corresponding to the target parameter data;
  • the sending unit 320 is further configured to assign the pending GPU task to the target GPU for processing.
  • The GPU task apparatus of the embodiment determines, in the GPU cluster, the target GPU that processes the pending GPU task and sends it the target parameter data of that task, so that the target GPU can save the target parameter data and be used to process the pending GPU task as well as other GPU tasks corresponding to the target parameter data. Tasks using the same parameter data can thus be processed by the same GPU, implementing parameter-data reuse, greatly reducing data initialization and the overhead of transferring parameter data between the CPU and the GPU, and thereby improving GPU utilization.
  • Optionally, the determining unit 310 is specifically configured to: determine, from the GPU cluster, a first GPU and a second GPU, where the first GPU and the second GPU store the same first parameter data; and determine the first GPU to be the target GPU.
  • Optionally, the sending unit 320 is further configured to: after the first GPU is determined to be the target GPU, send a first delete instruction to the first GPU, where the first delete instruction is used to instruct the first GPU to delete the first parameter data.
  • Optionally, the determining unit 310 is specifically configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a third GPU from the cluster, where the third GPU stores second parameter data and the sum of the sizes of the target parameter data and the second parameter data is not greater than the memory size of the third GPU; and determine the third GPU to be the target GPU, used to process the GPU tasks corresponding to the target parameter data and to the second parameter data.
  • Optionally, the time required to process the unprocessed GPU tasks on the third GPU is less than or equal to a first duration, and the time required to process the pending GPU task is less than or equal to a second duration.
  • Optionally, the determining unit 310 is specifically configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a fourth GPU and a fifth GPU from the cluster, where the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU. The sending unit 320 is specifically configured to send the fourth parameter data to the fourth GPU, where the fourth GPU is used to process the GPU tasks corresponding to the third and fourth parameter data; and the determining unit 310 is specifically configured to determine the fifth GPU to be the target GPU.
  • Optionally, the sending unit 320 is configured to: after the fifth GPU is determined to be the target GPU, send a second delete instruction to the fifth GPU, where the second delete instruction is used to instruct the fifth GPU to delete the fourth parameter data.
  • Optionally, the time required to process the unprocessed GPU tasks corresponding to the third parameter data on the fourth GPU is less than or equal to a third duration, and the time required to process the unprocessed GPU tasks corresponding to the fourth parameter data on the fifth GPU is less than or equal to a fourth duration.
  • It should be understood that the GPU task allocation apparatus 300 may correspond to the method 100 and the method 200 of the embodiments of the present invention, and that the above and other operations and/or functions of the units in the apparatus 300 implement the corresponding procedures of the methods in FIG. 1 to FIG. 3; for brevity, details are not described again here.
  • The GPU task allocation apparatus of the embodiments determines a target GPU for the pending GPU task in the GPU cluster and sends the target parameter data to it; the target GPU saves that data so as to process the pending GPU task corresponding to it and other tasks of the same type. Tasks of the same type can thus be processed by the same GPU, implementing parameter-data reuse, greatly reducing data initialization and the overhead of transferring parameter data between the CPU and the GPU, and improving GPU efficiency. The parameter data saved by a GPU can also be dynamically configured and changed according to processing requirements, further improving GPU utilization and the flexibility of task processing.
  • FIG. 5 is a schematic block diagram of a GPU task allocation apparatus 400 according to an embodiment of the present invention.
  • the apparatus 400 includes a processor 410 and a transceiver interface 420, and the processor 410 is connected to the transceiver interface 420.
  • the apparatus 400 further includes a memory 430 that is coupled to the processor 410.
  • the apparatus 400 includes a bus system 440.
  • the processor 410, the memory 430, and the transceiver interface 420 can be connected by a bus system 440.
  • The memory 430 can be used to store instructions, and the processor 410 is used to execute the instructions stored in the memory 430 to control the transceiver interface 420 to send and receive information or signals.
  • Specifically, the processor 410 is configured to: determine, from a GPU cluster, a target GPU for a GPU task to be processed, where the GPU cluster contains at least two GPUs and each GPU in the cluster stores at least one piece of parameter data; send, through the transceiver interface 420, the target parameter data required to process the pending GPU task to the target GPU, where the target GPU is configured to process GPU tasks corresponding to the target parameter data; and assign the pending GPU task to the target GPU through the transceiver interface 420.
  • The GPU task allocation apparatus determines, in the GPU cluster, the target GPU that processes the pending GPU task and sends it the target parameter data of that task, so that the target GPU can save the target parameter data and be used to process the pending GPU task as well as other GPU tasks corresponding to that data. Tasks using the same parameter data can thus be processed by the same GPU, implementing parameter-data reuse, greatly reducing data initialization and the overhead of transferring parameter data between the CPU and the GPU, and thereby improving GPU utilization.
  • It should be understood that the processor 410 may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor or the like.
  • the memory 430 can include read only memory and random access memory and provides instructions and data to the processor 410.
  • a portion of the memory 430 may also include a non-volatile random access memory.
  • The memory 430 can also store device-type information.
  • the bus system 440 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 440 in the figure.
  • In implementation, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 410 or by instructions in the form of software. The steps of the method disclosed in the embodiments of the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 430, and the processor 410 reads the information in the memory 430 and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • Optionally, the processor 410 is configured to: determine a first GPU and a second GPU from the GPU cluster, where the first GPU and the second GPU store the same first parameter data; and determine the first GPU to be the target GPU.
  • Optionally, the processor 410 is configured to send, through the transceiver interface 420, a first delete instruction to the first GPU after the first GPU is determined to be the target GPU, where the first delete instruction is used to instruct the first GPU to delete the first parameter data.
  • Optionally, the processor 410 is configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a third GPU from the cluster, where the third GPU stores second parameter data and the sum of the sizes of the target parameter data and the second parameter data is not greater than the memory size of the third GPU; and determine the third GPU to be the target GPU, which is configured to process the GPU tasks corresponding to the target parameter data and to the second parameter data.
  • Optionally, the time required to process the unprocessed GPU tasks on the third GPU is less than or equal to a first duration, and the time required to process the pending GPU task is less than or equal to a second duration.
  • Optionally, the processor 410 is configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a fourth GPU and a fifth GPU from the cluster, where the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU; send the fourth parameter data to the fourth GPU through the transceiver interface 420, where the fourth GPU is configured to process the GPU tasks corresponding to the third and fourth parameter data; and determine the fifth GPU to be the target GPU.
  • Optionally, the processor 410 is configured to send, through the transceiver interface 420, a second delete instruction to the fifth GPU after the fifth GPU is determined to be the target GPU, where the second delete instruction is used to instruct the fifth GPU to delete the fourth parameter data.
  • Optionally, the time required to process the unprocessed GPU tasks corresponding to the third parameter data on the fourth GPU is less than or equal to a third duration, and the time required to process the unprocessed GPU tasks corresponding to the fourth parameter data on the fifth GPU is less than or equal to a fourth duration.
  • It should be understood that the GPU task allocation apparatus 400 may correspond to the apparatus 300 of the embodiments of the present invention and to the corresponding execution bodies of the method 100 and the method 200, and that the above and other operations and/or functions of the modules in the apparatus 400 implement the corresponding procedures of the methods in FIG. 1 to FIG. 3; for brevity, details are not described again here.
  • The GPU task allocation apparatus of the embodiments determines a target GPU for the pending GPU task in the GPU cluster and sends the target parameter data to it; the target GPU saves that data so as to process the pending GPU task corresponding to it and other tasks of the same type. Tasks of the same type can thus be processed by the same GPU, implementing parameter-data reuse, greatly reducing data initialization and the overhead of transferring parameter data between the CPU and the GPU, and improving GPU efficiency. The parameter data saved by a GPU can be dynamically configured and changed according to processing requirements, further improving GPU utilization and the flexibility of task processing.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division of units is only a logical functional division; in actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If implemented in the form of a software functional unit and sold or used as a standalone product, the functions may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

Abstract

Embodiments of the present invention relate to a method and apparatus for allocating GPU tasks. The method includes: determining, from a GPU cluster, a target GPU for a GPU task to be processed, where the GPU cluster contains at least two GPUs and each GPU in the cluster stores at least one piece of parameter data; sending to the target GPU the target parameter data required to process the pending GPU task, the target GPU being used to process GPU tasks corresponding to the target parameter data; and assigning the pending GPU task to the target GPU for processing. The method and apparatus of the embodiments of the present invention allow tasks of the same type to be processed by the same GPU, implementing parameter-data reuse and greatly reducing the overhead of transferring parameter data between the CPU and the GPU, thereby improving GPU utilization.

Description

[Title of invention established by the ISA under Rule 37.2] Method and Apparatus for Allocating Graphics Processor Tasks

Technical Field

The present invention relates to the field of information technology, and in particular to a method and apparatus for allocating graphics processing unit (GPU) tasks.

Background

Because the graphics processing unit (GPU) offers massive parallel threads and high memory bandwidth, it is well suited to multi-threaded, computation-intensive tasks. Beyond traditional graphics and image processing applications, GPUs have been applied in many other general-purpose computing fields, such as databases, data compression, deep learning, and biocomputing. For example, in the deoxyribonucleic acid (DNA) sequence-alignment problem in biocomputing, large numbers of DNA fragments under test can be computed in parallel by GPU threads, greatly improving the throughput of DNA alignment.

However, the data required for GPU computation must all be transferred from the memory of the central processing unit (CPU) to GPU memory over the PCIe (PCI-Express) bus interface, and PCIe bandwidth is far smaller than GPU memory bandwidth; this bandwidth mismatch makes it difficult to fully utilize the GPU's computational threads. Taking DNA sequence alignment as an example, the data transferred for each scheduled execution of such a task includes not only the set of DNA fragments to be aligned but also the reference DNA sequence data. Experiments show that an alignment task over a set of about 200 MB of DNA fragments executes in roughly 41 seconds on an Nvidia K40 GPU, of which the PCIe transfer preparation time for the reference DNA sequence data alone is as much as 20 seconds. Data transmission over PCIe has therefore become a major bottleneck in GPU computing.
Summary of the Invention

The present application provides a method and apparatus for allocating GPU tasks, which can improve GPU efficiency.

In a first aspect, a method for allocating a GPU task is provided, comprising: determining, from a GPU cluster, a target GPU for a GPU task to be processed, where the GPU cluster contains at least two GPUs and each GPU in the cluster stores at least one piece of parameter data; sending to the target GPU the target parameter data required to process the pending GPU task, the target GPU being used to process GPU tasks corresponding to the target parameter data; and assigning the pending GPU task to the target GPU for processing.

Based on the above technical solution, the GPU task allocation method of the present application determines, in the GPU cluster, a target GPU to process the pending GPU task and sends it the target parameter data for processing that task, so that the target GPU can hold the target parameter data and process the pending task. In this way, tasks using the same parameter data can be processed by the same GPU, implementing parameter-data reuse and greatly reducing the overhead of data initialization and of transferring parameter data between the CPU and the GPU, thereby improving GPU utilization.
It should be understood that GPU tasks can be divided into multiple types according to the parameter data they require when processed; therefore, after a pending GPU task is received, its target parameter data can be determined, and the target GPU that will process the pending task is determined from that target parameter data.

In embodiments of the present invention, the GPU cluster may contain a GPU that has not yet processed any GPU task, in which case the pending GPU task can be assigned to it directly; the embodiments, however, are described taking the case where every GPU has already processed GPU tasks as an example.

Optionally, after processing a GPU task, a GPU saves the parameter data required by that task, implementing data reuse: when the GPU later processes a task of the same type, the saved parameter data can be reused, saving parameter-data transfer time and improving efficiency. Since the embodiments assume every GPU has already processed GPU tasks, every GPU in the cluster has saved parameter data.

Specifically, each GPU in the cluster may hold one or more pieces of parameter data; that is, at least one GPU in the cluster can process multiple kinds of tasks, each requiring its own parameter data, so that GPU stores multiple pieces of parameter data. Alternatively, at least two GPUs in the cluster may hold the same parameter data; that is, multiple GPUs in the cluster process GPU tasks corresponding to the same type of parameter data.

It should be understood that the cluster may already contain a GPU that has processed tasks of the same type as the pending GPU task, i.e., a GPU storing the parameter data required to process that type of task (the target parameter data); alternatively, no such GPU may exist, i.e., the pending GPU task is a new type of task. The embodiments of the present invention take as an example the case where no target GPU corresponding to the target parameter data exists in the cluster: the cluster receives this type of pending GPU task for the first time, the task uses the target parameter data when processed, and the cluster first determines a target GPU.
With reference to the first aspect, in an implementation of the first aspect, the determining a target GPU for a to-be-processed GPU task from a GPU cluster includes: determining a first GPU and a second GPU from the GPU cluster, where the first GPU and the second GPU store the same first parameter data; and determining the first GPU as the target GPU.

In this way, at least one of multiple GPUs that redundantly process the same type of task can be determined as the target GPU, and the target GPU can process a new type of task, namely the to-be-processed GPU task corresponding to the target parameter data as well as other such tasks, which improves GPU utilization.

It should be understood that the first GPU and the second GPU may each refer to one GPU or to multiple GPUs; the first GPU can then be determined as the target GPU, and correspondingly the target GPU may be one GPU or multiple GPUs.

With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, after the determining the first GPU as the target GPU, the method further includes: sending a first deletion instruction to the first GPU, where the first deletion instruction instructs the first GPU to delete the first parameter data.

Specifically, the target GPU may currently have a GPU task being processed and unprocessed GPU tasks, all of which correspond to the first parameter data. The target GPU may then, after finishing the task currently being processed, delete the originally stored first parameter data and hand the remaining unprocessed GPU tasks over to the second GPU for processing; alternatively, both the GPU task currently being processed and the other unprocessed GPU tasks may be handed over to the second GPU, and the first parameter data originally stored in the target GPU deleted.
It should be understood that multiple GPUs in the GPU cluster may store the same parameter data, so there may be one or more target GPUs. For example, at least k GPUs in the GPU cluster may be determined as target GPUs, where k satisfies formula (1):

k = ⌊n/(N+1)⌋    (1)

where n denotes the number of GPUs in the GPU cluster, N denotes the number of kinds of parameter data currently present in the GPU cluster other than the target parameter data, and ⌊·⌋ denotes the round-down (floor) operation.

With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, the determining a target GPU for a to-be-processed GPU task from a GPU cluster includes: determining a third GPU from the GPU cluster, where the third GPU stores second parameter data, and the sum of the sizes of the target parameter data and the second parameter data is not greater than the memory size of the third GPU; and determining the third GPU as the target GPU, where the target GPU is configured to process GPU tasks corresponding to the target parameter data and the second parameter data.
With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, the duration required for processing the unprocessed GPU tasks in the third GPU is less than or equal to a first duration, and the duration required for processing the to-be-processed GPU task is less than or equal to a second duration.

It should be understood that the first duration and the second duration can be set according to the actual situation. For example, if the processing time of the third GPU's existing tasks or of the to-be-processed GPU task is not to be limited, the first duration and the second duration can be set to infinity; alternatively, the first duration can be set to the tolerance on the waiting time of the to-be-processed GPU task, and correspondingly the second duration to the tolerance on the waiting time of the third GPU's existing tasks.
With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, the determining a target GPU for a to-be-processed GPU task from a GPU cluster includes: determining a fourth GPU and a fifth GPU from the GPU cluster, where the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU; sending the fourth parameter data to the fourth GPU, where the fourth GPU is configured to process GPU tasks corresponding to the third parameter data and the fourth parameter data; and determining the fifth GPU as the target GPU.

In this way, the tasks of GPUs in the GPU cluster that satisfy a preset condition are merged; for example, when the total size of the parameter data stored by two GPUs is not greater than the memory of at least one of them, the tasks of the two GPUs are merged, which improves GPU utilization.

With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, after the determining the fifth GPU as the target GPU, the method further includes: sending a second deletion instruction to the fifth GPU, where the second deletion instruction instructs the fifth GPU to delete the fourth parameter data.

With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, the duration required for processing the unprocessed GPU tasks corresponding to the third parameter data in the fourth GPU is less than or equal to a third duration, and the duration required for processing the unprocessed GPU tasks corresponding to the fourth parameter data in the fifth GPU is less than or equal to a fourth duration.

It should be understood that the third duration and the fourth duration can be set according to the actual situation. For example, if the processing time of the tasks corresponding to the third parameter data and of the tasks corresponding to the fourth parameter data is not to be limited, the third duration and the fourth duration can correspondingly be set to infinity; alternatively, the third duration can be set to the tolerance on the waiting time of the tasks corresponding to the fourth parameter data, and correspondingly the fourth duration to the tolerance on the waiting time of the tasks corresponding to the third parameter data. The embodiments of the present invention are not limited thereto.

It should be understood that after the tasks in the fourth GPU and the tasks in the fifth GPU are merged, either the fourth GPU or the fifth GPU can be chosen to process the merged tasks. Specifically, when the sum of the sizes of the third parameter data and the fourth parameter data is less than or equal to the memory size of the fourth GPU but greater than that of the fifth GPU, the merged tasks are processed by the fourth GPU, and the fifth GPU is determined as the target GPU for processing the to-be-processed GPU task corresponding to the target parameter data; when the sum is greater than the memory size of the fourth GPU but less than or equal to that of the fifth GPU, the merged tasks are processed by the fifth GPU, and the fourth GPU is determined as the target GPU; and when the sum is less than or equal to the memory sizes of both GPUs, either of them can process the merged tasks, and the other is determined as the target GPU for processing the to-be-processed GPU task corresponding to the target parameter data.

It should be understood that after it is determined that no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, it is possible to first determine whether the foregoing third GPU exists in the GPU cluster and, if not, then determine whether the foregoing fourth GPU and fifth GPU exist; alternatively, it is possible to first determine whether the fourth GPU and the fifth GPU exist and, if not, then determine whether the third GPU exists.
Optionally, when the target GPU processed the task immediately preceding the to-be-processed GPU task, the task type of that preceding task may be the same as or different from that of the to-be-processed GPU task. When the two task types are the same, the parameter data required by the target GPU for processing the preceding task is identical to the target parameter data of the to-be-processed GPU task; the GPU therefore already stores the target parameter data and can process the to-be-processed GPU task according to it, that is, reuse the target parameter data. This greatly reduces the overhead of data initialization and of transferring parameter data between the CPU and the GPU, and improves GPU usage efficiency.

Optionally, when the task type of the preceding task differs from that of the to-be-processed GPU task, the tasks processed by the target GPU have been merged. If the preceding task belongs to the original third GPU, that is, the target GPU is the third GPU and the preceding task corresponds to the second parameter data, the second parameter data of the preceding task can be retained, a new processing-class instance for the to-be-processed GPU task created, and the required target parameter data passed in, so that the target GPU can process the to-be-processed GPU task. In this way, the target GPU can process both the tasks corresponding to the second parameter data and the tasks corresponding to the target parameter data, likewise achieving parameter-data reuse.

When the task type of the preceding task differs from that of the to-be-processed GPU task, it is also possible that the preceding task belongs to the original fourth GPU or the original fifth GPU, that is, after the tasks of the fourth GPU and the fifth GPU are merged, the original fourth or fifth GPU is determined as the target GPU. After the target GPU finishes the preceding task, because the task types differ, it deletes the stored parameter data used for processing the preceding task, creates a new processing-class instance for the to-be-processed GPU task, and passes in the required target parameter data, so that it can process the to-be-processed GPU task.

It should be understood that if no target GPU is determined for processing the to-be-processed GPU task, the task request fails. For example, after the CPU determines that the task request for the current to-be-processed GPU task has failed, it may reallocate the task or leave the task request unprocessed.
According to a second aspect, an apparatus for allocating GPU tasks is provided, configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the apparatus includes units configured to perform the method in the first aspect or any possible implementation of the first aspect.

According to a third aspect, an apparatus for allocating GPU tasks is provided, including a storage unit and a processor. The storage unit is configured to store instructions, and the processor is configured to execute the instructions stored in the memory; when the processor executes the instructions stored in the memory, the execution causes the processor to perform the method in the first aspect or any possible implementation of the first aspect.

According to a fourth aspect, a computer-readable medium is provided, configured to store a computer program, where the computer program includes instructions for performing the method in the first aspect or any possible implementation of the first aspect.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for the embodiments. Apparently, the accompanying drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a GPU task allocation method according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of an application scenario of a GPU task allocation method according to an embodiment of the present invention.

FIG. 3 is a schematic flowchart of a GPU task allocation method according to another embodiment of the present invention.

FIG. 4 is a schematic block diagram of a GPU task allocation apparatus according to an embodiment of the present invention.

FIG. 5 is a schematic block diagram of a GPU task allocation apparatus according to another embodiment of the present invention.

Detailed Description

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Specifically, FIG. 1 shows a schematic flowchart of a GPU task allocation method 100 according to an embodiment of the present invention. The method may be performed by a processor, for example by a CPU; the CPU is used as an example here. Specifically, as shown in FIG. 1, the method 100 includes:

S110. Determine a target GPU for a to-be-processed GPU task from a GPU cluster, where the GPU cluster includes at least two GPUs, and each GPU in the GPU cluster stores at least one piece of parameter data.

It should be understood that the method 100 can be applied to the application scenario shown in FIG. 2. As shown in FIG. 2, a client may include one or more applications, and one or more threads may run in each application. When one or more applications run on the same compute node or the same compute-node cluster at the same time, they can share all the GPU computing resources on that compute node or compute-node cluster. For unified management and allocation of the GPU computing resources, all the GPUs included in the compute node or compute-node cluster can be regarded as a GPU cluster and managed and allocated in a unified way by the compute node or compute-node cluster, for example by the CPU. Thus, when a thread on the application side encounters a GPU task that needs to be processed by a GPU, it can send a GPU task request to the compute node or compute-node cluster, and the CPU of the compute node or compute-node cluster dispatches the GPU task request to a specific GPU according to the current computing environment, to await scheduling and execution by that GPU.

In this embodiment of the present invention, the client can send a task request to the compute node or compute-node cluster, where the task request is used to request that a target GPU in the GPU cluster be allocated to the to-be-processed GPU task, so that this GPU processes the to-be-processed GPU task.

In this embodiment of the present invention, after receiving the task request, the CPU can first determine the target parameter data of the to-be-processed GPU task in the task request. Specifically, a GPU task uses parameter data when it is processed, and this parameter data does not change during processing of the GPU task; for example, the parameter data may be a compute function or, for DNA sequence alignment tasks, the reference DNA sequence. The CPU can divide tasks into multiple types according to their different parameter data, that is, tasks that access the same parameter data can be classified into the same task type. Therefore, for the to-be-processed GPU task, the target parameter data it requires when processed can be determined first, and the target GPU for processing it can then be determined according to that target parameter data.
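The task typing described above reduces to grouping tasks by the parameter data they access. As a minimal illustration (in Python; the task names, parameter-data identifiers, and dictionary layout below are assumptions of this sketch, not part of the patent):

```python
# Sketch: tasks that access the same parameter data belong to the same task type.
# All identifiers below are illustrative assumptions.
from collections import defaultdict

tasks = [
    ("align_batch_1", "ref_dna_hg19"),      # (task id, parameter-data id)
    ("align_batch_2", "ref_dna_hg19"),
    ("compress_job_1", "compress_dict_v2"),
]

tasks_by_type = defaultdict(list)
for task_id, param_id in tasks:
    tasks_by_type[param_id].append(task_id)  # same parameter data -> same type

print(dict(tasks_by_type))
# {'ref_dna_hg19': ['align_batch_1', 'align_batch_2'],
#  'compress_dict_v2': ['compress_job_1']}
```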
In this embodiment of the present invention, the GPU cluster may contain a GPU that has not yet processed any GPU task, in which case the to-be-processed GPU task can be assigned directly to that GPU; however, every GPU in the GPU cluster may also have processed GPU tasks already, and the embodiments of the present invention are described using the example in which every GPU has already processed GPU tasks.

Optionally, after processing a GPU task, a GPU retains the parameter data required for processing that task, achieving data reuse: when the GPU later processes a task of the same type, it can reuse the stored parameter data, saving parameter-data transfer time and improving efficiency. Because the embodiments of the present invention take the case in which every GPU has already processed GPU tasks as an example, every GPU in the GPU cluster already stores parameter data.

Specifically, each GPU in the GPU cluster may store one or more pieces of parameter data; that is, at least one GPU in the GPU cluster can process multiple types of tasks, each type requiring one piece of parameter data, so that this GPU stores multiple pieces of parameter data. Alternatively, at least two GPUs in the GPU cluster may store the same parameter data, that is, multiple GPUs in the GPU cluster process GPU tasks corresponding to the same kind of parameter data.

In this embodiment of the present invention, the GPU cluster may already contain a GPU that has processed tasks of the same type as the to-be-processed GPU task, that is, a GPU that stores the parameter data required for processing this type of GPU task, namely the target parameter data; however, the GPU cluster may also contain no such GPU, in which case the to-be-processed GPU task is a new type of task. For example, the target GPU corresponding to the target parameter data of the to-be-processed GPU task can be determined by looking up a parameter-data-to-GPU mapping table; optionally, the mapping table can be stored in the CPU's memory. According to the mapping relationship, when it is determined that a target GPU corresponding to the target parameter data exists in the GPU cluster, that GPU is determined as the target GPU; this target GPU stores the target parameter data and has processed tasks that use the same target parameter data as the to-be-processed GPU task, so it can continue to process the to-be-processed GPU task without the target parameter data having to be transferred again. However, when it is determined that no target GPU corresponding to the target parameter data exists in the GPU cluster, a target GPU needs to be determined for the to-be-processed GPU task in the GPU cluster, and that target GPU processes the to-be-processed GPU task.
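The lookup described above can be pictured as a small table in CPU memory keyed by a parameter-data identifier. A minimal sketch, assuming a plain dictionary and the illustrative names `param_to_gpus` and `find_target_gpus` (neither is specified by the patent):

```python
# Sketch of the parameter-data -> GPU mapping table kept in CPU memory.
# The table layout and all names are illustrative assumptions.

param_to_gpus = {
    "ref_dna_hg19": ["gpu0", "gpu3"],       # GPUs already storing this parameter data
    "compress_dict_v2": ["gpu1"],
}

def find_target_gpus(target_param_id):
    """Return GPUs that already store the target parameter data, or None."""
    gpus = param_to_gpus.get(target_param_id)
    if gpus:
        return gpus       # reuse: the parameter data need not cross PCIe again
    return None           # new task type: a target GPU must be determined (S250)

print(find_target_gpus("ref_dna_hg19"))   # ['gpu0', 'gpu3']
print(find_target_gpus("ref_dna_hg38"))   # None
```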
It should be understood that the embodiments of the present invention are described using the example in which no target GPU corresponding to the target parameter data exists in the GPU cluster, that is, the GPU cluster receives this type of to-be-processed GPU task for the first time and the to-be-processed GPU task uses the target parameter data when processed; the GPU cluster then determines a target GPU and continues with S120.

S120. Send, to the target GPU, the target parameter data required for processing the to-be-processed GPU task, where the target GPU is configured to process GPU tasks corresponding to the target parameter data.

Specifically, after the target GPU corresponding to the to-be-processed GPU task is determined in the GPU cluster, because the target GPU is processing this type of GPU task for the first time, it first stores the target parameter data of the to-be-processed GPU task, so that other tasks of the same type can later be processed using this target parameter data, achieving data reuse.

S130. Assign the to-be-processed GPU task to the target GPU for processing.

Specifically, after the target GPU stores the target parameter data required for processing the to-be-processed GPU task, it can process the to-be-processed GPU task using that target parameter data, and the CPU can assign the task to the target GPU for processing.

Therefore, in the GPU task allocation method of this embodiment of the present invention, a target GPU for processing the to-be-processed GPU task is determined in the GPU cluster, and the target parameter data for processing the to-be-processed GPU task is sent to the target GPU, so that the target GPU can store the target parameter data; the target GPU can be used to process the to-be-processed GPU task as well as other GPU tasks corresponding to the target parameter data. In this way, tasks that use the same parameter data can be processed by the same GPU, parameter data is reused, and the overhead of data initialization and of transferring parameter data between the CPU and the GPU is greatly reduced, thereby improving GPU utilization.
Optionally, as an embodiment, the foregoing method 100, and in particular S110 of the method 100, can be implemented specifically by the method 200 shown in FIG. 3. FIG. 3 shows a schematic flowchart of a GPU task allocation method 200 according to another embodiment of the present invention. The method 200 includes:

S210. Receive a task request, where the task request is used to request processing of a to-be-processed GPU task.

Optionally, the task request can be received by the CPU, and the CPU allocates the to-be-processed GPU task whose processing is requested by the task request.

S220. Determine the target parameter data of the to-be-processed GPU task.

S230. Determine whether a GPU corresponding to the target parameter data exists in the GPU cluster; if it exists, perform S240; if it does not exist, perform S250.

Specifically, the CPU can manage and allocate the GPU cluster in a unified way, and each GPU can store one or more pieces of parameter data. According to the target parameter data required by the to-be-processed GPU task when it is processed, whether a GPU corresponding to the target parameter data of the to-be-processed GPU task exists can be determined by looking up the parameter-data-to-GPU mapping table, where the corresponding GPU stores the target parameter data. If such a GPU exists, perform S240; if not, perform S250.

S240. Determine the GPU corresponding to the target parameter data as the target GPU, and continue with S260.

Specifically, the GPU in the GPU cluster corresponding to the target parameter data of the to-be-processed GPU task is determined as the target GPU; this target GPU stores the target parameter data and can be used to process the to-be-processed GPU task.

Because multiple GPUs in the GPU cluster may store the same target parameter data, there may be one or more target GPUs; when there are multiple target GPUs, continue with S260.

S250. Reallocate the GPUs in the GPU cluster, determine the target GPU, and continue with S260. Optionally, the target GPU can be used to store the target parameter data for processing the to-be-processed GPU task, that is, the target GPU can be used to process the to-be-processed GPU task and other GPU tasks of the same type that use the same target parameter data.

Optionally, reallocating the GPUs in the GPU cluster includes: selecting one or more GPUs from among multiple GPUs that store the same parameter data as the target GPU(s), which are switched to storing the target parameter data; or determining a GPU that stores other parameter data as the target GPU, so that the target GPU holds both the other parameter data and the target parameter data; or merging two GPUs in the GPU cluster that can be merged, so that one of them processes all the tasks originally processed by the two GPUs, while the other is determined as the target GPU for storing the target parameter data and processing the to-be-processed GPU task.

Specifically, when multiple GPUs in the GPU cluster store the same parameter data, for example when a first GPU and a second GPU in the GPU cluster store the same first parameter data, the first GPU and the second GPU may each refer to one GPU or to multiple GPUs; the first GPU can then be determined as the target GPU, and correspondingly the target GPU may be one GPU or multiple GPUs.

Optionally, a first deletion instruction can be sent to the target GPU, instructing the target GPU to delete the originally stored first parameter data. Specifically, the target GPU may currently have a GPU task being processed and unprocessed GPU tasks, all of which correspond to the first parameter data. The target GPU may then, after finishing the task currently being processed, delete the originally stored first parameter data and hand the remaining unprocessed GPU tasks over to the second GPU for processing; alternatively, both the GPU task currently being processed and the other unprocessed GPU tasks may be handed over to the second GPU, and the first parameter data originally stored in the target GPU deleted.
It should be understood that multiple GPUs in the GPU cluster may store the same parameter data, so there may be one or more target GPUs. For example, at least k GPUs in the GPU cluster may be determined as target GPUs, where k satisfies formula (1):

k = ⌊n/(N+1)⌋    (1)

where n denotes the number of GPUs in the GPU cluster, N denotes the number of kinds of parameter data currently present in the GPU cluster other than the target parameter data, and ⌊·⌋ denotes the round-down (floor) operation.
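In the published text, formula (1) is rendered only as an image; the closed form used above, k = ⌊n/(N+1)⌋, is a reconstruction consistent with the stated definitions (n GPUs shared among the target parameter data plus the N other kinds), not a quotation of the original. A minimal sketch:

```python
# Sketch of formula (1): the number of GPUs to repurpose as target GPUs.
# The closed form is a reconstruction from the definitions of n and N above.

def min_target_gpus(n: int, N: int) -> int:
    """n: GPUs in the cluster; N: kinds of parameter data other than the target."""
    return n // (N + 1)   # floor division implements the round-down operation

print(min_target_gpus(8, 3))   # 8 GPUs, 3 existing kinds + 1 new kind -> k = 2
```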
Optionally, when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, that is, when different GPUs in the GPU cluster process different types of GPU tasks, a third GPU satisfying a first preset condition can be determined in the GPU cluster as the target GPU. The third GPU stores second parameter data; after being determined as the target GPU, it can process the to-be-processed GPU task corresponding to the target parameter data and can also continue to process the tasks corresponding to the second parameter data that it could originally process. Specifically, the first preset condition satisfied by the third GPU includes: the sum of the sizes of the second parameter data stored in the third GPU and the target parameter data of the to-be-processed GPU task is less than or equal to the memory size of the third GPU. The second parameter data is the parameter data previously stored by the third GPU and may be one or more kinds of parameter data; the embodiments of the present invention are not limited thereto.

Optionally, the first preset condition satisfied by the third GPU may further include: the duration required for processing the existing tasks in the third GPU is less than or equal to a first duration, and the duration required for processing the to-be-processed GPU task is less than or equal to a second duration. It should be understood that the first duration and the second duration can be set according to the actual situation. For example, if the processing time of the third GPU's existing tasks or of the to-be-processed GPU task is not to be limited, the first duration and the second duration can be set to infinity; alternatively, the first duration can be set to the tolerance on the waiting time of the to-be-processed GPU task, and correspondingly the second duration to the tolerance on the waiting time of the third GPU's existing tasks. The embodiments of the present invention are not limited thereto.
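A minimal sketch of this first preset condition; the GPU record layout and field names (`param_size`, `mem_size`, `pending_time`, `new_task_time`) are assumptions of this sketch:

```python
# Sketch of the first preset condition for choosing a third GPU as the target.
# The record layout and all field names are illustrative assumptions.

def satisfies_first_condition(gpu, target_param_size,
                              first_duration=float("inf"),
                              second_duration=float("inf")):
    fits = gpu["param_size"] + target_param_size <= gpu["mem_size"]
    existing_ok = gpu["pending_time"] <= first_duration    # existing tasks finish in time
    new_ok = gpu["new_task_time"] <= second_duration       # the new task finishes in time
    return fits and existing_ok and new_ok

gpu3 = {"param_size": 4_000, "mem_size": 12_000,          # sizes in MB
        "pending_time": 30.0, "new_task_time": 41.0}      # times in seconds
print(satisfies_first_condition(gpu3, target_param_size=6_000))   # True
```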
Optionally, when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, that is, when different GPUs in the GPU cluster process different types of GPU tasks, a fourth GPU and a fifth GPU can also be determined in the GPU cluster, where the fourth GPU stores third parameter data and the fifth GPU stores fourth parameter data. When the fourth GPU and the fifth GPU satisfy a second preset condition, the fourth parameter data is sent to the fourth GPU, so that the fourth GPU stores both the third parameter data and the fourth parameter data, that is, the fourth GPU can process the tasks corresponding to the third parameter data and the fourth parameter data; the fifth GPU is then determined as the target GPU. Optionally, a second deletion instruction can be sent to the target GPU, namely the fifth GPU, instructing it to delete the originally stored fourth parameter data, and the target parameter data is sent to the fifth GPU, which is then used to process the to-be-processed GPU task corresponding to the target parameter data and other tasks of the same type.

The second preset condition satisfied by the fourth GPU and the fifth GPU includes: the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU. Optionally, the second preset condition may further include: the duration required for processing the unprocessed GPU tasks corresponding to the third parameter data in the fourth GPU is less than or equal to a third duration, and the duration required for processing the unprocessed GPU tasks corresponding to the fourth parameter data in the fifth GPU is less than or equal to a fourth duration.

It should be understood that the third duration and the fourth duration can be set according to the actual situation. For example, if the processing time of the tasks corresponding to the third parameter data and of the tasks corresponding to the fourth parameter data is not to be limited, the third duration and the fourth duration can correspondingly be set to infinity; alternatively, the third duration can be set to the tolerance on the waiting time of the tasks corresponding to the fourth parameter data, and correspondingly the fourth duration to the tolerance on the waiting time of the tasks corresponding to the third parameter data. The embodiments of the present invention are not limited thereto.

It should be understood that after the tasks in the fourth GPU and the tasks in the fifth GPU are merged, either the fourth GPU or the fifth GPU can be chosen to process the merged tasks. Specifically, when the sum of the sizes of the third parameter data and the fourth parameter data is less than or equal to the memory size of the fourth GPU but greater than that of the fifth GPU, the merged tasks are processed by the fourth GPU, and the fifth GPU is determined as the target GPU for processing the to-be-processed GPU task corresponding to the target parameter data; when the sum is greater than the memory size of the fourth GPU but less than or equal to that of the fifth GPU, the merged tasks are processed by the fifth GPU, and the fourth GPU is determined as the target GPU for processing the to-be-processed GPU task corresponding to the target parameter data; and when the sum is less than or equal to the memory sizes of both the fourth GPU and the fifth GPU, either of the two GPUs can process the merged tasks, and the other GPU is determined as the target GPU for processing the to-be-processed GPU task corresponding to the target parameter data.
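A minimal sketch of the merge decision just described, returning which GPU hosts the merged tasks and which is freed to serve as the target GPU; the record layout and function name are assumptions of this sketch:

```python
# Sketch of choosing which of the fourth and fifth GPUs hosts the merged tasks.
# The record layout and names are illustrative assumptions.

def choose_merge_host(gpu4, gpu5):
    """Return (host, freed_as_target) or None if the tasks cannot be merged."""
    total = gpu4["param_size"] + gpu5["param_size"]   # third + fourth parameter data
    fits4 = total <= gpu4["mem_size"]
    fits5 = total <= gpu5["mem_size"]
    if fits4 and not fits5:
        return gpu4, gpu5        # merged tasks on gpu4; gpu5 freed as target GPU
    if fits5 and not fits4:
        return gpu5, gpu4
    if fits4 and fits5:
        return gpu4, gpu5        # either GPU works; pick one
    return None                  # second preset condition not satisfied

gpu4 = {"param_size": 3_000, "mem_size": 12_000}      # sizes in MB
gpu5 = {"param_size": 5_000, "mem_size": 6_000}
print(choose_merge_host(gpu4, gpu5))                  # gpu4 hosts; gpu5 becomes target
```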
It should be understood that after it is determined that no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, it is possible to first determine whether the foregoing third GPU exists in the GPU cluster and, if not, then determine whether the foregoing fourth GPU and fifth GPU exist; alternatively, it is possible to first determine whether the fourth GPU and the fifth GPU exist and, if not, then determine whether the third GPU exists. The embodiments of the present invention are not limited thereto.

According to the foregoing method, the target GPU for processing the to-be-processed GPU task is determined in the GPU cluster, and S260 is then performed. Optionally, if the target GPU cannot be determined, or no target GPU exists, S280 can be performed directly.

S260. The target GPU determined by the foregoing method may be one or more GPUs; when multiple target GPUs are determined, one of them can be determined as the target GPU for processing the to-be-processed GPU task. Optionally, after the target GPU for processing the to-be-processed GPU task is determined, perform S270; when the target GPU cannot be determined, perform S280.

In this embodiment of the present invention, after multiple target GPUs are determined, they can be called candidate GPUs. Among the candidate GPUs, one target GPU for processing the to-be-processed GPU task can be determined according to the task load of each candidate GPU: the total load of the tasks awaiting execution on each candidate GPU is determined, and the candidate GPU with the smallest task load is identified.

Specifically, the task load of each candidate GPU can be determined from task processing time. For any candidate GPU, there may be other tasks not yet processed before the to-be-processed GPU task; the processing time of each of these tasks is estimated, and the sum of these times is the candidate GPU's task load. The candidate GPU with the smallest task load is determined according to the task load of each candidate GPU.

If the task load of the candidate GPU with the smallest task load is less than or equal to a preset value, that candidate GPU is the target GPU for processing the to-be-processed GPU task, and S270 is performed; if its task load is greater than the preset value, no target GPU satisfying the condition exists for processing the to-be-processed GPU task, and S280 is performed.

It should be understood that the preset value can be set according to the actual situation. For example, it can be set according to the tolerance of the to-be-processed GPU task: if the to-be-processed GPU task imposes no limit on how long it waits to be processed, the preset value can be set to infinity, and the candidate GPU with the smallest task load can then be determined as the target GPU for processing the to-be-processed GPU task.
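A minimal sketch of the candidate selection in S260, summing the estimated processing times per candidate GPU and applying the preset threshold; the names and the per-task time estimates are assumptions of this sketch:

```python
# Sketch of S260: pick the candidate GPU with the smallest estimated task load.
# Names and the per-task time estimates are illustrative assumptions.

def pick_candidate(candidates, preset_value=float("inf")):
    """candidates: dict mapping gpu id -> estimated times (seconds) of queued tasks."""
    loads = {gpu: sum(times) for gpu, times in candidates.items()}
    best = min(loads, key=loads.get)
    if loads[best] <= preset_value:
        return best        # proceed to S270: process on this target GPU
    return None            # proceed to S280: the task request fails

candidates = {"gpu0": [41.0, 12.5], "gpu3": [20.0]}
print(pick_candidate(candidates, preset_value=60.0))   # 'gpu3'
```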
S270. Process the to-be-processed GPU task by the target GPU.

It should be understood that after the target GPU for processing the to-be-processed GPU task is determined, the target GPU may still have unprocessed tasks of one or more task types; the to-be-processed GPU task can then be placed in the queue corresponding to the target GPU to await processing.

Specifically, when the target GPU processed the task immediately preceding the to-be-processed GPU task, the task type of that preceding task may be the same as or different from that of the to-be-processed GPU task. When the two task types are the same, the parameter data required by the target GPU for processing the preceding task is identical to the target parameter data of the to-be-processed GPU task; the GPU therefore already stores the target parameter data and can process the to-be-processed GPU task according to it, that is, reuse the target parameter data. This greatly reduces the overhead of data initialization and of transferring parameter data between the CPU and the GPU, and improves GPU usage efficiency.

When the task type of the preceding task differs from that of the to-be-processed GPU task, the tasks processed by the target GPU have been merged. If the preceding task belongs to the original third GPU, that is, the target GPU is the third GPU and the preceding task corresponds to the second parameter data, the second parameter data of the preceding task can be retained, a new processing-class instance for the to-be-processed GPU task created, and the required target parameter data passed in, so that the target GPU can process the to-be-processed GPU task. In this way, the target GPU can process both the tasks corresponding to the second parameter data and the tasks corresponding to the target parameter data, likewise achieving parameter-data reuse.

When the task type of the preceding task differs from that of the to-be-processed GPU task, it is also possible that the preceding task belongs to the original fourth GPU or the original fifth GPU, that is, after the tasks of the fourth GPU and the fifth GPU are merged, the original fourth or fifth GPU is determined as the target GPU. After the target GPU finishes the preceding task, because the task types differ, it deletes the stored parameter data used for processing the preceding task, creates a new processing-class instance for the to-be-processed GPU task, and passes in the required target parameter data, so that it can process the to-be-processed GPU task.

For example, after the tasks of the fourth GPU and the fifth GPU are merged, the fourth GPU processes the merged tasks and the fifth GPU processes the to-be-processed GPU task corresponding to the target parameter data; the fifth GPU is thus determined as the target GPU, and the task preceding the to-be-processed GPU task on this target GPU is a task corresponding to the fourth parameter data. After finishing that preceding task, the target GPU can therefore delete the stored fourth parameter data, create a new processing-class instance for the to-be-processed GPU task, and pass in the required target parameter data, so that it can process the to-be-processed GPU task. The target GPU also stores the target parameter data, so that when a task corresponding to the target parameter data is received again, the stored target parameter data can be reused, reducing the overhead of transferring parameter data between the CPU and the GPU and improving GPU usage efficiency.

It should be understood that, when processing any type of GPU task, CPU and GPU data interfaces can be defined for the parameter data required by each type of task during execution; when a task of that type is instantiated, these data interfaces are passed in, via pointers, from the data already cached in the current environment, thereby achieving data reuse. The specific operation logic corresponding to a GPU task can be encapsulated in a corresponding processing-function interface, such as a compute interface.
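A minimal sketch of the per-task-type processing class just described: the cached parameter data is passed in by reference at instantiation rather than copied, and the task's operation logic sits behind a compute interface. The class and field names are assumptions of this sketch:

```python
# Sketch of a per-task-type processing class. The parameter data is passed in
# by reference from the CPU-side cache at instantiation, and the task logic is
# encapsulated behind compute(). All names are illustrative assumptions.

class DnaAlignTask:
    def __init__(self, reference_seq, fragments):
        self.reference_seq = reference_seq   # cached parameter data, not copied
        self.fragments = fragments           # per-task input that is always transferred

    def compute(self):
        # Placeholder for the GPU kernel launch that aligns each fragment
        # against the cached reference sequence.
        return [f"{frag}: aligned" for frag in self.fragments]

cached_reference = "ACGTACGT"                            # already resident on the target GPU
task = DnaAlignTask(cached_reference, ["ACGT", "TACG"])  # reference is reused, not re-sent
print(task.compute())
```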
S280. Allocation of the to-be-processed GPU task fails; return the task request for the to-be-processed GPU task.

It should be understood that if no target GPU is determined for processing the to-be-processed GPU task, the task request fails. For example, after the CPU determines that the task request for the current to-be-processed GPU task has failed, it may reallocate the task or leave the task request unprocessed; the embodiments of the present invention are not limited thereto.

It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of the present invention.

Therefore, in the GPU task allocation method of this embodiment of the present invention, a target GPU is determined in the GPU cluster for the to-be-processed GPU task, and the target parameter data is sent to the target GPU, which stores it so as to process the to-be-processed GPU task corresponding to the target parameter data and other tasks of the same type. In this way, tasks of the same type can be processed by the same GPU, parameter data is reused, the overhead of data initialization and of transferring parameter data between the CPU and the GPU is greatly reduced, and GPU usage efficiency is improved. Moreover, GPUs that already store parameter data can be dynamically redeployed, so that the parameter data stored by a GPU can change with processing needs, further improving GPU utilization and the flexibility of task processing.

Specifically, actual tests show that applying the GPU task allocation method of this embodiment of the present invention to the computation of human gene sequence alignment tasks can increase throughput by 80% to 100%.

The GPU task allocation method according to the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 3; the GPU task allocation apparatus according to the embodiments of the present invention is described below with reference to FIG. 4 and FIG. 5.
As shown in FIG. 4, a GPU task allocation apparatus 300 according to an embodiment of the present invention includes:

a determining unit 310, configured to determine a target GPU for a to-be-processed GPU task from a GPU cluster, where the GPU cluster includes at least two GPUs, and each GPU in the GPU cluster stores at least one piece of parameter data; and

a sending unit 320, configured to send, to the target GPU, the target parameter data required for processing the to-be-processed GPU task, where the target GPU is configured to process GPU tasks corresponding to the target parameter data;

where the sending unit 320 is further configured to assign the to-be-processed GPU task to the target GPU for processing.

Therefore, the GPU task allocation apparatus of this embodiment of the present invention determines, in the GPU cluster, a target GPU for processing the to-be-processed GPU task and sends to it the target parameter data for processing the to-be-processed GPU task, so that the target GPU can store the target parameter data; the target GPU can be used to process the to-be-processed GPU task as well as other GPU tasks corresponding to the target parameter data. In this way, tasks that use the same parameter data can be processed by the same GPU, parameter data is reused, and the overhead of data initialization and of transferring parameter data between the CPU and the GPU is greatly reduced, thereby improving GPU utilization.

Optionally, the determining unit 310 is specifically configured to: determine a first GPU and a second GPU from the GPU cluster, where the first GPU and the second GPU store the same first parameter data; and determine the first GPU as the target GPU.

Optionally, the sending unit 320 is further configured to: after the first GPU is determined as the target GPU, send a first deletion instruction to the first GPU, where the first deletion instruction instructs the first GPU to delete the first parameter data.

Optionally, the determining unit 310 is specifically configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a third GPU from the GPU cluster, where the third GPU stores second parameter data, and the sum of the sizes of the target parameter data and the second parameter data is not greater than the memory size of the third GPU; and determine the third GPU as the target GPU, where the target GPU is configured to process GPU tasks corresponding to the target parameter data and the second parameter data.

Optionally, the duration required for processing the unprocessed GPU tasks in the third GPU is less than or equal to a first duration, and the duration required for processing the to-be-processed GPU task is less than or equal to a second duration.

Optionally, the determining unit 310 is specifically configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a fourth GPU and a fifth GPU from the GPU cluster, where the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU; the sending unit 320 is specifically configured to: send the fourth parameter data to the fourth GPU, where the fourth GPU is configured to process GPU tasks corresponding to the third parameter data and the fourth parameter data; and the determining unit 310 is specifically configured to: determine the fifth GPU as the target GPU.

Optionally, the sending unit 320 is specifically configured to: after the fifth GPU is determined as the target GPU, send a second deletion instruction to the fifth GPU, where the second deletion instruction instructs the fifth GPU to delete the fourth parameter data.

Optionally, the duration required for processing the unprocessed GPU tasks corresponding to the third parameter data in the fourth GPU is less than or equal to a third duration, and the duration required for processing the unprocessed GPU tasks corresponding to the fourth parameter data in the fifth GPU is less than or equal to a fourth duration.

It should be understood that the GPU task allocation apparatus 300 according to this embodiment of the present invention may correspond to performing the method 100 and the method 200 in the embodiments of the present invention, and the foregoing and other operations and/or functions of the units in the apparatus 300 are respectively intended to implement the corresponding procedures of the methods in FIG. 1 to FIG. 3; for brevity, details are not described here again.

Therefore, the GPU task allocation apparatus of this embodiment of the present invention determines a target GPU in the GPU cluster for the to-be-processed GPU task and sends the target parameter data to the target GPU, which stores it so as to process the to-be-processed GPU task corresponding to the target parameter data and other tasks of the same type. In this way, tasks of the same type can be processed by the same GPU, parameter data is reused, the overhead of data initialization and of transferring parameter data between the CPU and the GPU is greatly reduced, and GPU usage efficiency is improved. Moreover, GPUs that already store parameter data can be dynamically redeployed, so that the parameter data stored by a GPU can change with processing needs, further improving GPU utilization and the flexibility of task processing.
FIG. 5 shows a schematic block diagram of a GPU task allocation apparatus 400 according to an embodiment of the present invention. As shown in FIG. 5, the apparatus 400 includes a processor 410 and a transceiver interface 420, with the processor 410 connected to the transceiver interface 420; optionally, the apparatus 400 further includes a memory 430 connected to the processor 410, and further optionally, the apparatus 400 includes a bus system 440. The processor 410, the memory 430, and the transceiver interface 420 can be connected through the bus system 440; the memory 430 can be configured to store instructions, and the processor 410 is configured to execute the instructions stored in the memory 430, to control the transceiver interface 420 to send information or signals.

The processor 410 is configured to: determine a target GPU for a to-be-processed GPU task from a GPU cluster, where the GPU cluster includes at least two GPUs, and each GPU in the GPU cluster stores at least one piece of parameter data; send, through the transceiver interface 420 to the target GPU, the target parameter data required for processing the to-be-processed GPU task, where the target GPU is configured to process GPU tasks corresponding to the target parameter data; and assign, through the transceiver interface 420, the to-be-processed GPU task to the target GPU for processing.

Therefore, the GPU task allocation apparatus of this embodiment of the present invention determines, in the GPU cluster, a target GPU for processing the to-be-processed GPU task and sends to it the target parameter data for processing the to-be-processed GPU task, so that the target GPU can store the target parameter data; the target GPU can be used to process the to-be-processed GPU task as well as other GPU tasks corresponding to the target parameter data. In this way, tasks that use the same parameter data can be processed by the same GPU, parameter data is reused, and the overhead of data initialization and of transferring parameter data between the CPU and the GPU is greatly reduced, thereby improving GPU utilization.

It should be understood that, in this embodiment of the present invention, the processor 410 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 430 may include a read-only memory and a random access memory, and provides instructions and data to the processor 410. A part of the memory 430 may further include a non-volatile random access memory. For example, the memory 430 may further store information about device types.

In addition to a data bus, the bus system 440 may further include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all denoted as the bus system 440 in the figure.

During implementation, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 410 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 430, and the processor 410 reads the information in the memory 430 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.
Optionally, the processor 410 is configured to: determine a first GPU and a second GPU from the GPU cluster, where the first GPU and the second GPU store the same first parameter data; and determine the first GPU as the target GPU.

Optionally, the processor 410 is configured to: after the first GPU is determined as the target GPU, send, through the transceiver interface 420, a first deletion instruction to the first GPU, where the first deletion instruction instructs the first GPU to delete the first parameter data.

Optionally, the processor 410 is configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a third GPU from the GPU cluster, where the third GPU stores second parameter data, and the sum of the sizes of the target parameter data and the second parameter data is not greater than the memory size of the third GPU; and determine the third GPU as the target GPU, where the target GPU is configured to process GPU tasks corresponding to the target parameter data and the second parameter data.

Optionally, the duration required for processing the unprocessed GPU tasks in the third GPU is less than or equal to a first duration, and the duration required for processing the to-be-processed GPU task is less than or equal to a second duration.

Optionally, the processor 410 is configured to: when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a fourth GPU and a fifth GPU from the GPU cluster, where the sum of the sizes of the third parameter data stored in the fourth GPU and the fourth parameter data stored in the fifth GPU is not greater than the memory of the fourth GPU; and send, through the transceiver interface 420, the fourth parameter data to the fourth GPU, where the fourth GPU is configured to process GPU tasks corresponding to the third parameter data and the fourth parameter data; and the processor 410 is configured to: determine the fifth GPU as the target GPU.

Optionally, the processor 410 is configured to: after the fifth GPU is determined as the target GPU, send, through the transceiver interface 420, a second deletion instruction to the fifth GPU, where the second deletion instruction instructs the fifth GPU to delete the fourth parameter data.

Optionally, the duration required for processing the unprocessed GPU tasks corresponding to the third parameter data in the fourth GPU is less than or equal to a third duration, and the duration required for processing the unprocessed GPU tasks corresponding to the fourth parameter data in the fifth GPU is less than or equal to a fourth duration.

It should be understood that the GPU task allocation apparatus 400 according to this embodiment of the present invention may correspond to the apparatus 300 in the embodiments of the present invention and to the corresponding subject performing the method 100 and the method 200 according to the embodiments of the present invention, and the foregoing and other operations and/or functions of the modules in the apparatus 400 are respectively intended to implement the corresponding procedures of the methods in FIG. 1 to FIG. 3; for brevity, details are not described here again.

Therefore, the GPU task allocation apparatus of this embodiment of the present invention determines a target GPU in the GPU cluster for the to-be-processed GPU task and sends the target parameter data to the target GPU, which stores it so as to process the to-be-processed GPU task corresponding to the target parameter data and other tasks of the same type. In this way, tasks of the same type can be processed by the same GPU, parameter data is reused, the overhead of data initialization and of transferring parameter data between the CPU and the GPU is greatly reduced, and GPU usage efficiency is improved. Moreover, GPUs that already store parameter data can be dynamically redeployed, so that the parameter data stored by a GPU can change with processing needs, further improving GPU utilization and the flexibility of task processing.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described here again.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative. For example, the division into units is merely a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be implemented in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.

When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

  1. A method for allocating graphics processing unit (GPU) tasks, comprising:
    determining a target GPU for a to-be-processed GPU task from a GPU cluster, wherein the GPU cluster comprises at least two GPUs, and each GPU in the GPU cluster stores at least one piece of parameter data;
    sending, to the target GPU, target parameter data required for processing the to-be-processed GPU task, wherein the target GPU is configured to process GPU tasks corresponding to the target parameter data; and
    assigning the to-be-processed GPU task to the target GPU for processing.
  2. The method according to claim 1, wherein the determining a target GPU for a to-be-processed GPU task from a GPU cluster comprises:
    determining a first GPU and a second GPU from the GPU cluster, wherein the first GPU and the second GPU store the same first parameter data; and
    determining the first GPU as the target GPU.
  3. The method according to claim 2, wherein after the determining the first GPU as the target GPU, the method further comprises:
    sending a first deletion instruction to the first GPU, wherein the first deletion instruction instructs the first GPU to delete the first parameter data.
  4. The method according to claim 1, wherein when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, the determining a target GPU for a to-be-processed GPU task from a GPU cluster comprises:
    determining a third GPU from the GPU cluster, wherein the third GPU stores second parameter data, and a sum of sizes of the target parameter data and the second parameter data is not greater than a memory size of the third GPU; and
    determining the third GPU as the target GPU, wherein the target GPU is configured to process GPU tasks corresponding to the target parameter data and the second parameter data.
  5. The method according to claim 4, wherein
    a duration required for processing unprocessed GPU tasks in the third GPU is less than or equal to a first duration, and a duration required for processing the to-be-processed GPU task is less than or equal to a second duration.
  6. The method according to claim 1, wherein when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, the determining a target GPU for a to-be-processed GPU task from a GPU cluster comprises:
    determining a fourth GPU and a fifth GPU from the GPU cluster, wherein a sum of sizes of third parameter data stored in the fourth GPU and fourth parameter data stored in the fifth GPU is not greater than a memory of the fourth GPU;
    sending the fourth parameter data to the fourth GPU, wherein the fourth GPU is configured to process GPU tasks corresponding to the third parameter data and the fourth parameter data; and
    determining the fifth GPU as the target GPU.
  7. The method according to claim 6, wherein after the determining the fifth GPU as the target GPU, the method further comprises:
    sending a second deletion instruction to the fifth GPU, wherein the second deletion instruction instructs the fifth GPU to delete the fourth parameter data.
  8. The method according to claim 6 or 7, wherein
    a duration required for processing unprocessed GPU tasks corresponding to the third parameter data in the fourth GPU is less than or equal to a third duration, and a duration required for processing unprocessed GPU tasks corresponding to the fourth parameter data in the fifth GPU is less than or equal to a fourth duration.
  9. An apparatus for allocating graphics processing unit (GPU) tasks, comprising:
    a determining unit, configured to determine a target GPU for a to-be-processed GPU task from a GPU cluster, wherein the GPU cluster comprises at least two GPUs, and each GPU in the GPU cluster stores at least one piece of parameter data; and
    a sending unit, configured to send, to the target GPU, target parameter data required for processing the to-be-processed GPU task, wherein the target GPU is configured to process GPU tasks corresponding to the target parameter data;
    wherein the sending unit is further configured to assign the to-be-processed GPU task to the target GPU for processing.
  10. The apparatus according to claim 9, wherein the determining unit is specifically configured to:
    determine a first GPU and a second GPU from the GPU cluster, wherein the first GPU and the second GPU store the same first parameter data; and
    determine the first GPU as the target GPU.
  11. The apparatus according to claim 10, wherein the sending unit is further configured to:
    after the first GPU is determined as the target GPU, send a first deletion instruction to the first GPU, wherein the first deletion instruction instructs the first GPU to delete the first parameter data.
  12. The apparatus according to claim 9, wherein the determining unit is specifically configured to:
    when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a third GPU from the GPU cluster, wherein the third GPU stores second parameter data, and a sum of sizes of the target parameter data and the second parameter data is not greater than a memory size of the third GPU; and
    determine the third GPU as the target GPU, wherein the target GPU is configured to process GPU tasks corresponding to the target parameter data and the second parameter data.
  13. The apparatus according to claim 12, wherein
    a duration required for processing unprocessed GPU tasks in the third GPU is less than or equal to a first duration, and a duration required for processing the to-be-processed GPU task is less than or equal to a second duration.
  14. The apparatus according to claim 9, wherein the determining unit is specifically configured to:
    when no first GPU and second GPU storing the same first parameter data exist in the GPU cluster, determine a fourth GPU and a fifth GPU from the GPU cluster, wherein a sum of sizes of third parameter data stored in the fourth GPU and fourth parameter data stored in the fifth GPU is not greater than a memory of the fourth GPU;
    the sending unit is specifically configured to:
    send the fourth parameter data to the fourth GPU, wherein the fourth GPU is configured to process GPU tasks corresponding to the third parameter data and the fourth parameter data; and
    the determining unit is specifically configured to:
    determine the fifth GPU as the target GPU.
  15. The apparatus according to claim 14, wherein the sending unit is specifically configured to:
    after the fifth GPU is determined as the target GPU, send a second deletion instruction to the fifth GPU, wherein the second deletion instruction instructs the fifth GPU to delete the fourth parameter data.
  16. The apparatus according to claim 14 or 15, wherein
    a duration required for processing unprocessed GPU tasks corresponding to the third parameter data in the fourth GPU is less than or equal to a third duration, and a duration required for processing unprocessed GPU tasks corresponding to the fourth parameter data in the fifth GPU is less than or equal to a fourth duration.
  17. An apparatus for allocating graphics processing unit (GPU) tasks, comprising: a processor, a memory, and a bus;
    wherein the memory is configured to store execution instructions, the processor is connected to the memory through the bus, and when the apparatus runs, the processor executes the execution instructions stored in the memory, so that the apparatus performs the method according to any one of claims 1 to 8.
PCT/CN2016/080478 2016-04-28 2016-04-28 Method and apparatus for allocating graphics processor tasks WO2017185285A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/080478 WO2017185285A1 (zh) 2016-04-28 2016-04-28 Method and apparatus for allocating graphics processor tasks
CN201680084996.6A CN109074281B (zh) 2016-04-28 2016-04-28 Method and apparatus for allocating graphics processor tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/080478 WO2017185285A1 (zh) 2016-04-28 2016-04-28 Method and apparatus for allocating graphics processor tasks

Publications (1)

Publication Number Publication Date
WO2017185285A1 true WO2017185285A1 (zh) 2017-11-02

Family

ID=60161796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/080478 WO2017185285A1 (zh) 2016-04-28 2016-04-28 Method and apparatus for allocating graphics processor tasks

Country Status (2)

Country Link
CN (1) CN109074281B (zh)
WO (1) WO2017185285A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124656A (zh) * 2018-10-31 2020-05-08 EMC IP Holding Company LLC Method, device and computer program product for allocating tasks to dedicated computing resources
WO2020098414A1 (zh) * 2018-11-13 2020-05-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Terminal data processing method and apparatus, and terminal
CN112346859A (zh) * 2020-10-26 2021-02-09 Beijing SenseTime Technology Development Co., Ltd. Resource scheduling method and apparatus, electronic device, and storage medium
CN113204428A (zh) * 2021-05-28 2021-08-03 Beijing SenseTime Technology Development Co., Ltd. Resource scheduling method and apparatus, electronic device, and computer-readable storage medium
CN115955550A (zh) * 2023-03-15 2023-04-11 Zhejiang Uniview Technologies Co., Ltd. Image analysis method and system for a GPU cluster

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078356A (zh) * 2019-11-22 2020-04-28 Beijing Dajia Internet Information Technology Co., Ltd. GPU cluster resource control system, method, apparatus, device, and storage medium
CN111190712A (zh) * 2019-12-25 2020-05-22 Beijing Infervision Technology Co., Ltd. Task scheduling method, apparatus, device, and medium
CN111192230B (zh) * 2020-01-02 2023-09-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Multi-camera-based image processing method, apparatus, device, and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102648450A (zh) * 2009-09-23 2012-08-22 Nvidia Corporation Hardware for parallel command list generation
US20130300752A1 (en) * 2012-05-10 2013-11-14 Nvidia Corporation System and method for compiler support for kernel launches in device code
WO2015183851A1 (en) * 2014-05-30 2015-12-03 Apple Inc. Combining compute tasks for a graphics processing unit

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8959501B2 (en) * 2010-12-14 2015-02-17 Microsoft Corporation Type and length abstraction for data types
CN102036043A (zh) * 2010-12-15 2011-04-27 Chengdu Huawei Symantec Technologies Co., Ltd. Video data processing method and apparatus, and video surveillance system
CN103299277B (zh) * 2011-12-31 2016-11-09 Huawei Technologies Co., Ltd. GPU system and processing method thereof
CN102693317B (zh) * 2012-05-29 2014-11-05 Huawei Software Technologies Co., Ltd. Data mining process generation method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102648450A (zh) * 2009-09-23 2012-08-22 Nvidia Corporation Hardware for parallel command list generation
US20130300752A1 (en) * 2012-05-10 2013-11-14 Nvidia Corporation System and method for compiler support for kernel launches in device code
WO2015183851A1 (en) * 2014-05-30 2015-12-03 Apple Inc. Combining compute tasks for a graphics processing unit

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124656A (zh) * 2018-10-31 2020-05-08 EMC IP Holding Company LLC Method, device and computer program product for allocating tasks to dedicated computing resources
CN111124656B (zh) * 2018-10-31 2023-09-15 EMC IP Holding Company LLC Method, device, and computer-readable storage medium for allocating tasks to dedicated computing resources
WO2020098414A1 (zh) * 2018-11-13 2020-05-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Terminal data processing method and apparatus, and terminal
CN112346859A (zh) * 2020-10-26 2021-02-09 Beijing SenseTime Technology Development Co., Ltd. Resource scheduling method and apparatus, electronic device, and storage medium
CN113204428A (zh) * 2021-05-28 2021-08-03 Beijing SenseTime Technology Development Co., Ltd. Resource scheduling method and apparatus, electronic device, and computer-readable storage medium
CN113204428B (zh) * 2021-05-28 2023-01-20 Beijing SenseTime Technology Development Co., Ltd. Resource scheduling method and apparatus, electronic device, and computer-readable storage medium
CN115955550A (zh) * 2023-03-15 2023-04-11 Zhejiang Uniview Technologies Co., Ltd. Image analysis method and system for a GPU cluster

Also Published As

Publication number Publication date
CN109074281A (zh) 2018-12-21
CN109074281B (zh) 2022-05-24

Similar Documents

Publication Publication Date Title
WO2017185285A1 (zh) Method and apparatus for allocating graphics processor tasks
US8478926B1 (en) Co-processing acceleration method, apparatus, and system
EP3754498B1 (en) Architecture for offload of linked work assignments
US9009312B2 (en) Controlling access to a resource in a distributed computing system with a distributed access request queue
US9563474B2 (en) Methods for managing threads within an application and devices thereof
WO2022247105A1 (zh) Task scheduling method and apparatus, computer device, and storage medium
CN109564528B (zh) System and method for allocating computing resources in distributed computing
US10848440B2 (en) Systems and methods for allocating bandwidth across a cluster of accelerators
CN109726005B (zh) Method for managing resources, server system, and computer-readable medium
WO2018107751A1 (zh) Resource scheduling apparatus, system, and method
WO2021204147A1 (zh) Data transmission method and apparatus
WO2020125396A1 (zh) Shared data processing method and apparatus, and server
CN106385377B (zh) Information processing method and system
CN111240813A (zh) DMA scheduling method, apparatus, and computer-readable storage medium
US20150268985A1 (en) Low Latency Data Delivery
WO2016202153A1 (zh) GPU resource allocation method and system
CN112286688A (zh) Memory management and usage method, apparatus, device, and medium
WO2017075796A1 (zh) Method and apparatus for allocating virtual resources in a network functions virtualization (NFV) network
US11544113B2 (en) Task scheduling for machine-learning workloads
WO2023020010A1 (zh) Process running method and related device
KR101620896B1 (ko) Method, apparatus, and system for improving execution performance of a MapReduce program model in consideration of heterogeneous processing types
KR20160061726A (ko) Interrupt handling method
CN112114971A (zh) Task allocation method, apparatus, and device
US10142245B2 (en) Apparatus and method for parallel processing
WO2023122899A1 (zh) Vector-computation-based processing method and apparatus

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16899798

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16899798

Country of ref document: EP

Kind code of ref document: A1