CN105224410A - Method and device for scheduling a GPU to perform batch computing - Google Patents

Method and device for scheduling a GPU to perform batch computing

Info

Publication number
CN105224410A
CN105224410A
Authority
CN
China
Prior art keywords
gpu
cache
task
batch
scheduler module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510673433.8A
Other languages
Chinese (zh)
Inventor
吴庆国 (Wu Qingguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Westone Information Industry Inc
Original Assignee
Chengdu Westone Information Industry Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Westone Information Industry Inc filed Critical Chengdu Westone Information Industry Inc
Priority to CN201510673433.8A priority Critical patent/CN105224410A/en
Publication of CN105224410A publication Critical patent/CN105224410A/en
Pending legal-status Critical Current


Abstract

The present invention relates to the field of GPU parallel computing, and in particular to a method and device for scheduling a GPU to perform batch computing. To address the problems of the prior art, the invention provides such a method and device: an independent GPU scheduler module works together with an API application module to process computing tasks, so that the GPU's computing capability is fully exploited. The API application module sends computing tasks to the GPU scheduler module, which caches the tasks received within one cycle; after the GPU finishes the previous batch of computing tasks, the GPU scheduler module submits the cached tasks to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in either synchronous or asynchronous mode.

Description

Method and device for scheduling a GPU to perform batch computing
Technical field
The present invention relates to the field of GPU parallel computing, and in particular to a method and device for scheduling a GPU to perform batch computing.
Background technology
A GPU (Graphics Processing Unit) can be understood as a programmable graphics card, used in a computer for graphics and image processing. In recent years GPUs have moved beyond graphics processing into large-scale parallel computing, where exploiting GPU parallelism can improve an algorithm's performance several-fold.
A single GPU usually has hundreds of cores (its main arithmetic units), far more than the number of CPU cores, so a GPU is well suited to compute-intensive tasks that can be highly parallelized. Compared with a CPU of the same price, a GPU may have hundreds of times as many cores, and running such tasks on it can often improve performance several-fold. GPU technology is changing business applications, scientific computing, cloud computing, virtualization, gaming, robotics and other fields, and may even redefine the programming models we know.
Although a GPU has many more cores than a CPU, the GPU internally schedules its cores in groups of 16; that is, even if a task needs only a single core, at least 16 cores are occupied inside the GPU. Computing tasks must therefore be submitted to the GPU in batches so that more cores can be scheduled at the same time. Moreover, the larger the number of concurrent tasks, the fewer the interactions between the GPU and host memory, the lower the GPU's internal scheduling overhead, and the higher the achievable performance. Even if the GPU is invoked from multiple threads, if each thread's task engages only one GPU core, this scheduling mode may perform thousands of times worse than handing tasks to the GPU in batches so that every GPU core participates in the computation.
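The scheduling-granularity argument above can be made concrete with a small host-side sketch (plain Python; the 16-core group size is taken from the description, and the function names are illustrative, not part of any GPU API):

```python
def occupied_cores(num_tasks, group_size=16):
    """Cores occupied when the GPU dispatches work in groups of `group_size`."""
    groups = -(-num_tasks // group_size)   # ceiling division
    return groups * group_size

def utilization(num_tasks, group_size=16):
    """Fraction of the occupied cores doing useful work."""
    return num_tasks / occupied_cores(num_tasks, group_size)

print(utilization(1))    # one task still occupies a full group of 16 -> 0.0625
print(utilization(64))   # a batch of 64 tasks fills its groups -> 1.0
```

With a batch of 64 tasks every occupied core is useful, whereas submitting one task at a time wastes 15 of every 16 cores, which is the motivation for the cache-and-batch scheduler described below.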
Applications built on the CPU architecture normally use multiple processes or threads to handle multiple tasks: upon receiving a task, they invoke the CPU to perform the operation once, without caching tasks and executing them together in batches, since on a CPU that would bring no performance gain and would greatly increase program complexity. With a GPU it is different: tasks must be submitted in batches for the GPU to deliver its full performance, yet converting an existing application to a cache-and-batch submission model is quite difficult.
Summary of the invention
The technical problem to be solved by the invention is as follows: to address the problems of the prior art, a method and device for scheduling a GPU to perform batch computing are provided. The invention designs an independent GPU scheduler module that exposes an API; the API and the GPU scheduler module communicate through an inter-process communication mechanism. An application module sends computing tasks to the GPU scheduler module by calling the API, and the scheduler module caches the tasks received within one processing cycle. Once the GPU finishes the previous batch of computing tasks, the cached tasks are submitted to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode. This fully exploits the GPU's computing capability and improves the GPU's memory access performance.
The technical solution adopted by the invention is as follows:
A method for scheduling a GPU to perform batch computing comprises:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, which stores the tasks received within one cycle in a cache. After the GPU finishes the previous batch of computing tasks, the GPU scheduler module submits the cached tasks to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
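The cache-and-batch loop of Step 2 can be sketched as follows (plain Python, no GPU: `run_batch` is a stand-in for the real batch submission, and all names are illustrative assumptions, not part of the patent's API):

```python
import queue

class BatchScheduler:
    """Minimal sketch of the Step 2 loop: tasks accumulate in a cache while
    the GPU is busy, then the whole cache is submitted as one batch."""

    def __init__(self, run_batch):
        self.run_batch = run_batch     # stand-in for the GPU batch submission
        self.tasks = queue.Queue()     # thread-safe cache for incoming tasks
        self.results = []

    def submit(self, task):
        """API side: hand one computing task to the scheduler module."""
        self.tasks.put(task)

    def drain_cycle(self):
        """Collect every task cached during one cycle and run them as a batch."""
        batch = []
        while not self.tasks.empty():
            batch.append(self.tasks.get())
        if batch:
            self.results.extend(self.run_batch(batch))
        return len(batch)
```

For example, with `run_batch = lambda batch: [t * t for t in batch]`, submitting three tasks and calling `drain_cycle()` executes all three in a single batch rather than one invocation per task.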
Further, when the computing tasks in Step 2 are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group of caches are independently submitted to the GPU for batch computing.
Further, each group of caches is double-buffered, i.e. has an active cache and a standby cache. The standby cache stores newly received computing tasks; after a GPU computation completes, the results are placed in the active cache and returned to the API application module, the batch of data in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
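The active/standby swap can be sketched in a few lines (plain Python; the class and method names are illustrative assumptions, not taken from the patent):

```python
class DoubleBuffer:
    """Sketch of the double-buffering scheme: new tasks accumulate in the
    standby buffer while the GPU works on the active one; when the GPU
    finishes, the buffers swap roles."""

    def __init__(self):
        self.active = []    # batch the GPU is currently processing
        self.standby = []   # tasks arriving during that processing

    def receive(self, task):
        """Cache a newly received task in the standby buffer."""
        self.standby.append(task)

    def swap(self):
        """Called when the GPU finishes a batch: the standby buffer becomes
        the next active batch, and an empty standby buffer starts filling."""
        self.active, self.standby = self.standby, []
        return self.active
```

Because tasks only ever land in the standby buffer, the GPU's current batch is never mutated mid-flight, which is the point of keeping two buffers per group.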
Further, for both the active cache and the standby cache, a separate buffer area is established for each parameter, so that all instances of the same parameter are stored in a contiguous address space.
Further, in Step 2, synchronous mode means that after sending a computing task to the GPU scheduler module, the API application module waits for the scheduler module to finish the task and return the result. In asynchronous mode, the API application module returns immediately after sending the task, without waiting for the computation to finish; when the GPU scheduler module completes the computation and returns the result, the API application module processes the result in a callback.
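The two invocation modes can be sketched with standard threads (plain Python; `compute` stands in for the round trip through the GPU scheduler module, and all names are illustrative assumptions):

```python
import threading

class SchedulerClient:
    """Sketch of the two API modes: a synchronous call blocks until the result
    arrives; an asynchronous call returns at once and delivers the result to a
    callback, mirroring the callback-based handling described above."""

    def __init__(self, compute):
        self.compute = compute   # stand-in for the scheduler round trip

    def call_sync(self, task):
        done = threading.Event()
        box = {}
        def worker():
            box['result'] = self.compute(task)
            done.set()
        threading.Thread(target=worker).start()
        done.wait()                       # block until the result is back
        return box['result']

    def call_async(self, task, callback):
        def worker():
            callback(self.compute(task))  # deliver the result via callback
        t = threading.Thread(target=worker)
        t.start()
        return t                          # returns immediately; caller may join
```

A caller that needs the result inline uses `call_sync`; a caller that wants to keep feeding tasks into the batch cache uses `call_async` and handles each result in its callback.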
A device for scheduling a GPU to perform batch computing comprises:
an API application module, for sending computing tasks to the GPU scheduler module;
a GPU scheduler module, for storing the computing tasks received within one cycle in a cache when the API application module sends tasks to it; after the GPU finishes the previous batch of computing tasks, the GPU scheduler module submits the cached tasks to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode. The GPU scheduler module communicates with the API application module through an inter-process communication mechanism.
Further, when the computing tasks are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group of caches are independently submitted to the GPU for batch computing.
Further, each group of caches is double-buffered, i.e. has an active cache and a standby cache. The standby cache stores newly received computing tasks; after a GPU computation completes, the results are placed in the active cache and returned to the API application module, the batch of data in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
Further, for both the active cache and the standby cache, a separate buffer area is established for each parameter, so that all instances of the same parameter are stored in a contiguous address space.
Further, synchronous mode means that after sending a computing task to the GPU scheduler module, the API application module waits for the scheduler module to finish the task and return the result. In asynchronous mode, the API application module returns immediately after sending the task, without waiting for the computation to finish; when the GPU scheduler module completes the computation and returns the result, the API application module processes the result in a callback.
In summary, owing to the adoption of the above technical solution, the beneficial effects of the invention are as follows:
1. A group of caches is established for each type of computing task, so each group holds only one kind of task, and the tasks in each group are independently submitted to the GPU for batch computing; letting the GPU execute a batch of identical tasks fully exploits its computing capability.
2. For the GPU to use memory efficiently, the data must be organized in advance: a separate buffer area is established for each parameter in both the active and standby caches, so that all instances of the same parameter reside in a contiguous address space, which improves the GPU's memory access performance.
Accompanying drawing explanation
Embodiments of the present invention will be described with reference to the accompanying drawings, wherein:
Fig. 1 is a schematic diagram of scheduling a GPU to perform batch computing;
Fig. 2 is a schematic diagram of the caching technique of the GPU scheduler.
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings) may, unless specifically stated otherwise, be replaced by an alternative feature serving an equivalent or similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
The implementation steps of the invention are as follows:
1. Using the synchronous-mode API:
a) deploy the GPU;
b) deploy and run the GPU scheduler module;
c) the application module calls the synchronous-mode API to execute the computing task.
2. Using the asynchronous-mode API:
a) deploy the GPU;
b) deploy and run the GPU scheduler module;
c) the application module calls the asynchronous-mode API and processes the GPU's computation results in the callback.
Specifically, the method comprises:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, which stores the tasks received within one cycle in a cache. After the GPU finishes the previous batch of computing tasks, the GPU scheduler module submits the cached tasks to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
Further, each group of cached tasks is double-buffered, with an active cache and a standby cache. The active cache stores the tasks currently being executed and their results; the standby cache stores newly received computing tasks. After a GPU computation completes, the results are placed in the active cache and returned to the application module, the batch of data in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
Further, a separate buffer area is established for each parameter in both the active and standby caches, so that all instances of the same parameter reside in a contiguous address space.
For example, suppose a certain type of computing task uses a structure with two members, A and B, each 4 bytes long. When caching on the CPU, the structures are normally stored whole in an array, with each structure holding the parameters of one operation. On a GPU, however, this layout cannot fully exploit the GPU's parallelism: when the A member is used, more than 16 GPU threads may read the A members of different structures at the same time, and the GPU's batched-read mechanism fetches 32 contiguous bytes per read. If A and B are stored together as a structure, each structure occupies 8 bytes, so one 32-byte read contains only 4 A values and can satisfy at most 4 threads. If A and B are instead each placed in their own array, one read can satisfy 8 threads, doubling the memory-access performance.
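The arithmetic in this example can be checked with a small host-side sketch (plain Python, no GPU): pack eight (A, B) pairs both ways and count how many A values land in the first 32 bytes, the size of one batched read. The data values are illustrative.

```python
import struct

# Array-of-structures layout: A and B interleaved, 8 bytes per record.
records = [(a, a * 10) for a in range(8)]            # illustrative (A, B) pairs
aos = b''.join(struct.pack('<ii', a, b) for a, b in records)

# Structure-of-arrays layout: all A values contiguous in their own array.
soa_a = struct.pack('<8i', *(a for a, _ in records))

READ = 32  # bytes fetched by one batched GPU read, per the description

# A values recoverable from the first 32-byte read in each layout:
a_from_aos = [struct.unpack_from('<i', aos, off)[0] for off in range(0, READ, 8)]
a_from_soa = list(struct.unpack_from('<8i', soa_a, 0))

print(len(a_from_aos))  # 4 A values per read with array-of-structures
print(len(a_from_soa))  # 8 A values per read with structure-of-arrays
```

One read serves 4 threads under the interleaved layout but 8 under the per-parameter layout, matching the factor-of-two improvement claimed in the text.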
The invention is not limited to the foregoing embodiments. It extends to any new feature or any new combination of features disclosed in this specification, and to any new method or process step or any new combination thereof.

Claims (10)

1. A method for scheduling a GPU to perform batch computing, characterized by comprising:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, which stores the tasks received within one cycle in a cache. After the GPU finishes the previous batch of computing tasks, the GPU scheduler module submits the cached tasks to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
2. The method for scheduling a GPU to perform batch computing according to claim 1, characterized in that when the computing tasks in Step 2 are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group of caches are independently submitted to the GPU for batch computing.
3. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that each group of caches is double-buffered, i.e. has an active cache and a standby cache; the standby cache stores newly received computing tasks; after a GPU computation completes, the results are placed in the active cache and returned to the API application module, the batch of data in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
4. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that a separate buffer area is established for each parameter in both the active cache and the standby cache, so that all instances of the same parameter are stored in a contiguous address space.
5. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that in Step 2 synchronous mode means that after sending a computing task to the GPU scheduler module, the API application module waits for the scheduler module to finish the task and return the result; in asynchronous mode, the API application module returns immediately after sending the task, without waiting for the computation to finish, and when the GPU scheduler module completes the computation and returns the result, the API application module processes the result in a callback.
6. A device for scheduling a GPU to perform batch computing, characterized by comprising:
an API application module, for sending computing tasks to the GPU scheduler module;
a GPU scheduler module, for storing the computing tasks received within one cycle in a cache when the API application module sends tasks to it; after the GPU finishes the previous batch of computing tasks, the GPU scheduler module submits the cached tasks to the GPU as a batch, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode; wherein the GPU scheduler module communicates with the API application module through an inter-process communication mechanism.
7. The device for scheduling a GPU to perform batch computing according to claim 6, characterized in that when the computing tasks are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group of caches are independently submitted to the GPU for batch computing.
8. The device for scheduling a GPU to perform batch computing according to claim 6, characterized in that each group of caches is double-buffered, i.e. has an active cache and a standby cache; the standby cache stores newly received computing tasks; after a GPU computation completes, the results are placed in the active cache and returned to the API application module, the batch of data in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
9. The device for scheduling a GPU to perform batch computing according to claim 7, characterized in that a separate buffer area is established for each parameter in both the active cache and the standby cache, so that all instances of the same parameter are stored in a contiguous address space.
10. The device for scheduling a GPU to perform batch computing according to any one of claims 6 to 9, characterized in that synchronous mode means that after sending a computing task to the GPU scheduler module, the API application module waits for the scheduler module to finish the task and return the result; in asynchronous mode, the API application module returns immediately after sending the task, without waiting for the computation to finish, and when the GPU scheduler module completes the computation and returns the result, the API application module processes the result in a callback.
CN201510673433.8A 2015-10-19 2015-10-19 Method and device for scheduling a GPU to perform batch computing Pending CN105224410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510673433.8A CN105224410A (en) 2015-10-19 2015-10-19 Method and device for scheduling a GPU to perform batch computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510673433.8A CN105224410A (en) 2015-10-19 2015-10-19 Method and device for scheduling a GPU to perform batch computing

Publications (1)

Publication Number Publication Date
CN105224410A true CN105224410A (en) 2016-01-06

Family

ID=54993400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510673433.8A Pending CN105224410A (en) 2015-10-19 2015-10-19 Method and device for scheduling a GPU to perform batch computing

Country Status (1)

Country Link
CN (1) CN105224410A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017143522A1 (en) * 2016-02-23 2017-08-31 Intel Corporation Graphics processor workload acceleration using a command template for batch usage scenarios
CN108446253A (en) * 2018-03-28 2018-08-24 北京航空航天大学 Parallel computing method for sparse matrix-vector multiplication on the Shenwei system architecture
CN109711323A (en) * 2018-12-25 2019-05-03 武汉烽火众智数字技术有限责任公司 Real-time video stream analysis acceleration method, device and equipment
CN109769115A (en) * 2019-01-04 2019-05-17 武汉烽火众智数字技术有限责任公司 Method, device and equipment for optimizing intelligent video analysis performance
US10484690B2 (en) 2015-06-04 2019-11-19 Intel Corporation Adaptive batch encoding for slow motion video recording

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102299843A (en) * 2011-06-28 2011-12-28 北京安天电子设备有限公司 Network data processing method based on graphic processing unit (GPU) and buffer area, and system thereof
CN102859492A (en) * 2010-04-28 2013-01-02 瑞典爱立信有限公司 Technique for GPU command scheduling
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units
WO2015123840A1 (en) * 2014-02-20 2015-08-27 Intel Corporation Workload batch submission mechanism for graphics processing unit

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102859492A (en) * 2010-04-28 2013-01-02 瑞典爱立信有限公司 Technique for GPU command scheduling
CN102299843A (en) * 2011-06-28 2011-12-28 北京安天电子设备有限公司 Network data processing method based on graphic processing unit (GPU) and buffer area, and system thereof
WO2015123840A1 (en) * 2014-02-20 2015-08-27 Intel Corporation Workload batch submission mechanism for graphics processing unit
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10484690B2 (en) 2015-06-04 2019-11-19 Intel Corporation Adaptive batch encoding for slow motion video recording
WO2017143522A1 (en) * 2016-02-23 2017-08-31 Intel Corporation Graphics processor workload acceleration using a command template for batch usage scenarios
US10846142B2 (en) 2016-02-23 2020-11-24 Intel Corporation Graphics processor workload acceleration using a command template for batch usage scenarios
CN108446253A (en) * 2018-03-28 2018-08-24 北京航空航天大学 Parallel computing method for sparse matrix-vector multiplication on the Shenwei system architecture
CN108446253B (en) * 2018-03-28 2021-07-23 北京航空航天大学 Parallel computing method for sparse matrix vector multiplication aiming at Shenwei system architecture
CN109711323A (en) * 2018-12-25 2019-05-03 武汉烽火众智数字技术有限责任公司 Real-time video stream analysis acceleration method, device and equipment
CN109711323B (en) * 2018-12-25 2021-06-15 武汉烽火众智数字技术有限责任公司 Real-time video stream analysis acceleration method, device and equipment
CN109769115A (en) * 2019-01-04 2019-05-17 武汉烽火众智数字技术有限责任公司 Method, device and equipment for optimizing intelligent video analysis performance
CN109769115B (en) * 2019-01-04 2020-10-27 武汉烽火众智数字技术有限责任公司 Method, device and equipment for optimizing intelligent video analysis performance

Similar Documents

Publication Publication Date Title
CN105224410A (en) Method and device for scheduling a GPU to perform batch computing
US9146777B2 (en) Parallel processing with solidarity cells by proactively retrieving from a task pool a matching task for the solidarity cell to process
CN110806923B (en) Parallel processing method and device for block chain tasks, electronic equipment and medium
CN105579959B (en) Hardware accelerator virtualization
CN106662995B (en) Device, method, system, medium and the equipment seized for providing intermediate thread
Budden et al. Deep tensor convolution on multicores
CN107450971A (en) Task processing method and device
CN109522108B (en) GPU task scheduling system and method based on Kernel merging
CN111310893A (en) Device and method for executing neural network operation
CN106844017A (en) The method and apparatus that event is processed for Website server
DE102020101814A1 (en) EFFICIENT EXECUTION BASED ON TASK GRAPHS OF DEFINED WORKLOADS
CN110795254A (en) Method for processing high-concurrency IO based on PHP
US8370845B1 (en) Method for synchronizing independent cooperative thread arrays running on a graphics processing unit
CN115880132A (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN109416673A (en) Memory requests arbitration
CN114691232A (en) Offloading performance of multitask parameter-related operations to a network device
CN111352896B (en) Artificial intelligence accelerator, equipment, chip and data processing method
US10275289B2 (en) Coexistence of message-passing-like algorithms and procedural coding
US20120151145A1 (en) Data Driven Micro-Scheduling of the Individual Processing Elements of a Wide Vector SIMD Processing Unit
US20170262487A1 (en) Using Message-Passing With Procedural Code In A Database Kernel
CN115361382B (en) Data processing method, device, equipment and storage medium based on data group
CN112395062A (en) Task processing method, device, equipment and computer readable storage medium
JP2019515384A (en) Intersubgroup data sharing
CN115378937B (en) Distributed concurrency method, device, equipment and readable storage medium for tasks
CN116701010A (en) Method, system, device and medium for updating multithreading shared variable

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160106

RJ01 Rejection of invention patent application after publication