CN105224410A - Method and device for scheduling a GPU to perform batch computing - Google Patents
Method and device for scheduling a GPU to perform batch computing
- Publication number
- CN105224410A CN105224410A CN201510673433.8A CN201510673433A CN105224410A CN 105224410 A CN105224410 A CN 105224410A CN 201510673433 A CN201510673433 A CN 201510673433A CN 105224410 A CN105224410 A CN 105224410A
- Authority
- CN
- China
- Prior art keywords
- gpu
- buffer memory
- task
- batch
- scheduler module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention relates to the field of GPU parallel computing, and in particular to a method and device for scheduling a GPU to perform batch computing. To address the problems of the prior art, the present invention provides a method and device for scheduling a GPU to perform batch computing: an independent GPU scheduler module cooperates with an API application module to process computing tasks, fully exploiting the GPU's computing power. The API application module sends computing tasks to the GPU scheduler module, which stores the tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in either synchronous or asynchronous mode.
Description
Technical field
The present invention relates to the field of GPU parallel computing, and in particular to a method and device for scheduling a GPU to perform batch computing.
Background technology
A GPU (Graphics Processing Unit) can be understood as a programmable graphics card, used in a computer for graphics and image processing. After years of development, GPUs are no longer limited to graphics processing and are also applied to large-scale parallel computing; using GPU parallel computing can improve an algorithm's performance several-fold.
A single GPU usually has hundreds of cores (its main arithmetic units), far more than the number of CPU cores, so a GPU is well suited to executing highly parallel, compute-intensive tasks. Compared with a CPU of the same price, a GPU may offer hundreds of times more cores, and running such tasks on the GPU can often improve performance several-fold. GPU technology will change fields such as business applications, scientific computing, cloud computing, virtualization, gaming and robotics, and may even redefine the programming model we know today.
Although a GPU has many more cores than a CPU, the GPU internally schedules cores in groups of 16: even if a task needs only one core, at least 16 cores are occupied inside the GPU. Computing tasks must therefore be submitted to the GPU in batches so that more cores can be scheduled at the same time. Moreover, the larger the number of concurrent tasks, the less interaction is needed between the GPU and host memory, the lower the GPU's internal scheduling overhead, and the higher the achievable performance. Even when multiple threads call into the GPU, if each thread's task engages only a single GPU core, the performance gap compared with batch submission that keeps every GPU core busy may be a factor of thousands.
Applications built on the CPU architecture normally use multiple processes or threads to handle multiple tasks: after a task is received, the CPU is invoked once to process it, and there is no need to cache tasks and execute them together in batches, because doing so would bring no performance gain and would greatly increase program complexity. A GPU is different: it only reaches its full performance when tasks are submitted to it in batches, yet retrofitting an existing application into a cache-and-batch-submit pattern is quite difficult.
Summary of the invention
The technical problem to be solved by the present invention is: to address the problems of the prior art, provide a method and device for scheduling a GPU to perform batch computing. The present invention designs an independent GPU scheduler module that exposes an API; the API and the GPU scheduler module communicate through an inter-process communication mechanism. An application module sends computing tasks to the GPU scheduler module by calling the API; the GPU scheduler module caches the tasks received within one processing cycle, and once the GPU has finished the previous batch it submits the cached batch to the GPU; the API application module then completes the follow-up handling of the tasks in either synchronous or asynchronous mode. This fully exploits the GPU's computing power and improves the GPU's memory-access performance.
The technical solution adopted by the present invention is as follows:
A method for scheduling a GPU to perform batch computing comprises:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, and the GPU scheduler module stores the computing tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
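The cache-then-batch loop of step 2 can be sketched in plain Python. This is only an illustrative sketch, not the patent's implementation: the class and method names are invented here, and `run_batch` stands in for the real GPU submission, which the patent reaches over inter-process communication.

```python
from queue import Empty, Queue

class GpuScheduler:
    """Sketch of the cache-then-batch loop (illustrative names)."""

    def __init__(self, run_batch):
        self.inbox = Queue()        # tasks sent by the API application module
        self.run_batch = run_batch  # one call submits a whole batch to the "GPU"

    def submit(self, task):
        self.inbox.put(task)

    def drain_cycle(self):
        """Collect every task received during one cycle and submit them together."""
        batch = []
        while True:
            try:
                batch.append(self.inbox.get_nowait())
            except Empty:
                break
        return self.run_batch(batch) if batch else []

# Usage: a fake "GPU" that squares each task's payload in a single batch call.
sched = GpuScheduler(run_batch=lambda batch: [x * x for x in batch])
for x in (1, 2, 3):
    sched.submit(x)
results = sched.drain_cycle()
print(results)  # -> [1, 4, 9]
```

The point of the sketch is that three separate submissions become one `run_batch` call, mirroring how the scheduler turns many small API calls into one GPU submission.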
Further, when the computing tasks in step 2 are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
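The per-type grouping can be sketched as follows; the task-type names (`"matmul"`, `"fft"`) and payloads are purely illustrative, chosen only to show one cache group per task type.

```python
from collections import defaultdict

# Per-type task grouping: tasks of the same type share one cache group,
# and each group is batch-submitted to the GPU on its own.
caches = defaultdict(list)

def cache_task(task_type, payload):
    caches[task_type].append(payload)

cache_task("matmul", 1)
cache_task("matmul", 2)
cache_task("fft", 3)

# Each group becomes one independent batch submission:
batches = {t: caches.pop(t) for t in list(caches)}
print(batches)  # -> {'matmul': [1, 2], 'fft': [3]}
```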
Further, each group's cache is double-buffered, i.e. an active cache and a standby cache. The standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
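The active/standby swap can be sketched like this (a minimal sketch with illustrative names; a real module would return the finished results to the API application module and submit `next_batch` to the GPU):

```python
class DoubleBuffer:
    """Sketch of the active/standby task caches (illustrative names)."""

    def __init__(self):
        self.active = []   # batch currently on the GPU / holding its results
        self.standby = []  # tasks received while the GPU is busy

    def add_task(self, task):
        self.standby.append(task)  # new tasks always land in the standby cache

    def on_batch_done(self, results):
        # The finished batch's results would be returned to the API
        # application module here; then the two caches swap roles.
        next_batch = self.standby
        self.active, self.standby = self.standby, []
        return results, next_batch

buf = DoubleBuffer()
buf.add_task("t1")
buf.add_task("t2")
_, next_batch = buf.on_batch_done(results=[])
print(next_batch)  # -> ['t1', 't2'] (now the active cache; standby is empty)
```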
Further, both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
Further, in step 2, synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module. Asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete; once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
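The two calling modes can be sketched from the application's point of view. This is an assumption-laden sketch: `gpu_compute` stands in for the whole scheduler-plus-GPU round trip, and a thread models the asynchronous return path that delivers the result to a callback.

```python
import threading

def gpu_compute(task):
    # Stand-in for "scheduler submits the task, GPU runs it" (illustrative).
    return task * 2

def call_sync(task):
    """Synchronous mode: block until the result comes back."""
    return gpu_compute(task)

def call_async(task, callback):
    """Asynchronous mode: return at once; the result arrives via a callback."""
    worker = threading.Thread(target=lambda: callback(gpu_compute(task)))
    worker.start()
    return worker

print(call_sync(21))  # -> 42

results = []
t = call_async(21, results.append)  # returns immediately
t.join()                            # joined here only so the demo can print
print(results)                      # -> [42]
```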
A device for scheduling a GPU to perform batch computing comprises:
an API application module, for sending computing tasks to the GPU scheduler module;
a GPU scheduler module, for storing the computing tasks received within one cycle in a cache when the API application module sends tasks to it; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode; the GPU scheduler module communicates with the API application module through an inter-process communication mechanism.
Further, when the computing tasks are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
Further, each group's cache is double-buffered, i.e. an active cache and a standby cache. The standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
Further, both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
Further, synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module. Asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete; once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
In summary, owing to the above technical scheme, the beneficial effects of the invention are:
1. A group of caches is set up for each type of computing task, so each group's cache holds tasks of a single type, and each group is independently submitted to the GPU for batch computing; having the GPU execute batches of identical tasks fully exploits its computing power.
2. For the GPU to use memory efficiently, the data must be organized in advance: both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter lie in a contiguous address space, improving the GPU's memory-access performance.
Brief description of the drawings
The embodiments of the present invention will be described with reference to the accompanying drawings, wherein:
Fig. 1 is a schematic diagram of scheduling a GPU to perform batch computing;
Fig. 2 is a schematic diagram of the caching technique of the GPU scheduler.
Embodiment
All features disclosed in this specification, and the steps of all methods or processes disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
The implementation steps of the present invention are as follows:
1. Using the synchronous-mode API:
A) deploy the GPU;
B) deploy and run the GPU scheduler module;
C) the application module calls the synchronous-mode API to execute computing tasks.
2. Using the asynchronous-mode API:
A) deploy the GPU;
B) deploy and run the GPU scheduler module;
C) the application module calls the asynchronous-mode API and processes the GPU's operation results in the callback.
This specifically comprises:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, and the GPU scheduler module stores the computing tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
Further, each group's task cache is double-buffered: an active cache and a standby cache. The active cache holds the currently executing tasks and their results; the standby cache holds newly received tasks. After a GPU computation finishes, the results are placed in the active cache and returned to the application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
Further, both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter lie in a contiguous address space:
For example, suppose a certain type of computing task uses a structure with two members, A and B, each 4 bytes long. When caching on the CPU, the whole structures are normally stored in an array, with each structure holding the parameters of one operation. On the GPU, however, this layout cannot exploit the hardware's parallelism to high performance: when the A member is used, more than 16 GPU threads may read the A members of different structures at the same time. The GPU has a batched-read mechanism that fetches 32 contiguous bytes in a single read. If A and B are packed together in a structure, each structure occupies 8 bytes, so a single 32-byte read contains only 4 A values and satisfies at most 4 threads. If A and B are instead placed in separate, independent arrays, a single read satisfies 8 threads, doubling memory-operation performance.
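The factor-of-two in the example above can be checked with a short sketch that lays the same records out both ways in raw bytes; plain Python byte strings stand in for GPU memory here, and the record values are invented for illustration.

```python
import struct

# Hypothetical (A, B) records, each member 4 bytes, as in the example above.
records = [(a, a + 100) for a in range(8)]

# Array-of-structures: A and B packed together, 8 bytes per record.
aos = b"".join(struct.pack("<ii", a, b) for a, b in records)

# Structure-of-arrays: all A members stored contiguously on their own.
soa_a = b"".join(struct.pack("<i", a) for a, _ in records)

READ = 32  # bytes fetched by one batched GPU read in the example

# A values available from the first 32-byte read of each layout:
a_from_aos = [struct.unpack_from("<i", aos, off)[0] for off in range(0, READ, 8)]
a_from_soa = [struct.unpack_from("<i", soa_a, off)[0] for off in range(0, READ, 4)]

print(len(a_from_aos))  # -> 4 (satisfies at most 4 threads)
print(len(a_from_soa))  # -> 8 (satisfies 8 threads: double the effective bandwidth)
```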
The present invention is not limited to the foregoing embodiments. The present invention extends to any new feature, or any new combination of features, disclosed in this specification, and to the steps of any new method or process disclosed, or any new combination thereof.
Claims (10)
1. A method for scheduling a GPU to perform batch computing, characterized by comprising:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, and the GPU scheduler module stores the computing tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
2. The method for scheduling a GPU to perform batch computing according to claim 1, characterized in that when the computing tasks in step 2 are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
3. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that each group's cache is double-buffered, i.e. an active cache and a standby cache; the standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
4. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
5. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that in step 2 synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module; asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete, and once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
6. A device for scheduling a GPU to perform batch computing, characterized by comprising:
an API application module, for sending computing tasks to the GPU scheduler module;
a GPU scheduler module, for storing the computing tasks received within one cycle in a cache when the API application module sends tasks to it; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode; the GPU scheduler module communicates with the API application module through an inter-process communication mechanism.
7. The device for scheduling a GPU to perform batch computing according to claim 6, characterized in that when the computing tasks are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
8. The device for scheduling a GPU to perform batch computing according to claim 6, characterized in that each group's cache is double-buffered, i.e. an active cache and a standby cache; the standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
9. The device for scheduling a GPU to perform batch computing according to claim 7, characterized in that both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
10. The device for scheduling a GPU to perform batch computing according to any one of claims 6 to 9, characterized in that synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module; asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete, and once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510673433.8A | 2015-10-19 | 2015-10-19 | Method and device for scheduling a GPU to perform batch computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105224410A true CN105224410A (en) | 2016-01-06 |
Family
ID=54993400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510673433.8A (Pending) | Method and device for scheduling a GPU to perform batch computing | 2015-10-19 | 2015-10-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105224410A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102859492A (en) * | 2010-04-28 | 2013-01-02 | Telefonaktiebolaget LM Ericsson (Sweden) | Technique for GPU command scheduling |
CN102299843A (en) * | 2011-06-28 | 2011-12-28 | Beijing Antiy Electronic Equipment Co., Ltd. | Network data processing method and system based on a graphics processing unit (GPU) and buffer area |
WO2015123840A1 (en) * | 2014-02-20 | 2015-08-27 | Intel Corporation | Workload batch submission mechanism for graphics processing unit |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | Shenzhen Tencent Computer Systems Co., Ltd. | Graphics processing unit based parallel data processing method and device |
CN104036451A (en) * | 2014-06-20 | 2014-09-10 | Shenzhen Tencent Computer Systems Co., Ltd. | Parallel model processing method and device based on multiple graphics processing units |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10484690B2 (en) | 2015-06-04 | 2019-11-19 | Intel Corporation | Adaptive batch encoding for slow motion video recording |
WO2017143522A1 (en) * | 2016-02-23 | 2017-08-31 | Intel Corporation | Graphics processor workload acceleration using a command template for batch usage scenarios |
US10846142B2 (en) | 2016-02-23 | 2020-11-24 | Intel Corporation | Graphics processor workload acceleration using a command template for batch usage scenarios |
CN108446253A (en) * | 2018-03-28 | 2018-08-24 | Beihang University | Parallel computing method for sparse matrix-vector multiplication on the Shenwei architecture |
CN108446253B (en) * | 2018-03-28 | 2021-07-23 | Beihang University | Parallel computing method for sparse matrix vector multiplication aiming at Shenwei system architecture |
CN109711323A (en) * | 2018-12-25 | 2019-05-03 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Real-time video stream analysis acceleration method, device and equipment |
CN109711323B (en) * | 2018-12-25 | 2021-06-15 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Real-time video stream analysis acceleration method, device and equipment |
CN109769115A (en) * | 2019-01-04 | 2019-05-17 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Method, device and equipment for optimizing intelligent video analysis performance |
CN109769115B (en) * | 2019-01-04 | 2020-10-27 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Method, device and equipment for optimizing intelligent video analysis performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160106 |