CN105224410A - Method and device for scheduling a GPU to perform batch computing - Google Patents
Method and device for scheduling a GPU to perform batch computing
- Publication number
- CN105224410A CN105224410A CN201510673433.8A CN201510673433A CN105224410A CN 105224410 A CN105224410 A CN 105224410A CN 201510673433 A CN201510673433 A CN 201510673433A CN 105224410 A CN105224410 A CN 105224410A
- Authority
- CN
- China
- Prior art keywords
- gpu
- buffer memory
- task
- batch
- scheduler module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention relates to the field of GPU parallel computing, and in particular to a method and device for scheduling a GPU to perform batch computing. To address the problems of the prior art, the present invention provides a method and device for scheduling a GPU to perform batch computing: an independent GPU scheduler module cooperates with an API application module to process computing tasks, fully exploiting the GPU's computing power. The API application module sends computing tasks to the GPU scheduler module, which stores the tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in either synchronous or asynchronous mode.
Description
Technical field
The present invention relates to the field of GPU parallel computing, and in particular to a method and device for scheduling a GPU to perform batch computing.
Background technology
A GPU (Graphics Processing Unit) can be understood as a programmable graphics card, used in a computer for graphics and image processing. After years of development, GPUs are no longer limited to graphics processing and are also applied to large-scale parallel computing; using GPU parallel computing can improve an algorithm's performance several-fold.
A single GPU usually has hundreds of cores (its main arithmetic units), far more than the number of CPU cores, so a GPU is well suited to executing highly parallel, compute-intensive tasks. Compared with a CPU of the same price, a GPU may offer hundreds of times more cores, and running such tasks on the GPU can often improve performance several-fold. GPU technology will change fields such as business applications, scientific computing, cloud computing, virtualization, gaming and robotics, and may even redefine the programming model we know today.
Although a GPU has many more cores than a CPU, the GPU internally schedules cores in groups of 16: even if a task needs only one core, at least 16 cores are occupied inside the GPU. Computing tasks must therefore be submitted to the GPU in batches so that more cores can be scheduled at the same time. Moreover, the larger the number of concurrent tasks, the less interaction is needed between the GPU and host memory, the lower the GPU's internal scheduling overhead, and the higher the achievable performance. Even when multiple threads call into the GPU, if each thread's task engages only a single GPU core, the performance gap compared with batch submission that keeps every GPU core busy may be a factor of thousands.
Applications built on the CPU architecture normally use multiple processes or threads to handle multiple tasks: after a task is received, the CPU is invoked once to process it, and there is no need to cache tasks and execute them together in batches, because doing so would bring no performance gain and would greatly increase program complexity. A GPU is different: it only reaches its full performance when tasks are submitted to it in batches, yet retrofitting an existing application into a cache-and-batch-submit pattern is quite difficult.
Summary of the invention
The technical problem to be solved by the present invention is: to address the problems of the prior art, provide a method and device for scheduling a GPU to perform batch computing. The present invention designs an independent GPU scheduler module that exposes an API; the API and the GPU scheduler module communicate through an inter-process communication mechanism. An application module sends computing tasks to the GPU scheduler module by calling the API; the GPU scheduler module caches the tasks received within one processing cycle, and once the GPU has finished the previous batch it submits the cached batch to the GPU; the API application module then completes the follow-up handling of the tasks in either synchronous or asynchronous mode. This fully exploits the GPU's computing power and improves the GPU's memory-access performance.
The technical solution adopted by the present invention is as follows:
A method for scheduling a GPU to perform batch computing comprises:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, and the GPU scheduler module stores the computing tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
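The cache-then-batch loop of step 2 can be sketched in plain Python. This is only an illustrative sketch, not the patent's implementation: the class and method names are invented here, and `run_batch` stands in for the real GPU submission, which the patent reaches over inter-process communication.

```python
from queue import Empty, Queue

class GpuScheduler:
    """Sketch of the cache-then-batch loop (illustrative names)."""

    def __init__(self, run_batch):
        self.inbox = Queue()        # tasks sent by the API application module
        self.run_batch = run_batch  # one call submits a whole batch to the "GPU"

    def submit(self, task):
        self.inbox.put(task)

    def drain_cycle(self):
        """Collect every task received during one cycle and submit them together."""
        batch = []
        while True:
            try:
                batch.append(self.inbox.get_nowait())
            except Empty:
                break
        return self.run_batch(batch) if batch else []

# Usage: a fake "GPU" that squares each task's payload in a single batch call.
sched = GpuScheduler(run_batch=lambda batch: [x * x for x in batch])
for x in (1, 2, 3):
    sched.submit(x)
results = sched.drain_cycle()
print(results)  # -> [1, 4, 9]
```

The point of the sketch is that three separate submissions become one `run_batch` call, mirroring how the scheduler turns many small API calls into one GPU submission.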
Further, when the computing tasks in step 2 are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
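The per-type grouping can be sketched as follows; the task-type names (`"matmul"`, `"fft"`) and payloads are purely illustrative, chosen only to show one cache group per task type.

```python
from collections import defaultdict

# Per-type task grouping: tasks of the same type share one cache group,
# and each group is batch-submitted to the GPU on its own.
caches = defaultdict(list)

def cache_task(task_type, payload):
    caches[task_type].append(payload)

cache_task("matmul", 1)
cache_task("matmul", 2)
cache_task("fft", 3)

# Each group becomes one independent batch submission:
batches = {t: caches.pop(t) for t in list(caches)}
print(batches)  # -> {'matmul': [1, 2], 'fft': [3]}
```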
Further, each group's cache is double-buffered, i.e. an active cache and a standby cache. The standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
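The active/standby swap can be sketched like this (a minimal sketch with illustrative names; a real module would return the finished results to the API application module and submit `next_batch` to the GPU):

```python
class DoubleBuffer:
    """Sketch of the active/standby task caches (illustrative names)."""

    def __init__(self):
        self.active = []   # batch currently on the GPU / holding its results
        self.standby = []  # tasks received while the GPU is busy

    def add_task(self, task):
        self.standby.append(task)  # new tasks always land in the standby cache

    def on_batch_done(self, results):
        # The finished batch's results would be returned to the API
        # application module here; then the two caches swap roles.
        next_batch = self.standby
        self.active, self.standby = self.standby, []
        return results, next_batch

buf = DoubleBuffer()
buf.add_task("t1")
buf.add_task("t2")
_, next_batch = buf.on_batch_done(results=[])
print(next_batch)  # -> ['t1', 't2'] (now the active cache; standby is empty)
```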
Further, both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
Further, in step 2, synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module. Asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete; once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
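The two calling modes can be sketched from the application's point of view. This is an assumption-laden sketch: `gpu_compute` stands in for the whole scheduler-plus-GPU round trip, and a thread models the asynchronous return path that delivers the result to a callback.

```python
import threading

def gpu_compute(task):
    # Stand-in for "scheduler submits the task, GPU runs it" (illustrative).
    return task * 2

def call_sync(task):
    """Synchronous mode: block until the result comes back."""
    return gpu_compute(task)

def call_async(task, callback):
    """Asynchronous mode: return at once; the result arrives via a callback."""
    worker = threading.Thread(target=lambda: callback(gpu_compute(task)))
    worker.start()
    return worker

print(call_sync(21))  # -> 42

results = []
t = call_async(21, results.append)  # returns immediately
t.join()                            # joined here only so the demo can print
print(results)                      # -> [42]
```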
A device for scheduling a GPU to perform batch computing comprises:
an API application module, for sending computing tasks to the GPU scheduler module;
a GPU scheduler module, for storing the computing tasks received within one cycle in a cache when the API application module sends tasks to it; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode; the GPU scheduler module communicates with the API application module through an inter-process communication mechanism.
Further, when the computing tasks are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
Further, each group's cache is double-buffered, i.e. an active cache and a standby cache. The standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
Further, both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
Further, synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module. Asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete; once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
In summary, owing to the above technical scheme, the beneficial effects of the invention are:
1. A group of caches is set up for each type of computing task, so each group's cache holds tasks of a single type, and each group is independently submitted to the GPU for batch computing; having the GPU execute batches of identical tasks fully exploits its computing power.
2. For the GPU to use memory efficiently, the data must be organized in advance: both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter lie in a contiguous address space, improving the GPU's memory-access performance.
Brief description of the drawings
The embodiments of the present invention will be described with reference to the accompanying drawings, wherein:
Fig. 1 is a schematic diagram of scheduling a GPU to perform batch computing;
Fig. 2 is a schematic diagram of the caching technique of the GPU scheduler.
Embodiment
All features disclosed in this specification, and the steps of all methods or processes disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
The implementation steps of the present invention are as follows:
1. Using the synchronous-mode API:
A) deploy the GPU;
B) deploy and run the GPU scheduler module;
C) the application module calls the synchronous-mode API to execute computing tasks.
2. Using the asynchronous-mode API:
A) deploy the GPU;
B) deploy and run the GPU scheduler module;
C) the application module calls the asynchronous-mode API and processes the GPU's operation results in the callback.
This specifically comprises:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, and the GPU scheduler module stores the computing tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
Further, each group's task cache is double-buffered: an active cache and a standby cache. The active cache holds the currently executing tasks and their results; the standby cache holds newly received tasks. After a GPU computation finishes, the results are placed in the active cache and returned to the application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
Further, both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter lie in a contiguous address space:
For example, suppose a certain type of computing task uses a structure with two members, A and B, each 4 bytes long. When caching on the CPU, the whole structures are normally stored in an array, with each structure holding the parameters of one operation. On the GPU, however, this layout cannot exploit the hardware's parallelism to high performance: when the A member is used, more than 16 GPU threads may read the A members of different structures at the same time. The GPU has a batched-read mechanism that fetches 32 contiguous bytes in a single read. If A and B are packed together in a structure, each structure occupies 8 bytes, so a single 32-byte read contains only 4 A values and satisfies at most 4 threads. If A and B are instead placed in separate, independent arrays, a single read satisfies 8 threads, doubling memory-operation performance.
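The factor-of-two in the example above can be checked with a short sketch that lays the same records out both ways in raw bytes; plain Python byte strings stand in for GPU memory here, and the record values are invented for illustration.

```python
import struct

# Hypothetical (A, B) records, each member 4 bytes, as in the example above.
records = [(a, a + 100) for a in range(8)]

# Array-of-structures: A and B packed together, 8 bytes per record.
aos = b"".join(struct.pack("<ii", a, b) for a, b in records)

# Structure-of-arrays: all A members stored contiguously on their own.
soa_a = b"".join(struct.pack("<i", a) for a, _ in records)

READ = 32  # bytes fetched by one batched GPU read in the example

# A values available from the first 32-byte read of each layout:
a_from_aos = [struct.unpack_from("<i", aos, off)[0] for off in range(0, READ, 8)]
a_from_soa = [struct.unpack_from("<i", soa_a, off)[0] for off in range(0, READ, 4)]

print(len(a_from_aos))  # -> 4 (satisfies at most 4 threads)
print(len(a_from_soa))  # -> 8 (satisfies 8 threads: double the effective bandwidth)
```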
The present invention is not limited to the foregoing embodiments. The present invention extends to any new feature, or any new combination of features, disclosed in this specification, and to the steps of any new method or process disclosed, or any new combination thereof.
Claims (10)
1. A method for scheduling a GPU to perform batch computing, characterized by comprising:
Step 1: the GPU scheduler module communicates with the API application module through an inter-process communication mechanism;
Step 2: the API application module sends computing tasks to the GPU scheduler module, and the GPU scheduler module stores the computing tasks received within one cycle in a cache; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode.
2. The method for scheduling a GPU to perform batch computing according to claim 1, characterized in that when the computing tasks in step 2 are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
3. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that each group's cache is double-buffered, i.e. an active cache and a standby cache; the standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
4. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
5. The method for scheduling a GPU to perform batch computing according to claim 2, characterized in that in step 2 synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module; asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete, and once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
6. A device for scheduling a GPU to perform batch computing, characterized by comprising:
an API application module, for sending computing tasks to the GPU scheduler module;
a GPU scheduler module, for storing the computing tasks received within one cycle in a cache when the API application module sends tasks to it; after the GPU finishes the previous batch of tasks, the GPU scheduler module submits the cached batch to the GPU, and the API application module then completes the follow-up handling of the tasks in synchronous or asynchronous mode; the GPU scheduler module communicates with the API application module through an inter-process communication mechanism.
7. The device for scheduling a GPU to perform batch computing according to claim 6, characterized in that when the computing tasks are stored in the cache, tasks of the same type are placed in the same group, and the tasks in each group's cache are independently submitted to the GPU for batch computing.
8. The device for scheduling a GPU to perform batch computing according to claim 6, characterized in that each group's cache is double-buffered, i.e. an active cache and a standby cache; the standby cache holds newly received computing tasks; after a GPU computation finishes, the results are placed in the active cache and returned to the API application module, the batch in the standby cache is handed to the GPU for execution, the standby cache becomes the active cache, and the former active cache becomes the standby cache for newly received tasks.
9. The device for scheduling a GPU to perform batch computing according to claim 7, characterized in that both the active cache and the standby cache set up a separate buffer area for each distinct parameter, so that all instances of the same parameter are stored in a contiguous address space.
10. The device for scheduling a GPU to perform batch computing according to any one of claims 6 to 9, characterized in that synchronous mode means that after the API application module sends a computing task to the GPU scheduler module, it waits until the GPU scheduler module finishes the task and returns the result to the API application module; asynchronous mode means the API application module returns immediately after sending the task, without waiting for the computation to complete, and once the GPU scheduler module finishes the computation and returns the result, the API application module processes the result in a callback.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510673433.8A | 2015-10-19 | 2015-10-19 | Method and device for scheduling a GPU to perform batch computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105224410A true CN105224410A (en) | 2016-01-06 |
Family
ID=54993400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510673433.8A (Pending) | Method and device for scheduling a GPU to perform batch computing | 2015-10-19 | 2015-10-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105224410A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102859492A (en) * | 2010-04-28 | 2013-01-02 | Telefonaktiebolaget LM Ericsson (Sweden) | Technique for GPU command scheduling |
CN102299843A (en) * | 2011-06-28 | 2011-12-28 | Beijing Antiy Electronic Equipment Co., Ltd. | Network data processing method and system based on a graphics processing unit (GPU) and buffer area |
WO2015123840A1 (en) * | 2014-02-20 | 2015-08-27 | Intel Corporation | Workload batch submission mechanism for graphics processing unit |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | Shenzhen Tencent Computer Systems Co., Ltd. | Graphics processing unit based parallel data processing method and device |
CN104036451A (en) * | 2014-06-20 | 2014-09-10 | Shenzhen Tencent Computer Systems Co., Ltd. | Parallel model processing method and device based on multiple graphics processing units |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10484690B2 (en) | 2015-06-04 | 2019-11-19 | Intel Corporation | Adaptive batch encoding for slow motion video recording |
WO2017143522A1 (en) * | 2016-02-23 | 2017-08-31 | Intel Corporation | Graphics processor workload acceleration using a command template for batch usage scenarios |
US10846142B2 (en) | 2016-02-23 | 2020-11-24 | Intel Corporation | Graphics processor workload acceleration using a command template for batch usage scenarios |
CN108446253A (en) * | 2018-03-28 | 2018-08-24 | Beihang University | Parallel computing method for sparse matrix-vector multiplication on the Shenwei architecture |
CN108446253B (en) * | 2018-03-28 | 2021-07-23 | Beihang University | Parallel computing method for sparse matrix vector multiplication aiming at Shenwei system architecture |
CN109711323A (en) * | 2018-12-25 | 2019-05-03 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Real-time video stream analysis acceleration method, device and equipment |
CN109711323B (en) * | 2018-12-25 | 2021-06-15 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Real-time video stream analysis acceleration method, device and equipment |
CN109769115A (en) * | 2019-01-04 | 2019-05-17 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Method, device and equipment for optimizing intelligent video analysis performance |
CN109769115B (en) * | 2019-01-04 | 2020-10-27 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. | Method, device and equipment for optimizing intelligent video analysis performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160106 |