CN104361553B

CN104361553B - Synchronizing method capable of increasing processing efficiency of graphics processing unit

Info

Publication number: CN104361553B
Application number: CN201410610231.4A
Authority: CN
Inventors: 左颢睿; 徐智勇; 魏宇星; 张建林; 欧阳益民; 许俊平; 祁小平
Original assignee: Institute of Optics and Electronics of CAS
Current assignee: Institute of Optics and Electronics of CAS
Priority date: 2014-11-02
Filing date: 2014-11-02
Publication date: 2017-04-12
Anticipated expiration: 2034-11-02
Also published as: CN104361553A

Abstract

The invention provides a synchronization method capable of increasing the processing efficiency of a graphics processing unit. The method includes: after the currently executing core of the graphics processing unit enters synchronization, a synchronous input vector and a synchronous output vector are built in the graphics processing unit; the graphics processing unit updates the flags in the synchronous input vector; the graphics processing unit polls the flags in the synchronous input vector in a loop, exits the loop when it finds that the synchronous input flags have all been updated, and then updates the flags in the synchronous output vector; the graphics processing unit polls the flags in the synchronous output vector in a loop and exits the loop when it finds that all synchronous output flags have been updated; synchronization of the currently executing core inside the graphics processing unit is then complete. With this method, fast synchronization can be performed directly inside the graphics processing unit when it executes a multi-core processing task, so the graphics processing unit does not have to return repeatedly to the computer system for loading and synchronization, and its processing efficiency is increased.

Description

A synchronization method for improving the processing efficiency of a graphics processing unit

Technical field

The invention belongs to the field of high-performance computing, in particular parallel computing, and specifically relates to a synchronization method for improving the processing efficiency of a graphics processing unit.

Background art

A graphics processing unit is a high-performance parallel processor that is widely used in many fields, including image processing, financial analysis, oil exploration, physical simulation, and meteorological analysis. The task-processing model of a graphics processing unit can be divided into three levels: core level, block level, and thread level. Each task may contain multiple cores, each core may contain multiple blocks, and each block may contain multiple threads.
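As a minimal illustration of the three-level model, the sketch below enumerates every (core, block, thread) index triple of a task. The counts n, m, and u, the function name work_items, and the 1-based indexing are illustrative assumptions rather than anything the text prescribes.

```python
from itertools import product


def work_items(n, m, u):
    """Enumerate (core, block, thread) indices for a task.

    n cores per task, m blocks per core, u threads per block.
    Indices are 1-based here purely by assumption.
    """
    return list(product(range(1, n + 1), range(1, m + 1), range(1, u + 1)))
```

A task with 2 cores, 3 blocks per core, and 4 threads per block therefore contains 2 × 3 × 4 = 24 thread-level work items.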

However, graphics processing units are currently used mainly in laboratory simulation or post-processing systems and are seldom used in real-time processing systems. The main reason is that all current graphics processing units must be controlled by a computer system. The computer system loads a core onto the graphics processing unit for computation, and after the graphics processing unit has processed one core it must exit and return to the computer system to complete the synchronization of the results. When a task consists of multiple cores, the computer system has to load cores multiple times, and the graphics processing unit has to start and return to the computer system multiple times, which greatly reduces the processing efficiency of the graphics processing unit. In many cases, the time spent starting the graphics processing unit and synchronizing its results far exceeds the time the graphics processing unit spends processing the task.

At present, graphics processing units provide only thread-level synchronization and no block-level synchronization method; block-level synchronization can be completed only by exiting the graphics processing unit and returning to the computer system. This is the root cause of the low efficiency of graphics processing units when they execute tasks composed of multiple cores. To improve the utilization of graphics processing units, an efficient inter-block synchronization method is urgently needed.

Summary of the invention

Technical problem solved by the present invention: in view of the deficiencies of the prior art, a synchronization method for improving the processing efficiency of a graphics processing unit is provided, which improves processing efficiency while keeping the results consistent.

To achieve the above purpose, the technical solution of the present invention is a synchronization method for improving the processing efficiency of a graphics processing unit, comprising the following steps:

Step 1: when the graphics processing unit processes a task composed of n cores (n ≥ 2), the currently executing core t (1 ≤ t < n) enters synchronization processing;

Step 2: the graphics processing unit builds a synchronous input vector and a synchronous output vector. Where the number of processing blocks of the graphics processing unit is m (m ≥ 1), the synchronous input vector A is built first; its size equals the number of processing blocks m, and it is denoted A[i], i ∈ {1, 2, ..., m}. The synchronous output vector B is then built; its size also equals the number of processing blocks m, and it is denoted B[i], i ∈ {1, 2, ..., m};

Step 3: the graphics processing unit updates the synchronous input vector: block i sets the value of A[i] to Flag;

Step 4: the graphics processing unit determines whether the synchronous input vector has been fully updated: any one block polls the values of the synchronous input vector A in a loop to determine whether every flag in A has been updated to Flag; if the update is complete, step 5 is executed; otherwise, step 4 continues;

Step 5: the graphics processing unit updates the synchronous output vector: the block used in step 4 sets the values of the synchronous output vector B to Flag;

Step 6: the graphics processing unit determines whether the synchronous output vector has been fully updated: block i checks whether the value of B[i] is Flag; if all values in B have been updated to Flag, the update is complete and step 7 is executed; otherwise, step 6 continues;

Step 7: the graphics processing unit completes the synchronization processing of the currently executing core t.
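The seven steps above can be sketched as a small host-side simulation. This is not GPU code: it assumes Python threads standing in for the m blocks, 0-based block indices, block 0 as the polling block of step 4, and FLAG = 1 as the marker value, none of which is specified by the source. The names make_sync_vectors, block_sync, and run_once are hypothetical.

```python
import threading
import time

FLAG = 1  # assumed marker value; the source only calls it "Flag"


def make_sync_vectors(m):
    """Step 2: build input vector A and output vector B, one slot per block."""
    return [0] * m, [0] * m


def block_sync(i, A, B, m, master=0):
    """Steps 3 to 6 as seen by block i (0-based here; the text uses 1..m)."""
    A[i] = FLAG                              # step 3: block i marks its arrival
    if i == master:
        while any(a != FLAG for a in A):     # step 4: one block polls all of A
            time.sleep(0)                    # yield so other threads can run
        for k in range(m):                   # step 5: same block fills B
            B[k] = FLAG
    while B[i] != FLAG:                      # step 6: block i polls B[i]
        time.sleep(0)


def run_once(m):
    """Simulate m blocks with host threads and return the order of events."""
    A, B = make_sync_vectors(m)
    events = []
    lock = threading.Lock()

    def block(i):
        with lock:
            events.append(("arrive", i))     # recorded before A[i] is set
        block_sync(i, A, B, m)
        with lock:
            events.append(("leave", i))      # step 7: synchronization done

    threads = [threading.Thread(target=block, args=(i,)) for i in range(m)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

In the list returned by run_once(4), all four "arrive" events precede every "leave" event, which is the barrier property the method aims for. On a real graphics processing unit, a scheme of this kind would also need all m blocks to be resident at the same time and the flag writes to be visible across blocks; the simulation models neither.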

Compared with the prior art, the beneficial effect of the present invention is as follows:

When a task composed of multiple cores is executed, the synchronization method proposed by the present invention can complete inter-block synchronization inside the graphics processing unit. This avoids the low processing efficiency that results when the graphics processing unit repeatedly returns to the computer system for inter-block synchronization.

Description of the drawings

Fig. 1 is a flowchart of the method of the present invention;

Fig. 2 is a schematic diagram of a task composed of multiple cores; the task is divided into n cores, each core consists of m blocks, and each block consists of u threads;

Fig. 3 is a processing flowchart of the graphics processing unit, after the present invention is adopted, when it processes a task composed of multiple cores as shown in Fig. 2.

Detailed description of the embodiments

For a better understanding of the technical solution of the present invention, embodiments of the invention are described in detail below with reference to the accompanying drawings.

For processing a task composed of multiple cores as shown in Fig. 2, the present invention provides, as shown in Fig. 1, a synchronization method for improving the processing efficiency of a graphics processing unit. The implementation steps are as follows:

Step 1: when the graphics processing unit processes a task composed of n cores, the currently executing core enters synchronization: after the t-th core (1 ≤ t < n) completes its processing task, the graphics processing unit enters synchronization processing;

Step 2: the graphics processing unit builds a synchronous input vector and a synchronous output vector: where the number of processing blocks of the graphics processing unit is m, the synchronous input vector A is built first; its size equals the number of processing blocks m, and it is denoted A[i], i ∈ {1, 2, ..., m}. The synchronous output vector B is then built; its size also equals the number of processing blocks m, and it is denoted B[i], i ∈ {1, 2, ..., m};

Step 3: the graphics processing unit updates the synchronous input vector: block i sets the value of A[i] to Flag; for example, block 5 updates A[5] of the synchronous input vector A so that A[5] = Flag;

Step 4: the graphics processing unit determines whether the synchronous input vector has been fully updated: any one block, block 1 in this example, polls the values of the synchronous input vector A in a loop to determine whether every flag in A has been updated to Flag. If every value A[i] (1 ≤ i ≤ m) equals Flag, the synchronous input vector A has been fully updated and step 5 is executed; if any A[i] is not equal to Flag, the update is incomplete and step 4 continues;

Step 5: the graphics processing unit updates the synchronous output vector: block 1 from step 4 sets the values of the synchronous output vector B to Flag, so that every B[i] = Flag (1 ≤ i ≤ m);

Step 6: the graphics processing unit determines whether the synchronous output vector has been fully updated: block i checks whether the value of B[i] (1 ≤ i ≤ m) is Flag. If every value B[i] equals Flag, the synchronous output vector B has been fully updated and step 7 is executed; if any B[i] is not equal to Flag, the update of the synchronous output vector B is incomplete and step 6 continues;

Step 7: the graphics processing unit completes the synchronization processing of the currently processed core t.
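The embodiment can be extended to a whole task of n cores in a host-side simulation. The sketch below assumes Python threads in place of blocks, block 0 as the polling block (the text uses block 1 with 1-based numbering), and, as a convenience not stated in the source, Flag = t for the synchronization after core t, so that A and B need not be cleared between cores. The per-core computation, in which each block adds its neighbour's previous result to its own, is a made-up stand-in for real work, and the names barrier and run_task are hypothetical.

```python
import threading
import time


def barrier(i, t, A, B, m, master=0):
    """Synchronize block i after core t, using Flag = t (an assumption)."""
    A[i] = t                                  # step 3
    if i == master:
        while any(a != t for a in A):         # step 4: poll the whole of A
            time.sleep(0)
        for k in range(m):                    # step 5: release all blocks
            B[k] = t
    while B[i] != t:                          # step 6: poll own slot of B
        time.sleep(0)


def run_task(n, m):
    """Run n cores of m blocks each without leaving the worker threads."""
    A, B = [0] * m, [0] * m                   # step 2: vectors of size m
    # out[t][i] holds the result of block i in core t; out[0] is the input.
    out = [[None] * m for _ in range(n + 1)]
    out[0] = list(range(m))

    def block(i):
        for t in range(1, n + 1):
            # Core t reads a neighbour's result from core t - 1, so it is
            # only safe to run after every block has finished core t - 1.
            out[t][i] = out[t - 1][i] + out[t - 1][(i + 1) % m]
            if t < n:                         # step 1: 1 <= t < n
                barrier(i, t, A, B, m)        # steps 3 to 7

    threads = [threading.Thread(target=block, args=(i,)) for i in range(m)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out[n]
```

With n = 3 and m = 4, run_task returns [12, 16, 12, 8]. Without the call to barrier, a block could read a neighbour's slot before it had been written, which is the inconsistency the synchronization is meant to prevent.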

As shown in Fig. 3, with the present invention, when a task composed of n cores is processed, the computer system needs to load cores onto the graphics processing unit only once, and the graphics processing unit needs to return the final result to the computer system only once, after all n cores have been processed. Compared with the standard processing flow, this saves n-1 core load times and n-1 times of synchronizing and returning graphics processing unit results, which improves the processing efficiency of the graphics processing unit.
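The saving stated above can be written out as simple arithmetic. The sketch assumes, hypothetically, that every core has the same load time, run time, and return time, and it adds an optional sync cost for each in-GPU synchronization, which the source does not quantify. All function names, parameter names, and the numbers in the usage note are illustrative.

```python
def standard_time(n, load, run, ret):
    """Standard flow: each of the n cores is loaded, run, and returned."""
    return n * (load + run + ret)


def in_gpu_sync_time(n, load, run, ret, sync=0):
    """Proposed flow: one load, one return, n - 1 in-GPU synchronizations."""
    return load + n * run + (n - 1) * sync + ret


def saving(n, load, run, ret, sync=0):
    """Time saved: (n - 1) * (load + ret) minus (n - 1) * sync."""
    return standard_time(n, load, run, ret) - in_gpu_sync_time(
        n, load, run, ret, sync
    )
```

For example, with n = 5, load = 3, run = 10, and ret = 2 in arbitrary units, saving returns 20, that is (n-1) × (load + ret). The saving shrinks by (n-1) × sync when the in-GPU synchronization has a non-zero cost.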

The parts of the present invention that are not described in detail belong to techniques well known to those skilled in the art.

Those of ordinary skill in the art should appreciate that the above embodiments are intended only to illustrate the present invention and not to limit it; any changes or variations to the above embodiments that remain within the spirit of the present invention fall within the scope of the claims of the present invention.

Claims

1. A synchronization method for improving the processing efficiency of a graphics processing unit, characterized by comprising the following steps:

Step 1: when the graphics processing unit processes a task composed of n cores, the currently executing core t enters synchronization processing, where n ≥ 2 and 1 ≤ t < n;

Step 2: the graphics processing unit builds a synchronous input vector and a synchronous output vector. Where the number of processing blocks of the graphics processing unit is m, with m ≥ 1, the synchronous input vector A is built first; its size equals the number of processing blocks m, and it is denoted A[i], i ∈ {1, 2, ..., m}. The synchronous output vector B is then built; its size also equals the number of processing blocks m, and it is denoted B[i], i ∈ {1, 2, ..., m};

Step 4: the graphics processing unit determines whether the synchronous input vector has been fully updated: any one block polls the values of the synchronous input vector A in a loop to determine whether every flag in A has been updated to Flag; if the update is complete, step 5 is executed; otherwise, step 4 continues;

Step 5: the graphics processing unit updates the synchronous output vector: the block used in step 4 sets the values of the synchronous output vector B to Flag;

Step 6: the graphics processing unit determines whether the synchronous output vector has been fully updated: block i checks whether the value of B[i] is Flag; if all values in B have been updated to Flag, the update is complete and step 7 is executed; otherwise, step 6 continues;