CN104361553B - Synchronizing method capable of increasing processing efficiency of graphics processing unit - Google Patents
- Publication number
- CN104361553B (application CN201410610231.4A)
- Authority
- CN
- China
- Prior art keywords
- process unit
- graphic process
- processing unit
- synchronous input
- graphics processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention provides a synchronization method that improves the processing efficiency of a graphics processing unit (GPU). The method includes: after the currently executing kernel (also rendered "core") of the GPU enters synchronization, a synchronization input vector and a synchronization output vector are created in the GPU; the GPU updates the flags in the synchronization input vector; the GPU polls the flags in the synchronization input vector in a loop, exits the loop once all synchronization input flags are found to be updated, and then updates the flags in the synchronization output vector; the GPU polls the flags in the synchronization output vector in a loop and exits the loop once all synchronization output flags are found to be updated; synchronization of the currently executing kernel inside the GPU is then complete. With this method, a GPU executing a multi-kernel task can synchronize directly and quickly on the device, avoids repeatedly returning to the computer system for loading and synchronization, and thereby achieves higher processing efficiency.
Description
Technical field
The invention belongs to the field of high-performance computing, in particular parallel computing, and specifically relates to a synchronization method for improving the processing efficiency of a graphics processing unit.
Background art
A graphics processing unit (GPU) is a high-performance parallel processor that is widely used in many fields, including image processing, financial analysis, oil exploration, physical modeling, and meteorological analysis. The GPU task-processing model has three levels: kernel level (also rendered "core" level), block level, and thread level. Each task can contain multiple kernels, each kernel can contain multiple blocks, and each block can contain multiple threads.
At present, however, GPUs are mainly used in laboratory simulation or post-processing systems and are seldom applied in real-time processing systems. The main reason is that every current GPU must be controlled by a computer system: the computer system loads a kernel onto the GPU for computation, and after the GPU has processed one kernel it must exit and return to the computer system in order to synchronize the results. When a task consists of multiple kernels, the computer system has to load kernels multiple times, and the GPU likewise has to start and return to the computer system multiple times, which greatly reduces the GPU's processing efficiency. In many cases the time spent launching the GPU and synchronizing its results substantially exceeds the time the GPU spends processing the task.
At present, GPUs provide only thread-level synchronization and no block-level synchronization method; block-level synchronization can be completed only by exiting the GPU and returning to the computer system. This is the root cause of the GPU's low efficiency when executing tasks composed of multiple kernels. To improve GPU utilization, an efficient inter-block synchronization method is urgently needed.
Summary of the invention
Technical problem solved by the invention: to address the deficiencies of the prior art, a synchronization method for improving GPU processing efficiency is provided that raises the processing efficiency of the GPU while keeping the results consistent.
To achieve this purpose, the technical solution of the invention is a synchronization method for improving GPU processing efficiency, comprising the following steps:
Step 1: When the GPU processes a task composed of n kernels (n ≥ 2), the currently executing kernel t (1 ≤ t < n) enters synchronization.
Step 2: The GPU creates a synchronization input vector and a synchronization output vector. With m processing blocks on the GPU (m ≥ 1), the synchronization input vector A is created first; its size equals the number of processing blocks m, and it is written A[i], i = 1, 2, ..., m. The synchronization output vector B is then created; its size likewise equals m, and it is written B[i], i = 1, 2, ..., m.
Step 3: The GPU updates the synchronization input vector: block i sets A[i] to Flag.
Step 4: The GPU checks whether the synchronization input vector has been fully updated: any one block polls the values of A in a loop and checks whether every flag in A has been set to Flag. If the update is complete, Step 5 follows; if not, Step 4 is repeated.
Step 5: The GPU updates the synchronization output vector: the block used in Step 4 sets the values of B to Flag.
Step 6: The GPU checks whether the synchronization output vector has been fully updated: block i checks whether B[i] equals Flag. Once all values in B have been set to Flag, the update is complete and Step 7 follows; otherwise Step 6 is repeated.
Step 7: The GPU completes synchronization of the currently executing kernel t.
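The roles of the two vectors can be illustrated with a small sequential model: A collects arrivals, and B is filled only after A is complete. This is a sketch under stated assumptions, not the patented implementation. The names `FLAG`, `make_vectors`, `arrive`, `all_set`, and `release` are hypothetical, the value of Flag is assumed to be 1, and blocks are indexed from 0 rather than 1.

```python
# Illustrative sequential model of the two flag vectors (hypothetical names).

FLAG = 1  # assumed sentinel value; the text only calls it "Flag"


def make_vectors(m):
    """Step 2: create input vector A and output vector B, one slot per block."""
    return [0] * m, [0] * m


def arrive(A, i):
    """Step 3: block i (0-based here) marks its own slot in A."""
    A[i] = FLAG


def all_set(vec):
    """Steps 4 and 6: the completion test, applied to A or to B."""
    return all(v == FLAG for v in vec)


def release(A, B):
    """Step 5: the polling block fills B, but only once A is complete."""
    if not all_set(A):
        return False
    for i in range(len(B)):
        B[i] = FLAG
    return True


A, B = make_vectors(4)
for block in (2, 0, 3):          # three of the four blocks have arrived
    arrive(A, block)
early = release(A, B)            # block 1 is missing, so nothing is released
arrive(A, 1)                     # the last block arrives
late = release(A, B)             # now every B[i] becomes FLAG
print(early, late, all_set(B))   # False True True
```

No block can observe its B slot set before the last block has written its A slot, which is the property the method relies on.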
Compared with the prior art, the invention has the following beneficial effect:
When a task composed of multiple kernels is executed, the proposed synchronization method completes inter-block synchronization inside the GPU. This avoids the GPU repeatedly returning to the computer system for inter-block synchronization and so resolves the problem of low GPU processing efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a schematic diagram of a task composed of multiple kernels: the task is divided into n kernels, each kernel consists of m blocks, and each block consists of u threads;
Fig. 3 is a flowchart of the GPU processing a multi-kernel task such as the one in Fig. 2 once the invention is applied.
Detailed description of the embodiments
To give a better understanding of the technical solution, embodiments of the invention are described in detail below with reference to the accompanying drawings.
For processing a task composed of multiple kernels as shown in Fig. 2, the invention provides, as shown in Fig. 1, a synchronization method for improving GPU processing efficiency, implemented in the following steps:
Step 1: When the GPU processes a task composed of n kernels, the currently executing kernel enters synchronization: after the t-th kernel (1 ≤ t < n) finishes its processing, the GPU enters synchronization.
Step 2: The GPU creates a synchronization input vector and a synchronization output vector. With m processing blocks on the GPU, the synchronization input vector A is created first; its size equals the number of processing blocks m, and it is written A[i], i = 1, 2, ..., m. The synchronization output vector B is then created; its size likewise equals m, and it is written B[i], i = 1, 2, ..., m.
Step 3: The GPU updates the synchronization input vector: block i sets A[i] to Flag. For example, block 5 updates A[5] so that A[5] = Flag.
Step 4: The GPU checks whether the synchronization input vector has been fully updated: any one block, block 1 in this example, polls the values of A in a loop and checks whether every flag in A has been set to Flag. If every A[i] (1 ≤ i ≤ m) equals Flag, A has been fully updated and Step 5 follows; if any A[i] is not equal to Flag, the update is incomplete and Step 4 is repeated.
Step 5: The GPU updates the synchronization output vector: block 1 from Step 4 sets the values of B to Flag, so that every B[i] = Flag (1 ≤ i ≤ m).
Step 6: The GPU checks whether the synchronization output vector has been fully updated: block i checks whether B[i] (1 ≤ i ≤ m) equals Flag. If every B[i] equals Flag, B has been fully updated and Step 7 follows; if any B[i] is not equal to Flag, the update is incomplete and Step 6 is repeated.
Step 7: The GPU completes synchronization of the currently processed kernel t.
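The steps above can be sketched as a host-side simulation in which each block is modeled by a Python thread. This is an illustration under stated assumptions, not GPU code and not the patented implementation: `M_BLOCKS`, `N_KERNELS`, `barrier`, and `block_main` are hypothetical names, blocks are indexed from 0 (so block 0 plays the role of block 1 in the text), Step 6 is read as block i polling only its own slot B[i], and Flag is taken to be the kernel index t so that the vectors need no reset between kernels. The text does not specify how Flag is chosen or reset.

```python
import threading
import time

M_BLOCKS = 4      # m: number of blocks (assumed value)
N_KERNELS = 3     # n: number of kernels in the task (assumed value)

A = [0] * M_BLOCKS   # synchronization input vector
B = [0] * M_BLOCKS   # synchronization output vector
log = []             # (kernel, block) pairs in the order the work was done
log_lock = threading.Lock()


def barrier(i, flag):
    """Steps 3 to 6 for block i; flag differs per kernel (an assumption)."""
    A[i] = flag                                   # Step 3: mark arrival
    if i == 0:                                    # the single polling block
        while any(a != flag for a in A):          # Step 4: poll A
            time.sleep(0)                         # yield to other threads
        for j in range(M_BLOCKS):                 # Step 5: fill B
            B[j] = flag
    while B[i] != flag:                           # Step 6: block i polls B[i]
        time.sleep(0)


def block_main(i):
    for t in range(1, N_KERNELS + 1):
        with log_lock:
            log.append((t, i))                    # stand-in for kernel t's work
        if t < N_KERNELS:                         # 1 <= t < n, as in Step 1
            barrier(i, flag=t)


threads = [threading.Thread(target=block_main, args=(i,)) for i in range(M_BLOCKS)]
for th in threads:
    th.start()
for th in threads:
    th.join()

kernel_order = [t for t, _ in log]
print(kernel_order == sorted(kernel_order))       # True
```

The final check confirms that no block starts kernel t+1 before every block has finished kernel t. On a real GPU a spin barrier of this kind would additionally require all m blocks to be resident on the device at the same time and the flag reads to see up-to-date memory; those concerns are outside this sketch.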
As shown in Fig. 3, with the invention, when a task composed of n kernels is processed, the computer system needs to load a kernel onto the GPU only once, and the GPU returns the final result to the computer system only once, after all n kernels have been processed. Compared with the standard processing flow, this saves n-1 kernel load times and n-1 result synchronization and return times, which improves the processing efficiency of the GPU.
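The saving can be made concrete with a simple timing model. The sketch below is illustrative only: the function names, the per-step times, and the assumption that every kernel has the same load, run, and return time are hypothetical and do not come from the text, and the on-device barrier cost defaults to zero.

```python
def host_sync_time(n, t_load, t_kernel, t_return):
    """Standard flow: every kernel is loaded, run, and returned to the host."""
    return n * (t_load + t_kernel + t_return)


def device_sync_time(n, t_load, t_kernel, t_return, t_barrier=0.0):
    """Flow of Fig. 3: one load, n kernels, n-1 on-device barriers, one return."""
    return t_load + n * t_kernel + (n - 1) * t_barrier + t_return


n = 8                                      # number of kernels (assumed)
t_load, t_kernel, t_return = 4.0, 2.0, 4.0  # arbitrary time units (assumed)
saved = host_sync_time(n, t_load, t_kernel, t_return) - device_sync_time(
    n, t_load, t_kernel, t_return
)
print(saved)                               # 56.0
```

With these numbers the saving equals (n-1) × (t_load + t_return) = 7 × 8 = 56 time units, matching the n-1 loads and n-1 returns named above. A nonzero `t_barrier` reduces the saving by (n-1) × t_barrier.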
Parts of the invention not described in detail belong to techniques known to those skilled in the art.
Those of ordinary skill in the art should appreciate that the above embodiments are intended only to illustrate the invention and not to limit it; changes and variations to the above embodiments fall within the scope of the claims of the invention as long as they remain within its spirit.
Claims (1)
1. A synchronization method for improving the processing efficiency of a graphics processing unit, characterized by comprising the following steps:
Step 1: When the graphics processing unit processes a task composed of n kernels, the currently executing kernel t enters synchronization, where n ≥ 2 and 1 ≤ t < n;
Step 2: The graphics processing unit creates a synchronization input vector and a synchronization output vector. With m processing blocks on the graphics processing unit, where m ≥ 1, the synchronization input vector A is created first; its size equals the number of processing blocks m, and it is written A[i], i = 1, 2, ..., m. The synchronization output vector B is then created; its size likewise equals m, and it is written B[i], i = 1, 2, ..., m;
Step 3: The graphics processing unit updates the synchronization input vector: block i sets A[i] to Flag;
Step 4: The graphics processing unit checks whether the synchronization input vector has been fully updated: any one block polls the values of A in a loop and checks whether every flag in A has been set to Flag. If the update is complete, Step 5 follows; if not, Step 4 is repeated;
Step 5: The graphics processing unit updates the synchronization output vector: the block used in Step 4 sets the values of B to Flag;
Step 6: The graphics processing unit checks whether the synchronization output vector has been fully updated: block i checks whether B[i] equals Flag. Once all values in B have been set to Flag, the update is complete and Step 7 follows; otherwise Step 6 is repeated;
Step 7: The graphics processing unit completes synchronization of the currently executing kernel t.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410610231.4A CN104361553B (en) | 2014-11-02 | 2014-11-02 | Synchronizing method capable of increasing processing efficiency of graphics processing unit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361553A (en) | 2015-02-18 |
CN104361553B (en) | 2017-04-12 |
Family
ID=52528811
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410610231.4A Active CN104361553B (en) | 2014-11-02 | 2014-11-02 | Synchronizing method capable of increasing processing efficiency of graphics processing unit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361553B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267564A (en) * | 2008-04-16 | 2008-09-17 | 中国科学院计算技术研究所 | A multi-processor video coding chip device and method |
CN101673391A (en) * | 2008-09-09 | 2010-03-17 | 索尼株式会社 | Pipelined image processing engine |
CN101710986A (en) * | 2009-11-18 | 2010-05-19 | 中兴通讯股份有限公司 | H.264 parallel decoding method and system based on isostructural multicore processor |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103080921B (en) * | 2010-08-30 | 2015-11-25 | 富士通株式会社 | Multi-core processor system, synchronous control system, sync control device, information generating method |
Non-Patent Citations (1)
Title |
---|
"Parallel Optimization Techniques Based on GPU"; Zuo Haorui et al.; "Application Research of Computers"; 2009-11-30; Vol. 26, No. 11; pp. 4115-4118 * |
Also Published As
Publication number | Publication date |
---|---|
CN104361553A (en) | 2015-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104834561B (en) | A kind of data processing method and device | |
CN103049241B (en) | A kind of method improving CPU+GPU isomery device calculated performance | |
DE102013018915A1 (en) | An approach to power reduction in floating point operations | |
CN109213607B (en) | Multithreading rendering method and device | |
CN109416755A (en) | Artificial intelligence method for parallel processing, device, readable storage medium storing program for executing and terminal | |
CN111209094A (en) | Request processing method and device, electronic equipment and computer readable storage medium | |
CN107977504B (en) | Asymmetric reactor core fuel management calculation method and device and terminal equipment | |
CN109214512A (en) | A kind of parameter exchange method, apparatus, server and the storage medium of deep learning | |
CN106026107A (en) | QR decomposition method of power flow Jacobian matrix for GPU acceleration | |
CN103942788B (en) | High-spectrum remote sensing feature extracting method and device | |
CN113222125A (en) | Convolution operation method and chip | |
CN104361553B (en) | Synchronizing method capable of increasing processing efficiency of graphics processing unit | |
CN109472734A (en) | A kind of target detection network and its implementation based on FPGA | |
WO2014004736A4 (en) | A method or apparatus to perform footprint-based optimization simultaneously with other steps | |
CN104182208A (en) | Method and system utilizing cracking rule to crack password | |
CN103279328A (en) | BlogRank algorithm parallelization processing construction method based on Haloop | |
CN108022201B (en) | Parallel rasterization sequencing method for triangle primitives | |
DE102015014800A1 (en) | Improved SIMD-K next-neighbor implementation | |
CN102194247A (en) | Method for judging graphic element information in modeling process of vector word triangular plate | |
CN103530639A (en) | Picture contour ordered point set extraction method | |
CN105893145B (en) | A kind of method for scheduling task and device based on genetic algorithm | |
CN104050079A (en) | Real-time system testing method based on time automata | |
CN107256203A (en) | The implementation method and device of a kind of matrix-vector multiplication | |
CN102253861A (en) | Method for executing stepwise plug-in computation | |
CN104391929A (en) | Data flow transmitting method in ETL (extract, transform and load) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |