CN104361553B - Synchronizing method capable of increasing processing efficiency of graphics processing unit - Google Patents

Info

Publication number: CN104361553B
Authority: CN (China)
Prior art keywords: process unit; graphic process; processing unit; synchronous input; graphics processing
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201410610231.4A
Other languages: Chinese (zh)
Other versions: CN104361553A (en)
Inventors: 左颢睿, 徐智勇, 魏宇星, 张建林, 欧阳益民, 许俊平, 祁小平
Current assignee: Institute of Optics and Electronics of CAS (the listed assignees may be inaccurate)
Original assignee: Institute of Optics and Electronics of CAS
Priority date: 2014-11-02 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2014-11-02
Publication date: 2017-04-12

Application filed by Institute of Optics and Electronics of CAS
Priority to CN201410610231.4A
Publication of CN104361553A
Application granted
Publication of CN104361553B

Abstract

The invention provides a synchronization method that improves the processing efficiency of a graphics processing unit (GPU). The method includes: after the currently executing kernel of the GPU enters synchronization, a synchronization input vector and a synchronization output vector are created in the GPU; the GPU updates the flags in the synchronization input vector; the GPU polls the flags in the synchronization input vector, exits the polling loop once all synchronization input flags are found to be updated, and then updates the flags in the synchronization output vector; the GPU polls the flags in the synchronization output vector and exits the polling loop once all synchronization output flags are found to be updated; synchronization of the currently executing kernel is then complete inside the GPU. With this method, a GPU executing a task made up of multiple kernels can synchronize quickly and directly on the device, without repeatedly returning to the computer system for kernel loading and synchronization, which increases the processing efficiency of the GPU.

Description

A synchronization method for improving the processing efficiency of a graphics processing unit
Technical field
The invention belongs to the field of high-performance computing, in particular parallel computing. It specifically relates to a synchronization method for improving the processing efficiency of a graphics processing unit.
Background
The graphics processing unit (GPU) is a high-performance parallel processor that is widely used in many fields, including image processing, financial analysis, oil exploration, physical simulation and meteorological analysis. The GPU task processing model can be divided into three levels: kernel level, block level and thread level. Each task can contain multiple kernels, each kernel can contain multiple blocks, and each block can contain multiple threads.
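As a rough illustration of the three-level model, the sketch below counts the blocks and threads in a task. It assumes, purely for simplicity, that every kernel (core) has the same number of blocks and every block the same number of threads; the class name, field names and example sizes are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskShape:
    """Hypothetical description of a task: n kernels x m blocks x u threads."""
    kernels: int            # n: kernels in the task
    blocks_per_kernel: int  # m: blocks in each kernel
    threads_per_block: int  # u: threads in each block

    def total_blocks(self) -> int:
        # Every kernel contributes the same number of blocks (assumption).
        return self.kernels * self.blocks_per_kernel

    def total_threads(self) -> int:
        # Every block contributes the same number of threads (assumption).
        return self.total_blocks() * self.threads_per_block


# Example sizes chosen only for illustration.
shape = TaskShape(kernels=3, blocks_per_kernel=4, threads_per_block=256)
```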
However, GPUs are at present used mainly in laboratory simulation or post-processing systems and are seldom applied in real-time processing systems. The main reason is that all current GPUs must be controlled by a computer system: the computer system loads a kernel onto the GPU for computation, and once the GPU has finished processing that kernel it must exit and return to the computer system to complete synchronization of the results. When a task consists of multiple kernels, the computer system has to load kernels multiple times, and the GPU likewise has to start and return to the computer system multiple times, which greatly reduces the processing efficiency of the GPU. As a result, in many cases the time spent launching the GPU and synchronizing its results substantially exceeds the time the GPU spends processing the task.
At present, GPUs provide only thread-level synchronization and no block-level synchronization method; block-level synchronization can only be completed by exiting the GPU and returning to the computer system, which is the root cause of the low efficiency of GPUs when executing tasks composed of multiple kernels. To improve GPU utilization, an efficient inter-block synchronization method is urgently needed.
Summary of the invention
Technical problem solved by the invention: to address the deficiencies of the prior art, a synchronization method for improving the processing efficiency of a graphics processing unit is provided, which improves GPU processing efficiency while keeping the processing results consistent.
To achieve this purpose, the technical solution of the invention is a synchronization method for improving the processing efficiency of a graphics processing unit, comprising the following steps:
Step 1, when the GPU processes a task composed of n kernels (n ≥ 2), the currently executing kernel t enters synchronization processing (1 ≤ t < n);
Step 2, the GPU creates a synchronization input vector and a synchronization output vector. Where the number of processing blocks of the GPU is m (m ≥ 1), the synchronization input vector A is created first; its size equals the number of processing blocks m, and it is denoted A{i}, i = {1, 2, ..., m}. The synchronization output vector B is then created; its size also equals the number of processing blocks m, and it is denoted B{i}, i = {1, 2, ..., m};
Step 3, the GPU updates the synchronization input vector: block i sets the value of A[i] to Flag;
Step 4, the GPU determines whether the synchronization input vector has been fully updated: any one block polls the values of A and checks whether every flag in A has been updated to Flag. If the update is complete, step 5 is executed; otherwise step 4 is repeated;
Step 5, the GPU updates the synchronization output vector: the block used in step 4 sets the values of B to Flag;
Step 6, the GPU determines whether the synchronization output vector has been fully updated: block i checks whether the value of B[i] is Flag. If all values in B have been updated to Flag, the update is complete and step 7 is executed; otherwise step 6 is repeated;
Step 7, the GPU completes the synchronization processing of the currently executing kernel t.
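The steps above can be sketched as a host-side simulation in which Python threads stand in for GPU blocks. This is a minimal illustration, not the patented implementation: the function names, the 0-based indexing, the flag value 1, the choice of block 0 as the polling block, and the reading of step 6 as "block i polls only its own B[i]" are all assumptions.

```python
import threading
import time

FLAG = 1  # hypothetical flag value; the text only names it "Flag"


def block_barrier(i, A, B, flag=FLAG, leader=0):
    """Inter-block barrier for block i (0-based) over shared vectors A and B."""
    A[i] = flag                              # step 3: block i marks its arrival
    if i == leader:                          # step 4: one chosen block polls A
        while any(a != flag for a in A):
            time.sleep(0)                    # yield so the other threads can run
        for j in range(len(B)):              # step 5: the same block sets every B[j]
            B[j] = flag
    while B[i] != flag:                      # step 6: block i polls its own B[i]
        time.sleep(0)


def run_blocks(m):
    """Run m simulated blocks through one barrier.

    Returns, for each block, how many blocks had arrived when it left the barrier.
    """
    A = [0] * m                              # step 2: synchronization input vector
    B = [0] * m                              # step 2: synchronization output vector
    arrived = []                             # blocks that reached the barrier
    seen_after = [None] * m

    def block(i):
        arrived.append(i)                    # stands in for the work before the barrier
        block_barrier(i, A, B)
        seen_after[i] = len(arrived)         # step 7: synchronization complete

    threads = [threading.Thread(target=block, args=(i,)) for i in range(m)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return seen_after
```

Calling `run_blocks(4)` should report that every block saw all four arrivals, since no block can leave before the polling block has observed every entry of A. On real hardware a spin barrier of this kind would additionally depend on memory visibility between blocks and on all m blocks being resident at the same time; neither is modeled here.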
Compared with the prior art, the beneficial effect of the invention is:
With the proposed synchronization method, when a task composed of multiple kernels is executed, inter-block synchronization can be completed inside the GPU. This avoids the GPU repeatedly returning to the computer system for inter-block synchronization, and so avoids the resulting low processing efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a schematic diagram of a task composed of multiple kernels: the task is divided into n kernels, each kernel consists of m blocks, and each block consists of u threads;
Fig. 3 is a flowchart of the processing performed by the GPU, with the invention applied, on a multi-kernel task such as the one shown in Fig. 2.
Detailed description of the embodiments
For a better understanding of the technical solution of the invention, embodiments of the invention are described in detail below with reference to the accompanying drawings.
For processing a task composed of multiple kernels as shown in Fig. 2, the invention provides, as shown in Fig. 1, a synchronization method for improving GPU processing efficiency, implemented in the following steps:
Step 1, when the GPU processes a task composed of n kernels, the currently executing kernel enters synchronization: after the t-th kernel (1 ≤ t < n) finishes its processing, the GPU enters synchronization processing;
Step 2, the GPU creates a synchronization input vector and a synchronization output vector: where the number of processing blocks of the GPU is m, the synchronization input vector A is created first; its size equals the number of processing blocks m, and it is denoted A{i}, i = {1, 2, ..., m}. The synchronization output vector B is then created; its size also equals the number of processing blocks m, and it is denoted B{i}, i = {1, 2, ..., m};
Step 3, the GPU updates the synchronization input vector: block i sets the value of A[i] to Flag; for example, block 5 updates A[5] of the synchronization input vector A so that A[5] = Flag;
Step 4, the GPU determines whether the synchronization input vector has been fully updated: any one block, block 1 in this example, polls the values of A and checks whether every flag in A has been updated to Flag. If every A[i] (1 ≤ i ≤ m) equals Flag, A has been fully updated and step 5 is executed; if any A[i] is not equal to Flag, the update is incomplete and step 4 is repeated;
Step 5, the GPU updates the synchronization output vector: block 1 from step 4 sets the values of B to Flag, so that every B[i] = Flag (1 ≤ i ≤ m);
Step 6, the GPU determines whether the synchronization output vector has been fully updated: block i checks whether the value of B[i] (1 ≤ i ≤ m) is Flag. If every value in B equals Flag, B has been fully updated and step 7 is executed; if any B[i] is not equal to Flag, the update of B is incomplete and step 6 is repeated;
Step 7, the GPU completes the synchronization processing of the currently processed kernel t.
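To show how the barrier might be reused between the n kernels (cores) of one task, the sketch below runs m simulated blocks through n stages, synchronizing after each stage t with 1 ≤ t < n. Python threads stand in for GPU blocks. The text does not say how Flag is chosen or reset between kernels; using Flag = t so that A and B never need clearing is an assumption of this sketch, as are the function names, the 0-based block indices and the toy per-stage computation.

```python
import threading
import time


def barrier(i, A, B, flag):
    """Steps 3 to 6 for block i (0-based), with block 0 as the polling block."""
    A[i] = flag                                # step 3
    if i == 0:
        while any(a != flag for a in A):       # step 4
            time.sleep(0)
        for j in range(len(B)):                # step 5
            B[j] = flag
    while B[i] != flag:                        # step 6
        time.sleep(0)


def run_task(n, m):
    """Run n dependent stages on m simulated blocks without leaving the 'device'."""
    A = [0] * m                                # synchronization input vector
    B = [0] * m                                # synchronization output vector
    # data[t][i] holds the output of block i in stage t; stage 0 is the input.
    data = [[1] * m] + [[0] * m for _ in range(n)]

    def block(i):
        for t in range(1, n + 1):
            # Toy kernel t: needs the stage t-1 output of every block.
            data[t][i] = sum(data[t - 1]) + i
            if t < n:                          # synchronize after kernels 1..n-1
                barrier(i, A, B, flag=t)       # assumption: Flag = t, no reset needed

    threads = [threading.Thread(target=block, args=(i,)) for i in range(m)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return data[n]
```

With n = 3 and m = 4 the stage outputs are [4, 5, 6, 7], then [22, 23, 24, 25], then [94, 95, 96, 97]; the final row is only reproducible because each stage waits for every block to finish the previous one.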
As shown in Fig. 3, with the invention applied to a task composed of n kernels, the computer system needs to load a kernel onto the GPU only once, and the GPU needs to return the final result to the computer system only once, after all n kernels have been processed. Compared with the standard processing flow, this saves n-1 kernel load times and n-1 result synchronization and return times, improving the processing efficiency of the GPU.
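The saving described above can be made concrete with a simple cost model. This is an illustrative sketch only: it assumes every kernel has the same load, compute and return time, and it adds an optional per-barrier cost `sync` that the text does not quantify. All names and numbers are hypothetical.

```python
def standard_flow_time(n, load, compute, ret):
    """Standard flow: every one of the n kernels is loaded, run and returned."""
    return n * (load + compute + ret)


def on_device_flow_time(n, load, compute, ret, sync=0):
    """On-device flow: one load, n kernel runs, n-1 barriers, one return."""
    return load + n * compute + (n - 1) * sync + ret


def time_saved(n, load, compute, ret, sync=0):
    """Difference between the two flows: (n-1) * (load + ret - sync)."""
    return standard_flow_time(n, load, compute, ret) - on_device_flow_time(
        n, load, compute, ret, sync
    )
```

For example, with n = 5, load = 3, compute = 10 and ret = 2 (arbitrary time units), the standard flow takes 75 units and the on-device flow 55, a saving of 4 × (3 + 2) = 20 units; a barrier cost of 1 unit reduces the saving to 16.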
The parts of the invention not described in detail belong to techniques well known to those skilled in the art.
Those of ordinary skill in the art should appreciate that the above embodiments are intended only to illustrate the invention and not to limit it; any change or modification to the above embodiments made within the spirit of the invention falls within the scope of the claims of the invention.

Claims (1)

1. A synchronization method for improving the processing efficiency of a graphics processing unit, characterized by comprising the following steps:
Step 1, when the graphics processing unit processes a task composed of n kernels, the currently executing kernel t enters synchronization processing, wherein n ≥ 2 and 1 ≤ t < n;
Step 2, the graphics processing unit creates a synchronization input vector and a synchronization output vector: where the number of processing blocks of the graphics processing unit is m, wherein m ≥ 1, the synchronization input vector A is created first, the size of A being equal to the number of processing blocks m and A being denoted A{i}, i = {1, 2, ..., m}; the synchronization output vector B is then created, the size of B being equal to the number of processing blocks m and B being denoted B{i}, i = {1, 2, ..., m};
Step 3, the graphics processing unit updates the synchronization input vector: block i sets the value of A[i] to Flag;
Step 4, the graphics processing unit determines whether the synchronization input vector has been fully updated: any one block polls the values of A and checks whether every flag in A has been updated to Flag; if the update is complete, step 5 is executed; otherwise step 4 is repeated;
Step 5, the graphics processing unit updates the synchronization output vector: the block used in step 4 sets the values of B to Flag;
Step 6, the graphics processing unit determines whether the synchronization output vector has been fully updated: block i checks whether the value of B[i] is Flag; if all values in B have been updated to Flag, the update is complete and step 7 is executed; otherwise step 6 is repeated;
Step 7, the graphics processing unit completes the synchronization processing of the currently executing kernel t.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410610231.4A | 2014-11-02 | 2014-11-02 | Synchronizing method capable of increasing processing efficiency of graphics processing unit

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201410610231.4A | 2014-11-02 | 2014-11-02 | Synchronizing method capable of increasing processing efficiency of graphics processing unit

Publications (2)

Publication Number | Publication Date
CN104361553A (en) | 2015-02-18
CN104361553B (en) | 2017-04-12

Family

ID=52528811

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201410610231.4A | Active | CN104361553B (en) | 2014-11-02 | 2014-11-02 | Synchronizing method capable of increasing processing efficiency of graphics processing unit

Country Status (1)

Country | Link
CN (1) | CN104361553B (en)

Also Published As

Publication number Publication date
CN104361553A (en) 2015-02-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant