CN102819819A - Implementation method for quickly reading vertices in a GPU (graphics processing unit) - Google Patents
- Publication number
- CN102819819A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Generation (AREA)
Abstract
The invention discloses an implementation method for quickly reading graphics primitive vertices in a GPU (graphics processing unit) design, comprising the following steps: storing vertex data sequentially, configuring the vertex start address, parsing drawing commands, managing primitives, reading primitive vertex data, and clearing residual data. The method makes full use of memory bandwidth, relieves bus pressure, and improves the vertex throughput of the GPU chip.
Description
Technical field
The present invention relates mainly to the field of GPU design, and in particular to drawing-command parsing and primitive vertex fetching in a GPU.
Background art
Organizing and reading vertex data is a major issue in GPUs built around a fixed-function pipeline, because it directly affects drawing efficiency. The traditional approach specifies the primitive type, the number of components, the number of vertices, and the start address and stride of each component in command words, so a single drawing command often needs several command words (7 or more). This has two drawbacks. (1) The large number of command words puts heavy pressure on the PCI bus: during drawing the PCI bus must transfer commands continuously, and because its frequency is limited, the command transfer rate often cannot keep up with the rendering speed. (2) Each component must be fetched by sending a separate request to DDR based on the start address and stride given in the command words. The bursts are therefore small and the DDR read latency is comparatively large; reading the data of a single vertex often takes several read requests, so the DDR bandwidth cannot be fully used.
Summary of the invention
The problem to be solved by the present invention is as follows. In view of the shortcomings of the prior art, the invention provides an implementation structure for quickly fetching vertices in a GPU. By organizing vertex data sequentially and reading it with a larger burst length, the structure makes full use of memory bandwidth and greatly improves vertex-fetch efficiency; it also reduces the number of command words.
The method requires vertex data to be stored in a fixed order. If the current primitive is a line segment, the data of each vertex must be stored contiguously in the order X, Y, Z, W, R, G, B, A (all 32-bit single-precision floating-point values, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate coefficient, and the red, green, blue, and transparency color components). If the current primitive is a triangle, the data of each vertex must be stored in the order X, Y, Z, W, R, G, B, A, S, T, 0, 0 (all 32-bit single-precision floating-point values, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate coefficient, the red, green, blue, and transparency color components, the texture horizontal coordinate, and the texture vertical coordinate; the two trailing zeros pad the data to 128-bit alignment, which suits the wide data width of DDR).
At the same time, the CPU configures the start storage address of the vertex data in the primitive control module through the PCI bus (by configuring the relevant registers of the primitive control module, so that the module fetches vertex data from DDR contiguously, starting from this address). The CPU then sends commands to the command parsing module through the PCI bus. The command parsing module reads command words through an asynchronous FIFO. If the current command is valid (a primitive drawing command or a clear-FIFO command), the module decodes the fields of the command word to obtain the primitive type and vertex count and sends this information to the primitive control module. If rendering parameters need to be changed during drawing (for example, switching the texture address or the transformation matrix), or once all drawing commands of the current frame have been sent, software must send a clear-FIFO command.
After receiving the software-configured start address, the primitive control module passes it to the vertex-data read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength (burst length, meaning that the data of several vertices can be returned back to back). As long as the vertex FIFO is not full, read requests continue to be issued at increasing addresses, and the fetched vertex data is passed to the primitive control module. The primitive control module assembles the data into the primitive required by the command and sends it to the drawing module. If the command received is a clear-FIFO command, the primitive control module empties the FIFO that holds the data returned from DDR, which ensures that the next command does not fetch wrong vertices.
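The storage order described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `pack_vertex` and `pack_vertices`, the string tags `"line"` and `"triangle"`, and the little-endian byte order are hypothetical and do not come from the patent. The sketch shows that a line-segment vertex (8 floats) is already 128-bit aligned, while a triangle vertex (10 floats) needs exactly two zero words of padding.

```python
import struct

# Component order taken from the description; names are for illustration only.
LINE_COMPONENTS = ("X", "Y", "Z", "W", "R", "G", "B", "A")
TRIANGLE_COMPONENTS = LINE_COMPONENTS + ("S", "T")
ALIGN_BYTES = 16  # 128-bit alignment


def pack_vertex(values, primitive):
    """Pack one vertex as 32-bit floats and zero-pad it to a 128-bit boundary.

    Assumption: little-endian byte order ("<"); the patent does not specify it.
    """
    names = LINE_COMPONENTS if primitive == "line" else TRIANGLE_COMPONENTS
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} components, got {len(values)}")
    raw = struct.pack(f"<{len(values)}f", *values)
    padding = (-len(raw)) % ALIGN_BYTES  # 0 bytes for lines, 8 bytes for triangles
    return raw + b"\x00" * padding       # zero bytes decode as float 0.0


def pack_vertices(vertices, primitive):
    """Store all vertices back to back so they can be read with long bursts."""
    return b"".join(pack_vertex(v, primitive) for v in vertices)


line_vertex = pack_vertex([0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0], "line")
tri_vertex = pack_vertex([0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.25, 0.75], "triangle")
print(len(line_vertex), len(tri_vertex))
```

With this layout every vertex occupies a whole number of 128-bit words (two for a line-segment vertex, three for a triangle vertex), so consecutive vertices can be streamed without any per-component address arithmetic.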
The advantages of the present invention are: 1. Full use of memory bandwidth: the fast vertex-fetch structure proposed by the invention can issue memory read requests with a larger BurstLength, making full use of memory bandwidth. 2. Fewer command words: because vertex data is stored in order, the per-component start address, stride, and similar information in traditional drawing command words can be omitted, which reduces the number of command words per command from 7 to 8 down to 2.
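The two advantages can be made concrete with some rough arithmetic. The following Python sketch rests on assumptions that are not in the patent: it models the traditional scheme in the worst case as one read request per component per vertex, and the burst size of 256 bytes and the draw sizes are arbitrary example values. The function names are hypothetical.

```python
def per_component_requests(num_vertices, num_components):
    """Worst-case model of the traditional scheme (an assumption):
    one DDR read request per component of every vertex."""
    return num_vertices * num_components


def burst_requests(num_vertices, vertex_bytes, burst_bytes):
    """Sequential layout: the whole vertex stream is read in bursts.
    Uses ceiling division to count the final partial burst."""
    total_bytes = num_vertices * vertex_bytes
    return -(-total_bytes // burst_bytes)


def command_words(num_commands, words_per_command):
    """Total command words sent over the bus for a number of drawing commands."""
    return num_commands * words_per_command


# Example: 300 triangle vertices of 48 bytes each, 256-byte bursts (assumed size).
old_reads = per_component_requests(300, 10)
new_reads = burst_requests(300, 48, 256)

# Example: 1000 drawing commands at 7 words each versus 2 words each.
old_words = command_words(1000, 7)
new_words = command_words(1000, 2)
print(old_reads, new_reads, old_words, new_words)
```

The absolute numbers depend entirely on the assumed burst size and request model; the point is only that sequential storage lets the request count scale with total bytes rather than with the number of components.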
Description of drawings
Fig. 1 shows the implementation structure for quickly fetching vertices in a GPU according to the present invention.
Embodiment
The present invention is explained in further detail below with reference to the accompanying drawing and a specific embodiment.
Fig. 1 shows an implementation structure for quickly reading vertices in a GPU. The CPU configures the start storage address of the primitive vertex data through the PCI bus (all primitive data is stored in a fixed format: the line-segment vertex format is X, Y, Z, W, R, G, B, A, and the triangle vertex format is X, Y, Z, W, R, G, B, A, S, T, 0, 0). The CPU then sends commands to the command parsing module through the PCI bus. The command parsing module obtains command data by reading the asynchronous FIFO; if a command is valid, it decodes the fields of the command word and passes them to the primitive control module. Based on the configured start address, the primitive control module sends requests with a larger BurstLength to the DDR controller through the vertex-data read module. The data returned by DDR is written into the FIFO, and read requests continue to be sent as long as the FIFO is not full. The primitive control module reads the returned data from the FIFO, assembles it according to the format given by the command word, and sends it to the drawing pipeline. If the command obtained is a clear-FIFO command, the primitive control module sends a clear signal to the FIFO to empty it, which ensures that the next draw does not read wrong vertices.
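The read path of this embodiment can be modeled in software with a small Python sketch. This is a behavioral illustration only, not the hardware design: the class `VertexFetcher`, its method names, the FIFO depth, and the burst size are all hypothetical, and the sketch assumes that the start address is reconfigured separately from the clear-FIFO command.

```python
from collections import deque


class VertexFetcher:
    """Toy model: burst-read sequential vertices into a FIFO, hand out primitives."""

    def __init__(self, memory, vertex_bytes, fifo_depth, burst_vertices):
        self.memory = memory                  # bytes object standing in for DDR
        self.vertex_bytes = vertex_bytes
        self.fifo_depth = fifo_depth          # capacity in vertices (assumed unit)
        self.burst_vertices = burst_vertices  # vertices returned per read request
        self.next_address = 0
        self.fifo = deque()

    def configure(self, start_address):
        """Models the CPU writing the vertex start address over the bus."""
        self.next_address = start_address

    def prefetch(self):
        """Keep issuing burst reads at increasing addresses while the FIFO has room."""
        while len(self.fifo) < self.fifo_depth:
            count = min(self.burst_vertices, self.fifo_depth - len(self.fifo))
            end = self.next_address + count * self.vertex_bytes
            data = self.memory[self.next_address:end]
            whole = len(data) // self.vertex_bytes
            if whole == 0:
                break  # no more vertex data in memory
            for i in range(whole):
                start = i * self.vertex_bytes
                self.fifo.append(data[start:start + self.vertex_bytes])
            self.next_address += whole * self.vertex_bytes

    def take_primitive(self, num_vertices):
        """Assemble one primitive from the next num_vertices FIFO entries."""
        vertices = []
        while len(vertices) < num_vertices:
            if not self.fifo:
                self.prefetch()
                if not self.fifo:
                    raise RuntimeError("vertex memory exhausted")
            vertices.append(self.fifo.popleft())
        return vertices

    def clear_fifo(self):
        """Models the clear-FIFO command: drop prefetched, now stale, vertices."""
        self.fifo.clear()


memory = bytes(range(64))  # 16 toy "vertices" of 4 bytes each
fetcher = VertexFetcher(memory, vertex_bytes=4, fifo_depth=8, burst_vertices=4)
fetcher.configure(0)
fetcher.prefetch()
triangle = fetcher.take_primitive(3)
fetcher.clear_fifo()       # e.g. rendering parameters changed
fetcher.configure(32)      # assumed: software supplies a new start address
line = fetcher.take_primitive(2)
```

The clear step matters because the FIFO is filled ahead of the commands that consume it: without it, vertices prefetched for the previous draw would be handed to the next one.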
Claims (2)
1. An implementation method for quickly reading vertices in a GPU, characterized in that: vertex data is stored in order; for a line segment, each vertex is stored in the order X, Y, Z, W, R, G, B, A (corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate coefficient, and the red, green, blue, and transparency color components); for a triangle, each vertex is stored in the order X, Y, Z, W, R, G, B, A, S, T (corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate coefficient, the red, green, blue, and transparency color components, the texture horizontal coordinate, and the texture vertical coordinate), 0, 0; at the same time, the CPU configures the start storage address of the vertex data in the primitive control module through the PCI bus; the CPU sends commands to the command parsing module through the PCI bus; the command parsing module reads command words through an asynchronous FIFO; if the current command is valid (a drawing command or a clear-FIFO command), the fields of the command word are decoded to obtain the primitive type and vertex count, which are sent to the primitive control module; and if rendering parameters need to be changed during drawing (for example, switching the texture address or the matrix), or once all drawing commands of the current frame have been sent, the command parsing module must send a clear-FIFO command.
2. The method according to claim 1, characterized in that: based on the vertex start address configured through the PCI bus, the primitive control module sends the start address to the vertex-data read module; because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength (burst length, meaning that the data of several vertices can be returned back to back); as long as the vertex FIFO is not full, read requests continue to be issued at increasing addresses, and the fetched vertex data is passed to the primitive control module; the primitive control module assembles the data into the primitive required by the command and sends it to the drawing module; and if the command received is a clear-FIFO command, the primitive control module empties the FIFO that holds the data returned from DDR, which ensures that the next command does not fetch wrong vertices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210287997.4A CN102819819B (en) | 2012-08-14 | 2012-08-14 | Implementation method for quickly reading vertices in a GPU |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210287997.4A CN102819819B (en) | 2012-08-14 | 2012-08-14 | Implementation method for quickly reading vertices in a GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102819819A (en) | 2012-12-12 |
CN102819819B (en) | 2015-09-16 |
Family
ID=47303926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210287997.4A Active CN102819819B (en) | 2012-08-14 | 2012-08-14 | Implementation method for quickly reading vertices in a GPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102819819B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559078A (en) * | 2013-11-08 | 2014-02-05 | Huawei Technologies Co., Ltd. | GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device |
CN108520489A (en) * | 2018-04-12 | 2018-09-11 | Changsha Jingmei Integrated Circuit Design Co., Ltd. | Device and method for realizing command analysis and vertex acquisition parallel in GPU |
CN111915475A (en) * | 2020-07-10 | 2020-11-10 | Changsha Jingjia Microelectronics Co., Ltd. | Drawing command processing method, GPU, host, terminal and medium |
CN112581350A (en) * | 2020-12-05 | 2021-03-30 | Xi'an Xiangteng Microelectronics Technology Co., Ltd. | Drawing command synchronization method based on continuous primitives |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6018353A (en) * | 1995-08-04 | 2000-01-25 | Sun Microsystems, Inc. | Three-dimensional graphics accelerator with an improved vertex buffer for more efficient vertex processing |
CN1702692A (en) * | 2004-05-03 | 2005-11-30 | Microsoft Corporation | System and method for providing an enhanced graphics pipeline |
CN102096897A (en) * | 2011-03-17 | 2011-06-15 | Changsha Jingjia Microelectronics Co., Ltd. | Realization of tile cache strategy in graphics processing unit (GPU) based on tile based rendering |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559078A (en) * | 2013-11-08 | 2014-02-05 | Huawei Technologies Co., Ltd. | GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device |
WO2015067043A1 (en) * | 2013-11-08 | 2015-05-14 | Huawei Technologies Co., Ltd. | GPU virtualization realization method as well as vertex data caching method and related device |
CN103559078B (en) * | 2013-11-08 | 2017-04-26 | Huawei Technologies Co., Ltd. | GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device |
CN108520489A (en) * | 2018-04-12 | 2018-09-11 | Changsha Jingmei Integrated Circuit Design Co., Ltd. | Device and method for realizing command analysis and vertex acquisition parallel in GPU |
CN108520489B (en) * | 2018-04-12 | 2022-12-06 | Changsha Jingmei Integrated Circuit Design Co., Ltd. | Device and method for realizing command analysis and vertex acquisition parallel in GPU |
CN111915475A (en) * | 2020-07-10 | 2020-11-10 | Changsha Jingjia Microelectronics Co., Ltd. | Drawing command processing method, GPU, host, terminal and medium |
WO2022007206A1 (en) * | 2020-07-10 | 2022-01-13 | Changsha Jingjia Microelectronics Co., Ltd. | Drawing command processing method, GPU, main device, terminal, and medium |
CN111915475B (en) * | 2020-07-10 | 2024-04-05 | Changsha Jingjia Microelectronics Co., Ltd. | Processing method of drawing command, GPU, host, terminal and medium |
CN112581350A (en) * | 2020-12-05 | 2021-03-30 | Xi'an Xiangteng Microelectronics Technology Co., Ltd. | Drawing command synchronization method based on continuous primitives |
Also Published As
Publication number | Publication date |
---|---|
CN102819819B (en) | 2015-09-16 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C53 | Correction of patent of invention or patent application | |
CB03 | Change of inventor or designer information | Inventor after: Rao Xianhong; inventor before: Jiao Yong |
COR | Change of bibliographic data | Correct: inventor; from Jiao Yong to Rao Xianhong |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |