CN102819819A - Implementation method for quickly reading vertices in GPU (graphics processing unit) - Google Patents

Implementation method for quickly reading vertices in GPU (graphics processing unit)

Info

Publication number
CN102819819A
Authority
CN
China
Prior art keywords
order
data
summit
fifo
vertex
Prior art date
2012-08-14
Legal status
Granted
Application number
CN2012102879974A
Other languages
Chinese (zh)
Other versions
CN102819819B (en)
Inventor
Jiao Yong (焦永)
Current Assignee
CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Original Assignee
CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Priority date
2012-08-14
Filing date
2012-08-14
Publication date
2012-12-12
Application filed by CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Priority to CN201210287997.4A
Publication of CN102819819A
Application granted
Publication of CN102819819B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method for quickly reading primitive vertices in a GPU (graphics processing unit) design, comprising the following steps: storing vertex data sequentially, configuring the vertex start address, parsing drawing commands, managing primitives, reading primitive vertex data, and clearing residual data. The method makes full use of memory bandwidth, reduces bus pressure, and increases the vertex throughput of the GPU chip.

Description

A method for quickly reading vertices in a GPU
Technical field
The present invention relates mainly to the field of GPU design, and in particular to drawing-command parsing and primitive vertex fetching in a GPU.
Background art
In a GPU with a fixed-function pipeline, how vertex data is organized and read is a key issue that directly affects drawing efficiency. The traditional approach specifies, in the command words, the primitive type, the number of components, the number of vertices, and the start address and stride of each component, so a single drawing command often needs many (7 or more) command words to describe it. This has two drawbacks: (1) the large number of command words puts heavy pressure on the PCI bus: the bus must transmit commands continuously during drawing and, because of its limited frequency, the command transfer rate often cannot keep up with the rendering speed; (2) because a separate DDR read request must be issued for each component, using the start address and stride given in the command word, bursts are short and the DDR read latency is correspondingly high; reading one vertex often takes several read requests, so the DDR bandwidth cannot be fully used.
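To make the burst argument concrete, the Python sketch below counts DDR read requests under two deliberately simplified models. It assumes (hypothetically) that the traditional scheme issues one request per component group per vertex, and that the sequential scheme reads the whole contiguous buffer in fixed-size bursts. The function names, the three component groups, and all numbers are illustrative assumptions, not figures from the patent.

```python
import math


def strided_requests(num_vertices, component_groups):
    """Traditional model (assumed): each component group has its own start
    address and stride, so every vertex costs one request per group."""
    return num_vertices * component_groups


def sequential_requests(num_vertices, vertex_bytes, burst_beats, bytes_per_beat):
    """Sequential model (assumed): vertices are contiguous, so the buffer is
    read with back-to-back bursts of burst_beats * bytes_per_beat bytes."""
    total_bytes = num_vertices * vertex_bytes
    burst_bytes = burst_beats * bytes_per_beat
    return math.ceil(total_bytes / burst_bytes)


# Hypothetical case: 300 triangle vertices of 48 bytes each, 3 component
# groups (position, color, texture), 8-beat bursts on a 16-byte data path.
print(strided_requests(300, 3))             # 900
print(sequential_requests(300, 48, 8, 16))  # 113
```

Under these assumptions the sequential layout needs roughly one eighth as many requests; the real ratio depends on the DDR controller and the burst length the hardware supports.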
Summary of the invention
The problem addressed by the invention is the set of shortcomings in the prior art described above. The invention provides an implementation structure for fast vertex fetching in a GPU. By storing vertex data sequentially and reading it with larger bursts, the structure makes full use of memory bandwidth and greatly improves vertex-fetch efficiency; it also reduces the number of command words.
The method requires vertex data to be stored in a fixed order. If the current primitive is a line segment, the data of each vertex is stored contiguously in the order X, Y, Z, W, R, G, B, A (all 32-bit single-precision floats: the vertex's horizontal coordinate, vertical coordinate, depth coordinate and homogeneous-coordinate coefficient, followed by the red, green, blue and alpha (transparency) color components). If the current primitive is a triangle, the data of each vertex is stored in the order X, Y, Z, W, R, G, B, A, S, T, 0, 0 (all 32-bit single-precision floats: the same eight values followed by the texture horizontal coordinate S and the texture vertical coordinate T; the two trailing zeros pad each vertex to a 128-bit boundary, which suits the wide DDR data path).
At the same time, the CPU configures the start address of the vertex data in the primitive control module over the PCI bus (writing the relevant register of the primitive management module lets it fetch vertex data continuously from DDR, starting at that address). The CPU then sends commands to the command parsing module over the PCI bus. The command parsing module reads command words through an asynchronous FIFO. If the current command is valid (a primitive draw command or a clear-FIFO command), the module decodes the fields of the command word to obtain the primitive type and the vertex count and passes this information to the primitive control module. If a rendering parameter must change during drawing (for example, a texture-address switch or a transformation-matrix switch), or when all draw commands of the current frame have been sent, software must send a clear-FIFO command.
After receiving the start address configured by software, the primitive control module passes it to the vertex-read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength (the burst count, i.e. several vertices can be returned back to back). As long as the vertex-read FIFO is not full, the module keeps sending read requests at increasing addresses, and the fetched vertex data is passed to the primitive control module. The primitive control module assembles the data into the corresponding primitives as the command requires and sends them to the graphics module. If the command received is a clear-FIFO command, the primitive control module uses it to empty the FIFO holding the fetched DDR data, which ensures that the next command does not fetch the wrong vertices.
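The fixed vertex layout can be illustrated with a minimal Python sketch, assuming little-endian 32-bit floats; the helper names are hypothetical. It shows why the two zero pads matter: ten floats would occupy 40 bytes, while twelve floats occupy 48 bytes, an exact multiple of the 16-byte (128-bit) alignment, and each vertex then sits at base + index × size with no per-component address or stride.

```python
import struct

LINE_FMT = "<8f"       # X, Y, Z, W, R, G, B, A
TRIANGLE_FMT = "<12f"  # X, Y, Z, W, R, G, B, A, S, T, 0, 0


def pack_line_vertex(x, y, z, w, r, g, b, a):
    return struct.pack(LINE_FMT, x, y, z, w, r, g, b, a)


def pack_triangle_vertex(x, y, z, w, r, g, b, a, s, t):
    # The two trailing zeros pad the vertex from 40 to 48 bytes (3 x 128 bits).
    return struct.pack(TRIANGLE_FMT, x, y, z, w, r, g, b, a, s, t, 0.0, 0.0)


def vertex_address(base, index, vertex_bytes):
    # Vertices are contiguous, so one start address locates every vertex.
    return base + index * vertex_bytes


LINE_BYTES = struct.calcsize(LINE_FMT)          # 32
TRIANGLE_BYTES = struct.calcsize(TRIANGLE_FMT)  # 48

tri = pack_triangle_vertex(1.0, 2.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.25, 0.75)
print(len(tri), TRIANGLE_BYTES % 16)              # 48 0
print(vertex_address(0x1000, 3, TRIANGLE_BYTES))  # 4240
```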
The advantages of the invention are: (1) full use of memory bandwidth: the proposed fast vertex-fetch structure issues memory read requests with a larger BurstLength, making full use of memory bandwidth; (2) fewer command words: because vertex data is stored in order, the per-component start address, stride and similar fields of a traditional drawing command word can be omitted, reducing one command from 7-8 command words to 2.
Description of the drawings
Fig. 1 shows an implementation structure for fast vertex fetching in a GPU according to the invention.
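The reduction to two command words can be illustrated with a hypothetical encoding. The patent does not specify bit fields, so the opcode values, bit positions and names below are assumptions: word 0 carries the command type and the primitive type, word 1 the vertex count, and no per-component start address or stride is transmitted.

```python
from enum import IntEnum


class Opcode(IntEnum):
    DRAW = 0x1        # primitive draw command (hypothetical code)
    CLEAR_FIFO = 0x2  # clear-FIFO command (hypothetical code)


class Primitive(IntEnum):
    LINE = 0x1
    TRIANGLE = 0x2


def encode_command(opcode, primitive, vertex_count=0):
    """Word 0: opcode in bits [31:28], primitive type in bits [3:0].
    Word 1: number of vertices (32 bits)."""
    if not 0 <= vertex_count < 2**32:
        raise ValueError("vertex_count must fit in 32 bits")
    word0 = (int(opcode) << 28) | int(primitive)
    return word0, vertex_count


def decode_command(word0, word1):
    """Return (opcode, primitive type, vertex count) from the two words."""
    return (word0 >> 28) & 0xF, word0 & 0xF, word1


w0, w1 = encode_command(Opcode.DRAW, Primitive.TRIANGLE, 300)
print(hex(w0), w1)              # 0x10000002 300
print(decode_command(w0, w1))   # (1, 2, 300)
```

Packing the type fields into one word and the count into the other is only one possible split; the point is that sequential storage leaves nothing else that the command must describe.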
Embodiment
The invention is described in further detail below with reference to the drawing and a specific embodiment.
As shown in Fig. 1, the implementation structure for fast vertex reading in a GPU works as follows. The CPU configures the start address of the primitive vertex data over the PCI bus (all primitive data is stored in a fixed format: line-segment vertices as X, Y, Z, W, R, G, B, A and triangle vertices as X, Y, Z, W, R, G, B, A, S, T, 0, 0). The CPU then sends commands to the command parsing module over the PCI bus. The command parsing module obtains command data by reading the asynchronous FIFO; if the command is valid, it decodes the fields of the command word and sends them to the primitive control module. Using the configured start address, the primitive control module, through the vertex-read module, sends read requests with a larger BurstLength to the DDR controller. Data returned by DDR is written into a FIFO, and read requests continue as long as the FIFO is not full. The primitive control module reads the returned data from the FIFO, organizes it according to the command word, and sends it to the drawing pipeline. If the current command is a clear-FIFO command, the primitive control module sends a clear signal to the FIFO and the data in it is discarded, ensuring that the next draw does not read wrong vertices.
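The data flow of Fig. 1 can be approximated with a toy behavioral model in Python. It assumes vertices are list items indexed from the configured start, a FIFO of hypothetical depth 8, and bursts of 4 vertices; all names are hypothetical. The reader refills the FIFO only when a whole burst fits, the primitive control groups vertices two per line and three per triangle, and `clear()` stands in for the clear-FIFO command. It is a sketch of the behavior, not the patent's hardware.

```python
from collections import deque

VERTS_PER_PRIM = {"line": 2, "triangle": 3}


class VertexFetcher:
    """Toy model of the vertex-read module and its FIFO (sizes are assumptions)."""

    def __init__(self, memory, fifo_depth=8, burst_len=4):
        self.memory = memory      # vertices stored back to back in "DDR"
        self.fifo = deque()
        self.fifo_depth = fifo_depth
        self.burst_len = burst_len
        self.addr = 0             # index of the next vertex to request

    def configure_start(self, start_index):
        # Models the CPU writing the vertex start address over PCI.
        self.addr = start_index

    def fill(self):
        # Issue bursts at increasing addresses while a whole burst still fits.
        while (len(self.fifo) + self.burst_len <= self.fifo_depth
               and self.addr < len(self.memory)):
            burst = self.memory[self.addr:self.addr + self.burst_len]
            self.fifo.extend(burst)
            self.addr += len(burst)

    def clear(self):
        # Clear-FIFO command: discard prefetched vertices.
        self.fifo.clear()


def draw(fetcher, prim, vertex_count):
    """Primitive control: pull vertex_count vertices and group them."""
    n = VERTS_PER_PRIM[prim]
    if vertex_count % n:
        raise ValueError("vertex count must be a multiple of vertices per primitive")
    verts = []
    for _ in range(vertex_count):
        fetcher.fill()
        if not fetcher.fifo:
            raise RuntimeError("vertex buffer exhausted")
        verts.append(fetcher.fifo.popleft())
    return [tuple(verts[i:i + n]) for i in range(0, vertex_count, n)]


f = VertexFetcher(list(range(20)))
print(draw(f, "triangle", 6))  # [(0, 1, 2), (3, 4, 5)]
f.clear()
f.configure_start(12)
print(draw(f, "line", 4))      # [(12, 13), (14, 15)]
```

After the triangle draw, vertices 6 to 11 are still prefetched in the FIFO. Without `clear()`, a following draw that starts at a new address would consume those stale vertices first, which is exactly what the clear-FIFO command prevents.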

Claims (2)

1. A method for quickly reading vertices in a GPU, characterized in that: vertex data is stored in order; for a line segment, each vertex is stored in the order X, Y, Z, W, R, G, B, A (respectively the vertex's horizontal coordinate, vertical coordinate, depth coordinate and homogeneous-coordinate coefficient, and the red, green, blue and alpha color components); for a triangle, each vertex is stored in the order X, Y, Z, W, R, G, B, A, S, T (respectively the vertex's horizontal coordinate, vertical coordinate, depth coordinate and homogeneous-coordinate coefficient, the red, green, blue and alpha color components, and the texture horizontal and vertical coordinates), 0, 0; meanwhile, the CPU configures the start address of the vertex data in the primitive control module over the PCI bus; the CPU transmits commands to the command parsing module over the PCI bus; the command parsing module reads command words through an asynchronous FIFO and, if the current command is valid (a draw command or a clear-FIFO command), decodes the fields of the command word to obtain the primitive type and vertex count and sends them to the primitive control module; and if a rendering parameter must change during drawing (for example, a texture-address switch or a matrix switch), or when all draw commands of the current frame have been sent, the command parsing module must send a clear-FIFO command.
2. The method according to claim 1, wherein, using the vertex start address configured over the PCI bus, the primitive control module sends the start address to the vertex-read module; because all vertex data is stored sequentially, read requests are sent to the DDR controller with a larger BurstLength (the burst count, i.e. several vertices can be returned back to back); as long as the vertex-read FIFO is not full, read requests continue at increasing addresses, and the fetched vertex data is sent to the primitive control module; the primitive control module assembles the data into the corresponding primitives as the command requires and sends them to the graphics module; and if the command received is a clear-FIFO command, the primitive control module empties the FIFO holding the fetched DDR data, ensuring that the next command does not fetch wrong vertices.
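The rule in claim 1 for when a clear-FIFO command is needed can be sketched from the software side. The `DrawCall` fields, the tuple command format, and treating a texture-address or matrix-id change as "render parameter changed" are assumptions chosen for illustration. The parser simply forwards the two valid command kinds and drops anything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCall:
    primitive: str      # "line" or "triangle"
    vertex_count: int
    texture_addr: int   # render state: a change forces a FIFO flush
    matrix_id: int      # render state: a change forces a FIFO flush


def build_command_stream(draws):
    """Emit DRAW commands, inserting CLEAR_FIFO when the render state changes
    between draws and once after the last draw of the frame."""
    stream = []
    prev_state = None
    for d in draws:
        state = (d.texture_addr, d.matrix_id)
        if prev_state is not None and state != prev_state:
            stream.append(("CLEAR_FIFO",))
        stream.append(("DRAW", d.primitive, d.vertex_count))
        prev_state = state
    if draws:
        stream.append(("CLEAR_FIFO",))  # all draws of this frame have been sent
    return stream


VALID_COMMANDS = {"DRAW", "CLEAR_FIFO"}


def parse_commands(stream):
    """Command parsing side: forward only valid commands to primitive control."""
    return [cmd for cmd in stream if cmd and cmd[0] in VALID_COMMANDS]


frame = [
    DrawCall("triangle", 3, texture_addr=0x100, matrix_id=0),
    DrawCall("triangle", 6, texture_addr=0x100, matrix_id=0),
    DrawCall("line", 2, texture_addr=0x200, matrix_id=0),
]
for cmd in parse_commands(build_command_stream(frame)):
    print(cmd)
```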
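The read-request side of claim 2 can be sketched as an address generator with FIFO backpressure. Byte addresses, a fixed burst size, and a free-space counter for the FIFO are assumptions, and the function name is hypothetical. It stops issuing as soon as the next burst would not fit, mirroring the rule that reading continues only while the FIFO is not full.

```python
def issue_read_requests(start_addr, total_bytes, burst_bytes, fifo_free_bytes):
    """Return the (address, length) burst requests issued at increasing
    addresses, plus the next address to request once the FIFO drains."""
    requests = []
    addr = start_addr
    end = start_addr + total_bytes
    while addr < end and fifo_free_bytes > 0:
        length = min(burst_bytes, end - addr)  # last burst may be shorter
        if length > fifo_free_bytes:
            break  # FIFO cannot take this burst: stall until it is drained
        requests.append((addr, length))
        fifo_free_bytes -= length
        addr += length
    return requests, addr


reqs, next_addr = issue_read_requests(0x1000, 480, 128, 512)
print([(hex(a), n) for a, n in reqs])  # [('0x1000', 128), ('0x1080', 128), ('0x1100', 128), ('0x1180', 96)]
print(hex(next_addr))                  # 0x11e0
```

Because the vertices are contiguous, the generator needs only a running address; the per-component address and stride bookkeeping of the traditional scheme disappears.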

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210287997.4A CN102819819B (en) 2012-08-14 2012-08-14 Method for quickly reading vertices in a GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210287997.4A CN102819819B (en) 2012-08-14 2012-08-14 Method for quickly reading vertices in a GPU

Publications (2)

Publication Number Publication Date
CN102819819A true CN102819819A (en) 2012-12-12
CN102819819B CN102819819B (en) 2015-09-16

Family

ID=47303926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210287997.4A Active CN102819819B (en) 2012-08-14 2012-08-14 Method for quickly reading vertices in a GPU

Country Status (1)

Country Link
CN (1) CN102819819B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559078A (en) * 2013-11-08 2014-02-05 Huawei Technologies Co., Ltd. GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
CN108520489A (en) * 2018-04-12 2018-09-11 Changsha Jingmei Integrated Circuit Design Co., Ltd. Device and method for realizing command analysis and vertex acquisition parallel in GPU
CN111915475A (en) * 2020-07-10 2020-11-10 Changsha Jingjia Microelectronics Co., Ltd. Drawing command processing method, GPU, host, terminal and medium

Citations (3)

Publication number Priority date Publication date Assignee Title
US6018353A (en) * 1995-08-04 2000-01-25 Sun Microsystems, Inc. Three-dimensional graphics accelerator with an improved vertex buffer for more efficient vertex processing
CN1702692A (en) * 2004-05-03 2005-11-30 Microsoft Corporation System and method for providing an enhanced graphics pipeline
CN102096897A (en) * 2011-03-17 2011-06-15 Changsha Jingjia Microelectronics Co., Ltd. Realization of tile cache strategy in graphics processing unit (GPU) based on tile based rendering

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN103559078A (en) * 2013-11-08 2014-02-05 Huawei Technologies Co., Ltd. GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
WO2015067043A1 (en) * 2013-11-08 2015-05-14 Huawei Technologies Co., Ltd. Gpu virtualization realization method as well as vertex data caching method and related device
CN103559078B (en) * 2013-11-08 2017-04-26 Huawei Technologies Co., Ltd. GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
CN108520489A (en) * 2018-04-12 2018-09-11 Changsha Jingmei Integrated Circuit Design Co., Ltd. Device and method for realizing command analysis and vertex acquisition parallel in GPU
CN108520489B (en) * 2018-04-12 2022-12-06 Changsha Jingmei Integrated Circuit Design Co., Ltd. Device and method for realizing command analysis and vertex acquisition parallel in GPU
CN111915475A (en) * 2020-07-10 2020-11-10 Changsha Jingjia Microelectronics Co., Ltd. Drawing command processing method, GPU, host, terminal and medium
WO2022007206A1 (en) * 2020-07-10 2022-01-13 Changsha Jingjia Microelectronics Co., Ltd. Drawing command processing method, gpu, main device, terminal, and medium
CN111915475B (en) * 2020-07-10 2024-04-05 Changsha Jingjia Microelectronics Co., Ltd. Processing method of drawing command, GPU, host, terminal and medium

Also Published As

Publication number Publication date
CN102819819B (en) 2015-09-16

Similar Documents

Publication Publication Date Title
CN105630441B (en) A kind of GPU system based on unified staining technique
CN101639929B (en) Graphics processing systems
KR20170103649A (en) Method and apparatus for accessing texture data using buffers
CN1270278C (en) Z-buffer technology for figure heightening
US9760968B2 (en) Reduction of graphical processing through coverage testing
CN105741237B (en) A kind of hardware implementation method based on FPGA Image Reversal
CN103793893A (en) Primitive re-ordering between world-space and screen-space pipelines with buffer limited processing
KR101683556B1 (en) Apparatus and method for tile-based rendering
CN108958800A (en) A kind of DDR management control system accelerated based on FPGA hardware
US10769837B2 (en) Apparatus and method for performing tile-based rendering using prefetched graphics data
CN103077132B (en) A kind of cache handles method and protocol processor high-speed cache control module
CN101958112B (en) Method for realizing rotation of handheld device screen pictures by 90 degrees and 270 degrees simultaneously
CN103380417A (en) Techniques to request stored data from a memory
US10198789B2 (en) Out-of-order cache returns
US10430989B2 (en) Multi-pass rendering in a screen space pipeline
CN102314400B (en) Method and device for dispersing converged DMA (Direct Memory Access)
CN102819819A (en) Implementation method for quickly reading vertices in GPU (graphics processing unit)
US9196014B2 (en) Buffer clearing apparatus and method for computer graphics
CN201927324U (en) Color liquid crystal screen display control device based on SPI (single program initiation) serial or parallel interface
US20200013137A1 (en) Fixed-stride draw tables for tiled rendering
CN103838694B (en) FPGA high-speed USB interface data reading method
WO2023202367A1 (en) Graphics processing unit, system, apparatus, device, and method
CN104836973B (en) A kind of high definition LED display video data R-T unit and method of data flow control
CN101216931A (en) 3D graphical display superposition device based on OpenGL
CN104461967B (en) It is a kind of to support synchronous and asynchronous transfer mode parallel data grabbing card

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Rao Xianhong

Inventor before: Jiao Yong

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: JIAO YONG TO: RAO XIANHONG

C14 Grant of patent or utility model
GR01 Patent grant