CN102819819A

CN102819819A - Implementation method for quickly reading vertices in GPU (graphics processing unit)

Info

Publication number: CN102819819A
Application number: CN2012102879974A
Authority: CN
Inventors: 焦永
Original assignee: CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Current assignee: CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Priority date: 2012-08-14
Filing date: 2012-08-14
Publication date: 2012-12-12
Anticipated expiration: 2032-08-14
Also published as: CN102819819B

Abstract

The invention discloses a method for quickly reading graphics primitive vertices in a GPU (graphics processing unit) design. It comprises the following steps: storing vertex data in sequence, configuring a vertex start address, parsing drawing commands, managing graphics primitives, reading primitive vertex data, and clearing residual data. The method makes full use of memory bandwidth, relieves pressure on the bus, and improves the vertex throughput of the GPU chip.

Description

A method for quickly reading vertices in a GPU

Technical field

The present invention relates mainly to the field of GPU design, and in particular to drawing-command parsing and primitive-vertex fetching in a GPU.

Background art

The organization and reading of vertex data is a major issue in a GPU that implements a fixed-function pipeline, and it directly affects drawing efficiency. The conventional approach specifies, in the command words, the primitive type, the number of components, the number of vertices, and the start address and stride of each component, so a single drawing command often needs several (7 or more) command words. This has two drawbacks. (1) The large number of command words puts heavy pressure on the PCI bus: the bus must keep transmitting commands throughout drawing, and because its frequency is limited, the command transfer rate often cannot keep up with the rendering speed. (2) Each component needs its own read request to DDR, based on the start address and stride given in the command word, so bursts are short and DDR read latency is high; reading a single vertex often takes several read requests, and DDR bandwidth cannot be fully used.
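
To make drawback (2) concrete, the sketch below counts DDR read requests under a deliberately simplified model that is an assumption, not taken from the patent: in the conventional scheme each attribute (position, color, texture coordinate) has its own start address and stride, so every vertex costs one request per attribute; with a sequential layout the whole vertex stream is one contiguous region fetched in fixed-size bursts. The 64-word burst size and the function names are hypothetical.

```python
import math


def strided_requests(num_vertices: int, num_attributes: int) -> int:
    """Simplified conventional scheme: each attribute has its own start
    address and stride, so every vertex costs one read per attribute."""
    return num_vertices * num_attributes


def sequential_requests(num_vertices: int, words_per_vertex: int,
                        burst_words: int) -> int:
    """Sequential layout: the vertex stream is one contiguous region,
    fetched in bursts of burst_words 32-bit words."""
    total_words = num_vertices * words_per_vertex
    return math.ceil(total_words / burst_words)


# 300 triangle vertices with position, color and texture coordinate
# (3 attributes, 12 words per vertex once padded); 64-word bursts are hypothetical.
print(strided_requests(300, 3))            # 900
print(sequential_requests(300, 12, 64))    # 57
```

The absolute numbers depend entirely on the assumed burst size; the point is only that request count scales with vertices times attributes in one case and with total bytes divided by burst size in the other.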

Summary of the invention

The problem to be solved: in view of the shortcomings of the prior art, the present invention provides a structure for quickly fetching vertices in a GPU. By organizing vertex data sequentially and reading it with larger bursts, the structure makes full use of memory bandwidth and greatly improves vertex-fetch efficiency; it also reduces the number of command words.

The method of the present invention requires vertex data to be stored in a fixed order. If the current primitive is a line segment, the data of each vertex is stored contiguously in the order X, Y, Z, W, R, G, B, A. All values are 32-bit single-precision floating-point numbers, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate and homogeneous coordinate coefficient, and to the red, green, blue and transparency color components. If the current primitive is a triangle, the data of each vertex is stored in the order X, Y, Z, W, R, G, B, A, S, T, 0, 0. Again all values are 32-bit single-precision floating-point numbers: the first eight have the same meaning as for a line segment, S and T are the horizontal and vertical texture coordinates, and the two appended zeros pad the vertex to a multiple of 128 bits, which suits the wide data bus of the DDR memory. (A line-segment vertex is 8 × 32 = 256 bits and a padded triangle vertex is 12 × 32 = 384 bits, i.e. two and three 128-bit words respectively.)

At the same time, the CPU configures the initial storage address of the vertex data into the primitive control module over the PCI bus. Once the relevant register of the primitive management module has been configured, the primitive control module can fetch vertex data contiguously from DDR, using this address as the start address.

The CPU then sends commands to the command parsing module over the PCI bus. The command parsing module reads command words through an asynchronous FIFO. If the current command is a valid command (a primitive drawing command or a flush-FIFO command), it decodes the fields of the command word to obtain the primitive type and the number of vertices, and sends this information to the primitive control module. If a rendering parameter must be changed during drawing (for example, a texture address switch or a transformation matrix switch), or if all drawing commands for the current frame have been sent, software must issue a flush-FIFO command.

After receiving the start address configured by software, the primitive control module passes it to the vertex-read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a large BurstLength (burst length, i.e. several vertices can be returned consecutively). As long as the vertex FIFO is not full, the vertex-read module keeps issuing read requests at increasing addresses and forwards the vertex data it obtains to the primitive control module. The primitive control module assembles this data into the primitives requested by the command and sends them to the graphics module. If the command just received is a flush-FIFO command, the primitive control module uses it to empty the FIFO that holds the data returned from DDR, so that the next command does not fetch wrong vertices.
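
The fixed vertex layout described above can be sketched with Python's standard `struct` module. This is an illustrative sketch, not the patent's implementation: the little-endian byte order, the dictionary input and the helper names `pack_vertex` and `pack_stream` are assumptions. It checks that both formats land on 128-bit boundaries and that vertices sit back to back, so each vertex starts at a fixed offset from the configured start address.

```python
import struct

# Component order from the description; every value is a 32-bit float.
LINE_COMPONENTS = ("X", "Y", "Z", "W", "R", "G", "B", "A")
TRIANGLE_COMPONENTS = LINE_COMPONENTS + ("S", "T")
ALIGN_BYTES = 16  # 128 bits


def pack_vertex(vertex: dict, primitive: str) -> bytes:
    """Pack one vertex in the fixed order (little-endian is an assumption)."""
    if primitive == "line":
        values = [float(vertex[c]) for c in LINE_COMPONENTS]
    elif primitive == "triangle":
        # Two zero words pad the triangle vertex from 40 to 48 bytes.
        values = [float(vertex[c]) for c in TRIANGLE_COMPONENTS] + [0.0, 0.0]
    else:
        raise ValueError(f"unsupported primitive: {primitive}")
    data = struct.pack(f"<{len(values)}f", *values)
    assert len(data) % ALIGN_BYTES == 0, "vertex is not 128-bit aligned"
    return data


def pack_stream(vertices, primitive: str) -> bytes:
    """Concatenate vertices back to back, as the sequential layout requires."""
    return b"".join(pack_vertex(v, primitive) for v in vertices)


v = dict(X=1, Y=2, Z=0.5, W=1, R=1, G=0, B=0, A=1, S=0.25, T=0.75)
print(len(pack_vertex(v, "line")))         # 32 bytes = 2 x 128 bits
print(len(pack_vertex(v, "triangle")))     # 48 bytes = 3 x 128 bits
stream = pack_stream([v, v, v], "triangle")
print(len(stream))                         # 144
print(struct.unpack_from("<f", stream, 48)[0])  # 1.0, X of the second vertex
```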

The advantages of the present invention are: 1. Full use of memory bandwidth: the proposed structure for fast vertex fetching can issue memory read requests with a large BurstLength, making full use of memory bandwidth. 2. Fewer command words: because vertex data is stored in order, information such as the per-component start address and stride in conventional drawing command words can be omitted, reducing the command words per command from 7 to 8 down to 2.
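
The patent states only that a command shrinks to 2 words and that decoding yields the primitive type and the vertex count; it does not give a bit layout. The sketch below is therefore a purely hypothetical encoding: the opcode values, primitive codes and field positions are all invented for illustration. It shows why two words suffice once per-component start addresses and strides disappear: the start address already lives in a register, and the fixed layout makes strides implicit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical encodings; none of these values come from the patent.
OP_DRAW = 0x1
OP_FLUSH_FIFO = 0x2
PRIM_LINE = 0x1
PRIM_TRIANGLE = 0x2


@dataclass(frozen=True)
class Command:
    opcode: int
    primitive: int
    vertex_count: int


def encode(cmd: Command) -> Tuple[int, int]:
    """Word 0: opcode in bits 15..8, primitive type in bits 7..0.
    Word 1: vertex count."""
    word0 = ((cmd.opcode & 0xFF) << 8) | (cmd.primitive & 0xFF)
    word1 = cmd.vertex_count & 0xFFFFFFFF
    return word0, word1


def decode(word0: int, word1: int) -> Optional[Command]:
    """Return the decoded command, or None if it is not a valid command."""
    opcode = (word0 >> 8) & 0xFF
    if opcode not in (OP_DRAW, OP_FLUSH_FIFO):
        return None
    return Command(opcode, word0 & 0xFF, word1 & 0xFFFFFFFF)


draw = Command(OP_DRAW, PRIM_TRIANGLE, 300)
words = encode(draw)
print([hex(w) for w in words])   # ['0x102', '0x12c']
print(decode(*words) == draw)    # True
print(decode(0x0, 0x0))          # None
```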

Brief description of the drawings

Fig. 1 shows the structure, according to the present invention, for quickly fetching vertices in a GPU.

Embodiment

The present invention is described in further detail below with reference to the accompanying drawing and a specific embodiment.

Figure 1 shows a structure for quickly reading vertices in a GPU. The CPU configures the initial storage address of the primitive vertex data over the PCI bus. All primitive data is stored in a fixed format: a line-segment vertex is X, Y, Z, W, R, G, B, A, and a triangle vertex is X, Y, Z, W, R, G, B, A, S, T, 0, 0. The CPU then sends commands to the command parsing module over the PCI bus. The command parsing module obtains command data by reading the asynchronous FIFO; for a legal command, it decodes the fields of the command word and sends them to the primitive control module. Using the configured start address, the primitive control module, through the vertex-read module, sends requests to the DDR controller with a large BurstLength. Data returned from DDR is written into the FIFO, and read requests continue as long as the FIFO is not full. The primitive control module reads the returned data from the FIFO, organizes it according to the format given by the command word, and sends it to the drawing pipeline. If the current command is a flush-FIFO command, the primitive control module sends a clear signal to the FIFO, emptying it so that the next draw does not read wrong vertices.
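
The flow in this embodiment can be modelled in software to show why the flush step matters. The following is a toy simulation under stated assumptions, not the hardware design: DDR is a flat list of 32-bit words, the FIFO depth (64 words) and burst size (16 words) are hypothetical, the FIFO is topped up between word reads rather than concurrently, and primitives are assumed to be independent lines or triangles (list topology). The names `VertexReader` and `draw` are illustrative.

```python
from collections import deque

WORDS_PER_VERTEX = {"line": 8, "triangle": 12}      # 32-bit words, incl. padding
VERTICES_PER_PRIMITIVE = {"line": 2, "triangle": 3}


class VertexReader:
    """Toy model of the vertex-read module and the FIFO it fills from DDR."""

    def __init__(self, ddr, fifo_capacity, burst_words):
        self.ddr = ddr                  # DDR contents as a flat list of words
        self.fifo = deque()
        self.capacity = fifo_capacity   # FIFO depth in words (hypothetical)
        self.burst = burst_words        # words per burst (hypothetical)
        self.addr = 0
        self.requests = 0

    def set_start_address(self, addr):
        # Start address written by the CPU over the PCI bus.
        self.addr = addr

    def refill(self):
        # Keep issuing burst reads at increasing addresses while the FIFO has room.
        while (len(self.fifo) + self.burst <= self.capacity
               and self.addr < len(self.ddr)):
            chunk = self.ddr[self.addr:self.addr + self.burst]
            self.fifo.extend(chunk)
            self.addr += len(chunk)
            self.requests += 1

    def pop(self, n):
        out = []
        while len(out) < n:
            self.refill()
            if not self.fifo:
                raise RuntimeError("ran past the end of the vertex data")
            out.append(self.fifo.popleft())
        return out

    def flush(self):
        # Flush-FIFO command: drop prefetched words left over from the last draw.
        self.fifo.clear()


def draw(reader, primitive, vertex_count):
    """Primitive control: pull whole vertices and group them into primitives
    (assumes independent lines/triangles, i.e. list topology)."""
    size = WORDS_PER_VERTEX[primitive]
    verts = [reader.pop(size) for _ in range(vertex_count)]
    k = VERTICES_PER_PRIMITIVE[primitive]
    return [verts[i:i + k] for i in range(0, vertex_count, k)]


# Six triangle vertices; vertex i has X = i (other components are filler).
ddr = []
for i in range(6):
    ddr += [float(i), 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

reader = VertexReader(ddr, fifo_capacity=64, burst_words=16)
reader.set_start_address(0)
first = draw(reader, "triangle", 3)
print([v[0] for v in first[0]])    # [0.0, 1.0, 2.0]
print(len(reader.fifo))            # 36 words of prefetched residue

reader.flush()                     # flush-FIFO command, e.g. after a matrix switch
reader.set_start_address(0)
again = draw(reader, "triangle", 3)
print([v[0] for v in again[0]])    # [0.0, 1.0, 2.0]
```

In this model the reader prefetches past the end of the first triangle, leaving 36 words in the FIFO. If `reader.flush()` is skipped, the second draw returns vertices with X = 3.0, 4.0, 5.0 from that residue instead of starting at the re-configured address, which is the wrong-vertex case the flush-FIFO command guards against.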

Claims

1. A method for quickly reading vertices in a GPU, characterized in that: vertex data is stored in order; for a line segment, each vertex is stored in the order X, Y, Z, W, R, G, B, A (respectively the vertex's horizontal coordinate, vertical coordinate, depth coordinate and homogeneous coordinate coefficient, and the red, green, blue and transparency color components); for a triangle, each vertex is stored in the order X, Y, Z, W, R, G, B, A, S, T (respectively the vertex's horizontal coordinate, vertical coordinate, depth coordinate and homogeneous coordinate coefficient, the red, green, blue and transparency color components, and the texture horizontal and vertical coordinates), followed by 0, 0; meanwhile, the CPU configures the initial storage address of the vertex data into the primitive control module over the PCI bus; the CPU transmits commands to the command parsing module over the PCI bus; the command parsing module reads command words through an asynchronous FIFO; if the current command is a valid command (a drawing command or a flush-FIFO command), the fields of the command word are decoded to obtain the primitive type and vertex count, which are sent to the primitive control module; and if a rendering parameter must be changed during drawing (for example, a texture address switch or a matrix switch) or all drawing commands of the current frame have been sent, the command parsing module must send a flush-FIFO command.

2. The method according to claim 1, characterized in that: based on the vertex start address configured over the PCI bus, the primitive control module sends the start address to the vertex-read module; because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a large BurstLength (burst length, i.e. several vertices can be returned consecutively); as long as the vertex FIFO is not full, read requests continue to be issued at increasing addresses, while the vertex data obtained is sent to the primitive control module; the primitive control module assembles this data into the primitives required by the command and sends them to the graphics module; and if the command currently received is a flush-FIFO command, the primitive control module empties the FIFO holding the data fetched from DDR, ensuring that the next command does not fetch wrong vertices.