EP1269418A1 - Tiled graphics architecture - Google Patents

Tiled graphics architecture

Info

Publication number
EP1269418A1
Authority
EP
European Patent Office
Prior art keywords
bin
graphics
data
primitive
vertex
Legal status
Withdrawn
Application number
EP01930417A
Other languages
German (de)
French (fr)
Inventor
Hsien-Cheng Hsieh
Current Assignee
Intel Corp
Original Assignee
Intel Corp
Application filed by Intel Corp
Publication of EP1269418A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management

Definitions

  • a microprocessor reads vertex data for a graphics primitive from graphics memory.
  • the processor determines which bins the graphics primitive intersects. All vertices of the primitive are written into a vertex buffer for future reference.
  • the vertex buffer may reside in either main system memory or local graphics memory.
  • the vertex buffer may be implemented as part of the bin storage area or in a separate memory location.
  • if the processor determines that the graphics primitive intersects a first and a second bin, the processor writes a pointer to the first and second bin storage areas.
  • the pointer indicates the location in memory of the actual vertex data. Thus, only one copy of the vertex data is moved from the processor to the graphics memory. Because the pointer is smaller in size than the vertex data, less data is moved from the processor to the graphics memory and memory bandwidth utilization is improved.
  • the microprocessor in the above example and in the example embodiments that follow may be replaced by a 3D graphics processor that handles the same primitive processing as performed by the microprocessor.
  • an additional embodiment may include a 3D graphics processor that performs transformation and lighting calculations in hardware.
  • the graphics memory in the above example and in the example embodiments that follow may be included as part of a main system memory or may be implemented as a local graphics memory directly coupled to a graphics controller.
  • the term "pointer" as used herein is meant to include any means of at least partially indicating the location of vertex data, including a memory address and also including an index.
  • the pointer may be a physical or virtual memory address indicating the location of the vertex data.
  • the pointer may be an index that can be used to calculate the address location of the vertex data.
  • an address may be calculated from an index according to the equation "base address + index * vertex data size".
  • an address is assumed to be 32 bits wide
  • an index is assumed to be 16 bits wide
  • vertex data for a triangular graphics primitive is assumed to be approximately 100 bytes long.
  • Other embodiments are possible using a wide range of address, index, and data sizes and lengths.
  • Figure 4 is a flow diagram of an embodiment of a method for improving memory bandwidth utilization in a tiled graphics architecture.
  • a determination is made as to whether a graphics primitive intersects a first and a second bin. If the graphics primitive is found to intersect the first and the second bin, then at block 420 data for a plurality of vertices corresponding to the graphics primitive are written to a first bin storage area located in a memory device.
  • the memory device may comprise the main system memory or may comprise a local graphics memory coupled directly to a graphics controller.
  • a plurality of pointers are written to a second bin storage area located in the graphics memory. The plurality of pointers indicate the memory locations for the data for the plurality of vertices.
  • by writing pointers to the second bin storage area instead of writing the vertex data, less data is moved from the processor to the graphics memory and memory bandwidth utilization is improved.
  • the pointers will be fetched by the graphics controller along with any other second bin primitive data.
  • the graphics controller will use the pointers to fetch the vertex data from the first bin storage area.
  • FIG. 5 is a flow diagram of an embodiment of a method for improving memory bandwidth utilization in a tiled graphics architecture in a computer system where the graphics memory is implemented as an area within main system memory and where the graphics controller includes a vertex cache.
  • the vertex cache provides temporary storage for vertex data and allows for an improvement in system memory to graphics controller memory bandwidth utilization by reducing the amount of vertex data moved between the graphics memory located in main system memory and the graphics controller.
  • a processor fetches vertex data for a graphics primitive from system memory and at block 510 the processor performs calculations on the vertex data.
  • the vertex data for the graphics primitive includes data for three vertices, although other embodiments are possible where the vertex data for the graphics primitive may include data for any number of vertices.
  • the calculations described as part of this embodiment are meant to represent a broad range of well-known techniques for manipulating graphics primitive data.
  • the processor determines whether the graphics primitive intersects a first bin, and, assuming there is an intersection, the processor writes the vertex data for the graphics primitive to a first bin storage area in system memory.
  • the processor determines whether the graphics primitive intersects a second bin. If the graphics primitive is found to intersect the second bin, then at block 525 the processor writes three pointers to a second bin storage area in system memory. The pointers indicate the memory locations of the three vertices that were previously written to system memory.
  • the processor determines whether the graphics primitive intersects a third bin. If the graphics primitive is found to intersect the third bin, then at block 535 the processor writes three pointers to a third bin storage area in system memory. The pointers indicate the memory locations of the three vertices that were previously written to system memory.
  • the processor determines whether the graphics primitive intersects a fourth bin. If the graphics primitive is found to intersect the fourth bin, then at block 545 the processor writes three pointers to a fourth bin storage area in system memory. The pointers indicate the memory locations of the three vertices that were previously written to system memory.
  • although this embodiment describes a graphics primitive possibly intersecting four bins, in other embodiments the graphics primitive may intersect two or more bins.
  • a bin may have the dimensions of 128 pixels by 64 pixels, although other bin dimensions are possible.
  • the bin intersection determination may be performed in a parallel manner instead of the serial approach described above. For example, a bounding box of the primitive may be used to find all bins that the primitive intersects at once.
  • blocks 505 through 545 may be repeated until all primitives have been sorted into bins.
  • the graphics controller fetches data from the first bin storage area.
  • the data fetched from the first bin storage area and the vertex buffer includes the vertex data for the graphics primitive that was previously written to the system memory at block 515.
  • the graphics controller stores the fetched vertex data in the vertex cache.
  • the vertex cache includes 16 entries which are 4-way interleaved with each entry capable of storing 32 bytes of vertex data. Other embodiments are possible with different numbers of entries and numbers of ways and further with each entry capable of storing different amounts of vertex data.
  • after the graphics controller fetches the first bin data and stores the vertex data in the vertex cache, the graphics controller renders the first bin primitives at block 560. As part of the rendering process, the graphics controller determines which portion of each graphics primitive included in the first bin data falls within the first bin and renders only that portion of the primitive.
  • the graphics controller proceeds to process the second bin.
  • the graphics controller fetches data from the second bin storage area at block 565.
  • the data fetched from the second bin storage area includes pointers to the vertex data for the graphics primitive, assuming that an intersection with the second bin was found at block 520.
  • the graphics controller uses the pointers to access the vertex data that was previously stored in the vertex cache at block 555. Once the graphics controller has accessed the vertex data, it renders the second bin primitives at block 575.
  • the above embodiment may be generalized so that, based on certain heuristics, the second bin is rendered first, followed by the third, first, and fourth bins. Such reordering allows overall system performance to be optimized. For example, load balancing can be used to even out the loads on front-end and back-end processing in the graphics processor.
  • Figure 6 is a flow diagram of an embodiment of a method for improving memory bandwidth utilization in a tiled graphics architecture in a computer system where the graphics memory is implemented as a local graphics memory directly coupled to a graphics controller.
  • the local graphics memory provides storage for vertex data and allows for an improvement in system memory to graphics controller bandwidth utilization by reducing the amount of vertex data moved between main system memory and the graphics controller.
  • a processor fetches vertex data for a graphics primitive from local graphics memory or, alternatively, from system memory and at block 610 the processor performs calculations on the vertex data.
  • the vertex data for the graphics primitive includes data for three vertices, although other embodiments are possible where the vertex data for the graphics primitive may include data for any number of vertices.
  • the calculations described as part of this embodiment are meant to represent a broad range of well-known techniques for manipulating graphics primitive data.
  • the processor determines whether the graphics primitive intersects a first bin, and, assuming there is an intersection, the processor writes the vertex data for the graphics primitive to a first bin storage area in the local graphics memory.
  • the processor determines whether the graphics primitive intersects a second bin. If the graphics primitive is found to intersect the second bin, then at block 625 the processor writes three pointers to a second bin storage area in local graphics memory. The pointers indicate the memory locations of the three vertices that were previously written to local graphics memory.
  • the processor determines whether the graphics primitive intersects a third bin. If the graphics primitive is found to intersect the third bin, then at block 635 the processor writes three pointers to a third bin storage area in local graphics memory. The pointers indicate the memory locations of the three vertices that were previously written to local graphics memory.
  • the processor determines whether the graphics primitive intersects a fourth bin. If the graphics primitive is found to intersect the fourth bin, then at block 645 the processor writes three pointers to a fourth bin storage area in local graphics memory. The pointers indicate the memory locations of the three vertices that were previously written to local graphics memory.
  • a bin may have the dimensions of 128 pixels by 64 pixels, although other bin dimensions are possible.
  • the bin intersection determination may be performed in a parallel manner instead of the serial approach described above. For example, a bounding box of the primitive may be used to find all bins that the primitive intersects at once. As indicated by block 647, blocks 605 through 645 may be repeated until all primitives have been sorted into bins.
  • the graphics controller fetches data from the first bin storage area.
  • the data fetched from the first bin storage area includes the vertex data for the graphics primitive that was previously written to the local graphics memory at block 615.
  • the graphics controller renders the first bin primitives at block 660.
  • the graphics controller determines which portion of each graphics primitive included in the first bin data falls within the first bin and renders only that portion of the primitive.
  • the graphics controller proceeds to process the second bin.
  • the graphics controller fetches data from the second bin storage area at block 665.
  • the data fetched from the second bin storage area includes pointers to the vertex data for the graphics primitive, assuming that an intersection with the second bin was found at block 620.
  • the graphics controller uses the pointers to access the vertex data that was previously stored in the local graphics memory at block 615. Once the graphics controller has accessed the vertex data, it renders the second bin primitives at block 675.
  • the above method may be generalized so that, based on certain heuristics, the second bin is rendered first, followed by the third, first, and fourth bins. Such reordering allows overall system performance to be optimized. For example, load balancing can be used to even out the loads on front-end and back-end processing in the graphics processor.
  • Figure 7 is a block diagram of a computer system including a graphics controller 740 that includes a vertex cache 742.
  • the computer system of Figure 7 also includes a processor 710 coupled to a system logic device 720 via a processor bus 715.
  • the system logic device 720 provides communication between the processor 710 and a system memory 730.
  • the system memory 730 includes a graphics primitive storage area 732.
  • the graphics primitive storage area 732 may be separated into storage areas for a plurality of bins.
  • the system logic device 720 also serves to couple the graphics controller 740 to the processor 710 and the system memory 730.
  • the system of Figure 7 also includes a display monitor 750 coupled to the graphics controller 740.
  • the system of Figure 7 may be used with embodiments of methods for improving memory bandwidth utilization such as those discussed above in connection with Figures 4 and 5.
  • the processor 710 may read vertex data for a graphics primitive from the graphics primitive storage area 732. The processor 710 may then determine which bins the graphics primitive intersects. The processor 710 then writes the vertex data to a first bin storage area within the graphics primitive storage area 732. If the graphics primitive is found to intersect other bins, then the processor 710 writes pointers to other bin storage areas within the graphics primitive storage area 732. The pointers indicate the location within the first bin storage area where the vertex data is stored.
  • the pointers in this example include a 16 bit index from which the memory location of the vertex data may be calculated. Other embodiments are possible where the pointers include a 32 bit address identifying the storage locations of the vertex data. Still other embodiments are possible using different length indices and/or addresses.
  • the graphics controller 740 desires to process the first bin.
  • the graphics controller 740 fetches the first bin data from the graphics primitive storage area 732.
  • the graphics controller 740 stores the vertex data for the graphics primitive in the vertex cache 742.
  • the graphics controller 740 then renders the first bin, including the portion of the graphics primitive that falls within the first bin.
  • a bin has the dimensions of 128 x 64 pixels.
  • the vertex cache 742 in this example includes 16 entries that are 4-way set-associative, with each entry capable of storing 32 bytes of vertex data.
  • the graphics primitive of this example is represented by three vertices where each vertex is defined by 32 bytes of data. Other embodiments are possible using other bin dimensions and/or other cache arrangements.
  • the graphics controller 740 fetches the data for the second bin from the graphics primitive storage area 732.
  • the data for the second bin will include pointers to the vertex data for the graphics primitive assuming that the processor 710 previously determined that the graphics primitive intersects the second bin.
  • the graphics controller 740 then uses the pointers to access the vertex data stored in the vertex cache 742.
  • the vertex cache 742 serves to improve memory bandwidth utilization by eliminating the necessity for the vertex data to be fetched from the graphics primitive storage area 732 when a copy of the vertex data is stored in the vertex cache 742, as is the case in this example.
  • the graphics controller 740 can render the second bin. Subsequent bins can be processed in a similar manner until all bins have been rendered.
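The vertex-cache behavior described for Figures 5 and 7 can be modeled in a few lines. This is a behavioral sketch under assumptions the text does not state: each entry holds one vertex (32 bytes, matching the Figure 7 example), the set is chosen by the vertex index modulo the number of sets (16 entries / 4 ways = 4 sets), and replacement within a set is least-recently-used. `VertexCache` and `fetch_bin_vertices` are hypothetical names, and `memory` stands in for the graphics primitive storage area.

```python
from collections import OrderedDict


class VertexCache:
    """Hypothetical model of a set-associative vertex cache.

    Defaults follow the example in the text: 16 entries, 4 ways (hence 4 sets).
    Each entry holds one vertex; LRU replacement within a set is an assumption.
    """

    def __init__(self, entries: int = 16, ways: int = 4):
        if entries % ways:
            raise ValueError("entries must be a multiple of ways")
        self.ways = ways
        self.num_sets = entries // ways
        # One OrderedDict per set: keys are vertex indices, order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def read(self, index: int, memory: list):
        """Return vertex `index`, fetching it from `memory` only on a miss."""
        cache_set = self.sets[index % self.num_sets]
        if index in cache_set:
            self.hits += 1
            cache_set.move_to_end(index)   # mark as most recently used
            return cache_set[index]
        self.misses += 1
        data = memory[index]               # the memory access a hit avoids
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used entry
        cache_set[index] = data
        return data


def fetch_bin_vertices(bin_pointers, memory, cache):
    """Resolve each primitive's vertex pointers (indices) for one bin through the cache."""
    return [[cache.read(i, memory) for i in prim] for prim in bin_pointers]


# Two bins whose storage areas both point at the same triangle (vertices 0, 1, 2).
memory = [f"vertex{i}" for i in range(32)]
bins = {"first": [(0, 1, 2)], "second": [(0, 1, 2)]}
cache = VertexCache()
for name in ("first", "second"):
    fetch_bin_vertices(bins[name], memory, cache)
print(cache.misses, cache.hits)  # 3 3
```

When the second bin holds pointers to vertices already fetched for the first bin, those reads hit in the cache and generate no further memory traffic, which is the saving attributed to vertex cache 742.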

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

A method and apparatus for reducing memory bandwidth utilization in a tiled graphics architecture is disclosed. In one embodiment, a microprocessor reads vertex data for a graphics primitive from graphics memory. The processor determines which bins the graphics primitive intersects. Assuming that the processor determines that the graphics primitive intersects a first and a second bin, the processor writes the vertex data for the graphics primitive to a first bin storage area in graphics memory. The processor then writes a pointer to a second bin storage area. The pointer indicates the location in memory of the actual vertex data.

Description

TILED GRAPHICS ARCHITECTURE
Field Of The Invention
The present invention pertains to the field of computer systems. More particularly, this invention pertains to the field of reducing primitive storage requirements and improving memory bandwidth utilization in a tiled graphics architecture.
Background of the Invention
In typical computer graphics systems, a three dimensional (3D) object to be represented on the display screen is composed of graphics primitives such as triangle lists, triangle strips and triangle fans. Typically, the primitives of a 3D object to be rendered are defined by a host computer in terms of primitive data. For example, for each triangle in a primitive, the host computer may define each of the triangle's three vertices by its spatial location in X, Y and Z coordinates, by its red, green and blue (R, G and B) color values, and by its texture coordinates. Additional primitive data may be used in specific applications. Rendering hardware within a graphics controller interpolates the primitive data to compute the display screen pixels that represent each primitive, and the R, G and B color values for each pixel.
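To make the per-vertex data concrete, the sketch below packs one triangle's vertices into bytes. The layout is purely an assumption for illustration: eight 32-bit floats per vertex (X, Y, Z, R, G, B and two texture coordinates U, V). That choice happens to give the 32 bytes per vertex, and roughly 100 bytes per triangle, used in the examples elsewhere in this document. The names `Vertex`, `VERTEX_FORMAT` and `pack_triangle` are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical vertex layout (not specified by the patent): eight little-endian
# 32-bit floats per vertex -> X, Y, Z position, R, G, B color, U, V texture coords.
VERTEX_FORMAT = struct.Struct("<8f")


@dataclass
class Vertex:
    x: float
    y: float
    z: float
    r: float
    g: float
    b: float
    u: float
    v: float

    def pack(self) -> bytes:
        """Serialize this vertex into VERTEX_FORMAT.size (32) bytes."""
        return VERTEX_FORMAT.pack(self.x, self.y, self.z,
                                  self.r, self.g, self.b, self.u, self.v)

    @classmethod
    def unpack(cls, data: bytes) -> "Vertex":
        """Rebuild a vertex from exactly VERTEX_FORMAT.size bytes."""
        return cls(*VERTEX_FORMAT.unpack(data))


def pack_triangle(vertices: list) -> bytes:
    """Concatenate the three vertices of one triangle primitive."""
    if len(vertices) != 3:
        raise ValueError("a triangle has exactly three vertices")
    return b"".join(v.pack() for v in vertices)


triangle = [
    Vertex(0.0, 0.0, 0.5, 1.0, 0.0, 0.0, 0.0, 0.0),
    Vertex(10.0, 0.0, 0.5, 0.0, 1.0, 0.0, 1.0, 0.0),
    Vertex(0.0, 10.0, 0.5, 0.0, 0.0, 1.0, 0.0, 1.0),
]
print(VERTEX_FORMAT.size, len(pack_triangle(triangle)))  # 32 96
```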
In order to make more efficient use of memory bandwidth, graphics primitives are sorted into bins, also referred to as "tiles". This well-known technique is often referred to as "tiling".
Figure 1 and Figure 2 show an example of sorting graphics primitives into bins, or tiles. For this example, a microprocessor fetches data for primitives 110, 120, and 130 from a primitive storage area. The primitive storage area may be implemented as a portion of the main system memory or may be implemented as a local graphics memory directly coupled to the graphics controller. The primitives 110, 120, and 130 are eventually to be rendered and then displayed on a display screen, represented by block 100. The block 100 is divided into four bins for this example. Typically, a frame of display data is divided into many more bins than the four shown in this example, with a typical bin dimension of 128 x 64 pixels. Four bins are used in this example in order to simplify the discussion.
After fetching data for a graphics primitive, the processor determines which bin or "tile" the primitive intersects. For example, the processor can determine that primitive 110 intersects bin 210 as well as bin 220. The processor then writes the data for the three vertices of primitive 110 to an area of graphics memory set aside to store primitive data for bin 210 as well as to an area of graphics memory set aside to store primitive data for bin 220. Similarly, the processor writes vertex data for primitive 120 to storage areas for bins 220 and 240 and writes vertex data for primitive 130 to storage areas for bins 210, 230, and 240. Once the primitives have been sorted into bins, the graphics controller fetches primitive data from the graphics memory and renders the primitives one bin at a time.
Figure 2 demonstrates how the graphics controller divides the primitives 110, 120, and 130 into various primitives that fit into bins 210, 220, 230, and 240. The various primitives are divided into bins according to how the primitives intersect the bin boundaries. For example, when the primitive data for bin 210 is fetched from graphics memory, the graphics controller divides primitive 110 to create primitive 211. Primitive 130 is divided to create primitive 212. The graphics controller then proceeds to render primitives 211 and 212. The graphics controller then proceeds to process bin 220 by dividing primitives 110 and 120 to create primitives 221 and 222 and by rendering the primitives 221 and 222. The graphics controller continues in a similar fashion to process bins 230 and 240.
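The sorting step described above can be sketched as follows. How the processor decides which bins a primitive intersects is not specified at this point; this sketch assumes the bounding-box test that the document mentions for later embodiments. That test is conservative: it can report a bin that the bounding box overlaps but the triangle itself does not. Vertices are assumed to be screen-space (X, Y) pairs, the 128 x 64 bin size is taken from the text, and `bins_touched` and `sort_by_copying` are hypothetical names. Each intersected bin receives its own full copy of the vertex data, which is the prior-art behavior whose bandwidth cost is discussed below.

```python
from collections import defaultdict

BIN_W, BIN_H = 128, 64  # typical bin dimension from the text


def bins_touched(vertices, screen_w, screen_h):
    """Return the (column, row) of every bin overlapped by the primitive's bounding box.

    Conservative: a bin the bounding box overlaps is reported even if the
    triangle itself does not cover it.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Clamp the bounding box to the screen.
    x0, x1 = max(0, min(xs)), min(screen_w - 1, max(xs))
    y0, y1 = max(0, min(ys)), min(screen_h - 1, max(ys))
    if x0 > x1 or y0 > y1:
        return []  # entirely off screen
    return [(col, row)
            for row in range(int(y0) // BIN_H, int(y1) // BIN_H + 1)
            for col in range(int(x0) // BIN_W, int(x1) // BIN_W + 1)]


def sort_by_copying(primitives, screen_w, screen_h):
    """Prior-art binning: every intersected bin receives its own full copy of the vertices."""
    bins = defaultdict(list)
    for prim in primitives:
        for b in bins_touched(prim, screen_w, screen_h):
            bins[b].append(list(prim))  # a separate copy per bin
    return bins


# A 256 x 128 screen split into 2 x 2 bins, like block 100 in Figure 1.
wide = [(100, 10), (200, 10), (150, 50)]
print(bins_touched(wide, 256, 128))  # [(0, 0), (1, 0)]
```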
Figure 3 is a block diagram of a prior computer system that implements a tiled graphics architecture. Figure 3 shows a processor 310, a system memory 330 including a graphics primitive storage area 332, a graphics controller 340, and a display monitor 350. Prior tiled graphics architectures such as that implemented by the system of Figure 3 have the disadvantage of using large amounts of memory bandwidth when moving primitive data from device to device. For example, when the processor 310 processes a primitive, the processor 310 reads the vertex data for the primitive from the graphics primitive storage area 332. The processor 310 then determines which bins the primitive intersects. The processor 310 must then write several copies of the vertex data back out to the graphics primitive storage area 332, where the number of copies to be written depends on how many bins the primitive intersects.
The impact on memory bandwidth utilization can be demonstrated by considering that a typical graphics primitive can be represented by about 100 bytes of vertex data and that a graphics primitive may intersect several bins. This example will assume that a typical primitive intersects 3 bins. In this situation, the processor 310 must write an average of 300 bytes of vertex data to the graphics primitive storage area 332 for each primitive processed. For one frame of a relatively simple display consisting of 2k graphics primitives, the processor 310 must deliver 600k bytes of data per frame. If the frame display rate is 60 frames per second, the processor 310 must deliver data to the graphics primitive storage area 332 at a rate of 36M bytes per second. For a more complicated display consisting of 100k primitives, the bandwidth requirement increases to 1.8G bytes per second. The same bandwidth requirements must also be met between the graphics primitive storage area 332 and the graphics controller 340. This high utilization of memory bandwidth to move graphics primitive data from the processor 310 to the graphics primitive storage area 332 and from the graphics primitive storage area 332 to the graphics controller 340 can have a significant negative impact on overall system performance.
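The arithmetic above can be recomputed directly from its stated assumptions: about 100 bytes of vertex data per primitive, 3 bins per primitive, and 60 frames per second. The function name `copy_traffic` is hypothetical. For 2k primitives it gives 600,000 bytes per frame and 36,000,000 bytes per second; for 100k primitives it gives 30,000,000 bytes per frame and 1,800,000,000 bytes per second.

```python
def copy_traffic(primitives_per_frame, bytes_per_primitive=100,
                 bins_per_primitive=3, frames_per_second=60):
    """Return (bytes per frame, bytes per second) written by the processor when
    every intersected bin receives its own full copy of the vertex data."""
    per_frame = primitives_per_frame * bytes_per_primitive * bins_per_primitive
    return per_frame, per_frame * frames_per_second


for count in (2_000, 100_000):
    per_frame, per_second = copy_traffic(count)
    print(f"{count:,} primitives: {per_frame:,} bytes/frame, {per_second:,} bytes/s")
```

The same figures apply a second time on the path from the graphics primitive storage area to the graphics controller, since the controller must read back every copy.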
Brief Description of the Drawings
The invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.
Figure 1 is a diagram of several 3D objects arranged on a display screen in accordance with prior systems.
Figure 2 is a diagram depicting the several 3D objects of Figure 1 sorted into bins in accordance with prior systems.
Figure 3 is a block diagram of a prior system including a tiled graphics architecture.
Figure 4 is a flow diagram of an embodiment of a method for reducing memory bandwidth utilization in a tiled graphics architecture.
Figure 5 is a flow diagram of an embodiment of a method for reducing memory bandwidth utilization in a tiled graphics architecture where the graphics primitive storage area is located in system memory.
Figure 6 is a flow diagram of an embodiment of a method for reducing memory bandwidth utilization in a tiled graphics architecture where the graphics primitive storage area is located in a local graphics memory.
Figure 7 is a block diagram of a system including an embodiment of a graphics controller that includes a vertex cache.
Detailed Description
An example embodiment of a method and apparatus for reducing memory bandwidth utilization in a tiled graphics architecture will be described. For this example, a microprocessor reads vertex data for a graphics primitive from graphics memory and determines which bins the graphics primitive intersects. All vertices of the primitive are written into a vertex buffer for future reference. The vertex buffer may reside in either main system memory or local graphics memory, and may be implemented as part of the bin storage area or in a separate memory location. Assuming that the processor determines that the graphics primitive intersects a first and a second bin, the processor writes pointers to both the first and the second bin storage areas. The pointers indicate the location in memory of the actual vertex data. Thus, only one copy of the vertex data is moved from the processor to the graphics memory. Because a pointer is smaller than the vertex data it refers to, less data is moved from the processor to the graphics memory and memory bandwidth utilization is improved.
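A minimal sketch of the vertex-buffer variant just described, assuming Python lists stand in for memory and list indices stand in for pointers. The class and method names (`BinnedScene`, `add_primitive`, `vertices_for_bin`) are hypothetical and are not taken from the patent; the point is only that each vertex is stored once while every intersected bin receives pointers.

```python
from dataclasses import dataclass, field


@dataclass
class BinnedScene:
    """Hypothetical model: one shared vertex buffer plus per-bin pointer lists."""
    vertex_buffer: list = field(default_factory=list)
    bins: dict = field(default_factory=dict)

    def add_primitive(self, vertices, intersected_bins):
        # Each vertex is written exactly once, however many bins are hit.
        base = len(self.vertex_buffer)
        self.vertex_buffer.extend(vertices)
        # The "pointers" here are indices into the shared vertex buffer.
        pointers = tuple(range(base, base + len(vertices)))
        for bin_id in intersected_bins:
            self.bins.setdefault(bin_id, []).append(pointers)
        return pointers

    def vertices_for_bin(self, bin_id):
        """Resolve a bin's pointers back into vertex data, as a renderer would."""
        return [tuple(self.vertex_buffer[p] for p in ptrs)
                for ptrs in self.bins.get(bin_id, [])]


scene = BinnedScene()
scene.add_primitive([(0, 0), (200, 10), (60, 50)], intersected_bins=["bin1", "bin2"])
```

Both bins end up holding the same small pointer triple `(0, 1, 2)`, while the vertex buffer holds a single copy of the three vertices.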
The microprocessor in the above example and in the example embodiments that follow may be replaced by a 3D graphics processor that performs the same primitive processing as the microprocessor. For example, an additional embodiment may include a 3D graphics processor that performs transformation and lighting calculations in hardware.
The graphics memory in the above example and in the example embodiments that follow may be included as part of a main system memory or may be implemented as a local graphics memory directly coupled to a graphics controller. The term "pointer" as used herein is meant to include any means of at least partially indicating the location of vertex data, including a memory address and also including an index. For example, the pointer may be a physical or virtual memory address indicating the location of the vertex data. Alternatively, the pointer may be an index that can be used to calculate the address location of the vertex data. For example, an address may be calculated from an index according to the equation "base address + index * vertex data size".
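The two pointer forms can be illustrated as follows. This is a sketch under stated assumptions: a vertex data size of 32 bytes (borrowed from the Figure 7 example) and little-endian packing are choices made here, not requirements of the text, and the function names are hypothetical.

```python
import struct

VERTEX_DATA_SIZE = 32  # assumed bytes per vertex; the text leaves this open


def address_from_index(base_address: int, index: int,
                       vertex_data_size: int = VERTEX_DATA_SIZE) -> int:
    """Apply 'base address + index * vertex data size'."""
    return base_address + index * vertex_data_size


def encode_index(index: int) -> bytes:
    """Pack a pointer as a 16-bit little-endian index (2 bytes)."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index does not fit in 16 bits")
    return struct.pack("<H", index)


def encode_address(address: int) -> bytes:
    """Pack a pointer as a 32-bit little-endian address (4 bytes)."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    return struct.pack("<I", address)
```

An index costs half the storage of a full address, at the price of one multiply-add when the graphics controller resolves it.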
Although the above example and the examples that follow discuss a given number of bins which a graphics primitive may intersect, other examples are possible using any number of bins. Further, although the graphics primitives discussed herein comprise triangles including three vertices, other types of primitives are possible.
Also, for the example embodiments described herein, an address is assumed to be 32 bits wide, an index is assumed to be 16 bits wide, and vertex data for a triangular graphics primitive is assumed to be approximately 100 bytes long. Other embodiments are possible using a wide range of address, index, and data sizes and lengths.
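Using the sizes assumed above, the savings for a single triangle can be estimated. This sketch assumes the arrangement of Figure 4 (full vertex data written to the first bin, one pointer per vertex written to every other intersected bin); the constant and function names are hypothetical.

```python
VERTEX_DATA_BYTES = 100      # per triangle, as assumed in the text
VERTICES_PER_PRIMITIVE = 3   # triangles
INDEX_BYTES = 2              # 16-bit index
ADDRESS_BYTES = 4            # 32-bit address


def duplicated_bytes(bins_hit: int) -> int:
    """Prior scheme: a full copy of the vertex data for every bin."""
    return VERTEX_DATA_BYTES * bins_hit


def pointer_bytes(bins_hit: int, pointer_size: int) -> int:
    """Pointer scheme: vertex data once, then one pointer per vertex
    for each additional bin the triangle intersects."""
    extra_bins = max(bins_hit - 1, 0)
    return VERTEX_DATA_BYTES + extra_bins * VERTICES_PER_PRIMITIVE * pointer_size
```

For a triangle that intersects 3 bins this gives 300 bytes with full copies, versus 112 bytes with 16-bit indices or 124 bytes with 32-bit addresses.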
Figure 4 is a flow diagram of an embodiment of a method for improving memory bandwidth utilization in a tiled graphics architecture. At block 410, a determination is made as to whether a graphics primitive intersects a first and a second bin. If the graphics primitive is found to intersect the first and the second bin, then at block 420 data for a plurality of vertices corresponding to the graphics primitive are written to a first bin storage area located in a memory device. The memory device may comprise the main system memory or may comprise a local graphics memory coupled directly to a graphics controller. At block 430 a plurality of pointers are written to a second bin storage area located in the memory device. The plurality of pointers indicate the memory locations of the data for the plurality of vertices. By writing pointers to the second bin storage area instead of writing the vertex data, less data is moved from the processor to the graphics memory and memory bandwidth utilization is improved. The pointers will be fetched by the graphics controller along with any other second bin primitive data. The graphics controller will use the pointers to fetch the vertex data from the first bin storage area.
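Blocks 420 and 430, together with the controller's later use of the pointers, can be sketched with two byte buffers standing in for the bin storage areas. The 32-byte vertex size, the 16-bit little-endian index format, and the function names are assumptions for illustration; the intersection test of block 410 is assumed to have already been performed by the caller.

```python
import struct

VERTEX_SIZE = 32  # assumed bytes of data per vertex


def bin_primitive(first_bin: bytearray, second_bin: bytearray, vertices: list) -> None:
    """Blocks 420/430: append the vertex data to the first bin storage area
    and append one 16-bit index per vertex to the second bin storage area."""
    for vertex in vertices:
        if len(vertex) != VERTEX_SIZE:
            raise ValueError("unexpected vertex size")
        index = len(first_bin) // VERTEX_SIZE  # slot this vertex will occupy
        first_bin.extend(vertex)
        # struct.error is raised if the index no longer fits in 16 bits.
        second_bin.extend(struct.pack("<H", index))


def fetch_second_bin(first_bin: bytes, second_bin: bytes) -> list:
    """Controller side: read the pointers from the second bin and use them
    to fetch the vertex data out of the first bin storage area."""
    count = len(second_bin) // 2
    indices = struct.unpack(f"<{count}H", bytes(second_bin))
    return [bytes(first_bin[i * VERTEX_SIZE:(i + 1) * VERTEX_SIZE]) for i in indices]


first, second = bytearray(), bytearray()
tri = [bytes([k]) * VERTEX_SIZE for k in range(3)]
bin_primitive(first, second, tri)
```

The first bin storage area grows by 96 bytes of vertex data, while the second grows by only 6 bytes of indices.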
Figure 5 is a flow diagram of an embodiment of a method for improving memory bandwidth utilization in a tiled graphics architecture in a computer system where the graphics memory is implemented as an area within main system memory and where the graphics controller includes a vertex cache. The vertex cache provides temporary storage for vertex data and improves memory bandwidth utilization between system memory and the graphics controller by reducing the amount of vertex data moved from the graphics memory area in main system memory to the graphics controller. Referring to Figure 5, at block 505 a processor fetches vertex data for a graphics primitive from system memory and at block 510 the processor performs calculations on the vertex data. For this example, the vertex data for the graphics primitive includes data for three vertices, although other embodiments are possible where the vertex data for the graphics primitive may include data for any number of vertices. The calculations described as part of this embodiment are meant to represent a broad range of well-known techniques for manipulating graphics primitive data. At block 515, the processor determines whether the graphics primitive intersects a first bin and, assuming there is an intersection, writes the vertex data for the graphics primitive to a first bin storage area in system memory.
At block 520, the processor determines whether the graphics primitive intersects a second bin. If the graphics primitive is found to intersect the second bin, then at block 525 the processor writes three pointers to a second bin storage area in system memory. The pointers indicate the memory locations of the three vertices that were previously written to system memory.
At block 530, the processor determines whether the graphics primitive intersects a third bin. If the graphics primitive is found to intersect the third bin, then at block 535 the processor writes three pointers to a third bin storage area in system memory. The pointers indicate the memory locations of the three vertices that were previously written to system memory.
At block 540, the processor determines whether the graphics primitive intersects a fourth bin. If the graphics primitive is found to intersect the fourth bin, then at block 545 the processor writes three pointers to a fourth bin storage area in system memory. The pointers indicate the memory locations of the three vertices that were previously written to system memory.
Although this embodiment describes a graphics primitive possibly intersecting four bins, other embodiments are possible where the graphics primitive may possibly intersect two or more bins. Further, in one embodiment, a bin may have the dimensions of 128 pixels by 64 pixels, although other bin dimensions are possible. Also the bin intersection determination may be performed in a parallel manner instead of the serial approach described above. For example, a bounding box of the primitive may be used to find all bins that the primitive intersects at once.
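The bounding-box approach to finding every intersected bin at once might look like the following sketch, using the 128 x 64 pixel bin size of this embodiment. Clamping to the screen and the (column, row) bin numbering are assumptions made here. Note that a bounding box is conservative: a thin diagonal triangle can overlap a bin's corner of the box without touching the bin itself, in which case the graphics controller simply finds nothing to draw there.

```python
BIN_WIDTH, BIN_HEIGHT = 128, 64  # bin size from the example embodiment


def bins_from_bounding_box(vertices, screen_width, screen_height):
    """Return (column, row) of every bin overlapped by the primitive's
    screen-space bounding box, in row-major order."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Clamp the box to the screen so off-screen geometry yields no bins.
    min_x, max_x = max(0, min(xs)), min(screen_width - 1, max(xs))
    min_y, max_y = max(0, min(ys)), min(screen_height - 1, max(ys))
    if min_x > max_x or min_y > max_y:
        return []
    first_col, last_col = int(min_x) // BIN_WIDTH, int(max_x) // BIN_WIDTH
    first_row, last_row = int(min_y) // BIN_HEIGHT, int(max_y) // BIN_HEIGHT
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

For example, a triangle with vertices (10, 10), (200, 20), and (100, 100) on a 640 x 480 screen spans columns 0 to 1 and rows 0 to 1, so it is sorted into four bins.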
As indicated by block 547, blocks 505 through 545 may be repeated until all primitives have been sorted into bins. At block 550, the graphics controller fetches data from the first bin storage area. The data fetched from the first bin storage area and the vertex buffer includes the vertex data for the graphics primitive that was previously written to the system memory at block 515.
At block 555 the graphics controller stores the fetched vertex data in the vertex cache. In one embodiment, the vertex cache includes 16 entries which are 4-way interleaved with each entry capable of storing 32 bytes of vertex data. Other embodiments are possible with different numbers of entries and numbers of ways and further with each entry capable of storing different amounts of vertex data.
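A vertex cache of this size can be modeled as below. The text calls the cache "4-way interleaved" here and "4-way set-associative" in the Figure 7 description; this sketch assumes the set-associative reading (16 entries as 4 sets of 4 ways, set chosen by vertex index modulo the number of sets) and least-recently-used replacement, neither of which the text specifies. All names are hypothetical.

```python
from collections import OrderedDict


class VertexCache:
    """Hypothetical set-associative vertex cache keyed by vertex index."""

    def __init__(self, entries=16, ways=4):
        self.ways = ways
        self.num_sets = entries // ways
        # Each set maps tag -> vertex data, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def lookup(self, index, fetch):
        ways = self.sets[index % self.num_sets]
        tag = index // self.num_sets
        if tag in ways:
            ways.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return ways[tag]
        self.misses += 1
        data = fetch(index)            # miss: read the vertex from memory
        if len(ways) >= self.ways:
            ways.popitem(last=False)   # evict the least recently used way
        ways[tag] = data
        return data


memory_reads = []


def fetch_from_memory(index):
    memory_reads.append(index)
    return bytes([index % 256]) * 32   # 32 bytes of stand-in vertex data


cache = VertexCache()
for _ in range(2):                     # e.g. the first bin, then the second bin
    for index in (0, 1, 2):
        cache.lookup(index, fetch_from_memory)
```

The three vertices are read from memory once, while rendering the first bin; the second bin's pointers are all served from the cache.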
After the graphics controller fetches the first bin data and stores the vertex data in the vertex cache, the graphics controller renders the first bin primitives at block 560. As part of the rendering process, the graphics controller determines which portion of each graphics primitive included in the first bin data falls within the first bin and renders only that portion of the primitive.
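Rendering only the portion of a primitive that falls within the current bin can be pictured as restricting rasterization to the bin's pixels. The sketch below uses edge functions sampled at pixel centers, a common rasterization technique chosen here for illustration; the patent does not specify how the controller clips to the bin. It also omits a tie-breaking fill rule, so pixels lying exactly on an edge shared by two triangles could be drawn twice. The names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def pixels_in_bin(triangle, bin_x, bin_y, bin_w=128, bin_h=64):
    """Return the pixels whose centers lie inside both `triangle` and the
    bin whose top-left corner is (bin_x, bin_y)."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []  # a degenerate triangle covers nothing
    covered = []
    # Only the bin's own pixels are visited, so only this bin's portion
    # of the primitive is rasterized.
    for py in range(bin_y, bin_y + bin_h):
        for px in range(bin_x, bin_x + bin_w):
            cx, cy = px + 0.5, py + 0.5
            w = (edge(x1, y1, x2, y2, cx, cy),
                 edge(x2, y2, x0, y0, cx, cy),
                 edge(x0, y0, x1, y1, cx, cy))
            inside = all(v >= 0 for v in w) if area > 0 else all(v <= 0 for v in w)
            if inside:
                covered.append((px, py))
    return covered


triangle = [(0, 0), (256, 0), (0, 64)]  # spans two horizontally adjacent bins
left = pixels_in_bin(triangle, 0, 0)
right = pixels_in_bin(triangle, 128, 0)
```

Each bin draws a disjoint piece of the same triangle, so the two passes together cover it without overlap.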
Following the rendering of the first bin, the graphics controller proceeds to process the second bin. As a first step in processing the second bin, the graphics controller fetches data from the second bin storage area at block 565. The data fetched from the second bin storage area includes pointers to the vertex data for the graphics primitive, assuming that an intersection with the second bin was found at block 520. At block 570, the graphics controller uses the pointers to access the vertex data that was previously stored in the vertex cache at block 555. Once the graphics controller has accessed the vertex data, the graphics controller renders the second bin primitives at block 575.
At block 580, a determination is made as to whether additional bins remain to be rendered. If additional bins remain, then processing resumes at block 565. Blocks 565 through 580 are repeated until all bins have been rendered, and processing stops at block 585. Note that the bins need not be rendered in serial order. Based on certain heuristics, the above embodiment may be generalized to render the second bin first, followed by the third, first, and fourth bins. This flexibility enables system-level performance optimizations. For example, load balancing can be used to balance the load between front-end and back-end processing in the graphics processor.
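The controller-side loop of blocks 550 through 585 can be sketched with the bin order passed in explicitly, to show that pointer resolution does not depend on rendering the first bin first. For brevity this sketch treats every bin entry as a tuple of vertex pointers and uses a small fully associative LRU cache; both are simplifications assumed here, and the rasterization step is replaced by simply recording which vertices each bin would draw. All names are hypothetical.

```python
from collections import OrderedDict


def render_bins(bins, order, vertex_memory, cache_entries=16):
    """Walk the bins in `order`, resolving each primitive's pointers through
    a small LRU vertex cache. Returns (rendered, memory_fetches)."""
    cache = OrderedDict()
    memory_fetches = 0
    rendered = []
    for bin_id in order:
        for primitive in bins.get(bin_id, []):
            vertices = []
            for ptr in primitive:
                if ptr in cache:
                    cache.move_to_end(ptr)          # hit: no memory traffic
                else:
                    memory_fetches += 1             # miss: read from memory
                    cache[ptr] = vertex_memory[ptr]
                    if len(cache) > cache_entries:
                        cache.popitem(last=False)   # evict least recently used
                vertices.append(cache[ptr])
            # Stand-in for rasterizing this primitive within the bin.
            rendered.append((bin_id, tuple(vertices)))
    return rendered, memory_fetches


vertex_memory = ["v0", "v1", "v2"]
bins = {bin_id: [(0, 1, 2)] for bin_id in (0, 1, 2, 3)}  # one triangle in four bins
rendered, fetches = render_bins(bins, order=[1, 2, 0, 3], vertex_memory=vertex_memory)
```

Rendering the second bin first simply moves the cache misses to that bin; the triangle's three vertices are still read from memory only once across all four bins.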
Figure 6 is a flow diagram of an embodiment of a method for improving memory bandwidth utilization in a tiled graphics architecture in a computer system where the graphics memory is implemented as a local graphics memory directly coupled to a graphics controller. The local graphics memory provides storage for vertex data and improves memory bandwidth utilization between system memory and the graphics controller by reducing the amount of vertex data that must be moved between main system memory and the graphics controller.
Referring to Figure 6, at block 605 a processor fetches vertex data for a graphics primitive from local graphics memory or, alternatively, from system memory and at block 610 the processor performs calculations on the vertex data. For this example, the vertex data for the graphics primitive includes data for three vertices, although other embodiments are possible where the vertex data for the graphics primitive may include data for any number of vertices. The calculations described as part of this embodiment are meant to represent a broad range of well-known techniques for manipulating graphics primitive data. At block 615, the processor determines whether the graphics primitive intersects a first bin, and, assuming there is an intersection, the processor writes the vertex data for the graphics primitive to a first bin storage area in the local graphics memory.
At block 620, the processor determines whether the graphics primitive intersects a second bin. If the graphics primitive is found to intersect the second bin, then at block 625 the processor writes three pointers to a second bin storage area in local graphics memory. The pointers indicate the memory locations of the three vertices that were previously written to local graphics memory.
At block 630, the processor determines whether the graphics primitive intersects a third bin. If the graphics primitive is found to intersect the third bin, then at block 635 the processor writes three pointers to a third bin storage area in local graphics memory. The pointers indicate the memory locations of the three vertices that were previously written to local graphics memory.
At block 640, the processor determines whether the graphics primitive intersects a fourth bin. If the graphics primitive is found to intersect the fourth bin, then at block 645 the processor writes three pointers to a fourth bin storage area in local graphics memory. The pointers indicate the memory locations of the three vertices that were previously written to local graphics memory.
Although this embodiment describes a graphics primitive possibly intersecting four bins, other embodiments are possible where the graphics primitive may possibly intersect two or more bins. Further, in one embodiment, a bin may have the dimensions of 128 pixels by 64 pixels, although other bin dimensions are possible. Also the bin intersection determination may be performed in a parallel manner instead of the serial approach described above. For example, a bounding box of the primitive may be used to find all bins that the primitive intersects at once. As indicated by block 647, blocks 605 through 645 may be repeated until all primitives have been sorted into bins.
At block 650, the graphics controller fetches data from the first bin storage area. The data fetched from the first bin storage area includes the vertex data for the graphics primitive that was previously written to the local graphics memory at block 615. After the graphics controller fetches the first bin data, the graphics controller renders the first bin primitives at block 660. As part of the rendering process, the graphics controller determines which portion of each graphics primitive included in the first bin data falls within the first bin and renders only that portion of the primitive.
Following the rendering of the first bin, the graphics controller proceeds to process the second bin. As a first step in processing the second bin, the graphics controller fetches data from the second bin storage area at block 665. The data fetched from the second bin storage area includes pointers to the vertex data for the graphics primitive, assuming that an intersection with the second bin was found at block 620. At block 670, the graphics controller uses the pointers to access the vertex data that was previously stored in the local graphics memory at block 615. Once the graphics controller has accessed the vertex data, the graphics controller renders the second bin primitives at block 675.
At block 680, a determination is made as to whether additional bins remain to be rendered. If additional bins remain, then processing resumes at block 665. Blocks 665 through 680 are repeated until all bins have been rendered, and processing stops at block 685. Note that the bins need not be rendered in serial order. Based on certain heuristics, the above method may be generalized to render the second bin first, followed by the third, first, and fourth bins. This flexibility enables system-level performance optimizations. For example, load balancing can be used to balance the load between front-end and back-end processing in the graphics processor.
Figure 7 is a block diagram of a computer system including a graphics controller 740 that includes a vertex cache 742. The computer system of Figure 7 also includes a processor 710 coupled to a system logic device 720 via a processor bus 715. The system logic device 720 provides communication between the processor 710 and a system memory 730. The system memory 730 includes a graphics primitive storage area 732. The graphics primitive storage area 732 may be separated into storage areas for a plurality of bins.
The system logic device 720 also serves to couple the graphics controller 740 to the processor 710 and the system memory 730. The system of Figure 7 also includes a display monitor 750 coupled to the graphics controller 740.
The system of Figure 7 may be used with embodiments of methods for improving memory bandwidth utilization such as those discussed above in connection with Figures 4 and 5. For example, the processor 710 may read vertex data for a graphics primitive from the graphics primitive storage area 732. The processor 710 may then determine which bins the graphics primitive intersects. The processor 710 then writes the vertex data to a first bin storage area within the graphics primitive storage area 732. If the graphics primitive is found to intersect other bins, then the processor 710 writes pointers to other bin storage areas within the graphics primitive storage area 732. The pointers indicate the location within the first bin storage area where the vertex data is stored. The pointers in this example include a 16 bit index from which the memory location of the vertex data may be calculated. Other embodiments are possible where the pointers include a 32 bit address identifying the storage locations of the vertex data. Still other embodiments are possible using different length indices and/or addresses.
When the graphics controller 740 is ready to process the first bin, the graphics controller 740 fetches the first bin data from the graphics primitive storage area 732. The graphics controller 740 stores the vertex data for the graphics primitive in the vertex cache 742. The graphics controller 740 then renders the first bin, including the portion of the graphics primitive that falls within the first bin.
For this example, a bin has the dimensions of 128 x 64 pixels. The vertex cache 742 in this example includes 16 entries that are 4-way set-associative, each capable of storing 32 bytes of vertex data. The graphics primitive of this example is represented by three vertices where each vertex is defined by 32 bytes of data. Other embodiments are possible using other bin dimensions and/or other cache arrangements.
When the graphics controller 740 is ready to process the second bin, the graphics controller 740 fetches the data for the second bin from the graphics primitive storage area 732. The data for the second bin will include pointers to the vertex data for the graphics primitive assuming that the processor 710 previously determined that the graphics primitive intersects the second bin. The graphics controller 740 then uses the pointers to access the vertex data stored in the vertex cache 742. The vertex cache 742 serves to improve memory bandwidth utilization by eliminating the necessity for the vertex data to be fetched from the graphics primitive storage area 732 when a copy of the vertex data is stored in the vertex cache 742, as is the case in this example.
Once the vertex data is retrieved from the vertex cache 742, the graphics controller 740 can render the second bin. Subsequent bins can be processed in a similar manner until all bins have been rendered.
In the foregoing specification the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.
Reference in the specification to "an embodiment," "one embodiment," "some embodiments," or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the invention. The various appearances of "an embodiment," "one embodiment," or "some embodiments" are not necessarily all referring to the same embodiments.

Claims

What is claimed is:
1. A method, comprising: determining that a graphics primitive intersects a first and a second bin; writing data for a plurality of vertices corresponding to the graphics primitive to a first bin storage area located in a memory device; and writing a plurality of pointers to a second bin storage area located in the memory device, the plurality of pointers to indicate the location of the data for the plurality of vertices.
2. The method of claim 1, wherein writing data for a plurality of vertices corresponding to the graphics primitive to a first bin storage area located in a memory device includes writing data for a plurality of vertices corresponding to the graphics primitive to a first bin storage area located in a frame buffer.
3. The method of claim 2, wherein writing a plurality of pointers to a second bin storage area located in the memory device includes writing a plurality of pointers to a second bin storage area located in the frame buffer.
4. The method of claim 1, wherein writing data for a plurality of vertices corresponding to the graphics primitive to a first bin storage area located in a memory device includes writing data for a plurality of vertices corresponding to the graphics primitive to a first bin storage area located in a main memory.
5. The method of claim 4, wherein writing a plurality of pointers to a second bin storage area located in the memory device includes writing a plurality of pointers to a second bin storage area located in the main memory.
6. The method of claim 5, further comprising: loading data for one of the plurality of vertices into a vertex cache; reading one of the plurality of pointers into a graphics controller; and accessing the data for the one of the plurality of vertices stored in the vertex cache using the one of the plurality of pointers.
7. An apparatus comprising a bin fetch unit to fetch primitive data from a first bin storage area located in a memory, the primitive data including a pointer to indicate a memory location for data corresponding to a vertex, the bin fetch unit further to fetch the data corresponding to the vertex indicated by the pointer.
8. The apparatus of claim 7, the memory including a main memory device.
9. The apparatus of claim 7, the bin fetch unit to fetch the data corresponding to the vertex indicated by the pointer from a frame buffer.
10. The apparatus of claim 7, the bin fetch unit to fetch the data corresponding to the vertex indicated by the pointer from a main memory device.
11. The apparatus of claim 7, further including a vertex cache, the bin fetch unit to fetch the data corresponding to the vertex indicated by the pointer from the vertex cache.
12. The apparatus of claim 11, wherein the vertex cache includes a plurality of entries, each entry to store 32 bytes of vertex data.
13. A system, comprising: a processor; a memory controller coupled to the processor; a main memory coupled to the memory controller; and a graphics controller including a bin fetch unit to fetch primitive data from a first bin storage area located in the main memory, the primitive data including a pointer to indicate a memory location for data corresponding to a vertex, the bin fetch unit further to fetch the data corresponding to the vertex indicated by the pointer.
14. The system of claim 13, the bin fetch unit to fetch the data corresponding to the vertex indicated by the pointer from a frame buffer coupled to the graphics controller.
15. The system of claim 13, the bin fetch unit to fetch the data corresponding to the vertex indicated by the pointer from the main memory.
16. The system of claim 13, the graphics controller further including a vertex cache, the bin fetch unit to fetch the data corresponding to the vertex indicated by the pointer from the vertex cache.
17. The system of claim 16, wherein the vertex cache includes a plurality of entries, each entry to store 32 bytes of vertex data.
EP01930417A 2000-03-31 2001-03-06 Tiled graphics architecture Withdrawn EP1269418A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US54061600A 2000-03-31 2000-03-31
US540616 2000-03-31
PCT/US2001/007225 WO2001075804A1 (en) 2000-03-31 2001-03-06 Tiled graphics architecture

Publications (1)

Publication Number Publication Date
EP1269418A1 true EP1269418A1 (en) 2003-01-02

Family

ID=24156227

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01930417A Withdrawn EP1269418A1 (en) 2000-03-31 2001-03-06 Tiled graphics architecture

Country Status (8)

Country Link
EP (1) EP1269418A1 (en)
JP (1) JP2003529860A (en)
KR (1) KR100550240B1 (en)
CN (2) CN1430769B (en)
AU (1) AU2001256955A1 (en)
HK (1) HK1049537A1 (en)
TW (1) TWI233573B (en)
WO (1) WO2001075804A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738069B2 (en) * 2001-12-31 2004-05-18 Intel Corporation Efficient graphics state management for zone rendering
US7765366B2 (en) * 2005-06-23 2010-07-27 Intel Corporation Memory micro-tiling
GB2449399B (en) * 2006-09-29 2009-05-06 Imagination Tech Ltd Improvements in memory management for systems for generating 3-dimensional computer images
WO2008053597A1 (en) * 2006-11-01 2008-05-08 Digital Media Professionals Inc. Device for accelerating the processing of extended primitive vertex cache
US8139058B2 (en) * 2006-11-03 2012-03-20 Vivante Corporation Hierarchical tile-based rasterization algorithm
GB2458488C (en) * 2008-03-19 2018-09-12 Imagination Tech Ltd Untransformed display lists in a tile based rendering system
US20110043518A1 (en) * 2009-08-21 2011-02-24 Nicolas Galoppo Von Borries Techniques to store and retrieve image data
KR101609266B1 (en) 2009-10-20 2016-04-21 삼성전자주식회사 Apparatus and method for rendering tile based
KR101683556B1 (en) 2010-01-06 2016-12-08 삼성전자주식회사 Apparatus and method for tile-based rendering
EP2587454B1 (en) * 2010-06-24 2017-12-13 Fujitsu Limited Drawing device and drawing method
KR102018699B1 (en) 2011-11-09 2019-09-06 삼성전자주식회사 Apparatus and Method for Tile Binning
CN110415161B (en) * 2019-07-19 2023-06-27 龙芯中科(合肥)技术有限公司 Graphics processing method, device, equipment and storage medium
WO2022150347A1 (en) * 2021-01-05 2022-07-14 Google Llc Subsurface display interfaces and associated systems and methods

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5886701A (en) * 1995-08-04 1999-03-23 Microsoft Corporation Graphics rendering device and method for operating same
US6771264B1 (en) * 1998-08-20 2004-08-03 Apple Computer, Inc. Method and apparatus for performing tangent space lighting and bump mapping in a deferred shading graphics processor
WO2000011607A1 (en) * 1998-08-20 2000-03-02 Apple Computer, Inc. Deferred shading graphics pipeline processor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0175804A1 *

Also Published As

Publication number Publication date
JP2003529860A (en) 2003-10-07
CN102842145A (en) 2012-12-26
AU2001256955A1 (en) 2001-10-15
CN102842145B (en) 2016-08-24
CN1430769A (en) 2003-07-16
HK1049537A1 (en) 2003-05-16
KR100550240B1 (en) 2006-02-08
KR20030005253A (en) 2003-01-17
WO2001075804A1 (en) 2001-10-11
TWI233573B (en) 2005-06-01
CN1430769B (en) 2012-05-30


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20021028

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17Q First examination report despatched

Effective date: 20050622

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20071012

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1049537

Country of ref document: HK