EP0803859A2 - System and method for optimizing storage requirements for an N-way distribution channel

Info

Publication number: EP0803859A2
Application number: EP96118476A
Authority: EP
Inventors: John A. Dykstal; Darel N. Emmot
Original assignee: Hewlett Packard Co
Current assignee: HP Inc
Priority date: 1996-04-23
Filing date: 1996-11-18
Publication date: 1997-10-29
Also published as: JPH1083457A; EP0803859A3

Abstract

In a texture mapping computer graphics system including a texture mapping chip (46) that stores a plurality of texels (S, T) and multiple frame buffer controller chips (50A-50E) that process the texels, an interface is provided between the texture mapping chip (46) and the frame buffer controller chips (50A-50E). The interface includes a texel array storage unit (90), coupled between the texture mapping chip (46) and the frame buffer controller chips (50A-50E), that temporarily stores a limited number of texels, each texel being destined for a particular frame buffer controller chip. A control unit (110), coupled to the texel array storage unit (90), controls shifting texels from the texture mapping chip (46) into locations within the texel array storage unit (90) and transferring texels from the texel array storage unit (90) into appropriate frame buffer controller chips (50A-50E). A plurality of address storage units (114A-114E), coupled to the control unit (110), store addresses of locations within the texel array storage unit (90) in which texels are stored. Each address storage unit (114A-114E) corresponds to a different frame buffer controller chip (50A-50E).

Description

Field of the Invention

The present invention relates generally to a texture mapping computer graphics system and, more particularly, to a system and method for buffering texture data transferred between circuit boards.

Background of the Invention

Computer graphics systems commonly are used for displaying graphical representations of objects on a two dimensional display screen. Current computer graphics systems can provide highly detailed representations and are used in a variety of applications.
In typical computer graphics systems, an object to be represented on the display screen is broken down into a plurality of graphics primitives. Primitives are basic components of a graphics picture and may include points, lines, vectors and polygons, such as triangles. Typically, a hardware/software scheme is implemented to render, or draw, on the two-dimensional display screen, the graphics primitives that represent the view of one or more objects being represented on the screen.
Typically, the primitives that define the three-dimensional object to be rendered are provided from a host computer, which defines each primitive in terms of primitive data. For example, when the primitive is a triangle, the host computer may define the primitive in terms of the x,y,z coordinates of its vertices, as well as the R,G,B color values of each vertex. Rendering hardware interpolates the primitive data to compute the display screen pixels that are turned on to represent each primitive, and the R,G,B values for each pixel.
Early graphics systems failed to display images in a sufficiently realistic manner to represent or model complex three-dimensional objects. The images displayed by such systems exhibited extremely smooth surfaces absent textures, bumps, scratches, shadows and other surface details present in the object being modeled.
As a result, methods were developed to display images with improved surface detail. Texture mapping is one such method that involves mapping a source image, referred to as a texture, onto a surface of a three-dimensional object, and thereafter mapping the textured three-dimensional object to the two-dimensional graphics display screen to display the resulting image. Surface detail attributes commonly texture mapped include color, specular reflection, vector perturbation, specularity, transparency, shadows, surface irregularities and grading.
Texture mapping involves applying one or more point elements (texels) of a texture to each point element (pixel) of the displayed portion of the object to which the texture is being mapped. Texture mapping hardware is conventionally provided with information indicating the manner in which the texels in a texture map correspond to the pixels on the display screen that represent the object. Each texel in a texture map is defined by S and T coordinates which identify its location in the two-dimensional texture map. For each pixel, the corresponding texel or texels that map to it are accessed from the texture map, and incorporated into the final R,G,B values generated for the pixel to represent the textured object on the display screen.
It should be understood that each pixel in an object primitive may not map in one-to-one correspondence with a single texel in the texture map for every view of the object. For example, the closer the object is to the view port represented on the display screen, the larger the object will appear. As the object appears larger on the display screen, the representation of the texture becomes more detailed. Thus, when the object consumes a fairly large portion of the display screen, a large number of pixels is used to represent the object on the display screen, and each pixel that represents the object may map in one-to-one correspondence with a single texel in the texture map, or a single texel may map to multiple pixels. However, when the object takes up a relatively small portion of the display screen, a much smaller number of pixels is used to represent the object, resulting in the texture being represented with less detail, so that each pixel may map to multiple texels. Each pixel may also map to multiple texels when a texture is mapped to a small portion of an object. Resultant texel data is calculated for each pixel that maps to more than one texel, and typically represents an average of the texels that map to that pixel.
Texture mapping hardware systems typically include a local memory that stores data representing a texture associated with the object being rendered. As discussed above, a pixel may map to multiple texels. If it were necessary for the texture mapping hardware to read a large number of texels that map to a pixel from the local memory to generate an average value, then a large number of memory reads and the averaging of many texel values would be required, which would be time consuming and would degrade system performance.
To overcome this problem, a scheme has been developed that involves the creation of a series of MIP maps for each texture, and storing the MIP maps of the texture associated with the object being rendered in the local memory of the texture mapping hardware. A MIP map for a texture includes a base map that corresponds directly to the texture map as well as a series of filtered maps wherein each successive map is reduced in size by a factor of two in each of the two texture map dimensions. An illustrative example of a set of MIP maps is shown in Fig. 1. The MIP (multum in parvo, meaning many things in a small place) maps include a base map 100 that is eight-by-eight texels in size, as well as a series of maps 102, 104 and 108 that are respectively four-by-four texels, two-by-two texels, and one texel in size.
The four-by-four map 102 is generated by box filtering (decimating) the base map 100, such that each texel in the map 102 corresponds to an average of four texels in the base map 100. For example, the texel 110 in map 102 equals the average of the texels 112-115 in map 100, and texels 118 and 120 in map 102 respectively equal the averages of texels 121-124 and 125-128 in map 100. The two-by-two map 104 is similarly generated by box filtering map 102, such that texel 130 in map 104 equals the average of texels 110 and 118-120 in map 102. The single texel in map 108 is generated by averaging the four texels in map 104.
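By way of illustration only, the box filtering described above can be sketched in Python as follows. The function names `box_filter` and `build_mip_chain`, and the representation of a map as a list of rows of floating-point texel values, are assumptions made for this sketch and are not taken from the system described herein.

```python
def box_filter(level):
    """Halve a square texel map by averaging each 2x2 block of texels."""
    n = len(level) // 2
    return [
        [
            (level[2 * r][2 * c] + level[2 * r][2 * c + 1]
             + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(n)
        ]
        for r in range(n)
    ]


def build_mip_chain(base):
    """Return [base, half-size map, quarter-size map, ...] down to one texel."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(box_filter(chain[-1]))
    return chain


# Hypothetical 8x8 base map whose texel values are simply 0..63.
base = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
chain = build_mip_chain(base)
sizes = [len(m) for m in chain]
```

For an eight-by-eight base map, the chain so produced has sizes 8, 4, 2 and 1, which corresponds to maps 100, 102, 104 and 108 of Fig. 1.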
Conventional graphics systems generally download, from the main memory of the host computer to the local memory of the texture mapping hardware, the complete series of MIP maps for any texture that is to be used with the primitives to be rendered on the display screen. Thus, the texture mapping hardware can access texture data from any of the series of MIP maps. The determination of which map to access to provide the texel data for any particular pixel is based upon the number of texels to which the pixel maps. For example, if the pixel maps in one-to-one correspondence with a single texel in the texture map, then the base map 100 is accessed. However, if the pixel maps to four, sixteen or sixty-four texels, then the maps 102, 104 and 108 are respectively accessed because those maps respectively store texel data representing an average of four, sixteen and sixty-four texels in the texture map.
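The map selection just described may be sketched as follows, purely as an illustration. The function name `select_map` is hypothetical, and the handling of counts that are not exact powers of four (here, falling back to the finer map) is an assumption of this sketch rather than something stated above.

```python
# Reference numerals of the maps in Fig. 1, from finest to coarsest.
MAP_REFERENCE_NUMERALS = [100, 102, 104, 108]


def select_map(texels_per_pixel):
    """Pick the map whose texels each average about `texels_per_pixel` base texels."""
    level = 0
    remaining = texels_per_pixel
    # Each coarser map averages four times as many base-map texels per entry.
    while remaining >= 4 and level < len(MAP_REFERENCE_NUMERALS) - 1:
        remaining //= 4
        level += 1
    return MAP_REFERENCE_NUMERALS[level]


selected = [select_map(n) for n in (1, 4, 16, 64)]
```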
A pixel may not map directly to any one texel in the selected map, and may fall between two or more texels. Some graphics systems employ bi-linear interpolation to accurately produce texel data when this occurs. If a pixel maps into a MIP map between two or more texel entries, then the resulting texel data used is a weighted average of the closest texel entries. Thus, the texel data corresponding to any pixel can be the weighted average of as many as four texel entries in a single map. For example, if a pixel maps to a location in map 102 indicated at 132, the resulting texel data mapping to that pixel would be the weighted average of the texels 110 and 118-120.
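A minimal sketch of such a bi-linear interpolation is given below for illustration. It assumes that texel entries sit at integer (S, T) positions, that S indexes columns and T indexes rows, and that coordinates outside the map are clamped to its edge; the function name `bilinear` and these conventions are assumptions of the sketch.

```python
def bilinear(texmap, s, t):
    """Weighted average of the (up to) four texel entries closest to (s, t)."""
    rows, cols = len(texmap), len(texmap[0])
    # Clamp the sample position to the map (an assumed edge policy).
    s = min(max(s, 0.0), cols - 1.0)
    t = min(max(t, 0.0), rows - 1.0)
    s0, t0 = int(s), int(t)
    s1, t1 = min(s0 + 1, cols - 1), min(t0 + 1, rows - 1)
    fs, ft = s - s0, t - t0  # fractional distances to the lower-index entries
    top = texmap[t0][s0] * (1 - fs) + texmap[t0][s1] * fs
    bottom = texmap[t1][s0] * (1 - fs) + texmap[t1][s1] * fs
    return top * (1 - ft) + bottom * ft


texmap = [[0.0, 10.0], [20.0, 30.0]]
center = bilinear(texmap, 0.5, 0.5)  # equal weights on all four entries
```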
Pixels may also not map directly into any one of the maps in the series of MIP maps, and may fall between two maps. For example, a pixel may map to a number of texels in the texture map that is greater than one but less than four. Some graphics systems address this situation by interpolating between the two closest MIP maps to achieve the resultant texel data. For the example above wherein a pixel maps to greater than one but less than four texels in the texture map, the texel data provided by maps 100 and 102 would be interpolated to achieve the resultant texel data for the pixel. When combined with the above-described interpolation of multiple texel entries in a single map, this scheme is known as tri-linear interpolation, and can lead to resultant texel data for any one pixel being generated as a weighted average of as many as eight texels, i.e., the four closest texels in each of the two closest maps.
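The interpolation between the two closest maps may be illustrated by the following sketch. The text above does not specify how the weight between the two maps is derived; the sketch assumes a weight based on the base-four logarithm of the number of texels to which the pixel maps. The names `map_blend_weights` and `trilinear` are hypothetical, and the two inputs to `trilinear` stand for the bi-linearly interpolated results already obtained from each map.

```python
import math


def map_blend_weights(texels_per_pixel, num_levels=4):
    """Return (finer_level, coarser_level, weight_of_coarser_level)."""
    # Level 0 averages 1 texel, level 1 averages 4, level 2 averages 16, ...
    level = 0.5 * math.log2(max(texels_per_pixel, 1.0))
    level = min(level, num_levels - 1)
    finer = int(math.floor(level))
    coarser = min(finer + 1, num_levels - 1)
    return finer, coarser, level - finer


def trilinear(sample_finer, sample_coarser, coarser_weight):
    """Blend the per-map results into the resultant texel data."""
    return sample_finer * (1 - coarser_weight) + sample_coarser * coarser_weight


# A pixel mapping to two texels falls between level 0 (map 100) and level 1 (map 102).
finer, coarser, weight = map_blend_weights(2.0)
result = trilinear(10.0, 20.0, weight)
```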
In pipelined systems, in which various operations are performed simultaneously on different object primitives by different system elements, it often is necessary to buffer data being transferred between different chips or boards of the system. It is desirable in such systems to reduce the size, cost and complexity of the buffering hardware.

Summary of the Invention

In one embodiment of the invention, in a texture mapping computer graphics system, a method is provided for transferring texture data including a plurality of texels from a texture mapping chip to multiple frame buffer controller chips. The method includes the following steps: receiving texels from the texture mapping chip, each texel being destined for a particular frame buffer controller chip; temporarily storing a limited number of texels in a texel array storage unit; shifting each texel into each of a plurality of first storage registers, wherein each frame buffer controller chip has a corresponding first storage register; and transferring each texel from the first storage registers into each frame buffer controller chip.
In one embodiment, the step of transferring further includes the step of transferring a portion of each texel from the first storage registers into second storage registers and then transferring the portions of the texels from the second storage registers into the frame buffer controller chips.
In one embodiment, the method further includes, for each texel stored in the texel array storage unit, the step of storing an address of the texel array storage unit within one of a plurality of address storage units, wherein each frame buffer controller chip has a corresponding address storage unit.
In one embodiment, the method further includes, during the step of receiving texels, the step of determining, for each texel, for which frame buffer controller chip the texel is destined. The method also includes, before the step of shifting, the step of determining the highest priority frame buffer controller chip for receiving a texel.
In another embodiment of the invention, a texture mapping computer graphics system includes a texture mapping chip that stores a plurality of texels and multiple frame buffer controller chips that process the texels. An interface between the texture mapping chip and the frame buffer controller chips includes a texel array storage unit, coupled between a portion of the texture mapping chip and the frame buffer controller chips, that temporarily stores a limited number of texels, each texel being destined for a particular frame buffer controller chip. A control unit, coupled to the texel array storage unit, controls shifting texels from the texture mapping chip into locations within the texel array storage unit and transferring texels from the texel array storage unit into appropriate frame buffer controller chips.
In an embodiment, the interface further includes a plurality of address storage units, coupled to the control unit, that store addresses of locations within the texel array storage unit in which texels are stored, wherein each address storage unit corresponds to a different frame buffer controller chip.
In an embodiment, the control unit includes a first portion that controls shifting texels from a texel interpolator into locations within the texel array storage unit and a second portion that controls transferring texels from the texel array storage unit into appropriate frame buffer controller chips.
In an embodiment, the first portion of the control unit includes a decoder, coupled to the texture mapping chip, that determines to which frame buffer controller chip each texel is destined and, for each texel, enables writing the texel array storage unit address to the corresponding address storage unit. The second portion of the control unit includes a priority decoder that determines relative priorities of the frame buffer controller chips for receiving texels from the texel array storage unit.
In an embodiment, the interface further includes a plurality of registers, one register corresponding to each frame buffer controller chip, each register coupled between the texel array storage unit and the corresponding frame buffer controller chip, for temporarily storing texels destined for the frame buffer controller chip.
A still further embodiment of the invention is directed to a texture mapping computer graphics system including a texture mapping chip that stores a plurality of texels. Also included is a plurality of frame buffer controller chips, coupled to the texture mapping chip, each frame buffer controller chip receiving and processing different texels from the texture mapping chip. A texel array storage unit, coupled between the texture mapping chip and the frame buffer controller chips, temporarily stores texels when the texels are transferred from the texture mapping chip to the frame buffer controller chips.
The system further includes a control unit, coupled to the texel array storage unit, having a first portion that controls shifting texels from the texture mapping chip into locations within the texel array storage unit, and a second portion that controls transferring texels from the texel array storage unit into appropriate frame buffer controller chips.
In an embodiment, the system further includes a plurality of address storage units, coupled to the control unit, that store addresses of locations within the texel array storage unit in which texels are stored, wherein each address storage unit corresponds to a different frame buffer controller chip.
In an embodiment, the first portion of the control unit includes a decoder, coupled to the texture mapping chip, that determines to which frame buffer controller chip each texel is destined and, for each texel, enables writing the texel array storage unit address to the corresponding address storage unit. The second portion of the control unit includes a priority decoder that determines relative priorities of the frame buffer controller chips for receiving texels from the texel array storage unit.
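The cooperation of the texel array storage unit, the address storage units and the priority decoder summarized above may be modeled in software as follows, for illustration only. The class name `TexelInterface`, the free-location list, and the priority rule (the chip with the most pending texels is served first, ties going to the lowest-numbered chip) are assumptions of this sketch; the summary above does not specify them. The destination chip is passed in directly rather than decoded.

```python
from collections import deque


class TexelInterface:
    """Toy model: one shared texel array plus one address queue per chip."""

    def __init__(self, num_chips=5, capacity=8):
        self.array = [None] * capacity         # texel array storage unit
        self.free = deque(range(capacity))     # unused locations (assumed bookkeeping)
        self.addresses = [deque() for _ in range(num_chips)]  # address storage units

    def receive(self, texel, chip):
        """Store a texel destined for `chip`; return False if the array is full."""
        if not self.free:
            return False
        addr = self.free.popleft()
        self.array[addr] = texel
        self.addresses[chip].append(addr)      # remember where this chip's texel lives
        return True

    def highest_priority_chip(self):
        """Assumed policy: most pending texels wins, lowest index breaks ties."""
        pending = [len(q) for q in self.addresses]
        if max(pending) == 0:
            return None
        return pending.index(max(pending))

    def transfer(self):
        """Move one texel to the highest priority chip; return (chip, texel)."""
        chip = self.highest_priority_chip()
        if chip is None:
            return None
        addr = self.addresses[chip].popleft()
        texel, self.array[addr] = self.array[addr], None
        self.free.append(addr)                 # the location can be reused
        return chip, texel
```

Because the array is shared, its capacity bounds the total number of buffered texels regardless of how unevenly they are distributed among the frame buffer controller chips; only the small per-chip address queues are replicated.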

Brief Description of the Drawings

For a better understanding of the present invention, reference is made to the accompanying drawings, which are incorporated herein by reference and in which:

Fig. 1 is a graphical illustration of a set of texture MIP maps;
Fig. 2 is a block diagram of one embodiment of an overall computer graphics system;
Fig. 2A is a block diagram of another embodiment of an overall computer graphics system;
Fig. 3 is a block diagram of texture mapping hardware;
Fig. 4 is a more detailed block diagram of the parameter interpolator element of the texture mapping hardware of Fig. 3;
Fig. 5 is a block diagram illustrating one embodiment of the screen space interleaving for the frame buffer controller chips according to the present invention;
Fig. 6 is a block diagram illustrating the texel buffering required within the interface to the frame buffer controller chips on the texture mapping chip, according to the present invention;
Fig. 7 is a block diagram of the interface to the frame buffer controller chips on the texture mapping chip;
Fig. 8 is a block diagram of a first portion of the control unit within the frame buffer controller chips interface; and
Fig. 9 is a block diagram of a second portion of the control unit within the frame buffer controller chips interface.

Detailed Description

I. System Overview

Fig. 2 is a block diagram of one embodiment of a graphics system that includes texture mapping hardware. The present invention is directed to an interface for buffering data transferred between boards. It should be understood that the illustrative implementation shown is merely exemplary with respect to the number of boards and chips, the manner in which they are partitioned, the bus widths, and the data transfer rates. Numerous other implementations can be employed.
As shown, the system includes a front end board 10, a texture mapping board 12, and a frame buffer board 14. The front end board communicates with a host computer 15 over a 52-bit bus 16. The front end board receives primitives to be rendered from the host computer over bus 16. The primitives are specified by x,y,z vector coordinate data, R,G,B color data and texture S,T coordinates, all for portions of the primitives, such as for the vertices when the primitive is a triangle. Data representing the primitives in three dimensions then is provided by the front end board 10 to the texture mapping board 12 and the frame buffer board 14 over 85-bit bus 18. The texture mapping board interpolates the primitive data received to compute the screen display pixels that will represent the primitive, and determines corresponding resultant texture data for each primitive pixel. The resultant texture data is provided to the frame buffer board over five 11-bit buses 28, which are shown in Fig. 2 as a single bus to clarify the figure. As will be described in detail herein, the present invention relates to an interface on the texture mapping board 12 for buffering resultant texture data destined for the frame buffer board.
The frame buffer board 14 also interpolates the primitive data received from the front end board 10 to compute the pixels on the display screen that will represent each primitive, and to determine object color values for each pixel. The frame buffer board then combines, on a pixel by pixel basis, the object color values with the resultant texture data provided from the texture mapping board, to generate resulting image R,G,B values for each pixel. R,G,B color control signals for each pixel are respectively provided over R,G,B lines 29 to control the pixels of the display screen (not shown) to display a resulting image on the display screen that represents the texture mapped primitive.
The front end board 10, texture mapping board 12 and frame buffer board 14 each is pipelined and operates on multiple primitives simultaneously. While the texture mapping and frame buffer boards operate on primitives previously provided by the front end board, the front end board continues to operate upon and provide new primitives until the pipelines in the boards 12 and 14 become full.
The front end board 10 includes a distributor chip 30, three three-dimensional (3-D) geometry accelerator chips 32A, 32B and 32C, a two-dimensional (2-D) geometry accelerator chip 34 and a concentrator chip 36. The distributor chip 30 receives the X,Y,Z coordinate and color primitive data over bus 16 from the host computer, and distributes 3-D primitive data evenly among the 3-D geometry accelerator chips 32A, 32B and 32C. In this manner, the system bandwidth is increased because three groups of primitives are operated upon simultaneously. Data is provided over 40-bit bus 38A to the 3-D geometry accelerator chips 32A and 32B, and over 40-bit bus 38B to chip 32C. Both buses 38A and 38B transfer data at a rate of 60 MHz and provide sufficient bandwidth to support two 3-D geometry accelerator chips. 2-D primitive data is provided over a 44-bit bus 40 to the 2-D geometry accelerator chip 34 at a rate of 40 MHz.
Each 3-D geometry accelerator chip transforms the x,y,z coordinates that define the primitives received into corresponding screen space coordinates, determines object R,G,B values and texture S,T values for the screen space coordinates, decomposes primitive quadrilaterals into triangles, and computes a triangle plane equation to define each triangle. Each 3-D geometry accelerator chip also performs view clipping operations to ensure an accurate screen display of the resulting image when multiple windows are displayed, or when a portion of a primitive extends beyond the view volume represented on the display screen. Output data from the 3-D geometry accelerator chips 32A, 32B and 32C respectively is provided over 44-bit buses 42A, 42B and 42C to concentrator chip 36 at a rate of 60 MHz. Two-dimensional geometry accelerator chip 34 also provides output data to concentrator chip 36 over a 46-bit bus 44 at a rate of 45 MHz. Concentrator chip 36 combines the 3-D primitive output data received from the 3-D geometry accelerator chips 32A-C, re-orders the primitives to the original order they had prior to distribution by the distributor chip 30, and provides the combined primitive output data over bus 18 to the texture mapping and frame buffer boards.
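The distribution and subsequent re-ordering of primitives may be illustrated by the following sketch. It assumes a round-robin distribution and a sequence number attached to each primitive so that the original order can be restored; neither mechanism is stated above, and the function names `distribute` and `concentrate` are hypothetical.

```python
import heapq


def distribute(primitives, num_accelerators=3):
    """Deal primitives evenly to the accelerators, tagging each with its order."""
    lanes = [[] for _ in range(num_accelerators)]
    for seq, prim in enumerate(primitives):
        lanes[seq % num_accelerators].append((seq, prim))
    return lanes


def concentrate(lanes):
    """Merge the per-accelerator outputs back into the original order."""
    # Each lane is already sorted by sequence number, so a k-way merge suffices.
    return [prim for _, prim in heapq.merge(*lanes)]


primitives = ["p0", "p1", "p2", "p3", "p4", "p5", "p6"]
lanes = distribute(primitives)
restored = concentrate(lanes)
```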
Texture mapping board 12 includes a texture mapping chip 46 and a local memory 48 which preferably is arranged as a cache memory. The local memory may be formed from a plurality of SDRAM (synchronous dynamic random access memory) chips. The cache memory 48 stores texture MIP map data associated with the primitives being rendered in the frame buffer board. The texture MIP map data is downloaded from a main memory 17 of the host computer 15, over bus 40, through the 2-D geometry accelerator chip 34, and over 24-bit bus 24.
The texture mapping chip 46 successively receives primitive data over bus 18 representing the primitives to be rendered on the display screen. As discussed above, the primitives provided from the 3-D geometry accelerator chips 32A-C include points, lines and triangles. The texture mapping board does not perform texture mapping of points or lines, and operates only upon triangle primitives. The data representing the triangle primitives includes the x,y,z object pixel coordinates for at least one vertex, the object color R,G,B values of the at least one vertex, the coordinates in S,T of the portions of the texture map that correspond to the at least one vertex, and the plane equation of the triangle. The texture mapping chip 46 ignores the object pixel z coordinate and the object color R,G,B values. The chip 46 interpolates the x,y pixel coordinates and interpolates S and T coordinates that correspond to each x,y screen display pixel that represents the primitive. For each pixel, the texture mapping chip accesses the portion of the texture MIP map that corresponds to the pixel from the cache memory, and computes resultant texture data for the pixel, which may include a weighted average of multiple texels.
The cache may store sixty-four blocks of 256x256 texels. Unlike the local memory employed in the texture mapping hardware of prior art systems, the cache memory may not store the entire series of MIP maps of the texture that maps to the primitive being rendered, such as for large textures. Rather, the cache memory stores at any one time only the particular portions of the series of MIP maps actually used in currently rendering the primitive. Therefore, for most applications, only a portion of the complete texture data for the image being rendered will be stored in the cache memory at any one time.
The complete series of MIP maps for each texture is arranged and stored in the main memory 17 of the host computer 15. For each pixel of the primitive being rendered, the texture mapping chip 46 accesses a directory of the cache memory 48 to determine whether the corresponding texel or texels of the texture MIP maps currently are present in the cache. If the corresponding texels are stored in the cache memory at the time of the access, then a cache hit occurs, and the texels are read from the cache and operated upon by the texture mapping chip 46 to compute the resultant texture data which is passed to the frame buffer board. If, however, the corresponding texels for the current primitive pixel are not stored in the cache memory when accessed by the texture mapping chip 46, a cache miss occurs. When a cache miss occurs, the portion of the texture MIP map data needed to render the primitive is downloaded from the main memory 17 of the host computer 15 into the cache memory 48, possibly replacing some data previously stored therein. Unlike conventional texture mapping systems that download the entire series of MIP maps for any primitive being rendered, the present system downloads only the portion of the series of MIP maps actually needed to currently render the primitive or the currently rendered portion thereof. When a cache miss occurs, an interrupt control signal is generated by the texture mapping chip 46 to initiate a texture interrupt manager in the host computer 15. The interrupt control signal is provided over line 94 to the distributor chip 30, which in turn provides an interrupt signal over line 95 to the host computer.
The requested texture data is retrieved by the host computer from its main memory and is downloaded to the cache memory 48 on the texture mapping board 12 over bus 24, bypassing the 3-D primitive rendering pipeline through the front end board and the texture mapping chip. Thus, when a cache miss interrupt occurs, the front end board can continue to operate upon 3-D primitives and provide output primitive data over bus 18 to the texture mapping chip and the frame buffer board, while the texture data associated with a primitive that caused the cache miss is being downloaded from main memory 17. In contrast to conventional texture mapping systems, the downloading of texture data to the texture mapping hardware does not require a flushing of the 3-D primitive pipeline, thereby increasing the bandwidth and performance of the system.
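The cache hit and miss behavior described above may be modeled as follows, for illustration only. The class name `TextureCache`, the use of a dictionary to stand for the main memory of the host computer, and the least-recently-used replacement rule are assumptions of this sketch; the description above states only that previously stored data may be replaced. The miss counter stands in for the interrupt sent to the host computer.

```python
from collections import OrderedDict


class TextureCache:
    """Toy model of a block cache backed by host main memory."""

    def __init__(self, host_memory, capacity_blocks=64):
        self.host_memory = host_memory       # stands in for main memory 17
        self.capacity = capacity_blocks      # sixty-four blocks in the example above
        self.directory = OrderedDict()       # block id -> texel data, oldest first
        self.misses = 0

    def read_block(self, block_id):
        if block_id in self.directory:       # cache hit
            self.directory.move_to_end(block_id)
            return self.directory[block_id]
        self.misses += 1                     # cache miss: would raise the interrupt
        if len(self.directory) >= self.capacity:
            self.directory.popitem(last=False)   # evict least recently used (assumed)
        data = self.host_memory[block_id]    # download only the needed block
        self.directory[block_id] = data
        return data
```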
According to the present invention, the resultant texture data is buffered within a frame buffer board interface on the texture mapping board in the texture mapping chip before being shifted to the frame buffer board. The resultant texture data for each pixel is stored temporarily in a RAM array (not shown) accessible by all five frame buffer controller chips 50A-50E on the frame buffer board. From the RAM array, resultant texture data is shifted into registers (not shown) accessible by all five frame buffer controller chips 50A-50E in parallel through buses 28. As described in detail below, an interface control unit (not shown) of the present invention coordinates the transfer of resultant texture data from the texture mapping chip 46 to the five frame buffer controller chips 50A-50E.
The frame buffer controller chips 50A-E respectively are coupled to groups of associated VRAM (video random access memory) chips 51A-E. The frame buffer board further includes four video format chips, 52A, 52B, 52C and 52D, and a RAMDAC (random access memory digital-to-analog converter) 54.
As described in more detail below, frame buffer controller chips control different, non-overlapping segments of the display screen. Each frame buffer controller chip receives primitive data from the front end board over bus 18, and resultant texture mapping data from the texture mapping board over bus 28. The frame buffer controller chips interpolate the primitive data to compute the screen display pixel coordinates in their respective segments that represent the primitive, and the corresponding object R,G,B color values for each pixel coordinate. For those primitives (i.e., triangles) for which resultant texture data is provided from the texture mapping board, the frame buffer controller chips combine, on a pixel by pixel basis, the object color values and the resultant texture data to generate final R,G,B values for each pixel to be displayed on the display screen.
The manner in which the object and texture color values are combined can be controlled in a number of different ways. For example, in a replace mode, the object color values can be simply replaced by the texture color values, so that only the texture color values are used in rendering the pixel. Alternatively, in a modulate mode, the object and texture color values can be multiplied together to generate the final R,G,B values for the pixel. Furthermore, a color control word can be stored for each texel that specifies a ratio defining the manner in which the corresponding texture color values are to be combined with the object color values. A resultant color control word can be determined for the resultant texel data corresponding to each pixel and provided to the frame buffer controller chips over bus 28 so that the controller chips can use the ratio specified by the corresponding resultant control word to determine the final R,G,B values for each pixel.
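The combining modes described above may be sketched as follows, purely for illustration. The sketch assumes color components normalized to the range 0 to 1 and interprets the ratio of the color control word as a linear mix between texture and object color; the function name `combine`, the mode name `"blend"`, and that interpretation of the ratio are assumptions and are not taken from the description above.

```python
def combine(object_rgb, texture_rgb, mode, ratio=None):
    """Combine object and texture colors into the final R,G,B values for a pixel."""
    if mode == "replace":
        # Only the texture color values are used.
        return tuple(texture_rgb)
    if mode == "modulate":
        # Object and texture color values are multiplied together.
        return tuple(o * t for o, t in zip(object_rgb, texture_rgb))
    if mode == "blend":
        # Assumed meaning of the control word ratio: fraction of texture color.
        return tuple(t * ratio + o * (1 - ratio)
                     for o, t in zip(object_rgb, texture_rgb))
    raise ValueError("unknown mode: %r" % (mode,))


object_rgb = (0.5, 0.5, 1.0)
texture_rgb = (0.5, 1.0, 0.0)
modulated = combine(object_rgb, texture_rgb, "modulate")
```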
The resulting image video data generated by the frame buffer controller chips 50A-E, including R,G,B values for each pixel, is stored in the corresponding VRAM chips 51A-E. Each group of VRAM chips 51A-E includes eight VRAM chips, such that forty VRAM chips are located on the frame buffer board. Each of video format chips 52A-D is connected to, and receives data from, a different set of ten VRAM chips. The video data is serially shifted out of the VRAM chips and is respectively provided over 64-bit buses 58A, 58B, 58C, and 58D to the four video format chips 52A, 52B, 52C and 52D at a rate of 33 MHz. The video format chips format the video data so that it can be handled by the RAMDAC and provide the formatted data over 32-bit buses 60A, 60B, 60C and 60D to RAMDAC 54 at a rate of 33 MHz. RAMDAC 54, in turn, converts the digital color data to analog R,G,B color control signals and provides the R,G,B control signals for each pixel to a screen display (not shown) along R,G,B control lines 29.
Hardware on the texture mapping board 12 and the frame buffer board 14 can be replicated so that certain primitive rendering tasks can be performed on multiple primitives in parallel, thereby increasing the bandwidth of the system. An example of such an alternate embodiment is shown in Fig. 2A, which is a block diagram of a computer graphics system having certain hardware replicated. The system of Fig. 2A includes four 3-D geometry accelerator chips 32A, 32B, 32C and 32D, two texture mapping chips 46A and 46B respectively associated with cache memories 48A and 48B, and ten frame buffer controller chips 50A-50J, each with an associated group of VRAM chips. The operation of the system of Fig. 2A is similar to that of the system of Fig. 2, described above. The replication of the hardware in the embodiment of Fig. 2A allows for increased system bandwidth because certain primitive rendering operations can be performed in parallel on multiple primitives.

II. Texture Mapping Chip Overview

A block diagram of the texture mapping chip 46 is shown in Fig. 3. The chip 46 includes a front end pipeline interface 60 that receives object and texture primitive data from the front end board over 64-bit bus 18. The triangle primitives operated upon by the texture mapping chip are defined by up to fifty-two 32-bit digital words but may be defined by words of different lengths. The pipeline interface includes a set of master registers and a set of corresponding slave registers. During rendering, the master registers are filled sequentially with the fifty-two digital words of data that define the primitive. Then, upon receipt of an appropriate rendering command, the data is shifted into the slave registers in the pipeline interface, allowing, in a pipelined fashion, the master registers to be filled with data representing another primitive. The primitive data provided over bus 18 includes the x,y,z vector coordinate data, the S,T texture coordinates and the R,G,B object color data for at least one triangle vertex, as well as data representing the triangle plane equation. As discussed above, the texture mapping chip ignores the object pixel z coordinate and the object color R,G,B values, and stores only the other data in the front end pipeline interface 60.
The slave registers of the pipeline interface 60 transfer the primitive data over bus 62 to a parameter interpolator circuit 64. Parameter interpolator circuit 64 interpolates each primitive triangle to determine, for each display screen pixel coordinate that represents the triangle, the S,T texture map coordinates for the texture map that maps to the pixel, and an S and T gradient value (ΔS, ΔT). The S and T gradients respectively equal changes in the S and T coordinates between adjacent pixels, and are computed in a manner discussed below.
The parameter interpolator circuit 64, shown in more detail in Fig. 4, includes an edge stepper 66, a FIFO ("first-in, first-out") buffer 68, a span stepper 70 and a gradient and perspective correction circuit 72, all connected in series. The edge stepper starts at the x,y pixel coordinate of one of the triangle vertices, and utilizing the triangle plane equation, steps the edges of the triangle to determine the pixel coordinates that define the triangle edges. For each pixel coordinate, texture map S and T coordinates are determined, based on the S,T values of the triangle vertices, to identify which texels in the texture map correspond to each display screen pixel coordinate. The pixel and texel coordinates temporarily are stored in the FIFO buffer and then are provided to the span stepper. At each x,y pixel location along an edge of the triangle, the span stepper steps across the corresponding span of the triangle to determine the S,T texel coordinates for each pixel location along the span.
Each S and T coordinate for a display screen pixel may have an integer portion and a fractional portion if the pixel does not map directly (in one-to-one correspondence) to a single texel in one of the series of MIP maps for the texture. As explained above, when mapped to the texture map, each display screen pixel may lie between multiple texels in one of the series of MIP maps for the texture, and furthermore, may lie between adjacent (in size) MIP maps in the series.
The gradient and perspective correction circuit 72 determines the gradient values of S and T (ΔS, ΔT) for each display screen pixel. In one embodiment, gradient ΔS is selected to be the larger of gradient ΔSx and gradient ΔSy, wherein gradient ΔSx is the change in the S coordinate in the texture map as coordinate x changes between adjacent pixels on the display screen, and gradient ΔSy is the change in the S coordinate as coordinate y changes between adjacent pixels. Gradient ΔT is similarly computed. The gradients ΔS, ΔT for a display screen pixel indicate the rate of change in coordinate position within the texture map for a change of one pixel on the display screen in the corresponding S,T dimension, and are used to determine which MIP map or maps should be accessed to provide the resultant texture data for the pixel. For example, a gradient equal to two for a display screen pixel indicates that the pixel maps to four (i.e., 2² as discussed below) texels, so that the MIP map reduced in size by two from the base map (e.g., the map 102 in Fig. 1) should be accessed to provide the resultant texture data for the pixel. Thus, as the gradient increases, the size of the MIP map that is accessed to provide the resultant texture data for the pixel is reduced.
A single gradient, equal to the larger of ΔS and ΔT, may be used to select the appropriate MIP map for each pixel, such that the gradient equals the largest of ΔSx, ΔSy, ΔTx, and ΔTy for the pixel. It should be understood, however, that the gradient can alternatively be selected in a different fashion, such as by selecting the smallest of those values, an average of those values, or some other combination. Since a single gradient is selected that indicates the rate of change in only one of the S,T coordinates, the square of the gradient represents the number of texels that map to the corresponding pixel.
From the gradient, the parameter interpolator determines the closest map to which the pixel maps, and a value indicating by how much the pixel varies from mapping directly to that map. The closest map is identified by the whole number portion of a map number, and the value indicating by how much the pixel varies from a direct mapping is identified by a fractional component of the map number.
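One way to picture the map number is the sketch below. It assumes that map n corresponds to a gradient of 2^n, that the whole-number portion is the lower of the two bracketing maps, and that the fractional component varies linearly between adjacent powers of two (so that a gradient of three lies halfway between the maps for gradients of two and four). The function names and the clamping of gradients below one are illustrative assumptions only.

```python
def select_gradient(ds_x, ds_y, dt_x, dt_y):
    """Single gradient: the largest of the four per-axis changes."""
    return max(ds_x, ds_y, dt_x, dt_y)

def map_number(gradient):
    """Return (whole map index, fractional component) for a gradient."""
    g = max(float(gradient), 1.0)   # assumed: gradients below 1 use the base map
    index = 0
    while 2 ** (index + 1) <= g:    # largest n with 2**n <= g
        index += 1
    lower = 2 ** index
    fraction = (g - lower) / lower  # assumed linear between 2**n and 2**(n+1)
    return index, fraction
```

Under these assumptions, a gradient of two selects map 1 exactly, and a gradient of three yields map 1 with a fractional component of one half.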
Referring again to the block diagram of the texture mapping chip in Fig. 3, the texel data output from the parameter interpolator circuit 64 is provided over line 70 to a tiler and boundary checker 72, which determines the address of the four texels that are closest to the position in each of the texture maps specified by the texel data, and checks to determine whether each is within the boundary of the texture. The texel data includes the interpolated S, T coordinates (integer and fractional values) as well as the map number and map fraction. The tiler uses the integer portion of the S and T coordinates computed by the parameter interpolator 64, and adds one to the integer portion of each to generate the addresses of the four closest texels. The boundary checker then determines whether the S,T coordinates for any of these four texels fall outside the boundary of the texture map. If a display screen pixel maps to an S,T coordinate position that falls outside the boundary of the texture map, then one of several texture mapping schemes is implemented to determine whether any resultant texture data is to be generated for that pixel, and how that data is to be generated. Examples of such schemes include wrapping (a repeat of the texture), mirroring (a repeat of the mirror image of the texture), turning off texture mapping outside the boundary, and displaying a solid color outside the boundary.
The capability of allowing a pixel to map to a location in a texture map that is beyond its boundary provides flexibility in the manner in which textures can be mapped to object primitives. It may be desirable to map a texture to an object in a repeating fashion, such that the texture is mapped to multiple portions of the object. For example, if a texture is defined having S,T coordinates ranging from [0, 0] inclusive through (10, 10) non-inclusive, a user could specify certain portions of the object to map to S,T coordinates [10, 10] inclusive through (20, 20) non-inclusive. The notation of the bracketed inclusive coordinates indicates that those coordinates are included in the portion of the texture mapped to the object, whereas the object maps to only the S,T coordinates up to but not including the non-inclusive coordinates in parentheses. If the wrapping feature is selected for S,T coordinates falling outside the boundary of the texture, pixels having S,T coordinates [10, 10] through (20, 20) would respectively map to the texels at S,T coordinates [0, 0] through (10, 10).
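The tiler and the wrapping scheme can be sketched as follows. This is a minimal illustration, assuming a square texture of `size` texels per side and integer texel addresses; the function names are hypothetical, and the mirroring rule shown (reflecting every other repetition) is one plausible reading of "a repeat of the mirror image" rather than a confirmed detail of the hardware.

```python
import math

def wrap(coord, size):
    """Wrapping: the texture repeats, so 10..19 maps back to 0..9 for size 10."""
    return coord % size

def mirror(coord, size):
    """Mirroring (assumed form): every other repetition is reflected."""
    period = coord % (2 * size)
    return period if period < size else 2 * size - 1 - period

def four_closest(s, t, size, address=wrap):
    """Addresses of the four closest texels: integer S,T and each plus one."""
    s0, t0 = math.floor(s), math.floor(t)
    return [(address(s0 + ds, size), address(t0 + dt, size))
            for dt in (0, 1) for ds in (0, 1)]
```

With wrapping selected and a texture spanning [0, 0] through (10, 10), a pixel at S = 9.5 draws on texel columns 9 and 0, which is the repeating behavior described above.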
As discussed above, the resultant texture data from a two-dimensional texture map for a single pixel may be the result of a combination of as many as eight texels, i.e., the four closest texels in the two closest MIP maps. There are a number of ways in which the eight texels can be combined to generate the resultant texel data. For example, the single closest texel in the closest map can be selected, so that no averaging is required. Alternatively, the single closest texel in each of the two closest maps can be averaged together based on the value of the gradient. Such schemes do not map the texture as accurately as when the eight closest texels are averaged.
Trilinear interpolation may be supported wherein the resultant texture data for a single pixel may be calculated as a weighted average of as many as eight texels. The gradient representing rates of change of S,T is used to identify the two closest MIP maps from which to access texture data, and the four closest texels within each map are accessed. The average of the four texels within each map is weighted based on which texels are closest to the S,T coordinates of the position in the MIP map that the display screen pixel maps to. The fractional portion of the S and T coordinates for the pixel are used to perform this weighting. The average value from each of the two closest MIP maps is then weighted based upon the value of the gradient. A fractional value is computed from the gradient for use in this weighting process. For example, a gradient of three is half-way between the MIP maps that respectively correspond to gradients of two and four.
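The two-stage weighting can be sketched for a single color channel as shown below. The sketch assumes conventional bilinear weights derived from the fractional S and T coordinates and a linear blend between the two maps; the argument order and function names are illustrative, and the same computation would be repeated for each of the R, G, B and α channels.

```python
def bilinear(texels, fs, ft):
    """Weighted average of four texels in one map.

    texels = (t00, t10, t01, t11), the values at (s, t), (s+1, t),
    (s, t+1) and (s+1, t+1); fs and ft are the fractional S and T parts.
    """
    t00, t10, t01, t11 = texels
    top = t00 * (1 - fs) + t10 * fs
    bottom = t01 * (1 - fs) + t11 * fs
    return top * (1 - ft) + bottom * ft

def trilinear(near, far, fs_near, ft_near, fs_far, ft_far, map_fraction):
    """Blend the bilinear results of the two closest MIP maps.

    map_fraction is the fractional value computed from the gradient
    (0.0 selects the near map only, 1.0 the far map only).
    """
    a = bilinear(near, fs_near, ft_near)
    b = bilinear(far, fs_far, ft_far)
    return a * (1 - map_fraction) + b * map_fraction
```

For a gradient of three, the map fraction in this sketch would be one half, so the two per-map averages contribute equally.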
The texel interpolation process is performed by the texel interpolators 76. The fractional portions of the S and T coordinates for each display screen pixel are provided from the parameter interpolators, through the tiler/boundary checker, to texel interpolator 76 over lines 74. The fractional portions are used by the texel interpolator to determine the weight afforded each texel during interpolation of the multiple texels when computing resultant texel data.
As discussed above, texture MIP maps associated with a primitive being rendered are stored locally in the cache memory 48 (Fig. 2). The cache may be fully associative. The cache may include eight SDRAM chips divided into four interleaves, with two SDRAM chips in each interleave. Four separate controllers are provided, with one corresponding to each interleave so that the SDRAM chips within each interleave can be accessed simultaneously. Each SDRAM chip includes two distinct banks of memory in which different pages of memory can be accessed in consecutive read cycles without incurring repaging penalties commonly associated with accessing data from two different pages (i.e., from two different row addresses) in a conventional DRAM.
The texture data (i.e., the MIP maps) may be divided into texel blocks of data that each includes 256x256 texels. The cache memory can store as many as sixty-four blocks of data at one time. Each block has an associated block tag that uniquely identifies the block. The cache includes a cache directory 78 that stores the block tags that correspond to the blocks of data currently stored in the cache. Each block tag includes a texture identifier (texture ID) that identifies the particular texture that the block of data represents, a map number that identifies the particular MIP map within the texture's series of maps that the block of data represents, and high-order S and T coordinates that identify the location of the block of data within the particular map. The physical location of the block tag within the cache directory represents the location of the corresponding block of data within the cache memory.
MIP maps from more than one texture may be stored in the cache memory simultaneously, with the texture identifier distinguishing between the different textures. Some MIP maps contain fewer than 256x256 texels, and therefore, do not consume an entire block of data. For example, the smaller maps in a series of MIP maps or even the larger maps for small textures may not exceed 256x256 texels. To efficiently utilize memory space, portions of multiple maps may be stored in a single block of texture data, with each map portion being assigned to a sub-block within the block. Each of the multiple maps stored within a single block has an associated sub-texture identifier (ID) that identifies the location of the map within the block.
During rendering, the tiler/boundary checker 72 generates a read cache tag for the block of texture data that maps to the pixel to be rendered. The tags are 23-bit fields that include eight bits representing the texture ID of the texture data, a bit used in determining the map number of the texture data, and the seven high-order S and T coordinates of the texture data. The cache directory 78 compares the read cache tag provided from the tiler/boundary checker with the block tags stored in the directory to determine whether the block of texture data to be used in rendering is in the cache memory. If the block tag of the texture data that maps to the primitive to be rendered is stored in (i.e., hits) the cache directory, then the cache directory generates a block index that indicates the physical location of the block of texture data in the cache that corresponds to the hit tag. A texel address is also generated by the tiler/boundary checker 72 for each texel to be read from the cache and indicates the location of the texel within the block. The texel address includes low-order address bits of the interpolated S,T coordinates for larger size maps, and is computed based on an algorithm described below for smaller size maps. The block index and texel address together comprise the cache address which indicates the location of the texel within the cache. The LSBs of the S and T coordinates for each texel are decoded to determine in which of four cache interleaves the texel is stored, and the remaining bits of the cache address are provided to the texel cache access circuit 82 along with a command over line 84 to read the texel data stored at the addressed location in the cache.
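The tag comparison can be modeled in a few lines. The sketch below assumes a particular field order within the 23-bit tag and a particular mapping of the S and T least significant bits to the four interleaves; neither is stated in the text, and the names `make_tag`, `CacheDirectory` and `interleave` are hypothetical.

```python
def make_tag(texture_id, map_bit, s_high, t_high):
    """Pack a 23-bit tag: 8-bit texture ID, 1 map bit, 7-bit S, 7-bit T.

    The field order is an assumption made for illustration.
    """
    assert 0 <= texture_id < 256 and map_bit in (0, 1)
    assert 0 <= s_high < 128 and 0 <= t_high < 128
    return (texture_id << 15) | (map_bit << 14) | (s_high << 7) | t_high

class CacheDirectory:
    """Directory of block tags; a tag's position is its block index."""

    def __init__(self, num_blocks=64):
        self.tags = [None] * num_blocks

    def lookup(self, tag):
        """Return the block index on a hit, or None on a miss."""
        for index, stored in enumerate(self.tags):
            if stored == tag:
                return index
        return None

def interleave(s, t):
    """Select one of four interleaves from the S and T LSBs (assumed mapping)."""
    return ((t & 1) << 1) | (s & 1)
```

A miss (a `None` result here) corresponds to the case in which the directory raises an interrupt so that the host can download the missing block.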
When the read cache tag does not match any of the block tags stored in the cache directory 78, a miss occurs and the cache directory 78 generates an interrupt control signal over line 94 (Fig. 2) to the distributor chip 30 on the front end board, which generates an interrupt over line 95 to the host computer 15. In response to the interrupt, the processor 19 of the host computer executes a service routine which reads the missed block tag from the cache directory and downloads the corresponding block of texture data into the cache memory in a manner that bypasses the 3-D primitive pipeline in the front end board 10 and the texture mapping chip 46. The texture data downloaded from the main memory is provided over bus 24, through the texel port 92 (Fig. 3) to the texel cache access circuit 82, which writes the data to the SDRAMs that form the cache memory.
When a cache miss occurs, the texture mapping chip waits for the new texture data to be downloaded before proceeding with processing the primitive on which the miss occurred. The stages of the pipeline that follow the cache read continue to process those primitives received prior to the miss primitive. Similarly, the stages of the pipeline that precede the cache read also continue to process primitives unless and until the pipeline fills up behind the cache read operation while awaiting the downloading of the new texture data.
During rendering, the later stages of the pipeline in the frame buffer board 14 do not proceed with processing a primitive until the texture data corresponding to the primitive is received from the texture mapping board. Therefore, when a cache miss occurs and the texture mapping chip waits for the new texture data to be downloaded, the frame buffer board 14 similarly waits for the resultant texture data to be provided from the texture mapping chip. As with the texture mapping chip, the stages of the pipeline that follow the stage that receives the texture mapping data continue to process those primitives received prior to the miss primitive, and the stages of the pipeline that precede the stage that receives texture mapping data also continue to process primitives unless and until the pipeline fills up.
It should be understood that when the pipeline of either the texture mapping board or the frame buffer board backs up when waiting for new texture data in response to a cache miss, the pipeline in the front end board 10 will similarly back up. Because cache misses will occur and will result in an access to the host computer main memory and a downloading of texture data that will take several cycles to complete, it is desirable to ensure that the pipeline in the texture mapping chip never has to wait because the pipeline in the frame buffer board has become backed up. Therefore the frame buffer board can be provided with a deeper primitive pipeline than the texture mapping board, so that the texture mapping pipeline should not be delayed by waiting for the frame buffer pipeline to become available.
In one embodiment, the capability is provided to turn off texture mapping. This is accomplished by software operating on the processor 19 of the host computer to set a register in both the texture mapping board 12 and the frame buffer board 14. When set to turn off texture mapping, these registers respectively inhibit the texture mapping chip 46 from providing texture data to the frame buffer board 14, and instruct the frame buffer board to proceed with rendering primitives without waiting for texture data from the texture mapping board.
As described above, for each display screen pixel that is rendered with texture data from a two-dimensional texture map, as many as four texels from one MIP map (bilinear interpolation) or eight texels from two adjacent MIP maps (trilinear interpolation) may be accessed from the cache memory to determine the resultant texture data for the pixel. The texels read from the cache are provided over bus 86 (Fig. 3) to the texel interpolator 76, which interpolates the multiple texels to compute resultant texel data for each pixel. The interpolation can vary depending upon a mode established for the system. When a point sampling interpolation mode is established, the resultant texel data equals the single texel that is closest to the location defined by the pixel's S,T coordinates in the texture map. Alternatively, when bilinear or trilinear interpolation is employed, the resultant texel data is respectively a weighted average of the four or eight closest texels in the one or two closest maps. The weight given to each of the multiple texels is determined based upon the value of the gradient and the fractional components of the S and T coordinates provided to the texel interpolator 76 from the tiler/boundary checker.
The resultant texel data for the display screen pixels is sequentially provided over bus 88 to a frame buffer interface array storage unit 90. The frame buffer interface array storage unit 90 can, in one embodiment of the invention, temporarily store up to sixty-four resultant texels, as explained in greater detail below. In accordance with one aspect of the invention, texels are shifted out of the array storage unit 90 into intermediate registers from which the frame buffer controller chips 50A-50E (see Fig. 2) can simultaneously and in parallel access the texels.
Each resultant texel is a 32-bit word including eight bits to represent each of R,G,B and α. The α byte indicates to the frame buffer board 14 (Fig. 2) the manner in which the R,G,B values of the resultant texture data should be combined with the R,G,B values of the object data generated by the frame buffer board in computing final display screen R,G,B values for any pixel that maps to the texel. The frame buffer interface array storage unit outputs T0-T4 are provided to the frame buffer board 14 (Fig. 2) over bus 28. The frame buffer board combines the R,G,B values of the resultant texel data with the object R,G,B values in the manner specified by α to generate final R,G,B values for each display screen pixel.

III. Screen Space Interleaving for Frame Buffer Controller Chips

Fig. 5 is a block diagram illustrating how the screen space is divided among the five frame buffer controller chips 50A-50E (see Fig. 2). A portion of the screen 100 is shown in Fig. 5. In one embodiment, the screen comprises 1280 pixels horizontally and 1024 pixels vertically. An interleave of the screen space is defined to be a portion of a contiguous screen space (including multiple pixels) rendered by one and only one frame buffer controller chip. In this embodiment, an interleave includes two scan lines vertically and 16 pixels wide, including a total of 32 pixels per interleave. The interleaves shown in Fig. 5 are labeled by A, B, C, D and E respectively corresponding to frame buffer controller chips 50A, 50B, 50C, 50D and 50E. Only a portion of the total screen space is illustrated in Fig. 5 including ten interleaves horizontally and four interleaves vertically.
It should be understood that the number of horizontal pixels (1280) in this exemplary embodiment is divisible by five (the number of frame buffer controller chips in this embodiment). Thus, the interleaves are distributed evenly among the five frame buffer controller chips 50A-50E. As a scan line moves horizontally across the screen space, the pattern of interleaves, A, B, C, D and E, repeats itself. The purpose behind such an arrangement is so that pixels within adjacent interleaves will be rendered by distinct frame buffer controller chips.
Thus, it would appear that the greatest number of pixels (or corresponding texels) required to be processed by any one frame buffer controller chip at any one time would be 32. It should be understood, however, that in certain circumstances, interleaves which are diagonally adjacent to one another are assigned to the same frame buffer controller chip and, thus, a worst-case scenario would require a single frame buffer controller chip to render 64 pixels within one primitive (triangle). Such a scenario is illustrated in Fig. 6 in which a very small portion of the screen space 100 is illustrated including only four interleaves. As shown, the interleaves (A) assigned to frame buffer controller chip 50A are diagonally adjacent to one another.
A portion of a triangle to be rendered is shown by dotted lines 102 and covers all 64 pixels within diagonally adjacent interleaves (A) assigned to frame buffer controller chip 50A as well as eight pixels within interleave (B) and eight pixels within interleave (E). Thus, the total number of pixels (and corresponding texels) required to be rendered for this particular triangle includes 80, of which 64 pixels are required to be rendered by the same frame buffer controller chip 50A. For this particular screen space arrangement (shown in Fig. 5), this is a worst-case scenario.
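The assignment of pixels to controller chips can be sketched as follows. The interleave size (16 pixels by two scan lines) and the repeating A through E pattern come from the description; the per-row offset used here is an assumption, chosen only because it reproduces the arrangement described for Fig. 6 (A and B on one interleave row, E and A on the next). The actual layout of Fig. 5 may differ.

```python
INTERLEAVE_WIDTH = 16    # pixels per interleave, horizontally
INTERLEAVE_HEIGHT = 2    # scan lines per interleave
CHIPS = "ABCDE"          # frame buffer controller chips 50A-50E

def controller_for_pixel(x, y):
    """Return the letter of the chip that renders screen pixel (x, y)."""
    col = x // INTERLEAVE_WIDTH
    row = y // INTERLEAVE_HEIGHT
    # Assumed row offset: each interleave row is shifted by one position,
    # which makes some diagonally adjacent interleaves share a chip.
    return CHIPS[(col - row) % len(CHIPS)]

def pixels_for_chip(chip, width, height):
    """Count the pixels of a width-by-height region assigned to one chip."""
    return sum(1 for y in range(height) for x in range(width)
               if controller_for_pixel(x, y) == chip)
```

Under this assumed layout, a region two interleaves wide and two interleaves high (32 by 4 pixels) contains 64 pixels for chip A, matching the worst case discussed above.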

IV. Texture Mapping Board/Frame Buffer Board Interface

The data interface between the texture mapping chip and the multiple frame buffer controller chips was designed to minimize cost and power by reducing the number of logic gates and reducing the silicon area. As previously discussed with reference to Fig. 2, the last stage of the pipeline inside the texture mapping chip outputs a resultant four-byte (32-bit) texel to one of five frame buffer controller chips, which frame buffer controller chips reside on a different printed circuit board than the texture mapping chip. The interface resides on the texture mapping board in the texture mapping chip and is included in this last pipeline stage.
The interface enables the parallel transmission of texels from the texture mapping chip to all five frame buffer controller chips. As discussed in more detail below, the texel pipeline as it enters the interface is one texel (four bytes) wide whereas the pipeline output (to each frame buffer controller chip) is only one byte wide. Therefore, in order to maintain all five frame buffer controller chips busy simultaneously, the interface was designed to include a texel buffering system. The buffering system minimizes the amount of storage required.
As discussed above with reference to Fig. 3, the texel interpolator 76 outputs a four-byte, 32-bit texel to the frame buffer interface storage array 90 every 45 MHz state. Each 32-bit texel has an additional 3-bit field associated with it which indicates to which of the five frame buffer controller chips the texel is destined. After buffering the texel within the frame buffer interface storage array 90, the texture mapping chip interface provides the texel to the appropriate frame buffer controller chip serially, one byte (8 bits) every 45 MHz state. Transferring one byte at a time reduces the number of pins required for the texture mapping chip interface almost by a factor of four, but increases the total number of states to four in which to transfer the entire four-byte texel.
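The byte-serial transfer can be illustrated with the sketch below, which splits a 32-bit texel into the four bytes sent over four consecutive states and reassembles them on the receiving side. The most-significant-byte-first order and the function names are assumptions; the text does not state the order in which the bytes are sent.

```python
def texel_to_bytes(texel):
    """Split a 32-bit texel into four bytes, one per state (MSB first, assumed)."""
    return [(texel >> shift) & 0xFF for shift in (24, 16, 8, 0)]

def bytes_to_texel(byte_seq):
    """Reassemble a texel from four bytes received in the same order."""
    value = 0
    for b in byte_seq:
        value = (value << 8) | b
    return value
```

The trade-off is visible in the list length: eight data pins per port instead of thirty-two, at the cost of four states per texel.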
As discussed above with reference to Fig. 6, 80 pixels (and 80 corresponding texels) is the minimum number of pixels (and texels) (from a portion of a triangle, for example) of which 64 pixels (and 64 texels) must be rendered by the same frame buffer controller chip. As described, the input to the interface receives one texel each 45 MHz state. Thus, the interface receives 80 texels over 80 states. In the 80 states, the interface can output as many as 20 texels to a single frame buffer controller chip. This is so because it requires four states (one byte per state) for the interface to output a single texel. Therefore, only 20 of the 64 texels required to be rendered by the single frame buffer controller chip can be output to that frame buffer controller chip within the 80 states. Thus, 44 texels remain in the interface to be buffered.
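The arithmetic in this paragraph can be checked with a short calculation. The sketch assumes, as the text does, that one texel arrives per state and that a single port needs four states to send one texel; the function name and parameters are illustrative.

```python
STATES_PER_TEXEL_OUT = 4  # one byte per state, four bytes per texel

def worst_case_backlog(total_texels, texels_for_one_chip):
    """Texels still buffered for one chip after all texels have arrived."""
    states = total_texels                     # one texel arrives per state
    drained = states // STATES_PER_TEXEL_OUT  # texels one port can send meanwhile
    return texels_for_one_chip - min(drained, texels_for_one_chip)
```

For the scenario of Fig. 6, `worst_case_backlog(80, 64)` gives 44, the buffering requirement stated above.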
A possible way to organize the interface to the frame buffer controller chips on the texture mapping chip is as follows. A texel from the texel interpolator 76 arrives at the interface once per 45 MHz state. A five-way multiplexer, residing within the last stage of the interpolator, then would guide the texel into one of five FIFO (first in, first out) storage buffers, depending upon to which of the five frame buffer controller chips the texel is destined. Each of the FIFO storage buffers would correspond to a different frame buffer controller chip and each would be at least 44 texels deep; 48 texels of depth is selected herein for example. Remember that 44 texels is the maximum number required to be buffered at any one time for a single frame buffer controller chip. A state machine then would "handshake" between each FIFO buffer and the corresponding frame buffer controller chip to unload the texel from the FIFO buffer to the frame buffer controller chip, one byte at a time.
This possible solution requires a significant amount of storage. Specifically, having five FIFOs, each with 48 texels deep of storage capacity, yields the following amount of storage: (48 texels) ∗ (4 bytes/texel) ∗ (8 bits/byte) ∗ (5 FIFOs) = 7680 storage cells.
Fig. 7 is a block diagram illustrating one embodiment of the interface according to the present invention, which embodiment uses less storage space than the possible solution discussed above. As illustrated, the interface includes a RAM array 90 that temporarily stores texels, five address FIFO buffers 114A-114E that store addresses of texels stored in the RAM array, and registers 120A-120E and registers 124A-124E through which the texels are shifted en route to frame buffer controller chips 50A-50E respectively. The interface also includes a control unit 110 that controls the shifting of data throughout such interface elements.
The texels output by interpolator 76 (see Fig. 3) are provided along bus 88 to 64-texel deep RAM array 90, which RAM array is shared by all five frame buffer controller chips 50A-50E. As described in more detail below, each frame buffer controller chip port uses one of the four states to unload one of its texels from the shared RAM array into a temporary register 120A-120E. During the remaining three states, the other frame buffer controller chip ports may access the RAM array 90 as needed. Each of the five address FIFO buffers 114A-114E corresponds to one of the frame buffer controller chips 50A-50E. Each address FIFO buffer 114A-114E can store up to 48 six-bit words. Each six-bit word stored is an address denoting one of the 64 locations within the RAM array 90 in which a texel is stored. The address FIFO buffer 114A, 114B, 114C, 114D or 114E that stores the address of a particular texel is the address FIFO buffer associated with the frame buffer controller chip to which the texel is destined.
As described in more detail below, control unit 110 controls storing texels received from the texel interpolator within the RAM array 90 and corresponding addresses within the address FIFO buffers 114A-114E, and also controls unloading texels from the RAM array 90 through the intermediate registers and into one of the frame buffer controller chips 50A-50E.
When a texel is provided by the texel interpolator to the interface, the 32-bit texel is provided along bus 88 to a data input of the RAM array 90. The 3-bit frame buffer controller number word is decoded in the interpolator to determine to which of the frame buffer controller chips the texel is destined. Five one-bit signals are provided along buses 106 to the control unit with only one of the five signals being asserted at any one time. The one asserted signal corresponds to the frame buffer controller chip to which the texel is destined. As will be described in more detail below, the control unit 110 also determines which locations within the RAM array 90 are empty and selects an empty location in which to load the texel. The address of this location is written to the appropriate FIFO buffer 114A, 114B, 114C, 114D or 114E through one of 6-bit buses 112A, 112B, 112C, 112D or 112E. That address also is provided along bus 112 to an address input of the RAM array 90 so that the texel can be loaded into the appropriate location within the RAM array.
As shown in Fig. 7, also included in the interface are five 32-bit registers 120A-120E, one register corresponding to each frame buffer controller chip. Coupled between each 32-bit register 120A-120E and the corresponding frame buffer controller chip 50A-50E is a corresponding 8-bit register 124A-124E. As is described in more detail below, when (1) one of the registers 120A, 120B, 120C, 120D or 120E is available to receive resultant texel data, (2) data corresponding to that frame buffer controller chip presently is stored in the RAM array 90, and (3) that register 120A, 120B, 120C, 120D or 120E is of highest priority among those available, then a texel is unloaded from the RAM array 90 along bus 118 to that register 120A, 120B, 120C, 120D or 120E. This transfer occurs during one state. The control unit 110 communicates with each of the FIFO buffers 114A-114E to determine whether any currently store an address of a texel stored within the RAM array 90. The addresses are transferred to the control unit 110 along buses 108A-108E and then to the address input of the RAM array 90 along bus 112. The control unit 110 also communicates with each of the registers 120A-120E to determine whether any is available to receive data along bus 116.
After a texel is written to one of the registers 120A, 120B, 120C, 120D or 120E, that texel is shifted into the respective one of the intermediate registers 124A, 124B, 124C, 124D or 124E, one byte at a time over bus 122. Then, each byte of the texel is shifted from the intermediate register 124A, 124B, 124C, 124D or 124E to the corresponding frame buffer controller chip 50A, 50B, 50C, 50D or 50E over bus 28A, 28B, 28C, 28D or 28E. Thus, it takes four states for the texel to be shifted from one of the registers 120A, 120B, 120C, 120D or 120E to the corresponding frame buffer controller chip 50A, 50B, 50C, 50D or 50E. It should be appreciated that once a texel is shifted from the RAM array 90 to one of the registers 120A, 120B, 120C, 120D or 120E, another one of the registers 120A, 120B, 120C, 120D or 120E can access texels from the RAM array. Thus, during three of the four states required for transferring a texel from the RAM array to one of the frame buffer controller chips, other registers can access the RAM array.
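The data path of Fig. 7 can be modeled in software as a shared pool of texel slots plus one queue of slot addresses per port. The sketch below is a simplified behavioral model under stated assumptions: it ignores state-level timing, the priority arbitration among ports, and the 32-bit and 8-bit registers, and the class and method names are hypothetical.

```python
from collections import deque

class TexelInterface:
    """Shared RAM array with one address FIFO per frame buffer controller."""

    def __init__(self, ram_size=64, ports=5, fifo_depth=48):
        self.ram = [None] * ram_size               # None marks an empty location
        self.fifos = [deque() for _ in range(ports)]
        self.fifo_depth = fifo_depth

    def store(self, texel, port):
        """Place a texel in an empty location and queue its address for port.

        Returns False if the port's address FIFO or the RAM array is full.
        """
        if len(self.fifos[port]) >= self.fifo_depth:
            return False
        for address, slot in enumerate(self.ram):
            if slot is None:
                self.ram[address] = texel
                self.fifos[port].append(address)
                return True
        return False

    def unload(self, port):
        """Remove and return the oldest texel queued for port, or None."""
        if not self.fifos[port]:
            return None
        address = self.fifos[port].popleft()
        texel, self.ram[address] = self.ram[address], None
        return texel
```

Because only six-bit addresses are queued per port, the texel data itself is stored once, which is the source of the storage saving compared with five full-width FIFOs.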
The embodiment shown in Fig. 7 requires much less storage than the previously-mentioned design. The total storage includes (64 texels) ∗ (4 bytes/texel) ∗ (8 bits/byte) + (48 addresses) ∗ (6 bits/address) ∗ (5 FIFOs) = 3488 storage cells.
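The storage figure can be checked with a few lines of Python. The constant names are illustrative only; the values are those given in the text.

```python
# Values taken from the storage estimate in the text; names are illustrative.
TEXELS = 64               # locations in the RAM array
BYTES_PER_TEXEL = 4
BITS_PER_BYTE = 8
ADDRESSES_PER_FIFO = 48
BITS_PER_ADDRESS = 6      # 6 bits address 64 locations
NUM_FIFOS = 5             # one address FIFO per frame buffer controller chip

ram_cells = TEXELS * BYTES_PER_TEXEL * BITS_PER_BYTE
fifo_cells = ADDRESSES_PER_FIFO * BITS_PER_ADDRESS * NUM_FIFOS
total_cells = ram_cells + fifo_cells

print(ram_cells, fifo_cells, total_cells)  # 2048 1440 3488
```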

V. Control Unit

As previously described, the control unit 110 includes a first portion that controls transferring resultant texel data from the texel interpolator to the RAM array 90, and a second portion that controls transferring the texels from the RAM array 90 through the registers 120A-120E to the frame buffer controller chips 50A-50E. Fig. 8 is a diagram showing the first portion of the control unit 110. The first portion of the control unit 110 is shown surrounded by a dotted line for ease of illustration. As shown, part of the first portion of control unit 110 is located in the last pipelined stage of the interpolator 76.
When the texel interpolator is ready to provide a texel to the interface, the texel interpolator 76 provides five one-bit valid signals along lines 143A-143E, only one of which is asserted at any one time. The texel then is provided along bus 88 to a data input of the RAM array 90. As shown, each one-bit valid signal is provided as an input to logical OR gate 145, the output of which is provided on line 128 as a write-enable input to the RAM array 90 and as an input to vector register 134. Vector register 134, in this embodiment, stores a 64-bit word, each bit corresponding to a different location in the RAM array 90. When a texel is stored in a particular location within the RAM array 90, the corresponding bit of the vector register is asserted. Similarly, when a texel is unloaded from the RAM array 90, the corresponding bit is cleared.
Vector register 134 and encoder 137 together select an empty location within the RAM array in which the texel is to be stored. When the vector register 134 receives the logical OR result of the five valid signals as an input, indicating that a texel is ready to be loaded within the RAM array from the texel interpolator 76, the vector register 134 outputs the 64-bit word along bus 135 to encoder 137. Encoder 137 then selects one of the empty locations within the RAM array by locating the first zero bit within the vector word. Alternatively, the encoder 137 may select a zero bit at random from the 64-bit vector word. The 6-bit address associated with that location then is output by the encoder 137 along bus 112 to the address input of the RAM array. This 6-bit address also is provided to vector register 134 so that the appropriate bit, corresponding to the addressed location, within the vector word can be set. The valid signal is provided to the write-enable input of the RAM array 90 and the texel data, provided along bus 88, then is stored in the appropriate location within the RAM array.
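The cooperation of the vector register and the encoder amounts to a find-first-zero allocator over a 64-bit occupancy word. The following Python sketch illustrates the idea; the function names are hypothetical, and the mapping of bit i to RAM address i, searched from the least significant bit, is an assumption not stated in the text.

```python
from typing import Optional

RAM_SLOTS = 64  # one occupancy bit per RAM array location


def first_free_slot(vector: int) -> Optional[int]:
    """Return the address of the first zero bit, or None if the RAM is full.

    Assumption: bit i of the vector word tracks RAM address i, and the
    search starts at bit 0.
    """
    for address in range(RAM_SLOTS):
        if not (vector >> address) & 1:
            return address
    return None


def mark_used(vector: int, address: int) -> int:
    """Set the occupancy bit when a texel is stored at the address."""
    return vector | (1 << address)


def mark_free(vector: int, address: int) -> int:
    """Clear the occupancy bit when the texel is unloaded."""
    return vector & ~(1 << address)
```

Because any free location is acceptable, texels for different frame buffer controller chips can be interleaved freely in the RAM array; the per-chip ordering is preserved by the address FIFO buffers rather than by the storage layout.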
The first portion of the control unit 110 also includes a decoder 126, five logical AND gates 132A-132E, one AND gate corresponding to each address FIFO buffer 114A-114E, and five corresponding registers 141A-141E. When the texel is written to a particular location within the RAM array 90, the address of the location within the RAM array 90 also is written to one of the five address FIFO buffers 114A-114E. The address is written to the one address FIFO buffer that corresponds to the particular frame buffer controller chip to which the texel is destined. The decoder 126 and AND gates 132A-132E determine to which address FIFO buffer 114A-114E the address will be written.
The 3-bit frame buffer number is provided from earlier pipelined stages of the interpolator 76 along bus 106 as an input to decoder 126. Decoder 126 decodes the 3-bit frame buffer number to determine to which of the frame buffer controller chips the texel is destined, and outputs five one-bit words along buses 130A-130E. Only one of the one-bit words output from decoder 126 will be asserted and that one asserted word corresponds to the frame buffer controller chip to which the texel is destined. The one-bit words are provided along buses 130A-130E respectively to AND gates 132A-132E. Also, a valid signal is provided from earlier pipelined stages of the interpolator along bus 128 as an input to each of the AND gates 132A-132E, indicating that the earlier pipelined stages of the interpolator have resultant texture data to store in the RAM array. Thus, each of the five output signals from the decoder is ANDed logically with the valid bit provided from the earlier pipelined stages of the texel interpolator. If the earlier pipelined stages of the texel interpolator have asserted the valid bit, indicating that valid resultant texel data can be provided, then only one of the valid signal outputs from the AND gates 132A-132E will be asserted. That one valid signal output will correspond to the frame buffer controller chip to which the texel is destined.
The valid signal outputs of the AND gates 132A-132E are provided along buses 139A-139E respectively to registers 141A-141E. From registers 141A-141E, the valid signals are provided to OR gate 145 and to the write enable inputs of the address FIFO buffers 114A-114E along buses 143A-143E. Thus, only one of the address FIFO buffers 114A, 114B, 114C, 114D or 114E, corresponding to the frame buffer controller chip to which the texel is destined, will be write-enabled. The 6-bit address word representing the location within the RAM array in which the texel will be stored is provided to all of the address FIFO buffers 114A-114E along buses 112A-112E. Because only one of the address FIFO buffers is write-enabled, the 6-bit address will be written only to that address FIFO buffer.
The resultant texture data and corresponding address respectively will be written to the RAM array and appropriate address FIFO buffer only if the following three conditions are met: 1) the interpolator has valid resultant texture data; 2) the RAM array is not full to its capacity; and 3) the appropriate address FIFO buffer is not full to its capacity.
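The write path and its three conditions can be summarized in a short behavioral model. This is a sketch under stated assumptions, not the circuit itself: the class and method names are hypothetical, the FIFO depth of 48 is taken from the storage estimate given earlier, and the pipelining introduced by registers 141A-141E is ignored.

```python
from collections import deque
from typing import List, Optional

NUM_CHIPS = 5      # frame buffer controller chips 50A-50E
RAM_SLOTS = 64     # locations in the RAM array
FIFO_DEPTH = 48    # addresses per FIFO, from the storage estimate


class WriteSide:
    """Behavioral model of the first portion of the control unit."""

    def __init__(self) -> None:
        self.ram: List[Optional[int]] = [None] * RAM_SLOTS
        self.occupied = 0  # 64-bit occupancy word (vector register)
        self.fifos = [deque() for _ in range(NUM_CHIPS)]

    def decode(self, chip_number: int, valid: bool) -> List[bool]:
        """Decoder plus AND gates: at most one write enable is asserted."""
        return [valid and chip_number == i for i in range(NUM_CHIPS)]

    def store(self, texel: int, chip_number: int, valid: bool = True) -> Optional[int]:
        """Store a texel and its address; return the address or None."""
        enables = self.decode(chip_number, valid)
        if not any(enables):                      # condition 1: valid data
            return None
        chip = enables.index(True)
        free = next((a for a in range(RAM_SLOTS)
                     if not (self.occupied >> a) & 1), None)
        if free is None:                          # condition 2: RAM not full
            return None
        if len(self.fifos[chip]) >= FIFO_DEPTH:   # condition 3: FIFO not full
            return None
        self.ram[free] = texel
        self.occupied |= 1 << free
        self.fifos[chip].append(free)             # only the enabled FIFO is written
        return free
```

For example, two texels stored for the same chip land at addresses 0 and 1, and that chip's address FIFO then holds [0, 1], so the texels are later unloaded in arrival order.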
The portion of the control unit 110 that controls unloading texels from the RAM array 90 is shown in block diagram form in Fig. 9. That portion of the control unit 110 is shown surrounded by broken lines. The portion of the control unit shown in Fig. 9 can be considered as including a state machine associated with each of the five frame buffer controller chip interfaces. On each clock state, one of the five state machines will be allowed to unload a texel from the RAM array 90. Which of the five is selected depends on three conditions. First, the address FIFO buffer associated with that frame buffer controller chip must not be empty, indicating that that frame buffer controller chip has a texel destined for it stored presently in the RAM array 90. Second, the particular frame buffer controller chip interface must be idle, not busy shifting out a previous texel to its associated frame buffer controller chip and not associated with a "halted" texel due to its unavailability to receive texel data. Third, the frame buffer controller chip interface must have the highest priority among the available interfaces to unload a texel, as determined by a round-robin priority scheme described in more detail below.
If the three above conditions are met, the second portion of the control unit, shown in Fig. 9, operates to unload the address from one of the address FIFO buffers 114A, 114B, 114C, 114D or 114E into address register 152 on the first clock state. On the next clock state, this registered address will be input along bus 112 to the address input of RAM array 90 to access the corresponding location within the RAM array, and the texel will be unloaded from the RAM array 90 on bus 118 and along buses 118A-118E to each of the registers 120A-120E. As will be explained herein below, only one of the registers 120A, 120B, 120C, 120D or 120E, corresponding to the frame buffer controller chip to which the texel is destined, will be load-enabled allowing the texel to be loaded only into that register.
The round-robin priority scheme operates as follows. The control unit 110 includes five 5-bit priority counters 154A-154E. Each priority counter is associated with a respective frame buffer controller chip. Each 5-bit priority counter consists of two fields: a priority value and a priority state. The priority value consists of the three most significant bits of the priority counter and can be any number between 0 and 4, 4 being the highest priority; the priority state consists of the remaining two least significant bits.
The priority state increments by one within each priority counter during each clock cycle. The priority counter is free running. Therefore, the priority value increments by one every four clock cycles. The priority value from each counter 154A, 154B, 154C, 154D and 154E is output along buses 156A, 156B, 156C, 156D and 156E respectively, to priority decoder 160. The priority decoder 160 outputs five signals called priority acknowledge along buses 162A-162E. Only one of the priority acknowledge signals will be set at any one time. The one priority acknowledge signal that is set corresponds to the frame buffer controller chip for which the corresponding register has the highest priority value signal among those available to receive texel data.
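The free-running counter can be modeled as follows. This is a sketch under assumptions: the priority state is taken to be two bits wide, which is what makes the priority value advance every four clock cycles as stated; the wrap of the priority value from 4 back to 0 and the staggered initial values are also assumptions, chosen so that the five counters always hold distinct values.

```python
from typing import Tuple

STATE_BITS = 2        # assumption: 2-bit state, so the value advances every 4 cycles
NUM_PRIORITIES = 5    # priority values 0..4, 4 being the highest


def split(counter: int) -> Tuple[int, int]:
    """Return (priority value, priority state) for a counter word."""
    return counter >> STATE_BITS, counter & ((1 << STATE_BITS) - 1)


def tick(counter: int) -> int:
    """Advance the counter by one clock cycle."""
    value, state = split(counter)
    state += 1
    if state == 1 << STATE_BITS:
        state = 0
        value = (value + 1) % NUM_PRIORITIES  # assumption: 4 wraps to 0
    return (value << STATE_BITS) | state


# Assumption: counters start staggered so each chip has a distinct value.
counters = [chip << STATE_BITS for chip in range(NUM_PRIORITIES)]
```

With this staggering, every interface holds the top priority value for four consecutive cycles out of every twenty, which is what gives the scheme its round-robin character.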
The priority acknowledge signal for any particular frame buffer controller chip will be set if the priority value for that particular chip is at its highest priority of 4. If the priority value for a particular frame buffer controller chip is equal to 3, then the corresponding priority acknowledge signal will be set only if the particular register associated with the frame buffer controller chip having a corresponding priority value of 4 is not then ready to unload a texel from the RAM array 90. Similarly, if the priority value for a particular frame buffer controller chip is equal to 2, then the priority acknowledge signal for that particular frame buffer controller chip will be set only if the registers corresponding to the frame buffer controller chips having corresponding priority value signals equal to 3 and 4 are not then ready to unload texels from the RAM array 90. Similarly, if the priority value for a particular frame buffer controller chip is equal to 1, then the corresponding priority acknowledge signal will be set only if the three registers for the frame buffer controller chips having corresponding priority value signals equal to 4, 3 and 2 are not then ready to unload texels from the RAM array 90. Finally, if the priority value for a particular frame buffer controller chip is equal to 0, then the priority acknowledge signal for that frame buffer controller chip will be set only if the four registers associated with the other four frame buffer controller chips are not then ready to unload texels from the RAM array 90. Thus, when a particular frame buffer controller chip has a highest priority value of 4, the value of the corresponding priority acknowledge signal is not dependent on the state of any of the other frame buffer controller chip interfaces.
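The cascade of conditions above reduces to picking the ready interface with the largest priority value. The Python sketch below illustrates this; the function name is hypothetical, the five priority values are assumed to be distinct, and "ready" is assumed to combine the conditions stated earlier (address FIFO not empty and interface idle). As a simplification, an acknowledge is only ever set for a ready interface.

```python
from typing import List


def priority_acknowledge(values: List[int], ready: List[bool]) -> List[bool]:
    """Return one acknowledge flag per chip, with at most one set.

    values: priority value (0..4) per chip, assumed distinct.
    ready:  True if that chip's interface can unload a texel now.
    """
    ack = [False] * len(values)
    candidates = [chip for chip, is_ready in enumerate(ready) if is_ready]
    if candidates:
        winner = max(candidates, key=lambda chip: values[chip])
        ack[winner] = True
    return ack
```

For instance, with priority values [4, 3, 2, 1, 0] and only the third and fifth interfaces ready, the third interface (value 2) is acknowledged, because the interfaces holding values 4 and 3 are not ready.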
The priority decoder implements an algorithm in accordance with the above-described scheme to output five priority acknowledge signals along buses 162A-162E. As shown, the priority decoder 160 receives the five priority value signals along buses 156A-156E respectively from counters 154A-154E. Also received by priority decoder 160 along lines 116A-116E are five signals indicating whether any of the registers 120A-120E, respectively, are available to unload a texel from the RAM array 90. Priority decoder 160 additionally receives five one-bit acknowledge signals from the frame buffer controller chips 50A-50E along lines 191A-191E, wherein an acknowledge signal is asserted when the corresponding frame buffer controller chip is available to receive data. Using the information received, including the priority value signals and the register ready signals associated with each of the frame buffer controller chips, the priority decoder implements the algorithm described above and outputs the five priority acknowledge signals. The priority acknowledge signals output on buses 162A-162E are provided through delay elements 184A-184E to the load enable inputs of registers 120A-120E. At most, at any one time, only one of the priority acknowledge signals will be set such that only one of the registers 120A, 120B, 120C, 120D or 120E will be load-enabled. When one of the registers is load-enabled, the texel data received on bus 118 will be loaded into that particular register.
Before the priority acknowledge signals reach the load-enable inputs of the registers 120A-120E, the same priority acknowledge signals are provided on buses 162A-162E to separate inputs of respective AND gates 180A-180E. AND gates 180A-180E also receive respective address outputs along buses 150A-150E from address FIFO buffers 114A-114E. The priority acknowledge signal is a one-bit signal that is either high or low. Each output from the address FIFO buffers is a 6-bit address. Thus, the logical AND operation is performed on a bit-by-bit basis. In other words, the priority acknowledge signal is logically ANDed separately with each bit of the 6-bit address output from the address FIFO buffer.
The outputs of the AND gates 180A-180E are provided to an OR gate 182 which performs a logical OR operation on the outputs. The output from the OR gate 182 is provided to address register 152. As only one of the priority acknowledge signals will be set at any one time, the outputs of all of the AND gates 180A-180E except for one will be equal to zero. The output of the AND gate associated with the frame buffer controller chip having the priority acknowledge signal set will be equal to the 6-bit address output from the corresponding address FIFO buffer. Thus, the address register 152 stores the address output from the particular address FIFO buffer corresponding to the frame buffer controller chip for which the priority acknowledge signal is set. The address register 152 stores this address during the state when each of the priority acknowledge signals is within the delay buffers 184A-184E. During a following state, the address stored within the address register is provided along bus 112 to the address input of the RAM array such that the texel from the addressed location will be unloaded and provided on bus 118 to the registers 120A-120E.
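The AND-OR selection of the address can be expressed with bitwise operations. In the Python sketch below, the function name is hypothetical; replicating the one-bit acknowledge across all six address bits models the bit-by-bit AND described in the text.

```python
from typing import List

ADDRESS_MASK = 0x3F  # six address bits


def select_address(acks: List[bool], fifo_heads: List[int]) -> int:
    """AND each FIFO output with its acknowledge bit, then OR the results.

    acks:       one priority acknowledge flag per chip (at most one set).
    fifo_heads: the 6-bit address at the output of each address FIFO.
    """
    selected = 0
    for ack, head in zip(acks, fifo_heads):
        gate = ADDRESS_MASK if ack else 0  # acknowledge fanned out to 6 bits
        selected |= gate & head
    return selected
```

When no acknowledge is set the result is zero, which is also a valid RAM address; in this model it is the load-enable inputs of the registers, driven by the same acknowledge signals, that prevent a texel from being loaded in that case.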
Having thus described at least one illustrative embodiment of the invention, various alterations, modifications and improvements will readily occur to those skilled in the art. Such alterations, modifications and improvements are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description is by way of example only and is not intended as limiting. The invention is limited only as defined in the following claims and the equivalents thereto.

Claims

In a texture mapping computer graphics system, a method for transferring texture data (S, T) including a plurality of texels from a texture mapping chip (46) to multiple frame buffer controller chips (50A-50E) comprising the steps of:
temporarily storing a limited number of texels in a texel array storage unit (90), wherein each texel is destined for a particular frame buffer controller chip; and

transferring each texel from the texel array storage unit (90) into an appropriate frame buffer controller chip.
The method as claimed in claim 1 wherein the step of transferring includes the step of transferring each texel from the texel array storage unit (90) into one of a plurality of first storage registers (120A-120E).
The method as claimed in claim 2 wherein the step of transferring further includes the step of transferring texels from each first storage register (120A-120E) into each of a plurality of second storage registers (124A-124E).
The method as claimed in claim 1 further including, for each texel stored in the texel array storage unit (90), the step of storing in one of a plurality of address storage units (114A-114E) an address of a location within the texel array storage unit (90) in which a texel is stored, wherein each frame buffer controller chip (50A-50E) has a corresponding address storage unit (114A-114E).
The method as claimed in claim 1 further including, during the step of receiving, the step of determining, for each texel, for which frame buffer controller chip (50A-50E) the texel is destined.
The method as claimed in claim 1 further including, before the step of shifting, the step of determining a highest priority frame buffer controller chip (50A-50E).
In a texture mapping computer graphics system including a texture mapping chip (46) that stores a plurality of texels (S, T) and multiple frame buffer controller chips (50A-50E) that process the texels, an interface between the texture mapping chip (46) and the frame buffer controller chips (50A-50E) comprises:
a texel array storage unit (90), coupled between a portion of the texture mapping chip (46) and the frame buffer controller chips (50A-50E), that temporarily stores a limited number of texels, each texel being destined for a particular frame buffer controller chip (50A-50E); and

a control unit (110), coupled to the texel array storage unit (90), that controls shifting texels from the texture mapping chip (46) into locations within the texel array storage unit (90) and transferring texels from the texel array storage unit (90) into appropriate frame buffer controller chips (50A-50E).
The interface as claimed in claim 7 further including a plurality of address storage units (114A-114E), coupled to the control unit (110), that store addresses of locations within the texel array storage unit (90) in which texels are stored, wherein each address storage unit (114A-114E) corresponds to a different frame buffer controller chip (50A-50E).
The interface as claimed in claim 7 wherein the control unit (110) includes:
a first portion that controls shifting texels from the texture mapping chip (46) into locations within the texel array storage unit (90); and

a second portion that controls transferring texels from the texel array storage unit (90) into appropriate frame buffer controller chips (50A-50E).
The interface as claimed in claim 9 wherein the first portion of the control unit includes a decoder (126), coupled to the texture mapping chip, that determines to which frame buffer controller chip each texel is destined.