US7710425B1 - Graphic memory management with invisible hardware-managed page faulting - Google Patents
- Publication number
- US7710425B1 (application US09/591,225)
- Authority
- US
- United States
- Prior art keywords
- texture
- memory
- page
- data
- bit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1081—Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G2360/00—Aspects of the architecture of display systems
- G09G2360/12—Frame memory handling
- G09G2360/121—Frame memory handling using a cache memory
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G2360/00—Aspects of the architecture of display systems
- G09G2360/12—Frame memory handling
- G09G2360/122—Tiling
Definitions
- the present application relates to computer graphics rendering systems and methods, and particularly to handling of texture data used by rendering accelerators for 3D graphics.
- 3D: three-dimensional
- the peculiar demands of 3D graphics are driven by the need to present a realistic view, on a computer monitor, of a three-dimensional scene.
- the pattern written onto the two-dimensional screen must therefore be derived from the three-dimensional geometries in such a way that the user can easily “see” the three-dimensional scene (as if the screen were merely a window into a real three-dimensional scene).
- This requires extensive computation to obtain the correct image for display, taking account of surface textures, lighting, shadowing, and other characteristics.
- the starting point (for the aspects of computer graphics considered in the present application) is a three-dimensional scene, with specified viewpoint and lighting (etc.).
- the elements of a 3D scene are normally defined by sets of polygons (typically triangles), each having attributes such as color, reflectivity, and spatial location.
- Textures are “applied” onto the polygons, to provide detail in the scene.
- a flat carpeted floor will look far more realistic if a simple repeating texture pattern is applied onto it.
- Designers use specialized modelling software tools, such as 3D Studio, to build textured polygonal models.
- the 3D graphics pipeline consists of two major stages, or subsystems, referred to as geometry and rendering.
- the geometry stage is responsible for managing all polygon activities and for converting three-dimensional spatial data into a two-dimensional representation of the viewed scene, with properly-transformed polygons.
- the polygons in the three-dimensional scene, with their applied textures, must then be transformed to obtain their correct appearance from the viewpoint of the moment; this transformation requires calculation of lighting (and apparent brightness), foreshortening, obstruction, etc.
- the correct values for EACH PIXEL of the transformed polygons must be derived from the two-dimensional representation. (This requires not only interpolation of pixel values within a polygon, but also correct application of properly oriented texture maps.)
- the rendering stage is responsible for these activities: it “renders” the two-dimensional data from the geometry stage to produce correct values for all pixels of each frame of the image sequence.
- FIG. 2 shows a high-level overview of the processes performed in the overall 3D graphics pipeline. However, this is a very general overview, which ignores the crucial issues of what hardware performs which operations.
- Textures: A texture is a two-dimensional image which is mapped into the data to be rendered. Textures provide a very efficient way to generate the level of minor surface detail which makes synthetic images realistic, without requiring transfer of immense amounts of data. Texture patterns provide realistic detail at the sub-polygon level, so the higher-level tasks of polygon-processing are not overloaded. See Foley et al., Computer Graphics: Principles and Practice (2nd ed. 1990, corr. 1995), especially at pages 741-744; Paul S. Heckbert, “Fundamentals of Texture Mapping and Image Warping,” Thesis submitted to Dept. of EE and Computer Science, University of California, Berkeley, Jun.
- a typical graphics system reads data from a texture map, processes it, and writes color data to display memory.
- the processing may include mipmap filtering which requires access to several maps.
- the texture map need not be limited to colors, but can hold other information that can be applied to a surface to affect its appearance; this could include height perturbation to give the effect of roughness.
- the individual elements of a texture map are called “texels.”
- Perspective-corrected texture mapping involves an algorithm that translates “texels” (pixels from the bitmap texture image) into display pixels in accordance with the spatial orientation of the surface. Since the surfaces are transformed (by the host or geometry engine) to produce a 2D view, the textures will need to be similarly transformed by a linear transform (normally projective or “affine”). (In conventional terminology, the coordinates of the object surface, i.e.
- mapping means that a horizontal line in the (x,y) display space is very likely to correspond to a slanted line in the (u,v) space of the texture map, and hence many additional reads will occur, due to the texturing operation, as rendering walks along a horizontal line of pixels.
- Virtual memory: One of the basic tools of computer architecture is “virtual” memory. This is a technique which allows application software to use a very large range of memory addresses, without knowing how much physical memory is actually present on the computer, nor how the virtual addresses correspond to the physical addresses which are actually used to address the physical memory chips (or other memory devices) over a bus.
- AGP: Accelerated Graphics Port
- the Intel specification also provides a special protocol for “AGP memory.” This is not physically separate memory, but just dynamically-allocated system DRAM areas which the graphics chip can access quickly.
- the Intel chip set includes address translation hardware which makes the “AGP memory” look continuous to the graphics controller. This permits the graphics chip to access large texture bitmaps (e.g. 128 KB) as a single entity.
- GART: Graphics Address Remapping Table
- the GART hardware is somewhat similar in function to the paging hardware in the CPU chip, in that the processor “linear” virtual addresses get automatically translated into physical addresses (which may point to system RAM and local Frame Buffer memory, as well as the AGP RAM).
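- As a rough illustration of the page-granular remapping described above, the following sketch translates a linear address through a table of page frames. The 4 KB page size, the `gart` dictionary and the function name are assumptions made for illustration; they are not taken from the AGP specification or from this application.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def translate(linear_addr, gart):
    """Map a linear address to a physical one via a page-indexed table."""
    page, offset = divmod(linear_addr, PAGE_SIZE)
    if page not in gart:
        raise KeyError(f"page {page} is not mapped")
    return gart[page] * PAGE_SIZE + offset


# Three scattered physical frames (hypothetical numbers) look contiguous
# to a client that only ever sees linear addresses 0x0000-0x2FFF.
gart = {0: 57, 1: 12, 2: 90}
physical = translate(0x1010, gart)
```

- The point of the sketch is only that consecutive linear pages may land in unrelated physical frames while the offset within a page is preserved.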
- a texture image may be stored on an otherwise blank page; when the texture image is desired to be inserted into a display, the blank background page is obviously unneeded.
- the parts of the source image not to be copied are defined by setting them to a specific color, called the “key” color.
- a test is made for the existence of this key color, and any pixels of this key color are rejected and therefore not copied.
- This technique allows an image of any shape to be copied onto a background, since the unwanted pixels are automatically excluded. For example, this could be used to show an explosion, where the flames are represented by an image.
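- The key-color test described above can be sketched as a per-pixel copy that skips any source pixel equal to the key. The function name, the list-of-rows image representation and the value chosen for `KEY` are illustrative assumptions.

```python
KEY = 0xFF00FF  # hypothetical key color


def keyed_blit(src, dst, x0, y0, key=KEY):
    """Copy src onto dst at (x0, y0), rejecting pixels of the key color."""
    for y, row in enumerate(src):
        for x, pixel in enumerate(row):
            if pixel != key:  # key-colored pixels are not copied
                dst[y0 + y][x0 + x] = pixel
    return dst


sprite = [[KEY, 1],
          [2, KEY]]
background = [[0] * 3 for _ in range(3)]
keyed_blit(sprite, background, 1, 1)
```

- This exact-match test is what breaks down once filtering blends key-colored neighbors into the edge pixels, which leads to the defects discussed next.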
- the first defect is a border effect caused by including some of the key color, which should not be plotted, in the pixels that are valid for plotting.
- the edge pixels will be colored in part by neighboring pixels which would not otherwise be copied at all.
- the edge pixels will spuriously include some of the key color, and will form a border around the plotted object.
- the resulting image will appear to have a dark or shimmering outline, which is obviously not intended.
- the second problem is the accuracy with which the cut-out can be performed.
- the normal way of deciding whether or not to plot a pixel is to test if any of the contributing pixels is valid, or if any of them are invalid. Since all of the edge pixels will have been blended with a key color neighbor, and the bordering invalid pixels will have been blended with a valid neighboring pixel, both approaches lead to a final image whose size after filtering differs from its size before filtering.
- the first method makes the final image too big, while the second method makes it too small.
- the third problem is that while bilinear filtering may smooth the color transitions within the selected region of the copy, the edge of the cut-out does not get smoothed and remains blocky.
- Bit-blit, also written as bit blit and bitblt, is a pixel block copying procedure.
- “Bitblt” is short for “bit block transfer.”
- One of the most common uses of the bit-blit is in copying pixels from the back framebuffer, where they were written by the graphics processor, to the front framebuffer, from where they will be scanned and displayed. Blitting is also used to simply move a block of pixels from one set of memory locations to another, which effectively moves those pixels on the display, e.g. scrolling of text or moving a window on the screen.
- Virtualization of texture memory, like virtualization of host memory, gives the user the impression of a memory space which is larger than can be physically accommodated in real memory. This is achieved by partitioning the memory space into a small physical working set and a large virtual set with dynamic swapping between the two.
- the physical working set is main memory and the virtual set is disk storage.
- the apparently-larger virtual texture memory space increases performance, as the optimum set of textures (or parts of textures) is chosen for residence by the hardware. It also simplifies the management of texture memory by the driver and/or application where either or both try to manage the memory manually. This is akin to program overlays before the days of virtual memory on CPUs, where the program had to dynamically load and unload segments of itself.
- the present inventor has realized that managing the texture memory in the driver or by the application is very difficult (or impossible) to do properly, because:
- the present application discloses a computer system in which a graphics accelerator unit manages page faulting of texture data invisibly to the host processor.
- this determination uses the least recently used algorithm.
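- A least-recently-used working set can be sketched as follows, assuming the determination in question is which resident page to replace when a fault occurs. The class name, the slot numbering and the use of an ordered dictionary are illustrative choices; the application describes this being done in hardware.

```python
from collections import OrderedDict


class LruWorkingSet:
    """Small physical working set backing a larger logical page space."""

    def __init__(self, capacity):
        self.pages = OrderedDict()          # logical page -> physical slot
        self.free = list(range(capacity))   # slots not yet in use

    def touch(self, page):
        """Return (slot, faulted) for an access to a logical page."""
        if page in self.pages:
            self.pages.move_to_end(page)    # mark as most recently used
            return self.pages[page], False
        if self.free:
            slot = self.free.pop()
        else:
            # Evict the least recently used page and reuse its slot.
            _, slot = self.pages.popitem(last=False)
        self.pages[page] = slot
        return slot, True


ws = LruWorkingSet(2)
history = [ws.touch(p)[1] for p in ("a", "b", "a", "c")]
```

- In this run the access to "c" evicts "b" rather than "a", because "a" was touched more recently.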
- the host does (a) (after having made the page available), but the hardware carries on and does b, c, d, e and f.
- features of the virtual texture mapping architecture described in the present application include at least the following: A single chip solution is provided; Two or three levels of texture memory hierarchy are supported; The page faulting is all done in hardware with no host intervention; The texture memory management function can be used to manage texture storage in the host memory in addition to the texture storage in our normal texture memory; multiple memory pools are supported; and multiple rasterizers can be supported.
- the present application is one of nine applications filed simultaneously, which are all contemplated to be implemented together in a common system. The other applications are (U.S. non-provisional application Ser. Nos.
- FIG. 1 is an overview of a computer system, with a rendering subsystem, which incorporates the disclosed graphics memory management ideas.
- FIG. 2 is a very high-level view of other processes performed in a 3D graphics computer system.
- FIG. 3 shows a block diagram of a 3D graphics accelerator subsystem.
- FIGS. 4A and 4B are a pair of flow charts which show how a texture is loaded, depending on whether a cache miss occurs.
- FIG. 5 shows a 2-D coordinate space mapped to a 1-D address range.
- FIG. 6 shows a 2×2 patch arrangement within a texture map.
- FIGS. 7A and 7B show layouts in memory for the various supported formats.
- FIG. 8 shows how the map level and address can be encoded into the least amount of bits.
- FIG. 9 shows which texels the memory reads bring in and the corresponding output fragments they will satisfy.
- FIG. 10 shows a block diagram of the Texture Read Unit.
- FIG. 11 shows a block diagram of the Primary Cache Manager.
- FIG. 12 shows a block diagram of the Cache Directory.
- FIG. 13 shows a block diagram of the CAM Cell.
- FIG. 14 shows a block diagram of the Translation Lookaside Buffer (TLB).
- FIG. 15 shows a block diagram of an individual CAM cell.
- FIG. 16 shows a sample configuration where two rasterizers are served by a common memory manager and bus interface chip.
- P3: PERMEDIA 3™
- the overall architecture of the graphics core is best viewed using the software paradigm of a message passing system. In this system all the processing units are connected in a long pipeline with communication with the adjacent units being done through message passing. Between each unit there is a small amount of buffering, the size being specific to the local communications requirements and speed of the two units.
- the message rate is variable and depends on the rendering mode. The messages do not propagate through the system at a fixed rate typical of a more traditional pipeline system. If the receiving block cannot accept a message, because its input buffer is full, then the sending block stalls until space is available.
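- The stall behaviour described above can be illustrated with a toy cycle-by-cycle simulation of one sending unit and one receiving unit joined by a small FIFO. The function name, the `consumer_period` parameter and the cycle model are assumptions made for this sketch and do not describe the actual chip timing.

```python
from collections import deque


def run(messages, buffer_depth, consumer_period):
    """Simulate one sender and one receiver joined by a bounded FIFO.

    Returns (cycles, stall_cycles, received_messages).
    """
    fifo = deque()
    pending = deque(messages)
    total = len(pending)
    received = []
    cycle = stalls = 0
    while len(received) < total:
        # The receiver accepts a message only every consumer_period cycles.
        if fifo and cycle % consumer_period == 0:
            received.append(fifo.popleft())
        # The sender forwards a message if there is space, else it stalls.
        if pending:
            if len(fifo) < buffer_depth:
                fifo.append(pending.popleft())
            else:
                stalls += 1
        cycle += 1
    return cycle, stalls, received


fast = run(range(4), buffer_depth=2, consumer_period=1)
slow = run(range(4), buffer_depth=1, consumer_period=3)
```

- With a receiver that keeps up, the sender never stalls; with a slow receiver and a one-entry buffer, the sender stalls but message order is preserved.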
- RX has the same functionality as the P3 chip, but has more memory etc.
- Both chips, and other members of the 3Dlabs family of pipelined rendering accelerators, may also be referred to generically as “GLINT” chips.
- FIG. 1 shows a block diagram of a sample computer system context; however, the disclosed techniques can advantageously be incorporated in any number of graphics systems.
- FIG. 3 shows a block diagram of a graphics processor which can incorporate the disclosed embodiments of the read-modify-write solutions in its rendering subsystem.
- a sample board incorporating the P3™ graphics processor may include these elements:
- the Texture Read Unit's main job is to manage the primary texture cache (the data part is in the Texture Filter Unit) and load texel data into it, preferably in advance of when it is needed.
- the primary cache can be used as one large cache or as two smaller (half size) caches depending on the type of texture mapping being done.
- the single large cache is an optimization and allows a higher cache hit rate when the texture map is large or the polygon is large and a single bilinear texture is used.
- address(es) are calculated for the texel data. These addresses may be physical addresses in which case the address is issued to the Memory Controller and some time later the data returned. Alternatively the address may be a logical one so the following steps are taken to resolve (or translate) it:
- the unit is controlled by the TextureReadMode0 and TextureReadMode1 messages for texture 0 and texture 1 respectively. Both messages have an identical format; however, some modes are mutually exclusive as there are not enough resources to allow all combinations.
- the supported combinations are:
- the target throughput is one active step message every cycle for all mode combinations, provided all required texels are in the primary cache.
- the first cache miss in each bank requiring a physical memory read or logical memory read with a TLB hit adds zero cycles but subsequent ones can take an extra cycle each. (The nth cache miss for a step may be satisfied by an earlier cache load on the same step so does not count for extra time.)
- the zero extra cycles case cannot be sustained, as the actual address generation and reading will take two cycles; however, the flow of fragments into the M FIFO should not be disrupted until the AG FIFOs block.
- the Layout field in the TextureMapWidth registers selects how the texture data is to be laid out in memory for each mip map level. The options are:
- Linear or Patch64 texture formats can choose between top left and bottom left origins, but the texture map must start on the natural boundary for the texel size. For 8 bit texels this is on a byte boundary, for 16 bit texels this is on a 2 byte boundary and for 32 bit texels this is on a 4 byte boundary.
- TextureMapSize should be set to a value greater than or equal to the product of the width and height for a slice.
- the type of texture is checked and if it is a 3D texture map the base address is set from the TextureBaseAddr[0] register, the layout and texel size are taken from the TextureReadMode0 register and the width from TextureMapWidth0.
- the layout, texel size and width parameters are taken from the appropriate texture registers (these registers should be loaded the same for per pixel mip mapping).
- the width is divided by 2^(map level), so the correct mip map width is used. Note that the width does not have to be a power of 2, so the divide may leave a remainder (which is ignored) and the result will fail past some map level. This is not a problem, as mip maps will always be a power of two in size and non mip maps will always have a map level of 0.
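- The width calculation can be sketched as a right shift, which is an integer divide by a power of two that discards the remainder. The function name is an assumption, and the minimum-width clamping discussed later is deliberately not modelled here.

```python
def mip_width(width, map_level):
    """Width at a given map level: width / 2**map_level, remainder ignored."""
    return width >> map_level


power_of_two = [mip_width(256, level) for level in range(4)]
odd_width = [mip_width(100, level) for level in range(8)]
```

- For a width of 256 the levels halve cleanly (256, 128, 64, 32), while a width of 100 loses its remainder at level 3 (100 // 8 is 12) and reaches 0 at level 7, which is the sense in which the divide fails for non power of two widths.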
- the base address is read from one of the 16 base address registers. The actual one used depends on the map level, the map base level and map max level associated with this texture as given by:
- the maximum width is 4095, but the minimum width depends on the layout, as the Patch2 and Patch32_2 layouts have some minimum requirements. If the mip mapping forces the width below these minimum requirements then the width is forced to be the minimum allowed for the texel size.
- the minimum texel widths are 8, 4 and 2 for 8, 16 and 32 bits per texel respectively.
- the minimum width is one memory word (i.e. 16 bytes). Also, if the width falls below 128, 64 or 32 texels for 8, 16 or 32 bits per texel respectively, any textures with a Patch32_2 layout are automatically set to Patch2.
- the address is calculated as follows. (i and j are the coordinates of the required texel.)
- Pixel Offset top left origin
- the base address is held as a byte address and must be aligned to the natural boundary for the texel size. For a 16 bpp address the bottom bit must be 0. For a 32 bpp address the bottom two bits must be zero. This is forced in hardware to remove any concerns of what happens if this condition is not true.
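- Forcing the alignment amounts to clearing the low address bits. A minimal sketch, assuming texel sizes of 8, 16 or 32 bits and a hypothetical function name:

```python
def align_base(byte_addr, bits_per_texel):
    """Clear the low bits so the address sits on the texel's natural boundary."""
    bytes_per_texel = bits_per_texel // 8
    return byte_addr & ~(bytes_per_texel - 1)


aligned = [align_base(0x1003, bpp) for bpp in (8, 16, 32)]
```

- For the misaligned address 0x1003 this yields 0x1003, 0x1002 and 0x1000 for 8, 16 and 32 bits per texel respectively: no bits, the bottom bit, and the bottom two bits are forced to zero.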
- An efficient texture cache is vital if a sustained texture rate of one output texel per cycle is to be achieved and maintained. This is even more important when mip mapping as, in general, the zoom ratio is between 1:1 and 2:1 (output:input) so there is only moderate re-use of texel data as we move from one pixel to the next.
- the cache is divided into two banks so two independent textures can be cached without any interference, or to hold two levels of a mip map, or slices of a 3D texture.
- two caches can be joined together so a larger texture map or polygon can be rendered while still maintaining scanline coherency.
- Span processing where the pixel mask (as part of the SpanStep message) is modified by the texel data does not use the primary cache.
- the cache is always enabled, and the only control the user has over its operation is the ability to invalidate it. This needs to be done whenever a new texture map is selected or the current texture map's data is edited in memory, since either causes any cached data to become stale.
- the cache is divided into two parts: a data part and a directory part.
- the data part holds the texel data and this can be found in the Texture Filter Unit so it is connected directly to the linear interpolators used to implement the filtering operations.
- the texel data is held in “raw” format so the cache holds the maximum number of texels, and the texel data is converted “on the fly”, as it is needed, into the 8888 format the filter logic expects.
- the two texel formats which cannot be handled this way are the 8 bit indexed textures (replicating the conversion LUT is too expensive) and YUV 422 (the addressing and data routing gets too complicated). In these two cases the data is converted into 8888 format and this is loaded into the cache.
- Each cache line holds 128 bits of data and there are 256 cache lines in each bank for RX and 64 cache lines in each bank for P3. (These sizes are for illustration only and may be changed later.)
- Each cache line holds a 2×2, 4×2 or 8×2 patch of texels for 32, 16 and 8 bits per texel respectively. In the 2×2 case the cache's performance is independent of the traversal direction through the texture map; however, in the other two cases the “u” direction is preferred over the “v” direction.
- the patch (2×2, etc.) has a fixed relationship to the origin of the texture map such that the origin of the patch is always some integral multiple of the patch size from the origin of the texture map.
- the following diagram shows the 2×2 patch arrangement within a texture map.
- the numbers in the brackets show how the texel coordinates vary within the texture map, and T0 . . . T3 are the corresponding filter registers to which each texel is assigned.
- the grey areas show the texels held in a memory word (16 bytes) for each size of texel.
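- Because each patch sits at an integral multiple of the patch size from the texture origin, locating the cache line for a texel reduces to integer division. The sketch below assumes the patch widths stated above (2, 4 or 8 texels in u, always 2 in v); the function and table names are illustrative, and the assignment of texels to the individual filter registers is not modelled.

```python
PATCH_WIDTH = {32: 2, 16: 4, 8: 8}  # texels per patch in u; height is 2 in v
PATCH_HEIGHT = 2


def locate(i, j, bits_per_texel):
    """Return ((patch_i, patch_j), (di, dj)) for texel (i, j)."""
    w = PATCH_WIDTH[bits_per_texel]
    return (i // w, j // PATCH_HEIGHT), (i % w, j % PATCH_HEIGHT)


# Every patch shape fills exactly one 128-bit cache line.
line_bits = {bpp: w * PATCH_HEIGHT * bpp for bpp, w in PATCH_WIDTH.items()}
```

- For example, texel (5, 3) falls in patch (2, 1) for 32 bit texels but in patch (0, 1) for 8 bit texels, which is why the wider patches favour traversal in the u direction.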
- the texture map may also be patched at a higher level (32×32) to reduce the effect of page breaks, but this is of no consequence to how the primary cache functions (see FIG. 6).
- Texture maps are preferably stored in memory in one of the 2×2 patched formats to give the best overall performance for general 3D use; however, this is not always possible or desirable.
- If the texture data originates from an external source or is used to drive an external device (e.g. a monitor), the layout of the data may be fixed and not in 2×2 format.
- the traversal direction may be known to always be in the u direction—examples of this are video scaling, fonts and general 2D use.
- When the texture map is stored in memory in a non-2×2 layout, it is formatted into the 2×2 layout expected by the Filter Unit as it is read in.
- The layout in memory for the various supported formats is shown in FIGS. 7A-7B.
- Each line is one memory word and the bit numbers are shown along the top.
- the tick marks are at byte intervals and the numbers in brackets show how the texel coordinates vary within the memory word.
- the directory part of the primary cache is held in this unit and is searched to find out if a texel is already in the primary cache, and if so where.
- the search is done fully associatively and 8 texels (four per cache bank) are searched simultaneously (to support the target performance of trilinear filtering or two bilinear filtered texels in a cycle).
- the replacement policy is oldest first (FIFO). These parameters will be justified later.
- the key stored in the cache directory is formed from the texel's integer coordinate (i, j) and map level (or k for 3D texture).
- a bank of the cache cannot hold texels from different texture maps (texels from the different levels in a mip map or from the different slices in a 3D texture can be held in the same bank). This means that the cache must be invalidated whenever a new texture map is selected.
- the typical search policies are fully associative, set associative and direct mapped. These are graded from most expensive, most flexible (fully associative) to least expensive, least flexible (direct mapped). Set associative and direct mapped both rely on using a subset of index bits to choose one (direct mapped) or a set of locations to search.
- the access patterns through a 2D texture map follow an approximate straight line. (It is actually a slightly curved line due to the perspective projection, but this is a minor effect and doesn't change any of the reasoning.)
- the orientation of the line and its position are arbitrary, and successive scanlines will all follow approximately parallel paths.
- the other variable to contend with is the width of the texture map: this is variable (between texture maps) and a power of two. Given these constraints, choosing a set of index bits which will give a good distribution for each possible orientation of line looks an impossible task. A good distribution is vital; otherwise, in the worst case, all texels along a line could fall into one set (or a single entry for direct mapped), which clearly defeats the purpose of a cache.
- the fully associative search works equally well for all access patterns.
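- The worst case mentioned above can be reproduced with a small experiment. The indexing scheme used here (set index taken from the low bits of a linear texel address, j*width + i) is a hypothetical choice made only to show the effect; the application does not propose it.

```python
def sets_touched(width, path, num_sets):
    """Count distinct sets hit when the set index is (j*width + i) mod num_sets."""
    return len({(j * width + i) % num_sets for i, j in path})


WIDTH, SETS = 256, 64                      # power-of-two map width, 64 sets
horizontal = [(i, 0) for i in range(64)]   # walk along u
vertical = [(0, j) for j in range(64)]     # walk along v

h_sets = sets_touched(WIDTH, horizontal, SETS)
v_sets = sets_touched(WIDTH, vertical, SETS)
```

- With these numbers the horizontal walk spreads over all 64 sets while the vertical walk lands in a single set, since every step advances the address by a multiple of the set count. A fully associative search has no index bits and so has no such orientation dependence.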
- the common replacement policies are least recently used (LRU), oldest (FIFO), least frequently used and random.
- The LRU policy usually gives excellent results but is the most expensive; however, the approximately regular access patterns repeated from scanline to scanline will make the least recently used page the same as the oldest page (at least within the same polygon).
- the oldest replacement policy is implemented by a simple counter which selects the entry to replace and is incremented after every replacement. The counter wraps within the available table size.
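- A minimal software model of a fully associative directory with oldest-first replacement is sketched below. The class name and the use of a Python list for the directory are assumptions; the keys stand in for the (i, j, map level) values described earlier.

```python
class FifoDirectory:
    """Fully associative directory with oldest-first (FIFO) replacement."""

    def __init__(self, num_lines):
        self.keys = [None] * num_lines
        self.next = 0  # counter selecting the entry to replace

    def lookup(self, key):
        """Return (line, hit); on a miss the oldest entry is replaced."""
        if key in self.keys:                 # search every entry
            return self.keys.index(key), True
        line = self.next
        self.keys[line] = key
        # Increment after every replacement, wrapping within the table size.
        self.next = (self.next + 1) % len(self.keys)
        return line, False


d = FifoDirectory(2)
results = [d.lookup(k) for k in [(0, 0, 0), (1, 0, 0), (0, 0, 0), (2, 0, 0), (0, 0, 0)]]
```

- Note that the fourth lookup replaces key (0, 0, 0) even though it was hit on the previous access; this is the difference from LRU, which the text argues rarely matters for regular scanline access patterns.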
- n is programmable (the TextureCacheReplacementMode).
- the size of the cache is a compromise—the larger the better, but it follows the law of diminishing returns.
- the minimum useful size is based on the number of texels visited along any path through the texture map. This will be the minimum of the texture map size and width of the polygon.
- the cache is patch based, so it holds a minimum of two rows (maybe only partial rows) at a time.
- the filter may require texels from two adjacent patches (in v) so in the worst case two pairs of rows are needed. If a bank holds n bytes of data the maximum width of texture map (or texels along a polygon) which can be held while maintaining scanline coherency is n/(bytes per texel)/4.
- each bank has 1K bytes of storage so for 16 bit textures the cache works best when less than 128 texels are used for mip maps or 256 texels for a single texture map (where both caches can be combined).
- each bank has 4K bytes of storage so for 32 bit textures the cache works best when less than 256 texels are used for mip maps or 512 texels for a single texture map (where both caches can be combined).
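- The figures quoted above follow from the n/(bytes per texel)/4 rule and from the cache line counts given earlier (128-bit lines, 64 or 256 lines per bank). A small check, with hypothetical function names:

```python
LINE_BYTES = 128 // 8  # each cache line holds 128 bits


def bank_bytes(cache_lines):
    """Storage in one bank for a given number of cache lines."""
    return cache_lines * LINE_BYTES


def max_coherent_width(n_bytes, bits_per_texel, banks=1):
    """Widest run of texels held with scanline coherency: n/(bytes per texel)/4."""
    return banks * n_bytes // (bits_per_texel // 8) // 4


small = bank_bytes(64)    # 1K bytes per bank
large = bank_bytes(256)   # 4K bytes per bank
widths = (
    max_coherent_width(small, 16),
    max_coherent_width(small, 16, banks=2),
    max_coherent_width(large, 32),
    max_coherent_width(large, 32, banks=2),
)
```

- This gives 128 and 256 texels for the 1K byte bank with 16 bit texels, and 256 and 512 texels for the 4K byte bank with 32 bit texels, matching the numbers in the text.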
- the fully associative search is expensive and the two factors which govern the cost are the number of entries to search and the width of the key.
- the number of entries is governed by the cache line length and the total amount of data in the cache bank. The cache line length and size of the cache have already been considered, but what about the key?
- the key (as already described) holds the i and j index and the map level (3D textures will be considered shortly).
- the maximum width and height of a map is 2050 (2K + a border) so the indices have 12 bits.
- the cache line holds a 2×2 patch so the indices can be reduced by one bit to 11 bits.
- the map level number is also needed here.
- the key is (11+11+4) bits or 26. This can be reduced down to 23 by realizing that the full 2050×2050 value can only occur on map level 0.
- Map level 1 has a maximum size of 1026×1026 so by encoding the map level into the upper bits as shown in FIG. 8, the key width can be reduced.
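- The reduction from 26 to 23 bits can be checked with a counting argument: if each successive map level needs one fewer bit per index, the total number of distinct (level, i, j) patch keys is less than 2^23. The sketch below only counts keys; it is not the encoding of FIG. 8, and the assumption that every level halves both indices is a simplification.

```python
INDEX_BITS = 11   # per-axis patch index width at map level 0
LEVEL_BITS = 4    # width of a separate map level field

naive_bits = 2 * INDEX_BITS + LEVEL_BITS


def keys_needed(levels=2 ** LEVEL_BITS):
    """Distinct patch keys over all levels, each axis halving per level."""
    return sum(4 ** max(INDEX_BITS - level, 0) for level in range(levels))


packed_bits = (keys_needed() - 1).bit_length()
```

- Level 0 alone needs 2^22 keys, and all the remaining levels together need fewer than that again, so one extra bit is enough to tell them apart.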
- Three dimensional texture maps have a larger key requirement—the map bits are replaced by the k index.
- the i and j index are 11 bits as above and the k index is 12 bits.
- the even k slices are stored in bank 0 while the odd k slices are in bank 1 so the least significant bit of k can be dropped. This gives a key size of 33 bits and is larger than the total address space most processors have.
- the key for 3D textures is formed by concatenating the significant bits of the i, j and k indices together. The number of significant bits for the i and j indices are held in TextureReadMode0.Width and TextureReadMode0.Height respectively.
- a 23 bit key allows a 3D texture to have 2^23 texels in it or a cuboid of 256×256×128 without the risk of multiple texels aliasing to the same key (the reduced 21 bit key for P3 would allow a maximum cube size of 128×128×128). Both these cuboids (or any other with the same volume) are probably sufficient for a P3 class product but are marginal for an RX class of product. For RX the key size has been increased to 27 bits to allow a maximum cube size of 512×512×512.
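- Concatenating the significant index bits can be sketched as shifts and ORs. The field order (k above j above i) and the function name are assumptions; `width_bits` and `height_bits` stand in for the values the text says are held in the Width and Height fields.

```python
def key_3d(i, j, k, width_bits, height_bits):
    """Concatenate the significant bits of i, j and k into one integer key."""
    i &= (1 << width_bits) - 1
    j &= (1 << height_bits) - 1
    return (k << (width_bits + height_bits)) | (j << width_bits) | i


# Largest key for a 256 x 256 x 128 cuboid (8 + 8 + 7 = 23 bits).
largest = key_3d(255, 255, 127, width_bits=8, height_bits=8)
```

- The largest key for the 256×256×128 cuboid is 2^23 - 1, so every texel gets a distinct 23 bit key, consistent with the capacity stated above.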
- the two independent cache banks are ideal for mip mapping, 3D textures and when two independent texture maps are being used but when a single texture map is being used (a common occurrence) it is very wasteful to have half the cache idle.
- the Filter Unit can be put into a mode where the register files from bank 1 are used to extend the corresponding register files in bank 0.
- the TextureReadMode0.CombineCaches bit is used to enable this mode of operation and when set the texels are alternately loaded into each bank.
- the texture 0 indices are used and are checked in both banks for their presence. Obviously only one bank should report that a texel is present and this is used to select which register file is to supply the texel data.
- This bank select bit is passed to the Filter Unit in the T4BorderColor to T7BorderColor bits as these are not needed in this mode of operation.
- Any caching scheme is going to suffer from cache misses where the only option open is to go and read the texel data from memory.
- the latency for the data to return may be anything from a few cycles to many tens of cycles depending on how busy the memory is and if the texture request introduces a page break. (This assumes that the texture is resident in memory or is a physical texture. If the texture is non-resident then the time for it to be fetched from host memory could be thousands of cycles at best or many more if the host has to respond to an interrupt, page the texture off disk and then download it.)
- a fragment could cause from one to eight memory reads, although if the cache is working well and scanline coherency is being exploited this will be very much reduced.
- the pathological case is where bilinear filtering is being done with a zoom ratio of 1:n, where n>1. In this case we are minifying the map and no coherence between adjacent fragments or scanlines can be exploited. From 1 to 4 reads per fragment are needed depending on how the sample points interact with the underlying 2×2 patch structure in the texture map.
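- The 1 to 4 reads per fragment can be seen by counting how many 2×2 patches a bilinear footprint overlaps. The sketch assumes 32 bit texels (2×2 patches) and a footprint of four texels whose top-left texel is (i, j); the function name is illustrative.

```python
def patches_for_bilinear(i, j):
    """Set of 2x2 patches touched by the footprint (i..i+1, j..j+1)."""
    return {(x // 2, y // 2) for x in (i, i + 1) for y in (j, j + 1)}


reads = {(i, j): len(patches_for_bilinear(i, j)) for i in (0, 1) for j in (0, 1)}
```

- A footprint aligned with a patch needs one read, one that straddles a patch edge in u or v needs two, and one that straddles a patch corner needs four.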
- FIG. 9 shows which texels the memory reads bring in and the corresponding output fragments they will satisfy.
- the zoom ratio of 1:1 is used as this is the worst case for mip mapping and occurs for the higher resolution map; the lower resolution map will have a zoom ratio of 2:1 so any results for this map level will be twice as good.
- a texel size of 32 bits is also assumed so these results are independent of any path orientation. The smaller texels sizes will give better results for X major paths.
- Row 0 F(0), F(0), F(0), F(0), F(0), F(0), F(0), etc.
- Row 1 F(0), F(0), F(0), F(0), F(0), F(0), etc.
- Row 2 F(0), F(0), F(0), F(0), F(0), F(0), etc.
- Row 3 F(1), F(0), F(0), F(0), F(0), F(1), F(0), F(0), etc.
- Combining these together for the rows where there are accesses from both levels gives:
- the cache management, address calculation and memory requests are being processed many fragments in advance of the fragments the filter unit is working on (determined largely by the depth of the M FIFO in this unit). So, assuming the data is returned from the memory quickly enough, it may be possible to have the texel data loaded into the primary cache before it is needed. This can be achieved if the step message collects the texel data as it leaves this unit (in much the same way as occurs in the LB Read Unit and FB Read Unit). This requires write-through register files in the Filter Unit (probably not much of an issue), and it does nothing to help the case where more than one load is needed to fulfil all the new texel data for this step message.
- Information to control the loading of the primary cache is passed to the output stage (called the Dispatcher) in the T FIFO.
- the step message is passed in a parallel, but independent M FIFO.
- the Dispatcher will append the new texel data to any message, or if no message is going to be sent to the Filter Unit in this cycle it will inject its own just to load the primary cache.
- the solution for (1) adopted is to only update the T FIFO with the expedited load information while there are no steps in the M FIFO (or the current step we are working on which has not been entered into the M FIFO yet) which reference the cache line assigned to be updated.
- the 72 bits [8 × (8 address bits + 1 valid bit)] of the FIFO width which hold the cache address for each of the 8 texels the step references are available as individual registers and have comparators so the test is done in parallel.
- the remaining width of the FIFO can be held in a normal FIFO.
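- As a rough illustration of the test described above, the following Python sketch models each queued step as eight (cache address, valid) pairs and checks whether any valid pair references the cache line about to be reloaded. The names `line_referenced` and `can_expedite_load` are hypothetical, and the sequential loop only stands in for the parallel comparators.

```python
def line_referenced(cache_line, step):
    """True if any valid texel slot in this step uses cache_line.

    step is a sequence of 8 (address, valid) pairs, one per texel
    (8 address bits + 1 valid bit each, 72 bits in total).
    """
    return any(valid and address == cache_line for address, valid in step)


def can_expedite_load(cache_line, queued_steps, current_step=None):
    """Allow an expedited load only if no pending step references the line.

    queued_steps models the steps already in the M FIFO; current_step models
    the step being worked on that has not been entered into the M FIFO yet.
    """
    steps = list(queued_steps)
    if current_step is not None:
        steps.append(current_step)
    return not any(line_referenced(cache_line, s) for s in steps)
```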
- the solution to (2) is for the Dispatcher to maintain a running count of texels loaded into the Filter Unit. As each step message reaches the Dispatcher the running count (called texelsLoaded in the behavioral model) is checked against the number of texels needed to be read by this step. If texelsLoaded is greater than or equal to what the step needs the step is allowed to proceed to the Filter Unit, otherwise it stalls until sufficient data has been loaded. Once the step is allowed to proceed the texelsLoaded value is decremented by the number of loads the step message was waiting for.
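- The running count can be sketched in a few lines of Python. This is only a behavioral illustration under stated assumptions: the class and method names are hypothetical, and a stall is modelled simply as `try_dispatch` returning False.

```python
class Dispatcher:
    """Minimal model of the texelsLoaded gate on step messages."""

    def __init__(self):
        self.texels_loaded = 0  # running count of loads into the Filter Unit

    def load_texel_data(self, count=1):
        """Record that `count` cache loads have reached the Filter Unit."""
        self.texels_loaded += count

    def try_dispatch(self, loads_needed):
        """Return True if the step may proceed, False if it must stall."""
        if self.texels_loaded >= loads_needed:
            # The step consumes the loads it was waiting for.
            self.texels_loaded -= loads_needed
            return True
        return False
```

- A step that needs two loads stalls until both have arrived, and the count returns to zero once it proceeds.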
- the bottom line is this cache architecture and memory organization is up to 8 times more efficient than the GLINT MX as measured in number of memory reads per output fragment for 1:1 zoom ratio.
- the secondary cache, at least compared to the primary cache, is a very simple affair. For normal texture mapping it is largely superfluous except in the following cases:
- the secondary cache has four lines where each line holds 128 bits. Why four lines? There are two texture maps and each map can use two memory reads when in Linear or Patch64 layout. The span processing uses all four lines to hold up to 512 bits of bit map data, but little re-use would normally be expected; the main gain is reading 128 bits of a font (for example) in one go and extracting several rows worth of bit mask data from this.
- the secondary cache is direct mapped (spans use a different algorithm) so the search and replacement policies are very simple and cheap.
- the cache directory holds addresses (rather than indices as the primary cache does) and these may be logical addresses or physical addresses. An extra bit identifies the type of address so a new logical address cannot alias with an old physical address, for example.
- the secondary cache is always enabled and the only user control is the ability to invalidate it using the InvalidateCache command.
- This cache should be invalidated whenever texture data has been changed in memory and this data may have been in the secondary cache. (This is never a problem when the Virtual Texture Management changes a texture in memory as the secondary cache holds the logical address and this is invariant unless software re-assigns this logical address to a new texture map. The act of updating the Logical Page Tables through the core will automatically invalidate the secondary cache.)
- Texture maps can be stored in physical memory or in logical/virtual memory. If the texture map is stored in physical memory then it must be physically contiguous and present before that texture is used.
- Host textures can also be managed; the main difference is that no texture data is downloaded, but is accessed “in situ” using the side band addressing capability of the AGP texture execute mode.
- each page is always 4K bytes so the bottom 12 bits of a texel byte address give the byte within a page while the next 16 bits give the page number (the remaining 4 most significant bits are ignored). This gives a maximum virtual texture size of 65536 pages or 256 MBytes.
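- The address split can be illustrated with a short Python sketch. It assumes a 32 bit texel byte address (12 + 16 + 4 bits, as implied above); the function name is hypothetical.

```python
PAGE_OFFSET_BITS = 12   # 4K bytes per page
PAGE_NUMBER_BITS = 16   # 65536 logical pages


def split_texel_address(byte_address):
    """Return (logical_page, byte_in_page) for a 32 bit texel byte address.

    The 4 most significant bits are ignored.
    """
    byte_in_page = byte_address & ((1 << PAGE_OFFSET_BITS) - 1)
    logical_page = (byte_address >> PAGE_OFFSET_BITS) & ((1 << PAGE_NUMBER_BITS) - 1)
    return logical_page, byte_in_page


# Maximum virtual texture size in bytes: 65536 pages of 4K bytes each.
MAX_VIRTUAL_BYTES = (1 << PAGE_NUMBER_BITS) * (1 << PAGE_OFFSET_BITS)
```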
- the working set can be any number of pages in size. Each logical page has 8 bytes of overhead (in the Logical Page Table) and each physical page has 8 bytes of overhead (in the Physical Page Allocation Table). Some typical sizes for these tables are:
- the Logical Page Table is typically much bigger than the Physical Page Allocation Table.
- the Logical Page Table must be physically contiguous and is allocated in local buffer memory.
- the Physical Page Allocation Table must be physically contiguous and is allocated in local buffer memory.
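- The table overheads follow directly from the 8 bytes per page figure. The Python sketch below works through two sizes chosen purely for illustration (they are assumptions, not figures from this description): the full 65536 page logical space and a hypothetical 16 MByte working set.

```python
PAGE_BYTES = 4096    # each page is 4K bytes
ENTRY_BYTES = 8      # overhead per logical page and per physical page


def pages_in(num_bytes):
    """Number of 4K pages needed to hold num_bytes (rounded up)."""
    return (num_bytes + PAGE_BYTES - 1) // PAGE_BYTES


def table_bytes(num_pages):
    """Size in bytes of a table with one 8 byte entry per page."""
    return num_pages * ENTRY_BYTES


# Illustrative sizes only (assumed, not taken from the source).
logical_table = table_bytes(65536)                          # 524288 bytes
physical_table = table_bytes(pages_in(16 * 1024 * 1024))    # 32768 bytes
```

- With these assumed sizes the Logical Page Table is 16 times larger than the Physical Page Allocation Table.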
- the texture maps can be stored anywhere in the on card memory; however, two factors influence the optimum place to store a texture:
- the Logical Page Table identifies which pool each logical page should be assigned to when that logical page is loaded into memory.
- Each RX will accept a texture download at any time even if it has no outstanding requests. This means that the first RX to fault will have the faulting page of texture data loaded into itself and also all other RXs. If the other RXs had faulted soon afterwards on the same page they would remove their request when they detected this page being downloaded.
- When a page fault is detected RX will inform Gamma (or the Gamma-like Texture DMA Controller in P3) that it needs a page of texture data to be downloaded. Gamma will either interrupt the host, in which case the host software will make the texture data available and start the download, or automatically DMA the data from the host's memory.
- Gamma broadcasts the LogicalTextureAddress and TextureOperation words to the TextureInput FIFO before the actual texture data.
- the RXs, on seeing this information, will each remove any TextureDownloadRequest this transfer will satisfy and allocate space in their texture working sets for the new texture page.
- the TLB is a fully associative table (or content addressable memory) which caches the recent logical to physical page mappings. It is checked first to see if the mapping we want for this page is present, as this is much faster than having to query the Logical Page Table in memory.
- the TLB search happens in a single cycle and is 16 entries for P3 and 64 entries for RX.
- the replacement policy is oldest first.
- a TLB can be classified according to its search policy, its replacement policy and its size. A justification for the chosen attributes will now be given.
- the typical search policies are fully associative, set associative and direct mapped. These are graded from most expensive, most flexible (fully associative) to least expensive, least flexible (direct mapped). Set associative and direct mapped both rely on using a subset of address bits to choose one (direct mapped) or a set of locations to search.
- the access patterns through a 2D texture map follow an approximate straight line. (It is actually a slightly curved line due to the perspective projection, but this is a minor effect and doesn't change any of the reasoning.)
- the orientation of the line and its position is arbitrary and successive scanlines will all follow approximately parallel paths.
- the other variable to contend with is the width of the texture map; this is variable (between texture maps) and a power of two. Given these constraints, choosing a set of address bits which will give a good distribution for each possible orientation of line looks an impossible task. A good distribution is vital; otherwise, in the worst case, all addresses along a line could fall into one set (or a single entry for direct mapped), which would clearly defeat the purpose of a TLB.
- the fully associative search works equally well in all access patterns.
- the common replacement policies are least recently used (LRU), oldest (FIFO), least frequently used and random.
- the LRU policy usually gives excellent results but is the most expensive. However, the approximately regular access patterns repeated from scanline to scanline will make the least recently used page the same as the oldest page (at least within the same polygon).
- the oldest replacement policy is implemented by a simple counter which selects the entry to replace and is incremented after every replacement. The counter wraps within the available table size.
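- A fully associative TLB with oldest-first replacement can be sketched as follows. This is a software illustration only: the class and method names are hypothetical, and the linear scan in `lookup` stands in for the single cycle associative search.

```python
class Tlb:
    """Fully associative logical-to-physical page cache, oldest-first replacement."""

    def __init__(self, size=16):
        self.entries = [None] * size   # each entry: (logical_page, physical_page)
        self.victim = 0                # counter selecting the entry to replace

    def lookup(self, logical_page):
        """Return the cached physical page, or None on a TLB miss."""
        for entry in self.entries:
            if entry is not None and entry[0] == logical_page:
                return entry[1]
        return None

    def insert(self, logical_page, physical_page):
        """Replace the oldest entry; the counter wraps within the table size."""
        self.entries[self.victim] = (logical_page, physical_page)
        self.victim = (self.victim + 1) % len(self.entries)
```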
- the size of the TLB is a compromise—the larger the better, but it follows the law of diminishing returns.
- the minimum useful size is based on the number of pages visited along any path through the texture map. Texture maps are preferably patched 32×32 (a patch at 32 bits per texel is the same size as a page).
- the sweet spot is 256×256 mip mapped or 8 pages for level 0 plus 4 pages for level 1 along a line.
- a 512×512 non mip mapped texture map will hit 16 pages along a line.
- the texel size is 16 bits so X-major lines will hit half the number of pages.
- a 16 entry TLB covers these sizes well.
- the sweet spot is 1024×1024 mip mapped or 32 pages for level 0 plus 16 pages for level 1 along a line.
- a 2048×2048 non mip mapped texture map will hit 64 pages along a line.
- a 64 entry TLB covers these sizes well.
- a TLB miss will cause a single read of the Logical Page Table. The cost of this is difficult to quantify because it depends on how busy the memory system is and whether it causes a page break.
- the TLB miss time will be amortised over a minimum of 16 texel reads. (This assumes a one to one mapping between texels and pixels and takes into account that textures are stored as 2×2 patches, i.e. there are 16 2×2 minor patches in a 32×32 major patch.)
- the TLB can be invalidated by using the InvalidateCache command with bit 2 set and this should be done whenever the host changes the Logical Page Table directly through the bypass. Changes to the Logical Page Table via the UpdateLogicalTextureInfo command will automatically invalidate those logical pages which are updated, if present in the TLB.
- the Logical Page Table has one entry per logical page and each entry has the following format:
| Bit No | Name | Description |
|---|---|---|
| 0-15 | Physical Page | These bits hold the physical page number, relative to the start of the working set, where this logical page is held. If the page is not resident (next field) then these bits are ignored (but will frequently be set to zero). This field is normally maintained by RX, except when the page is marked as a HostTexture. |
| 16 | Resident | This bit, when set, marks this logical page as resident in the working set. This field is normally maintained by RX, except when the page is marked as a HostTexture. |
| 17 | Host Texture | This bit, when set, marks this logical page as resident in the host memory and it should be accessed using AGP texture execute mode rather than downloading it. The Length field should also be set to zero. |
| … | Virtual Host Page | When set, this bit will generate an interrupt and involve the host in providing this page of texture data. When this bit is 0 the HostPage is the physical page and will be read directly with no host intervention. This field is maintained by the host. |
| 44-63 | Host Page | This field holds the page in host memory where the texture data is held. This is a virtual host page or a physical host page as indicated by the VirtualHostPage bit (previous field). This field is maintained by the host. |
- the first word in each entry is basically read and written by RX during the memory management activities unless the page is a host texture, in which case the host is responsible for the first word as well.
- the second word is written by the host (either directly via the bypass or via the core using messages) and just read by RX.
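- For illustration, a 64 bit Logical Page Table entry can be unpacked as in the Python sketch below. Only the fields whose bit positions are stated explicitly in this description (Physical Page, Resident, Host Texture and Host Page) are decoded; the function name and the dictionary keys are hypothetical.

```python
def decode_logical_page_entry(entry):
    """Unpack the explicitly documented fields of a 64 bit table entry."""
    return {
        "physical_page": entry & 0xFFFF,            # bits 0-15
        "resident": bool((entry >> 16) & 1),        # bit 16
        "host_texture": bool((entry >> 17) & 1),    # bit 17
        "host_page": (entry >> 44) & 0xFFFFF,       # bits 44-63 (20 bits)
    }


def first_word(entry):
    """Bits 0..31: normally read and written by RX."""
    return entry & 0xFFFFFFFF


def second_word(entry):
    """Bits 32..63: written by the host and just read by RX."""
    return (entry >> 32) & 0xFFFFFFFF
```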
- the base address of the table is held in the LogicalTexturePageTableAddr register and is aligned to a 64 bit boundary.
- the number of entries in the table is held in the LogicalTexturePageTableLength register and each logical page number is tested against this limit. If the logical page number is out of range then the address is always mapped into page 0 of the working set and will never cause a texture download. (As a debug aid page 0 of the working set can be missed out of the Physical Page Allocation Table and initialized to some distinctive texture map so any out of range texture mappings cause a distinctive visual effect.)
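- The range test can be sketched as below. This is an assumed software model, not the hardware implementation: the table is a Python list whose length stands in for LogicalTexturePageTableLength, and a page fault is represented by returning None.

```python
RESIDENT_BIT = 1 << 16        # bit 16 of the entry
PHYSICAL_PAGE_MASK = 0xFFFF   # bits 0-15 of the entry


def lookup_physical_page(logical_page, table):
    """Map a logical page to a physical page of the working set.

    Returns 0 for out-of-range pages (never a texture download),
    the physical page if resident, or None to signal a page fault.
    """
    if logical_page >= len(table):
        return 0
    entry = table[logical_page]
    if entry & RESIDENT_BIT:
        return entry & PHYSICAL_PAGE_MASK
    return None
```

- With an empty table (length zero, as after reset) every lookup maps to page 0 and no download is ever requested.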
- the LogicalTexturePageTableLength is initialized to zero during reset which effectively disables the logical and virtual texture management.
- the table can be updated by the host directly via the bypass once the chip has been synced to make sure there are no conflicting accesses.
- the Physical Page Allocation Table must also be updated to remove the reference (if any) to the logical page being updated.
- the TLB should be invalidated in case the updated Logical Page Table has left any stale data in the TLB.
- the InvalidateCache command (with bit 2 set) can be used to do this.
- the table can also be updated via the normal command stream using the SetLogicalTexturePage command to set the first page to update.
- the data for bits 32 . . . 63 is supplied with the UpdateLogicalTextureInfo command and this will update the Logical Page Table at the previously set page and do all the necessary housekeeping.
- the logical page to update is auto-incremented so several consecutive table entries are updated. Updates beyond the number of entries in the table (as set by LogicalTexturePageTableLength) are discarded and leave the memory untouched.
- the logical table is updated by:
- Keeping track of the least recently used page is done by a queue. Whenever a page is first accessed (easily identified by a TLB miss on the page) it is moved to the head of the queue. It therefore follows that the page at the tail of the queue is the least recently used so is the one allocated to the new texture page.
- This physical page may already be assigned to a logical page so that logical page is marked as non-resident in the Logical Page Table and removed from the TLB. (It is most unlikely it is in the TLB as the working set will normally hold many more pages than the TLB does.)
- the queue used to track the physical pages is held in the Physical Page Allocation Table. This table has one entry per physical page and each entry has the following format:
| Bit No | Name | Description |
|---|---|---|
| 0-15 | Logical Page | These bits hold the logical page number this physical page has been assigned to. If no assignment has been made (or it has been removed) then the valid bit (next field) will be zero and these bits are ignored (but will frequently be set to zero). |
| 16 | Valid | This bit, when set, marks this logical page as resident in the working set. This field is normally maintained by RX. |
| 17-31 | Reserved | This field is not used but is set to zero whenever the Resident bit is updated. |
| 32-47 | Next Page | This field holds the page number of the next page in the pool, i.e. the next recently used page. |
| 48-63 | Previous Page | This field holds the page number of the previous page in the pool, i.e. the previous recently used page. |
- the Physical Page Allocation Table is not normally accessed by the host. The two exceptions are during power-on initialization and if pages are to be locked down. See later for information on these.
- the NextPage and PrevPage fields are used to form a double linked list of the pages assigned to a memory pool.
- the double linked list is a classic data structure for building queues from as it allows fixed time insertion and deletions. In this application a deletion can occur from any queue entry, but insertions only occur at the head.
- the head entry is the most recently used physical page and the tail entry is the least recently used page.
- a traditional linked list suffers from a linear search time, but by combining it with an array (i.e. table) a constant search time to find a given physical page is guaranteed—you just use the physical page number to index into the table. This is important as a frequent operation is to make a specific physical page the most recent. This involves searching for this page and updating the head (and maybe the tail) pointer to move this page to the head of the queue.
- Each memory pool has a head and tail page. These are held in the HeadPhysicalPageAllocation[0 . . . 3] and TailPhysicalPageAllocation[0 . . . 3] registers respectively and the index relates to each memory pool. These registers are initialized by software at the start of day, but thereafter are read and written by the hardware.
- the PrevPage field of the head page may hold a stale link and is ignored. The same applies to the NextPage field of the tail page.
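- The combination of a table indexed by physical page number and a double linked list can be sketched in Python as follows. This is a minimal single pool model under stated assumptions: the class name, method names and the NIL end marker are hypothetical (the hardware simply ignores the unused links rather than storing a marker).

```python
NIL = -1  # hypothetical end-of-list marker used only by this sketch


class PagePool:
    """One memory pool: head is most recently used, tail is least recently used."""

    def __init__(self, num_pages):
        # Table indexed by physical page number, so finding a page is O(1).
        self.next_page = [i + 1 for i in range(num_pages)]
        self.prev_page = [i - 1 for i in range(num_pages)]
        self.next_page[-1] = NIL
        self.head = 0
        self.tail = num_pages - 1

    def _unlink(self, page):
        prev, nxt = self.prev_page[page], self.next_page[page]
        if page == self.head:
            self.head = nxt
        else:
            self.next_page[prev] = nxt
        if page == self.tail:
            self.tail = prev
        else:
            self.prev_page[nxt] = prev

    def make_most_recent(self, page):
        """Move a page to the head of the queue in constant time."""
        if page == self.head:
            return
        self._unlink(page)
        self.prev_page[page] = NIL
        self.next_page[page] = self.head
        self.prev_page[self.head] = page
        self.head = page

    def allocate(self):
        """Take the least recently used page and make it the most recent."""
        victim = self.tail
        self.make_most_recent(victim)
        return victim

    def order(self):
        """Pages from head to tail (for inspection only)."""
        pages, page = [], self.head
        while page != NIL:
            pages.append(page)
            page = self.next_page[page]
        return pages
```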
- the maximum size the Physical Page Allocation Table needs to be is the amount of LB memory plus the amount of FB memory (in bytes) divided by 4096. (There is no reason why the Physical Page Allocation Table could not be smaller and just cover the contiguous region set aside for dynamic texture management. Having it cover all the on card memory helps to illustrate some points.) This gives one entry for each 4K page on the card. Many of these pages are not available for virtual texture storage because:
- the BasePageOfWorkingSet register is set up.
- the texture management hardware is now ready to be used once logical textures have been created.
- the texture management can be done on a global basis so all contexts/APIs share the same mechanisms, or can be done on a context by context basis.
- the preferred way to update the Logical Texture Page Table is to use the SetLogicalTexturePage and UpdateLogicalPageInfo commands.
- the SetLogicalTexturePage command takes the logical page to update in the least significant bits.
- the UpdateLogicalPageInfo command sets bits 0 . . . 31 to zero and updates bits 32 . . . 63 with the given data.
- the entry to update was set by SetLogicalTexturePage command and this is auto incremented after the update. All the necessary housekeeping is done.
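- The command pair can be modelled in a few lines of Python. This is a hedged sketch: the class and method names are hypothetical, the 16 bit mask on the page number is an assumption based on the 16 bit page numbers used elsewhere, and the housekeeping (TLB and Physical Page Allocation Table updates) is not modelled.

```python
class LogicalPageTable:
    """Software model of table updates through the command stream."""

    def __init__(self, length):
        self.entries = [0] * length   # one 64 bit entry per logical page
        self.current_page = 0         # page selected for the next update

    def set_logical_texture_page(self, data):
        """Select the first logical page to update (least significant bits)."""
        self.current_page = data & 0xFFFF

    def update_logical_page_info(self, data):
        """Zero bits 0..31, write bits 32..63, then auto-increment the page."""
        page = self.current_page
        if page < len(self.entries):
            self.entries[page] = (data & 0xFFFFFFFF) << 32
        # Updates beyond the table length are discarded, but the page
        # number still advances.
        self.current_page = page + 1
```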
- the Logical Texture Page Table can be edited by software by reading and/or writing it directly in memory using bypass memory access methods. In this case it is the software's responsibility to do the necessary housekeeping to remove any references to the updated logical pages in the Physical Page Allocation Table.
- the texture map can be bound and used. Note that the texture map (or pages of it) is not loaded until it is actually used.
- the texture map is only downloaded when it is used, but it is sometimes useful to ensure it is downloaded when it is created. This can be done by using the Load mode to load each logical page in the texture map. Alternatively when a texture map is bound (to a context) you may want to ensure it is resident at this time, rather than wait for a page fault. If the page is already resident then there is no need to load it (as the Load mode would do) so the Touch mode can be used instead. These can be done using the command TouchLogicalPages. This command has the following data fields:
| Bit No | Name | Description |
|---|---|---|
| 0-15 | Page | This field sets the first Logical Page to touch. |
| 16-29 | Count | This field holds the number of pages to touch. |
| 30-31 | Mode | This field is set to 3 to touch a page(s) or to 1 to load a page(s). As each page is touched the corresponding texture data is downloaded. |
- the host's copy is edited.
- the texture management hardware is notified that the texture pages (if resident) are stale by using the command TouchLogicalPages to mark these pages as non resident.
- This command has the following data fields:
| Bit No | Name | Description |
|---|---|---|
| 0-15 | Page | This field sets the first Logical Page to mark as stale. |
| 16-29 | Count | This field holds the number of pages to mark as stale. |
| 30-31 | Mode | This field is set to 0 to mark the pages as stale (i.e. non resident). |
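- The data word for TouchLogicalPages can be packed as in the Python sketch below, using the Page (bits 0-15), Count (bits 16-29) and 2 bit Mode (bits 30-31) fields. The constant and function names are hypothetical; the mode values 0, 1 and 3 are the ones given in this description.

```python
MODE_STALE = 0   # mark pages as non resident
MODE_LOAD = 1    # load pages
MODE_TOUCH = 3   # touch pages (load only if not already resident)


def touch_logical_pages_word(page, count, mode):
    """Pack the 32 bit data word for the TouchLogicalPages command."""
    if not 0 <= page < (1 << 16):
        raise ValueError("page must fit in 16 bits")
    if not 0 <= count < (1 << 14):
        raise ValueError("count must fit in 14 bits")
    if mode not in (MODE_STALE, MODE_LOAD, MODE_TOUCH):
        raise ValueError("unsupported mode")
    return (mode << 30) | (count << 16) | page
```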
- the primary texture cache is invalidated (using the InvalidateCache command) to ensure it doesn't hold any stale texel data for the texture map just edited.
- the best way to have locked down texture maps is to avoid using the logical/virtual management and have them as physical textures. If a texture is to be locked down after it has been created as a logical texture then the only way to do this is for the software to edit the Physical Page Allocation Table (and maybe the HeadPhysicalPageAllocation and/or TailPhysicalPageAllocation registers for the affected pools). Before these edits can be done the system must be in a quiescent state so that no texture downloads can start.
- Virtual host textures are textures which live in virtual host memory so do not need to be locked down into physical memory. As a result they are not guaranteed to be present when a corresponding page fault occurs, and in any case the Logical Texture Page Table only holds the virtual page address and not the physical page address.
- the Logical Texture Page Table will have the VirtualHostPage bit set for these logical pages and other than this the general setup (from RX's viewpoint) is the same as when the bit is clear.
- On receiving this interrupt the TextureAddr PCI register is read and this holds the 20 bit virtual address page for the faulting texture page. (The register is read in P3 for P3 or in Gamma for RX; the one in RX should not be accessed as the software will not know which RX in a multi-RX system is being serviced.)
- the physical address where the data is located is written into the TextureAddr PCI register. This will wake up the texture download DMA controller and it will do the download and finish any necessary housekeeping.
- Logical texture mapping can be used without the virtual part so a texture map does not need to be stored in consecutive physical pages in memory, but the automatic loading of textures is never done. This allows textures to be managed in the same way they are on GLINT MX, but simplifies the memory management issues as the physical memory allocation is now done on page size chunks, rather than variable texture map sized chunks.
- Texture maps stored in host memory can be managed by the virtual management hardware. This allows a texture map to be split over non contiguous pages of host memory (without relying on the AGP GART table to do the logical to physical mapping) and texture maps to be paged in and out of this memory.
- the host pages are not part of the physical memory pool managed by the hardware so all host pages are allocated (or reallocated) by host software.
- the preferred way to update the Logical Texture Page Table is to use the DownloadAddress and DownloadData commands.
- the DownloadAddress command takes the byte address in memory of the Logical Page Table Entry to update.
- the DownloadData command writes its data to memory and then auto increments the address. Two words are written per logical page entry.
- the TLB must be invalidated to prevent it holding stale data (use the InvalidateCache command with bit 2 set) and WaitForCompletion used to ensure the table in memory has been updated before any rendering can start. (The writes to the Logical Page Table are done via the Framebuffer Write Unit so may still be queued up on the subsequent TLB miss, hence stale page data will be read from the Logical Page Table. The WaitForCompletion command ensures this cannot happen.)
- the Logical Texture Page Table can be edited by software by reading and/or writing it directly in memory using bypass memory access methods. In this case it is the software's responsibility to Sync with the chip first to ensure no outstanding rendering is going to use a logical page about to be updated. The TLB still needs to be invalidated after the bypass updates have been done.
- the texture map can be bound and used.
- the host's copy is edited.
- the primary texture cache is invalidated (using the InvalidateCache command) to ensure it doesn't hold any stale texel data for the texture map just edited.
- Virtual host textures are textures which live in virtual host memory so do not need to be locked down into physical memory. As a result they are not guaranteed to be present when a corresponding page fault occurs, and in any case the Logical Texture Page Table only holds the virtual page address and not the physical page address.
- the Logical Texture Page Table will have the VirtualHostPage bit set, the resident bit clear, the host texture bit set and length field zero for these logical pages.
- the DMA controller will raise an interrupt (even though no download is needed the DMA controller is involved so the same software interface can be used).
- the TextureAddr, LogicalPage and TextureOperation PCI registers are read (in P3 for P3 or in Gamma for RX; the one in RX should not be accessed as the software will not know which RX in a multi-RX system is being serviced) to identify the faulting texture page.
- the Logical Page Table is updated via the bypass and the TextureAddr PCI register is written (the data is not used).
- the write to the TextureAddr register will wake up the texture download DMA controller but because the length field is zero no download is done or physical page (from the Physical Page Allocation Table) allocated.
- the TLB will be automatically invalidated.
- a physical page (or pages if the interrupt is used to allocate a whole texture rather than just a page) must be allocated by software. If these physical pages are already assigned then the corresponding logical pages must be marked as non resident in the Logical Texture Page Table. If these newly non resident logical pages are subsequently accessed (maybe by a queued texture operation) they themselves will cause a page fault and be re assigned. Hence no knowledge of what textures are waiting in the DMA buffer to be used is necessary.
- the physical pages are allocated from the host working set whose base page is given by BaseOfWorkingSetHost register.
- a 3D texture map is one where the texels are indexed by a triplet of coordinates: (u, v, w) or (i, j, k) depending on the domain.
- Such textures are typically used for volumetric rendering.
- 3D texture mapping in this unit is enabled by setting the Texture3D bit in TextureReadMode 0 (the same bit in TextureReadMode 1 is always ignored).
- the layout, texel size, texture type and width should be set up the same for texture 0 and texture 1 .
- 3D texture maps are not optimal for volumetric rendering; ideally the texture is stored in 3D patches (at the 2×2×2 level and at the 32×32×32 level, or equivalents). Some access paths (primarily along the k axis) will exhibit a high number of page breaks and so be slower than paths primarily along the i or j axis. No effort has been made to address this as the inclusion of 3D textures is more a functional rather than a performance issue (yet!).
- CombinedCache mode bit should not be set when 3D textures are being used.
- Bitmap data can be stored in memory and accessed via the texture mapping hardware.
- the resulting “texel” data is treated as a bitmap and used to modify the pixel or color mask used in a span operation.
- the bitmap data can be held at 8, 16, 32 or 64 bit texels and is zero extended (when necessary) to 64 bits before being optionally byte swapped, optionally mirrored, optionally inverted and ANDed with the pixel mask or the color mask.
- the primary texture cache is not used for this data, however the secondary cache is.
- bitmap data can only be held in Linear or Patch64 layouts. Patch32_2 or Patch2 formats are not supported; however, no interlocks prevent their use, and the results are just not interesting or useful.
- the bitmap data can be stored as logical or physical textures.
- the bitmap data can be held as packed 8, 16, 32 or 64 bit data, usually with one scanline of the glyph held per texel. Glyphs wider than 64 bits will take multiple texels to cover the width. Packing multiple scanlines together reduces the waste of memory (in MX the texel size was limited to 32 bits for spans), and makes the caching more efficient.
- Windows normally supplies its bitmasks as a byte stream with successive bytes controlling 8 pixel groups at increasing x (i.e. towards the right edge). Bit 7 within a byte controls the left most pixel (for that group) and bit 0 the right most pixel. To match up the pixel mask order (bit 0 controls the left most pixel, bit 63 the right most pixel) the three byte swap bits are all set and the mirror bit set.
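- The effect of setting all three byte swap bits and the mirror bit can be checked with a small Python sketch. It rests on assumptions not stated in this description: that the three byte swap bits swap adjacent bytes, adjacent 16 bit halves and the two 32 bit words respectively, that the mirror reverses all 64 bits, and that byte 0 of the stream lands in bits 0-7 of the texel. All function names are hypothetical.

```python
def swap_groups(value, width):
    """Swap each adjacent pair of width-bit groups in a 64 bit value."""
    low_mask = 0
    for start in range(0, 64, 2 * width):
        low_mask |= ((1 << width) - 1) << start
    return ((value & low_mask) << width) | ((value >> width) & low_mask)


def byte_swap(value, swap8=True, swap16=True, swap32=True):
    """Apply the three (assumed) byte swap stages in turn."""
    if swap8:
        value = swap_groups(value, 8)
    if swap16:
        value = swap_groups(value, 16)
    if swap32:
        value = swap_groups(value, 32)
    return value


def mirror(value):
    """Reverse the order of all 64 bits."""
    return int(format(value, "064b")[::-1], 2)


def windows_bytes_to_pixel_mask(data):
    """Convert up to 8 bytes of a Windows bitmask row to a pixel mask.

    In the result bit 0 controls the left most pixel.
    """
    texel = int.from_bytes(data.ljust(8, b"\x00"), "little")  # zero extend
    return mirror(byte_swap(texel))
```

- Under these assumptions the combined operation reverses the bits within each byte while leaving the byte order unchanged, which is the mapping the Windows layout requires.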
- Indexed textures are a special case because they are stored as 8 bit texels and expanded to 32 bit texels when loaded into the Texture Filter Unit (the expansion happened in the Texture LUT Unit). This makes the addressing and cache management slightly more complicated as the addressing uses the 8 bit texel size, while the cache management uses the 32 bit texel size.
- the secondary cache holds the texture data in its 8 bit format so reduces the number of memory reads when the access path is mainly in u across the texture map.
- YUV textures are a special case because two texels are stored in a 32 bit word (so in this sense they are 16 bit texels), however the U and V components are shared so the 32 bit word represents two 24 bits texels (the spare “alpha” byte is set to 255). If the input bytes in the 32 bit word are labelled:
- This arrangement of the YUV pixels in memory is called YVYU, but an alternative memory format (called VYUY) is also supported.
- the bytes are labelled:
- Borders in the OpenGL sense are only used when the filter mode is bilinear and the wrapping mode is clamp. In this case when one of the filter points goes outside the texture map the border texel is read (if present) or the border color is used (if absent). The border, if present, still needs to be skipped over and this will have already been done by incrementing the i, j indices before they get to this unit.
- the width of a texture map is given by (2^n + 2b) where b is 0 for no border or 1 with a border.
- the TextureMapWidth 0 and TextureMapWidth 1 registers hold the width of the texture map without the border (in bits 0 . . . 11 ) and, if a border is present, the border bit (bit 12 ) in TextureMapWidth 0 or TextureMapWidth 1 is set.
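- As a small worked example, the register layout and the width formula combine as follows. The Python function name is hypothetical, and the sketch assumes the register holds the width itself (a power of two) in bits 0-11 rather than its logarithm.

```python
def decode_texture_map_width(register):
    """Return (width_without_border, has_border, stored_width).

    stored_width follows the formula 2^n + 2b, with b = 1 when the
    border bit (bit 12) is set.
    """
    width = register & 0xFFF              # bits 0..11
    has_border = bool((register >> 12) & 1)
    stored_width = width + (2 if has_border else 0)
    return width, has_border, stored_width
```

- For a 256 texel wide map with a border the stored width is 258 texels.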
- Texels which fall into the border when no border is present are flagged by the Texture Index Unit so these texels are not checked in the cache and no texels read from memory.
- the T 0 BorderColor . . . T 7 BorderColor flags used for this purpose are also passed to the Texture Filter Unit where they select the BorderColor 0 (T 0 . . . T 3 ) or BorderColor 1 (T 4 . . . T 7 ) registers instead of the primary cache to provide the texture data.
- the BorderColor 0 and BorderColor 1 registers would normally be set the same value for OpenGL when mip mapping.
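The register layout and the width rule above can be illustrated with a short C sketch. The macro and function names are hypothetical; the only facts taken from the text are that bits 0 to 11 hold the width without the border, that bit 12 is the border bit, and that a border adds one texel on each side.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH_MASK   0x0FFFu      /* bits 0..11: width without the border */
#define BORDER_BIT   (1u << 12)   /* bit 12: set when a border is present */

/* Build a TextureMapWidth style register value (helper for illustration). */
uint32_t make_map_width_reg(unsigned width, int has_border)
{
    assert(width <= WIDTH_MASK);
    return width | (has_border ? BORDER_BIT : 0u);
}

/* Width of the map excluding any border texels. */
unsigned map_width(uint32_t reg) { return reg & WIDTH_MASK; }

/* 1 if a border is present, else 0 (the "b" in the width expression). */
unsigned map_border(uint32_t reg) { return (reg & BORDER_BIT) ? 1u : 0u; }

/* Width as stored in memory: one border texel on each side when b is 1. */
unsigned stored_width(uint32_t reg)
{
    return map_width(reg) + 2u * map_border(reg);
}
```

For a 256 texel wide map this gives a stored width of 256 without a border and 258 with one.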
- FIG. 4A and FIG. 4B are a pair of flow charts which show how a texture is loaded, depending on whether a cache miss occurs.
- FIG. 4B shows actions in the Primary Cache Manager. If a cache miss occurs (test 421), the details of the missing texel are obtained (step 423), and the next free cache line is looked up (step 425). A read command is then issued to the address generator (step 427), specifying the free cache line as the return address. The address generator updates the T FIFO after the read request has occurred. A message is then written into the M FIFO with details of the cache lines used, fragment details, and the number (if any) of additional cache loads which have now occurred.
- FIG. 4A shows actions in the Dispatcher. If the T FIFO or the Texel Data FIFO are not empty (test 401), then the data in the Texel Data FIFO is written (step 403) into the cache data line given by the T FIFO. The Cache lines loaded count is then updated (step 405), and the entry flushed from both FIFOs (step 407).
- A block diagram of the unit is shown in FIG. 10.
- the overall unit is split into 7 sub-units and these are basically organized into three groups:
- the Primary Cache Manager, Address Generator and Dispatcher form the core of the unit and work in a similar way to the other read units.
- the logical address translation is handled by the Address Mapper and TLB.
- The dynamic texture loading is handled by the Memory Allocator and the Download Controller.
- The interfaces between all the units are shown as FIFOs, but most of the FIFOs are just a register with full/empty flags for simple handshaking.
- The single deep FIFOs have been used as they clearly delineate the functionality between units and allow a single sub unit to be responsible for a single resource.
- The two shared resources which are managed in this way are the TLB and Memory Allocator.
- The TLB is mainly queried by the Address Mapper, but the Memory Allocator needs to invalidate pages when a physical page is re-assigned.
- The Memory Allocator will allocate pages when requested by the Download Controller, but also needs to mark pages as “most recently used” when requested by the Address Mapper.
- the read port to the Memory Controller is used to read texture data and has a deep address FIFO and return data FIFO to absorb latency.
- the write port to the Memory Controller is used by the Download Controller to write texture data into memory during a download.
- the path from the Texture Input FIFO to the Memory Controller is 128 bits wide so the maximum download bandwidth can be sustained.
- the texels: (i 0 , j 0 , map), (i 1 , j 0 , map), (i 0 , j 1 , map), (i 1 , j 1 , map) for texture 0 and for texture 1 are checked in parallel in the Primary Cache Manager to see if they are in the primary cache.
- the step message with the address of each texel filled in, is written to the M FIFO and the texel read count field on this step set to zero. This part of the processing all happens in the same cycle so the fragment throughput is maintained.
- this step message reaches the Dispatcher and is passed on as soon as the following unit can accept it.
- the texels: (i 0 , j 0 , map), (i 1 , j 0 , map), (i 0 , j 1 , map), (i 1 , j 1 , map) for texture 0 and for texture 1 are checked in parallel in the Primary Cache Manager to see if they are in the primary cache.
- The Address Generator will process the texel reads one at a time. It calculates the address for the texel in memory using the i, j and map values together with the appropriate TexelReadMode and TextureMapWidth values. The address is checked to see if it is in the secondary cache, and if it is then instructions to load the primary cache from the secondary cache are sent down the T FIFO.
- a more common case for Patch32_2 or Patch2 layout
- the secondary cache doesn't hold the texel so the Address Mapper is given the address and its type (logical or physical) via the AM FIFO.
- the Address Mapper checks in the TLB to see if the logical page is present and, if so, what its corresponding physical page is.
- the logical page is not in the TLB so the Address Mapper reads the entry in the Logical Texture Page Table for this logical page.
- the entry returns a resident bit and a physical page number.
- the resident bit is set so the physical page number is now known.
- the physical memory address is derived from the physical page and low order bits of the logical address and passed to the Memory Controller.
- the TLB is updated so this logical page is the most recent one and its corresponding physical page recorded.
- this step message reaches the Dispatcher and if the outstanding texel data (as shown by the texel read count field) has been loaded into the primary cache (in the Filter Unit) the step is passed on as soon as the following unit can accept it. If, however the outstanding texel data has not been loaded then the step message is stalled until it has.
- the texels: (i 0 , j 0 , map), (i 1 , j 0 , map), (i 0 , j 1 , map), (i 1 , j 1 , map) for texture 0 and for texture 1 are checked in parallel in the Primary Cache Manager to see if they are in the primary cache.
- The Address Generator will process the texel reads one at a time. It calculates the address for the texel in memory using the i, j and map values together with the appropriate TexelReadMode and TextureMapWidth values. The address is checked to see if it is in the secondary cache, and if it is then instructions to load the primary cache from the secondary cache are sent down the T FIFO.
- a more common case for Patch32_2 or Patch2 layout
- the secondary cache doesn't hold the texel so the Address Mapper is given the address and its type (logical or physical) via the AM FIFO.
- The logical page is not in the TLB and the resident bit in the Logical Texture Page Table is clear, so the Address Mapper writes the host physical address (read from the page table) into the PCI HostTextureAddress register, the logical page into the PCI LogicalTexturePage register, and the transfer length, memory pool and address type (set to host physical for this description) into the PCI TextureOperation register. Finally the PCI TextureDownloadRequest bit is set. The Address Mapper will wait for the TextureDownloadComplete signal to be asserted by the Download Controller.
- The Texture DMA Controller (in Gamma for an RX system, or in P3 for a P3 system) will respond to the TextureDownloadRequest bit being set. It will write the logical address, transfer length and memory pool into the Texture Input FIFO and then follow this data with the page of texture map data.
- the Download Controller on receiving the logical page and pool information in the Texture Input FIFO will make a request to the Memory Allocator via the MAC FIFO for the physical page to use for the download just about to start.
- the Memory Allocator will use the Physical Page Allocation Table to allocate a physical page and ask the TLB (via the TLB I FIFO) to invalidate the logical page previously occupying (if any) the newly allocated physical page.
- the Memory Allocator also updates the Logical Texture Page Table to mark the logical page as being resident at the new physical page. The physical page is returned back to the Download Controller via the MAD FIFO.
- the Download Controller on receiving the physical page in the MAD FIFO will transfer the texture data in the Texture Input FIFO to the given physical page. Once this is done the TextureDownloadComplete signal is asserted which releases the Address Mapper to complete its task.
- the Address Mapper will read the Logical Texture Page Table entry for this logical page and now that the page is resident the physical page is read from the Logical Texture Page Table.
- the physical memory address is derived from the physical page and low order bits of the logical address and passed to the Memory Controller.
- the TLB is updated so this logical page is the most recent one and its corresponding physical page recorded.
- this step message reaches the Dispatcher and if the outstanding texel data (as shown by the texel read count field) has been loaded into the primary cache (in the Filter Unit) the step is passed on as soon as the following unit can accept it. If, however the outstanding texel data has not been loaded then the step message is stalled until it has.
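The resident and page fault walk-throughs above share one translation step, which can be sketched in C. This is a simplified model under stated assumptions: the 4 KByte page size, the tiny table, the entry layout and all names are hypothetical, the TLB lookup is left out, and `download_page` merely stands in for the DMA download and Memory Allocator interaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumption: 4 KByte pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define NUM_LOGICAL_PAGES 16u          /* tiny table for illustration */

/* One Logical Texture Page Table entry (layout is hypothetical). */
typedef struct {
    bool     resident;       /* set once the page is in local memory */
    uint32_t physical_page;  /* valid only when resident */
} PageTableEntry;

static PageTableEntry page_table[NUM_LOGICAL_PAGES];
static uint32_t next_free_physical_page = 0;
static unsigned download_count = 0;

/* Stand-in for the DMA download plus Memory Allocator: picks a physical
 * page, "copies" the data and marks the logical page resident. */
static void download_page(uint32_t logical_page)
{
    page_table[logical_page].physical_page = next_free_physical_page++;
    page_table[logical_page].resident = true;
    download_count++;
}

/* Map a logical address to a physical address, faulting the page in
 * first when its resident bit is clear. */
uint32_t translate(uint32_t logical_addr)
{
    uint32_t logical_page = logical_addr >> PAGE_SHIFT;
    assert(logical_page < NUM_LOGICAL_PAGES);

    if (!page_table[logical_page].resident) {
        download_page(logical_page);   /* page fault path */
    }
    /* Physical page merged with the low order bits of the logical address. */
    return (page_table[logical_page].physical_page << PAGE_SHIFT)
         | (logical_addr & PAGE_MASK);
}
```

In this model the first access to a logical page triggers one download, and later accesses to the same page reuse the recorded physical page without faulting again.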
- the Texture Read Unit has connections to four ports in the Memory Interface.
- the four ports are (in priority order from highest to lowest). This is an absolute priority and not based on any page break considerations:
- The first two ports are not FIFO buffered, so they will block subsequent texture processing until their read or write requests have been serviced.
- This port is used to read texel data from memory.
- the addresses (after any necessary translation) are written into the Tx Addr FIFO and sometime later the 128 bits worth of data are returned via the Tx Data FIFO.
- This port is used by the Download Controller to write texture data into its allocated physical page. It is also used to update the Logical Texture Page Table to mark the page as being resident once it has been downloaded.
- TrWrComplete (1 bit): This signal is asserted by the memory controller when the FIFO is empty and all writes from this port, the Memory Allocator Port and the Address Mapper Port have been written to memory, so they can be read from another port.
- This port is used to update the Logical Texture Page Table with information from the host and to remove references from a physical page to a logical page in the Physical Page Allocation Table.
- the port is 64 bits wide (to save routing a 128 bit data bus from the Memory Controller).
- the read and write operations are buffered by a single level FIFO (to provide a simple interface) so will stall until their operations are satisfied.
- MC Memory Controller
- This port is used to update the Physical Page Allocation Table as pages are allocated or made the most recent accessed page. It is also used to mark logical pages in the Logical Page Table as non resident when the associated physical page is re-used.
- the port is 64 bits wide (to save routing a 128 bit data bus from the Memory Controller). The read and write operations are buffered by a single level FIFO (to provide a simple interface) so will stall until their operations are satisfied.
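The bookkeeping the Memory Allocator does on the Physical Page Allocation Table can be sketched in C as follows. The text only says that pages are allocated and marked as most recently accessed; choosing the least recently used page as the victim is an assumption of this sketch, as are the table size, the timestamp representation and every name used.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PHYSICAL_PAGES 4u
#define NO_LOGICAL_PAGE    0xFFFFFFFFu

typedef struct {
    uint32_t logical_page[NUM_PHYSICAL_PAGES]; /* owner of each physical page */
    uint64_t last_used[NUM_PHYSICAL_PAGES];    /* larger means more recent */
    uint64_t clock;                            /* monotonically increasing */
} AllocationTable;

void alloc_table_init(AllocationTable *t)
{
    for (uint32_t p = 0; p < NUM_PHYSICAL_PAGES; p++) {
        t->logical_page[p] = NO_LOGICAL_PAGE;
        t->last_used[p] = 0;
    }
    t->clock = 0;
}

/* Mark a physical page as the most recently used one. */
void alloc_table_touch(AllocationTable *t, uint32_t physical_page)
{
    assert(physical_page < NUM_PHYSICAL_PAGES);
    t->last_used[physical_page] = ++t->clock;
}

/* Give the least recently used physical page to `logical_page`.
 * The previous owner (or NO_LOGICAL_PAGE) is returned through `evicted`
 * so the caller can invalidate it in the TLB and the page table. */
uint32_t alloc_table_allocate(AllocationTable *t, uint32_t logical_page,
                              uint32_t *evicted)
{
    uint32_t victim = 0;
    for (uint32_t p = 1; p < NUM_PHYSICAL_PAGES; p++) {
        if (t->last_used[p] < t->last_used[victim]) {
            victim = p;
        }
    }
    *evicted = t->logical_page[victim];
    t->logical_page[victim] = logical_page;
    alloc_table_touch(t, victim);
    return victim;
}
```

Returning the evicted logical page mirrors the two follow-up actions in the text: the TLB entry is invalidated and the logical page is marked non resident.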
- This unit receives a substantial amount of information about the filtering process and the texels taking part in it from the Texture Index Unit. Some of this information (such as the interpolation coefficients) is not used by this unit and is just passed through.
- the active step messages and the span step messages are extended to carry the extra information. The following table describes the format of these messages:
- Bits 0-95: These bits carry the normal data present in an ActiveStepX, ActiveStepYDomEdge, SpanStepX or SpanStepYDomEdge message.
- Bits 96-107, f0i0: This field holds the i0 index for texture 0, even mip maps or even slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits.
- Bits 108-119, f0i1: This field holds the i1 index for texture 0, even mip maps or even slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits.
- T0BorderColor, T1BorderColor (bit 149), T2BorderColor (bit 150), T3BorderColor (bit 151): These bits show which texels are to use the border color instead of texel data. These are only taken into account for valid combinations of indices (see previous field).
- Bits 152-155, f0map: This field holds the map level the texels (T0...T3) are on.
- Bits 156-167, f1i0: This field holds the i0 index for texture 1, odd mip maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits.
- Bits 168-179, f1i1: This field holds the i1 index for texture 1, odd mip maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits.
- Bits 180-191, f1j0: This field holds the j0 index for texture 1, odd mip maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits.
- Bits 192-203, f1j1: This field holds the j1 index for texture 1, odd mip maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits.
- T4Valid, T5Valid (bit 205), T6Valid (bit 206), T7Valid (bit 207): These bits show which texels are valid texels as a function of the filter type and the map type (1D or 2D) and will limit the addresses checked in the primary cache and hence any texture reads ultimately done.
- Bits 208-211, T0BorderColor, T1BorderColor, T2BorderColor, T3BorderColor: These bits show which texels are to use the border color instead of texel data. These are only taken into account for valid combinations of indices (see previous field).
- Bits 212 onward, f1map: This field holds the map level the texels (T4-T7) are on.
- Bits 1-70: These bits carry the normal data present in an ActiveStepX or ActiveStepYDomEdge message.
- Bits 71-80, A0 (also called cacheLine0): This field identifies the cache line (bits 2-9) T0 is in and the byte position in the word (bits 0-1).
- Bits 81-90, A1 (also called cacheLine1): This field identifies the cache line (bits 2-9) T1 is in and the byte position in the word (bits 0-1).
- Bits 91-100, A2 (also called cacheLine2): This field identifies the cache line (bits 2-9) T2 is in and the byte position in the word (bits 0-1).
- Bits 101-110, A3 (also called cacheLine3): This field identifies the cache line (bits 2-9) T3 is in and the byte position in the word (bits 0-1).
- Bits 111-120, A4 (also called cacheLine4): This field identifies the cache line (bits 2-9) T4 is in and the byte position in the word (bits 0-1).
- Bits 121-130, A5 (also called cacheLine5): This field identifies the cache line (bits 2-9) T5 is in and the byte position in the word (bits 0-1).
- Bits 131-140, A6 (also called cacheLine6): This field identifies the cache line (bits 2-9) T6 is in and the byte position in the word (bits 0-1).
- Bits up to 150, cacheLine7: This field identifies the cache line T7 is in and the byte position in the word (bits 0-1).
- Bits 200-203, T4BorderColor, T5BorderColor, T6BorderColor, T7BorderColor: T4BorderColor-T7BorderColor are also used when in combined cache mode to select between the register files for each texel.
- Bits 204-206, texelReadCount0: This field tells the Dispatch sub unit how many texel reads this step needs from Tx Data 0 FIFO and prevents the message being forwarded on if insufficient data has been loaded into the cache from this FIFO and Tx Data 1 FIFO. This is used internally and not passed on to the next unit.
- the Primary Cache Manager is the interface point for the message stream and is responsible for the loading, readback and context switching of all the programmable registers in this unit.
- The registers are not loaded immediately when a message is received, as outstanding work queued up in the many FIFOs may depend on the current register values. Before the register is loaded all sub units must be idle (as indicated by all the FIFOs being empty).
- the goal of this sub unit is to process a step message in a single cycle when all the required texels are in the primary cache or when there is one miss from each bank of the cache. If one bank gets two or more misses then an extra cycle can be taken to process each miss that results in a new texel read. A read may clear multiple misses so these extra misses don't cost any extra cycles.
- the remainder of the sub units can only process one read at a time so if several successive steps cause two misses (one from each bank) the primary cache manager will eventually stall when the AG 0 and AG 1 FIFOs become full. This is not expected to be a frequent occurrence, except maybe at the start of a new primitive. Multiple cache line loads (in the Texture Filter Unit) will happen sequentially, but the expedited loading mechanism may allow these to be hidden under earlier step (or other) messages, providing the memory latency is less than the number of queued items in the M FIFO.
- the main component in the Primary Cache Manager is the Cache Directory (one per bank). Block diagrams of this will be given as a significant number of gates are involved in these parts. Note these diagrams only show the major data paths and omit clocks, etc.
- The overall block diagram is shown in FIG. 11.
- the cache directory block diagram is shown in FIG. 12 . Note the complementary key outputs are only used to reduce the cost of the comparators in the CAM cells.
- the CAM Cell block diagram is seen in FIG. 13 .
- the cache directory can only ever report a maximum of one match per given key.
- The Address Generator is presented with one or two texels (via the AG 0 and AG 1 FIFOs) which need to be read. It processes the read requests serially, starting with filter 0 (if present), and calculates the address of the memory word(s) containing the 2×2 patch of texel data the read texel is in.
- The secondary cache is checked to see if the memory address has already been read. If not, the address, a logical/physical flag and the filter number are passed over to the Address Mapper, and control information is inserted into the T FIFO to load the secondary cache line with the new texel data and to dispatch the texel data to the Filter Unit.
- If the texture map layout is Linear or Patch64 then two or four reads will be necessary to build up the 2×2 patch of texel data the Texture Filter Unit is expecting.
- the secondary cache is 4 entries deep and the cache line length matches the memory width so is 128 bits.
- the cache is direct mapped so the search and replacement policies are very simple.
- the cache is mainly intended to help when the layout is Linear or Patch64, but is also useful for bitmask operations (i.e. with spans) and 8 bit indexed texture maps.
- the cache can hold a logical or a physical address so a flag identifies the address type to prevent unwanted aliasing from occurring.
- the cache line is formed from the least significant bit of j and the filter bank for all cases except bitmasks (i.e. span operations). For span operations the mapping is to take 2 bits out of the i index (adjusted for the texel size) on the assumption that the j index will normally be zero.
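The direct mapped lookup described above for the non-span case can be sketched in C. The structure layout and names are hypothetical, and which of the two inputs supplies the high bit of the line index is an assumption; the text only states that the index is formed from the least significant bit of j and the filter bank, and that a logical/physical flag is kept to avoid aliasing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECONDARY_LINES 4u

typedef struct {
    bool     valid;
    bool     logical;   /* address type, treated as part of the tag */
    uint32_t address;   /* address of the 128 bit memory word */
} SecondaryLine;

typedef struct {
    SecondaryLine line[SECONDARY_LINES];
} SecondaryCache;

/* Line index for non-span reads: low bit of j plus the filter bank.
 * Placing the filter bank in the high bit is an assumption. */
unsigned secondary_index(unsigned j, unsigned filter_bank)
{
    assert(filter_bank < 2u);
    return (filter_bank << 1) | (j & 1u);
}

/* Returns true on a hit; on a miss the line is replaced (direct mapped,
 * so there is no choice of victim). */
bool secondary_lookup(SecondaryCache *c, unsigned index,
                      uint32_t address, bool logical)
{
    assert(index < SECONDARY_LINES);
    SecondaryLine *l = &c->line[index];
    if (l->valid && l->address == address && l->logical == logical) {
        return true;
    }
    l->valid = true;
    l->address = address;
    l->logical = logical;
    return false;
}
```

Comparing the address type together with the address means a physical address and a logical address with the same numeric value never hit on each other's line.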
- The Address Mapper's main job is to map logical addresses to physical addresses. Physical addresses pass straight through with no further processing.
- Physical addresses are passed to the Memory Controller via two FIFOs.
- There is one FIFO per filter bank (the filter bank an address corresponds to is passed in the AM FIFO along with the address and logical flag).
- the two FIFOs keep the addresses from one texture map separate from the addresses from the other texture map. For dual textures (unlike mip maps) it is not possible to ensure they are allocated into different banks of memory, hence they may try and share the same page detector in the Memory Controller. If the two texture map addresses are interleaved then we could get the sequence: page break, read texel from map 0 , page break, read texel from map 1 , etc. This high ratio of page breaks is very detrimental to achieving good memory performance.
- the Memory Controller is able to group reads from one texture map together, thereby amortising the page break costs over more texel reads.
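The cost of interleaving reads from two texture maps can be made concrete with a small C model. This is only a sketch: the 2 KByte DRAM page size, the single page detector model and all names are assumptions, and real memory controllers are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRAM_PAGE_SHIFT 11u   /* assumption: 2 KByte DRAM pages */

/* Count how often consecutive reads land in a different DRAM page,
 * modelling a single page detector. The first access opens a page and
 * is counted as a break. */
unsigned count_page_breaks(const uint32_t *addr, size_t n)
{
    unsigned breaks = 0;
    uint32_t open_page = 0;
    for (size_t k = 0; k < n; k++) {
        uint32_t page = addr[k] >> DRAM_PAGE_SHIFT;
        if (k == 0 || page != open_page) {
            breaks++;
            open_page = page;
        }
    }
    return breaks;
}

/* Build n reads per map, either alternating map 0 / map 1 or with all
 * map 0 reads first. `out` must hold 2 * n entries. */
void build_reads(uint32_t base0, uint32_t base1, size_t n,
                 int interleaved, uint32_t *out)
{
    assert(out != NULL);
    for (size_t k = 0; k < n; k++) {
        uint32_t a0 = base0 + (uint32_t)(16u * k);  /* 128 bit words */
        uint32_t a1 = base1 + (uint32_t)(16u * k);
        if (interleaved) {
            out[2 * k] = a0;
            out[2 * k + 1] = a1;
        } else {
            out[k] = a0;
            out[n + k] = a1;
        }
    }
}
```

With eight reads per map, the interleaved order produces a page break on every one of the 16 reads in this model, while the grouped order produces only two.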
- mapping the logical page to a physical page is done in the TLB sub unit and for the majority of mapping requests the TLB will hold the corresponding physical page so after merging the physical page and low order bits of the logical address the physical address is passed to the Memory Controller.
- the memory is read (via a separate 64 bit port) to look up the logical page entry in the Logical Texture Page Table. If the page is resident the physical address is formed, passed to the Memory Controller and the TLB given the logical page and its physical mapping to insert as the most recently accessed page.
- If the Download Controller is not currently downloading this logical page, the pciTextureDownloadRequest bit is set, which will inform the Texture DMA Controller (in Gamma for RX, or internal to P3) that a transfer is needed. (There may be a race condition here where the Address Mapper fails to notice the page just downloaded is the one it wants and requests it again. This is a safe thing to do, but will waste a small amount of bandwidth.) The Download Controller will clear pciTextureDownloadRequest at the start of the transfer of this page.
- The Address Mapper asserts TextureDownloadRequest to the Download Controller, waits for the texture to be downloaded (as indicated by TextureDownloadComplete being asserted), and then re-reads the Logical Texture Page Table.
- the physical address is now formed, passed to the Memory Controller and the TLB given the logical page and its physical mapping to insert as the most recently accessed page.
- This sub unit stalls until the texture page has been downloaded and the Logical Texture Page Table updated. See the Download Controller for a description of the interface signals between the two sub units.
- Communication with the TLB is shown via FIFOs for simplicity and to allow a second source (the Memory Allocator) to invalidate entries in the TLB. (This may happen asynchronously because, in an RX system, a texture download may be initiated by another RX.)
- TLB Translation Look Aside Buffer
- the TLB responds to two command streams (serviced in round robin order):
- the TLB holds 16 entries for P3 and 64 entries for RX.
- the block diagram of the TLB is seen in FIG. 14 .
- the block diagram of an individual CAM cell is shown in FIG. 15 .
- An alternative arrangement is to hold the physical page as an extension to the register already holding the logical page and use the match signal from a CAM cell to gate the physical page into an or-array. This will be faster, but the storage of the physical page information will be less efficient than in a register file.
- the TLB can only ever report a maximum of one match for a given logical page
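A software analogue of the TLB behaviour described above can be sketched in C. The 16 entry size is the P3 figure from the text; the linear search (standing in for the CAM), the timestamp based choice of which slot to reuse, and all names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16u   /* the P3 size quoted in the text */

typedef struct {
    bool     valid;
    uint32_t logical_page;
    uint32_t physical_page;
    uint64_t last_used;   /* 0 for unused slots, otherwise a timestamp */
} TlbEntry;

typedef struct {
    TlbEntry entry[TLB_ENTRIES];
    uint64_t clock;
} Tlb;

/* Look up a logical page; a hit refreshes its recency. */
bool tlb_lookup(Tlb *tlb, uint32_t logical_page, uint32_t *physical_page)
{
    for (unsigned e = 0; e < TLB_ENTRIES; e++) {
        TlbEntry *t = &tlb->entry[e];
        if (t->valid && t->logical_page == logical_page) {
            t->last_used = ++tlb->clock;
            *physical_page = t->physical_page;
            return true;
        }
    }
    return false;
}

/* Insert a mapping as the most recent one, reusing the slot with the
 * oldest timestamp (unused slots have timestamp 0, so they go first).
 * The page must not already be present, which preserves the
 * "at most one match" property. */
void tlb_insert(Tlb *tlb, uint32_t logical_page, uint32_t physical_page)
{
    unsigned victim = 0;
    for (unsigned e = 0; e < TLB_ENTRIES; e++) {
        TlbEntry *t = &tlb->entry[e];
        assert(!(t->valid && t->logical_page == logical_page));
        if (t->last_used < tlb->entry[victim].last_used) {
            victim = e;
        }
    }
    tlb->entry[victim] =
        (TlbEntry){ true, logical_page, physical_page, ++tlb->clock };
}

/* Drop the mapping for a logical page, if present. */
void tlb_invalidate(Tlb *tlb, uint32_t logical_page)
{
    for (unsigned e = 0; e < TLB_ENTRIES; e++) {
        TlbEntry *t = &tlb->entry[e];
        if (t->valid && t->logical_page == logical_page) {
            t->valid = false;
            t->last_used = 0;   /* makes the slot the first to be reused */
        }
    }
}
```

The invalidate entry point corresponds to the request the Memory Allocator makes when a physical page is re-assigned to a different logical page.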
- the Memory Allocator responds to two command streams (serviced in round robin order):
- the Download Controller waits for the Texture Input FIFO to go not empty and then reads the first word to find out about the texture which is just about to be received. It asks the Memory Allocator, via the MAC FIFO for a suitable physical page and once it has received this (via the MAD FIFO) it will copy the texture data into the memory. If the logical page number of the texture matches up with the one the Address Mapper was waiting for (shown by the TextureDownloadRequest and pciLogicalTexturePage) the Address Mapper is notified it can continue by the TextureDownloadComplete signal and TextureDownloadRequest is cleared.
- The Download Controller moves 128 bits of data at a time so the download bandwidth can cope with AGP 4× systems (the download bandwidth will be greater than 1 GByte per second).
- This sub unit interacts with the Address Mapper via the following signals:
- pciTextureDownloadRequest (1 bit): This is asserted by the Address Mapper when it hits a page fault and needs a texture page downloaded and that page is not currently being downloaded (the download was instigated by another RX). This is cleared by the Download Controller. This signal tells the Texture Download Controller (in Gamma for RX or internal to P3) a download is needed.
- pciLogicalTexturePage (16 bits): This is set by the Address Mapper to show what logical page it is requesting.
- TextureDownloadRequest (1 bit): This is asserted by the Address Mapper when it hits a page fault and needs a texture page downloaded. This is cleared by the Download Controller when this page has been downloaded and the Logical Texture Page Table updated.
- TextureDownloadInProgress (1 bit): This is asserted by the Download Controller and is used to validate the DownloadLogicalPage value. The Address Mapper uses this to check if the download it wants is currently being done.
- DownloadLogicalPage (16 bits): This is set by the Download Controller to identify the logical page it is in the process of downloading.
- TextureDownloadComplete (1 bit): This is asserted by the Download Controller when it has finished downloading the texture the Address Mapper is waiting on.
- the Dispatcher holds the data part of the secondary cache and forwards texel data to the primary cache (in the Filter Unit).
- Texel data is allowed to flow through whenever it arrives from the Memory Controller, but under control from commands received via the T FIFO.
- a count of the texel data loaded for each filter bank i.e. texture map
- this delay should not be invoked very often.
- the Dispatcher also handles span processing. This involves zero extending the texel data to a 64 bit bitmask, byte swapping, mirroring and inverting when necessary and finally anding the pixel mask in the span step message.
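The span processing steps listed above can be sketched in C. Byte swapping is left out here, and reading "mirroring" as a reversal of the bit order across the 64 bit mask is an assumption; the function names and the `texel_bits` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reverse the bit order of a 64 bit value (assumed meaning of "mirror"). */
uint64_t mirror64(uint64_t v)
{
    uint64_t r = 0;
    for (int b = 0; b < 64; b++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Build the pixel mask for a span step.
 * texel_bits: number of valid low bits in texel_data (1..64). */
uint64_t span_mask(uint64_t texel_data, unsigned texel_bits,
                   bool mirror, bool invert, uint64_t pixel_mask)
{
    assert(texel_bits >= 1u && texel_bits <= 64u);
    /* Zero extend: keep only the valid low bits. */
    uint64_t m = (texel_bits == 64u)
               ? texel_data
               : (texel_data & ((UINT64_C(1) << texel_bits) - 1u));
    if (mirror) m = mirror64(m);
    if (invert) m = ~m;
    /* Finally AND with the pixel mask carried in the span step message. */
    return m & pixel_mask;
}
```

The final AND means the bitmap can only remove pixels from the span, never add pixels that the incoming pixel mask had already excluded.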
- The Texture Read Unit interfaces with a Texture DMA Controller to actually get the data.
- This DMA Controller is in Gamma for an RX based system, or in P3 for a P3 system.
- the P3 Texture DMA Controller just handles a single request at a time.
- the Gamma based Texture DMA Controller is monitoring multiple RXs and broadcasts the texture data to all RXs and not just the requesting one.
- the following hardware signals are used to communicate between the Texture Read Unit and the Texture DMA Controller (each RX will provide its own pair of signals and a mechanism to allow the texture data to be broadcast to all RXs simultaneously):
- Bits 0-8, Length: Transfer length in multiples of 128 bit words, maximum being 256.
- Bits 9-10, Memory Pool: Identifies which memory pool the physical page is to be allocated from.
- Bit 11, HostVirtualAddress: This bit, when set, indicates the address is a host virtual address so the data cannot be read directly without software intervention. The TextureDownload interrupt is generated, if enabled.
- This data (and bits 12 - 31 ) are returned back to the Texture Read Unit in bits 32 - 64 of the first entry written to the Texture Input FIFO (the FIFO is 128 bits wide) as a header preceding the actual texture data.
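The TextureOperation bit layout given above (length in bits 0 to 8, memory pool in bits 9 and 10, host virtual flag in bit 11) can be packed and unpacked as in the following C sketch. The struct and function names are hypothetical and the remaining bits of the word are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned length;        /* transfer length in 128 bit words, up to 256 */
    unsigned memory_pool;   /* pool the physical page is allocated from */
    bool     host_virtual;  /* address needs software intervention */
} TextureOperation;

uint32_t texture_operation_pack(TextureOperation op)
{
    assert(op.length <= 256u && op.memory_pool <= 3u);
    return (op.length & 0x1FFu)
         | ((uint32_t)op.memory_pool << 9)
         | ((uint32_t)(op.host_virtual ? 1u : 0u) << 11);
}

TextureOperation texture_operation_unpack(uint32_t word)
{
    TextureOperation op;
    op.length       = word & 0x1FFu;            /* bits 0-8  */
    op.memory_pool  = (word >> 9) & 0x3u;       /* bits 9-10 */
    op.host_virtual = ((word >> 11) & 1u) != 0; /* bit 11    */
    return op;
}

/* Bytes moved by a transfer: each unit is one 128 bit (16 byte) word. */
unsigned texture_operation_bytes(TextureOperation op)
{
    return op.length * 16u;
}
```

A maximum length transfer of 256 words therefore moves 4096 bytes.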
- Gamma broadcasts the LogicalTextureAddress and TextureOperation words to the Texture Input FIFO before the actual texture data.
- the Texture Read Unit on seeing this information will remove any TextureDownloadRequest this transfer will satisfy and allocate space in its texture working set for the new texture page.
- the three PCI registers need to be offset from their base address based on the RX number.
- The TextureAddr PCI register is loaded with the virtual address and the TextureOperation PCI register is loaded with the TextureOperation data read from the Texture Read Unit before the interrupt is generated.
- the host services the interrupt, reads these two registers and provides the data. When the data is available in memory the physical address where the data is located is written in to the TextureAddr PCI register. This will wake up the texture download DMA controller and it will do the download.
- the P3 DMA controllers would not work behind the initial version of the Gamma (geometry processor from 3Dlabs), due to PCI bugs in Gamma. All is not lost as the texture management can still be done, but now the driver (or interrupt service routine) needs to do more work.
- The Texture DMA controller is placed in SlaveTextureDownload mode (controlled by a bit in a PCI register). This will allow the host to take over some of the DMA Controller's functions.
- Each logical texture page is marked as being a Virtual Host Page.
- an interrupt will be generated and the host does the following actions:
- the FIFO is 128 bits wide and the data is first buffered in a register until the 4th word is written at which time all 128 bits are written into the FIFO.
- the FIFO space is measured in 128 bit words.
- TextureDMAController (void) {
  // These three registers can also be read and written by the host across
  // the PCI bus.
  uint32 regHostTextureAddr, regLogicalTexturePage, regTextureOperation;
  uint128 fifoData;
  uint9 length;
  forever {
    if (pciTextureDownloadRequest is asserted) {
      // Get the texture request info from the Texture Read Unit.
- Texture reads are controlled by the TextureReadMode0 and TextureReadMode1 messages. These have identical fields (although some fields are ignored in TextureReadMode1). Not all combinations of modes across both registers are supported, and where there is a clash the modes in TextureReadMode0 take priority. For per pixel mip mapping the TextureReadMode0 and TextureReadMode1 registers should be set up the same, as should the TextureMapWidth0 and TextureMapWidth1 registers.
- Texture3D This is only used when Texture3D is enabled and then is only used for cache management purposes and not for address calculations.
- Note field bit is ignored in TextureReadMode1.
- Bits 9-10, TexelSize: This field holds the size of the texels in the texture map.
- The CombinedCache mode bit should not be set when 3D textures are being used.
- Bit 12, CombineCache: This bit, when set, causes the two banks of the Primary Cache to be joined together, thereby increasing the size of a single texture map which can be efficiently handled.
- Bits 13-16, MapBaseLevel: This field defines which TextureBaseAddr register should be used to hold the address for map level 0 when mip mapping or the texture map when not mip mapping. Successive map levels are at increasing TextureBaseAddr registers up to (and including) the MapMaxLevel (next field). 3D textures always use TextureBaseAddr0.
- Bits 17-20, MapMaxLevel: This field defines the maximum TextureBaseAddr register this texture should use when mip mapping. Any attempt to use beyond this level will clamp to this level.
- Bit 21, LogicalTexture: This bit, when set, defines this texture or all mip map levels, if mip mapping, to be logically mapped so undergo logical to physical translation of the texture addresses.
- Origin selects where the origin is for a texture map with a Linear or Patch64 layout.
- Bit 25, when set, causes adjacent bytes to be swapped, bit 26 adjacent 16 bit words to be swapped and bit 27 adjacent 32 bit words to be swapped.
- This byte swaps the input (ABCDEFGH) as follows: 0 ABCDEFGH; 1 BADCFEHG; 2 CDABGHEF; 3 DCBAHGFE; 4 EFGHABCD; 5 FEHGBADC; 6 GHEFCDAB; 7 HGFEDCBA.
- Bit 28, Mirror: This bit, when set, will mirror any bitmap data. This only works for spans.
- Bit 29, Invert: This bit, when set, will invert any bitmap data. This only works for spans.
- Bit 30, OpaqueSpan: This bit, when set, will cause the SpanColorMask to be modified rather than the pixel mask in SpanStepX or SpanStepYDom messages.
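The swap modes can be modelled in C by treating the 3 bit mode as three independent swaps applied in turn to an 8 byte input. The function names are hypothetical, and the composition order is an assumption (the three swaps commute, so the order does not change the result). Under this model mode 3, which combines the byte and 16 bit word swaps, yields DCBAHGFE.

```c
#include <assert.h>
#include <string.h>

/* Swap adjacent groups of `group` bytes (1, 2 or 4) in an 8 byte buffer. */
static void swap_groups(char buf[8], unsigned group)
{
    for (unsigned base = 0; base < 8u; base += 2u * group) {
        char tmp[4];
        memcpy(tmp, buf + base, group);
        memcpy(buf + base, buf + base + group, group);
        memcpy(buf + base + group, tmp, group);
    }
}

/* Apply the 3 bit swap mode to 8 bytes in place. */
void byte_swap(char buf[8], unsigned mode)
{
    assert(mode < 8u);
    if (mode & 1u) swap_groups(buf, 1u);  /* adjacent bytes */
    if (mode & 2u) swap_groups(buf, 2u);  /* adjacent 16 bit words */
    if (mode & 4u) swap_groups(buf, 4u);  /* adjacent 32 bit words */
}
```

Mode 7 applies all three swaps and reverses the input completely, giving HGFEDCBA.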
- the TextureCacheReplacementMode register controls the replacement policy in the primary cache. It has the following fields:
- The red component shows the number of texture 0 cache line misses.
- the green component shows the number of texture 1 cache line misses.
- the blue component holds the number of cycles * 8 the fragment was delayed waiting for texel data.
- the alpha component holds the number of cycles * 8 the primary cache was stalled waiting for a free cache line.
- FIG. 1 shows a computer incorporating an embodiment of the innovative graphics innovations in a video display adapter 445 .
- the complete computer system includes in this example: user input device (e.g. keyboard 435 and mouse 440 ); at least one microprocessor 425 which is operatively connected to receive inputs from the input devices, across e.g. a system bus 431 , through an interface manager chip 430 which provides an interface to the various ports and registers; the microprocessor interfaces to the system bus through perhaps a bridge controller 427 ; a memory (e.g. flash or non-volatile memory 455 , RAM 460 , and BIOS 453 ), which is accessible by the microprocessor; a data output device (e.g. display 450 and video display adapter card 445 ) which is connected to output data generated by the microprocessor 425 ; and a mass storage disk drive 470 which is read-write accessible, through an interface unit 465 , by the microprocessor 425 .
- the computer may also include a CD-ROM drive 480 and floppy disk drive (“FDD”) 475 which may interface to the disk interface controller 465 .
- L2 cache 485 may be added to speed data access from the disk drives to the microprocessor 425.
- PCMCIA 490 slot accommodates peripheral enhancements.
- the computer may also accommodate an audio system for multimedia capability comprising a sound card 476 and a speaker(s) 477 .
- FIG. 16 shows a sample configuration where two rasterizers are served by a common memory manager and bus interface chip.
- both chips have a PCI bus connection to the CPUs as well as an arbitrated connection to memory, but of course many other configurations are also possible.
- the virtual texture innovations disclosed in the present application can optionally be used in combination with at least some of the previous virtual texture schemes discussed above.
Abstract
Description
- 1. What does the driver/application do when it runs out of memory and needs to fit another texture in? Which texture(s) does it delete?
- 2. The texture has to be completely resident and physically contiguous so a large enough space must be made available.
- 3. A texture which is about to be used MUST NOT be deleted or moved: otherwise all command buffers will be outdated.
- 4. In some cases a texture map will not fit into memory even when all other textures are deleted (a 2K×2K 32 bpp texture map takes 16 MBytes of memory).
- 5. The texture heap must be compacted to reclaim storage.
-
- the P3™ graphics core itself;
- a PCI/AGP interface;
- DMA controllers for PCI/AGP interface to the graphics core and memory;
- SGRAM/SDRAM, to which the chip has read-write access through its frame buffer (FB) and local buffer (LB) ports;
- a RAMDAC, which provides analog color values in accordance with the color values read out from the SGRAM/SDRAM; and
- a video stream interface for output and display connectivity.
-
- The logical address (really just the page part) is looked up in the Translation Look aside Buffer (TLB) and if present the corresponding physical address is issued to the Memory Controller.
- The address translation may fail in the TLB so the page table in memory is accessed and if the page is resident the physical address is looked up, the TLB updated and the physical address is issued to the Memory Controller.
- The page may not be resident in the working set so the page is read from host memory (or the host asked for it via an interrupt) and when it has been loaded the newly updated page table is read, the TLB updated and the physical address is issued to the Memory Controller. The page may be marked as a host texture in which case the address mapping is done, but the texture is not downloaded.
-
- One nearest or linear filtered texture using both halves of the cache to achieve higher cache hit rates on larger texture maps or polygons.
- Any two independent nearest or linear filtered textures, one per half of the cache.
- One automatically (or per pixel) mip mapped texture (always texture 0) using both halves of the cache to store alternate levels of the mip map.
- One 3D texture map using both halves of the cache to store alternate slices of the 3D volume.
- Two independent mip mapped textures where the minification filters only use texels from one level at a time (i.e. the filters are NearestMipNearest or LinearMipNearest). Each texture uses half the cache.
-
- Linear. Here the rows are stored one after another in memory. This is typically used for small texture maps (less than 32×32 at 32 bpp, which fit into one page) that are always accessed along a row. This matches up with most 2D use of texture maps for font, icon and stipple pattern storage. Video data will also fall into this category.
- Patch64. In this layout the pixel data is arranged into 64×16 patches for 32 bpp, 128×16 for 16 bpp and 256×16 for 8 bpp. This is the preferred layout for the color buffer (desktop) so will only be used when the texture units need to operate on this data directly, for example to stretch blit a window.
-
- Patch32_2. The texture data is arranged into 32×32 patches, but also patched to a finer level so that one read always returns a 2×2 block of texel data (for 32 bit texels), a 2×4 block for 16 bit texels or a 2×8 block for 8 bit texels.
- Patch2. The texture data is arranged into 2×2 patches. This is used for texture maps where the total number of texels is less than 1K so it all fits into a page.
-
- The texture maps are stored with the top left corner as the origin, i.e. texels at increasing u and/or v coordinates are at increasing memory addresses.
- The texture map must start on the natural patch boundary for the texel size. For 8 bit texels this is on a 4 byte boundary, for 16 bit texels this is on a 8 byte boundary and for 32 bit texels this is on a 16 byte boundary.
-
The Patch32_2 layout only makes sense when the width of the texture map is greater than the patch width (128 bytes). Using Patch32_2 on texture maps which are less than 128 bytes wide will just fragment the texture map within the patch. This clearly wastes storage and may increase the number of page breaks. When the Texture Read Unit detects that the width of a texture map is less than or equal to 128 bytes it will change the layout from Patch32_2 to Patch2 automatically. This allows mip maps to be Patch32_2 for the high resolution levels and Patch2 for the low resolution levels.
It is the software's responsibility to set the layout to Patch32_2 or Patch2 as appropriate when the texture map is downloaded. The hardware will write the texel data into the correct place but not switch layouts automatically.
b0 | b1 | b2 |
b3 | t0 | b4 |
b5 | b6 | b7 |
b0 | b1 | b2 | b2 | ||
b0 | b1 | b2 | b2 | ||
b3 | t0 | b4 | b4 | ||
b5 | b6 | b7 | b7 | ||
offset into base registers = min(texture map level + map base level, max map level)
so the allocation of the base registers between the two possible textures is up to software.
-
- bottom left origin: −j*width+i
- top left origin: j*width+i.
(i % 64) +                // i within a patch
(i / 64) * 1024 +         // i between patches
(j % 16) * 64 +           // j within a patch
(j / 16) * width * 16     // j between patches
This can be converted into a simpler calculation just using shifts and adds:
(i & 0x3f) + ((i & 0xffc0) << 4) + ((j & 0xf) << 6) + ((j & 0xfff0) * width).
For bottom left origin the equation is:
(i & 0x3f) + ((i & 0xffc0) << 4) − ((j & 0xf) << 6) − ((j & 0xfff0) * width)
(i % 2) +                 // i within a patch
(i / 2) * 4 +             // i between patches
(j % 2) * 2 +             // j within a patch
(j / 2) * width * 2       // j between patches
This can be converted into a simpler calculation just using shifts and adds (only top left origin is supported):
(i & 0x1) + ((i & 0xfffe) << 1) + ((j & 0x1) << 1) + ((j & 0xfffe) * width)
i′ = i >> 1
j′ = j >> 1
((i′ % 16) +                  // i within a 32 × 32 patch
 (i′ / 16) * 256 +            // i between 32 × 32 patches
 (j′ % 16) * 16 +             // j within a 32 × 32 patch
 (j′ / 16) * width * 8) * 4   // j between 32 × 32 patches; the * 4 converts from 2 × 2 patches to texels
Add in the offset within the 2×2 sub patch:
(i % 2) +                     // i within a patch
(j % 2) * 2                   // j within a patch
This can be converted into a simpler calculation just using shifts and adds (only top left origin is supported):
(((i′ & 0xf) + ((i′ & 0xfff0) << 4) + ((j′ & 0xf) << 4) + (((j′ & 0xfff0) * width) >> 1)) << 2) + (i & 0x1) + ((j & 0x1) << 1).
texelOffset+=k*TextureMapSize.
Note that the TextureMapSize does not have to be width×height, but can be larger, if necessary.
8 bpp: byteOffset = texelOffset * 1
16 bpp: byteOffset = texelOffset * 2
32 bpp: byteOffset = texelOffset * 4
64 bpp: byteOffset = texelOffset * 8
8 bpp: byteAddr = baseAddr + byteOffset
16 bpp: byteAddr = (baseAddr & ~0x1) + byteOffset
32 bpp: byteAddr = (baseAddr & ~0x3) + byteOffset
64 bpp: byteAddr = (baseAddr & ~0x7) + byteOffset
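The offset calculations for the patched layouts can be cross-checked with a short Python sketch. The function names are illustrative only and do not appear in the hardware description. The sketch assumes a top left origin, a width that is a multiple of the patch width, and coordinates that fit in 16 bits (as the masks imply). The Patch64 shift form is written with all terms added so that it matches the modulo/divide form.

```python
def patch64_offset(i, j, width):
    """Texel offset for the Patch64 layout (64x16 patches), modulo/divide form."""
    return (i % 64) + (i // 64) * 1024 + (j % 16) * 64 + (j // 16) * width * 16


def patch64_offset_shifts(i, j, width):
    """Shift-and-add form of patch64_offset."""
    return (i & 0x3F) + ((i & 0xFFC0) << 4) + ((j & 0xF) << 6) + ((j & 0xFFF0) * width)


def patch2_offset(i, j, width):
    """Texel offset for the Patch2 layout (2x2 patches), modulo/divide form."""
    return (i % 2) + (i // 2) * 4 + (j % 2) * 2 + (j // 2) * width * 2


def patch2_offset_shifts(i, j, width):
    """Shift-and-add form of patch2_offset."""
    return (i & 0x1) + ((i & 0xFFFE) << 1) + ((j & 0x1) << 1) + ((j & 0xFFFE) * width)


def patch32_2_offset(i, j, width):
    """Texel offset for the Patch32_2 layout (32x32 patches of 2x2 sub patches)."""
    ii, jj = i >> 1, j >> 1
    block = (ii % 16) + (ii // 16) * 256 + (jj % 16) * 16 + (jj // 16) * width * 8
    return block * 4 + (i % 2) + (j % 2) * 2


def patch32_2_offset_shifts(i, j, width):
    """Shift-and-add form of patch32_2_offset."""
    ii, jj = i >> 1, j >> 1
    block = (ii & 0xF) + ((ii & 0xFFF0) << 4) + ((jj & 0xF) << 4) + (((jj & 0xFFF0) * width) >> 1)
    return (block << 2) + (i & 0x1) + ((j & 0x1) << 1)


def byte_address(base_addr, texel_offset, bytes_per_texel):
    """Byte address of a texel; bytes_per_texel is 1, 2, 4 or 8 (8 to 64 bpp)."""
    byte_offset = texel_offset * bytes_per_texel
    # The base address is aligned down to the texel size before the offset is added.
    return (base_addr & ~(bytes_per_texel - 1)) + byte_offset
```

For a 128 texel wide map, each layout function maps the texel coordinates onto a distinct offset, so no two texels share a storage location.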
-
- The performance should be independent of the traversal direction, especially for “large” texture maps (i.e. >32×32). Storing the texture map in a linear fashion gives very good access times in the u direction but poor access times in the v direction due to the page organization of DRAMS. Storing the texture maps in a patch form (32×32 in our case for 32 bit texels) equalizes the access times.
- The memory width is very much wider than the texel width so each memory read returns multiple texels. If the texel data in a memory word are all for the same row then all the data is used when traversing in u (along a row) but very little is used in the v direction (along a column). The 2×2 patch organization ensures that at least 2 texels can be used from each memory read for all traversal directions.
Even rows: | F(0), F(0), F(0), F(0), F(0), F(0), F(0), etc. | ||
Odd rows: | F(1), F(0), F(1), F(0), F(1), F(0), F(1), etc. | ||
The next lower resolution map:
Row 0: | F(0), F(0), F(0), F(0), F(0), F(0), F(0), etc. | ||
Row 1: | F(0), F(0), F(0), F(0), F(0), F(0), F(0), etc. | ||
Row 2: | F(0), F(0), F(0), F(0), F(0), F(0), F(0), etc. | ||
Row 3: | F(1), F(0), F(0), F(0), F(1), F(0), F(0), etc. | ||
Combining these together for the rows where there are accesses from both levels give:
-
- F(2), F(0), F(1), F(0), F(2), F(0), F(1), etc.
- (1) The expedited texels cannot overwrite texels which may be referenced by step messages which are queued up in the M FIFO until the original texel data has been used. This should be a rare occurrence and only happen when the number of texels used on a scanline is approximately the same as the texture cache can hold.
- (2) Memory latency or just the amount of data required for a step may mean the step reaches the Dispatcher before all the data has been loaded into the cache so the step message must be delayed.
-
- We could keep incrementing from the oldest entry looking for the first entry we can replace. This is very simple but takes several cycles, and we are very likely to bump texels that one of the following step messages would like to use.
- Change the cache policy to be LRU (or something else). Unfortunately this adds significantly to the cost of the cache so isn't really an option.
- Start looking for an unused entry at some offset from the current position, say at half the cache's size from where we are now. If this fails then linearly search until an entry is found (which is always guaranteed as the M FIFO is draining so freeing up cache lines as it goes). This is a good compromise as it doesn't destroy the scanline coherency of the following steps (but may well do so for steps further into the future), should just cost a single cycle in most cases and in the limit is fail safe in that it will wait for the FIFO to drain.
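The third option can be sketched in Python as follows. This is a software illustration under stated assumptions, not the hardware implementation: `in_use` is a hypothetical list of flags, one per cache line, that is true while a queued step message may still reference the line, and the function returns `None` where the hardware would instead wait for the M FIFO to drain.

```python
def find_replacement_line(in_use, current):
    """Return the index of a cache line that may be replaced, or None.

    The search starts half a cache away from the current position and then
    proceeds linearly, wrapping around, until an unused line is found.
    """
    size = len(in_use)
    start = (current + size // 2) % size
    for step in range(size):
        candidate = (start + step) % size
        if not in_use[candidate]:
            return candidate
    return None  # every line is still referenced; the caller retries later
```

When the line half a cache away is free, the search ends on the first probe, which corresponds to the single cycle case described above.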
-
- The texture layout in memory is Linear or Patch64. In these two cases the texture must first be converted to 2×2 patch format before it is loaded into the primary cache. The secondary cache holds the data while this reformatting or aligning is being done. It also allows some re-use of data as the two memory reads needed to build up the 2×2 patch may be able to be used on the next 2×2 patch.
- The texture map is an 8 bit indexed texture map. These are converted into 32 bit textures to be stored in the Filter Unit. The next primary cache load may well use 8 bit texels from the secondary cache rather than having read data from memory.
- The texture data is going to be used for span processing. Span processing does not use the primary cache so the secondary cache is its only way of reducing the memory bandwidth needed.
-
- The need for swapping and the usage are decoupled in time by the DMA buffers.
- The memory granularity is controlled by the texture map size so is continually changing.
- Memory gets fragmented.
- There is no clear replacement policy.
There are a number of solutions to this problem:
- Increase the amount of physical memory to hold texture maps. This is not always possible due to cost or board area constraints and in any case just delays the point at which the problem will re-occur, rather than fixing it altogether.
- Allow textures to be executed out of host memory via the AGP or PCI bus. This is a similar solution to the previous one, except it doesn't have the cost or board area constraints (at least as far as the graphics board is concerned). The downside of this is the bandwidth across the AGP bus is likely to be inferior to the bandwidth out of local memory. Also the latency for the texture data to arrive may degrade texture performance. This method is supported by setting the HostTexture bit in the TextureMapWidth registers. These texture reads will be done across the AGP bus. The PCI bus can be used, but because it lacks the efficient random in-page addressing that AGP has, the texture accesses will be very slow. Note that there may be system reasons why such a method will not work or will work poorly. A system with a GLINT Gamma cannot do this type of access (across AGP) and multiple RXs would require too much bandwidth and not interleave accesses very well.
- The final solution is to treat the texture addresses as logical or virtual addresses. The logical part allows texture maps to be stored in non-contiguous physical pages (a page is 4K bytes). This simplifies the memory management aspect as the granularity now is at the page level. The virtual part allows the dynamic paging of textures out of host or system memory with or without any assistance from the host CPU. This is done on demand so borrows many of the techniques used for CPU memory management. The virtual texture management (of which the logical addressing is a necessary sub-set) is implemented as standard in this unit and will now be described in detail.
-
- The texel has its logical byte address calculated from its integer coordinates, base address of the texture, texture map width, etc.
- The logical page the logical address resides in is calculated and the Translation Look aside Buffer (TLB) checked to see if the physical page assigned to the logical page is present. If it is the physical address is formed from the physical page number and the low order bits of the logical address. Note the physical page is relative to the start of the working set and not physical memory. The physical address is then posted to the memory controller.
- If the logical page is not present in the TLB then the Logical Page Table entry for this logical page is read. If the resident bit is set then the logical page is present in the working set and its physical page is read from the Logical Page Table. The TLB is updated so the next time this logical page is accessed the physical page is to hand. The physical address is formed from the physical page number and the low order bits of the logical address and then posted to the memory controller.
- If the logical page is not resident in the working set then details about the page (its host address, target memory pool, etc.) are made available to the host or DMA controller. (The DMA controller is in Gamma for RXs or is integrated into P3.) Sometime later the working set has been updated with the new page of texture data and the Logical Page Table updated to show the faulting logical page is now resident and its physical address. The TLB is updated so the next time this logical page is accessed the physical page is to hand. The physical address is formed from the physical page number and the low order bits of the logical address and then posted to the memory controller.
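The translation sequence above can be illustrated with a small Python model. It is a sketch under assumptions rather than a description of the hardware: the TLB is modelled as an unbounded dictionary, the Logical Page Table as a dictionary of entries, and `load_page` is a hypothetical callback standing in for the host or DMA download that returns the physical page assigned to the faulting logical page.

```python
PAGE_SHIFT = 12  # a page is 4K bytes


class Translator:
    def __init__(self, page_table, load_page):
        self.page_table = page_table  # logical page -> {"resident", "physical_page"}
        self.load_page = load_page    # hypothetical download callback
        self.tlb = {}                 # logical page -> physical page

    def translate(self, logical_addr):
        """Return the physical address, relative to the start of the working set."""
        page = logical_addr >> PAGE_SHIFT
        offset = logical_addr & ((1 << PAGE_SHIFT) - 1)
        if page not in self.tlb:
            # TLB miss: read the Logical Page Table entry.
            entry = self.page_table[page]
            if not entry["resident"]:
                # Page fault: download the page, then mark it resident.
                entry["physical_page"] = self.load_page(page)
                entry["resident"] = True
            self.tlb[page] = entry["physical_page"]
        # Physical page number plus the low order bits of the logical address.
        return (self.tlb[page] << PAGE_SHIFT) | offset
```

A second access to the same logical page is served from the TLB and does not invoke the download callback again.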
Managed Memory (pages / MBytes) | Table Size
256 / 1 | 1 KBytes
512 / 2 | 2 KBytes
1024 / 4 | 4 KBytes
2048 / 8 | 8 KBytes
4096 / 16 | 16 KBytes
8192 / 32 | 32 KBytes
-
- The column/row/bank structure of the memory devices results in the memory being divided up into pages (not to be confused with logical or physical pages previously discussed). (Some alignments and layouts are more efficient than others.) Access times within a DRAM page are much faster than out of page accesses. SDRAM and SGRAM have multiple banks so can have multiple open pages. When mip mapping or when two independent texture maps are being used it is advantageous if the texture maps (or adjacent levels) are in different banks. (If two or more mip map levels fit into the same DRAM page then this is not necessary.) Placing the two levels or maps in the same bank, but different pages, can cause a page break for each texel access, which is guaranteed to hurt performance.
- The position of other buffers which are being simultaneously accessed is another important consideration and texture map placement should avoid these banks whenever possible.
-
- TextureDownloadRequest. This signal is asserted by RX to request a texture download. It is de-asserted once the texture download has started.
- TextureFIFOFull. This signal is asserted by RX when it is not able to accept any more data being written into the TextureInput FIFO.
-
- HostTextureAddress. This register holds the host address where the texture resides. This is either a physical address or a virtual address. A bit in the TextureOperation register identifies the type of address. If the address is a virtual address then an interrupt is generated and the host will read the address and initiate the DMA once the data has been made available.
- LogicalTexturePage. This register holds the logical page for the texture data and is returned back to the RXs in the two word header preceding the actual texture data. In a multi-RX system all the RXs take the texture download and not just the RX which requested it.
- TextureOperation. This register holds the transfer length (=1024 words) in the bottom 11 bits and a bit to say if the host texture address is a physical or virtual address (bit 11). If the address type is virtual then the TextureDownload interrupt is generated, if enabled.
Bit No | Name | Description
0-15 | Physical Page | These bits hold the physical page number relative to the start of the working set where this logical page is held. If the page is not resident (next field) then these bits are ignored (but will frequently be set to zero). This field is normally maintained by RX, except when the page is marked as a HostTexture.
16 | Resident | This bit, when set, marks this logical page as resident in the working set. This field is normally maintained by RX, except when the page is marked as a HostTexture.
17 | Host Texture | This bit, when set, marks this logical page as resident in the host memory and it should be accessed using AGP texture execute mode rather than downloading it. The Length field should also be set to zero.
18-31 | Reserved | This field is not used but is set to zero whenever the Resident bit is updated.
32-40 | Length | This field holds the number of 128 bit words to transfer when a page fault occurs. This allows a page to hold a texture map smaller than 4K without spending the extra download time on the unused words. There is no way to download to the unused portion without overwriting the used part. When the physical page is in host memory the length field must be set to zero. This field is maintained by the host.
41-42 | Memory Pool | This field holds the memory pool this logical page should be allocated out of. This field is maintained by the host.
43 | Virtual Host Page | This bit, when set, indicates the HostPage (next field) is a virtual page in host memory so cannot be accessed directly. Setting this bit will generate an interrupt and involve the host in providing this page of texture data. When this bit is 0 the HostPage is the physical page and will be read directly with no host intervention. This field is maintained by the host.
44-63 | Host Page | This field holds the page in host memory where the texture data is held. This is a virtual host page or a physical host page as indicated by the VirtualHostPage bit (previous field). This field is maintained by the host.
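As an illustration of the 64 bit Logical Page Table entry layout, the following Python sketch packs and unpacks the fields listed in the table. The dictionary keys and function names are hypothetical; only the bit positions and widths are taken from the table.

```python
# field name -> (least significant bit, width in bits)
FIELDS = {
    "physical_page": (0, 16),
    "resident": (16, 1),
    "host_texture": (17, 1),
    "length": (32, 9),
    "memory_pool": (41, 2),
    "virtual_host_page": (43, 1),
    "host_page": (44, 20),
}


def pack_entry(**values):
    """Build a 64 bit entry from named field values; unnamed fields are zero."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack_entry(word):
    """Split a 64 bit entry into a dictionary of field values."""
    return {
        name: (word >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }
```

For example, `pack_entry(length=0x100, memory_pool=2, host_page=0x12345)` describes a full 4K page (256 words of 128 bits) to be allocated from pool 2, with the Resident bit left clear.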
-
- Memory Allocator to mark a logical page as non resident when its allocated physical page is reclaimed and assigned to another logical address.
- The Download Controller to update the resident bit and physical page field once the download is complete.
Bit No | Name | Description
0-15 | Logical Page | These bits hold the logical page number this physical page has been assigned to. If no assignment has been made (or it has been removed) then the valid bit (next field) will be zero and these bits are ignored (but will frequently be set to zero).
16 | Valid | This bit, when set, marks this logical page as resident in the working set. This field is normally maintained by RX.
17-31 | Reserved | This field is not used but is set to zero whenever the Resident bit is updated.
32-47 | Next Page | This field holds the page number of the next page in the pool - i.e. the next recently used page.
48-63 | Previous Page | This field holds the page number of the previous page in the pool - i.e. the previous recently used page.
-
- They hold the color buffers.
- They hold the Z, stencil, etc. buffer.
- They hold the overlay buffers.
- They hold the video overlay buffers.
- They hold non logical textures, icons, fonts, bitmaps, etc.
- They hold the Logical Page Table.
- They hold the Physical Page Allocation Table.
- Run length encoded window ID information.
- They hold logical textures which have been locked down.
-
- Space for the Logical Texture Page Table must be reserved in the local buffer and the table initialized to zero. The LogicalTexturePageAddr and LogicalTexturePageTableLength must be set up.
- Space for the working set must be reserved in the local buffer and/or framebuffer.
-
- Space for the Physical Page Allocation Table is reserved in the local buffer and PhysicalPageAllocationTableAddr register is set up to point to it.
-
- Bits 0 . . . 31 of each entry in the Physical Page Allocation Table are set to zero to clear the valid bit.
- Each page entry in the Physical Page Allocation Table is associated to one of the four pools based on which bank of memory it resides in. All the pages in a pool are linked together as a double linked list by setting the NextPage and PrevPage fields. The order is unimportant, but sequential is simplest. (It will soon get scrambled once the memory allocation has been running for a while.) The PrevPage field for the first entry in the double linked list and the NextPage field for the last entry can be set to any value as they are not used. Finally the HeadPhysicalPageAllocation and TailPhysicalPageAllocation registers for this memory pool are updated with first and last page numbers. Each memory pool is set up like this. (Any number of memory pools up to a maximum of four can be set up. Unused memory pools don't have any pages linked to them and must not be referenced in the Logical Texture Page Table.)
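The pool initialization can be sketched as follows. This is an illustrative Python model, not driver code: the table is modelled as a dictionary keyed by physical page number, the entry keys are hypothetical names for the fields described above, and the returned pair stands in for the values written to the HeadPhysicalPageAllocation and TailPhysicalPageAllocation registers.

```python
def init_pool(table, pages):
    """Link a non-empty list of physical page numbers into one pool.

    Returns (head, tail), the first and last page numbers of the pool.
    """
    for k, page in enumerate(pages):
        table[page] = {
            "logical_page": 0,
            "valid": 0,  # bits 0..31 are cleared, so no logical page is assigned
            # The last NextPage and first PrevPage are never used; zero is arbitrary.
            "next_page": pages[k + 1] if k + 1 < len(pages) else 0,
            "prev_page": pages[k - 1] if k > 0 else 0,
        }
    return pages[0], pages[-1]
```

Calling the function once per memory pool, with the pages that reside in the corresponding bank, sets up each double linked list in sequential order.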
-
- Host memory to hold the texture map is allocated and locked down. (Virtual host memory could be used, however the driver will need to respond to every page fault and make the textures available in locked physical memory before starting the DMA off to download them. Other than the extra run time overhead and setting the VirtualHostPage flag in the Logical Texture Page Table entries the rest of the operations are the same.) This memory is private to the driver or ICD and not accessible to the application. The pages do not need to be contiguous.
- The logical pages to use for the texture map are allocated from the Logical Texture Page Table. These may be new pages or currently assigned. If they are currently assigned then the texture management hardware will do any necessary housekeeping to prevent aliasing of physical pages to the same logical page (performance is degraded, but operation is still correct).
- The host physical page (or host virtual page when host virtual addressing is used) of each page reserved for the texture is found and the HostPage field in each corresponding entry in the Logical Texture Page Table is updated with it.
- The memory pool this texture is to be stored in is determined and each logical entry has its MemoryPool field set appropriately. (This, in general, is likely to be a difficult thing to determine as the usage of the texture maps is not available. Ideally texture maps which will be used simultaneously should be in different pools, unless they can both fit into the same 4K page.)
- The Length field for each logical entry will normally be set to 0x100 (i.e. 4096 bytes), however as an optimization if only part of the 4K page is used (must be the lower part) then the number of 128 bit words used can be used instead.
- The application's texture is copied into the previously allocated host memory and during the copy the texture map is patched and aligned as required by the setting the texture map will be invoked with. (It is impossible to do any patching or aligning on the fly as the page of texture is downloaded as the download mechanism has no knowledge of the dimensions of the texture map, its base address, layout or texel size.)
Bit No | Name | Description
0-15 | Page | This field sets the first Logical Page to touch.
16-29 | Count | This field holds the number of pages to touch.
30-31 | Mode | This field is set to 3 to touch a page(s) or to 1 to load a page(s).
As each page is touched the corresponding texture data is downloaded.
Bit No | Name | Description
0-15 | Page | This field sets the first Logical Page to mark as stale.
16-29 | Count | This field holds the number of pages to mark as stale.
30-31 | Mode | This field is set to 0 to mark the pages as stale (i.e. non resident). The primary texture cache is invalidated (using the InvalidateCache command) to ensure it doesn't hold any stale texel data for the texture map just edited.
-
- Allocate the physical memory and update the Logical Texture Page Table with the logical to physical mappings. The physical page for each corresponding logical page is stored in bits 0 . . . 15 and the resident bit (bit 16) is set. The second word in each entry will never be used as this is only accessed on a page fault.
- The Logical Texture Page Table can be modified directly via the bypass (with the normal caveats on syncing first) or can be updated via the command stream. The DownloadAddress register and DownloadData commands (see FB Write Unit for details) can be used to update an arbitrary region of memory so can be used to update the logical entries in the Logical Texture Page Table. (The UpdateLogicalPageInfo command cannot be used as it zeros the physical page field and updates the fields concerned with page faults. Also this command does housekeeping work on the Physical Page Allocation Table, which presumably will not have been set up if the virtual texture management is not being used.)
- The texture map must be downloaded into the physical pages. This can be done via the bypass mechanisms or through the command stream. In either case it is the software's responsibility to do any patching and alignment consistent with how the texture map will be used. Note the texture download mechanism which can do the patching doesn't have any method of remapping the addresses so cannot work with non contiguous physical memory. The DownloadAddress register and DownloadData commands can be used to download each page of texture (pre-patched, if necessary) into its corresponding physical page.
-
- Host memory to hold the texture map is allocated and locked down. (Virtual host memory could be used, however the driver will need to respond to every page fault and make the textures available in locked physical memory before starting the DMA off to download them. As these are AGP textures the length field (in the Logical Page Table) is zero so no download actually occurs, however it is convenient to use the same synchronisation methods in the hardware implementation. Other than the extra run time overhead and setting the VirtualHostPage flag in the Logical Texture Page Table entries the rest of the operations are the same.) This memory is private to the driver or ICD and not accessible to the application. The pages do not need to be contiguous.
- The logical pages to use for the texture map are allocated from the Logical Texture Page Table. These may be new pages or currently assigned. If they are currently assigned then the TLB should be invalidated to prevent it from holding stale addresses.
- Each logical page has its physical page, resident and host texture fields in the Logical Page Table updated with the corresponding host physical page where the texture is located. The length field must be set to zero (to disable a download from occurring). The pool field and the hostPage field are not used (but are available to software to hold information about this page).
- The application's texture is copied into the previously allocated host memory and during the copy the texture map is patched and aligned as required by the setting the texture map will be invoked with.
-
- The texel is zero extended up to 64 bits.
- The texel is byte swapped according to TextureReadMode0.ByteSwap field. If the 64 bit word has bytes labelled: ABCDEFGH then the three bits swap the bytes as follows:
long swap | short swap | byte swap | swapped
0 | 0 | 0 | ABCDEFGH
0 | 0 | 1 | BADCFEHG
0 | 1 | 0 | CDABGHEF
0 | 1 | 1 | DCBAHGFE
1 | 0 | 0 | EFGHABCD
1 | 0 | 1 | FEHGBADC
1 | 1 | 0 | GHEFCDAB
1 | 1 | 1 | HGFEDCBA
-
- Next the texel is optionally mirrored. This is controlled by the TextureReadMode0.Mirror bit. The mirror swaps bits:
- (0, 63), (1, 62), (2, 61), . . . (31, 32).
- The texel is next optionally inverted under control of the TextureReadMode0.Invert bit.
- When TextureReadMode0.OpaqueSpan is zero the texel is ANDed with the pixel mask to remove pixels from the mask. When TextureReadMode0.OpaqueSpan is one the texel is ANDed with the color mask (in the SpanColorMask message) to control foreground/background color selection.
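The byte swap, mirror and invert steps can be modelled in a few lines of Python. This is a sketch for illustration only: the function names are hypothetical, the byte swap operates on a string of eight byte labels to avoid any assumption about which label is the most significant byte, and the mirror and invert steps operate on a 64 bit integer.

```python
MASK64 = (1 << 64) - 1


def swap_bytes(labels, mode):
    """Apply the 3 bit ByteSwap field to eight byte labels such as 'ABCDEFGH'.

    Bit 0 of mode swaps adjacent bytes, bit 1 adjacent 16 bit words and
    bit 2 adjacent 32 bit words.
    """
    out = list(labels)
    for bit in (1, 2, 4):
        if mode & bit:
            # XOR of the index exchanges neighbours at the given granularity.
            out = [out[k ^ bit] for k in range(8)]
    return "".join(out)


def mirror64(texel):
    """Swap bits (0, 63), (1, 62), ... (31, 32) of a 64 bit value."""
    return int(format(texel & MASK64, "064b")[::-1], 2)


def invert64(texel):
    """Invert every bit of a 64 bit value."""
    return ~texel & MASK64
```

With these helpers, `swap_bytes("ABCDEFGH", 7)` applies all three swaps and reverses the byte order completely.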
b0 | b1 | b2 |
b3 | t0 | b4 |
b5 | b6 | b7 |
b0 | b1 | b2 | b2 | ||
b0 | b1 | b2 | b3 | ||
b3 | t0 | b2 | b4 | ||
b5 | b6 | b7 | b7 | ||
-
- Memory Allocator Port
- Address Mapper Port
- Texture Write Port
- Texture Read Port
Bit No. | Name | Width | Description
0-1 | | 2 | Indicates what the target memory is. The options are: 0 = , 1 = , 2 = PCI
2-29 | | 28 | The read address of the 128 bits of memory data.
The following information is passed back from the Memory Controller in a FIFO:
Bit No. | Name | Width | Description
0-127 | | 128 | The data read from the memory.
Bit No. | Name | Width | Description
0-1 | | 2 | Indicates what the target memory is. The options are: 0 = , 1 = , 2 = PCI
2-29 | | 28 | The write address of the 128 bits of memory data.
30-45 | ByteEnables | 16 | A high on a bit enables that byte to be written. The least significant byte enable corresponds to data bits 0-7.
46-173 | | 128 | The data to be written to the memory.
Bit No. | | Width | Description
0 | | 1 | This signal is asserted by the memory controller when the FIFO is empty and all writes from this port, the Memory Allocator Port and the Address Mapper Port have been written to memory so can be read from another port.
Bit No. | Name | Width | Description |
0-1 | | 2 | Indicates what the target memory is. The options are: 0 = , 1 = , 2 = |
2 | | 1 | 0 = Write, 1 = Read |
3-31 | Addr | 29 | The write address of the 64 bits of memory data. |
32-39 | ByteEnables | 8 | A high on a bit enables that byte to be written. The least significant byte enable corresponds to data bits 0-7. |
40-103 | | 64 | The data to be written to the memory. |
Bit No. | Name | Width | Description |
0 | | 64 | The data read from memory |
Bit No. | Name | Width | Description |
0-1 | | 2 | Indicates what the target memory is. The options are: 0 = , 1 = , 2 = |
2 | | 1 | 0 = Write, 1 = Read |
3-31 | Addr | 29 | The write address of the 64 bits of memory data. |
32-39 | ByteEnables | 8 | A high on a bit enables that byte to be written. The least significant byte enable corresponds to data bits 0-7. |
40-103 | | 64 | The data to be written to the memory. |
The following signals are passed from the Memory Controller (MC):
Bit No. | Name | Width | Description |
0 | | 64 | The data read from memory |
Bit No | Name | Description |
0-95 | — | These bits carry the normal data present in an ActiveStepX, ActiveStepYDomEdge, SpanStepX or SpanStepYDomEdge message. |
96-107 | f0i0 | This field holds i0 index for mip maps or even slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
108-119 | f0i1 | This field holds i1 index for mip maps or even slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
120-131 | f0j0 | This field holds j0 index for mip maps or even slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
132-143 | f0j1 | This field holds j1 index for maps or even slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
144 | T0Valid | These bits show which texels are valid texels as a function of the filter type and the map type (1D or 2D) and will limit the addresses checked in the primary cache and hence any texture reads ultimately done. |
145 | T1Valid | |
146 | | |
147 | T3Valid | |
148 | T0BorderColor | These bits show which texels are to use the border color instead of texel data. These are only taken into account for valid combinations of indices (see previous field). |
149 | T1BorderColor | |
150 | T2BorderColor | |
151 | T3BorderColor | |
152-155 | f0map | This field holds the map level the texels (T0 . . . T3) are on. |
156-167 | f1i0 | This field holds i0 index for maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
168-179 | f1i1 | This field holds i1 index for maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
180-191 | f1j0 | This field holds j0 index for maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
192-203 | f1j1 | This field holds j1 index for maps or odd slices for 3D textures. The least significant bit of the computed index is not needed so the original 12 bit number has been reduced to 11 bits. |
204 | T4Valid | These bits show which texels are valid texels as a function of the filter type and the map type (1D or 2D) and will limit the addresses checked in the primary cache and hence any texture reads ultimately done. |
205 | T5Valid | |
206 | | |
207 | T7Valid | |
208 | T4BorderColor | These bits show which texels are to use the border color instead of texel data. These are only taken into account for valid combinations of indices (see previous field). |
209 | T5BorderColor | |
210 | T6BorderColor | |
211 | T7BorderColor | |
212-215 | f1map | This field holds the map level the texels (T4 . . . T7) are on. |
216-224 | I0 | Interpolation coefficient between (T0, T1) and (T2, T3) in 1.8 unsigned fixed point format. |
225-233 | I1 | Interpolation coefficient between (T0, T2) and (T1, T3) in 1.8 unsigned fixed point format. |
234-242 | I2 | Interpolation coefficient between (T4, T5) and (T6, T7) in 1.8 unsigned fixed point format. |
243-251 | I3 | Interpolation coefficient between (T4, T6) and (T5, T7) in 1.8 unsigned fixed point format. |
252-260 | I4 | Interpolation coefficient between (T0, T1, T2, T3) and (T4, T5, T6, T7) in 1.8 unsigned fixed point format. |
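One way the five interpolation coefficients might be applied is sketched below in C for a single 8 bit channel. The 1.8 unsigned fixed point format is read here as a 9 bit weight in which 256 represents 1.0. The function names, the truncating rounding and the arrangement of T0 to T3 as a 2 by 2 footprint are assumptions made for illustration, not details taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Linear interpolation with a 1.8 unsigned fixed point weight
 * (0..256, where 256 means 1.0). Truncates rather than rounds. */
uint32_t lerp18(uint32_t a, uint32_t b, uint32_t w)
{
    assert(w <= 256);
    return (a * (256 - w) + b * w) >> 8;
}

/* Bilinear filter over four texels. 'across' blends (T0,T2) with
 * (T1,T3), i.e. I1 or I3; 'between' blends (T0,T1) with (T2,T3),
 * i.e. I0 or I2. */
uint32_t bilinear18(const uint32_t t[4], uint32_t between, uint32_t across)
{
    uint32_t first = lerp18(t[0], t[1], across);
    uint32_t second = lerp18(t[2], t[3], across);
    return lerp18(first, second, between);
}

/* Trilinear filter: I4 blends the (T0..T3) result with (T4..T7). */
uint32_t trilinear18(const uint32_t t[8], uint32_t i0, uint32_t i1,
                     uint32_t i2, uint32_t i3, uint32_t i4)
{
    return lerp18(bilinear18(t, i0, i1), bilinear18(t + 4, i2, i3), i4);
}
```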
The active step messages are extended to carry the extra information. The following table describes the format of these messages:
Bit No | Name | Description |
1-70 | — | These bits carry the normal data present in an ActiveStepX, ActiveStepYDomEdge message. |
71-80 | A0 also called cacheLine0 | This field identifies the cache line (bits 2-9) T0 is in and the byte position in the word (bits 0-1). |
81-90 | A1 also called cacheLine1 | This field identifies the cache line (bits 2-9) T1 is in and the byte position in the word (bits 0-1). |
91-100 | A2 also called cacheLine2 | This field identifies the cache line (bits 2-9) T2 is in and the byte position in the word (bits 0-1). |
101-110 | A3 also called cacheLine3 | This field identifies the cache line (bits 2-9) T3 is in and the byte position in the word (bits 0-1). |
111-120 | A4 also called cacheLine4 | This field identifies the cache line (bits 2-9) T4 is in and the byte position in the word (bits 0-1). |
121-130 | A5 also called cacheLine5 | This field identifies the cache line (bits 2-9) T5 is in and the byte position in the word (bits 0-1). |
131-140 | A6 also called cacheLine6 | This field identifies the cache line (bits 2-9) T6 is in and the byte position in the word (bits 0-1). |
141-150 | A7 also called cacheLine7 | This field identifies the cache line (bits 2-9) T7 is in and the byte position in the word (bits 0-1). |
151-159 | I0 | Interpolation coefficient between (T0, T1) and (T2, T3) in 1.8 unsigned fixed point format. |
160-168 | I1 | Interpolation coefficient between (T0, T2) and (T1, T3) in 1.8 unsigned fixed point format. |
169-177 | I2 | Interpolation coefficient between (T4, T5) and (T6, T7) in 1.8 unsigned fixed point format. |
178-186 | I3 | Interpolation coefficient between (T4, T6) and (T5, T7) in 1.8 unsigned fixed point format. |
187-195 | I4 | Interpolation coefficient between (T0, T1, T2, T3) and (T4, T5, T6, T7) in 1.8 unsigned fixed point format. |
196 | T0BorderColor | These bits select which texels are to use the border color registers (one per bank) instead of the texel from the register file. T4BorderColor-T7BorderColor are also used when in combined cache mode to select between the register files for each texel. |
197 | T1BorderColor | |
198 | T2BorderColor | |
199 | T3BorderColor | |
200 | T4BorderColor | |
201 | T5BorderColor | |
202 | | |
203 | T7BorderColor | |
204-206 | texelReadCount0 | This field tells the Dispatch sub unit how many texel reads this step needs from and prevents the message being forwarded on if insufficient data has been loaded into the cache from this FIFO and . This is used internally and not passed on to the next unit. |
207-209 | texelReadCount1 | This field tells the Dispatch sub unit how many texel reads this step needs from and prevents the message being forwarded on if insufficient data has been loaded into the cache from this FIFO and Tx Data0 FIFO. This is used internally and not passed on to the next unit. |
210 | texelNeeded0 | These bits (also called cacheLineValid) are set when the cacheLine0 to cacheLine7 fields hold valid values and qualify the search operation when checking if the replacement cacheLine is in use. These are used internally and not passed on to the next unit. |
211 | texelNeeded1 | |
212 | texelNeeded2 | |
213 | texelNeeded3 | |
214 | texelNeeded4 | |
215 | texelNeeded5 | |
216 | texelNeeded6 | |
217 | texelNeeded7 | |
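Each of the ten bit A0 to A7 (cacheLine) fields packs a cache line number and a byte position. A minimal C sketch of unpacking one such field is given below; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned line;     /* cache line number, field bits 2-9 */
    unsigned byte_pos; /* byte position in the word, field bits 0-1 */
} CacheAddr;

/* Split one 10-bit cacheLine field into its two components. */
CacheAddr decode_cache_addr(uint16_t field)
{
    CacheAddr a;
    assert(field < 1024); /* only 10 bits are defined */
    a.line = (field >> 2) & 0xFFu;
    a.byte_pos = field & 0x3u;
    return a;
}
```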
-
- The width of the texture map needs to be reduced as a function of the map level when mip mapping. This width is clamped (as a function of texel size) for the Patch32_2 and Patch2 layouts to conform to the layout rules.
- The base address for the texture map is taken from one of the TextureBaseAddr registers as a function of map level, map base level and map max level values held in the corresponding (to the filter) TextureReadMode register.
- The Patch32_2 layout will be changed to Patch2 layout when the texture map width falls below 128 bytes.
- Three-D textures have the slice offset (held in TextureMapSize register) factored in to the address calculation.
- The borders are added in (if present) separately to the width calculation so they don't get divided out due to mip mapping.
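The width handling in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the width halves at each map level with a floor of one texel, the border is a fixed number of texels on each side, and the layout-specific clamping is omitted because its exact rule is not given here. The function names are hypothetical.

```c
#include <assert.h>

/* Width in texels at a given mip level. The border is added after the
 * shift so that it is not divided out by mip mapping. */
unsigned mip_width_texels(unsigned base_width, unsigned level,
                          unsigned border_per_side)
{
    assert(level < 32);
    unsigned w = base_width >> level;
    if (w == 0)
        w = 1; /* assumed floor of one texel */
    return w + 2 * border_per_side;
}

/* Returns 1 when a Patch32_2 map should fall back to Patch2 layout,
 * i.e. when the map width is below 128 bytes. */
int falls_back_to_patch2(unsigned width_texels, unsigned texel_bytes)
{
    return width_texels * texel_bytes < 128;
}
```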
-
- The Memory Allocator will request a logical page be invalidated if it is present. This will be a comparatively rare operation as it will occur once per download. In theory the logical page which is being invalidated should not be in the TLB, as normally there are many more pages in the working set than TLB entries. Consequently the TLB holds the set of most recent pages while the page allocated is the least recently used one and they should not overlap. (It is possible to make them overlap by setting the working set to fewer pages than TLB entries or by doing many externally initiated texture downloads.)
- The Address Mapper checks if the logical to physical page mapping is already known before it takes the slower route of reading the Logical Texture Page Table. The TLB is fully associative and can provide the physical page (if present) in a single cycle (maybe pipelined). The update time can take longer if necessary as this will only occur after a Logical Texture Page Table read.
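The lookup, update and invalidate operations on the TLB can be modelled in software as below. This is a sketch only: the number of entries, the round-robin replacement choice and all names are assumptions, and the loop stands in for the parallel compare a fully associative hardware TLB would perform in a single cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16 /* assumed size; the text does not give one */

typedef struct {
    bool valid;
    uint16_t logical_page;
    uint16_t physical_page;
} TlbEntry;

typedef struct {
    TlbEntry entry[TLB_ENTRIES];
    unsigned next_victim; /* round-robin replacement (an assumption) */
} Tlb;

/* Returns true and fills *physical_page on a hit. */
bool tlb_lookup(const Tlb *tlb, uint16_t logical_page,
                uint16_t *physical_page)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const TlbEntry *e = &tlb->entry[i];
        if (e->valid && e->logical_page == logical_page) {
            *physical_page = e->physical_page;
            return true;
        }
    }
    return false;
}

/* Called after a miss, once the Logical Texture Page Table was read. */
void tlb_update(Tlb *tlb, uint16_t logical_page, uint16_t physical_page)
{
    TlbEntry *e = &tlb->entry[tlb->next_victim];
    e->valid = true;
    e->logical_page = logical_page;
    e->physical_page = physical_page;
    tlb->next_victim = (tlb->next_victim + 1) % TLB_ENTRIES;
}

/* Requested by the Memory Allocator when a logical page is evicted. */
void tlb_invalidate(Tlb *tlb, uint16_t logical_page)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        TlbEntry *e = &tlb->entry[i];
        if (e->valid && e->logical_page == logical_page)
            e->valid = false;
    }
}
```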
-
- The Download Controller asks for a physical page at the start of a new texture download. This is passed in the MAC FIFO and the tail page for the requested memory pool is allocated. The Physical Page Allocation Table is updated (via a private memory port) to move the tail page to the head of the pool. The previous logical page assigned to the allocated physical page is marked as non resident in the Logical Texture Page Table and invalidated in the TLB. The physical page is returned to the Download Controller via the MAD FIFO.
- The Address Mapper, when there is a TLB miss, will ask for the physical page the logical page is mapped to to become the most recently used page in its pool (i.e. it is moved to the head).
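The pool behaviour described above (allocate the tail page, move it to the head, and move a page to the head on a TLB miss) amounts to a least recently used list. The C sketch below models one pool as a doubly linked list held in arrays. The pool size, the NO_PAGE marker and all names are assumptions, and updating the Logical Texture Page Table and TLB for the evicted logical page is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_PAGES 4      /* assumed pool size for the example */
#define NO_PAGE 0xFFFFu   /* assumed "none" marker */

typedef struct {
    uint16_t prev[POOL_PAGES];
    uint16_t next[POOL_PAGES];
    uint16_t logical[POOL_PAGES]; /* logical page held by each page */
    uint16_t head;                /* most recently used */
    uint16_t tail;                /* least recently used */
} Pool;

void pool_init(Pool *p)
{
    for (uint16_t i = 0; i < POOL_PAGES; i++) {
        p->prev[i] = (i == 0) ? NO_PAGE : (uint16_t)(i - 1);
        p->next[i] = (i == POOL_PAGES - 1) ? NO_PAGE : (uint16_t)(i + 1);
        p->logical[i] = NO_PAGE;
    }
    p->head = 0;
    p->tail = POOL_PAGES - 1;
}

/* Make 'page' the most recently used page (Address Mapper request). */
void pool_move_to_head(Pool *p, uint16_t page)
{
    assert(page < POOL_PAGES);
    if (p->head == page)
        return;
    uint16_t pv = p->prev[page]; /* valid because page is not the head */
    uint16_t nx = p->next[page];
    p->next[pv] = nx;
    if (nx != NO_PAGE)
        p->prev[nx] = pv;
    else
        p->tail = pv;
    p->prev[page] = NO_PAGE;
    p->next[page] = p->head;
    p->prev[p->head] = page;
    p->head = page;
}

/* Allocate the tail page for a new logical page. The logical page it
 * previously held (or NO_PAGE) is returned via *evicted_logical. */
uint16_t pool_allocate(Pool *p, uint16_t new_logical,
                       uint16_t *evicted_logical)
{
    uint16_t page = p->tail;
    *evicted_logical = p->logical[page];
    p->logical[page] = new_logical;
    pool_move_to_head(p, page);
    return page;
}
```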
Name | Width | Description |
pciTextureDownloadRequest | 1 | This is asserted by the Address Mapper when it hits a page fault and needs a texture page downloaded and that page is not currently being downloaded (the download was instigated by another RX). This is cleared by the Download Controller. This signal tells the Texture Download Controller (in Gamma for RX or internal to P3) a download is needed. |
 | 16 | This is set by the Address Mapper to show what logical page it is requesting. |
TextureDownloadRequest | 1 | This is asserted by the Address Mapper when it hits a page fault and needs a texture page downloaded. This is cleared by the Download Controller when this page has been downloaded and the Logical Texture Page Table updated. This signal tells the Download Controller the pciLogicalTexturePage register holds a valid page number so it can inform the Address Mapper the download is complete (assuming the page matches). |
InProgress | 1 | This is asserted by the Download Controller and is used to validate the DownloadLogicalPage value. The Address Mapper uses this to check if the download it wants is currently being done. |
Page | 16 | This is set by the Download Controller to identify the logical page it is in the process of downloading. |
Complete | 1 | This is asserted by the Download Controller when it has finished downloading the texture the Address Mapper is waiting on. |
-
- pciTextureDownloadRequest. This signal is asserted by the Texture Read Unit to request a texture download. It is de-asserted once the texture download has started.
- TextureFIFOFull. This signal is asserted by the Texture Read Unit when it is not able to accept any more data being written into the TextureInput FIFO.
-
- HostTexturePage. This register holds the host page (in bits 0 . . . 19) where the texture resides. This is either a physical page or a virtual page. A bit in the TextureOperation register identifies the type of page. If the page is a virtual page then an interrupt is generated and the host will read the page and initiate the DMA once the data has been made available. The conversion from page to address is done by multiplying by 4096.
- LogicalTexturePage. This register holds the logical page for the texture data and is returned back to the Texture Read Unit in bits 0 . . . 15 of the first entry written to the Texture Input FIFO (the FIFO is 128 bits wide) as a header preceding the actual texture data. (All 32 bits of the register are returned in bits 0 . . . 31 to allow for future capabilities.) In a multi-RX system all the RXs take the texture download and not just the RX which requested it.
- TextureOperation. This register holds the following information:
Bit No. | Name | Description |
0-8 | Length | Transfer length in multiples of 128 bit words, maximum being 256. |
9-10 | Memory Pool | Identifies which memory pool the physical page is to be allocated from. |
11 | HostVirtualAddress | This bit, when set, indicates the address is a host virtual address so the data cannot be read directly without software intervention. The TextureDownload interrupt is generated, if enabled. |
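For illustration, the TextureOperation fields and the page to address conversion can be decoded in C as shown below. The struct and function names are hypothetical; the bit positions follow the table above and the page size of 4096 bytes follows the HostTexturePage description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned length;      /* bits 0-8: transfer length in 128-bit words */
    unsigned memory_pool; /* bits 9-10 */
    bool host_virtual;    /* bit 11 */
} TextureOperation;

TextureOperation decode_texture_operation(uint32_t reg)
{
    TextureOperation op;
    op.length = reg & 0x1FFu;
    op.memory_pool = (reg >> 9) & 0x3u;
    op.host_virtual = ((reg >> 11) & 0x1u) != 0;
    assert(op.length <= 256); /* stated maximum */
    return op;
}

/* Host page (bits 0..19) to byte address: multiply by 4096. */
uint32_t host_page_to_addr(uint32_t host_page)
{
    return (host_page & 0xFFFFFu) << 12;
}

/* Transfer size in bytes: each unit of Length is one 128-bit word. */
uint32_t transfer_bytes(const TextureOperation *op)
{
    return (uint32_t)op->length * 16u;
}
```

A full 4096 byte page corresponds to the maximum Length of 256.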
- 1. The host will service and clear this interrupt and read the regHostTextureAddr, regLogicalTexturePage and regTextureOperation registers.
- 3. The host will write the regLogicalTexturePage into the Texture Input FIFO.
- 4. The host will write the regTextureOperation into the Texture Input FIFO.
- 5. The host will write 0 into the Texture Input FIFO (to pad out to 128 bits).
- 6. The host will write 0 into the Texture Input FIFO (to pad out to 128 bits).
- 7. The host will download the texture data to the Texture Input FIFO using the length field in regTextureOperation to know how much data to download. The regHostTextureAddr register will indicate what texture page caused the page fault.
- 8. Wait until pciTextureDownloadRequest (visible via a PCI status register) is low. This will confirm that the data has been downloaded and prevents a possible race condition whereby a false new request is assumed before the old one has been removed.
- 9. The host will write to the regHostTextureAddr register (any data will do) and this will tell the Texture DMA Controller that all the texture data has been transferred.
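The data the host places in the Texture Input FIFO in steps 3 to 7 can be sketched as below, with the FIFO modelled as a plain array of 32 bit words. This only illustrates the ordering (logical page, operation word, two padding words, then texture data); the function name and the array model are assumptions, and the interrupt handling and the register write of step 9 are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill 'fifo' with the 128-bit header followed by the texture data.
 * Returns the number of 32-bit words written. */
size_t host_fill_texture_fifo(uint32_t logical_page,
                              uint32_t texture_operation,
                              const uint32_t *texture,
                              uint32_t *fifo, size_t fifo_capacity)
{
    /* Length field (bits 0-8) counts 128-bit words = 4 x 32-bit words. */
    size_t data_words = (size_t)(texture_operation & 0x1FFu) * 4u;
    size_t total = 4u + data_words;
    assert(total <= fifo_capacity);

    fifo[0] = logical_page;      /* step 3 */
    fifo[1] = texture_operation; /* step 4 */
    fifo[2] = 0;                 /* step 5: pad to 128 bits */
    fifo[3] = 0;                 /* step 6: pad to 128 bits */
    memcpy(&fifo[4], texture, data_words * sizeof(uint32_t)); /* step 7 */
    return total;
}
```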
void TextureDMAController (void)
{
  // These three registers can also be read and written by the host across
  // the PCI bus.
  uint32 regHostTextureAddr, regLogicalTexturePage, regTextureOperation;
  uint128 fifoData;
  uint9 length;
  forever
  {
    if (pciTextureDownloadRequest is asserted)
    {
      // Get the texture request info from the Texture Read Unit.
      regHostTextureAddr = pciHostTexturePage << 12;
      regLogicalTexturePage = pciLogicalTexturePage;
      regTextureOperation = pciTextureOperation;
      if (regTextureOperation.HostVirtualAddress)
      {
        // Host virtual address. Just raise an interrupt and wait for
        // the host to kick off the DMA.
        SetInterrupt (eTextureDownload);
        // The host responds, when it is ready, by writing to
        // regHostTextureAddr.
        while (no write to regHostTextureAddr)
          ; // wait
        // Now regHostTextureAddr holds the physical addr supplied by
        // the host.
      }
      // SlaveTextureDownload is a bit in a general PCI register.
      if (SlaveTextureDownload == 0)
      {
|
|
|
        WriteTextureFIFO (fifoData);
        // Wait for the texture request to be removed before sending
        // texture data.
        while (pciTextureDownloadRequest is asserted)
          ; // wait.
        // Transfer the data.
length = |
        while (length > 0 &&
               pciCommandMode.TextureDownloadEnable)
        {
|
|
4); |
|
8); |
|
12); |
          WriteTextureFIFO (fifoData);
          length--;
          regHostTextureAddr += 16; // byte address
        }
      }
    }
  }
}
void WriteTextureFIFO (uint128 data)
{
  Wait for room in the Texture Input FIFO;
  Write data into Texture Input FIFO;
}
uint32 ReadAddr (uint32 byteAddr)
{
  return 32 bits of data read from byteAddr;
}
void TextureDMAController (void)
{
  // These three registers can also be read and written by the host across
  // the PCI bus.
  uint32 regHostTextureAddr, regLogicalTexturePage, regTextureOperation;
  uint32 data;
  uint9 length;
  int3 i = 0;
  int kRXCount; // Holds the number of RXs in the system
  forever
  {
    if (pciTextureDownloadRequest[i] is asserted)
    {
      // Get the texture request info from the Texture Read Unit.
      regHostTextureAddr = ReadRxTextureInfo (i, 0) << 12;
      regLogicalTexturePage = ReadRxTextureInfo (i, 1);
      regTextureOperation = ReadRxTextureInfo (i, 2);
      if (regTextureOperation.HostVirtualAddress)
      {
        // Host virtual address. Just raise an interrupt and wait for
        // the host to kick off the DMA.
        SetInterrupt (eTextureDownload);
        // The host responds, when it is ready, by writing to
        // regHostTextureAddr.
        while (no write to regHostTextureAddr)
          ; // wait
        // Now regHostTextureAddr holds the physical addr supplied by
        // the host.
      }
|
|
|
      WriteTextureFIFO (fifoData);
      // Wait for the texture request to be removed before sending
      // texture data.
      while (pciTextureDownloadRequest[i] is asserted)
        ; // wait.
      // Transfer the data.
length = |
      while (length > 0 &&
             pciCommandMode.TextureDownloadEnable)
      {
        data = ReadAddr (regHostTextureAddr + 0);
        WriteTextureFIFO (data);
        data = ReadAddr (regHostTextureAddr + 4);
        WriteTextureFIFO (data);
        data = ReadAddr (regHostTextureAddr + 8);
        WriteTextureFIFO (data);
        data = ReadAddr (regHostTextureAddr + 12);
        WriteTextureFIFO (data);
        length--;
        regHostTextureAddr += 16; // byte address
      }
    }
    // Round robin to the next RX.
    i++;
    if (i == kRXCount)
      i = 0;
  }
}
uint32 ReadAddr (uint32 byteAddr)
{
  return 32 bits of data read from byteAddr;
}
// Reading the TextureFIFO returns the info (saves on address decode and
// registers). Note this register is overloaded onto the XXX register.
int32 ReadRxTextureInfo (int3 rxID, int2 reg)
{
  int32 addr, data;
  addr = pciRXTextureBase + rxID * 12 + reg * 4; // byte addr.
  data = PCI read on the secondary PCI bus to addr;
  return data;
}
void WriteTextureFIFO (int32 data)
{
  int3 i;
  int32 addr;
  for (i = 0; i < kRXCount; i++)
  {
    while (TextureInputFIFOFull[i] is asserted)
      ; // wait until it is no longer full.
  }
  // Increment the address to allow PCI burst writes.
  addr = pciRXTextureFIFOBase + textureDownloadOffset * 4;
  Write data to addr on the secondary PCI bus;
  textureDownloadOffset++; // wraps for modulo indexing
}
General Control
Bit No | Name | Description |
0 | Enable | When set causes any texels needed by the fragment, but not in the primary cache, to be read. This is also qualified by the TextureEnable bit in the PrepareToRender message. |
1-4 | Width | This field holds the width of the map as a power of two. The legal range of values for this field is 0 (map width = 1) to 11 (map width = 2048). This is only used when Texture3D is enabled and then is only used for cache management purposes and not for address calculations. Note this field is ignored in TextureReadMode1. |
5-8 | Height | This field holds the height of the map as a power of two. The legal range of values for this field is 0 (map height = 1) to 11 (map height = 2048). This is only used when Texture3D is enabled and then is only used for cache management purposes and not for address calculations. Note this field is ignored in TextureReadMode1. |
9-10 | TexelSize | This field holds the size of the texels in the texture map. The options are: 0 = 8 bits, 1 = 16 bits, 2 = 32 bits, 3 = 64 bits (only valid for spans). |
11 | Texture3D | This bit, when set, enables 3D texture index generation. Note this bit is ignored in TextureReadMode1. The CombinedCache mode bit should not be set when 3D textures are being used. |
12 | CombineCache | This bit, when set, causes the two banks of the Primary Cache to be joined together, thereby increasing the size of a single texture map which can be efficiently handled. Note this bit is ignored in TextureReadMode1. |
13-16 | MapBaseLevel | This field defines which TextureBaseAddr register should be used to hold the address for map level 0 when mip mapping or the texture map when not mip mapping. Successive map levels are at increasing TextureBaseAddr registers up to (and including) the MapMaxLevel (next field). 3D textures always use TextureBaseAddr0. |
17-20 | MapMaxLevel | This field defines the maximum TextureBaseAddr register this texture should use when mip mapping. Any attempt to use beyond this level will clamp to this level. |
21 | LogicalTexture | This bit, when set, defines this texture, or all mip map levels if mip mapping, to be logically mapped so that the texture addresses undergo logical to physical translation. |
22 | Origin | This field selects where the origin is for a texture map with a Linear or Patch64 layout. The options are: 0 = Top Left, 1 = Bottom Left. A Patch32_2 or Patch2 texture map is always bottom left origin. |
23-24 | TextureType | This field defines any special processing needed on the texel data before it can be used. The options are: 0 = Normal, 1 = Eight bit indexed texture, 2 = Sixteen bit YVYU texture in 422 format, 3 = Sixteen bit VYUY texture in 422 format. |
25-27 | ByteSwap | This field defines the byte swapping, if any, to be done on texel data when it is used as a bitmap. This is automatically done when spans are used. Bit 25 causes adjacent bytes to be swapped, bit 26 adjacent 16 bit words to be swapped and bit 27 adjacent 32 bit words to be swapped. In combination this byte swaps the input (ABCDEFGH) as follows: 0 = ; 1 = ; 2 = ; 3 = ; 4 = EFGHABCD; 5 = ; 6 = ; 7 = |
28 | Mirror | This bit, when set, will mirror any bitmap data. This only works for spans. |
29 | Invert | This bit, when set, will invert any bitmap data. This only works for spans. |
30 | OpaqueSpan | This bit, when set, will cause the SpanColorMask to be modified rather than the pixel mask in SpanStepX or SpanStepYDom messages. |
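The three ByteSwap bits can be modelled as three independent swap stages on a 64 bit value, as in the C sketch below. It assumes the field's lowest bit swaps adjacent bytes, the middle bit swaps adjacent 16 bit words and the highest bit swaps the two 32 bit words, and it writes the input ABCDEFGH with A as the most significant byte. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Apply the 3-bit ByteSwap mode to a 64-bit texel. */
uint64_t byte_swap64(uint64_t v, unsigned mode)
{
    assert(mode < 8);
    if (mode & 1u) /* swap adjacent bytes */
        v = ((v & 0x00FF00FF00FF00FFULL) << 8) |
            ((v >> 8) & 0x00FF00FF00FF00FFULL);
    if (mode & 2u) /* swap adjacent 16-bit words */
        v = ((v & 0x0000FFFF0000FFFFULL) << 16) |
            ((v >> 16) & 0x0000FFFF0000FFFFULL);
    if (mode & 4u) /* swap adjacent 32-bit words */
        v = (v << 32) | (v >> 32);
    return v;
}
```

With the ASCII codes of ABCDEFGH as input, mode 4 gives EFGHABCD and mode 7 gives HGFEDCBA, which agrees with the two table entries that are legible in the text.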
The TextureCacheReplacementMode register controls the replacement policy in the primary cache. It has the following fields:
Bit No | Name | Description |
0 | KeepOldest0 | This bit, when set, will keep the oldest texels on the scanline when the wrap and just re-use a set of scratch lines. |
1-5 | ScratchLines0 | This field holds the number of cache lines to use as scratch lines when the and the KeepOldest mode bit is set. The value in this field has a MIN_SCRATCH_SIZE value (currently 8) added to it so we can guarantee the scratch line size can always accommodate the cache lines the current fragment requires with some left over. Failure to make this provision would lead to deadlock. |
6 | KeepOldest1 | This bit, when set, will keep the oldest texels on the scanline when the wrap and just re-use a set of scratch lines. |
7-11 | ScratchLines1 | This field holds the number of cache lines to use as scratch lines when the and the KeepOldest mode bit is set. The value in this field has a MIN_SCRATCH_SIZE value (currently 8) added to it so we can guarantee the scratch line size can always accommodate the cache lines the current fragment requires with some left over. Failure to make this provision would lead to deadlock. |
12 | ShowCacheInfo | This bit, when set, will cause the fragment's color to be replaced by information relating to the cache's performance. The red component shows the number of The green component shows the number of The coding is as follows: 0x40 = 0 misses, 0x80 = 1 miss, 0xA0 = 2 misses, 0xC0 = 3 misses, 0xE0 = 4 misses. The blue component holds the number of cycles * 8 the fragment was delayed waiting for texel data. The alpha component holds the number of cycles * 8 the primary cache was stalled waiting for a free cache line. |
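Two details of this register lend themselves to a short C illustration: the effective scratch line count (the field value plus MIN_SCRATCH_SIZE) and the coding of the miss count in a color component. The function names are hypothetical and the bit positions follow the table above.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_SCRATCH_SIZE 8 /* "currently 8" per the description */

/* Effective number of scratch lines for bank 0 (bits 1-5) or
 * bank 1 (bits 7-11) of TextureCacheReplacementMode. */
unsigned scratch_line_count(uint32_t reg, int bank)
{
    assert(bank == 0 || bank == 1);
    unsigned shift = bank ? 7u : 1u;
    return ((reg >> shift) & 0x1Fu) + MIN_SCRATCH_SIZE;
}

/* Decode a ShowCacheInfo color component into a miss count.
 * Returns -1 for a value outside the listed coding. */
int decode_miss_count(uint8_t component)
{
    switch (component) {
    case 0x40: return 0;
    case 0x80: return 1;
    case 0xA0: return 2;
    case 0xC0: return 3;
    case 0xE0: return 4;
    default:   return -1;
    }
}
```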
Sample Computer System Embodiment
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/591,225 US7710425B1 (en) | 2000-06-09 | 2000-06-09 | Graphic memory management with invisible hardware-managed page faulting |
Publications (1)
Publication Number | Publication Date |
---|---|
US7710425B1 true US7710425B1 (en) | 2010-05-04 |
Family
ID=42124855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/591,225 Expired - Fee Related US7710425B1 (en) | 2000-06-09 | 2000-06-09 | Graphic memory management with invisible hardware-managed page faulting |
Country Status (1)
Country | Link |
---|---|
US (1) | US7710425B1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140198122A1 (en) * | 2013-01-15 | 2014-07-17 | Microsoft Corporation | Engine for streaming virtual textures |
US20150058576A1 (en) * | 2013-08-20 | 2015-02-26 | International Business Machines Corporation | Hardware managed compressed cache |
US20150070365A1 (en) * | 2013-09-06 | 2015-03-12 | Apple Inc. | Arbitration method for multi-request display pipeline |
GB2564466A (en) * | 2017-07-13 | 2019-01-16 | Advanced Risc Mach Ltd | Graphics processing systems |
WO2021119185A1 (en) | 2019-12-10 | 2021-06-17 | Pony Ai Inc. | Dynamic memory address encoding |
US11221956B2 (en) * | 2017-05-31 | 2022-01-11 | Seagate Technology Llc | Hybrid storage device with three-level memory mapping |
US20220230270A1 (en) * | 2019-12-31 | 2022-07-21 | Intel Corporation | Method and apparatus for compression of graphics processing commands |
Patent Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5611064A (en) | 1990-08-03 | 1997-03-11 | 3Dlabs Ltd. | Virtual memory system |
US6124865A (en) * | 1991-08-21 | 2000-09-26 | Digital Equipment Corporation | Duplicate cache tag store for computer graphics system |
US5548740A (en) * | 1992-02-10 | 1996-08-20 | Sharp Kabushiki Kaisha | Information processor efficiently using a plurality of storage devices having different access speeds and a method of operation thereof |
US5548709A (en) | 1994-03-07 | 1996-08-20 | Silicon Graphics, Inc. | Apparatus and method for integrating texture memory and interpolation logic in a computer system |
US5706481A (en) | 1994-03-07 | 1998-01-06 | Silicon Graphics, Inc. | Apparatus and method for integrating texture memory and interpolation logic in a computer system |
US5594860A (en) * | 1995-01-27 | 1997-01-14 | Varis Corporation | Method for banding and rasterizing an image in a multiprocessor printing system |
US5886706A (en) | 1995-06-06 | 1999-03-23 | Hewlett-Packard Company | System for managing texture mapping data in a computer graphics system |
US5790130A (en) | 1995-06-08 | 1998-08-04 | Hewlett-Packard Company | Texel cache interrupt daemon for virtual memory management of texture maps |
US5999189A (en) | 1995-08-04 | 1999-12-07 | Microsoft Corporation | Image compression to reduce pixel and texture memory requirements in a real-time image generator |
US5880737A (en) | 1995-08-04 | 1999-03-09 | Microsoft Corporation | Method and system for accessing texture data in environments with high latency in a graphics rendering system |
EP0766177A1 (en) * | 1995-09-29 | 1997-04-02 | International Business Machines Corporation | Information handling system including effective address translation for one or more auxiliary processors |
US5696927A (en) * | 1995-12-21 | 1997-12-09 | Advanced Micro Devices, Inc. | Memory paging system and method including compressed page mapping hierarchy |
US5886705A (en) * | 1996-05-17 | 1999-03-23 | Seiko Epson Corporation | Texture memory organization based on data locality |
US5842015A (en) | 1996-07-26 | 1998-11-24 | Hewlett-Packard Company | System and method for real-time control of hardware in a multiprocessing environment |
US5828382A (en) | 1996-08-02 | 1998-10-27 | Cirrus Logic, Inc. | Apparatus for dynamic XY tiled texture caching |
US5831640A (en) | 1996-12-20 | 1998-11-03 | Cirrus Logic, Inc. | Enhanced texture map data fetching circuit and method |
US6249853B1 (en) * | 1997-06-25 | 2001-06-19 | Micron Electronics, Inc. | GART and PTES defined by configuration registers |
US6002410A (en) | 1997-08-25 | 1999-12-14 | Chromatic Research, Inc. | Reconfigurable texture cache |
US6407998B1 (en) * | 1997-10-02 | 2002-06-18 | Thomson Licensing S.A. | Multimedia decoder for prioritized bi-directional communication in a broadcast system |
US6002407A (en) | 1997-12-16 | 1999-12-14 | Oak Technology, Inc. | Cache memory and method for use in generating computer graphics texture |
US6011565A (en) | 1998-04-09 | 2000-01-04 | S3 Incorporated | Non-stalled requesting texture cache |
US6202146B1 (en) * | 1998-06-29 | 2001-03-13 | Sun Microsystems, Inc. | Endianness checking for platform-independent device drivers |
US6246422B1 (en) * | 1998-09-01 | 2001-06-12 | Sun Microsystems, Inc. | Efficient method for storing texture maps in multi-bank memory |
US6374404B1 (en) * | 1998-12-16 | 2002-04-16 | Sony Corporation Of Japan | Intelligent device having background caching of web pages from a digital television broadcast signal and method of same |
US6297832B1 (en) * | 1999-01-04 | 2001-10-02 | Ati International Srl | Method and apparatus for memory access scheduling in a video graphics system |
US6362826B1 (en) * | 1999-01-15 | 2002-03-26 | Intel Corporation | Method and apparatus for implementing dynamic display memory |
US6344852B1 (en) | 1999-03-17 | 2002-02-05 | Nvidia Corporation | Optimized system and method for binning of graphics data |
US6295068B1 (en) * | 1999-04-06 | 2001-09-25 | Neomagic Corp. | Advanced graphics port (AGP) display driver with restricted execute mode for transparently transferring textures to a local texture cache |
US6538650B1 (en) * | 2000-01-10 | 2003-03-25 | Intel Corporation | Efficient TLB entry management for the render operands residing in the tiled memory |
Non-Patent Citations (12)
Title |
---|
Blinn, Jim Blinn's Corner: "A Trip Down the Graphics Pipeline: Grandpa, What Does Viewport Mean?", IEEE Computer Graphics and Applications Journal, Jan. 1992, vol. 12, issue 1. |
Blinn, Jim Blinn's Corner: "A Trip Down the Graphics Pipeline: Line Clipping", IEEE Computer Graphics and Applications Journal, Jan. 1991, vol. 11, issue 1. |
Blinn, Jim Blinn's Corner: "A Trip Down the Graphics Pipeline: Pixel Coordinates", IEEE Computer Graphics and Applications Journal, Jul. 1991, vol. 11, issue 4. |
Blinn, Jim Blinn's Corner: "A Trip Down the Graphics Pipeline: Sub-Pixelic Particles", IEEE Computer Graphics and Applications Journal, Sep. 1991, vol. 11, issue 5. |
Blinn, Jim Blinn's Corner: "Dirty Pixels", IEEE Computer Graphics and Applications Journal, Jan. 1989, vol. 9, issue 4. |
Cox et al., "Multi-Level Texture Caching for 3D Graphics Hardware," Proceedings of the 25th International Symposium on Computer Architecture, 1998. |
Foley et al., Computer Graphics: Principles and Practice (2.ed. 1990, corr.1995), pp. 741-744. |
Hakura and Gupta, "The Design and Analysis of a Cache Architecture for Texture Mapping," Proceedings of the 24th International Symposium on Computer Architecture, 1997. |
Heckbert, "Survey of Computer Graphics," IEEE Computer Graphics, Nov. 1986, pp. 56. |
Igehy et al., "Prefetching in a Texture Cache Architecture", IEEE. |
Jim Blinn's Corner, "The Truth About Texture Mapping" by James Blinn, IEEE Computer Graphics & Application, Mar. 1990, pp. 78-83. * |
Paul S. Heckbert, "Fundamentals of Texture Mapping and Image Warping," Thesis submitted to Dept. of EE and Computer Science, University of California, Berkeley, Jun. 17, 1994. |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140198122A1 (en) * | 2013-01-15 | 2014-07-17 | Microsoft Corporation | Engine for streaming virtual textures |
US9734598B2 (en) * | 2013-01-15 | 2017-08-15 | Microsoft Technology Licensing, Llc | Engine for streaming virtual textures |
US20150058576A1 (en) * | 2013-08-20 | 2015-02-26 | International Business Machines Corporation | Hardware managed compressed cache |
US20150100736A1 (en) * | 2013-08-20 | 2015-04-09 | International Business Machines Corporation | Hardware managed compressed cache |
US9582426B2 (en) * | 2013-08-20 | 2017-02-28 | International Business Machines Corporation | Hardware managed compressed cache |
US9720841B2 (en) * | 2013-08-20 | 2017-08-01 | International Business Machines Corporation | Hardware managed compressed cache |
US20150070365A1 (en) * | 2013-09-06 | 2015-03-12 | Apple Inc. | Arbitration method for multi-request display pipeline |
US9747658B2 (en) * | 2013-09-06 | 2017-08-29 | Apple Inc. | Arbitration method for multi-request display pipeline |
US11221956B2 (en) * | 2017-05-31 | 2022-01-11 | Seagate Technology Llc | Hybrid storage device with three-level memory mapping |
US10388057B2 (en) | 2017-07-13 | 2019-08-20 | Arm Limited | Graphics processing systems with efficient YUV format texturing |
GB2564466B (en) * | 2017-07-13 | 2020-01-08 | Advanced Risc Mach Ltd | Storing YUV texture data in a cache in a graphics processing system |
GB2564466A (en) * | 2017-07-13 | 2019-01-16 | Advanced Risc Mach Ltd | Graphics processing systems |
WO2021119185A1 (en) | 2019-12-10 | 2021-06-17 | Pony Ai Inc. | Dynamic memory address encoding |
EP4073657A4 (en) * | 2019-12-10 | 2023-12-20 | Pony AI Inc. | Dynamic memory address encoding |
US20220230270A1 (en) * | 2019-12-31 | 2022-07-21 | Intel Corporation | Method and apparatus for compression of graphics processing commands |
US12051130B2 (en) * | 2019-12-31 | 2024-07-30 | Intel Corporation | Method and apparatus for compression of graphics processing commands |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6650333B1 (en) | Multi-pool texture memory management | |
US6677952B1 (en) | Texture download DMA controller synching multiple independently-running rasterizers | |
US9916643B1 (en) | Multi-sample antialiasing optimization via edge tracking | |
US10262459B1 (en) | Multiple simultaneous bin sizes | |
US10162642B2 (en) | Shader with global and instruction caches | |
US6587113B1 (en) | Texture caching with change of update rules at line end | |
US7505036B1 (en) | Order-independent 3D graphics binning architecture | |
US7061500B1 (en) | Direct-mapped texture caching with concise tags | |
US7164426B1 (en) | Method and apparatus for generating texture | |
US6819332B2 (en) | Antialias mask generation | |
US6891543B2 (en) | Method and system for optimally sharing memory between a host processor and graphics processor | |
US6798421B2 (en) | Same tile method | |
US6731288B2 (en) | Graphics engine with isochronous context switching | |
US6744438B1 (en) | Texture caching with background preloading | |
US6791559B2 (en) | Parameter circular buffers | |
US7746352B2 (en) | Deferred page faulting in virtual memory based sparse texture representations | |
US8760460B1 (en) | Hardware-managed virtual buffers using a shared memory for load distribution | |
US8692829B2 (en) | Calculation of plane equations after determination of Z-buffer visibility | |
US7385608B1 (en) | State tracking methodology | |
US5936632A (en) | Method for fast downloading of textures to accelerated graphics hardware and the elimination of extra software copies of texels | |
US6683615B1 (en) | Doubly-virtualized texture memory | |
US8144156B1 (en) | Sequencer with async SIMD array | |
EP1721298A2 (en) | Embedded system with 3d graphics core and local pixel buffer | |
US20020171655A1 (en) | Dirty tag bits for 3D-RAM SRAM | |
US7050061B1 (en) | Autonomous address translation in graphic subsystem |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: 3DLABS INC., LTD.,GREAT BRITAIN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BALDWIN, DAVE;REEL/FRAME:011181/0414 Effective date: 20000814 |
 | AS | Assignment | Owner name: FOOTHILL CAPITAL CORPORATION,CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:3DLABS INC., LTD., AND CERTAIN OF PARENT'S SUBSIDIARIES;3DLABS INC., LTD.;3DLABS (ALABAMA) INC.;AND OTHERS;REEL/FRAME:012063/0335 Effective date: 20010727 Owner name: FOOTHILL CAPITAL CORPORATION, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNORS:3DLABS INC., LTD., AND CERTAIN OF PARENT'S SUBSIDIARIES;3DLABS INC., LTD.;3DLABS (ALABAMA) INC.;AND OTHERS;REEL/FRAME:012063/0335 Effective date: 20010727 |
 | AS | Assignment | Owner name: 3DLABS (ALABAMA) INC.,ALABAMA Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 Owner name: 3DLABS INC., A CORP. OF DE,CALIFORNIA Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 Owner name: 3DLABS INC., A COMPANY ORGANIZED UNDER THE LAWS OF Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 Owner name: 3DLABS LIMITED, A COMPANY ORGANIZED UNDER THE LAWS Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 Owner name: 3DLABS (ALABAMA) INC., ALABAMA Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 Owner name: 3DLABS INC., A CORP. OF DE, CALIFORNIA Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 Owner name: 3DLABS INC., LTD., A COMPANY ORGANIZED UNDER THE L Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNOR:WELL FARGO FOOTHILL, INC., FORMERLY KNOWN AS FOOTHILL CAPITAL CORPORATION;REEL/FRAME:015722/0752 Effective date: 20030909 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | AS | Assignment | Owner name: ZIILABS INC. LTD., A CORPORATION ORGANIZED UNDER T Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:3DLABS LTD., A CORPORATION ORGANIZED UNDER THE LAWS OF GREAT BRITAIN;REEL/FRAME:029299/0659 Effective date: 20121110 |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | SULP | Surcharge for late payment | |
 | IPR | Aia trial proceeding filed before the patent and appeal board: inter partes review | Free format text: TRIAL NO: IPR2015-00929 Opponent name: APPLE INC. Effective date: 20150328 Free format text: TRIAL NO: IPR2015-00928 Opponent name: APPLE INC. Effective date: 20150328 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552) Year of fee payment: 8 |
 | AS | Assignment | Owner name: RPX CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ZIILABS INC., LTD;REEL/FRAME:048947/0592 Effective date: 20190418 |
 | AS | Assignment | Owner name: MEDIATEK INC., TAIWAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RPX CORPORATION;REEL/FRAME:054107/0830 Effective date: 20200618 |
 | AS | Assignment | Owner name: RPX CORPORATION, CALIFORNIA Free format text: RELEASE OF LIEN ON PATENTS;ASSIGNOR:JEFFERIES FINANCE LLC, AS COLLATERAL AGENT;REEL/FRAME:053498/0067 Effective date: 20200814 |
 | AS | Assignment | Owner name: MEDIATEK INC., TAIWAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RPX CORPORATION;REEL/FRAME:054152/0888 Effective date: 20200618 |
 | AS | Assignment | Owner name: BARINGS FINANCE LLC, AS COLLATERAL AGENT, NORTH CAROLINA Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:RPX CLEARINGHOUSE LLC;RPX CORPORATION;REEL/FRAME:054198/0029 Effective date: 20201023 Owner name: BARINGS FINANCE LLC, AS COLLATERAL AGENT, NORTH CAROLINA Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:RPX CLEARINGHOUSE LLC;RPX CORPORATION;REEL/FRAME:054244/0566 Effective date: 20200823 |
 | AS | Assignment | Owner name: XUESHAN TECHNOLOGIES INC., CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MEDIATEK INC.;REEL/FRAME:056593/0167 Effective date: 20201223 |
 | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | AS | Assignment | Owner name: RPX CLEARINGHOUSE LLC, CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST IN SPECIFIED PATENTS;ASSIGNOR:BARINGS FINANCE LLC;REEL/FRAME:059925/0652 Effective date: 20220510 |
 | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
 | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20220504 |