CN1722175A

CN1722175A - An Improved Method and Device for Stencil Shadow Cone Operation

Info

Publication number: CN1722175A
Application number: CNA2005100921322A
Authority: CN
Inventors: 徐建明; 陈文中; 王渊峰; 李亮; 约翰·柏拉勒斯; 博里斯·普罗科彭科
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2004-08-23
Filing date: 2005-08-19
Publication date: 2006-01-18
Anticipated expiration: 2025-08-19
Also published as: TW200608309A; US20060038822A1; TWI307054B; US7277098B2; CN100354891C

Abstract

The present invention provides a shadow cone algorithm to improve the efficiency of shadow generation for use in a computer graphics system. In one embodiment, a method and apparatus are provided that utilize a combination of compressed and uncompressed stencil buffers, and are paired with compressed and uncompressed depth data buffers. An uncompressed stencil buffer may be used to store stencil shadow cone data for each pixel, and a compressed stencil buffer may be used to store stencil shadow cone data for a group of pixels. The compressed stencil buffer utilizes a high-speed cache to improve computational efficiency.

Description

An Improved Method and Device for Stencil Shadow Cone Operation

Technical Field

The invention relates to a computer graphics system, and in particular to a method and a device for producing shadow effects using shadow cones (shadow volumes).

Background

Three-dimensional computer graphics is the generation and display of three-dimensional objects as two-dimensional images on a flat screen. A three-dimensional object can be a simple primitive such as a point, a line, a triangle, or a polygon. More complex objects are represented by connected planar polygons, for example a set of planar triangles assembled into a solid object. All geometric primitives can ultimately be described by one vertex or a group of vertices. For example, coordinates (X, Y, Z) define a vertex, the end point of a line, or the corner of a polygon.

To project a three-dimensional image onto a two-dimensional screen, the vertex values pass through a series of operations, that is, several stages of a graphics pipeline. A graphics pipeline is simply a chain of connected processing units or stages in which the output of one stage is the input of the next. In a graphics processor, these stages include per-vertex operations, primitive assembly, pixel operations, texture assembly, rasterization, and fragment operations.

In a typical graphics display system, an image database (for example, a command list) stores descriptions of the objects in a scene. Each object is described by a number of small polygons, in the same way that a surface can be covered by a number of small tiles. Each polygon is described by a list of vertex coordinates (X, Y, Z in model coordinates) and some material surface properties (such as color, texture, and gloss), and possibly the surface normal vector at each vertex. For three-dimensional objects with complex curved surfaces, the polygons must generally be triangles or quadrilaterals, where the latter can be split into a pair of triangles.

A transformation engine transforms the object coordinates according to the viewing angle entered by the user. In addition, the user can specify the field of view, the image size, and the back end of the view volume in order to include or remove background.

Once the viewing region is selected, clipping logic discards polygons (triangles) that lie outside the region and clips those that lie partly inside and partly outside. The new edges of a clipped polygon coincide with the edges of the viewing region. The polygon vertices are then passed to the next stage in screen coordinates (X, Y), each vertex carrying a corresponding depth value (Z coordinate). In a typical system, lighting is then computed with a lighting model, and the color values of the polygons are passed to a rasterizer.

For each polygon, the rasterizer determines which pixels the polygon covers and attempts to write the corresponding color and depth (Z-value) into the frame buffer. The rasterizer compares the depth value of the polygon with the depth value of the pixel that may already have been written to the frame buffer. If the depth value of the new polygon is smaller, the polygon lies in front of the previously written pixel, so the value at that position in the frame buffer is replaced. This process continues until all polygons have passed through the rasterizer. The graphics controller then displays the contents of the frame buffer on the display, scan line by scan line.
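By way of non-limiting illustration, the depth comparison described above may be sketched in Python as follows. The function and buffer names (`write_fragment`, `depth_buffer`, `color_buffer`) are hypothetical and do not appear in the disclosure. The sketch assumes, as the text states, that a smaller depth value means the fragment is closer to the viewer.

```python
def write_fragment(depth_buffer, color_buffer, x, y, depth, color):
    """Write a fragment only if it is closer than the stored one.

    Returns True when the buffers were updated, False otherwise.
    """
    if depth < depth_buffer[y][x]:  # smaller depth = in front of stored pixel
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False


def make_buffers(width, height):
    """Create hypothetical buffers cleared to 'infinitely far' and black."""
    far = float("inf")
    depth_buffer = [[far] * width for _ in range(height)]
    color_buffer = [["black"] * width for _ in range(height)]
    return depth_buffer, color_buffer
```

For example, writing fragments with depths 0.8, 0.5, and 0.9 to the same pixel in that order leaves the color and depth of the second fragment in the buffers, because only that fragment is closer than the value stored before it.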

FIG. 1 is a flow chart of the operation of a conventional graphics pipeline. The components of a graphics pipeline may vary from system to system and can be represented in various ways. A host computer 10 (or a graphics application program interface running on the computer) generates a command stream 12 containing a series of graphics commands and data for rendering an "environment" on a graphics display. The components of the graphics pipeline operate on the data and commands in the command stream 12 to produce the image on the display.

A decoder 14 reads data from the command stream 12, interprets the commands, and passes raw graphics data down the graphics pipeline. The raw graphics data include position (X, Y, Z, and W coordinates) as well as lighting and texture information. After being read from the command stream 12 by the decoder 14, the raw graphics data are sent to a vertex shader 16, which performs various transformations on them: the data must be transformed from world coordinates to model-view coordinates, then to projection coordinates, and finally to screen coordinates. The functions of the vertex shader 16 are known in the art. The graphics data are then sent to a rasterizer 18 for rasterization.

A depth tester 20 then processes each pixel, comparing the stored depth value for the pixel with the newly input depth value. The depth value represents the depth of the pixel position. If the new depth value is closer to the viewer, it replaces the stored depth value, and the associated color information in the frame buffer 24 is replaced as well (as processed by the pixel shader 22). If the new depth value is farther away, nothing is done. The color information reflects whether the pixel is in shadow, which is determined using the conventional shadow cone algorithm.

FIG. 2 is a schematic diagram of the shadow cone algorithm. The shadow cone 34 (shadow volume) is the region of space shadowed by an occluder 32 under a light source 30. If a pixel 38 falls within the shadow cone 34, it is rendered as being in shadow. To determine whether a pixel 38, 39 lies within the shadow cone 34, the algorithm follows a ray 35, 31 from the observer 36 to the pixel and counts how many times the ray enters the shadow cone (entry point 33) and leaves it (exit point 37). If the number of entries equals the number of exits, the pixel is not in shadow. For example, the ray 35 from the observer 36 to the pixel 38 has one entry point 33 and no exit point 37, so the pixel 38 is in shadow. Likewise, the ray 31 from the observer 36 passes the entry point 33 once and the exit point 37 once before reaching the pixel 39, so the pixel 39 is not in shadow.

Ray tracing is time-consuming, especially with multiple occluders and multiple light sources. The stencil shadow cone algorithm simplifies the computation by performing simple in/out operations in a stencil buffer, also called the level-2 stencil buffer or SL2. The stencil buffer SL2 stores and operates on per-pixel data to support various functions, including the stencil shadow cone algorithm. Whether a pixel is in shadow can be determined by depth-testing (Z-test) the front and back faces (relative to the viewer or to the deepest plane) of the shadow cone polygons. For example, the stencil value is incremented when a front face passes the Z-test and decremented when a back face passes the Z-test. If the final stencil value is zero, the pixel is not in shadow.

FIG. 3 is a flowchart of the stencil shadow cone algorithm. In step 40, the stencil buffer is cleared. In step 42, the scene is rendered with diffuse color, and in step 43 data are supplied to the color buffer and the depth buffer (also called the Z-buffer). In step 44, updates to the depth buffer and the color buffer are disabled, except for the stencil values held in the depth buffer. In step 46, for each light source, a stencil value is generated for each occluder, and the corresponding front-facing polygons are drawn. In step 47, the stencil value is incremented for every pixel on which a front-facing polygon is drawn. In step 48, the same is done for the back-facing polygons, and in step 49 the stencil value is decremented for every pixel on which a back-facing polygon is drawn. These increment and decrement steps are called the stencil shadow cone pass. In step 50, objects with a non-zero stencil value are in shadow and are rendered accordingly. In step 52, objects with a stencil value of zero are not in shadow and are rendered with specular color; this step is also called the specular pass. As shown in FIG. 1, the color information computed by the pixel shader 22 is stored in the frame buffer 24.

In the example of FIG. 2, the stencil value of the pixel 38 is incremented once, because the ray passes one front face of the shadow cone at the entry point 33, and is never decremented, because the ray passes no back face. The stencil value of the pixel 38 is therefore non-zero. For the pixel 39, the stencil value is incremented at the entry point 33 and decremented at the exit point 37, so it is zero and the pixel is rendered with specular color. This example involves a single occluder and a single light source, but the stencil shadow cone algorithm also applies to scenes with multiple occluders and multiple light sources.
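The increment and decrement counting of FIG. 2 and FIG. 3 may be sketched, for a single pixel, as follows. This is an illustrative model only: the function names and the representation of the shadow cone faces as lists of depths along the view ray are assumptions made for the sketch, and a face is assumed to pass the Z-test when it lies in front of the pixel.

```python
def stencil_count(pixel_depth, front_face_depths, back_face_depths):
    """Return the stencil value accumulated for one pixel.

    front_face_depths / back_face_depths hold the depths at which the view
    ray crosses front-facing / back-facing shadow cone polygons.
    """
    stencil = 0
    for depth in front_face_depths:
        if depth < pixel_depth:  # front face passes the Z-test: increment
            stencil += 1
    for depth in back_face_depths:
        if depth < pixel_depth:  # back face passes the Z-test: decrement
            stencil -= 1
    return stencil


def in_shadow(stencil):
    """A non-zero stencil value means the pixel is in shadow."""
    return stencil != 0
```

With hypothetical depths of 2.0 for the entry point 33 and 6.0 for the exit point 37, a pixel at depth 4.0 (like the pixel 38) yields `stencil_count(4.0, [2.0], [6.0]) == 1` and is in shadow, while a pixel at depth 8.0 (like the pixel 39) yields 0 and is not.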

FIG. 4 shows a conventional compressed depth data (Z-data) processing unit, also called ZL1. Processing the depth data of a block or a tile in ZL1 improves system performance. When the depth data of some pixels in a tile fall outside the range of the ZL1 compression format, those pixels must be handled by a separate pixel depth data processing unit (also called ZL2).

ZL1 and ZL2 generally denote a first-level depth buffer and a second-level depth buffer. Algorithms of this kind go by various names, such as Hyper Z and Hierarchy Z Buffer. The two levels store depth information for a larger processing unit, such as a tile, and for the finest granularity, a pixel on the screen. One benefit of ZL1 is that it reduces the complexity of computing depth data in the rendering pipeline.

A tile generator 60 generates tile data for a tile made up of multiple pixels, for example eight by eight, and sends a request to a ZL1 cache 64. The tile data are sent to a compressed depth data processing unit (ZL1) 62, which also communicates with the ZL1 cache 64. Pixels whose depth data cannot be handled by ZL1 62 are processed by a pixel depth data processing unit (ZL2) 66 together with a ZL2 cache 68. In this example, ZL1 62 can reject up to 64 pixels per cycle, and pixels that are not rejected are marked "accept" or "retry" to reduce the traffic to ZL2 66.

Although ZL1 62 reduces the memory read traffic of ZL2 66, it is not very efficient for stencil operations. During stencil operations, ZL1 62 marks all pixels as "retry" to ensure that no stencil operation is missed. Rejected pixels also issue stencil operation requests to ZL2 66. As a result, during stencil operations ZL1 62 consumes a large amount of bandwidth filtering the results.

This is especially apparent when a ZL1 tile (sub-tile) has been accepted or rejected by the depth comparison (Z-compare) function. Because stencil operations are still performed even when the sub-tile passes the Z-test, ZL1 62 must switch the sub-tile from the accept state to the retry state and send it to ZL2 66. ZL2 66 is combined with the stencil buffer SL2, so the ZL2/SL2 processing unit is 32 bits wide: a 24-bit depth value and an 8-bit stencil value. In the accept and reject states, the whole 32-bit depth/stencil value must be read just to obtain the 8-bit stencil value, which makes memory traffic very inefficient. One workaround is to use separate stencil and depth buffers, but this makes each memory request very small: for eight pixels with an 8-bit stencil value each, a request covers only 64 bits, which also wastes memory bandwidth.

Summary of the Invention

An embodiment of the present invention provides a computer graphics device that improves the performance of stencil shadow cone operations (stencil shadow volume operations). The computer graphics device includes a compressed stencil buffer and a compressed stencil buffer cache. The compressed stencil buffer contains a compressed stencil shadow cone record corresponding to a group of pixels. The compressed stencil shadow cone record includes a tile reference stencil value. The group of pixels forms a tile, the tile comprises a plurality of sub-tiles, and each sub-tile comprises a plurality of blocks. The compressed stencil shadow cone record further includes a plurality of block reference values, one for each block; a plurality of pixel delta values, one for each pixel in the tile; and a plurality of sub-tile dirty bits, one for each sub-tile in the tile. A pixel stencil value comprises one of the sub-tile dirty bits, the tile reference stencil value, one of the block reference values, and one of the pixel delta values.

Another embodiment of the present invention provides a graphics system including a first stencil buffer used for a stencil shadow cone operation on a group of pixels, where the group of pixels forms a tile. The first stencil buffer is further used to compute a stencil for a pixel in the group. The graphics system also includes a graphics processor that generates a shadow effect by means of the stencil shadow cone operation and stores a tile stencil record in the first stencil buffer. The graphics system further includes a first stencil buffer cache coupled to the first stencil buffer, a first depth buffer for storing a tile depth record, a second depth buffer for storing a pixel depth record, and a second stencil buffer for storing a pixel stencil record. The second depth buffer is combined with the second stencil buffer, and the pixel stencil record is combined with the pixel depth record. The tile is made up of a plurality of sub-tiles, and each sub-tile is made up of a plurality of blocks. The tile stencil record further includes a tile reference stencil value; a plurality of block stencil values, one for each block; a plurality of pixel delta values, one for each pixel in the tile; and a plurality of sub-tile values, one for each sub-tile in the tile.

Another embodiment of the present invention provides a graphics method for computing stencil shadow cones, including the following steps. First, a tile stencil shadow record, a pixel stencil shadow record, and a tile delta value are generated. A stencil shadow cone operation is then performed using the tile stencil shadow record and the pixel stencil shadow record. A plurality of pixels make up a block, a plurality of blocks make up a sub-tile, and a plurality of sub-tiles make up a tile, and the tile stencil shadow record includes stencil data for each block.

Another embodiment of the present invention provides a shadow generation method for a computer graphics system, including the following steps. First, a sub-tile associated with a compressed depth data buffer is obtained as a candidate for compression. Next, the sub-tile is selectively flushed to a pixel depth data/stencil buffer when it is not compressed in a compressed stencil buffer. Finally, a stencil shadow cone operation is performed in the compressed stencil buffer. The step of obtaining the sub-tile further includes detecting a sub-tile status, which is one of "retry", "reject", and "accept".

Another embodiment of the present invention provides a graphics method by which a computer graphics system merges compressed stencil data into a pixel stencil buffer during a stencil shadow cone operation, including the following steps. First, it is determined whether a sub-tile satisfies a first condition or a second condition, where the first condition is a sub-tile underflow and the second condition is a sub-tile overflow. When either condition is met, the sub-tile status is changed from "accept" to "retry". A sub-tile merge mask is then set to identify the compressed stencil data in a compressed stencil buffer that are to be flushed to the pixel stencil buffer. The compressed stencil data are then merged into the pixel stencil buffer, producing a result that is the sum of the compressed stencil data and the stencil data in the pixel stencil buffer. Finally, the sub-tile dirty bit in the compressed stencil buffer is reset to zero, and the compressed stencil values in the compressed stencil buffer are cleared.
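The merge step of this embodiment may be illustrated with the following sketch. It is a simplified model under stated assumptions, not the claimed implementation: the function name `merge_subtile` and the list-based buffers are hypothetical, the compressed stencil data are represented as already-decoded signed values per pixel, and any clamping or wrapping of the 8-bit pixel stencil value is ignored.

```python
def merge_subtile(sl1_values, sl2_stencil, dirty_bits, subtile):
    """Flush one sub-tile's compressed stencil values into the pixel buffer.

    sl1_values:  per-sub-tile lists of signed values held in the compressed
                 stencil buffer (hypothetical decoded form).
    sl2_stencil: per-sub-tile lists of per-pixel stencil values.
    dirty_bits:  one flag per sub-tile.
    Returns True if a merge took place.
    """
    if not dirty_bits[subtile]:
        return False  # nothing accumulated since the last merge

    # The merged result is the sum of compressed and per-pixel stencil data.
    sl2_stencil[subtile] = [
        pixel + delta
        for pixel, delta in zip(sl2_stencil[subtile], sl1_values[subtile])
    ]
    # Clear the compressed values and reset the sub-tile dirty bit to zero.
    sl1_values[subtile] = [0] * len(sl1_values[subtile])
    dirty_bits[subtile] = 0
    return True
```

In this model a caller would invoke `merge_subtile` for each sub-tile selected by the merge mask after an overflow or underflow has been detected.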

Brief Description of the Drawings

FIG. 1 is a block diagram of a conventional graphics pipeline;

FIG. 2 is a two-dimensional representation of a conventional shadow cone;

FIG. 3 is a block diagram of a conventional stencil shadow cone operation;

FIG. 4 is a block diagram of a conventional compressed Z-buffer;

FIG. 5 shows a graphics system according to an embodiment of the present invention;

FIG. 6 shows the tile format used in an embodiment of the present invention;

FIG. 7 shows a compressed stencil buffer data format according to an embodiment of the present invention;

FIG. 8 shows ZL1 sub-tile status detection logic according to an embodiment of the present invention;

FIG. 9 shows an embodiment of the compressed stencil buffer operation of the present invention;

FIG. 10 shows the SL1 preprocessing steps of an embodiment of the present invention;

FIG. 11 shows the SL1 increment operation of an embodiment of the present invention;

FIG. 12 shows the SL1 decrement operation of an embodiment of the present invention;

FIG. 13 shows the stencil shadow cone pass merge operation of an embodiment of the present invention;

FIG. 14 shows the spherical color merge operation of an embodiment of the present invention;

FIG. 15 shows the compressed stencil buffer merge operation of an embodiment of the present invention; and

FIG. 16 shows the compressed stencil buffer in the stencil shadow cone operation of an embodiment of the present invention.

Description of Reference Symbols

10: host computer
12: command stream
14: decoder
16: vertex shader
18: rasterizer
20: depth tester
22: pixel shader
24: frame buffer
30: light source
31: ray
32: occluder
33: entry point
34: shadow cone
35: ray
37: exit point
38: pixel
39: pixel
60: tile generator
62: compressed depth data processing unit ZL1
64: ZL1 cache
66: pixel depth data processing unit ZL2
68: ZL2 cache
500: computer graphics device
510: graphics processor
511: cache
512: cache
514: cache
516: logic circuit
520: memory
530: depth buffer
540: stencil buffer
550: single buffer
560: compressed depth buffer ZL1
562: depth data
570: compressed stencil buffer SL1
572: stencil value
610: tile
620: sub-tile
630: block
640: pixel
700: data record format
710: 8-bit reference value
720: 3-bit reference value
730: 1-bit delta value
740: 1-bit dirty bit

Detailed Description

FIG. 5 shows the basic architecture of an embodiment of the present invention. A computer graphics device 500 includes a graphics processor 510 and a memory 520. The memory 520 may also be system or main memory used together with the graphics processor 510. Specific addresses in the memory 520 serve as a depth buffer 530 and a stencil buffer 540. The data structures of the depth buffer 530 and the stencil buffer 540 may also be combined into a single buffer 550. For example, each data record is 32 bits, of which 24 bits are the depth value and 8 bits are the stencil value. The single buffer 550 stores one record per pixel.
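As an illustration of the combined 32-bit record, the following sketch packs a 24-bit depth value and an 8-bit stencil value into one word. The bit placement (depth in the upper 24 bits, stencil in the lower 8 bits) and the function names are assumptions made for the sketch; the text specifies only the field widths.

```python
DEPTH_BITS = 24
STENCIL_BITS = 8
STENCIL_MASK = (1 << STENCIL_BITS) - 1


def pack_depth_stencil(depth, stencil):
    """Pack a 24-bit depth and an 8-bit stencil value into a 32-bit word."""
    if not 0 <= depth < (1 << DEPTH_BITS):
        raise ValueError("depth does not fit in 24 bits")
    if not 0 <= stencil <= STENCIL_MASK:
        raise ValueError("stencil does not fit in 8 bits")
    return (depth << STENCIL_BITS) | stencil  # assumed layout: depth on top


def unpack_depth_stencil(word):
    """Split a 32-bit word back into (depth, stencil)."""
    return word >> STENCIL_BITS, word & STENCIL_MASK
```

Under this layout, reading the stencil value of a pixel means fetching the whole 32-bit word and masking off the depth bits, which is the traffic overhead that the compressed stencil buffer is meant to avoid.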

Other specific addresses in the memory 520 serve as a compressed depth buffer ZL1 560, which stores depth data (Z-data) 562 for a group of pixels. The group of pixels can be a tile, a sub-tile, or multiple tiles. The memory 520 also includes a compressed stencil buffer SL1 570, which stores stencil values 572 for the pixels of a tile. A tile can be 8 by 8 pixels, 8 by 16 pixels, or another size, depending on the required performance.

The graphics processor 510 includes a cache 512 for the compressed stencil buffer SL1 570 and a cache 511 for the compressed depth buffer ZL1 560, which store records of the compressed stencil buffer SL1 570 and the compressed depth buffer ZL1 560, respectively. The graphics processor 510 also includes a cache 514 for storing records of the single buffer 550. The caches 512, 511, and 514 are also referred to as SL1, ZL1, and ZL2/SL2, respectively. The graphics processor 510 further includes a logic circuit 516 that controls the compressed depth buffer ZL1 560, the compressed stencil buffer SL1 570, the depth buffer 530, and the stencil buffer 540 during stencil shadow cone operations. The logic circuit 516 may also compress depth data and stencil shadow data, and may further generate uncompressed stencil shadow data 542. In addition, the logic circuit 516 may selectively merge the stencil values 572 and the uncompressed stencil shadow data 542 associated with the compressed stencil buffer SL1 570 and the stencil buffer 540.

FIG. 6 shows an embodiment of a tile format. The tile 610 contains 64 pixels 640 arranged 8 by 8. The tile 610 is divided into four sub-tiles 620, each containing 8 by 2 pixels. The tile 610 can be further divided into 16 blocks 630, each containing 2 by 2 pixels.
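The tile hierarchy of FIG. 6 may be illustrated by mapping a pixel position within the tile to its sub-tile and block. The sketch assumes that each sub-tile is 8 pixels wide and 2 pixels tall, that sub-tiles are stacked from top to bottom, and that blocks are numbered row by row; the text gives only the sizes, so this ordering and the function names are hypothetical.

```python
TILE_WIDTH = 8
TILE_HEIGHT = 8


def _check(x, y):
    if not (0 <= x < TILE_WIDTH and 0 <= y < TILE_HEIGHT):
        raise ValueError("pixel lies outside the 8 by 8 tile")


def subtile_index(x, y):
    """Index (0..3) of the 8 by 2 sub-tile containing pixel (x, y)."""
    _check(x, y)
    return y // 2


def block_index(x, y):
    """Index (0..15) of the 2 by 2 block containing pixel (x, y)."""
    _check(x, y)
    return (y // 2) * 4 + (x // 2)


def pixel_in_block(x, y):
    """Index (0..3) of pixel (x, y) inside its 2 by 2 block."""
    _check(x, y)
    return (y % 2) * 2 + (x % 2)
```

With this numbering each sub-tile holds exactly four blocks, so `block_index(x, y) // 4` equals `subtile_index(x, y)` for every pixel of the tile.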

FIG. 7 shows an embodiment of the data record format of the compressed stencil buffer SL1 570. The stencil values 572 in the compressed stencil buffer SL1 570 comprise one record for each tile 610, corresponding to each tile in the compressed depth buffer ZL1 560. The data record format 700 represents an 8 by 8 tile 610 divided into four 8 by 2 sub-tiles 620 and further divided into sixteen 2 by 2 blocks 630. The data record format 700 contains an 8-bit reference value 710 for the tile, a 3-bit reference value 720 for each of the 16 blocks 630, a 1-bit delta value 730 for each of the 64 pixels, and a 1-bit dirty bit 740 for each sub-tile.
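The size of one tile record follows directly from the field widths listed above, as the following arithmetic sketch shows. The constant and function names are illustrative, and the sketch only counts bits; it does not model how the fields are ordered in memory, which the text does not specify.

```python
TILE_REF_BITS = 8      # one reference value 710 per tile
BLOCK_REF_BITS = 3     # one reference value 720 per block
PIXEL_DELTA_BITS = 1   # one delta value 730 per pixel
DIRTY_BITS = 1         # one dirty bit 740 per sub-tile


def sl1_record_bits(blocks=16, pixels=64, subtiles=4):
    """Total number of bits in one compressed stencil record."""
    return (TILE_REF_BITS
            + BLOCK_REF_BITS * blocks
            + PIXEL_DELTA_BITS * pixels
            + DIRTY_BITS * subtiles)


def uncompressed_stencil_bits(pixels=64, stencil_bits=8):
    """Bits needed to hold one full stencil value per pixel."""
    return pixels * stencil_bits
```

For the 8 by 8 tile described here this gives 8 + 48 + 64 + 4 = 124 bits per record, compared with 512 bits for 64 separate 8-bit stencil values.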

The data for each block are represented by a 4-bit nibble and a 3-bit carry. Each of the 4 bits is the pixel delta value of one pixel in the block, and the 3-bit carry is the reference value of the block. The format relies on the observation that, for a statistically meaningful proportion of pixels, the stencil values of adjacent pixels usually differ by no more than one. Although the stencil values of two adjacent pixels cannot differ by more than 1 in SL1, a pixel encoded as shown in Table 1 covers a dynamic range of -4 to +4.

Table 1

Block reference value | Pixel delta value = 0 | Pixel delta value = 1
000                   | -4                    | -3
001                   | -3                    | -2
010                   | -2                    | -1
011                   | -1                    |  0
100                   |  0                    | +1
101                   | +1                    | +2
110                   | +2                    | +3
111                   | +3                    | +4
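The encoding of Table 1 may be written as a small decoding function: the 3-bit block reference value is offset by 4 and the 1-bit pixel delta value is added. The helper `pixel_stencil`, which adds the result to the tile reference value, is an assumption about how the three fields combine and is not stated explicitly in the text; both function names are hypothetical.

```python
def block_pixel_offset(block_ref, pixel_delta):
    """Decode Table 1: 3-bit block reference + 1-bit delta -> signed offset."""
    if not 0 <= block_ref <= 0b111:
        raise ValueError("block reference must fit in 3 bits")
    if pixel_delta not in (0, 1):
        raise ValueError("pixel delta must be a single bit")
    return block_ref - 4 + pixel_delta


def pixel_stencil(tile_ref, block_ref, pixel_delta):
    """Assumed combination: tile reference plus the Table 1 offset."""
    return tile_ref + block_pixel_offset(block_ref, pixel_delta)
```

For example, block reference 000 with delta 0 decodes to -4, block reference 100 with delta 0 decodes to 0, and block reference 111 with delta 1 decodes to +4, matching the first, fifth, and last rows of the table.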

FIG. 8 shows an embodiment of the logic for detecting the sub-tile status in ZL1. In step 800, the D_Mask bit of the sub-tile is checked. D_Mask is a bit in the ZL1 record that indicates whether the sub-tile needs to be drawn. In step 810, if D_Mask is zero, the flow goes to step 860 and the sub-tile status is set to "reject". Otherwise the flow goes to step 820, where the T_Mask value of the sub-tile is checked. In step 830, if T_Mask is 0, the flow goes to step 850 and the sub-tile status is set to "accept". If T_Mask is 1, the flow goes to step 840 and the sub-tile status is set to "retry". These statuses are used to determine whether the sub-tile is suitable for SL1 operation.
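The decision logic of FIG. 8 reduces to two bit tests, sketched below. The function name and the string status values are illustrative; only the D_Mask and T_Mask bits and the three statuses come from the text.

```python
def subtile_status(d_mask, t_mask):
    """Return the ZL1 sub-tile status from its D_Mask and T_Mask bits."""
    if d_mask == 0:
        return "reject"  # step 860: the sub-tile does not need to be drawn
    if t_mask == 0:
        return "accept"  # step 850
    return "retry"       # step 840
```

Note that T_Mask is consulted only when D_Mask is non-zero, so a sub-tile with D_Mask equal to zero is rejected regardless of T_Mask.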

FIG. 9 shows one embodiment of the present invention. The compressed stencil buffer SL1 used in the stencil shadow cone operation may be implemented in many different ways, and the spirit of the invention is not limited to this embodiment. In step 912, once the state of the sub-tile depth data has been determined to be "retry", "accept" or "reject", it is determined whether the sub-tile needs SL1 processing. In step 914, if the sub-tile state is "retry", the sub-tile is not suitable for SL1 processing, and the flow jumps to 930, where SL2 processes it at the pixel or block level. Conversely, if the state is "reject" or "accept", the sub-tile is judged suitable for SL1 processing, and the flow jumps to step 916 to determine whether the sub-tile information can be compressed. The criterion is whether SL1 can hold the sub-tile data. If the data cannot be compressed, the flow jumps to step 918 and the sub-tile stencil data is flushed to SL2. If the sub-tile data can be compressed into SL1 according to the SL1 data record format, the stencil operation on the sub-tile is performed in SL1 in step 941.
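The routing decision of FIG. 9 can be summarized as a small function. This is a sketch under stated assumptions: the function name, the string labels for the outcomes and the boolean fits_in_sl1 (standing in for the step 916 compressibility check) are all hypothetical.

```python
def route_subtile(state: str, fits_in_sl1: bool) -> str:
    """Decide where a sub-tile's stencil operation is carried out.

    state       -- "retry", "accept" or "reject" (sub-tile depth state)
    fits_in_sl1 -- assumed result of the step 916 check, i.e. whether
                   the data fits the SL1 record format
    """
    if state not in ("retry", "accept", "reject"):
        raise ValueError(f"unknown sub-tile state: {state!r}")
    if state == "retry":
        # Step 914 -> 930: handled by SL2 at pixel or block level.
        return "process_in_SL2"
    if not fits_in_sl1:
        # Step 916 -> 918: flush the sub-tile stencil data to SL2.
        return "flush_to_SL2"
    # Data compresses into the SL1 record format.
    return "process_in_SL1"
```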

In step 940, when the stencil operation on the sub-tile is performed in SL1, an SL1 processor 920 issues an SL1 request to the SL1 cache 922 and places the cached information of the sub-tile stencil record into the SL1 FIFO 924. The SL1 arithmetic unit 926 performs the increment and decrement operations of the stencil shadow cone algorithm and merges the compressed data into SL2 930. In addition, the SL1 arithmetic unit 926 checks the stencil data record for overflow or underflow in order to avoid data errors.

FIG. 10 shows an embodiment of the SL1 preprocessing procedure. In step 1010, the sub-tiles in ZL1 are known to have the "accept" or "reject" state and to require an SL1 record. In step 1020, a cache hit test is performed on the SL1 cache, and the SL1 entry is placed in a FIFO to compensate for memory access latency. In step 1030, if the cache hit test results in a miss, an SL1 memory request is generated in step 1040. In step 1050, the SL1 entry enters the FIFO.
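As a rough illustration of FIG. 10, the sketch below runs a hit test for each SL1 entry, records a memory request on a miss, and queues every entry in a FIFO so that later stages can consume them in order while outstanding requests complete. The data model (addresses as integers, the cache as a set of resident addresses) and all names are assumptions made for the example.

```python
from collections import deque


def preprocess_sl1(addresses, cached_addresses):
    """Queue SL1 entries and collect memory requests for cache misses.

    addresses        -- iterable of SL1 record addresses to process
    cached_addresses -- set of addresses currently held in the SL1 cache
    Returns (fifo, memory_requests).
    """
    fifo = deque()
    memory_requests = []
    for addr in addresses:
        hit = addr in cached_addresses      # step 1020: cache hit test
        if not hit:
            memory_requests.append(addr)    # steps 1030/1040: miss
        fifo.append((addr, hit))            # step 1050: entry enters FIFO
    return fifo, memory_requests
```

The sketch does not model request coalescing or cache fills; a repeated miss on the same address simply produces a second request.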

FIG. 11 shows an embodiment of the increment operation in SL1. In step 1110, it is determined, according to the format of the stencil data record, whether the tile reference value is at its maximum. If so, in step 1120 SL1 flushes the stencil data of the entire tile to SL2 so that the stencil operation can be performed at the pixel level. If not, in step 1140 it is determined whether every sub-tile of the tile has the "accept" state. If every sub-tile is in the accept state, the tile reference value is incremented in step 1130, completing the increment operation. If not all are in the accept state, the overflow status of each block is checked in step 1150. A block is in the overflow state if any pixel in the block overflows. In step 1160, if no pixel in any block of a sub-tile overflows, the sub-tile is incremented. In step 1170, the stencil data of any sub-tile that has an overflow is flushed to SL2 for the stencil operation. For example, pixel-level operations may be performed on a block or on another logical group of pixels.

Consider an increment operation on a compressed stencil buffer record in which the tile reference value lies between the minimum and the maximum and the tile is divided into four sub-tiles A, B, C and D. Suppose sub-tile C does not have the accept state because at least one of its 16 blocks underflows, while no other block in the tile underflows. Suppose further that sub-tile D does not have the accept state because at least one of its 16 blocks overflows, while no other block in the tile overflows. Because sub-tiles A, B and C have no overflowed blocks, the block reference values of their blocks are incremented. Because sub-tile D is not incremented, the stencil values of all pixels in sub-tile D are flushed to the pixel stencil buffer.
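The increment flow and the A/B/C/D example can be sketched as follows. Several details are assumptions made only for illustration: a block is treated as overflowing when its 3-bit block reference is already 0b111, the tile reference maximum TILE_REF_MAX is an arbitrary placeholder (the record width is not given in this passage), and each sub-tile is modeled as a dictionary with an "accept" flag and a list of block references.

```python
BLOCK_REF_MAX = 0b111   # 3-bit block reference, per Table 1
TILE_REF_MAX = 255      # placeholder; actual width is an assumption


def increment_tile(tile_ref, subtiles):
    """Return (new_tile_ref, new_subtiles, flushed_subtile_names)."""
    if tile_ref >= TILE_REF_MAX:
        # Steps 1110/1120: flush the whole tile to SL2.
        return tile_ref, subtiles, sorted(subtiles)
    if all(s["accept"] for s in subtiles.values()):
        # Steps 1140/1130: one increment of the tile reference suffices.
        return tile_ref + 1, subtiles, []
    updated, flushed = {}, []
    for name, s in subtiles.items():
        if any(ref >= BLOCK_REF_MAX for ref in s["blocks"]):
            flushed.append(name)            # step 1170: flush to SL2
            updated[name] = s
        else:
            # Step 1160: increment every block reference of the sub-tile.
            updated[name] = {"accept": s["accept"],
                             "blocks": [ref + 1 for ref in s["blocks"]]}
    return tile_ref, updated, flushed


example = {
    "A": {"accept": True, "blocks": [4, 4]},
    "B": {"accept": True, "blocks": [4, 5]},
    "C": {"accept": False, "blocks": [0, 4]},   # contains an underflowed block
    "D": {"accept": False, "blocks": [7, 4]},   # contains an overflowed block
}
new_ref, new_subtiles, flushed = increment_tile(8, example)
```

In this run the tile reference stays at 8, sub-tiles A, B and C have their block references incremented, and only D is reported as flushed, which matches the example in the text.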

FIG. 12 shows the SL1 decrement operation of one embodiment of the present invention. In step 1210, it is determined, according to the format of the stencil data record, whether the tile reference value is at its minimum. If the tile reference value is at its minimum, SL1 flushes the stencil data of the entire tile to SL2 in step 1220. If the tile reference value is not at its minimum, step 1240 checks whether every sub-tile in the tile has the accept state. If the whole tile is in the accept state, the tile reference value is decremented in step 1230, completing the decrement operation. If the tile is not entirely in the accept state, the flow jumps to step 1250 to check for underflow. A block is in the underflow state if any pixel in the block underflows. If no block is in the underflow state, the flow jumps to step 1260 and the sub-tile is decremented. In step 1270, any sub-tile that has an underflowed block is flushed to SL2.

Now consider a decrement operation on the compressed stencil buffer record of the example above. Because sub-tiles A, B and D have no underflowed blocks, the block reference values of all blocks in those sub-tiles are decremented. Sub-tile C cannot be decremented because one of its block reference values underflows, so the stencil values of all pixels in sub-tile C are flushed to the pixel stencil buffer. If all sub-tiles in the tile had the accept state, the tile reference value would instead be changed by the corresponding increment or decrement operation.
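The per-sub-tile part of the decrement can be sketched in the same spirit. Here a block is assumed to underflow when its 3-bit block reference is already 0b000, and each sub-tile is modeled simply as a list of block references; both choices, and the function name, are illustrative assumptions rather than the disclosed record layout.

```python
BLOCK_REF_MIN = 0b000   # lowest 3-bit block reference, per Table 1


def decrement_subtiles(subtiles):
    """Decrement block references per sub-tile (steps 1250-1270).

    subtiles -- dict mapping a sub-tile name to its list of block references
    Returns (updated_subtiles, flushed_subtile_names).
    """
    updated, flushed = {}, []
    for name, blocks in subtiles.items():
        if any(ref <= BLOCK_REF_MIN for ref in blocks):
            flushed.append(name)                 # step 1270: flush to SL2
            updated[name] = list(blocks)
        else:
            updated[name] = [ref - 1 for ref in blocks]   # step 1260
    return updated, flushed


after, flushed_names = decrement_subtiles(
    {"A": [4, 4], "B": [4, 5], "C": [0, 4], "D": [7, 4]}
)
```

Sub-tile C is the only one flushed in this run, while A, B and D are decremented, mirroring the example in the text.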

As described above, when a sub-tile modification flag in SL1 is set, the SL1 data is merged into SL2. The merge operation yields the final distribution of the stencil values held in SL1 and SL2. The merge may be performed in the stencil shadow cone pass or in the specular shading pass. In the stencil shadow cone pass, as shown in FIG. 13, step 1310 determines whether the sub-tile overflows or underflows. In step 1320, if it does, the sub-tile state is changed to "retry". In addition, in step 1330 an SM_Mask is generated for merging the data from SL1 and SL2. The SM_Mask is an additional mask output by SL1 that indicates whether SL1 and SL2 are to be merged. The final value, SL1+SL2, is written to SL2 in step 1340. In step 1350, once the data has been merged into SL2, the SL1 sub-tile modification flag is reset to zero to indicate that the sub-tile is clean, and in step 1360 the sub-tile stencil values are cleared. This dynamic merging reduces the probability that a sub-tile overflows or underflows.
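A minimal sketch of the FIG. 13 sequence is shown here, assuming that SL1 holds one signed offset per pixel, SL2 holds one stencil value per pixel, and the SM_Mask simply mirrors the sub-tile modification flag. These modeling choices and the function name are assumptions; the description does not specify how the mask is derived.

```python
def merge_in_stencil_pass(state, sl1_offsets, sl2_values,
                          modified_flag, out_of_range):
    """Return (state, sl1_offsets, sl2_values, modified_flag) after a merge."""
    if out_of_range:
        state = "retry"                       # steps 1310/1320
    sm_mask = 1 if modified_flag else 0       # step 1330 (assumed rule)
    if sm_mask:
        # Step 1340: the final value SL1 + SL2 is written to SL2.
        sl2_values = [a + b for a, b in zip(sl1_offsets, sl2_values)]
        modified_flag = 0                     # step 1350: sub-tile is clean
        sl1_offsets = [0] * len(sl1_offsets)  # step 1360: clear SL1 values
    return state, sl1_offsets, sl2_values, modified_flag
```

Because the SL1 offsets return to zero after each merge, subsequent increments and decrements start again from the middle of the encodable range, which is one way to read the remark that dynamic merging lowers the chance of overflow and underflow.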

FIG. 14 shows an embodiment of the specular shading pass, in which bits in ZL1 control the trigger mechanism of the SL1 and SL2 merge. The flow starts at step 1410. Step 1420 checks whether the SL1 tile modification flag in ZL1 is set, and step 1430 checks whether the SL1 sub-tile modification flag is set. If both flags are set, the SM_Mask is set in step 1440 to notify ZL2 to merge SL1 and SL2. Then, in step 1450, SL1 and SL2 are merged before the stencil compare, and finally, in step 1460, the sum of SL1 and SL2 is written to SL2.
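The trigger logic of FIG. 14 amounts to a logical AND of the two modification flags followed by a conditional sum. The sketch below assumes per-pixel lists for SL1 and SL2 and uses a hypothetical function name.

```python
def specular_pass_merge(tile_flag, subtile_flag, sl1_offsets, sl2_values):
    """Return (sm_mask, sl2_values) for one sub-tile in the specular pass."""
    # Steps 1420-1440: SM_Mask is set only when both flags are set.
    sm_mask = 1 if (tile_flag and subtile_flag) else 0
    if sm_mask == 0:
        return sm_mask, list(sl2_values)      # no merge is triggered
    # Steps 1450/1460: merge before the stencil compare, write sum to SL2.
    merged = [a + b for a, b in zip(sl1_offsets, sl2_values)]
    return sm_mask, merged
```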

The merging of SL1 and SL2 is indicated by the setting of the SM_Mask bit. FIG. 15 shows an embodiment of the general merge procedure. In step 1510, the value of the SM_Mask is read from SL1. In step 1530, if the value of the SM_Mask is zero, the flow jumps to step 1520 and nothing is done. Conversely, if the value of the SM_Mask is one, the sum of SL1 and SL2 is generated in step 1540, and the final value is written to SL2 in step 1550.

FIG. 16 shows an embodiment of the use of the compressed stencil buffer in the stencil shadow cone operation of the present invention. In step 1610, a tile stencil shadow record is generated for a tile, where the tile is divided into a plurality of sub-tiles and each sub-tile is divided into a plurality of blocks, each containing several pixels. In step 1620, a pixel stencil shadow record is generated alongside the tile stencil shadow record to hold the stencil shadow value of each pixel. The pixel stencil shadow record is needed when the sub-tile stencil data exceeds the capacity of the tile stencil shadow record. In step 1630, a tile depth record is generated, corresponding to the depth data of the pixels in the tile stencil shadow record. In step 1640, the stencil shadow cone operation is performed using the tile stencil shadow record whenever possible. If the tile stencil shadow record cannot accommodate the stencil shadow operation, the operation is performed at the pixel level using the pixel stencil shadow record.

The embodiments provided above highlight many features of the present invention. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention.

Claims

1. A computer graphics device, comprising:

a compressed stencil buffer; and

a compressed stencil buffer cache;

wherein the compressed stencil buffer comprises a compressed stencil shadow cone record for a group of pixels.

2. The device of claim 1, wherein:

the compressed stencil shadow cone record comprises a tile reference stencil value;

the group of pixels comprises a tile;

the tile comprises a plurality of sub-tiles; and

each sub-tile comprises a plurality of blocks.

3. The device of claim 2, wherein:

the compressed stencil shadow cone record further comprises a plurality of block reference values, each corresponding to one of the blocks; and

the compressed stencil shadow cone record further comprises a plurality of pixel delta values, each pixel delta value corresponding to a pixel in the tile.

4. The device of claim 3, wherein:

the compressed stencil shadow cone record further comprises a plurality of sub-tile modification flags, each sub-tile modification flag corresponding to a sub-tile in the tile;

a pixel stencil value comprises:

one of the sub-tile modification flags;

the tile reference stencil value;

one of the block reference values; and

one of the pixel delta values.

5. A graphics system, comprising:

a first stencil buffer for a stencil shadow cone operation on a group of pixels, wherein the group of pixels comprises a tile, and the first stencil buffer is further used for a stencil operation on a pixel that is one of the group of pixels;

a graphics processor for producing a shadow effect, wherein the shadow effect is produced by the stencil shadow cone operation;

logic of the graphics processor for storing a tile stencil record in the first stencil buffer; and

a first stencil buffer cache for communicating with the first stencil buffer.

6. The system of claim 5, further comprising:

a first depth buffer for storing a tile depth record;

a second depth buffer for storing a pixel depth record; and

a second stencil buffer for storing a pixel stencil record, wherein

the second depth buffer is combined with the second stencil buffer, and the pixel stencil record is combined with the pixel depth record.

7. The system of claim 6, further comprising:

a plurality of sub-tiles corresponding to the tile; and

a plurality of blocks corresponding to one of the sub-tiles; wherein

the tile stencil record further comprises:

a tile reference stencil value;

a plurality of block stencil values, each corresponding to one of the blocks;

a plurality of pixel delta values, each corresponding to a pixel in the tile; and

a plurality of sub-tile values, each corresponding to one of the sub-tiles in the tile.

8. The system of claim 7, further comprising dynamic flush logic for selectively flushing the tile stencil record to the second stencil buffer, wherein the tile stencil record is not compressed in the first stencil buffer.

9. the method for a template shadow awl computing is used for the counter drafting system, comprises:

Produce tile template shade record;

Produce template pixel shade record;

Produce a tile depth value record; And

Carry out the computing of template shadow awl, wherein this template shadow awl computing utilizes this tile template shade record, and this template shadow awl computing also utilizes this template pixel shade record.

10. The method of claim 9, wherein:

a plurality of pixels correspond to a block;

a plurality of blocks correspond to a sub-tile;

a plurality of sub-tiles correspond to a tile; and

the tile stencil shadow record comprises the stencil data of each block.

11. The method of claim 10, further comprising:

selectively flushing a tile stencil shadow record to a pixel stencil buffer;

the stencil data of each sub-tile in the tile;

the stencil data of each pixel in the tile; and

a tile reference value.

12. A graphics system, comprising a stencil shadow cone arithmetic unit for computing stencil shadow cones for a plurality of pixel groups, wherein the pixel groups comprise a first group of pixels containing a first number of pixels and a second group of pixels containing a second number of pixels, the first number being greater than the second number.

13. The system of claim 12, further comprising a single-pixel stencil shadow cone arithmetic unit for performing the stencil shadow cone operation on a single pixel, wherein the single pixel is selected from the first group of pixels or the second group of pixels.

14. A shadow generation method for a computer graphics system, comprising:

obtaining a sub-tile for compression, the sub-tile being associated with a compressed depth data buffer;

selectively flushing the sub-tile to a pixel depth data/stencil buffer, the sub-tile not being compressed in a compressed stencil buffer; and

performing a stencil shadow cone operation in a compressed stencil buffer.

15. The method of claim 14, wherein the step of obtaining the sub-tile further comprises detecting a sub-tile state, wherein the sub-tile state is one of "retry", "reject" and "accept".

16. The method of claim 15, wherein:

the flushing step further comprises flushing the sub-tile to the pixel depth data/stencil buffer if the sub-tile state is "reject";

the step of detecting the sub-tile state further comprises:

reading a first mask value, wherein, if the first mask value is one, the sub-tile state is "reject"; and

reading a second mask value, wherein, if the second mask value is one, the sub-tile state is "retry"; and

if the second mask value is zero, the sub-tile state is "accept".

17. The method of claim 16, wherein the obtaining step further comprises determining whether the sub-tile stencil data is in a compressible format.

18. The method of claim 17, wherein the step of flushing the sub-tile comprises:

flushing the sub-tile when the sub-tile state is "reject"; and

flushing the sub-tile when the sub-tile stencil data is not compressible.

19. The method of claim 18, wherein the step of performing the stencil shadow cone operation comprises:

preprocessing the sub-tile stencil data; sending a request to a compressed stencil buffer cache; and storing data from the compressed stencil buffer cache in a first-in first-out compressed stencil buffer.

20. The method of claim 19, wherein the step of performing the stencil shadow cone operation further comprises:

selectively incrementing a compressed stencil record; wherein

if a tile reference value is at a maximum value, the compressed stencil record is not incremented.

21. The method of claim 20, further comprising:

if the state of every sub-tile in the tile is "accept", incrementing the tile reference value;

if the state of any sub-tile in the tile is "reject", checking whether each block in that sub-tile overflows; and

if no block in the sub-tile overflows, incrementing the sub-tile reference value.

22. The method of claim 21, wherein the step of performing the stencil shadow cone operation further comprises selectively decrementing the compressed stencil record, wherein, if a tile reference value is at a minimum value, the compressed stencil record is not decremented.

23. The method of claim 22, further comprising:

if the state of every sub-tile in the tile is "accept", decrementing the tile reference value;

if the state of any sub-tile in the tile is "reject", checking whether each block in that sub-tile underflows; and

if no block in the sub-tile underflows, decrementing the sub-tile reference value.

24. A method of merging compressed stencil data into a pixel stencil buffer during a stencil shadow cone operation in a computer graphics system, the method comprising:

determining whether a sub-tile satisfies a first condition or a second condition, wherein the first condition comprises a sub-tile underflow and the second condition comprises a sub-tile overflow;

when either the first condition or the second condition is satisfied, changing a sub-tile state from "accept" to "retry";

setting a sub-tile merge mask to confirm that the compressed stencil data in a compressed stencil buffer is to be flushed to a pixel stencil buffer;

merging the compressed stencil data into the pixel stencil buffer, thereby producing a result that is the sum of the compressed stencil data and the stencil data of the pixel stencil buffer;

resetting a sub-tile modification flag in the compressed stencil buffer to zero; and

clearing the compressed stencil values in the compressed stencil buffer.

25. The method of claim 24, wherein the merging step comprises:

reading the sub-tile merge mask from the compressed stencil data;

if the sub-tile merge mask is zero, ignoring the compressed stencil data; and

if the sub-tile merge mask is one, merging the compressed stencil data into the pixel stencil buffer.

26. A method of merging compressed stencil data into a pixel stencil buffer during a specular shading operation in a computer graphics system, comprising:

reading a tile modification flag, wherein, if the tile modification flag is zero, no merge operation is performed;

reading a sub-tile modification flag, wherein, if the sub-tile modification flag is zero, no merge operation is performed;

setting a sub-tile merge mask to confirm that the compressed stencil data is to be flushed;

merging the compressed stencil data with the pixel stencil data; and

writing the sum of the compressed stencil data and the pixel stencil data to the pixel stencil buffer.