CN1722175A - Method and apparatus for improved stencil shadow volume operations - Google Patents
- Publication number: CN1722175A
- Authority: CN (China)
- Prior art keywords: tile, template, sub, pixel, compression
- Legal status: Granted (the listed status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/60—Shadow generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Image Generation (AREA)
Abstract
The computer graphics system is configured to improve the performance of a stencil shadow volume method for rendering shadows. The apparatus and methods utilize a combination of compressed and uncompressed stencil buffers in coordination with compressed and uncompressed depth data buffers. An uncompressed stencil buffer is capable of storing stencil shadow volume data for each pixel and a compressed stencil buffer is capable of storing stencil shadow volume data for a group of pixels. The compressed stencil buffer is utilized with a compressed stencil buffer cache to perform a stencil shadow volume operation more efficiently than present methods.
Description
Technical field
The present invention relates to a computer graphics system, and in particular to a method and apparatus that use shadow volumes to produce shadow effects.
Background art
Three-dimensional computer graphics is the generation and display of three-dimensional objects as two-dimensional images on a flat screen. A three-dimensional object may be as simple as a point, a line, a triangle, or a polygon. More complex objects are represented by connected planar polygons; for example, a solid object can be assembled from a series of planar triangles. All geometric primitives can ultimately be described by one vertex or a set of vertices. For example, the coordinates (X, Y, Z) define a vertex, such as the endpoint of a line or a corner of a polygon.
To project a three-dimensional image onto a two-dimensional screen, the vertex values pass through a series of operations, or stages, in a graphics pipeline. A typical graphics pipeline is simply a chain of processing units or stages in which the output of one stage is the input of the next. In a graphics processor, these stages include per-vertex operations, primitive assembly, pixel operations, texture assembly, rasterization, and fragment operations.
In a typical graphics display system, an image database (for example, a command list) stores descriptions of the objects in a scene. The objects are described by a number of small polygons that cover the object surface, in the same way that a number of small tiles can cover a surface. Each polygon is described by a list of vertex coordinates (X, Y, Z in model coordinates), some material surface properties (for example color, texture, and shininess), and possibly the surface normal at each vertex. For three-dimensional objects with complex curved surfaces, the polygons are generally triangles or quadrilaterals, and a quadrilateral can be split into a pair of triangles.
A transformation engine transforms the object coordinates according to the viewing angle selected by user input. The user may also specify the field of view, the image size, and the back end of the viewing volume, which determines whether background is included or removed.
Once the viewing area has been selected, clipping logic eliminates the polygons (triangles) that lie outside the viewing area and clips those that are partly outside and partly inside. The new edges of the clipped polygons lie on the edges of the viewing area. The polygon vertices are then passed to the next stage in coordinates corresponding to the screen (X, Y), and each vertex has an associated depth value (Z coordinate). In a typical system, lighting is then computed with a lighting model, and the polygons with their color values are sent to a rasterizer.
For each polygon, the rasterizer determines which pixels the polygon covers and attempts to write the associated color values and depth (Z value) into the frame buffer. The rasterizer compares the depth of the polygon being processed with the depth of the pixel that may already be in the frame buffer. If the depth value of the new polygon is smaller, it lies in front of the pixel written earlier, so the value at that position in the frame buffer is replaced. This process continues until all polygons have passed through the rasterizer. At that point, a video controller displays the contents of the frame buffer on the display, one scan line at a time.
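The depth comparison described above can be sketched as follows. This is a minimal illustration, not the patented circuit: the function name `write_fragment`, the nested-list buffers, and the "smaller depth is nearer" convention taken from the text are the only assumptions.

```python
INFINITY = float("inf")

def write_fragment(frame, depth, x, y, z, color):
    """Write a fragment only if it is nearer than the stored pixel."""
    if z < depth[y][x]:          # smaller depth means in front
        depth[y][x] = z
        frame[y][x] = color
        return True
    return False                 # farther away: the stored pixel is kept

# A 2 x 2 frame buffer with an empty depth buffer.
depth = [[INFINITY] * 2 for _ in range(2)]
frame = [[None] * 2 for _ in range(2)]

write_fragment(frame, depth, 0, 0, 5.0, "red")
write_fragment(frame, depth, 0, 0, 9.0, "blue")    # behind red, ignored
write_fragment(frame, depth, 0, 0, 2.0, "green")   # in front, replaces red
```

After the three writes, the pixel at (0, 0) holds the color and depth of the nearest fragment.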
Fig. 1 shows the operation of an existing graphics pipeline. The components of a graphics pipeline vary from system to system and can be illustrated in various ways. A host 10 (or a graphics application program interface running on a computer) produces a command stream 12, comprising a series of graphics commands and data, for rendering an "environment" on a graphics display. The components of the graphics pipeline operate on the data and commands in the command stream 12 to render the picture on the display.
A decoder 14 reads the command stream 12, interprets the commands, and passes raw graphics data along the graphics pipeline. The raw graphics data includes positions (X, Y, Z, and W coordinates) as well as lighting and texture information. After the decoder 14 reads the raw graphics data from the command stream 12, the data is sent to a vertex shader 16. The vertex shader 16 performs various transformations on the raw graphics data: the data is transformed from world coordinates into model view coordinates, then into projection coordinates, and finally into screen coordinates. The functions of the vertex shader 16 are known in the art. The graphics data is then sent to a rasterizer 18 for rasterization.
A depth tester 20 then processes each pixel, comparing the depth value already stored for the pixel with the newly input depth value. The depth value represents the depth at the pixel position. If the new depth value is nearer to the viewer, it replaces the stored depth value, and the associated color information in the frame buffer 24 is replaced as well (as processed by the pixel shader 22). Conversely, if the new depth value is farther away, nothing is done. The color information includes whether the pixel is in shadow, which is determined with an existing shadow volume algorithm.
Fig. 2 is a schematic diagram of the shadow volume algorithm. The shadow volume 34 defines the region shadowed by an occluder 32 under a light source 30. If a pixel 38 falls within the shadow volume 34, it is rendered as being in shadow. To determine whether a pixel 38, 39 lies within the shadow volume 34, the algorithm follows a ray from the observer 36 to the pixel and counts the number of entry points 33, where the ray enters the shadow volume, and exit points 37, where the ray leaves it. If the number of entry points 33 equals the number of exit points 37, the pixel is not in shadow. For example, the ray 35 from the observer 36 to pixel 38 has one entry point 33 and no exit point 37, so pixel 38 is in shadow. Likewise, the ray 31 from the observer 36 to pixel 39 has one entry point 33 and one exit point 37, so pixel 39 is not in shadow.
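The counting rule of Fig. 2 can be illustrated with a short sketch. The function name `in_shadow` and the representation of a ray as a list of "enter" and "exit" events are assumptions made only for this example.

```python
def in_shadow(crossings):
    """Return True when entries and exits along the ray do not balance.

    crossings: events met by the ray from the observer to the pixel,
    each either "enter" (entry point 33) or "exit" (exit point 37).
    """
    entries = crossings.count("enter")
    exits = crossings.count("exit")
    return entries != exits

pixel_38 = in_shadow(["enter"])            # one entry, no exit
pixel_39 = in_shadow(["enter", "exit"])    # one entry, one exit
```

With these inputs, pixel 38 is reported as shadowed and pixel 39 as lit, matching the description of Fig. 2.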
Ray tracing is quite time-consuming, especially with many occluders and multiple light sources. The stencil shadow volume algorithm simplifies the computation by using a stencil buffer, also called the second-level stencil buffer or SL2, to perform simple in/out counting operations. The stencil buffer SL2 stores per-pixel data used for various functions, including the stencil shadow volume algorithm. Whether a pixel is in shadow can be determined by performing a depth test (Z-test) on the front faces and back faces of the shadow volume polygons (relative to the viewer or to the farthest plane). For example, the stencil buffer value is incremented if a front face passes the Z-test and decremented if a back face passes the Z-test. If the final stencil value is zero, the pixel is not in shadow.
Fig. 3 is a flowchart of the stencil shadow volume algorithm. Step 40 initializes (clears) the stencil buffer, and step 42 renders the scene with diffuse color. Step 43 supplies data to the color buffer and the depth buffer (also called the Z-buffer). In step 44, updates to the depth buffer and color buffer are turned off, so that the depth values remain in the depth buffer. In step 46, for each light source, a shadow volume is generated for each occluder, and the front-facing polygons of each shadow volume are rendered. Step 47 increments the stencil buffer value of each pixel covered by a rendered front-facing polygon. Step 48 performs the same rendering for the back-facing polygons, and step 49 decrements the stencil buffer value of each pixel covered by a rendered back-facing polygon. These increment and decrement steps are called the stencil shadow volume pass. In step 50, objects with a non-zero stencil value are in shadow and are rendered accordingly. In step 52, objects with a stencil value of zero are not in shadow and are therefore rendered with specular color; this step is also called the specular pass. As shown in Fig. 1, after the pixel shader 22 computes the color information, it is stored in the frame buffer 24.
As shown in Fig. 2, the stencil buffer value of pixel 38 is incremented once, because exactly one shadow volume front face is crossed at entry point 33, and it is never decremented, because no shadow volume back face is crossed. The stencil buffer value of pixel 38 is therefore non-zero. Likewise, for pixel 39 the stencil buffer value is incremented by one at entry point 33 and decremented by one at exit point 37, so the final value is zero and pixel 39 is rendered with specular color. This example contains only a single occluder and a single light source, but the stencil shadow volume algorithm also applies to scenes with multiple occluders and multiple light sources.
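The increment and decrement rule for one pixel can be sketched as follows. This is an illustrative model only: the function name `stencil_count`, the face tuples, and the numeric depths are assumptions, and a face is taken to pass the Z-test when it is nearer than the pixel.

```python
def stencil_count(pixel_depth, faces):
    """Accumulate the stencil value of one pixel.

    faces: (kind, depth) pairs for the shadow volume polygons covering
    the pixel, with kind equal to "front" or "back".
    """
    stencil = 0
    for kind, face_depth in faces:
        if face_depth < pixel_depth:      # face passes the Z-test
            if kind == "front":
                stencil += 1              # step 47: increment
            else:
                stencil -= 1              # step 49: decrement
    return stencil

# Assumed depths: the front face is at 5 and the back face at 15.
volume = [("front", 5.0), ("back", 15.0)]
stencil_38 = stencil_count(10.0, volume)   # pixel inside the volume
stencil_39 = stencil_count(20.0, volume)   # pixel behind the volume
```

A non-zero result marks the pixel as shadowed, and a zero result marks it as lit.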
Fig. 4 shows an existing compressed depth data (Z-data) processing unit, also called ZL1. Using ZL1 to process the depth data of a block or a tile improves system performance. When the depth data of some pixels in a tile exceeds the range of the ZL1 compressed format, those pixels must be handled by another unit, the pixel depth data processing unit (also called ZL2).
ZL1 and ZL2 generally denote the first-level depth buffer and the second-level depth buffer. Algorithms of this kind go by various names, for example Hyper Z and Hierarchical Z Buffer. The two levels store depth information at different granularities: a larger processing unit such as a tile, and the smallest unit, a pixel on the screen. One benefit of ZL1 is that it reduces the complexity of computing depth data in the rendering pipeline.
A tile generator 60 produces tile data for a tile made up of multiple pixels, for example eight by eight, and sends a request to a ZL1 cache 64. The tile data is sent to a compressed depth data processing unit (ZL1) 62, which also communicates with the ZL1 cache 64. Pixels whose depth data cannot be handled by ZL1 62 are processed by the pixel depth data processing unit (ZL2) 66 together with a ZL2 cache 68. In this example ZL1 62 can reject up to 64 pixels per cycle; pixels that are not rejected are marked "accept" or "retry" to reduce the traffic to ZL2 66.
Although ZL1 62 reduces the memory read traffic of ZL2 66, it is not very efficient for stencil operations. During a stencil operation, ZL1 62 marks all pixels "retry" to ensure that no stencil operation is skipped. Rejected pixels may also send stencil operation requests to ZL2 66. As a result, ZL1 62 incurs a large amount of traffic during stencil operations.
This is especially apparent when a ZL1 sub-tile has been accepted or rejected by the depth compare (Z-compare) function. Because the stencil operation must still be performed even if the sub-tile passes the Z-test, ZL1 62 must switch the sub-tile from the accept state to the retry state and send it to ZL2 66. ZL2 66 is combined with the stencil buffer SL2, so each ZL2/SL2 entry is 32 bits, comprising a 24-bit depth value and an 8-bit stencil value. In the accept and reject states, the whole 32-bit depth/stencil entry must be read just to obtain the 8-bit stencil value, which makes memory traffic very inefficient. One solution is to use separate stencil and depth buffers so that the size of the memory requests is minimized. For example, the 8-bit stencil values of eight pixels then require a memory request of only 64 bits, whereas the combined format wastes a great deal of memory traffic for the same eight pixels.
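The traffic comparison above is simple arithmetic, sketched here for eight pixels. The function name `bits_read` is an assumption; the 24-bit and 8-bit widths come from the text.

```python
DEPTH_BITS = 24
STENCIL_BITS = 8

def bits_read(pixels, combined):
    """Bits fetched to obtain the stencil values of `pixels` pixels."""
    per_pixel = DEPTH_BITS + STENCIL_BITS if combined else STENCIL_BITS
    return pixels * per_pixel

combined_bits = bits_read(8, combined=True)     # ZL2/SL2 combined entries
separate_bits = bits_read(8, combined=False)    # separate stencil buffer
useful_fraction = separate_bits / combined_bits
```

Under these widths, only a quarter of the bits read from the combined buffer are stencil data.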
Summary of the invention
One embodiment of the invention provides a computer graphics unit that improves the performance of stencil shadow volume operations. The computer graphics unit comprises a compressed stencil buffer and a compressed stencil buffer cache. The compressed stencil buffer comprises a compressed stencil shadow volume record corresponding to a group of pixels. The record comprises a tile reference stencil value. The group of pixels constitutes a tile, the tile comprises a plurality of sub-tiles, and each sub-tile comprises a plurality of blocks. The record further comprises a plurality of block reference values, one for each block; a plurality of pixel delta values, each corresponding to a pixel in the tile; and a plurality of sub-tile dirty bits, each corresponding to a sub-tile in the tile. A pixel stencil value comprises one of the sub-tile dirty bits, the tile reference stencil value, one of the block reference values, and one of the pixel delta values.
Another embodiment of the invention provides a graphics system comprising a first stencil buffer used for a stencil shadow volume operation on a group of pixels, where the group of pixels constitutes a tile. The first stencil buffer is further used for a stencil operation on a single pixel of the group. The graphics system also comprises a graphics processor that produces a shadow effect by means of the stencil shadow volume operation and that stores a tile stencil record in the first stencil buffer. The graphics system further comprises a first stencil buffer cache coupled to the first stencil buffer; a first depth buffer that stores a tile depth record; a second depth buffer that stores a pixel depth record; and a second stencil buffer that stores a pixel stencil record. The second depth buffer is combined with the second stencil buffer, and the pixel stencil record is combined with the pixel depth record. The tile is made up of a plurality of sub-tiles, and each sub-tile is made up of a plurality of blocks. The tile stencil record further comprises a tile reference stencil value; a plurality of block stencil values, each corresponding to one of the blocks; a plurality of pixel delta values, each corresponding to a pixel in the tile; and a plurality of sub-tile values, each corresponding to one of the sub-tiles in the tile.
Another embodiment of the invention provides a graphics method for performing a stencil shadow volume operation, comprising the following steps. First, a tile stencil shadow record, a pixel stencil shadow record, and a tile delta value are generated. Then the stencil shadow volume operation is performed using the tile stencil shadow record and the pixel stencil shadow record. A plurality of pixels make up a block, a plurality of blocks make up a sub-tile, a plurality of sub-tiles make up a tile, and the tile stencil shadow record contains the stencil data of each block.
Another embodiment of the invention provides a shadow generation method for a computer graphics system, comprising the following steps. First, a sub-tile is determined for compression, the sub-tile being associated with a compressed depth data buffer. Next, the sub-tile is selectively flushed to a pixel depth data/stencil buffer when it cannot be compressed in a compressed stencil buffer. Finally, the stencil shadow volume operation is performed in the compressed stencil buffer. The step of determining the sub-tile further detects a sub-tile state, which is one of "retry", "reject", and "accept".
Another embodiment of the invention provides a graphics method by which a computer graphics system merges compressed stencil data into a pixel stencil buffer during a stencil shadow volume operation, comprising the following steps. First, it is determined whether a sub-tile satisfies a first condition or a second condition, where the first condition is sub-tile underflow and the second condition is sub-tile overflow. When either condition is met, the sub-tile state is changed from "accept" to "retry". A sub-tile merge mask is then set to indicate that the compressed stencil data in a compressed stencil buffer is to be flushed to a pixel stencil buffer. The compressed stencil data is then merged into the pixel stencil buffer, producing a result that is the sum of the compressed stencil data and the stencil data of the pixel stencil buffer. Finally, the sub-tile dirty bit in the compressed stencil buffer is reset to zero and the compressed stencil value in the compressed stencil buffer is cleared.
Description of drawings
Fig. 1 is a block diagram of an existing graphics pipeline;
Fig. 2 is a two-dimensional diagram of an existing shadow volume;
Fig. 3 is a block diagram of an existing stencil shadow volume operation;
Fig. 4 is a block diagram of an existing compressed Z buffer;
Fig. 5 shows a graphics system according to an embodiment of the invention;
Fig. 6 shows a tile format used by an embodiment of the invention;
Fig. 7 shows a compressed stencil buffer data format according to an embodiment of the invention;
Fig. 8 shows ZL1 tile state detection logic according to an embodiment of the invention;
Fig. 9 shows an embodiment of the compressed stencil buffer operation of the invention;
Fig. 10 shows the SL1 preprocessing steps of an embodiment of the invention;
Fig. 11 shows the SL1 increment operation of an embodiment of the invention;
Fig. 12 shows the SL1 decrement operation of an embodiment of the invention;
Fig. 13 shows the merge operation in the stencil shadow volume pass of an embodiment of the invention;
Fig. 14 shows the merge operation in the specular color pass of an embodiment of the invention;
Fig. 15 shows the compressed stencil buffer merge operation of an embodiment of the invention; and
Fig. 16 shows the compressed stencil buffer in a stencil shadow volume operation of an embodiment of the invention.
Explanation of reference numerals
10~host
12~command stream
14~decoder
16~vertex shader
18~rasterizer
20~depth tester
22~pixel shader
24~frame buffer
30~light source
31~ray
32~occluder
33~entry point
34~shadow volume
35~ray
37~exit point
38~pixel
39~pixel
60~tile generator
62~compressed depth data processing unit ZL1
64~ZL1 cache
66~pixel depth data processing unit ZL2
68~ZL2 cache
500~computer graphics unit
510~graphics processor
511~cache
512~cache
514~cache
516~logic circuit
520~memory
530~depth buffer
540~stencil buffer
550~single buffer
560~compressed depth buffer ZL1
562~depth data
570~compressed stencil buffer SL1
572~stencil value
610~tile
620~sub-tile
630~block
640~pixel
700~data record format
710~8-bit reference value
720~3-bit reference value
730~1-bit delta value
740~1-bit dirty bit
Embodiments
Fig. 5 is a basic architecture diagram of an embodiment of the invention. A computer graphics unit 500 comprises a graphics processor 510 and memory 520. The memory 520 may also be system or main memory used together with the graphics processor 510. Specific addresses in memory 520 serve as a depth buffer 530 and a stencil buffer 540. The data structures of the depth buffer 530 and the stencil buffer 540 may also be combined into a single buffer 550. For example, each data record is 32 bits, of which 24 bits are the depth value and 8 bits are the stencil value. The single buffer 550 stores one record for each pixel.
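A record of the single buffer 550 can be sketched as a packed 32-bit word. The text gives only the field widths; placing the depth value in the upper 24 bits is an assumption of this example, as are the function names `pack_record` and `unpack_record`.

```python
def pack_record(depth24, stencil8):
    """Pack a 24-bit depth value and an 8-bit stencil value into 32 bits."""
    if not 0 <= depth24 < (1 << 24):
        raise ValueError("depth must fit in 24 bits")
    if not 0 <= stencil8 < (1 << 8):
        raise ValueError("stencil must fit in 8 bits")
    return (depth24 << 8) | stencil8      # assumed layout: depth in high bits

def unpack_record(word):
    """Split a 32-bit record back into (depth24, stencil8)."""
    return word >> 8, word & 0xFF

word = pack_record(0xABCDEF, 0x03)
depth_value, stencil_value = unpack_record(word)
```

Reading the stencil field alone still requires fetching the whole 32-bit word, which is the inefficiency the compressed stencil buffer is meant to avoid.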
Other specific addresses in memory 520 serve as the compressed depth buffer ZL1 560, which stores the depth data (Z-data) 562 of a group of pixels. The group of pixels may be a tile, a sub-tile, or multiple tiles. Memory 520 also contains the compressed stencil buffer SL1 570, which stores the stencil values 572 of the pixels of a tile. A tile may be 8 by 8 pixels, 8 by 16 pixels, or another size, depending on the performance required.
The graphics processor 510 contains a cache 512 for the compressed stencil buffer SL1 570 and a cache 511 for the compressed depth buffer ZL1 560, which store records of SL1 570 and ZL1 560 respectively. The graphics processor 510 also contains a cache 514 that stores records of the single buffer 550. The caches 512, 511, and 514 are also called the SL1, ZL1, and ZL2/SL2 caches. The graphics processor 510 further contains a logic circuit 516 that controls the compressed depth buffer ZL1 560, the compressed stencil buffer SL1 570, the depth buffer 530, and the stencil buffer 540 during stencil shadow volume operations. The logic circuit 516 can also compress depth data and stencil shadow data, and it can produce uncompressed stencil shadow data 542. In addition, the logic circuit 516 selectively merges the stencil values 572 associated with the compressed stencil buffer SL1 570 with the uncompressed stencil shadow data 542 of the stencil buffer 540.
Fig. 6 shows an embodiment of a tile format. The tile 610 contains 64 pixels 640 arranged 8 by 8. The tile 610 is divided into four sub-tiles 620 of 8 by 2 pixels each. The tile 610 can be further divided into 16 blocks 630 of 2 by 2 pixels each.
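The subdivision of Fig. 6 can be sketched as an index mapping. The text does not say how sub-tiles and blocks are numbered, so the horizontal-strip orientation of the 8-by-2 sub-tiles, the row-major numbering, and the function name `locate` are all assumptions of this example.

```python
TILE_W, TILE_H = 8, 8    # tile 610: 8 x 8 pixels
SUB_W, SUB_H = 8, 2      # sub-tile 620: 8 x 2 pixels (assumed 8 wide, 2 tall)
BLK_W, BLK_H = 2, 2      # block 630: 2 x 2 pixels

def locate(x, y):
    """Return (sub_tile, block) indices for pixel (x, y) inside one tile."""
    if not (0 <= x < TILE_W and 0 <= y < TILE_H):
        raise ValueError("pixel outside tile")
    sub_tile = y // SUB_H
    block = (y // BLK_H) * (TILE_W // BLK_W) + x // BLK_W
    return sub_tile, block

# Enumerate every pixel of the tile once.
pixels = [(x, y) for y in range(TILE_H) for x in range(TILE_W)]
sub_tiles = {locate(x, y)[0] for x, y in pixels}
blocks = {locate(x, y)[1] for x, y in pixels}
```

With this numbering each sub-tile holds four blocks and each block holds four pixels.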
Fig. 7 shows an embodiment of the data record format of the compressed stencil buffer SL1 570. The stencil values 572 in the compressed stencil buffer SL1 570 comprise a record for each tile 610, corresponding to a tile in the compressed depth buffer ZL1 560. The data record format 700 represents an 8-by-8 tile 610 that is divided into four 8-by-2 sub-tiles 620 and further into sixteen 2-by-2 blocks 630. The data record format 700 comprises one 8-bit reference value 710 for the tile, a 3-bit reference value 720 for each of the 16 blocks 630, a 1-bit delta value 730 for each of the 64 pixels, and a 1-bit dirty bit 740 for each sub-tile.
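The field widths of record format 700 can be added up in a small sketch. The class name `SL1Record`, the list-based fields, and the default values are assumptions; only the widths and counts come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class SL1Record:
    tile_ref: int = 0                                                    # 8-bit value 710
    block_refs: list = field(default_factory=lambda: [4] * 16)    # 3 bits each, 720
    pixel_deltas: list = field(default_factory=lambda: [0] * 64)  # 1 bit each, 730
    dirty_bits: list = field(default_factory=lambda: [0] * 4)     # 1 bit each, 740

    def size_in_bits(self):
        """Total storage of one tile record under the widths in the text."""
        return (8 + 3 * len(self.block_refs)
                + len(self.pixel_deltas) + len(self.dirty_bits))

record = SL1Record()
compressed_bits = record.size_in_bits()
uncompressed_bits = 64 * 8     # 64 pixels with an 8-bit stencil value each
```

By this count a tile record takes 124 bits, compared with 512 bits for 64 separate 8-bit stencil values.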
The data of each block is represented by a 4-bit nibble and a 3-bit carry. The 4 bits are the pixel delta values of the four pixels in the block, one bit per pixel. The 3-bit carry is the reference value of the block. This format relies on the statistical observation that, for a large proportion of pixels, the stencil values of adjacent pixels differ by at most one. Although the stencil values of adjacent pixels cannot differ by more than 1 in SL1, the following encoding gives a pixel a dynamic range of -4 to +4.
Table 1
Block reference value | Pixel delta value = 0 | Pixel delta value = 1 |
000 | -4 | -3 |
001 | -3 | -2 |
010 | -2 | -1 |
011 | -1 | 0 |
100 | 0 | +1 |
101 | +1 | +2 |
110 | +2 | +3 |
111 | +3 | +4 |
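Table 1 follows a simple rule: the offset equals the block reference value minus 4, plus the pixel delta value. The sketch below reproduces the table from that rule. Adding the offset to the tile reference value to obtain the pixel stencil value is an assumption based on the record fields listed above, and the function names are hypothetical.

```python
def block_offset(block_ref, delta):
    """Offset encoded by a 3-bit block reference and a 1-bit pixel delta."""
    if not 0 <= block_ref <= 7:
        raise ValueError("block reference is 3 bits")
    if delta not in (0, 1):
        raise ValueError("pixel delta is 1 bit")
    return block_ref - 4 + delta

def pixel_stencil(tile_ref, block_ref, delta):
    """Assumed reconstruction: tile reference plus the encoded offset."""
    return tile_ref + block_offset(block_ref, delta)

# Rebuild Table 1 as (reference, offset for delta 0, offset for delta 1).
table = [(ref, block_offset(ref, 0), block_offset(ref, 1)) for ref in range(8)]
example = pixel_stencil(10, 0b101, 1)
```

The first and last rows of the rebuilt table are (0, -4, -3) and (7, 3, 4), which agree with Table 1.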
Fig. 8 shows an embodiment of logic for detecting the sub-tile state in ZL1. Step 800 checks the D_Mask bit of the sub-tile. D_Mask is a bit in the ZL1 record that indicates whether the sub-tile needs to be rendered. In step 810, if D_Mask is zero, the flow goes to step 860 and the sub-tile state is set to reject. Otherwise the flow goes to step 820, which checks the T_Mask value of the sub-tile. In step 830, if T_Mask is 0, the flow goes to step 850 and the sub-tile state is set to accept. If T_Mask is 1, the flow goes to step 840 and the sub-tile state is set to retry. These states are used to decide whether the sub-tile is suitable for SL1 processing.
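The decision logic of Fig. 8 amounts to two tests, sketched below. The function name `subtile_state` and the string return values are assumptions; D_Mask and T_Mask are taken from the text.

```python
def subtile_state(d_mask, t_mask):
    """Classify a sub-tile from its ZL1 mask bits (steps 800 to 860)."""
    if d_mask == 0:
        return "reject"      # step 860: sub-tile need not be rendered
    if t_mask == 0:
        return "accept"      # step 850
    return "retry"           # step 840

states = {(d, t): subtile_state(d, t) for d in (0, 1) for t in (0, 1)}
```

Note that T_Mask is only consulted when D_Mask is set, so both D_Mask-zero cases yield the reject state.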
Fig. 9 shows one embodiment of the invention. The compressed stencil buffer SL1 in a stencil shadow volume operation can be implemented in many different ways, and the spirit of the invention is not limited to this one. In step 912, after the state of the sub-tile depth data has been determined to be "retry", "accept", or "reject", it is determined whether the sub-tile should be processed by SL1. In step 914, if the sub-tile is in the retry state, it is not suitable for SL1 processing, and the flow goes to 930, where SL2 processes it at pixel or block level. Otherwise, if the state is "reject" or "accept", the sub-tile is judged suitable for SL1 processing and the flow goes to step 916, which determines whether the sub-tile information can be compressed. The criterion is whether SL1 is able to hold the sub-tile data. If the data cannot be compressed, the flow goes to step 918 and the sub-tile stencil data is flushed to SL2. If the sub-tile data can be compressed into SL1 according to the SL1 data record format, the stencil operation on the sub-tile is performed in SL1 in step 941.
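The routing decision of Fig. 9 can be sketched as a small function. The name `route_subtile`, the boolean `fits_in_sl1`, and the returned labels are assumptions that stand in for the hardware paths described above.

```python
def route_subtile(state, fits_in_sl1):
    """Choose where the stencil operation for a sub-tile is carried out."""
    if state not in ("retry", "accept", "reject"):
        raise ValueError("unknown sub-tile state")
    if state == "retry":
        return "SL2"                 # step 914 to 930: pixel or block level
    if not fits_in_sl1:
        return "flush_to_SL2"        # step 916 to 918: cannot be compressed
    return "SL1"                     # stencil operation performed in SL1

routes = [
    route_subtile("retry", True),
    route_subtile("accept", False),
    route_subtile("accept", True),
    route_subtile("reject", True),
]
```

Only accepted or rejected sub-tiles whose data fits the SL1 record format stay in the compressed path.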
In step 940, when the stencil operation on the sub-tile is performed in SL1, an SL1 processor 920 sends an SL1 request to the SL1 cache 922 and places the cache information for the sub-tile stencil record into the SL1 FIFO 924. The SL1 arithmetic unit 926 performs the increment and decrement operations of the stencil shadow volume algorithm and merges compressed data into SL2 930. In addition, the SL1 arithmetic unit 926 checks the stencil data record for overflow and underflow to avoid data errors.
Fig. 10 shows an embodiment of SL1 preprocessing. In step 1010, the sub-tiles in ZL1 are known to be in the accept or reject state and to require an SL1 record. In step 1020, a cache hit test is performed on the SL1 cache, and the SL1 entry is placed in a FIFO to compensate for memory access latency. In step 1030, if the cache hit test results in a miss, an SL1 memory request is issued in step 1040. In step 1050, the SL1 entry enters the FIFO.
Fig. 11 shows an embodiment of the increment operation in SL1. Step 1110 determines, according to the format of the stencil data record, whether the tile reference value is at its maximum. If it is, in step 1120 SL1 flushes the stencil data of the whole tile to SL2 so that the stencil operation is performed at pixel level. If it is not, step 1140 determines whether every sub-tile in the tile is in the "accept" state. If every sub-tile is in the accept state, step 1130 increments the tile reference value and the increment operation is complete. If not all sub-tiles are in the accept state, step 1150 checks the overflow state of each block. A block is in the overflow state if any pixel in it is in the overflow state. In step 1160, a sub-tile is incremented if no pixel in any of its blocks overflows. In step 1170, the stencil data of any sub-tile with an overflow state is flushed to SL2 for the stencil operation. Operations at pixel level may be carried out, for example, on a block or on another logical group of pixels.
Consider the increment operation on a compressed stencil buffer record whose tile reference value lies between the minimum and the maximum and whose tile is divided into four sub-tiles A, B, C, and D. Suppose sub-tile C is not in the accept state because at least one of its blocks (among the 16 blocks of the tile) is in the underflow state, and no other block in the tile is in the underflow state. Suppose also that sub-tile D is not in the accept state because at least one of its blocks is in the overflow state, and no other block in the tile is in the overflow state. Because sub-tiles A, B, and C have no overflowing block, the block reference values of their blocks are incremented. Because sub-tile D cannot be incremented, the stencil values of all pixels in sub-tile D are flushed to the pixel stencil buffer.
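The increment flow of Fig. 11 and the example with sub-tiles A to D can be sketched as follows. This is a simplified model under stated assumptions: a block is treated as overflowing when its 3-bit reference value is already 7, the 8-bit tile reference is assumed to have a maximum of 255, and the dictionary layout and the name `sl1_increment` are hypothetical.

```python
TILE_REF_MAX = 255    # assumed maximum of the 8-bit tile reference value
BLOCK_REF_MAX = 7     # maximum of the 3-bit block reference value

def sl1_increment(tile_ref, subtiles):
    """Return (tile_ref, flushed), where flushed lists sub-tiles sent to SL2."""
    if tile_ref == TILE_REF_MAX:                       # steps 1110 and 1120
        return tile_ref, list(range(len(subtiles)))
    if all(s["accept"] for s in subtiles):             # steps 1140 and 1130
        return tile_ref + 1, []
    flushed = []
    for index, s in enumerate(subtiles):               # step 1150
        if max(s["block_refs"]) == BLOCK_REF_MAX:
            flushed.append(index)                      # step 1170
        else:
            s["block_refs"] = [r + 1 for r in s["block_refs"]]   # step 1160
    return tile_ref, flushed

tile = [
    {"accept": True, "block_refs": [4, 4, 4, 4]},      # sub-tile A
    {"accept": True, "block_refs": [4, 5, 4, 4]},      # sub-tile B
    {"accept": False, "block_refs": [0, 4, 4, 4]},     # sub-tile C (underflow)
    {"accept": False, "block_refs": [7, 4, 4, 4]},     # sub-tile D (overflow)
]
new_ref, flushed = sl1_increment(100, tile)
```

In this run the tile reference value is unchanged, sub-tiles A, B, and C have their block reference values incremented, and only sub-tile D is flushed.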
Fig. 12 shows the SL1 decrement operation of an embodiment of the invention. Step 1210 determines, according to the format of the stencil data record, whether the tile reference value is at its minimum. If it is, in step 1220 SL1 flushes the stencil data of the whole tile to SL2. If it is not, step 1240 checks whether every sub-tile in the tile is in the accept state. If the tile is entirely in the accept state, step 1230 decrements the tile reference value and the decrement operation is complete. If the tile is not entirely in the accept state, the flow goes to step 1250, which checks for underflow. A block is in the underflow state if any pixel in it is in the underflow state. If a sub-tile has no block in the underflow state, the flow goes to step 1260 and the sub-tile is decremented. In step 1270, any sub-tile with a block in the underflow state is flushed to SL2.
Consider a decrement operation on the compressed stencil buffer record of the preceding example. Because sub-tiles A, B and D contain no underflowing block, the block reference values of all blocks in those sub-tiles are decremented. Sub-tile C cannot be decremented because one of its block reference values underflows, so the stencil values of all pixels in sub-tile C are flushed to the pixel stencil buffer. If every sub-tile in the tile has the accept state, the tile reference value itself is changed by the corresponding increment or decrement operation.
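The flows of Figures 11 and 12 mirror each other, so both can be illustrated with one routine. The following Python sketch is not the patented hardware logic: the names `SubTile`, `Tile`, `update_tile` and `example_tile`, the 0 to 255 reference range, and the per-block flag lists are all assumptions made for illustration. It reuses the example of sub-tiles A to D, in which one block of C underflows and one block of D overflows.

```python
from dataclasses import dataclass

REF_MIN, REF_MAX = 0, 255   # assumed limits; the text does not state a bit width
BLOCKS_PER_SUBTILE = 16     # taken from the worked example in the text


@dataclass
class SubTile:
    accept: bool            # True when the sub-tile state is "accept"
    overflow: list          # per-block flag: some pixel in the block overflows
    underflow: list         # per-block flag: some pixel in the block underflows
    block_refs: list        # per-block reference values
    flushed: bool = False   # True once the sub-tile has been sent to SL2


@dataclass
class Tile:
    ref: int
    subtiles: dict
    flushed: bool = False


def update_tile(tile, delta):
    """delta=+1 follows Figure 11 (increment); delta=-1 follows Figure 12."""
    limit = REF_MAX if delta > 0 else REF_MIN
    if tile.ref == limit:                                   # steps 1110 / 1210
        tile.flushed = True                                 # steps 1120 / 1220
        return "tile flushed to SL2"
    if all(s.accept for s in tile.subtiles.values()):       # steps 1140 / 1240
        tile.ref += delta                                   # steps 1130 / 1230
        return "tile reference updated"
    for s in tile.subtiles.values():                        # steps 1150 / 1250
        flags = s.overflow if delta > 0 else s.underflow
        if any(flags):
            s.flushed = True                                # steps 1170 / 1270
        else:
            s.block_refs = [r + delta for r in s.block_refs]  # steps 1160 / 1260
    return "sub-tiles handled individually"


def example_tile():
    """Sub-tiles A to D from the text: C has an underflowing block, D an overflowing one."""
    def flags(bad):
        return [bad] + [False] * (BLOCKS_PER_SUBTILE - 1)

    def sub(accept=True, over=False, under=False):
        return SubTile(accept, flags(over), flags(under), [1] * BLOCKS_PER_SUBTILE)

    return Tile(ref=10, subtiles={
        "A": sub(), "B": sub(),
        "C": sub(accept=False, under=True),
        "D": sub(accept=False, over=True),
    })
```

Under these assumptions, `update_tile(example_tile(), +1)` flushes only D and increments the block references of A, B and C, while `update_tile(example_tile(), -1)` flushes only C and decrements A, B and D.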
As mentioned above, a sub-tile modification flag bit is set in SL1, and the SL1 data is merged into SL2. This merge operation yields the final distribution of the stencil values held in SL1 and SL2. The merge can be performed in the stencil shadow volume pass or in the specular color pass. In the stencil shadow volume pass, as shown in Figure 13, step 1310 determines whether the sub-tile overflows or underflows. In step 1320, if the result is true, the sub-tile state is changed to retry. In addition, in step 1330, an SM_Mask is generated for merging the data from SL1 and SL2. The SM_Mask is an additional mask output by SL1 to indicate whether SL1 and SL2 are to be merged. The final value, SL1+SL2, is written to SL2 in step 1340. In step 1350, if the data has been merged into SL2, the SL1 tile modification flag bit is reset to zero to indicate that the sub-tile is clean, and in step 1360 the sub-tile stencil values are cleared. This dynamic merging reduces the probability that any sub-tile overflows or underflows.
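The Figure 13 sequence can be sketched as follows. This is only an illustration: the dictionary layout, the function name `merge_in_shadow_pass`, and the choice to raise SM_Mask exactly when the step 1310 test is true are assumptions, since the text does not spell out the mask's trigger condition. The entry `values` stands for the per-pixel stencil contribution currently held in SL1.

```python
def merge_in_shadow_pass(subtile, sl2_pixels):
    """Follow steps 1310 to 1360 for one sub-tile (hypothetical data layout).

    subtile:    dict with keys 'state', 'modified', 'values', 'overflow', 'underflow'
    sl2_pixels: per-pixel stencil values held in SL2 for the same sub-tile
    """
    sm_mask = 0
    if subtile["overflow"] or subtile["underflow"]:        # step 1310
        subtile["state"] = "retry"                         # step 1320
        sm_mask = 1                                        # step 1330 (assumed trigger)
    if sm_mask:
        for i, value in enumerate(subtile["values"]):      # step 1340: SL2 <- SL1 + SL2
            sl2_pixels[i] += value
        subtile["modified"] = 0                            # step 1350: mark as clean
        subtile["values"] = [0] * len(subtile["values"])   # step 1360: clear SL1 values
    return sm_mask
```

After a merge, SL1 starts again from zero for that sub-tile, which is how the sketch reflects the reduced chance of a later overflow or underflow.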
Figure 14 shows an embodiment of the specular color pass, in which bits in ZL1 control the trigger mechanism for merging SL1 and SL2. The flow begins at step 1410. Step 1420 checks whether the SL1 tile modification flag bit in ZL1 is set, and step 1430 checks whether the SL1 tile modification flag bit is set. If both checks succeed, SM_Mask is set in step 1440 to notify ZL2 to merge SL1 and SL2. Then, in step 1450, SL1 and SL2 are merged before the stencil comparison, and finally, in step 1460, the sum of SL1 and SL2 is written to SL2.
The merging of SL1 and SL2 is indicated by the setting of the SM_Mask bit. Figure 15 shows an embodiment of the general merge procedure. In step 1510, the value of SM_Mask is read from SL1. In step 1530, if the value of SM_Mask is zero, the flow goes to step 1520 and nothing happens. Otherwise, if the value of SM_Mask is one, the sum of SL1 and SL2 is produced in step 1540, and the final value is written to SL2 in step 1550.
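The gating role of SM_Mask in Figures 14 and 15 can be sketched as two small functions. The names `specular_pass_mask` and `general_merge` and the list-based representation of SL1 and SL2 are hypothetical, chosen only to make the control flow concrete.

```python
def specular_pass_mask(flag_step_1420, flag_step_1430):
    """Figure 14: SM_Mask is set only when both modification-flag checks pass."""
    return 1 if (flag_step_1420 and flag_step_1430) else 0


def general_merge(sm_mask, sl1_values, sl2_values):
    """Figure 15: return the per-pixel values SL2 holds after the merge step."""
    if sm_mask == 0:                      # step 1530 -> step 1520: nothing happens
        return list(sl2_values)
    # step 1540: sum SL1 and SL2; step 1550: the sum becomes the new SL2 contents
    return [a + b for a, b in zip(sl1_values, sl2_values)]
```

For instance, `general_merge(specular_pass_mask(1, 1), [1, -1], [4, 5])` returns `[5, 4]`, whereas a zero mask leaves `[4, 5]` unchanged.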
Figure 16 shows an embodiment of the compressed stencil buffer method for the stencil shadow volume operation of the invention. In step 1610, a tile stencil shadow record is generated for a tile, where the tile is divided into a plurality of sub-tiles, each sub-tile is divided into a plurality of blocks, and each block contains several pixels. In step 1620, a pixel stencil shadow record is generated alongside the tile stencil shadow record to hold the stencil shadow value of each pixel. The pixel stencil shadow record is needed when the stencil data of a sub-tile exceeds the capacity of the tile stencil shadow record. In step 1630, a tile depth record is generated, corresponding to the depth data of the pixels covered by the tile stencil shadow record. In step 1640, the stencil shadow volume operation is performed with the tile stencil shadow record wherever possible. If the tile stencil shadow record cannot accommodate the operation, the operation is performed at the pixel level using the pixel stencil shadow record.
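A minimal sketch of the two record types may help make the hierarchy concrete. The field names below follow the tile reference value, block reference values and pixel-level differences recited in the claims, but the additive decoding (tile reference + block reference + pixel difference), the dictionary keys, and the names `TileStencilRecord`, `decode_pixel` and `flush_subtile` are assumptions for illustration rather than the disclosed encoding.

```python
from dataclasses import dataclass


@dataclass
class TileStencilRecord:
    tile_ref: int        # one reference value for the whole tile
    block_refs: dict     # (sub-tile, block) -> block reference value
    pixel_deltas: dict   # (sub-tile, block, pixel) -> signed pixel-level difference


def decode_pixel(rec, subtile, block, pixel):
    """Assumed additive decoding of one pixel's stencil value."""
    return (rec.tile_ref
            + rec.block_refs[(subtile, block)]
            + rec.pixel_deltas[(subtile, block, pixel)])


def flush_subtile(rec, pixel_record, subtile):
    """Add every pixel of one sub-tile into the per-pixel record.

    The values are summed into the per-pixel record, mirroring the
    SL1 + SL2 summation described for the merge operation.
    """
    for (st, blk, px) in rec.pixel_deltas:
        if st == subtile:
            key = (st, blk, px)
            pixel_record[key] = pixel_record.get(key, 0) + decode_pixel(rec, st, blk, px)
    return pixel_record
```

The per-pixel record is the fallback of step 1640: once a sub-tile has been flushed into it, further stencil operations on that sub-tile can proceed pixel by pixel.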
The embodiments provided above highlight many features of the invention. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention.
Claims (26)
1. A computer graphics apparatus, comprising:
a compressed stencil buffer; and
a compressed stencil buffer cache;
wherein the compressed stencil buffer comprises a compressed stencil shadow volume record for a group of pixels.
2. The apparatus as claimed in claim 1, wherein:
the compressed stencil shadow volume record comprises a tile reference stencil value;
the group of pixels comprises a tile;
the tile comprises a plurality of sub-tiles; and
each sub-tile comprises a plurality of blocks.
3. The apparatus as claimed in claim 2, wherein:
the compressed stencil shadow volume record further comprises a plurality of block reference values, each corresponding to one of the blocks; and
the compressed stencil shadow volume record further comprises a plurality of pixel-level differences, each corresponding to a pixel in the tile.
4. The apparatus as claimed in claim 3, wherein:
the compressed stencil shadow volume record further comprises a plurality of sub-tile modification flag bits, each corresponding to a sub-tile in the tile; and
a pixel stencil value comprises:
one of the sub-tile modification flag bits;
the tile reference stencil value;
one of the block reference values; and
one of the pixel-level differences.
5. A graphics system, comprising:
a first stencil buffer for a stencil shadow volume operation on a group of pixels, wherein the group of pixels comprises a tile, the first stencil buffer further being used for a stencil operation on a pixel, the pixel being one of the group of pixels;
a graphics processor for producing a shadow effect, wherein the shadow effect is produced by the stencil shadow volume operation;
logic of the graphics processor for storing a tile stencil record in the first stencil buffer; and
a first stencil buffer cache for communicating with the first stencil buffer.
6. The system as claimed in claim 5, further comprising:
a first depth buffer for storing a tile depth record;
a second depth buffer for storing a pixel depth record; and
a second stencil buffer for storing a pixel stencil record, wherein
the second depth buffer is combined with the second stencil buffer, and the pixel stencil record is combined with the pixel depth record.
7. The system as claimed in claim 6, further comprising:
a plurality of sub-tiles corresponding to the tile; and
a plurality of blocks corresponding to one of the sub-tiles; wherein
the tile stencil record further comprises:
a tile reference stencil value;
a plurality of block stencil values, each corresponding to one of the blocks;
a plurality of pixel-level differences, each corresponding to a pixel in the tile; and
a plurality of sub-tile values, each corresponding to one of the sub-tiles in the tile.
8. The system as claimed in claim 7, further comprising dynamic flush logic for selectively flushing the tile stencil record to the second stencil buffer, wherein the tile stencil record is not compressed in the first stencil buffer.
9. A method of performing a stencil shadow volume operation in a computer graphics system, comprising:
generating a tile stencil shadow record;
generating a pixel stencil shadow record;
generating a tile depth value record; and
performing a stencil shadow volume operation, wherein the stencil shadow volume operation uses the tile stencil shadow record and also uses the pixel stencil shadow record.
10. The method as claimed in claim 9, wherein:
a plurality of pixels correspond to a block;
a plurality of blocks correspond to a sub-tile;
a plurality of sub-tiles correspond to a tile; and
the tile stencil shadow record comprises the stencil data of each block.
11. The method as claimed in claim 10, further comprising:
selectively flushing a tile stencil shadow record to a pixel stencil buffer;
the stencil data of each sub-tile in the tile;
the stencil data of each pixel in the tile; and
a tile reference value.
12. A graphics system, comprising a stencil shadow volume operation unit for computing the stencil shadow volume of a plurality of pixel groups, wherein the pixel groups comprise a first group of pixels containing a first number of pixels and a second group of pixels containing a second number of pixels, the first number being greater than the second number.
13. The system as claimed in claim 12, further comprising a single-pixel stencil shadow volume operation unit for performing the stencil shadow volume operation on a single pixel, wherein the single pixel is selected from the first group of pixels or the second group of pixels.
14. A shadow generation method for a computer graphics system, comprising:
evaluating a sub-tile for compression, the sub-tile being associated with a compressed depth data buffer;
selectively flushing the sub-tile to a pixel depth data/stencil buffer, the sub-tile not being compressed in a compressed stencil buffer; and
performing a stencil shadow volume operation in a compressed stencil buffer.
15. The method as claimed in claim 14, wherein the step of evaluating the sub-tile further comprises detecting a sub-tile state, wherein the sub-tile state is one of "retry", "reject" and "accept".
16. The method as claimed in claim 15, wherein:
the flushing step further comprises flushing the sub-tile to the pixel depth data/stencil buffer if the sub-tile state is "reject";
the step of detecting the sub-tile state further comprises:
reading a first mask value, wherein, if the first mask value is one, the sub-tile state is "reject"; and
reading a second mask value, wherein, if the second mask value is one, the sub-tile state is "retry"; and
if the second mask value is zero, the sub-tile state is "accept".
17. The method as claimed in claim 16, wherein the evaluating step further comprises determining whether the sub-tile stencil data is in a compressible format.
18. The method as claimed in claim 17, wherein the step of flushing the sub-tile comprises:
flushing the sub-tile when the sub-tile state is "reject"; and
flushing the sub-tile when the sub-tile stencil data is not compressible.
19. The method as claimed in claim 18, wherein the step of performing the stencil shadow volume operation comprises:
pre-processing the sub-tile stencil data; sending a request to a compressed stencil buffer cache; and storing data from the compressed stencil buffer cache in a first-in-first-out compressed stencil buffer.
20. The method as claimed in claim 19, wherein the step of performing the stencil shadow volume operation further comprises:
selectively incrementing a compressed stencil record; wherein
if a tile reference value is at a maximum value, the compressed stencil record is not incremented.
21. The method as claimed in claim 20, further comprising:
if every sub-tile state in a tile is "accept", incrementing the tile reference value;
if any sub-tile state in the tile is "reject", checking whether each block in that sub-tile overflows; and
if no block in the sub-tile overflows, incrementing the sub-tile reference value.
22. The method as claimed in claim 21, wherein the step of performing the stencil shadow volume operation further comprises selectively decrementing the compressed stencil record; wherein, if a tile reference value is at a minimum value, the compressed stencil record is not decremented.
23. The method as claimed in claim 22, further comprising:
if every sub-tile state in a tile is "accept", decrementing the tile reference value;
if any sub-tile state in the tile is "reject", checking whether each block in that sub-tile underflows; and
if no block in the sub-tile underflows, decrementing the sub-tile reference value.
24. A method of merging compressed stencil data into a pixel stencil buffer during a stencil shadow volume operation in a computer graphics system, the method comprising:
determining whether a sub-tile satisfies a first condition or a second condition, wherein the first condition comprises sub-tile underflow and the second condition comprises sub-tile overflow;
when either the first condition or the second condition is satisfied, changing a sub-tile state from "accept" to "retry";
setting a sub-tile merge mask to confirm that the compressed stencil data in a compressed stencil buffer is to be flushed to a pixel stencil buffer;
merging the compressed stencil data into the pixel stencil buffer, thereby producing a result that is the sum of the compressed stencil data and the stencil data of the pixel stencil buffer;
resetting a sub-tile modification flag bit in the compressed stencil buffer to zero; and
clearing the compressed stencil values in the compressed stencil buffer.
25. The method as claimed in claim 24, wherein the merging step comprises:
reading the sub-tile merge mask from the compressed stencil data;
if the sub-tile merge mask is zero, ignoring the compressed stencil data; and
if the sub-tile merge mask is one, merging the compressed stencil data into the pixel stencil buffer.
26. A method of merging compressed stencil data into a pixel stencil buffer during a specular color operation in a computer graphics system, comprising:
reading a tile modification flag bit, wherein no merge operation is performed if the tile modification flag bit is zero;
reading a sub-tile modification flag bit, wherein no merge operation is performed if the sub-tile modification flag bit is zero;
setting a sub-tile merge mask to confirm that the compressed stencil data is to be flushed;
merging the compressed stencil data with the pixel stencil data; and
writing the sum of the compressed stencil data and the pixel stencil data to the pixel stencil buffer.
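The sub-tile state detection recited in claim 16 amounts to a small truth table over two mask values, which can be sketched as below. The function name `subtile_state` is hypothetical, and giving the first mask priority over the second is an assumption based on the order in which the claim reads them.

```python
def subtile_state(first_mask, second_mask):
    """Map the two mask values of claim 16 to a sub-tile state."""
    if first_mask == 1:
        return "reject"   # a "reject" sub-tile is flushed to the pixel buffer
    if second_mask == 1:
        return "retry"
    return "accept"       # second mask is zero
```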
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/924,068 US7277098B2 (en) | 2004-08-23 | 2004-08-23 | Apparatus and method of an improved stencil shadow volume operation |
US10/924,068 | 2004-08-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1722175A (en) | 2006-01-18 |
CN100354891C (en) | 2007-12-12 |
Family
ID=35909196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100921322A Active CN100354891C (en) | 2004-08-23 | 2005-08-19 | Method and apparatus for operating improved stencil shadow awl |
Country Status (3)
Country | Link |
---|---|
US (1) | US7277098B2 (en) |
CN (1) | CN100354891C (en) |
TW (1) | TWI307054B (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4218840B2 (en) * | 2005-05-27 | 2009-02-04 | 株式会社ソニー・コンピュータエンタテインメント | Drawing processing apparatus and drawing processing method |
US20080007563A1 (en) * | 2006-07-10 | 2008-01-10 | Microsoft Corporation | Pixel history for a graphics application |
US8427495B1 (en) | 2006-11-01 | 2013-04-23 | Nvidia Corporation | Coalescing to avoid read-modify-write during compressed data operations |
US9058792B1 (en) * | 2006-11-01 | 2015-06-16 | Nvidia Corporation | Coalescing to avoid read-modify-write during compressed data operations |
JP4902748B2 (en) * | 2006-12-08 | 2012-03-21 | メンタル イメージズ ゲーエムベーハー | Computer graphic shadow volume with hierarchical occlusion culling |
US8189237B2 (en) * | 2006-12-19 | 2012-05-29 | Xerox Corporation | Distributing a SRE codes in halftone pixels pattern in supercell |
US10115221B2 (en) * | 2007-05-01 | 2018-10-30 | Advanced Micro Devices, Inc. | Stencil compression operations |
US8184117B2 (en) * | 2007-05-01 | 2012-05-22 | Advanced Micro Devices, Inc. | Stencil operations |
US8184118B2 (en) * | 2007-05-01 | 2012-05-22 | Advanced Micro Devices, Inc. | Depth operations |
KR100883804B1 (en) | 2007-05-16 | 2009-02-16 | 박우찬 | 3-dimensional graphic managing apparatus including compression part and decompression part |
US7982734B2 (en) * | 2007-08-01 | 2011-07-19 | Adobe Systems Incorporated | Spatially-varying convolutions for rendering soft shadow effects |
US7970237B2 (en) * | 2007-08-01 | 2011-06-28 | Adobe Systems Incorporated | Spatially-varying convolutions for rendering glossy reflection effects |
WO2009035410A2 (en) * | 2007-09-12 | 2009-03-19 | Telefonaktiebolaget L M Ericsson (Publ) | Depth buffer compression |
GB2487421A (en) * | 2011-01-21 | 2012-07-25 | Imagination Tech Ltd | Tile Based Depth Buffer Compression |
US9378560B2 (en) * | 2011-06-17 | 2016-06-28 | Advanced Micro Devices, Inc. | Real time on-chip texture decompression using shader processors |
US9437025B2 (en) | 2012-07-12 | 2016-09-06 | Nvidia Corporation | Stencil data compression system and method and graphics processing unit incorporating the same |
US9098924B2 (en) * | 2013-07-15 | 2015-08-04 | Nvidia Corporation | Techniques for optimizing stencil buffers |
US9390464B2 (en) * | 2013-12-04 | 2016-07-12 | Nvidia Corporation | Stencil buffer data compression |
US9959590B2 (en) | 2016-03-30 | 2018-05-01 | Intel Corporation | System and method of caching for pixel synchronization-based graphics techniques |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5266941A (en) * | 1991-02-15 | 1993-11-30 | Silicon Graphics, Inc. | Apparatus and method for controlling storage of display information in a computer system |
US6104417A (en) * | 1996-09-13 | 2000-08-15 | Silicon Graphics, Inc. | Unified memory computer architecture with dynamic graphics memory allocation |
US6552723B1 (en) * | 1998-08-20 | 2003-04-22 | Apple Computer, Inc. | System, apparatus and method for spatially sorting image data in a three-dimensional graphics pipeline |
JP3599268B2 (en) * | 1999-03-08 | 2004-12-08 | 株式会社ソニー・コンピュータエンタテインメント | Image processing method, image processing apparatus, and recording medium |
US6384822B1 (en) | 1999-05-14 | 2002-05-07 | Creative Technology Ltd. | Method for rendering shadows using a shadow volume and a stencil buffer |
GB2354416B (en) * | 1999-09-17 | 2004-04-21 | Technologies Limit Imagination | Depth based blending for 3D graphics systems |
US6801203B1 (en) * | 1999-12-22 | 2004-10-05 | Microsoft Corporation | Efficient graphics pipeline with a pixel cache and data pre-fetching |
US7119809B1 (en) * | 2000-05-15 | 2006-10-10 | S3 Graphics Co., Ltd. | Parallel architecture for graphics primitive decomposition |
US6486887B1 (en) * | 2000-06-08 | 2002-11-26 | Broadcom Corporation | Method and system for improving color quality of three-dimensional rendered images |
US6580427B1 (en) * | 2000-06-30 | 2003-06-17 | Intel Corporation | Z-compression mechanism |
US6557083B1 (en) * | 2000-06-30 | 2003-04-29 | Intel Corporation | Memory system for multiple data types |
US6763175B1 (en) * | 2000-09-01 | 2004-07-13 | Matrox Electronic Systems, Ltd. | Flexible video editing architecture with software video effect filter components |
US6798421B2 (en) * | 2001-02-28 | 2004-09-28 | 3D Labs, Inc. Ltd. | Same tile method |
US6825847B1 (en) * | 2001-11-30 | 2004-11-30 | Nvidia Corporation | System and method for real-time compression of pixel colors |
US6903741B2 (en) * | 2001-12-13 | 2005-06-07 | Crytek Gmbh | Method, computer program product and system for rendering soft shadows in a frame representing a 3D-scene |
JP4001227B2 (en) * | 2002-05-16 | 2007-10-31 | 任天堂株式会社 | GAME DEVICE AND GAME PROGRAM |
US20050134588A1 (en) * | 2003-12-22 | 2005-06-23 | Hybrid Graphics, Ltd. | Method and apparatus for image processing |
US7030878B2 (en) * | 2004-03-19 | 2006-04-18 | Via Technologies, Inc. | Method and apparatus for generating a shadow effect using shadow volumes |
- 2004
  - 2004-08-23 US US10/924,068 patent/US7277098B2/en active Active
- 2005
  - 2005-07-29 TW TW094125897A patent/TWI307054B/en active
  - 2005-08-19 CN CNB2005100921322A patent/CN100354891C/en active Active
Also Published As
Publication number | Publication date |
---|---|
US7277098B2 (en) | 2007-10-02 |
CN100354891C (en) | 2007-12-12 |
TWI307054B (en) | 2009-03-01 |
TW200608309A (en) | 2006-03-01 |
US20060038822A1 (en) | 2006-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1722175A (en) | Method and apparatus for operating improved stencil shadow awl | |
CN100342403C (en) | Method and apparatus for generating shadow effect using shadow space | |
EP3308359B1 (en) | Rendering using ray tracing to generate a visibility stream | |
US11170555B2 (en) | Graphics processing systems | |
US8659589B2 (en) | Leveraging graphics processors to optimize rendering 2-D objects | |
US8072460B2 (en) | System, method, and computer program product for generating a ray tracing data structure utilizing a parallel processor architecture | |
DE102020108218A1 (en) | Apparatus and method for constructing bounding volume hierarchies with reduced accuracy | |
US9965886B2 (en) | Method of and apparatus for processing graphics | |
DE102020124932A1 (en) | Apparatus and method for real-time graphics processing using local and cloud-based graphics processing resources | |
CN1287330C (en) | Eeficient graphics state management for zone rendering | |
US11216993B2 (en) | Graphics processing systems | |
CN1928918A (en) | Graphics processing apparatus and method for performing shading operations therein | |
DE102012213643A1 (en) | Multitthread physics engine with impulse forwarding | |
CN1773552A (en) | Systems and methods for compressing computer graphics color data | |
CN103959338A (en) | Switching between direct rendering and binning in graphics processing using an overdraw tracker | |
JP2008165760A (en) | Method and apparatus for processing graphics | |
GB2546810B (en) | Sparse rendering | |
US11210821B2 (en) | Graphics processing systems | |
CN111508056B (en) | Graphics processing system using extended transform level masks | |
CN1713224A (en) | System and method for cache optimized data formatting | |
EP3664037A1 (en) | Tiling a primitive in a graphics processing system | |
US8553041B1 (en) | System and method for structuring an A-buffer to support multi-sample anti-aliasing | |
US11734869B2 (en) | Graphics processing | |
GB2444628A (en) | Sorting graphics data for processing | |
US11210847B2 (en) | Graphics processing systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |