JPH11328438A

JPH11328438A - Method and device for high-efficiency floating-point z buffering

Info

Publication number: JPH11328438A
Application number: JP11073848A
Authority: JP
Inventors: Michael F. Deering (マイケル・エフ・ディアーリング)
Original assignee: Sun Microsystems Inc
Current assignee: Sun Microsystems Inc
Priority date: 1998-03-18
Filing date: 1999-03-18
Publication date: 1999-11-30
Also published as: KR19990078036A

Abstract

PROBLEM TO BE SOLVED: To process floating-point Z-buffer values with high numerical precision while minimizing hardware, by determining the largest exponent among the Z values of a given primitive and generating a common Z exponent value from it. SOLUTION: Z values represent the Z coordinates of the vertices of triangle primitives used to render a three-dimensional object on a display device. The Z values of a given primitive are first expressed in floating-point format through a conversion operation (S402). The Z value of the primitive with the largest exponent is then determined (S404), and a common Z exponent for the primitive is generated (S406). The Z values of the primitive are then converted to fixed-point format (S408). The common Z exponent value is passed down the pipeline together with the primitive (S410). After transfer, the primitive is processed in fixed point (S412). Consequently, Z values can be processed more efficiently.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

This application is a continuation-in-part of U.S. patent application Ser. No. 08/673,117, entitled "Method and Apparatus for Implementing High-Resolution Rendering of Z-Buffered Primitives," filed July 1, 1996 by Michael F. Deering and assigned to the assignee of the present application.

[0002] The present invention relates to 3D graphics, and more particularly to the processing of Z-buffer values within a pipeline for 3D graphics accelerator processing.

[0003]

DESCRIPTION OF THE RELATED ART

Current three-dimensional computer graphics technology makes extensive use of geometry to describe three-dimensional objects. Complex, smooth object surfaces to be displayed are represented using a high level of abstraction. Detailed surface geometry can be rendered using texture maps, but greater realism requires raw geometry, usually in the form of triangle primitives. In current workstation computer systems, the position, color (e.g., red, blue, green, and optionally α), and normal components of these triangles are generally represented as floating-point numbers. The text "Computer Graphics: Principles and Practice" (2nd edition) by Foley, van Dam, et al. treats three-dimensional graphics in considerable detail and may be consulted for further background.

[0004] FIG. 1 shows a prior-art graphics system 10, used for example in a computer system to display user-generated images. As shown, graphics system 10 receives graphics input data from CPU 20 over system bus 22. The graphics input data includes instructions and data representing the triangle primitives used to render three-dimensional objects on display device 30. In general, graphics system 10 includes a graphics accelerator unit 14, a frame-buffer random access memory 16 (implemented with 3DRAM or another type of memory), and a video control unit 18. Processed video from graphics system 10 is then passed to monitor 30, which displays images such as 40 and 50.

[0005] In the context of the invention described below, displayed objects 40 and 50 generally have three-dimensional surfaces, and typically part of one object is hidden by part of another. In FIG. 1, for example, part of object 40 lies in front of part of object 50 and therefore hides that part of object 50 from view. The portions of object 50 that are hidden from view are called hidden surfaces.

[0006] The realism achieved in rendering three-dimensional objects depends on many factors, including the rapid calculation of which parts of an object are visible and which are hidden (e.g., hidden surfaces). Accordingly, a graphics accelerator such as unit 14 performs hidden-surface removal for each object displayed. Z-buffer unit 12, associated with graphics accelerator unit 10, stores a Z value, i.e., a depth value, for each pixel rendered. Because two or more objects may map to a given pixel, Z-buffer unit 12 is used to ensure that objects farther from the viewpoint are projected behind objects nearer the viewpoint. For example, the Z value of a pixel of object 50 should be written into Z-buffer 12 only if it is closer to the viewer than the Z value already stored for that pixel's screen location. In practice, the old Z value of the current pixel being rendered is read from the buffer and compared numerically with the newly generated Z value for that pixel. Depending on the result of the comparison, the old Z value and color frame-buffer pixel value are either left unchanged or replaced with the newly computed values. This concept is commonly called "Z-buffering."
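The read, compare, and conditionally replace cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware of Z-buffer unit 12: the function names, the list-of-lists buffers, and the convention that a smaller Z means "closer to the viewer" are all assumptions made for the example.

```python
def make_buffers(width, height, far=1.0, background="bg"):
    """Create a Z-buffer filled with the farthest depth and a matching color buffer."""
    z_buffer = [[far] * width for _ in range(height)]
    color_buffer = [[background] * width for _ in range(height)]
    return z_buffer, color_buffer


def z_buffer_write(z_buffer, color_buffer, x, y, new_z, new_color):
    """Depth-test one pixel; smaller Z is assumed to be nearer the viewer."""
    old_z = z_buffer[y][x]              # read the Z value already stored
    if new_z < old_z:                   # new pixel is nearer: replace Z and color
        z_buffer[y][x] = new_z
        color_buffer[y][x] = new_color
        return True
    return False                        # new pixel is hidden: keep the old values
```

Writing a box pixel at depth 0.7, then an apple pixel at 0.3, then another box pixel at 0.5 to the same location leaves the apple visible, which is the behavior the hidden-surface discussion requires.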

[0007] When graphics input data is input to graphics accelerator 14, the geometry data (i.e., coordinates) is transformed from model coordinates to view clipping coordinates by a transformation matrix. The resulting clipping coordinates are "homogeneous," meaning that they include a "W" coordinate in addition to the usual Cartesian coordinates. The W coordinate represents the scaled distance along the z-axis from the viewpoint to the transformed point. The Z coordinate refers to the same scaled distance, but measured from an origin other than the viewpoint (typically the origin of the front clipping plane, or a point between the front and back clipping planes). Perspective division then yields a new value of Z (Z') from the quantity Z/W, where Z and W are the coordinate values before perspective division. For orthographic projection (in which the size of an object does not decrease with increasing distance from the viewpoint), the Z' value of a given point is directly proportional to the distance to that point in world coordinates. Early three-dimensional computer graphics systems mostly dealt with this kind of projection. In those applications, a numeric representation of Z values using either fixed-point or floating-point arithmetic was well matched to the values the arithmetic produced.

[0008] However, although perspective projection (in which size decreases with increasing distance from the viewpoint) has since been adopted in three-dimensional computer graphics, the basic equations and numeric representations developed historically remained unchanged and are no longer well matched to the arithmetic in use. In perspective projection, for example, the Z' value of a point is directly proportional to the reciprocal of the distance to that point in world coordinates. The screen-space (transformed) Z value produced by the equations used in the prior art is related to the world-space coordinate as follows:

[0009]

(Equation 1)

[0010] Here, F is the distance to the front clipping plane in world coordinates, and B is the distance to the back clipping plane in world coordinates. Note that equation (1) is defined so that points near the front clipping plane F have values near zero and points near the back clipping plane B have values near one. (With slightly different equations, the Z values at the front and back clipping planes instead approach 1 and -1, or -1 and 1.)

[0011] When B is much larger than F, equation (1) reduces to:

[0012]

(Equation 2)

[0013] The ratio B/F plays an important role in Z-buffering. In the early days of perspective-projection graphics, the ratio B/F was typically a small integer. Typically the front clipping plane was two feet from the viewpoint and the back clipping plane four to ten feet away. This configuration was sufficient for viewing three-dimensional objects with apparent sizes of a few feet.
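Equations (1) and (2) are referenced in the text, but their bodies do not appear in it. The description does constrain their shape: the screen-space value varies with the reciprocal of the world-space distance Z, is zero at Z = F, and is one at Z = B. One form satisfying those constraints, offered here as an inference from the description rather than as the patent's own notation, is:

```latex
\begin{align}
Z' &= \frac{1 - F/Z}{1 - F/B} \tag{1} \\
Z' &\approx 1 - \frac{F}{Z} \qquad (B \gg F) \tag{2}
\end{align}
```

Substituting Z = F gives Z' = 0 and Z = B gives Z' = 1. When B is much larger than F, the term F/B in the denominator becomes negligible, which yields the second line.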

[0014] More recent, realistic large-scale virtual environments, however, require the front clipping plane to be only a few inches from the viewer's nose and the back clipping plane to be miles away. In this configuration the B/F ratio becomes very large. For example, with the front clipping plane at six inches and the back clipping plane ten miles away, the B/F ratio exceeds 100,000. Even a comparatively small range of six inches to 600 feet gives a ratio over 1,000.
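The two ratios quoted above follow directly from unit conversion. The short Python check below is only an arithmetic illustration; the constant and function names are chosen for the example.

```python
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280


def bf_ratio(front_inches, back_inches):
    """Ratio of back-plane distance to front-plane distance (both in inches)."""
    return back_inches / front_inches


ten_miles = 10 * FEET_PER_MILE * INCHES_PER_FOOT    # 633,600 inches
six_hundred_feet = 600 * INCHES_PER_FOOT            # 7,200 inches

ratio_ten_miles = bf_ratio(6, ten_miles)            # 105,600: exceeds 100,000
ratio_600_feet = bf_ratio(6, six_hundred_feet)      # 1,200: exceeds 1,000
```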

[0015] In a typical prior-art Z-buffer, a large B/F ratio causes problems with the precision of the numbers representing Z values. Consider the particular case B/F = 1,024. In screen space under equation (1), every point located in the back half of this range (more than halfway toward the back clipping plane) has ten leading 1 digits before the other bits representing distance begin, i.e., the form .1111111111xxx...xxx. Thus any point that needs n bits of precision to represent a value between B/2 and B in world coordinates needs 10 + n bits of precision in the screen-space representation under equation (1). In general, equation (1) requires up to approximately log2(B/F) additional bits in the screen-space representation. For example, a B/F ratio of 100,000 requires 17 additional bits. Note that this representational loss affects both fixed-point and floating-point representations. Floating-point representation is effective at encoding values whose leading bits are zero, but because the screen-space Z values of points near the back clipping plane map to values near one, the extra bits generated are leading ones.
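The bit-count argument above can be checked numerically. The Python sketch below assumes a prior-art mapping of the form (1 - F/Z) / (1 - F/B), which is inferred from the stated properties of equation (1) (zero at the front plane, one at the back plane, reciprocal in distance) and is not quoted from the patent. The helper names are hypothetical.

```python
import math


def screen_z(z, front, back):
    """Assumed prior-art screen-space Z: 0 at the front plane, 1 at the back plane."""
    return (1.0 - front / z) / (1.0 - front / back)


def leading_ones(value, max_bits=52):
    """Count the leading 1 bits in the binary fraction of a value in [0, 1)."""
    count = 0
    for _ in range(max_bits):
        value *= 2.0                # shift the next fraction bit into the integer place
        if value < 1.0:             # that bit is 0: the run of ones has ended
            break
        value -= 1.0                # that bit is 1: strip it and continue
        count += 1
    return count


def extra_bits(front, back):
    """Approximate number of additional screen-space bits, ceil(log2(B/F))."""
    return math.ceil(math.log2(back / front))
```

For F = 1 and B = 1024, a point three quarters of the way to the back plane (Z = 768) yields a screen-space value that begins with at least ten 1 bits, and `extra_bits` returns 10 for B/F = 1,024 and 17 for B/F = 100,000, matching the figures in the text.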

[0016] Consider another example in which the front clipping plane F is 5 cm from the viewpoint and the back clipping plane B is 100,000 cm (1 km) from the viewpoint. Representing these Z-buffer values in a conventional fixed-point format produces a very large disparity between the individual step sizes near the front and back clipping planes. In one example, a 24-bit fixed-point Z-buffer has 2,796,324 steps in the first 1 cm but only 94 steps in the last 10,000 cm (106.4 cm per step). Using a floating-point (28-bit) representation in another example gives an improvement (44,741,478 steps in the first 1 cm and 1,490 steps in the last 10,000 cm), but still greater precision is needed for large B/F ratios.
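The 24-bit fixed-point step counts can be approximated with a short calculation. The sketch below again assumes the mapping (1 - F/Z) / (1 - F/B) for the prior-art screen-space Z, which is an inference from the description and not a formula quoted from it; the function names are illustrative. Under that assumption the results land close to, though not exactly on, the figures quoted in the text.

```python
def screen_z(z, front, back):
    """Assumed prior-art screen-space Z: 0 at the front plane, 1 at the back plane."""
    return (1.0 - front / z) / (1.0 - front / back)


def fixed_point_steps(z_near, z_far, front, back, bits=24):
    """Approximate number of fixed-point codes spanning two world-space depths."""
    span = screen_z(z_far, front, back) - screen_z(z_near, front, back)
    return span * (2 ** bits)


FRONT_CM, BACK_CM = 5.0, 100_000.0

first_cm = fixed_point_steps(5.0, 6.0, FRONT_CM, BACK_CM)              # roughly 2.8 million
last_10k = fixed_point_steps(90_000.0, 100_000.0, FRONT_CM, BACK_CM)   # fewer than 100
cm_per_step = 10_000.0 / last_10k                                      # over a meter per step
```

The first centimeter past the front plane consumes about one sixth of all 24-bit codes, while the final 10,000 cm share fewer than a hundred.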

[0017] Prior-art numeric representations of Z values can produce visible artifacts. When the Z-buffer data is computed correctly, images 40 and 50 are displayed as shown in FIG. 1, with apple 40 in front of box 50. FIG. 2 shows a correctly projected rendering of these two images. However, in Z/F distance regimes where the Z-buffer data becomes very inaccurate, errors occur. In FIG. 3, rather than overwriting the underlying pixel data for the hidden surfaces of box 50, prior-art Z-buffer unit 12 has erroneously decided to paint pixels of both object 40 and object 50. This "Z-buffering conflict" produces a region of unwanted overlap. FIG. 4 shows yet another type of display error resulting from incorrect Z-buffering: part of box 50 erroneously appears in front of part of apple 40.

[0018] The resolution loss described above depends specifically on the B/F ratio. Suppose the ratio B/F is M; that is, the front clipping plane is at distance F and the back clipping plane is at distance MF. Now suppose the distance to the back clipping plane is doubled (i.e., the back clipping plane is now at distance 2MF). Each such doubling of distance costs one more bit of resolution. Eventually so few bits remain that distances in the far half of the Z-buffer range (for example, between MF and 2MF) can no longer be resolved. Assuming a Z representation of at most n bits, points at distances between 1F and 2F are represented with n-1 bits of precision, and points between 2F and 4F with at most n-2 bits. In this way the remaining bits become unable to resolve the distance between occluding objects adequately, leading to incorrect hidden-surface calculation and display.

[0019] This concept is illustrated in FIG. 5. The front and back clipping planes are shown as F and B, respectively. The "0" at the left of FIG. 5 represents the viewpoint, i.e., the eye of the person observing the rendering of the displayed objects. Objects 40 and 50 of FIG. 1 are also depicted in FIG. 5.

[0020]

PROBLEMS TO BE SOLVED BY THE INVENTION

Applicant's parent patent application cited above provides a Z-buffering method and apparatus free of the numerical-precision problems found in prior-art systems. The Z values disclosed in the parent application are computed by an equation different from equation (1) and are represented in floating point. The advantage of this form of Z-buffering is that it imposes no non-linear loss of bits on objects rendered at great distances.

[0021] However, applying this new floating-point representation throughout the graphics pipeline is costly because it requires dedicated floating-point hardware. Since the added precision goes unused in most of the rendering pipeline, the drawbacks may well outweigh the advantages of the new Z-buffer representation.

[0022] It is therefore desirable to obtain the benefits of the floating-point Z-buffer representation disclosed in the parent patent application while minimizing the additional hardware required.

[0023]

MEANS FOR SOLVING THE PROBLEMS

The problems outlined above are largely solved by a method of Z-value processing according to the present invention. The Z values correspond to vertices of a given primitive being processed within a graphics pipeline. Z values received by the pipeline are represented in a first floating-point format that includes a mantissa and an exponent (e.g., IEEE floating-point format). The method includes determining which of the Z values of the given primitive has the largest exponent value. In one embodiment, this includes testing every Z value by comparing its exponent value with the current maximum exponent value. The method then includes generating a common Z exponent value in response to determining the largest exponent value for the given primitive. In one embodiment, the common exponent value is generated by subtracting a constant value from the largest exponent value. Next, the method includes converting the Z values of the given primitive to a fixed-point format in which the mantissas are scaled to the common Z exponent value. The converted values are then passed along with the primitive as graphics processing continues with a first set of operations that use the Z values (represented in fixed-point format) and the common exponent value. After this first set of operations, the Z values are converted back to a second floating-point format. A second set of graphics operations (i.e., hidden-surface removal) is then performed using this second floating-point format.
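The sequence of steps in this summary (find the largest exponent, derive a common exponent, rescale every mantissa to it, and later convert back) can be sketched with Python's `math.frexp` and `math.ldexp`. This is a minimal software illustration under stated assumptions, not the patented hardware: the 24-bit field width, the function names, and the default offset of zero are all hypothetical choices made for the example.

```python
import math

FIXED_BITS = 24  # assumed width of the fixed-point Z field


def common_exponent(z_values, offset=0):
    """Largest binary exponent among the vertex Z values, minus a constant offset."""
    largest = max(math.frexp(z)[1] for z in z_values)   # frexp: z = m * 2**e, 0.5 <= |m| < 1
    return largest - offset


def to_fixed(z_values, exponent, bits=FIXED_BITS):
    """Scale each Z to the shared exponent and round to an integer mantissa."""
    return [round(math.ldexp(z, bits - exponent)) for z in z_values]


def to_float(fixed_values, exponent, bits=FIXED_BITS):
    """Convert shared-exponent fixed-point Z values back to floating point."""
    return [math.ldexp(m, exponent - bits) for m in fixed_values]
```

For a triangle with vertex Z values 6.0, 5.0, and 3.0, the common exponent is 3, and the three integer mantissas travel with that single exponent instead of three separate ones. Vertices with smaller exponents lose low-order bits when rescaled, which is the trade the shared exponent makes for a more compact representation.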

[0024] Using a single exponent for all vertices of a given primitive allows Z values to be represented more efficiently, which improves the performance of the graphics pipeline. Efficiency is improved further by using an intermediate fixed-point format that is more compact than the equivalent floating-point format. Finally, in one embodiment, Z values are represented using the expression W_f/W. Unlike prior-art systems, this representation imposes no non-linear loss of bits on points in distant regions where the B/F ratio is large.
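The difference between the two representations can be illustrated with a toy floating-point format. The sketch below makes several assumptions that go beyond the text: W_f is taken to be the value of W at the front clipping plane, so that W_f/W behaves like F/Z; the prior-art value is taken as 1 - F/Z; and a 16-bit mantissa stands in for whatever format the hardware actually uses. The `quantize` helper is hypothetical.

```python
import math


def quantize(value, mantissa_bits=16):
    """Round a value to a float with the given number of mantissa bits (toy format)."""
    mantissa, exponent = math.frexp(value)          # value = mantissa * 2**exponent
    scale = 2 ** mantissa_bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


FRONT = 5.0                         # front clipping plane in cm
z_a, z_b = 90_000.0, 90_050.0       # two depths 50 cm apart, near a distant back plane

# Assumed prior-art style value: close to 1, so the mantissa is spent on leading ones.
old_a, old_b = quantize(1.0 - FRONT / z_a), quantize(1.0 - FRONT / z_b)

# Assumed W_f/W style value: small, so the exponent absorbs the leading zeros.
new_a, new_b = quantize(FRONT / z_a), quantize(FRONT / z_b)
```

With this toy format the two depths collapse to the same code in the first representation but remain distinct in the second, which is the behavior the paragraph attributes to the W_f/W form.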

[0025] The invention may be better understood by considering the following detailed description of the preferred embodiment together with the following drawings.

[0026]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 6 - Computer System

Referring to FIG. 6, a computer system 80 that includes a three-dimensional (3-D) graphics accelerator according to the present invention is shown. As shown, computer system 80 comprises a system unit 82 and a video monitor or display device 84 coupled to system unit 82. Display device 84 may be any of various types of display monitors or devices. Various input devices may be connected to the computer system, including a keyboard 86 and/or a mouse 88, or other input devices. Application software executes on computer system 80 to display 3-D graphics objects on video monitor 84. As described further below, the 3-D graphics accelerator in computer system 80 includes improved capabilities for processing the Z values of geometry data corresponding to the primitives used to render three-dimensional graphics objects on display device 84.

[0027] FIG. 7 - Computer System Block Diagram

Referring now to FIG. 7, a simplified block diagram illustrating the computer system of FIG. 6 is shown. Elements of the computer system that are not necessary for an understanding of the invention are omitted for convenience. As shown, computer system 80 includes a central processing unit (CPU) 102 coupled to a high-speed bus or system bus 104. System memory 106 is also preferably coupled to high-speed bus 104.

[0028] Host processor 102 may be any of various types of computer processors, multiprocessors, and CPUs. System memory 106 may be any of various types of memory subsystems, including random access memories and mass storage devices. System bus or host bus 104 may be any of various types of communication or host computer buses for communication between host processors, CPUs, and memory subsystems, as well as specialized subsystems. In the preferred embodiment, host bus 104 is the UPA bus, a 64-bit bus operating at 83 MHz.

[0029] A 3-D graphics accelerator 112 according to the present invention is coupled to high-speed memory bus 104. 3-D graphics accelerator 112 may be coupled to bus 104 by, for example, a crossbar switch or other bus connectivity logic. As is well known in the art, various other peripheral devices or other buses may be connected to high-speed memory bus 104. Note that the 3-D graphics accelerator may be coupled to any of various buses, as desired. As shown, video monitor or display device 84 connects to 3-D graphics accelerator 112. Graphics accelerator 112 is described in more detail below.

[0030] FIG. 8 - Graphics Accelerator

Referring now to FIG. 8, a block diagram illustrating graphics accelerator 112 according to the preferred embodiment of the present invention is shown. As shown, graphics accelerator 112 principally comprises a command block 142, a set of floating-point processors 152A-152F, draw processors 172A and 172B, a frame buffer 100 composed of 3DRAM, and a random access memory/digital-to-analog converter (RAMDAC) 196.

[0031] As shown, graphics accelerator 112 includes command block 142, which interfaces to memory bus 104. Command block 142 interfaces graphics accelerator 112 to host bus 104 and controls the transfer of data between other blocks or chips within the graphics accelerator. Command block 142 also pre-processes triangle and vector data and performs geometry data decompression.

[0032] Command block 142 interfaces to a plurality of floating-point processors 152A-152F. Graphics accelerator 112 preferably includes up to six floating-point processors, labeled 152A-152F as shown. Floating-point processors 152A-152F receive high-level drawing commands and generate graphics primitives, such as triangles and lines, for rendering three-dimensional objects on the screen. They perform transformation, clipping, face determination, lighting, and set-up operations on the received geometry data. Each of floating-point processors 152A-152F connects to a respective memory 153A-153F. Memories 153A-153F are preferably 32k × 36-bit SRAMs and are used for microcode and data storage.

[0033] Each of floating-point processors 152A-F is connected to two draw processors 172A and 172B. Graphics accelerator 112 preferably includes two draw processors 172A and 172B, although a greater or lesser number may be used. Draw processors 172A and 172B perform screen-space rendering of the various graphics primitives and operate to sequence or fill the completed pixels into the 3DRAM array. Draw processors 172A and 172B also function as 3DRAM control chips for frame buffer 100. Draw processors 172A and 172B render one image at a time into frame buffer 100, according to a draw packet received from one of floating-point processors 152A-152F or a direct port packet received from command processor 142.

[0034] Each of floating-point processors 152A-152F preferably operates to broadcast the same data to the two draw processors 172A and 172B. In other words, the same data is always present on both sets of data lines coming from each floating-point processor 152A-152F. Thus, when a floating-point processor 152A-152F transfers data, it transfers the same data over both parts of the FD bus, to draw processors 172A and 172B.

[0035] Each of draw processors 172A and 172B is coupled to frame buffer 100. Frame buffer 100 comprises four banks of 3DRAM memory, 192A, 192B, 194A, and 194B. Draw processor 172A is coupled to the two 3DRAM banks 192A and 192B, and draw processor 172B is coupled to the two 3DRAM banks 194A and 194B. Each bank comprises three 3DRAM chips, as shown. The 3DRAM memories or banks 192A, 192B, 194A, and 194B collectively form frame buffer 100, which is 1280 × 1024 with a depth of 96 bits. The frame buffer stores the pixels corresponding to the 3-D objects rendered by draw processors 172A and 172B.

【００３６】３ＤＲＡＭメモリ１９２Ａ，１９２Ｂおよ
び１９４Ａ，１９４Ｂの各々は、ＲＡＭＤＡＣ（ランダ
ム・アクセス・メモリ／ディジタル−アナログ変換器）
１９６に連結している。ＲＡＭＤＡＣ１９６は、従来の
カラ―・ルックアップ・テ―ブル回路および三重ビデオ
ＤＡＣ回路はもちろん、クロスバ―機能ともに、プログ
ラマブル・ビデオ・タイミング発生器とプログラマブル
・ピクセル・クロック・シンセサイザとを備える。そし
てまた、ＲＡＭＤＡＣはビデオ・モニタ８４に連結して
いる。Each of the 3DRAM memories 192A, 192B, 194A, and 194B is coupled to a RAMDAC (random access memory digital-to-analog converter) 196. The RAMDAC 196 comprises a programmable video timing generator and a programmable pixel clock synthesizer, together with a crossbar function, as well as conventional color look-up table circuitry and triple video DAC circuitry. The RAMDAC is in turn coupled to the video monitor 84.

【００３７】コマンド・ブロックは、単一チップで実現
するのが好ましい。浮動小数点処理装置１５２Ａ〜１５
２Ｆの各々は、別々のチップで実現するのが好ましい。
好ましい実施の形態では、６個までの浮動小数点処理装
置またはチップ１５２Ａ〜１５２Ｆが含まれている。描
画ブロックまたは処理装置１７２Ａと１７２Ｂの各々も
また別々のチップから成ることが好ましい。The command block is preferably implemented on a single chip. Each of the floating-point processors 152A-152F is preferably implemented on a separate chip. The preferred embodiment includes up to six floating-point processors, or chips, 152A-152F. Each of the drawing blocks, or processors, 172A and 172B also preferably comprises a separate chip.

【００３８】図９浮動小数点処理装置のブロック図図９において、本発明の好ましい実施の形態による浮動
小数点処理装置１５２Ａ〜１５２Ｆの一つを図示するブ
ロック図が示されている。それぞれの浮動小数点処理装
置１５２Ａ〜Ｆの各々は同一であるので、ここでは、便
宜上、そのうちの一つ（１５２で示す）について説明す
る。図に示すように、浮動小数点処理装置１５２Ａ〜１
５２Ｆの各々は、三つの主機能装置であるコア・プロセ
ッサ、即ち、Ｆコア・プロセッサ２０２，Ｌコア・プロ
セッサ２０４およびＳコア・プロセッサ２０６を含む。
Ｆコア・プロセッサ２０２はコマンド・ブロック１４２
から転送されるデ―タをＣＦバスから受信するように連
結されている。Ｆコア・プロセッサは出力デ―タをＬコ
ア・プロセッサ２０４とＳコア・プロセッサ２０６の各
々に供給する。また、Ｌコア・プロセッサ２０４はデ―
タをＳコア・プロセッサ２０６に供給する。Ｓコア・プ
ロセッサ２０６は出力デ―タをＦＤバスに供給する。 FIG. 9: Block diagram of the floating-point processor. Referring to FIG. 9, a block diagram illustrates one of the floating-point processors 152A-152F according to the preferred embodiment of the present invention. Because the floating-point processors 152A-152F are identical, only one of them (denoted 152) is described here for convenience. As shown, each floating-point processor includes three main functional units, or core processors: the F-core processor 202, the L-core processor 204, and the S-core processor 206. The F-core processor 202 is coupled to receive, over the CF bus, data transferred from the command block 142. The F-core processor supplies output data to each of the L-core processor 204 and the S-core processor 206. The L-core processor 204 also supplies data to the S-core processor 206. The S-core processor 206 supplies output data to the FD bus.

【００３９】Ｆコア・プロセッサ２０２は、ジオメトリ
変換、クリップ試験、面確定、透視分割および画面空間
変換を含む全ての浮動小数点集約型のオペレ―ションを
行う。また、Ｆコ・プロセッサ２０２は、必要なときに
はクリッピングを行う。好ましい実施の形態において
は、３２ｋワ―ドのＳＲＡＭ（１５３）に格納された３
６ビットのマイクロインストラクション・ワ―ドを用い
てＦコア・プロセッサ２０２は完全にプログラム可能で
ある。The F-core processor 202 performs all floating-point intensive operations, including geometry transformation, clip testing, face determination, perspective division, and screen space conversion. The F-core processor 202 also performs clipping when required. In the preferred embodiment, the F-core processor 202 is fully programmable, using 36-bit microinstruction words stored in a 32k-word SRAM (153).

【００４０】Ｌコア・プロセッサ２０４は、オンチップ
のＲＡＭベ―スのマイクロコ―ドを用いて殆どのライテ
ィングの計算を行う。従来技術のライティング装置とは
異なって、Ｌコア・プロセッサ２０４は、これらの計算
を行うために固定小数点の演算を使用する。好ましい実
施の形態では、Ｌコア・プロセッサ２０４の数値範囲
は、ｓ１．１４フォ―マット（１符号ビット、１整数ビ
ット、および１４少数ビット）を使用して−２．０から
＋２．０である。ライティング計算の大部分は、このタ
イプの１６ビット・オペランドを使用してこの範囲で行
われる。しかし、ライティング計算に必要なパラメ―タ
には、この範囲を越えて、下記の様に処理されるものも
ある。The L-core processor 204 performs most lighting calculations using on-chip, RAM-based microcode. Unlike prior-art lighting units, the L-core processor 204 uses fixed-point arithmetic for these calculations. In the preferred embodiment, the numeric range of the L-core processor 204 is -2.0 to +2.0, using an s1.14 format (1 sign bit, 1 integer bit, and 14 fractional bits). Most lighting calculations are performed within this range using 16-bit operands of this type. Some parameters required for the lighting calculations, however, exceed this range and are handled as described below.
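As an illustrative sketch only (not part of the disclosed embodiment), the s1.14 format can be modeled in Python as follows. The function names `to_s1_14` and `from_s1_14`, and the choice of round-to-nearest with saturation, are assumptions made here for illustration; the text specifies only the bit layout and the -2.0 to +2.0 range.

```python
FRACTION_BITS = 14                      # s1.14: 1 sign bit, 1 integer bit, 14 fractional bits
S1_14_MIN = -(1 << 15)                  # encodes -2.0
S1_14_MAX = (1 << 15) - 1               # encodes just under +2.0


def to_s1_14(x: float) -> int:
    """Quantize x to a signed 16-bit s1.14 integer, saturating at the range limits."""
    scaled = round(x * (1 << FRACTION_BITS))
    return max(S1_14_MIN, min(S1_14_MAX, scaled))


def from_s1_14(q: int) -> float:
    """Convert an s1.14 integer back to a Python float."""
    return q / (1 << FRACTION_BITS)
```

Under these assumptions, 1.0 is encoded as 16384, and any input of +2.0 or more saturates to the largest representable value.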

【００４１】また、Ｌコア・プロセッサ２０４は、より
効率よくライティング計算を行うために効率のよいトリ
プル・ワ―ドの設計を含んでいる。トリプル・ワ―ド設
計は、１６ビットの固定小数点値から成る４８ビットの
デ―タ・ワ―ドで動作する。したがって、一つの命令
で、３色全ての成分（ＲＧＢ）または法線の３成分（Ｎ
ｘ，Ｎｙ，およびＮｚ）全てについて同じ機能が１サイ
クルで行われる。Ｌコア・プロセッサに含まれる数値計
算装置によって、数値は自動的に許容数値範囲にクラン
プされるので、追加のブランチを必要としない。The L-core processor 204 also includes an efficient triple-word design for performing lighting calculations more efficiently. The triple-word design operates on 48-bit data words comprising three 16-bit fixed-point values. Thus a single instruction performs the same function, in one cycle, on all three color components (RGB) or on all three components of a normal (Nx, Ny, and Nz). The arithmetic units in the L-core processor automatically clamp values to the allowed numeric range, so no additional branches are required.
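The triple-word idea can be sketched in Python as one operation applied to three 16-bit components at once, with clamping in place of branching. This is a software model under stated assumptions: the name `triple_add`, the tuple representation, and the choice of addition as the example operation are hypothetical and are not taken from the source.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp16(value: int) -> int:
    """Clamp an integer to the signed 16-bit range (models the automatic clamping)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def triple_add(a, b):
    """Add two triples (e.g. RGB or Nx, Ny, Nz) component-wise with saturation."""
    return tuple(clamp16(x + y) for x, y in zip(a, b))
```

For example, adding the s1.14 triples `(16384, -32768, 100)` and `(16384, -1, 23)` saturates the first two components instead of wrapping around.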

【００４２】Ｓコア・プロセッサは、全てのプリィミテ
ィブのセットアップ計算を行う。このセットアップ計算
には、一つの頂点から別の頂点への距離を多次元で計算
すること、およびその稜線に沿って勾配を計算すること
が含まれている。三角形については、Ｚ深度の勾配、色
およびＵＶ（テクスチャに対する）も走査線の方向で計
算される。The S-core processor performs setup calculations for all primitives. This setup calculation involves calculating the distance from one vertex to another vertex in multiple dimensions, and calculating the gradient along that edge. For triangles, the gradient of Z depth, color and UV (for texture) are also calculated in the direction of the scan line.

【００４３】図１０描画装置のブロック図さて、図１０において、描画装置１７２Ａを図示するブ
ロック図が示されている。描画装置１７２Ａと１７２Ｂ
は同一であるので、ここでは便宜上、そのうちの一つ
（１７２で示す）について説明する。描画装置１７２
は、３ＤＲＡＭチップの配列の順番を制御する。描画処
理装置１７２は、ピクセル用内部キャッシュおよびビデ
オ出力リフレッシュの論理スケジュリングを行う３ＤＲ
ＡＭを備える。これらの資源は、描画されるピクセルを
３ＤＲＡＭに到達する前に待ち行列に入れること、およ
び、この待ち行列中でピクセルのアドレスを探して３Ｄ
ＲＡＭキャッシュ・ミスを予測することで制御される。FIG. 10: Block diagram of the drawing processor. Referring now to FIG. 10, a block diagram illustrates the drawing processor 172A. Because the drawing processors 172A and 172B are identical, only one of them (denoted 172) is described here for convenience. The drawing processor 172 controls the sequencing of the array of 3DRAM chips. The drawing processor 172 also includes logic for scheduling the 3DRAM internal pixel caches and video output refresh. These resources are controlled by queuing pixels to be rendered before they reach the 3DRAM, and by examining the pixel addresses in this queue to predict 3DRAM cache misses.

【００４４】図に示すように、描画処理装置１７２は、
ＦＤバスにインタ―フェイス３０２を取るＦＤバス・イ
ンタ―フェイス・ブロック３０２を含む。ＦＤバスイン
タ―フェイス・ブロック３０２は、ＣＤＣバス・インタ
―フェイス論理３１２に連結する。ＣＤＣバス・インタ
―フェイス論理３１２は、スクラッチ・バッファ３１４
とダイレクト・ポ―ト装置３１６に連結する。ダイレク
ト・ポ―ト装置３１６は、フレ―ムバッファ・インタ―
フェイス論理３３６から入力を受け、ピクセル・デ―タ
・マルチプレクサ論理３３２に出力を供給する。また、
ＣＤＣバス・インタフェイス論理３１２は、ＤＣバスに
出力を供給するように連結している。ＦＤバス・インタ
―フェイス３０２は、プリィミティブ累積バッファ３０
４に出力を供給する。As shown, the drawing processor 172 includes an FD bus interface block 302 that interfaces to the FD bus. The FD bus interface block 302 is coupled to CDC bus interface logic 312. The CDC bus interface logic 312 is coupled to a scratch buffer 314 and a direct port unit 316. The direct port unit 316 receives input from frame buffer interface logic 336 and provides output to pixel data multiplexer logic 332. The CDC bus interface logic 312 is also coupled to provide output to the DC bus. The FD bus interface 302 provides output to the primitive accumulation buffer 304.

【００４５】また、描画処理装置１７２は、コマンド・
ブロック１４２によって指定された通りにプリィミティ
ブの順番の記録を取るスコアボ―ド論理３１８を含む。
図に示すように、スコアボ―ド論理は、Ｆ＿Ｎｕｍ入力
を受取り、プリィミティブ累積バッファ３０４に出力を
供給する。コマンド・ブロック１４２は、（ユニキャス
トの）プリィミティブがＣＦバス出力ＦＩＦＯの一つに
複写される度に、３ビットのコ―ドを描画処理装置１７
２に供給する。そのコ―ドによって、浮動小数点ブロッ
クの６個の処理装置１５２Ａ〜１５２Ｆのどれがそのプ
リィミティブを受取るかが指定される。また、そのコ―
ドは、そのプリィミティブが順序付けられたものか、順
序付けられたものでないかを表示するビットを含む。全
ての順序付けられたプリミティブは、入ってきた順番で
出て行くことが必要である。順序付けられていないプリ
ミティブは、可能になると常に、プリミティブ累積バッ
ファ３０４から取去られる。例えばテキストやマ―カな
どのように、各入力プリミティブに対して複数のプリミ
ティブを出力するプリミティブがあり、これらのプリミ
ティブは、効率を高めるために順序付けられないモ―ド
に置くことが好ましい。しかし、描画処理装置１７２に
送られる全ての属性は、それが修正するプリミティブに
関して順序付けられた状態を継続しなければならない。
更に、厳密な順番付けが守られなければならない直線や
三角形に関する場合がある。スコアボ―ド論理３１８は
少なくとも６４個のプリミティブを追跡する。スコアボ
―ド論理３１８は、スコアボ―ド論理３１８が一杯にな
りそうになった時に、スコアボ―ド論理３１８がオ―バ
フロ―するのを防ぐために、コマンド・ブロック１４２
に信号を送り返す。The drawing processor 172 also includes scoreboard logic 318, which keeps track of primitive order as specified by the command block 142. As shown, the scoreboard logic receives the F_Num input and provides an output to the primitive accumulation buffer 304. The command block 142 supplies a 3-bit code to the drawing processor 172 each time a (unicast) primitive is copied into one of the CF bus output FIFOs. The code specifies which of the six floating-point processors 152A-152F receives the primitive. The code also includes a bit indicating whether the primitive is ordered or unordered. All ordered primitives must leave in the order in which they arrived. Unordered primitives are removed from the primitive accumulation buffer 304 as soon as they become available. Some primitives, such as text and markers, output multiple primitives for each input primitive, and these are preferably placed in unordered mode for efficiency. However, all attributes sent to the drawing processor 172 must remain ordered relative to the primitives they modify. In addition, there are cases involving lines and triangles in which strict ordering must be maintained. The scoreboard logic 318 tracks at least 64 primitives. When the scoreboard logic 318 is close to full, it signals the command block 142 to prevent the scoreboard from overflowing.

【００４６】上に述べたように、プリミティブ累積バッ
ファ３０４は、ＦＤバス・インタフェイス３０２とスコ
アボ―ド論理３１８とから出力を受取る。プリミティブ
累積バッファ３０４は、稜線ウォ―カ―（ｗａｌｋｅ
ｒ）論理３２２に出力を供給し、その稜線ウォ―カ―論
理３２２が次にスパン・フィル論理３２４に出力を供給
する。スパン・フィル論理３２４は、テクスチャ・ピク
セル処理装置３２６に出力を供給する。また、スパン・
フィル論理３２４は、ダイレクト・ポ―ト装置３１６に
出力を供給する。また、プリミティブ累積バッファ３０
４は、テクスチャ拡大器論理３２８に出力を供給する。
テクスチャ拡大器論理３２８は、テクスチャ・メモリ・
キャッシュ３３０に連結している。テクスチャ・メモリ
・キャッシュ３３０は、テクスチャ・ピクセル処理装置
３２６にデ―タを供給する。また、テクスチャ・メモリ
・キャッシュ３３０は、ダイレクト・ポ―ト装置３１６
にデ―タを供給する。テクスチャ・ピクセル処理装置３
２６とダイレクト・ポ―ト装置３１６は各々ピクセル・
デ―タ・マルチプレクサ論理３３２にデ―タを供給す
る。ピクセル・デ―タ・マルチプレクサ論理３３２は、
ピクセル処理装置３３４にその出力を供給する。ピクセ
ル処理装置３３４は、その出力をフレ―ムバッファ・イ
ンタ―フェイス３３６に供給し、またダイレクト・ポ―
ト装置３１６に出力を供給する。As mentioned above, the primitive accumulation buffer 304 receives outputs from the FD bus interface 302 and the scoreboard logic 318. The primitive accumulation buffer 304 provides an output to edge walker logic 322, which in turn provides an output to span fill logic 324. The span fill logic 324 provides an output to a texture pixel processor 326 and also to the direct port unit 316. The primitive accumulation buffer 304 also provides an output to texture expander logic 328. The texture expander logic 328 is coupled to a texture memory cache 330. The texture memory cache 330 provides data to the texture pixel processor 326 and also to the direct port unit 316. The texture pixel processor 326 and the direct port unit 316 each provide data to the pixel data multiplexer logic 332. The pixel data multiplexer logic 332 provides its output to a pixel processor 334. The pixel processor 334 provides its output to the frame buffer interface 336 and also provides an output to the direct port unit 316.

【００４７】プリミティブ累積バッファ３０４は、完全
なプリミティブを受取るまで、プリミティブ・デ―タを
累積するために用いられる。このようにして、６個の浮
動小数点処理装置１５２Ａ〜１５２Ｆからデ―タが集め
られるとき、最終的には、デ―タは完全なプリミティブ
を形成する。プリミティブ累積バッファ３０４は、一つ
の完全なプリミティブを保持するに十分なメモリ空間に
加えて、パイプラインをスム―ズに流れる状態に維持す
るために第２プリミティブの一部分を蓄えるだけの十分
な記憶容量を含んでいる。６個のプリミティブ累積バッ
ファ３０４は、６個の浮動小数点処理装置１５２Ａ〜１
５２Ｆの各々からデ―タが入って来たときに、一杯にな
る。プリミティブが完全に受取られると直ちに、一般に
は、次のプリミティブがその後に入ってくる。したがっ
て、プリミティブ累積バッファ３０４は、次のプリミテ
ィブから入って来るデ―タで満たされる前に、完全なプ
リミティブをプリミティブ累積バッファ３０４から稜線
ウォ―カ―論理３２２に転送するために、十分な余分な
バッファリングを含んでいる。好ましい実施の形態で
は、プリミティブ累積バッファ３０４は、処理される最
も大きなプリミティブ（三角形）よりも数ワ―ド大き
い。プリミティブ累積バッファ３０４は、６４ビットの
出力を稜線ウォ―カ―論理３２２に供給する。プリミテ
ィブは、スコアボ―ド論理３１８の内容に基づいて、一
度にプリミティブ累積バッファ３０４から取り出され
る。The primitive accumulation buffer 304 is used to accumulate primitive data until a complete primitive has been received. Thus, as data is collected from the six floating-point processors 152A-152F, the data eventually forms a complete primitive. The primitive accumulation buffer 304 includes enough memory to hold one complete primitive, plus enough storage for a portion of a second primitive to keep the pipeline flowing smoothly. The primitive accumulation buffer 304 fills as data arrives from each of the six floating-point processors 152A-152F. As soon as a primitive has been completely received, the next primitive generally follows. The primitive accumulation buffer 304 therefore includes enough extra buffering to transfer the complete primitive to the edge walker logic 322 before the buffer fills with data arriving from the next primitive. In the preferred embodiment, the primitive accumulation buffer 304 is several words larger than the largest primitive (a triangle) to be processed. The primitive accumulation buffer 304 provides a 64-bit output to the edge walker logic 322. Primitives are removed from the primitive accumulation buffer 304 one at a time, based on the contents of the scoreboard logic 318.

【００４８】稜線ウォ―カ―論理３２２は、スパン・フ
ィル装置３２４が容易に処理することができるように、
プリミティブをばらばらに分割する。三角形の場合に
は、稜線ウォ―カ―論理３２２は、二つのそのときのエ
ッジをたどり、最も近いピクセルのサンプル点に対して
調整された一組の垂直スパンを生成する。次に、この一
組の垂直スパンはスパン・フィル装置３２４に送られ
る。また、稜線ウォ―カ―装置３２２は、直線について
も同じ様な調整を行い、三角形スパンに非常によく似た
直線記述をスパン・フィル装置３２４に送る。稜線ウォ
―カ―論理３２２は、これらの調整を行うために使用さ
れる２個の１６×２４乗算器を備える。更に、稜線ウォ
―カ―論理３２２は、他の計算をするために使用される
計数を追跡するいくつかの加算器を含む。三角形と直線
以外のプリミティブは、最も効率的な資源の使用に基づ
いて分割される。ぎざぎざのあるドットもエイリアス除
去がされたドットも、最小の調整（例えば、ぎざぎざの
あるドットに０．５を加える）で、その論理を介してそ
のまま送られる。大きなドットは、個々のピクセルとし
て稜線ウォ―カ―論理３２２を介して供給される。稜線
ウォ―カ―論理３２２は、多角形と長方形を水平スパン
に変換する。稜線ウォ―カ―論理３２２はＢｒｅｓｅｎ
ｈａｍ線を、それに変更を加えることなくスパン・フィ
ル装置３２４に送る。[0048] The edge walker logic 322 breaks primitives into pieces that the span fill unit 324 can easily process. For triangles, the edge walker logic 322 walks the two current edges and generates a set of vertical spans adjusted to the nearest pixel sample point. This set of vertical spans is then sent to the span fill unit 324. The edge walker logic 322 performs similar adjustments for lines, sending the span fill unit 324 a line description that closely resembles a triangle span. The edge walker logic 322 includes two 16 × 24 multipliers used to perform these adjustments, as well as several adders that keep track of counts used in other calculations. Primitives other than triangles and lines are broken up according to the most efficient use of resources. Both jaggy and anti-aliased dots are passed straight through the logic with minimal adjustment (for example, adding 0.5 to a jaggy dot). Big dots are passed through the edge walker logic 322 as individual pixels. The edge walker logic 322 converts polygons and rectangles into horizontal spans. The edge walker logic 322 passes Bresenham lines through to the span fill unit 324 unmodified.

【００４９】スパン・フィル装置３２４は、任意の方向
を向いたスパンに渡って（通常は、三角形と直線につい
て）値の補間を行ない、また、エイリアス除去がされた
直線についてフィルタ重み付けテ―ブルの探索を行う。
三角形スパンの対、長方形や多角形のスパンおよびエイ
リアス除去の行われた直線と点を含む最適化されたプリ
ミティブについて、サイクル毎に二つのピクセルが生成
される。他のプリミティブは全てサイクル毎に一つのピ
クセルを生成する。また、スパン・フィル装置３２４の
最終段はディザリングをおこない、４×４画面空間のデ
ィザ・パタ―ンを用いて１２ビットのカラ―を８ビット
値に変換する。スパン・フィル論理３２４は、テクスチ
ャ・ピクセル処理装置３２６に出力を供給する。The span fill unit 324 interpolates values across arbitrarily oriented spans (usually for triangles and lines) and also performs filter weight table look-ups for anti-aliased lines. Two pixels are generated per cycle for optimized primitives, which include pairs of triangle spans, rectangle and polygon spans, and anti-aliased lines and dots. All other primitives generate one pixel per cycle. The final stage of the span fill unit 324 also performs dithering, converting 12-bit colors to 8-bit values using a 4 × 4 screen-space dither pattern. The span fill logic 324 provides an output to the texture pixel processor 326.
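One way to picture the dithering stage is an ordered dither in Python. The source states only that a 4 × 4 screen-space pattern converts 12-bit colors to 8-bit values; the specific Bayer matrix below, the function name `dither_12_to_8`, and the add-then-truncate scheme are assumptions chosen for illustration and may differ from the actual hardware.

```python
# Hypothetical 4x4 ordered-dither thresholds (a standard Bayer matrix, values 0..15).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def dither_12_to_8(color12: int, x: int, y: int) -> int:
    """Reduce a 12-bit color to 8 bits, using the pixel position to pick a threshold."""
    threshold = BAYER_4X4[y & 3][x & 3]   # pattern repeats every 4 pixels in x and y
    return min(255, (color12 + threshold) >> 4)
```

With this scheme, the 16 outputs over one 4 × 4 tile sum to the original 12-bit value (when no clamping occurs), so the average intensity over the tile preserves the four discarded low-order bits.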

【００５０】テクスチャ・ピクセル処理装置３２６は、
テクスチャ計算を行い、テクスチャ・メモリ・キャッシ
ュ３３０でのテクセルの探索を制御する。テクスチャ・
ピクセル処理装置３２６は、ピクセル処理装置３３４に
よってピクセルに溶け込ませるべき色を生成する。テク
スチャ・ピクセル処理装置３２６は、テクスチャ化され
た三角形を除く他の全てのプリミティブのデ―タをピク
セル・デ―タ・マルチプレクサ論理３３２に送る。The texture pixel processor 326 performs texture calculations and controls the look-up of texels in the texture memory cache 330. The texture pixel processor 326 generates a color to be blended into the pixel by the pixel processor 334. For all primitives other than textured triangles, the texture pixel processor 326 passes the data through to the pixel data multiplexer logic 332.

【００５１】上に述べたように、プリミティブ累積バッ
ファ３０４は、テクスチャ拡大器論理３２８に出力を供
給する。テクスチャ拡大器３２８は、テクスチャ・メモ
リ・キャッシュ３３０で記憶するために受取ったテクス
チャを拡大するように動作する。このように、テクスチ
ャ・メモリ・キャッシュ３３０は、プリミティブ累積バ
ッファ３０４から直接ロ―ドされ、テクセル探索のため
にテクスチャ・ピクセル処理装置に接続されている。テ
クスチャ・メモリ・キャッシュ３３０は、全ての比較的
小さなミップマッピングを含めて、１６×１６のテクセ
ル領域にテクスチャ・マッピングを行うために十分なデ
―タを保持できるように設計されている。テクスチャ・
メモリ・キャッシュ３３０は、現バッファが使用されて
いる間に、それ以外のバッファをロ―ドすることができ
るように、ダブル・バッファ型であることが好ましい。
１６×１６のテクセル領域は、補間が正しく行われるた
めに、実際には１７×１７のアレイとして格納されてい
ることに留意する必要がある。As mentioned above, the primitive accumulation buffer 304 provides an output to the texture expander logic 328. The texture expander 328 operates to expand received textures for storage in the texture memory cache 330. The texture memory cache 330 is thus loaded directly from the primitive accumulation buffer 304, and is coupled to the texture pixel processor for texel look-ups. The texture memory cache 330 is designed to hold enough data to texture map a 16 × 16 texel area, including all of the smaller mip maps. The texture memory cache 330 is preferably double buffered, so that one buffer can be loaded while the other is in use. Note that the 16 × 16 texel area is actually stored as a 17 × 17 array so that interpolation can be performed correctly.
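The need for the extra row and column can be illustrated with a small Python sketch. It assumes bilinear interpolation, which the source does not name explicitly (it says only "interpolation"), and the function name `bilinear` and the list-of-lists texture layout are hypothetical.

```python
def bilinear(texels, u: float, v: float) -> float:
    """Bilinearly interpolate a 17x17 texel array at (u, v), with 0 <= u, v < 16."""
    i, j = int(u), int(v)            # top-left texel of the 2x2 neighborhood
    fu, fv = u - i, v - j            # fractional position inside that cell
    top = texels[j][i] * (1 - fu) + texels[j][i + 1] * fu
    bottom = texels[j + 1][i] * (1 - fu) + texels[j + 1][i + 1] * fu
    return top * (1 - fv) + bottom * fv
```

Sampling anywhere in the last cell (for example at u = v = 15.5) reads texel index 16 in both directions, which exists only because the array is 17 × 17 rather than 16 × 16.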

【００５２】上に述べたように、ピクセル・デ―タ・マ
ルチプレクサ論理３３２は、テクスチャ・ピクセル処理
装置３２６とダイレクト・ポ―ト装置３１６から入力デ
―タを受取る。ピクセル・デ―タ・マルチプレクサ論理
３３２は、スパン・フィル装置３２４から来るピクセル
とＣＤバスから来るピクセルとの間を調停する。ＣＤバ
スからのピクセルが、常に優先される。ピクセル・デ―
タ・マルチプレクサ論理３３２はその出力をピクセル処
理装置３３４に供給する。As mentioned above, the pixel data multiplexer logic 332 receives input data from the texture pixel processor 326 and the direct port unit 316. The pixel data multiplexer logic 332 arbitrates between pixels coming from the span fill unit 324 and pixels coming from the CD bus. Pixels from the CD bus always have priority. The pixel data multiplexer logic 332 provides its output to the pixel processor 334.

【００５３】ピクセル処理装置３３４は、３ＤＲＡＭ１
９２、１９４での論理オペレーションのために、混合、
エアリィアス除去、深度キュ―イングおよびセットアッ
プを行う。また、ピクセル処理装置３３４は、線パタ―
ン化、型板パタ―ン化、Ｖポ―ト・クリッピングその他
のオペレ―ションのためのピクセル書き込みを禁止する
ように動作可能な論理を備えている。ピクセル処理装置
３３４は、フレ―ムバッファ・インタ―フェイス３３６
に出力を供給する。The pixel processor 334 performs blending, anti-aliasing, depth cueing, and set-up for logical operations in the 3DRAM 192 and 194. The pixel processor 334 also includes logic operable to prevent pixel writes for operations such as line patterning, stencil patterning, and V-port clipping. The pixel processor 334 provides an output to the frame buffer interface 336.

【００５４】フレ―ムバッファ・インタフェ―ス３３６
は、３ＤＲＡＭメモリ１９２Ａ，１９２Ｂからピクセル
を読み書きするために必要な論理を備えている。フレ―
ムバッファ・インタ―フェイス３３６は、３ＤＲＡＭチ
ップのレベル１（Ｌ１）とレベル２（Ｌ２）キャッシュ
を管理する。これは、他のピクセルのアクセスが生じて
いる間に、書き込まれるべきピクセルを予測し、必要と
されるキャッシュにペ―ジングすることで行われる。次
に、図に示すように、フレ―ムバッファ・インタ―フェ
イス３３６は、３ＤＲＡＭメモリ１９２Ａ、１９２Ｂ、
１９４Ａ、１９４Ｂに連結している。The frame buffer interface 336 includes the logic needed to read and write pixels from and to the 3DRAM memories 192A, 192B. The frame buffer interface 336 manages the level 1 (L1) and level 2 (L2) caches of the 3DRAM chips. It does so by looking ahead to the pixels that are to be written and paging in the needed cache while other pixel accesses are occurring. As shown, the frame buffer interface 336 is in turn coupled to the 3DRAM memories 192A, 192B, 194A, and 194B.

【００５５】Ｚ値の処理さて、図１１において、グラフィックス処理用のパイプ
ラインにおけるＺ値の処理方法４００のフロ―チャ―ト
が図示されている。Ｚ値は、システムＣＰＵである中央
処理装置１０２によってシステム・バス１０４上でグラ
フィックス・アクセラレ―タ１１２に転送されるグラフ
ィックス・デ―タの一部である。このＺ値は、ディスプ
レイ装置８４に三次元物体を描画するために使用可能な
三角形プリミティブの頂点のＺ座標を表す。転送された
Ｚ値はモデル空間座標であり、図１２のＡに示されるＩ
ＥＥＥ浮動小数点規格５００のような第１浮動小数点フ
ォ―マットで表現される。図に示すように、ＩＥＥＥフ
ォ―マット５００は、符号ビット５０２、指数部５０４
および仮数部５０６を含む。これは「隠くされた１」の
位置５０８を含む。他の浮動小数点フォ―マットも使用
することができる。 Processing of Z values. Referring now to FIG. 11, a flowchart illustrates a method 400 of processing Z values in a graphics processing pipeline. The Z values are part of the graphics data transferred to the graphics accelerator 112 over the system bus 104 by the central processing unit 102, which is the system CPU. A Z value represents the Z coordinate of a vertex of a triangle primitive that can be used to render a three-dimensional object on the display device 84. The transferred Z values are model space coordinates and are represented in a first floating-point format, such as the IEEE floating-point standard 500 shown in FIG. 12A. As shown, the IEEE format 500 includes a sign bit 502, an exponent portion 504, and a mantissa portion 506. It also includes a "hidden 1" bit position 508. Other floating-point formats may also be used.
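For readers who want to see the fields of format 500 concretely, the following Python sketch splits an IEEE single-precision value into its sign bit, 8-bit biased exponent, and 23-bit stored mantissa. The function name `float_fields` is hypothetical; the field widths are those of the standard IEEE single-precision layout, which the text gives as an example of the first floating-point format.

```python
import struct


def float_fields(x: float):
    """Return (sign, biased_exponent, fraction) of x as an IEEE single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 stored bits; the leading "hidden 1" is implicit
    return sign, exponent, fraction
```

For instance, 0.75 is 1.5 × 2^-1, so its biased exponent is 126 and only the top stored mantissa bit is set.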

【００５６】グラフィックス・アクセラレ―タ１１２が
受取るモデル空間座標は同質なクリッピング座標に変換
され、Ｆコア・プロセッサ２０２内で正規化されて、Ｚ
値はＺ／Ｗとして表されるようになる。この値は、−１
から＋１の範囲に存在する。クリッピングの後、（Ｘお
よびＹ座標と共に）所定のプリミティブのＺ値は、処理
方法４００のステップ４０２の変換オペレ―ションを経
て画面空間にマッピングされる。画面空間のＸおよびＹ
座標は、ディスプレイ装置８４上のピクセル値に対応す
る。一方、Ｚ座標は画面に垂直な想像上のｚ軸に沿った
深度に対応する。原特許出願で開示されたように、Ｚ値
は、Ｗ_f／Ｗと表されるのが好ましい。ここで、Ｗ_fは前
方クリッピング面でのＷを表す。上で説明したように、
このようにして変換されたＺ値は、前方クリッピング面
での１．０（含む）と物体が後方クリッピング面に近づ
くときの０．０（含まない）との間に広がっている。従
来技術のシステムとは異なり、この変換では、ゼロに近
い数（本発明の一つの実施の形態による後方クリッピン
グ面に近い）について重大な精度の喪失を生じない。ス
テップ４０２の後、Ｚ値は浮動小数点フォ―マット５０
０で表されることに注意する必要がある。The model space coordinates received by the graphics accelerator 112 are transformed to homogeneous clipping coordinates and normalized within the F-core processor 202, so that the Z value is expressed as Z/W. This value lies in the range -1 to +1. After clipping, the Z values of a given primitive (along with the X and Y coordinates) are mapped to screen space by the transform operation of step 402 of method 400. The screen space X and Y coordinates correspond to pixel values on the display device 84, while the Z coordinate corresponds to depth along an imaginary z-axis perpendicular to the screen. As disclosed in the parent patent application, the Z value is preferably expressed as W_f/W, where W_f is the value of W at the front clipping plane. As explained above, Z values transformed in this way range from 1.0 (inclusive) at the front clipping plane toward 0.0 (exclusive) as an object approaches the back clipping plane. Unlike prior-art systems, this transformation does not cause a significant loss of precision for numbers near zero (which, in one embodiment of the invention, lie near the back clipping plane). Note that after step 402 the Z values are still represented in floating-point format 500.

【００５７】次に、ステップ４０４と４０６において、
最も大きな指数部５０４を持つ所定のプリミティブのＺ
値が決定され、そのプリミティブの共通のＺ指数が生成
される。上に説明したように、浮動小数点Ｚバッファの
使用には、表示ビットが付加されるために、実現するた
めには一層高価になるという潜在的な不都合がある。こ
の欠点は、所定のプリミティブの各Ｚ値について単一の
指数を使用することによって、本発明の範囲内では大部
分が小さなものとなる。これが可能なのは、所定のプリ
ミティブの全ての頂点が全てのＺ値について同じ指数を
持つこと（または、指数が非常に近いこと）が統計的に
あり得ることだからである。密接に関係づけられていな
い指数部を持つ多角形は、一般には、貧弱な図形プログ
ラミングの慣例に起因する。Next, in steps 404 and 406, the Z value of the given primitive having the largest exponent portion 504 is determined, and a common Z exponent is generated for the primitive. As explained above, floating-point Z-buffering has the potential drawback of being more expensive to implement because of the additional bits required. Within the scope of the present invention, this drawback is largely mitigated by using a single exponent for all Z values of a given primitive. This is possible because, statistically, all vertices of a given primitive are likely to have the same (or very nearly the same) exponent for their Z values. Polygons whose exponents are not closely related generally result from poor graphics programming practices.

【００５８】ステッブ４０４は、図１１と１３を参照に
して説明されるいくつかのサブステップを含む。図１３
は、Ｆコア・プロセッサ２０２内部の指数比較器装置６
００を図示する。指数比較器装置６００は、現Ｚ値６０
２を受取るように連結された現Ｚ指数レジスタ６１０を
含む。レジスタ６１０の出力は、最大Ｚ指数レジスタ６
２０、加算器６２２および比較器６４０に連結されてい
る。また、比較器６４０は、レジスタ６２０の出力を受
取るように接続されており、比較信号６４２Ａ，６４２
Ｂを制御装置６５０に伝える。次に、制御装置６５０
は、負荷指数信号６５２をレジスタ６２０だけでなく共
通Ｚ指数レジスタ６３０にも伝える。レジスタ６２０
は、さらにクリア信号６０４を受取る。レジスタ６３０
は、加算器６２２の出力に連結されたクランプ・マルチ
プレクサ６２４から入力を受取る。その加算器６２２は
もう一つの入力として減算定数６２６を受取る。Step 404 includes several sub-steps, described with reference to FIGS. 11 and 13. FIG. 13 illustrates an exponent comparator unit 600 within the F-core processor 202. The exponent comparator unit 600 includes a current Z exponent register 610 coupled to receive a current Z value 602. The output of register 610 is coupled to a maximum Z exponent register 620, an adder 622, and a comparator 640. The comparator 640 is also coupled to receive the output of register 620, and conveys compare signals 642A and 642B to a control unit 650. The control unit 650 in turn conveys a load exponent signal 652 to the common Z exponent register 630 as well as to register 620. Register 620 additionally receives a clear signal 604. Register 630 receives its input from a clamp multiplexer 624 coupled to the output of the adder 622. The adder 622 receives a subtraction constant 626 as its other input.

【００５９】ステップ４０４Ａで、最大Ｚ指数レジスタ
６２０は、クリア信号６０４をアサ―トすることによっ
てクリアされる。レジスタ６２０は、現プリミティブの
現最大Ｚ指数値を保持するように設計されているので、
このレジスタは現プリミティブのＺ値を受取る前にクリ
アされる。好ましい実施の形態においては、この動作
は、設定された特別の状態ビツトで「ｆｉｘｚ」命令を
実行することで行われる。In step 404A, the maximum Z exponent register 620 is cleared by asserting the clear signal 604. Because register 620 is designed to hold the current maximum Z exponent value for the current primitive, it is cleared before the Z values of the current primitive are received. In the preferred embodiment, this is done by executing a "fixz" instruction with a special state bit set.

【００６０】次に、ステップ４０４Ｂにおいて、現Ｚ値
６０２はレジスタ６１０で受取られる。レジスタ６１０
と６２０の内容は、現Ｚ値６０２がレジスタ６２０の内
容よりも大きいかどうかを決定するために比較器６４０
に進められる。比較器６４０は、比較信号６４２Ａ、６
４２Ｂを制御装置６５０に伝える。制御装置６５０は
（ステップ４０４Ｃで）、６２０の内容をレジスタ６１
０の内容で置換するべきかどうかを決定する。所定のプ
リミティブのために受取られた第１Ｚ値に、常にレジス
タ６２０の内容は置換される。Next, in step 404B, the current Z value 602 is received into register 610. The contents of registers 610 and 620 are conveyed to the comparator 640 to determine whether the current Z exponent is greater than the contents of register 620. The comparator 640 conveys the compare signals 642A and 642B to the control unit 650. The control unit 650 determines (in step 404C) whether the contents of register 620 should be replaced with the contents of register 610. The contents of register 620 are always replaced by the first Z value received for a given primitive.

【００６１】置換が必要であれば、ステップ４０４Ｄ
で、現Ｚ指数レジスタ６１０の内容が最大Ｚ指数レジス
タ６２０に書き込まれる。レジスタ６１０の内容が、比
較器６４０に伝えられてステッブ４０４Ｂで比較動作を
行う時、レジスタ６１０の値も、レジスタ６２０の入力
に伝えられる（純論理的に）。（比較信号６４２に基づ
いて）制御装置６５０が置換が必要であると決定する
と、負荷指数信号６５２がアサ―トされて、レジスタ６
１０の現値がレジスタ６２０にコピ―される。このよう
にして、現Ｚ値は現最大指数値になる。好ましい実施の
形態では、動作は、ＳＲＡＭ１５３に格納されている
「ｆｉｘｇｔｚ」命令によって引き起こされ、Ｆコア
・プロセッサ２０２内の回路で実行される。If a replacement is needed, the contents of the current Z exponent register 610 are written to the maximum Z exponent register 620 in step 404D. When the contents of register 610 are conveyed to the comparator 640 for the compare operation of step 404B, the value in register 610 is also conveyed (purely through logic) to the input of register 620. If the control unit 650 determines (based on the compare signals 642) that a replacement is needed, the load exponent signal 652 is asserted and the current value of register 610 is copied into register 620. The current Z exponent thus becomes the current maximum exponent value. In the preferred embodiment, this operation is triggered by a "fixgtz" instruction stored in the SRAM 153 and executed by circuitry within the F-core processor 202.

【００６２】共通Ｚ指数値は、ステップ４０６で置換動
作と同時に計算される。共通Ｚ指数値（レジスタ６３０
に格納されている）は、レジスタ６２０の内容に関係付
けられ、表現に使用されるビット数は少なくなる。例え
ば、図１２のＡに示されるように、ＩＥＥＥフォ―マッ
ト５００の指数部５０４は８ビットである。このビット
数は浮動小数点の全体の範囲を表現するために必要とさ
れるが、０と１の間の数を表現するために使用されるビ
ットの数は少なくなる。さらに、これは、Ｚ値を表現す
るために使用される間隔の大きさを維持する。The common Z exponent value is calculated in step 406, concurrently with the replace operation. The common Z exponent value (stored in register 630) is related to the contents of register 620 but is represented using fewer bits. For example, as shown in FIG. 12A, the exponent portion 504 of the IEEE format 500 is 8 bits. That many bits are needed to represent the full floating-point range, but fewer bits suffice to represent numbers between 0 and 1. Moreover, this preserves the size of the intervals used to represent Z values.

【００６３】本発明の好ましい実施の形態においては、
０と１５の間の値を可能にする共通Ｚ指数値を符号化す
るために、４ビットが使用される。これは、１５の正規
化された表現と１の正規化されない表現を許容する。１
５の共通Ｚ指数値は、１．０より少ない数の最大指数値
５０４に相当する。これは、フォ―マット５００では１
２６である。このようにして、現Ｚ値（例えば、値６０
２）から４ビット表現への変換は、一定値を引算するこ
とを含む。好ましい実施の形態では、１１１を引くこと
（または、代わりに、−１１１を加算すること）によっ
て達成されている。In the preferred embodiment of the invention, four bits are used to encode the common Z exponent value, allowing values between 0 and 15. This permits 15 normalized representations and one unnormalized representation. A common Z exponent value of 15 corresponds to the largest value of the exponent portion 504 for a number less than 1.0, which is 126 in format 500. Thus, the conversion from the exponent of the current Z value (e.g., value 602) to the 4-bit representation involves subtracting a constant. In the preferred embodiment, this is accomplished by subtracting 111 (or, equivalently, adding -111).

【００６４】ステップ４０４Ｂで比較が行われる度に、
レジスタ６１０の値は加算器６２２に伝えられる。ま
た、加算器６２２は減算定数６２６を受取る。本発明の
好ましい実施の形態では、この定数は０ｘ９１（十進数
では、−１１１）に等しい。次に、加算器６２２の出力
は、クランプ・マルチプレクサ６２４に伝えられる。加
算器６２２の出力が負の結果となるときには、クランプ
・マルチプレクサ６２４はその結果をゼロにクランプす
る。そうでないときは、加算器６２２の出力は変らな
い。両方の場合において、クランプ・マルチプレクサ６
２４の出力は共通Ｚ指数レジスタ６３０の入力に伝えら
れる。（負荷指数信号６５２がアサ―トされることで指
示される）置換が行われないときには、マルチプレクサ
６２４の出力はレジスタ６３０に書かれない。ステップ
４０４Ｄで置換動作が行われると、その時には、ステッ
プ４０６で対応する共通Ｚ指数値がレジスタ６３０に格
納される。Each time a comparison is performed in step 404B, the value of register 610 is conveyed to the adder 622. The adder 622 also receives the subtraction constant 626. In the preferred embodiment of the invention, this constant equals 0x91 (-111 in decimal). The output of the adder 622 is then conveyed to the clamp multiplexer 624. If the output of the adder 622 is negative, the clamp multiplexer 624 clamps the result to zero; otherwise the adder output passes through unchanged. In either case, the output of the clamp multiplexer 624 is conveyed to the input of the common Z exponent register 630. If no replacement occurs (a replacement being indicated by assertion of the load exponent signal 652), the output of multiplexer 624 is not written to register 630. When a replace operation is performed in step 404D, the corresponding common Z exponent value is stored in register 630 in step 406.
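The arithmetic performed by the adder 622 and the clamp multiplexer 624 can be modeled in a few lines of Python. This is a behavioral sketch only: the function name `common_z_exponent` is hypothetical, and the input is assumed to be the biased exponent of a Z value below 1.0 (that is, at most 126), so that the result fits in four bits.

```python
# 0x91 is 145; interpreted as an 8-bit two's complement number it is 145 - 256 = -111.
SUBTRACT_CONSTANT = 0x91 - 0x100


def common_z_exponent(biased_exponent: int) -> int:
    """Map an 8-bit biased exponent to the 4-bit common Z exponent, clamping at zero."""
    result = biased_exponent + SUBTRACT_CONSTANT   # models adder 622
    return max(0, result)                          # models clamp multiplexer 624
```

Under these assumptions, the largest exponent below 1.0 (126) maps to 15, and any exponent of 111 or less maps to 0, the single unnormalized code.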

【００６５】判定ボックス４０４Ｅに示されるように、
上述のステップはプリミティブの残りの頂点（三角形プ
リミティブの一般的な場合では、さらに２度）について
も実行される。最後のＺ値が比較された（そして、多
分、レジスタ６２０に書き込まれた）後、レジスタ６２
０には、現プリミティブのＺ値について指数部５０４の
最大値が保持されている。同様に、レジスタ６３０は、
レジスタ６２０の値に対応する共通Ｚ指数値を格納して
いる。As indicated by decision box 404E, the steps above are also performed for the remaining vertices of the primitive (twice more in the common case of a triangle primitive). After the last Z value has been compared (and possibly written to register 620), register 620 holds the largest value of the exponent portion 504 among the Z values of the current primitive. Likewise, register 630 holds the common Z exponent value corresponding to the value in register 620.
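Sub-steps 404A through 404E, together with step 406, can be sketched as a loop over the vertices of one primitive. This Python model is a simplification under stated assumptions: the names `biased_exponent` and `primitive_exponents` are hypothetical, Z values are taken to be IEEE single-precision numbers, and the registers 610, 620, and 630 are represented by ordinary variables.

```python
import struct


def biased_exponent(z: float) -> int:
    """Extract the 8-bit biased exponent of z as an IEEE single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", z))[0]
    return (bits >> 23) & 0xFF


def primitive_exponents(z_values):
    """Return (maximum exponent, common Z exponent) for the Z values of one primitive."""
    max_exp = None                      # register 620, cleared in step 404A
    common = 0                          # register 630
    for z in z_values:
        current = biased_exponent(z)    # register 610, loaded in step 404B
        # Step 404C: the first value always replaces; later ones only if larger.
        if max_exp is None or current > max_exp:
            max_exp = current                   # step 404D
            common = max(0, current - 111)      # step 406, written on the same load
    return max_exp, common
```

For a triangle with Z values 0.75, 0.3, and 0.01, the exponents are 126, 125, and 120, so the sketch reports a maximum exponent of 126 and a common Z exponent of 15.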

【００６６】ステップ４０８において、所定のプリミテ
ィブのＺ値が固定小数点フォ―マット（例えば図１２の
Ｂに示すフォ―マット５１０）に変換される。図に示す
ように、フォ―マット５１０は、符号ビット５１２、整
数ビット５１４および仮数部５１６を含む。これは、ｓ
１．３０フォ―マット（符号ビット、１整数ビットおよ
び３０分数ビット）と言われている。仮数部５１６はス
テップ４０６で計算される共通Ｚ指数値に対してスケー
リングされるので、固定小数点フォ―マット５１０には
指数部がない。好ましい実施の形態では、各Ｚ値は、下
記のアルゴリズムを実施するコ―ド列で、Ｆコア・プロ
セッサ２０２の内部で正規化される。In step 408, the Z values of the given primitive are converted to a fixed-point format (for example, format 510 shown in FIG. 12B). As shown, format 510 includes a sign bit 512, an integer bit 514, and a mantissa portion 516. This is referred to as s1.30 format (a sign bit, one integer bit, and 30 fractional bits). Fixed-point format 510 has no exponent portion, because the mantissa portion 516 is scaled relative to the common Z exponent value calculated in step 406. In the preferred embodiment, each Z value is normalized within the F-core processor 202 by a code sequence implementing the following algorithm.

[0067]

[Table 1]

[0068] Thus, the algorithm takes the difference between exponent 504 of the maximum Z value and exponent 504 of the current Z value. This difference is stored in the variable "shift". Next, the hidden 1 bit is ORed into the floating-point mantissa to form a 24-bit mantissa value. This 24-bit value is then shifted left by 6 bits (zero-padding the remainder of the 30-bit mantissa field 516). Note that these six extra bits increase the precision of subsequent fixed-point processing. Finally, the 30-bit value is shifted right by the appropriate number of bits, so that mantissa value 516 is normalized to the common Z exponent of the primitive. In other embodiments, the floating-point to fixed-point conversion may be performed differently, including by dedicated hardware circuitry.
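The conversion steps described in this paragraph can be sketched as follows. This is an illustrative reconstruction from the prose, not the code sequence of Table 1. It assumes IEEE 754 single-precision input (8-bit exponent, 23-bit stored mantissa) and non-negative Z values, and it treats an input with a zero exponent field as zero; the function names are hypothetical.

```python
import struct


def float_fields(z):
    """Split z (assumed IEEE 754 single precision) into biased exponent and 23-bit fraction."""
    bits = struct.unpack(">I", struct.pack(">f", z))[0]
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF


def to_common_exponent_mantissa(z, max_exp):
    """Return the 30-bit mantissa of z scaled to the primitive's largest exponent."""
    exp, frac = float_fields(z)
    if exp == 0:
        return 0                      # assumption: zero and denormal inputs map to zero
    shift = max_exp - exp             # difference between maximum and current exponent
    mant24 = frac | (1 << 23)         # OR in the hidden 1 bit -> 24-bit mantissa
    mant30 = mant24 << 6              # six extra low-order bits of precision
    return mant30 >> shift            # align to the common exponent
```

With a maximum exponent of 126 (the exponent of 0.5), the vertex with Z = 0.5 keeps its leading 1 in bit 29 of the result, while a vertex with Z = 0.25 is shifted one place further right.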

[0069] With the Z values converted to fixed point in step 408, the common Z exponent value is forwarded along with the primitive for further processing in step 410. As shown in FIG. 14, the common Z exponent value is stored in Z exponent field 702 of header word 700. Header word 700 is written by F-core processor 202 and conveyed to S-core processor 206 via the FS buffer shown in FIG. 9. Header word 700 propagates through the pipeline as the corresponding primitive is processed, so the Z exponent value in field 702 is available for the computations described below.

[0070] After the common Z exponent has been generated and forwarded with the primitive, fixed-point processing is performed on the primitive in step 412. As shown in FIG. 11, step 412 includes sub-steps 412A (setup), 412B (edge walking), and 412C (span filling). These operations are performed as described above. In the preferred embodiment, setup is performed by S-core processor 206 of processing unit 152, edge interpolation is performed by edge walker unit 322 of drawing processing unit 172, and span interpolation is performed by span fill unit 324 of drawing processing unit 172. These steps are advantageously performed with greater precision using the s1.30 format 510 shown in FIG. 12B. Note that in other embodiments, operations different from those shown here may be performed within step 412.

[0071] FIG. 15 shows a block diagram of pixel processing unit 334. Following the span interpolation of step 412C, primitive data is transferred to processing unit 334, which performs additional operations before the data is transferred to frame buffer 100 via frame buffer interface 336. As shown, the common Z exponent value computed as described above is conveyed on bus 810 to the odd and even YZ pipelines 802A and 802B. Similarly, the fixed-point Z values (in s1.30 format) are transferred to pipelines 802A and 802B on buses 812A and 812B, respectively. Although FIG. 15 shows separate pipelines (odd and even) for pixel processing of Z values, in other embodiments the Z values may be processed using a single pipeline. FIG. 15 also shows depth cueing pipelines 804A and 804B, which are connected to pipelines 802A and 802B by fixed-point buses 814A and 814B, respectively.

[0072] As described below, pipeline 802 performs two types of conversion on the s1.30-format data conveyed on buses 810 and 812. First, the data is converted to an equivalent fixed-point representation, such as format 530 shown in FIG. 12D. This conversion corresponds to step 414 shown in FIG. 11. The fixed-point data represented in format 530 is then used in step 418 to perform depth cueing and Z-clip determinations.

[0073] Second, pipeline 802 is configured to convert the incoming data to a floating-point format, if required, before the data is transferred to frame buffer 100. This conversion (step 416) is performed when Z values are selected to be represented in floating-point notation within frame buffer 100. One example of such a representation is format 520 shown in FIG. 12C. The description of FIG. 16 assumes that the cp_zf signal 916 is asserted, meaning that floating-point Z values are required. The Z values produced by the fixed-to-floating conversion of step 416 are then used in step 420 to perform hidden surface removal operations in frame buffer 100.

[0074] FIG. 16 shows a block diagram of pipeline 802. Only the portion that processes Z pixel values is shown. Pipeline 802 is representative of pipelines 802A and 802B. In one embodiment, pipeline 802 is divided into stages 900A through 900H; in other embodiments, pipeline 802 may be divided into more or fewer stages.

[0075] The first stage of Z pipeline 802 (stage 900A) includes multiplexer 906, which is configured to receive Z pixel data from two sources: constant Z value 902, and the Z value from pixel data multiplexer 332 (consisting of the common Z exponent value on bus 810 and the fixed-point Z value on bus 812). In processing method 400, multiplexer 906 selects and forwards the data on buses 810/812 in response to control signal 904. This data is conveyed on pixel output bus 907.

[0076] Pixel output bus 907 is coupled to inverter 908 and barrel shifter 912. In step 414, these units, together with logic gate 910, convert the data on bus 907 to fixed-point format 530. This step involves shifting mantissa 516 of format 510 by an amount equal to the complement of the common Z exponent value. Consequently, mantissa 532 of format 530, unlike mantissa 516 of format 510, need not be associated with an exponent value corresponding to the common Z exponent value generated in step 406.

[0077] The four most significant bits of the data on bus 907 (the common Z exponent value) are conveyed to inverter 908. The inverted bits are then conveyed to logic gate 910, where they are modified by control signal 916. The output of logic gate 910 is thus the shift count for the value of mantissa 516 supplied to barrel shifter 912. For example, if the common Z exponent value is 15 (the maximum value), the shift count produced by inverter 908 is zero, and shifter 912 performs no shift. Conversely, a common Z exponent value of 0 produces a shift count of 15.
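The relationship between the common Z exponent value and the shift count can be sketched as follows. The sketch models only the inverter path: the effect of control signal 916 in logic gate 910 is omitted, the shift is assumed to be a right shift, and the function names are hypothetical.

```python
EXP_BITS = 4
EXP_MASK = (1 << EXP_BITS) - 1


def shift_count(common_exp):
    """Bitwise complement of the 4-bit common Z exponent, i.e. 15 - common_exp."""
    return ~common_exp & EXP_MASK


def to_absolute_fixed(mantissa, common_exp):
    """Shift a common-exponent mantissa so it no longer depends on the exponent.

    Assumption: the barrel shifter shifts right, so a smaller common
    exponent yields a smaller absolute fixed-point value.
    """
    return mantissa >> shift_count(common_exp)
```

A common exponent of 15 leaves the mantissa untouched, and each step down in the exponent halves the resulting fixed-point value.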

[0078] Barrel shifter 912 produces the 28-bit fixed-point mantissa 532 from the data conveyed by multiplexer 906. This data is latched at the end of the first pipeline stage for delivery to the Z-clip test hardware in the next clock cycle. In addition, the most significant mantissa bits on bus 907 are conveyed to leading-zero encoding unit 914, which reports the number of leading zeros in the mantissa value on bus 907 to the hardware of pipeline stage 900B.

[0079] In pipeline stage 900B, the Z-clip determination is performed, as well as the first part of converting the data on input bus 812 to a normalized floating-point number. The Z-clip determination for the data transferred from the previous pipeline stage 900A is performed by comparators 922. Comparator 922A compares the Z pixel value conveyed on p1_zfix bus 814 with zmin value 920. Similarly, comparator 922B compares the value on bus 814 with zmax value 918. If the Z pixel value lies outside either of these values, clipping in the Z direction should be performed. In that case, logic gate 924 asserts its output, which is conveyed as zp_zclip signal 926 in the subsequent pipeline stage 900C. In addition, the fixed-point value on bus 814 is conveyed to depth cueing pipeline 804 for further processing.
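The Z-clip determination amounts to a range test, which can be sketched as follows. The sketch assumes strict comparisons (a value equal to zmin or zmax is not clipped) and assumes that logic gate 924 combines the two comparator outputs with a logical OR; the function name is hypothetical.

```python
def z_clip(z_fix, zmin, zmax):
    """Return True when the fixed-point Z value falls outside [zmin, zmax]."""
    below_min = z_fix < zmin        # role of comparator 922A
    above_max = z_fix > zmax        # role of comparator 922B
    return below_min or above_max   # role of logic gate 924 (assumed OR)
```

Because the values compared are already in an exponent-free fixed-point form, both tests are plain integer comparisons.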

[0080] Floating-point normalization of the Z-buffer value also begins in pipeline stage 900B. To convert a Z value from format 510 to format 520, the most significant 1 bit of mantissa 516 must be shifted into hidden-1 position 524. This process is begun by unit 914, which counts the number of leading zeros in mantissa 516 of the given Z value.

[0081] Subtractor 936 is coupled to receive this leading-zero count for the Z value on bus 928 and the common Z exponent value (the four most significant bits of bus 930). Subtractor 936 is configured to subtract the number of leading zeros from the common exponent value and to convey the result on difference bus 940. If the result is non-negative, the value of mantissa 516 must be shifted by an amount equal to the number of leading zeros plus one in order to produce the value of mantissa 526 (which is normalized). If the result is negative (the number of leading zeros is greater than the common Z exponent), the value of mantissa 516 conveyed on bus 907 must instead be shifted by the common Z exponent value (although this yields an unnormalized number).

[0082] Subtractor 936 therefore conveys two outputs: difference bus 940 (the result of the subtraction) and carry-out signal 942 (asserted when the difference is negative). Difference bus 940 is conveyed to logic gate 938. If the value on difference bus 940 is non-zero, logic gate 938 conveys an asserted signal 934 to adder 932. Adder 932 also receives the number of leading zeros on bus 928. Thus, if the value on difference bus 940 is non-zero, adder 932 conveys the number of leading zeros plus one to multiplexer 948. The other input of multiplexer 948 receives the common Z exponent value from bus 930.

[0083] The output of multiplexer 948 (the shift count) is thus selected according to the value of carry-out signal 942. If the number of leading zeros exceeds the common Z exponent value, the common Z exponent value is sent to multiplexer 956; otherwise, the value on bus 944 is conveyed. Multiplexer 956 conveys the selected one of these two values to pipeline stage 900C. Note that if signal 916 is not asserted, the shift count conveyed to stage 900C is 4. This has the effect of passing the value on bus 907 through unchanged, because these four "lost" bits are re-concatenated as described below.

[0084] Multiplexer 950 is used to select which value is conveyed to multiplexer 952 as the four most significant bits of the Z value being processed in pipeline 802. When floating-point mode is selected for Z values (signal 916 asserted), these four bits form the value of floating-point exponent 522 shown in FIG. 12C. When floating-point mode is not selected, these four bits become the most significant bits of mantissa 532 shown in FIG. 12D. The value selected by multiplexer 950 also depends on carry-out signal 942. If carry-out signal 942 is active, zero constant 946 is conveyed to multiplexer 952, which sets the exponent value to zero. If carry-out signal 942 is inactive, the value on difference bus 940 computed by subtractor 936 is conveyed as the value of exponent 522.

[0085] The value conveyed from multiplexer 950 is transferred to pipeline stage 900C via multiplexer 952. In pipeline stage 900C, the output of multiplexer 952 becomes the four most significant bits of the Z value. As described above, if the Z value is to be represented in floating-point format 520, these four bits form exponent value 522. Pipeline stage 900C also completes the normalization of Z values being converted to floating-point format 520. This is done by barrel shifter 958, which shifts the mantissa value on bus 930 left to produce the value of mantissa 526. This mantissa value is concatenated with the exponent value in register 960. Fixed-point Z values (signal 916 inactive) are conveyed to register 960 unchanged.
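The fixed-to-floating conversion performed across stages 900B and 900C (leading-zero count, subtraction, shift-count selection, exponent selection and left shift) can be sketched as follows. The sketch is a software model under stated assumptions, not the circuit itself: it assumes a non-negative 30-bit fractional mantissa, keeps the upper 28 bits after the left shift as the output mantissa, and uses hypothetical function names.

```python
FRAC_BITS = 30   # fractional bits of the s1.30 input mantissa
MANT_BITS = 28   # assumed width of the output mantissa


def leading_zeros(frac30):
    """Number of leading zeros in a 30-bit fraction (30 for an all-zero input)."""
    return FRAC_BITS - frac30.bit_length()


def fixed_to_float(frac30, common_exp):
    """Return (exponent, mantissa) for a mantissa scaled to a 4-bit common exponent."""
    lz = leading_zeros(frac30)
    diff = common_exp - lz                 # subtractor output
    if diff < 0:                           # carry-out asserted: unnormalized result
        shift, exp4 = common_exp, 0
    elif diff == 0:                        # no +1 from the adder; exponent is zero
        shift, exp4 = lz, 0
    else:                                  # normalized: leading 1 becomes the hidden bit
        shift, exp4 = lz + 1, diff
    shifted = (frac30 << shift) & ((1 << FRAC_BITS) - 1)
    return exp4, shifted >> (FRAC_BITS - MANT_BITS)
```

For example, a mantissa of `1 << 29` with a common exponent of 15 has no leading zeros, so the result has exponent 15 and an all-zero mantissa, the leading 1 having moved into the hidden position. The same mantissa with a common exponent of 0 cannot be normalized and keeps its leading 1 inside the mantissa field with exponent 0.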

[0086] In pipeline stage 900D, multiplexer 962 selects between the Z value conveyed from stage 900C and a pixel value conveyed from another pipeline. Stages 900E through 900G exist to maintain alignment with the other datapath pipelines of pixel processing unit 334. In stage 900H, the floating-point Z value is conveyed on bus 964. In the preferred embodiment, this value (which may or may not be normalized) is represented in floating-point format 520. It is conveyed to frame buffer interface 336, and subsequently to frame buffer 100, for hidden surface removal operations. Because the floating-point format used in the preferred embodiment of the invention yields a distribution of numbers that decreases uniformly and monotonically from front to back, a simple integer comparison suffices for the Z-buffer comparison.
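The reason an integer comparison suffices can be illustrated with a small sketch. It assumes a 32-bit word made of a 4-bit exponent above a 28-bit mantissa, and it uses a hypothetical decoding rule (hidden 1 bit when the exponent is non-zero, no hidden bit when it is zero) chosen only to show that integer order and numeric order agree; the function names and the scale factors are not taken from the specification.

```python
MANT_BITS = 28


def pack(exp4, mant28):
    """Concatenate the 4-bit exponent and 28-bit mantissa into one integer."""
    return (exp4 << MANT_BITS) | mant28


def decode(exp4, mant28):
    """Hypothetical numeric value of the packed word, used only for checking order."""
    frac = mant28 / (1 << MANT_BITS)
    if exp4 == 0:
        return frac * 2.0 ** -15              # unnormalized: no hidden bit
    return (1.0 + frac) * 2.0 ** (exp4 - 16)  # normalized: hidden 1 bit


def in_front(new_word, stored_word):
    """Assumed depth test: larger Z is nearer the front clipping plane."""
    return new_word > stored_word
```

Sorting a set of (exponent, mantissa) pairs by `pack` gives the same order as sorting them by `decode`, so the depth test needs no floating-point comparator.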

[0087] Use of method 400 advantageously enables more efficient processing of Z values. Converting the transformed Z values to a fixed-point format (for example, format 510) allows reuse of the same hardware that processes color and alpha values in the graphics pipeline. Efficiency is improved by representing the entire primitive with a single common Z exponent value up to the pixel processing stage of the pipeline. In one embodiment, the fixed-point format used as the intermediate representation (that is, format 510) can be converted to an equivalent fixed-point format for operations such as Z-clip testing and depth cueing.

[0088] Although the system and method of the present invention have been described in connection with the described embodiments, they are not intended to be limited to the specific forms set forth herein. On the contrary, they are intended to cover such alternatives, modifications, and equivalents as can reasonably be included within the spirit and scope of the invention as defined by the appended claims.

[Brief description of the drawings]

FIG. 1 shows a Z-buffering process in a graphics processing system according to the prior art.

FIG. 2 is an illustration of a three-dimensional object for which hidden surface removal has been performed correctly.

FIG. 3 is an illustration of a three-dimensional object for which hidden surface removal has not been performed correctly.

FIG. 4 is an illustration of a three-dimensional object for which hidden surface removal has not been performed correctly.

FIG. 5 illustrates the loss of precision caused by the prior-art numerical representation of Z values.

FIG. 6 shows a computer system including a three-dimensional (3-D) graphics accelerator according to the present invention.

FIG. 7 is a simplified block diagram of the computer system of FIG. 6.

FIG. 8 is a block diagram illustrating a 3-D graphics accelerator according to a preferred embodiment of the present invention.

FIG. 9 is a block diagram illustrating one of the floating-point processing units of the 3-D graphics accelerator according to a preferred embodiment of the present invention.

FIG. 10 is a block diagram illustrating one of the drawing processing units of the 3-D graphics accelerator according to one embodiment of the present invention.

FIG. 11 is a flowchart illustrating a method of processing Z values in the processing pipeline of a graphics system according to one embodiment of the present invention.

FIGS. 12A to 12D illustrate various formats of Z values used within the 3-D graphics accelerator according to one embodiment of the present invention.

FIG. 13 is a block diagram illustrating an exponent comparator unit according to one embodiment of the present invention.

FIG. 14 illustrates a header word used to store the common Z exponent value according to one embodiment of the present invention.

FIG. 15 is a block diagram of a pixel processing unit according to one embodiment of the present invention.

FIG. 16 is a block diagram of a pipeline configured to process Z pixel values according to one embodiment of the present invention.

Continuation of front page (71) Applicant 597004720 2550 Garcia Avenue, MS PAL1-521, Mountain View, California 94043-1100, United States of America

Claims

[Claims]

1. A method of processing a first set of z-coordinate values in a graphics system, wherein the first set of z-coordinate values corresponds to vertices of a first geometric primitive, the method comprising: receiving the first set of z-coordinate values represented in a first floating-point format; converting the first set of z-coordinate values to a fixed-point format; performing a first set of graphics operations using the first set of z-coordinate values represented in the fixed-point format, thereby generating a second set of z-coordinate values represented in the fixed-point format; converting the second set of z-coordinate values to a second floating-point format; and performing a second set of graphics operations using the second set of z-coordinate values represented in the second floating-point format.

2. The method according to claim 1, wherein the value represented in the first floating-point format includes a mantissa part and an exponent part.

3. The method of claim 2, further comprising generating a common z exponent value for the first set of z-coordinate values.

4. The method of claim 3, wherein generating the common z exponent value comprises: determining a maximum z-coordinate value of the first set of z-coordinate values; and generating the common z exponent value from the maximum z-coordinate value.

5. The method of claim 3, wherein performing the first set of graphics operations comprises utilizing the common z exponent value.

6. The method of claim 4, wherein converting the first set of z-coordinate values to the fixed-point format comprises scaling the first set of z-coordinate values according to the common z exponent value.

7. The method of claim 6, wherein performing the first set of graphics operations comprises utilizing the scaled first set of z coordinate values.

8. The method of claim 1, wherein the first set of z-coordinate values are greater than 0.0 and less than or equal to 1.0, wherein a z value of 1.0 for a given vertex indicates that the given vertex lies on the front z clipping plane, and wherein a z value approaching 0.0 for the given vertex indicates that the given vertex is approaching the back z clipping plane.

9. The method of claim 1, wherein the first set of graphics operations includes a setup operation of the first geometric primitive.

10. The method of claim 1, wherein the first set of graphics operations includes edge interpolation of the first geometric primitive.

11. The method of claim 1, wherein the first set of graphics operations includes span interpolation of the first geometric primitive.

12. The method of claim 3, wherein the value represented in the fixed-point format includes a mantissa.

13. The method of claim 12, wherein converting the second set of z-coordinate values to the second floating-point format comprises normalizing the mantissa of each of the second set of z-coordinate values, and wherein the normalizing utilizes the common z exponent value to generate a mantissa and an exponent of the second floating-point format.

14. The method of claim 1, wherein the second set of graphics operations includes performing a Z-buffer comparison to achieve hidden surface removal.

15. The method of claim 1, further comprising enabling a floating-point Z-buffer mode, wherein converting the second set of z-coordinate values to the second floating-point format and performing the second set of graphics operations are performed in response to the enabling.

16. A system for processing z-coordinate values corresponding to vertices of a first graphics primitive, the system comprising: an input device coupled to receive a first set of z-coordinate values corresponding to the first graphics primitive and represented in a first floating-point format; a floating-point to fixed-point conversion device coupled to receive the first set of z-coordinate values from the input device and configured to convert the first set of z-coordinate values to a fixed-point format; a first graphics processing device coupled to receive the representation of the first set of z-coordinate values in the fixed-point format and configured to perform a first set of graphics operations using the first set of z-coordinate values represented in the fixed-point format to generate a second set of z-coordinate values; a fixed-point to floating-point conversion device coupled to receive the second set of z-coordinate values from the first graphics processing device and configured to convert the second set of z-coordinate values to a second floating-point format; and a second graphics processing device coupled to receive the second set of z-coordinate values represented in the second floating-point format and configured to perform a second set of graphics operations using the second set of z-coordinate values represented in the second floating-point format.

17. The system of claim 16, wherein the value represented in the first floating point format includes a mantissa and an exponent.

18. The system of claim 17, wherein the floating-point to fixed-point conversion device is configured to generate a common z exponent value for the first set of z-coordinate values.

19. The system of claim 18, wherein the floating-point to fixed-point conversion device is configured to determine a maximum z-coordinate value of the first set of z-coordinate values, and is further configured to generate the common z exponent value from the maximum z-coordinate value.

20. The system of claim 18, wherein the first graphics processing device is configured to utilize the common z exponent value when performing the first set of graphics operations.

21. The system of claim 18, wherein the floating-point to fixed-point conversion device is configured to scale the first set of z-coordinate values according to the common z exponent value.

22. The system of claim 21, wherein the first graphics processing device is configured to utilize the scaled first set of z-coordinate values in performing the first set of graphics operations.

23. The system of claim 16, wherein the first set of z-coordinate values are greater than 0.0 and less than or equal to 1.0, wherein a z value of 1.0 for a given vertex indicates that the given vertex lies on the front z clipping plane, and wherein a z value approaching 0.0 for the given vertex indicates that the given vertex is approaching the back z clipping plane.

24. The system of claim 16, wherein the first graphics processing device is configured to perform a setup operation of the first geometric primitive.

25. The system of claim 16, wherein the first graphics processing device is configured to perform edge interpolation of the first geometric primitive.

26. The system of claim 16, wherein the first graphics processing device is configured to perform span interpolation of the first geometric primitive.

27. The system of claim 16, wherein the value represented in the fixed point format includes a mantissa.

28. The system of claim 27, wherein the fixed-point to floating-point conversion device is configured to normalize the mantissa of each of the second set of z-coordinate values represented in the fixed-point format, and is further configured to utilize the common z exponent value to generate a mantissa and an exponent of the second floating-point format.

29. The system of claim 16, wherein the second graphics processing device is configured to perform a Z-buffer comparison to achieve hidden surface removal.

30. The system of claim 16, further comprising a controller configured to enable a floating-point Z-buffer mode, wherein the fixed-point to floating-point conversion device is configured to convert the second set of z-coordinate values to the second floating-point format in response to the floating-point Z-buffer mode being enabled, and wherein the second graphics processing device is configured to perform the second set of graphics operations in response to the floating-point Z-buffer mode being enabled.

31. A system for processing a first set of z-coordinate values, wherein the first set of z-coordinate values corresponds to vertices of a first geometric primitive, the system comprising: receiving means for receiving the first set of z-coordinate values represented in a first floating-point format; first converting means for converting the first set of z-coordinate values to a fixed-point format; first graphics processing means for performing a first set of graphics operations using the first set of z-coordinate values represented in the fixed-point format and generating a second set of z-coordinate values represented in the fixed-point format; second converting means for converting the second set of z-coordinate values to a second floating-point format; and second graphics processing means for performing a second set of graphics operations using the second set of z-coordinate values represented in the second floating-point format.