JP3457453B2

JP3457453B2 - Coordinate transformation processing device

Info

Publication number: JP3457453B2
Application number: JP02018596A
Authority: JP
Inventors: 進博井出; 敦国松; 麻紀植野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-02-06
Filing date: 1996-02-06
Publication date: 2003-10-20
Anticipated expiration: 2016-02-06
Also published as: JPH09212660A

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a coordinate transformation processing device, and more particularly to a coordinate transformation processing device that performs the geometric processing used in computer graphics and similar applications.

[0002]

Description of the Related Art: In recent years, computer graphics (CG) has become very important because of the rapid spread of multimedia, the thorough adoption of WYSIWYG, advanced GUIs (graphical user interfaces), and the spread of video games that use graphics. In particular, the rapid spread of personal computers into homes and of video game consoles is increasing the demand for three-dimensional computer graphics (3D-CG), and especially for high-quality moving images, as applications that run on high-performance processors. To process moving images, each frame must be processed in 1/30 to 1/60 of a second, which requires an enormous amount of computation and computing power.

[0003] Computer graphics processing is broadly divided into two phases. The first is geometric processing. It moves the modeled data itself, transforms coordinates (for example, moving them to match the viewpoint), and performs projection to generate the image to be displayed on the CRT. The second is rendering, which actually draws that image on the CRT. Geometric processing is the phase that performs geometric transformations of the graphics model, such as coordinate and viewpoint transformations, together with lighting. It consists of matrix and vector operations and therefore makes heavy use of inner products. Coordinate transformation is described in detail in many computer graphics texts.

[0004] FIG. 8 shows the structure of a typical coordinate transformation processing device (GTE). The GTE consists of an arithmetic unit 801, a register file 802, an internal storage device 803, an input/output interface 804, and so on.

- The arithmetic unit 801 is the datapath for matrix operations. It consists of an adder/subtractor, a multiplier, a divider, a square-root unit, and so on.
- The internal storage device 803 temporarily holds data from the external storage device so that the data can be processed efficiently. It also stores constants such as the transformation matrix.
- The input/output interface 804 connects the external storage device to the internal storage device 803, the register file 802, and the arithmetic unit 801.

[0005] 1: Data transfer. The data for 3D computer graphics depends on the modeling, but it is generally handled as a collection of independent triangles. The three vertices of each independent triangle are expressed in homogeneous coordinates and stored in the external storage device.
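The vertex layout described above can be sketched as a pair of C types. This is an illustrative assumption, not the storage format used by the patent. The type names Vertex and Triangle, the helper make_point, and the choice of 32-bit floats are all hypothetical.

```c
/* A vertex in homogeneous coordinates (x, y, z, w); a point has w = 1. */
typedef struct {
    float x, y, z, w;
} Vertex;

/* An independent triangle: three vertices, not shared with neighbours. */
typedef struct {
    Vertex v[3];
} Triangle;

/* Build a homogeneous point from Cartesian coordinates. */
static Vertex make_point(float x, float y, float z) {
    Vertex p = { x, y, z, 1.0f };
    return p;
}
```

Because the triangles are independent, each one carries its own three vertices, even when a neighbouring triangle has a vertex at the same position.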

[0006] Most conventional coordinate transformation processing devices do not contain a large-capacity storage device. They therefore read graphics data from the external storage device and feed it through a FIFO or similar buffer into the datapath, which includes the arithmetic units and the register file. Using FIFOs as input or output buffers hides fluctuations in the transfer rate caused by bus latency and storage access speed. However, throughput is still bounded by the storage access speed and the bus response time, so sufficient transfer bandwidth cannot be secured.

[0007] By contrast, some devices include a moderate amount of internal storage. They fetch data into it at high speed by DMA (direct memory access) and then compute on it. In such a device, the internal storage is accessed both from the external storage device and from the internal arithmetic units and register file. This makes it difficult to perform data transfer and computation in parallel. The two phases, data transfer and data processing, therefore alternate, and the processing cannot be pipelined efficiently. DMA speeds up the data transfer itself, but it cannot sufficiently speed up the processing as a whole.
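A toy cycle count shows the cost of alternating the two phases. It makes three hypothetical assumptions: every batch needs t cycles of transfer, every batch needs c cycles of computation, and a second buffer would let the next transfer overlap the current computation. The function names are illustrative.

```c
/* Alternating phases: transfer batch i, compute it, then move on. */
static unsigned long serial_cycles(unsigned n, unsigned t, unsigned c) {
    return (unsigned long)n * (t + c);
}

/* Overlapped: while batch i is computed, batch i + 1 is transferred.
   The first transfer and the last computation cannot be hidden. */
static unsigned long overlapped_cycles(unsigned n, unsigned t, unsigned c) {
    if (n == 0) return 0;
    unsigned long step = t > c ? t : c;
    return t + (unsigned long)(n - 1) * step + c;
}
```

With n = 4 and t = c = 10, the alternating scheme takes 80 cycles and the overlapped scheme takes 50. As n grows, the speedup approaches (t + c) / max(t, c), which is at most 2 and is reached when t equals c.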

[0008] To allow parallel execution and improve efficiency in a similar configuration, a storage device with multiple ports could be used. In that case, however, control becomes considerably more complex, for example because access conflicts on the same storage device must be arbitrated. The storage device also becomes more expensive. As a result, a storage device large enough to deliver sufficient processing performance cannot be implemented.

[0009] 2: Transformation processing. Before the conventional example is presented, a simple perspective transformation is shown. Perspective transformation projects a three-dimensional graphics model onto two dimensions while taking perspective into account. Let (x, y, z, 1) be the coordinates of the vertex to be transformed. The perspective transformation is performed as follows and outputs (X, Y), the screen X and Y coordinates after transformation.

[0010]

$$(x', y', w') = (x, y, z, 1)\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \\ j & k & l \end{pmatrix} = (ax+dy+gz+j,\ bx+ey+hz+k,\ cx+fy+iz+l) \tag{1}$$

$$W = \frac{1}{w'} \tag{2}$$

$$(X, Y) = (x', y') \times W \tag{3}$$

Perspective transformation therefore requires two kinds of work: the product-sum operations of the matrix calculation, and a division that uses their result. The operations for the x, y, z and w coordinates are almost identical and mutually independent, so the computation has a high degree of parallelism and symmetry.
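Equations (1) to (3) can be written out as a short C function. This is a sketch that assumes single-precision floats and a row-major 4x3 matrix. Vec3, Screen and perspective are hypothetical names. The function shows the structure described above: three independent product-sum rows, followed by one division whose result is shared by X and Y.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float X, Y; } Screen;

/* m is the 4x3 matrix of equation (1), stored row-major:
   rows (a b c), (d e f), (g h i), (j k l). */
static Screen perspective(const float m[4][3], Vec3 p) {
    /* Equation (1): row vector (x, y, z, 1) times the 4x3 matrix. */
    float xp = p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0];
    float yp = p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1];
    float wp = p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2];
    /* Equation (2): a single division per vertex. */
    float W = 1.0f / wp;
    /* Equation (3): both screen coordinates reuse the same reciprocal. */
    Screen s = { xp * W, yp * W };
    return s;
}
```

Computing the reciprocal once and multiplying twice replaces two divisions with one division and two multiplications. Those two multiplications are what the parallel units can share.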

[0011] The typical conventional example of FIG. 8 implements only one product-sum unit and one adder/subtractor. Such a device can process the operations above only one after another, using simple pipelining. The high parallelism and symmetry of the computation are exploited only in instruction scheduling.

[0012] FIG. 9 shows a configuration that exploits these characteristics of the computation. In this configuration, a register file and a product-sum unit are assigned to each of the x, y, z and w coordinates, so each coordinate can be computed independently. Consider equation (1):

$$(x', y', w') = (x, y, z, 1)\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \\ j & k & l \end{pmatrix} = (ax+dy+gz+j,\ bx+ey+hz+k,\ cx+fy+iz+l) \tag{1}$$

Here ax+dy+gz+j is assigned to the first arithmetic unit, bx+ey+hz+k to the second, and cx+fy+iz+l to the third, and the three are computed independently. Taking the characteristics of the computation into account in this way enables high-speed computation.

However, this configuration cannot efficiently compute $W = 1/w'$ (2) and $(X, Y) = (x', y') \times W$ (3). The division needs to be performed only once, so the multiple arithmetic units cannot be used effectively while it is in progress. Division also has a larger latency than the other operations, so the multiple arithmetic units, which are particularly expensive, cannot be kept busy. As a result, this configuration does not deliver performance commensurate with the hardware invested in it.
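A toy model puts the utilization argument in numbers. The patent gives no cycle counts, so the model uses illustrative assumptions. Each of three units spends mac_cycles on its row of (1). Only one unit works during the division in (2). Two units perform the scaling in (3). The phases run one after another.

```c
typedef struct {
    unsigned mac_cycles;   /* each unit's product-sum row of (1)      */
    unsigned div_cycles;   /* W = 1/w' in (2): one unit busy          */
    unsigned scale_cycles; /* X = x'W, Y = y'W in (3): two units busy */
} Latency;

/* Fraction of available unit-cycles that do useful work for one vertex,
   with three units and the three phases executed in sequence. */
static double utilization(Latency l) {
    const unsigned long units = 3;
    unsigned long total = (unsigned long)l.mac_cycles + l.div_cycles + l.scale_cycles;
    unsigned long busy = units * l.mac_cycles     /* all three units */
                       + 1ul * l.div_cycles       /* divider only    */
                       + 2ul * l.scale_cycles;    /* x' and y'       */
    return (double)busy / (double)(units * total);
}
```

With the assumed values {4, 1, 1}, the units are busy about 83% of the time. Raising the division latency to 16 cycles drops utilization to about 48%, which is the effect the paragraph describes.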

[0013] 3: Lighting processing. Lighting is applied to objects to obtain realistic images. In the following example, colors are represented as a combination of R, G and B, and each color is processed separately. How the intensity is calculated depends on the lighting model, but it is generally obtained as follows:

vertex color = light from the material of the vertex itself + the global ambient light scaled by the ambient property of the vertex material + the suitably attenuated ambient, diffuse and specular contributions of all light sources.

The processing is outlined below.

[0014] Start of processing.
If necessary, normalize the light vectors and the vertex normals.
Set the emitted light and the ambient light that exist without any light source as constants.
for each (light source) {
The ambient, diffuse and specular light of each light source are calculated for that light source, and all of them are summed.

[0015] Obtain the vector from the vertex to the light source (the light incidence vector, or light vector). From this vector, obtain the distance between the vertex and the light source, and also normalize the vector from the vertex to the light source.

[0016] Calculate the attenuation factor from the distance.
Take the inner product (cos θ) of the light vector and the vertex normal.
Spotlight effect.
Ambient light per light source:
  ambient contribution = light ambient coefficient × material (vertex) ambient coefficient
Diffuse light per light source:
  diffuse contribution = (light vector · vertex normal) × light diffuse coefficient × material (vertex) diffuse coefficient
Specular light per light source:
  Let L be the unit vector in the direction of incidence, V the unit vector in the line-of-sight direction, N the unit vector in the normal direction, θ the angle of incidence, and α the angle between the line-of-sight vector and the reflection vector.
  When the line-of-sight vector is calculated from the vertex vector:
    sx = lx - vx, sy = ly - vy, sz = lz - vz
  When the line-of-sight vector is forced along the -Z axis:
    sx = lx, sy = ly, sz = lz + 1
  Obtain s = (sx, sy, sz) in this way, and take the inner product of s and norm.
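The two ways of forming s described above can be sketched as follows. The vector names follow the text (l, v, s, norm); spec_vector and dot3 are hypothetical helpers. The text does not say whether s is normalized before the inner product, so this sketch follows the text literally and leaves s unnormalized.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Vector s for the specular term; l is the unit light vector and v the
   vertex vector. The flag selects the forced -Z line of sight. */
static Vec3 spec_vector(Vec3 l, Vec3 v, int viewer_along_minus_z) {
    Vec3 s;
    if (viewer_along_minus_z) {
        /* Line of sight forced along -Z: s = l + (0, 0, 1). */
        s.x = l.x; s.y = l.y; s.z = l.z + 1.0f;
    } else {
        /* Line of sight derived from the vertex vector: s = l - v. */
        s.x = l.x - v.x; s.y = l.y - v.y; s.z = l.z - v.z;
    }
    return s;
}
```

The forced -Z mode does not depend on the vertex position, so s can be reused for every vertex lit by the same directional light.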

[0017] Raise the result of the inner product to the power of the per-light specular coefficient Shininess[i] (pow) to obtain spec_coef.

[0018] specular contribution = spec_coef × light specular coefficient × material (vertex) specular coefficient
total contribution = attenuation factor × spotlight effect × (ambient contribution + diffuse contribution + specular contribution)
Add the total contribution of light source i to R, G, B.
}
After the contributions of all light sources have been added, clamp R, G, B to the range 0 to 1.
End of processing.

As noted above, each calculation in the lighting process depends on the lighting model, so the details vary somewhat. What matters here is that intensity is defined as a value in [0, 1], and that the intensity is clamped to [0, 1] after the calculation. Here [0, 1] denotes the set of n satisfying 0 ≤ n ≤ 1.
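The steps from [0014] to [0018] can be assembled into one routine for a single vertex. This sketch relies on several assumptions that the text does not state:

- The attenuation, the spotlight factor and the two inner products are precomputed per light.
- Shininess[i] is a non-negative integer, so a loop replaces pow.
- Negative inner products are treated as zero.

All type and field names are hypothetical.

```c
typedef struct {
    float amb[3], dif[3], spe[3];   /* material coefficients, R G B */
} Material;

typedef struct {
    float attenuation;    /* attenuation factor from the distance     */
    float spot;           /* spotlight effect                         */
    float n_dot_l;        /* light vector . vertex normal (cos theta) */
    float n_dot_s;        /* s . norm from the specular step          */
    unsigned shininess;   /* Shininess[i], assumed integer here       */
    float amb[3], dif[3], spe[3];   /* light coefficients, R G B     */
} Light;

/* x raised to a non-negative integer power (stands in for pow). */
static float ipow(float x, unsigned n) {
    float r = 1.0f;
    while (n--) r *= x;
    return r;
}

static float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* base: the constant emitted + ambient terms set before the light loop. */
static void shade(const float base[3], const Material *m,
                  const Light *lights, unsigned count, float rgb[3]) {
    for (int c = 0; c < 3; ++c) rgb[c] = base[c];
    for (unsigned i = 0; i < count; ++i) {
        const Light *L = &lights[i];
        /* Assumption: negative inner products contribute nothing. */
        float ndl = L->n_dot_l > 0.0f ? L->n_dot_l : 0.0f;
        float nds = L->n_dot_s > 0.0f ? L->n_dot_s : 0.0f;
        float spec_coef = ipow(nds, L->shininess);
        for (int c = 0; c < 3; ++c) {
            float ambient  = L->amb[c] * m->amb[c];
            float diffuse  = ndl * L->dif[c] * m->dif[c];
            float specular = spec_coef * L->spe[c] * m->spe[c];
            rgb[c] += L->attenuation * L->spot * (ambient + diffuse + specular);
        }
    }
    /* After all lights are summed, clamp R, G, B to [0, 1]. */
    for (int c = 0; c < 3; ++c) rgb[c] = clamp01(rgb[c]);
}
```

The clamp runs once per channel per vertex, after the light loop. This is the step whose cost the following paragraphs discuss.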

[0019] The processing flow below shows how conventional arithmetic units performed the clamp. A compare instruction compared the intensity value with 0 and with 1. When necessary, a conditional branch instruction then branched to output the constant 0 or 1.

[0020]

/* Clamping flow for the R, G, B values */
if (R < 0.0) { R = 0.0; }
if (R > 1.0) { R = 1.0; }
if (G < 0.0) { G = 0.0; }
if (G > 1.0) { G = 1.0; }
if (B < 0.0) { B = 0.0; }
if (B > 1.0) { B = 1.0; }

This method executes branch instructions during clamping, which disrupts the arithmetic pipeline. Intensity must be computed for all three primary colors R, G and B at every vertex that makes up the screen, so the amount of processing is enormous. With the conventional flow above, the pipeline is therefore disrupted frequently, and the performance of the intensity calculation is severely degraded.
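The branchy flow above can be compared in C with a select-based alternative. Both functions are sketches. clamp_rgb_branchy mirrors the listing. clamp01 expresses the same clamp as two selects, which compilers commonly (but not necessarily) lower to min/max or conditional-move instructions rather than branches. This is a general software technique, not the hardware mechanism claimed by the patent. The two return equal values for every input except NaN.

```c
/* Clamp as in the listing: compare, branch, store a constant. */
static void clamp_rgb_branchy(float rgb[3]) {
    for (int i = 0; i < 3; ++i) {
        if (rgb[i] < 0.0f) { rgb[i] = 0.0f; }
        if (rgb[i] > 1.0f) { rgb[i] = 1.0f; }
    }
}

/* Same clamp written as two selects (max with 0, then min with 1).
   Whether a branch is emitted depends on the compiler and target. */
static float clamp01(float v) {
    float lo = v > 0.0f ? v : 0.0f;
    return lo < 1.0f ? lo : 1.0f;
}
```

For a NaN input, the branchy version leaves NaN unchanged, while clamp01 returns 0.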

[0021]

Problems to Be Solved by the Invention: As described above, the conventional coordinate transformation processing device (GTE) has three problems:

(1) it cannot efficiently transfer the graphics data to be transformed to the arithmetic units and the register file;
(2) it cannot efficiently execute the inner product operations of the matrix computation for perspective transformation, or the division by 'depth';
(3) it cannot clamp the R, G, B intensities in lighting at high speed.

The present invention was made in view of these problems. Its objects are:

(1) efficient data transfer between the storage device holding the graphics data and the coordinate transformation processing device;
(2) efficient execution of the inner product operations of the matrix computation for perspective transformation and of the division by 'depth';
(3) high-speed clamping of the R, G, B intensities in lighting.

[0022]

Means for Solving the Problems: To achieve the above objects, a first aspect of the invention is a coordinate transformation processing device. The device applies predetermined geometric operations to vertex data of figures that are stored in an external storage device and expressed in homogeneous coordinates. It comprises:

- an internal storage unit divided into a plurality of storage blocks, each of which can input and output data independently. The unit receives predetermined vertex data from the external storage device into a storage block and holds it, then switches that block's connection to a data holding unit to output the vertex data;
- a data holding unit that temporarily stores part of the vertex data stored in a predetermined storage block of the internal storage unit;
- an arithmetic unit that receives the vertex data stored in the data holding unit and applies predetermined processing to generate graphics data.

The first storage block is connected to the external storage device and receives vertex data. When reception into the first storage block is complete:

- the first storage block is connected to the data holding unit and outputs the vertex data;
- the arithmetic unit applies the predetermined processing, and the result is written back to the first storage block;
- the second storage block is connected to the external storage device and receives the vertex data to be processed next.

When reception into the second storage block is complete:

- the first storage block is connected to the external storage device, writes the processing result back to it, and receives the next vertex data to be processed;
- the second storage block is connected to the data holding unit and outputs its vertex data;
- the arithmetic unit applies the predetermined processing, and the result is written back to the second storage block.

[0023] In this configuration, the second storage device (the internal storage unit) is divided into a plurality of independently accessible storage blocks. Some of these blocks are connected to the external storage device, and the data transfer device transfers graphics data into them at high speed. Meanwhile, some of the blocks that are not connected to the external storage device are connected to the register file and the arithmetic units, which apply the intended processing to the graphics data stored in those blocks.

When the processing and the data transfer are complete, the roles swap. The blocks that were connected to the register file and arithmetic units are now connected to the external storage device, and the data transfer device transfers graphics data to and from them at high speed. The blocks that were connected to the external storage device are now connected to the register file and arithmetic units, which process the graphics data stored in them.

Because each storage block is connected exclusively either to the external storage device or to the register file and arithmetic units, large volumes of data transfer and computation can be executed at high speed and in parallel.
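The exchange of storage blocks described above can be simulated in software with two buffers used in ping-pong fashion. This is a behavioural sketch only. BLOCK_WORDS, StorageBlock and ping_pong are hypothetical, and doubling each word stands in for the real transformation. In every phase, one block faces the external storage (it writes back, then loads) while the other faces the datapath, so no block is accessed from both sides at once.

```c
#include <string.h>

#define BLOCK_WORDS 4   /* hypothetical capacity of one storage block */

typedef struct {
    float data[BLOCK_WORDS];
    int batch;   /* index of the batch held, or -1 if empty */
    int done;    /* 1 once the datapath has processed it    */
} StorageBlock;

/* Process n batches of BLOCK_WORDS floats from src into dst using two
   storage blocks in turn. Returns the number of phases simulated. */
static int ping_pong(const float *src, float *dst, int n) {
    if (n <= 0) return 0;
    StorageBlock blk[2] = { { {0}, -1, 0 }, { {0}, -1, 0 } };
    for (int p = 0; p <= n + 1; ++p) {
        StorageBlock *ext = &blk[p % 2];        /* faces external storage */
        StorageBlock *dp  = &blk[(p + 1) % 2];  /* faces the datapath     */
        /* External side: write back a finished batch, then load the next. */
        if (ext->batch >= 0 && ext->done) {
            memcpy(dst + ext->batch * BLOCK_WORDS, ext->data, sizeof ext->data);
            ext->batch = -1;
        }
        if (p < n) {
            memcpy(ext->data, src + p * BLOCK_WORDS, sizeof ext->data);
            ext->batch = p;
            ext->done = 0;
        }
        /* Datapath side: process the batch loaded in the previous phase. */
        if (dp->batch >= 0 && !dp->done) {
            for (int i = 0; i < BLOCK_WORDS; ++i) dp->data[i] *= 2.0f;
            dp->done = 1;
        }
    }
    return n + 2;
}
```

For n batches, the simulation takes n + 2 phases. Transfer and computation overlap in every phase except the first, which only fills a block, and the last, which only drains one.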

[0024] A second aspect of the invention comprises:

- product-sum units that perform product-sum operations for at least x, y and z, corresponding to x, y and z of the homogeneous coordinate system;
- at least one divider;
- first, second and third register files that store the vertex data of figures corresponding to at least x, y and z of the homogeneous coordinate system;
- a first bus network that connects the product-sum units and the divider to the register files and supplies first operand data to them;
- a second bus network that connects the product-sum units and the divider to the register files and supplies second operand data to them;
- a third bus network that connects the product-sum units and the divider to the register files and writes their results back to the register files.

These components are interconnected as follows:

- The first read port of each of the first, second and third register files is connected by the first bus network to the first-operand input terminals of the corresponding first, second and third product-sum units and of the divider.
- The second read port of each of the first, second and third register files is connected to the second-operand input terminals of the first, second and third product-sum units and of the divider by the second bus network, which includes a crossbar switch.
- Between those second-operand input terminals and the second read ports, two kinds of interconnection are possible: one-to-one interconnections, in which registers and arithmetic units are paired in mutually exclusive combinations, and one-to-many interconnections, in which a particular register is connected to several arithmetic units.
- The output terminals of the first, second and third product-sum units and of the divider are connected to the write ports of the first, second and third register files.
- The output terminal of at least one of the first, second and third product-sum units, and that of the divider, can be connected exclusively to the write port of any of the first, second and third register files, and can write to a predetermined address in that register file.

[0025] In this configuration, the vertex data of a figure is fed over the first and second bus networks from the first, second and third register files to the corresponding first, second and third arithmetic units. The intended operation is applied, and the result is written back over the third bus network to the corresponding first, second and third register files. This allows the inner product operations of the matrix computation for perspective transformation, and the division by 'depth', to be executed efficiently.
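The crossbar's role in step (3) can be sketched as a register-transfer model. The following are assumed or hypothetical: three units, eight registers per file, and the names RegFiles and mul_step. Operand 1 comes from each unit's own file. Operand 2 comes through the crossbar from any file. After the divider writes W = 1/w' into one file, a one-to-many crossbar setting broadcasts W to all three units, and X and Y are produced in a single parallel step.

```c
#define UNITS 3
#define REGS  8

/* rf[u][r]: register r of the file paired with unit u
   (file 0 holds x data, file 1 y data, file 2 the third coordinate). */
typedef float RegFiles[UNITS][REGS];

/* One parallel multiply step. Operand 1 is read from the unit's own file
   (first bus network); operand 2 is read from the file the crossbar
   selects for that unit (second bus network); each result returns to
   the unit's own file (third bus network). */
static void mul_step(RegFiles rf, int src1, const int xbar_file[UNITS],
                     int src2, int dst) {
    float result[UNITS];
    for (int u = 0; u < UNITS; ++u)
        result[u] = rf[u][src1] * rf[xbar_file[u]][src2];
    for (int u = 0; u < UNITS; ++u)
        rf[u][dst] = result[u];
}
```

Setting xbar_file to {2, 2, 2} broadcasts register src2 of file 2 to every unit. Setting it to {0, 1, 2} selects the one-to-one pattern, in which each unit reads only its own file.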

[0026] It is preferable that the configuration of the second aspect further comprise two bypass networks:

- a first bypass network that connects the output terminals of the first, second and third product-sum units and the divider directly to the second input terminals of those units and the divider. It supplies their results directly, as operands, to the first input terminals of the first, second and third product-sum units and the divider, either before the results are written back to the register files or in parallel with the write-back;
- a second bypass network that connects the output terminals of the first, second and third product-sum units and the divider directly to the first and second bus networks. It supplies their results directly, as operands, to the first or second input terminals of the first, second and third product-sum units, in parallel with the write-back of the results.
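The bypass idea works like the familiar operand-forwarding multiplexer: a result that has not yet been written back is taken directly from the unit's output. The following is a generic sketch of that selection logic, not the patent's circuit. Bypass and read_operand are hypothetical names.

```c
/* Latch holding a result that has not yet reached the register file. */
typedef struct {
    int valid;    /* 1 if a result is in flight            */
    int reg;      /* destination register of that result   */
    float value;  /* the result itself                     */
} Bypass;

/* Operand read with forwarding: use the in-flight result when it targets
   the requested register, otherwise read the register file. */
static float read_operand(const float *rf, const Bypass *bp, int reg) {
    if (bp->valid && bp->reg == reg)
        return bp->value;
    return rf[reg];
}
```

Without the latch, a dependent operation such as X = x' × W would have to wait until W = 1/w' had been written back and read out again.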

[0027] A third aspect of the invention comprises:

- product-sum units that perform product-sum operations for at least x, y, z and w, corresponding to x, y, z and w of the homogeneous coordinate system;
- at least one divider;
- first, second, third and fourth register files that store the vertex data of figures corresponding to at least x, y, z and w of the homogeneous coordinate system;
- a first bus network that connects the product-sum units and the divider to the register files and supplies first operand data to them;
- a second bus network that connects the product-sum units and the divider to the register files and supplies second operand data to them;
- a third bus network that connects the product-sum units and the divider to the register files and writes their results back to the register files.

These components are interconnected as follows:

- The first read port of each of the first to fourth register files is connected by the first bus network to the first-operand input terminals of the corresponding first to fourth product-sum units and of the divider.
- The second read port of each of the first to fourth register files is connected to the second-operand input terminals of the first to fourth product-sum units and of the divider by the second bus network, which includes a crossbar switch.
- Between those second-operand input terminals and the second read ports, two kinds of interconnection are possible: one-to-one interconnections, in which registers and arithmetic units are paired in mutually exclusive combinations, and one-to-many interconnections, in which a particular register is connected to several arithmetic units.
- The output terminals of the first to fourth product-sum units and of the divider are connected to the write ports of the first to fourth register files.
- The output terminal of at least one of the first to fourth product-sum units, and that of the divider, can be connected exclusively to the write port of any of the first to fourth register files, and can write to a predetermined address in that register file.
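With a fourth unit and register file for w, a full 4x4 homogeneous transform can assign one output column to each unit. The sketch below is an illustration with hypothetical names. Each outer-loop iteration stands for one unit, and its inner loop stands for the product-sum steps that unit performs on its own.

```c
/* Row vector (x, y, z, w) times a 4x4 matrix m; unit u computes column u. */
static void transform4(const float m[4][4], const float in[4], float out[4]) {
    for (int u = 0; u < 4; ++u) {        /* one iteration per unit */
        float acc = 0.0f;
        for (int k = 0; k < 4; ++k)      /* product-sum steps      */
            acc += in[k] * m[k][u];
        out[u] = acc;
    }
}
```

The four columns do not depend on each other, so in hardware the outer loop can run on four units at once. The divider is needed only afterwards, for 1/w'.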

[0028] It is preferable that the configuration of the third aspect further comprise two bypass networks:

- a first bypass network that connects the output terminals of the first to fourth product-sum units and the divider directly to the second input terminals of those units and the divider. It supplies their results directly, as operands, to the first input terminals of the first to fourth product-sum units and the divider, either before the results are written back to the register files or in parallel with the write-back;
- a second bypass network that connects the output terminals of the first to fourth product-sum units and the divider directly to the first and second bus networks. It supplies their results directly, as operands, to the first or second input terminals of the first to fourth product-sum units, in parallel with the write-back of the results.

【００２９】[0029]

【００３０】[0030]

【００３１】[0031]

【００３２】[0032]

DESCRIPTION OF THE EMBODIMENTS

FIG. 1 is a block diagram showing the configuration of the coordinate conversion processing device of the present invention (hereinafter GTE: Graphic Translate Engine). First, the configuration of an embodiment of the proposed computation scheme is described with reference to FIG. 1. The coordinate conversion processing device 100 comprises an internal storage unit 110, a data holding unit 120, and an arithmetic unit 130. The internal storage unit 110 receives predetermined vertex data from an external storage device 200 into a storage block and holds it, then switches its connection to the data holding unit 120 and outputs the vertex data. The data holding unit 120 temporarily stores part of the vertex data held in a given storage block of the internal storage unit 110. The arithmetic unit 130 receives the vertex data stored in the data holding unit 120 and applies predetermined processing to generate graphic data. The storage block of the internal storage unit 110 then receives the graphic data generated by the arithmetic unit 130, switches its connection to the external storage device 200, and outputs the graphic data.

【００３３】[0033]

FIG. 2 shows the configuration of the internal storage unit 110. The unit is connected to the external storage device 200 through a DMA controller 111 (described later). The internal storage device 112 is a 2-Mbyte memory composed of two banks, 112a and 112b (1 Mbyte each). Each bank is connected either to the arithmetic unit and the register file through the load/store unit 121, or to the external storage device through the DMA controller 111. These connections are mutually exclusive (a bank is attached to only one of the two paths at a time), and it is never the case that only one of the banks is connected. The address generator 113 generates the addresses used to access the internal storage device 112.

【００３４】[0034]

FIG. 3 shows the configuration of the data holding unit 120. The load/store unit 121 links the internal storage device 112 and the register file 122 through a 128-bit (32 bits × 4) high-bandwidth bus 114 and transfers data between them in both directions. The register file 122 is a 64-entry, 32-bit register file divided into four banks, bank 0 to bank 3. Register number n (0 ≤ n < 64) belongs to bank (n mod 4), where a mod b denotes the remainder of a divided by b. Each bank corresponds to one component of the homogeneous coordinates (x, y, z, w): bank 0 to x, bank 1 to y, bank 2 to z, and bank 3 to w.
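The bank rule above can be restated in a few lines of Python. This is an illustration only: the function names `bank_of` and `component_of` are hypothetical, and the sketch simply maps register n to bank (n mod 4) and then to component x, y, z, or w.

```python
# Illustrative sketch of the register-to-bank rule of [0034].
# Register GRn (0 <= n < 64) lives in bank n mod 4; bank k holds the
# k-th homogeneous component (x, y, z, w).

NUM_REGISTERS = 64
NUM_BANKS = 4
COMPONENTS = ("x", "y", "z", "w")


def bank_of(n: int) -> int:
    """Return the bank number of register GRn."""
    if not 0 <= n < NUM_REGISTERS:
        raise ValueError(f"register number out of range: {n}")
    return n % NUM_BANKS


def component_of(n: int) -> str:
    """Return the homogeneous component that register GRn corresponds to."""
    return COMPONENTS[bank_of(n)]


if __name__ == "__main__":
    # GR00..GR03 hold one vertex as (x, y, z, 1); each value sits in its own
    # bank, so all four components can be read in the same cycle.
    for n in range(4):
        print(f"GR{n:02d} -> bank {bank_of(n)} ({component_of(n)})")
```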

【００３５】[0035]

FIG. 4 shows the configuration of the arithmetic unit 130. Components 131 to 134 are product-sum (multiply-accumulate) units with a three-stage pipeline. Like the register banks, they correspond to the homogeneous coordinates (x, y, z, w): unit 131 to x, unit 132 to y, unit 133 to z, and unit 134 to w. Unit 135 is a division/square-root unit that completes an operation in 6 cycles. Port 136 is an input/output port for transferring data to and from an external processor; it is connected through a 64-bit bus.

【００３６】[0036]

The bus 141 is a bus network that interconnects the register file 122, the arithmetic unit 130, and the load/store unit 121. It is 128 bits (32 bits × 4) wide and connects the register file 122, whose banks correspond to (x, y, z, w) in the homogeneous coordinate system, to the arithmetic unit 130 via a crossbar switch 190a. The crossbar switch allows the register file 122, the arithmetic unit 130, and the load/store unit 121 to be connected in any combination, provided the connections are mutually exclusive.
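A minimal Python sketch of this routing constraint follows. It assumes that "mutually exclusive" means each destination input is driven by at most one source in a given cycle, while one source may fan out to several destinations (the one-to-many case named in the claims). The endpoint names are hypothetical labels, not identifiers from the patent.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical endpoint names; the patent identifies the units only by number.
BANKS = ("x", "y", "z", "w")  # register-file banks 0-3
DESTINATIONS = ("mac131", "mac132", "mac133", "mac134", "div135", "lsu121")


def check_routing(pairs: Iterable[Tuple[str, str]]) -> str:
    """Validate one cycle's (source bank, destination) connections.

    Raises ValueError if a destination would be driven by two sources.
    Returns "one-to-one" if every source drives at most one destination,
    otherwise "one-to-many" (a broadcast).
    """
    driven_by: Dict[str, str] = {}
    for src, dest in pairs:
        if src not in BANKS:
            raise ValueError(f"unknown source bank: {src}")
        if dest not in DESTINATIONS:
            raise ValueError(f"unknown destination: {dest}")
        if dest in driven_by:
            raise ValueError(f"{dest} already driven by bank {driven_by[dest]}")
        driven_by[dest] = src
    sources = list(driven_by.values())
    return "one-to-one" if len(sources) == len(set(sources)) else "one-to-many"


if __name__ == "__main__":
    # GMULA3 GR(32-34), GR00 needs GR00 (bank x) at three MACs at once.
    print(check_routing([("x", "mac131"), ("x", "mac132"), ("x", "mac133")]))
```

Rejecting a doubly driven destination while allowing fan-out is what lets a single scalar (such as a vertex coordinate) be broadcast against a matrix row in one instruction.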

【００３７】[0037]

The bus 142 is a bus network that interconnects the register file 122 and the arithmetic units of the arithmetic unit 130 (excluding the port 136). It is 128 bits (32 bits × 4) wide and connects each bank of the register file 122 one-to-one to the arithmetic unit for the same homogeneous coordinate (x, y, z, w).

【００３８】[0038]

The bus 143 is a bus network that interconnects the register file 122, the arithmetic unit 130, and the load/store unit 121. It is 128 bits (32 bits × 4) wide and connects the register file 122, whose banks correspond to (x, y, z, w) in the homogeneous coordinate system, to the arithmetic unit 130 and the load/store unit 121 via a crossbar switch 190b. The switch 190b allows the register file 122, the arithmetic unit 130, and the load/store unit 121 to be connected exclusively. A result from the arithmetic unit 130 can be written back only to the corresponding bank of the register file 122; however, values from the arithmetic units 134 and 135, the port 136, and the load/store unit 121 can be written to any address in the register file. This completes the description of the configuration of the present embodiment. Next, the data transfer and coordinate conversion operations of the embodiment are described.

【００３９】[0039]

First, data transfer between the external storage device 200 and the coordinate conversion processing device 100 is described. The external storage device 200 stores the vertex coordinate data and color information of figures, as well as texture information and the like; it is also used as the general-purpose memory of the processor. One bank of the internal storage unit 110, bank 112b, is connected to the external storage device through the DMA controller 111, and under the control of the DMA controller 111 the required graphic data is transferred at high speed to bank 112b. Meanwhile, bank 112a is connected through the load/store unit 121 to the buses 141 and 143, and thereby to the arithmetic unit 130 and the register file 122.

【００４０】[0040]

When the transfer of the required data into bank 112b is complete, bank 112b is connected through the load/store unit to the buses 141 and 143, and thereby to the arithmetic unit 130 and the register file 122. The required data is transferred from bank 112b to the register file 122 and processed by the arithmetic unit 130, and the results are written back to bank 112b through the register file 122. Meanwhile, bank 112a is connected to the external storage device through the DMA controller 111, and under its control the required graphic data is transferred at high speed to bank 112a.

【００４１】[0041]

When both the processing of the data in bank 112b and the data transfer into bank 112a are complete, bank 112b is again connected to the external storage device, and under the control of the DMA controller 111 the results are written back to the external storage device and the next graphic data to be processed is transferred in. Meanwhile, bank 112a is connected through the load/store unit to the buses 141 and 143, and thereby to the arithmetic unit 130 and the register file 122. The required data is transferred from bank 112a to the register file 122 and processed by the arithmetic unit 130, and the results are written back to bank 112a through the register file 122.

【００４２】[0042]

Thereafter, by alternately assigning the two banks to data transfer and to computation as described above, the two processes can be executed in parallel and at high speed. Moreover, because no complicated control or special memory device is required, a sufficiently large internal storage can be implemented at low cost.
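The alternation described in [0039] to [0042] can be sketched as a loop in Python. This is a simplified model under stated assumptions: data arrives in batches, the hypothetical `transform` callable stands in for the processing done by the arithmetic unit 130, and the two sides of each phase run one after the other here even though the hardware overlaps them.

```python
from typing import Callable, List, Optional

Batch = List[float]


def ping_pong(batches: List[Batch],
              transform: Callable[[float], float]) -> List[Batch]:
    """Process batches with two alternating banks (index 0 ~ 112a, 1 ~ 112b)."""
    banks: List[Optional[Batch]] = [None, None]
    processed = [False, False]       # does the bank hold finished results?
    queue = list(batches)            # data still in external storage
    written_back: List[Batch] = []   # results returned to external storage

    dma_side = 1                     # bank 112b starts on the DMA side ([0039])
    if queue:
        banks[dma_side] = queue.pop(0)

    while True:
        # Swap roles: the bank that was just filled moves to the compute side.
        dma_side ^= 1
        compute_side = dma_side ^ 1

        # Compute side: load/store -> register file -> arithmetic unit -> back.
        if banks[compute_side] is not None and not processed[compute_side]:
            banks[compute_side] = [transform(v) for v in banks[compute_side]]
            processed[compute_side] = True

        # DMA side: write finished results back, then fetch the next batch.
        if processed[dma_side]:
            written_back.append(banks[dma_side])
            banks[dma_side], processed[dma_side] = None, False
        if queue:
            banks[dma_side] = queue.pop(0)

        if banks[0] is None and banks[1] is None:
            return written_back


if __name__ == "__main__":
    print(ping_pong([[1.0, 2.0], [3.0], [4.0]], lambda v: v * 2))
```

Results leave in the same order the batches arrived, because each bank is written back exactly one phase after it was processed.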

【００４３】[0043]

Next, an example of performing a perspective transformation in this embodiment is shown. Let the input (x, y, z, 1) be the coordinates of the vertex to be transformed. The perspective transformation is performed as follows and outputs (X, Y), the screen X and Y coordinates after perspective transformation.

【００４４】[0044]

$$(x', y', w') = (x, y, z, 1)\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \\ j & k & l \end{pmatrix} = (ax + dy + gz + j,\; bx + ey + hz + k,\; cx + fy + iz + l) \tag{1}$$

$$W = \frac{1}{w'} \tag{2}$$

$$(X, Y) = (x', y') \times W \tag{3}$$

The following is an example program that applies the above processing to an independent triangle (three vertices). Three-dimensional graphic data is normally handled as a set of independent triangles, so the program below is executed repeatedly. Here, the matrix is the product of the coordinate transformation and perspective transformation matrices; multiplication and multiply-add instructions have a latency of 3 and a throughput of 1; division has a latency of 6 and a throughput of 5; and the latency of the last instruction is not taken into account. The input vertex data is loaded from the internal storage device, and the transformed coordinates are converted to fixed point before being stored in the internal storage device.
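Equations (1) to (3) can be checked numerically with a short Python sketch. It assumes the matrix is given as the four rows (a, b, c), (d, e, f), (g, h, i), (j, k, l), exactly as in equation (1). The function name is hypothetical, no fixed-point conversion or clipping is modeled, and a vertex with w' = 0 raises ZeroDivisionError.

```python
from typing import Sequence, Tuple

Matrix4x3 = Sequence[Sequence[float]]


def perspective_transform(v: Tuple[float, float, float],
                          m: Matrix4x3) -> Tuple[float, float]:
    """Apply equations (1)-(3) to one vertex (x, y, z)."""
    x, y, z = v
    row = (x, y, z, 1.0)
    # (1) x' = ax + dy + gz + j, y' = bx + ey + hz + k, w' = cx + fy + iz + l
    xp, yp, wp = (sum(row[r] * m[r][c] for r in range(4)) for c in range(3))
    # (2) a single reciprocal per vertex (the GDIV step in the listings)
    W = 1.0 / wp
    # (3) two multiplications (the GMUL2 step)
    return xp * W, yp * W


if __name__ == "__main__":
    # w' = z: a bare perspective divide.
    print(perspective_transform((2.0, 4.0, 2.0),
                                [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]))
```

Computing one reciprocal and then two multiplications, rather than two divisions, is what keeps the slow divider to one use per vertex.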

【００４５】[0045]

The symbols and mnemonics that appear in the programs are briefly described below.

【００４６】[0046]

Symbols:
R* : CPU register
GR* : GTE floating-point register
IR : GTE integer register

Mnemonics:
GMACn : multiply-add instruction; result written back to the accumulator
GMACFn : multiply-add instruction; result written back to the register file
GMULAn : multiply instruction; result written back to the accumulator
GDIV : division instruction
GFTOIn : floating-point to fixed-point conversion instruction
GSWn : store instruction
GLWn : load instruction

Here, n is the number of arithmetic units operating simultaneously. For example, for GMAC4, data is input independently from the register file 122 to each of the four arithmetic units of the arithmetic unit 130, and the results are written back to the corresponding four banks of the register file 122.

【００４７】[0047]

Data is stored in the register file 122 as follows.

【００４８】[0048]

```
; register map
; GR00, GR01, GR02, GR03 ; vertex 1 (x, y, z, 1) coordinates
; GR04, GR05, GR06, GR07 ; vertex 2 (x, y, z, 1) coordinates
; GR08, GR09, GR10, GR11 ; vertex 3 (x, y, z, 1) coordinates
; GR12, GR13, GR14, GR15 ; 640, 480, 0, 1 (constant storage)
; GR16, GR17, GR18, GR19 ; vertex 1 tmp coordinates (x', y', z'), 1/z
; GR20, GR21, GR22, GR23 ; vertex 2 tmp coordinates (x', y', z'), 1/z
; GR24, GR25, GR26, GR27 ; vertex 3 tmp coordinates (x', y', z'), 1/z
; GR28, GR29, GR30, GR31 ;
; GR32, GR33, GR34, GR35 ; coordinate/perspective transformation matrix
; GR36, GR37, GR38, GR39 ; coordinate/perspective transformation matrix
; GR40, GR41, GR42, GR43 ; coordinate/perspective transformation matrix
; GR44, GR45, GR46, GR47 ; coordinate/perspective transformation matrix
; GR48, GR49, GR50, GR51 ; final result (x", y") vertex 1
; GR52, GR53, GR54, GR55 ; final result (x", y") vertex 2
; GR56, GR57, GR58, GR59 ; final result (x", y") vertex 3
; GR60, GR61, GR62, GR63 ;
```

The program without optimization is shown below.

【００４９】[0049]

```
; Vertex 1 --------------------
GLW4    GR(00-03), 0x00(IR1)         ; V1: load vertex 1 coordinates
GMULA3  GR(32-34), GR00              ; V1: transform vertex 1, x term & ACC clear
GMAC3   GR(36-38), GR01              ; V1: transform vertex 1, y term
GMAC3   GR(40-42), GR02              ; V1: transform vertex 1, z term
GMACF3  GR(16-18), GR(44-46), GR03   ; V1: translation component (GR03 = 1)
GDIV    GR19, GR15, GR18             ; V1: divide (GR15 = 1)
GMUL2   GR(48-49), GR(16-17), GR19   ; V1: (x', y') x 1/z
GFTOI2  GR(48-49), GR(48-49), FM1    ; V1: convert to fixed point
GSW2    GR(48-49), 0x00(IR2)         ; V1: store to GPU preprocessing unit
; Vertex 2 --------------------
GLW4    GR(04-07), 0x10(IR1)         ; V2: load vertex 2 coordinates
GMULA3  GR(32-34), GR04              ; V2: transform vertex 2, x term & ACC clear
GMAC3   GR(36-38), GR05              ; V2: transform vertex 2, y term
GMAC3   GR(40-42), GR06              ; V2: transform vertex 2, z term
GMACF3  GR(20-22), GR(44-46), GR07   ; V2: translation component (GR07 = 1)
GDIV    GR23, GR15, GR22             ; V2: divide (GR15 = 1)
GMUL2   GR(52-53), GR(20-21), GR23   ; V2: (x', y') x 1/z
GFTOI2  GR(52-53), GR(52-53), FM1    ; V2: convert to fixed point
GSW2    GR(52-53), 0x10(IR2)         ; V2: store to GPU preprocessing unit
; Vertex 3 --------------------
GLW4    GR(08-11), 0x20(IR1)         ; V3: load vertex 3 coordinates
GMULA3  GR(32-34), GR08              ; V3: transform vertex 3, x term & ACC clear
GMAC3   GR(36-38), GR09              ; V3: transform vertex 3, y term
GMAC3   GR(40-42), GR10              ; V3: transform vertex 3, z term
GMACF3  GR(24-26), GR(44-46), GR11   ; V3: translation component (GR11 = 1)
GDIV    GR27, GR15, GR26             ; V3: divide (GR15 = 1)
GMUL2   GR(56-57), GR(24-25), GR27   ; V3: (x', y') x 1/z
GFTOI2  GR(56-57), GR(56-57), FM1    ; V3: convert to fixed point
GSW2    GR(56-57), 0x20(IR2)         ; V3: store to GPU preprocessing unit
```

The following program is optimized with latency and throughput taken into account. In this program, however, data loading, storing, and conversion to fixed point are omitted.

【００５０】[0050]

```
GMULA3  GR(32-34), GR00              ; V1: transform vertex 1, x term & ACC clear
GMAC3   GR(36-38), GR01              ; V1: transform vertex 1, y term
GMAC3   GR(40-42), GR02              ; V1: transform vertex 1, z term
GMACF3  GR(16-18), GR(44-46), GR03   ; V1: translation component (GR03 = 1)
GMULA3  GR(32-34), GR04              ; V2: transform vertex 2, x term & ACC clear
GMAC3   GR(36-38), GR05              ; V2: transform vertex 2, y term
GMAC3   GR(40-42), GR06              ; V2: transform vertex 2, z term
GDIV    GR19, GR15, GR18             ; V1: divide (GR15 = 1)
GMACF3  GR(20-22), GR(44-46), GR07   ; V2: translation component (GR07 = 1)
GMULA3  GR(32-34), GR08              ; V3: transform vertex 3, x term & ACC clear
GMAC3   GR(36-38), GR09              ; V3: transform vertex 3, y term
GMAC3   GR(40-42), GR10              ; V3: transform vertex 3, z term
GDIV    GR23, GR15, GR22             ; V2: divide (GR15 = 1)
GMACF3  GR(24-26), GR(44-46), GR11   ; V3: translation component (GR11 = 1)
-- stall
-- stall
GMUL2   GR(48-49), GR(16-17), GR19   ; V1: (x', y') x 1/z
GDIV    GR27, GR15, GR26             ; V3: divide (GR15 = 1)
-- stall
-- stall
GMUL2   GR(52-53), GR(20-21), GR23   ; V2: (x', y') x 1/z
-- stall
-- stall
GMUL2   GR(56-57), GR(24-25), GR27   ; V3: (x', y') x 1/z
```

FIGS. 5 and 6 show the timing when the above program is executed. By applying the present invention to the GTE in this way, the inner-product operations involved in the matrix operations for coordinate conversion can be executed efficiently. In particular, because division and matrix operations can be executed in parallel, the capacity of the multiple arithmetic units is not wasted.
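To make the data flow of the listings concrete, the following Python sketch models the instructions functionally, with no timing. The semantics are assumptions inferred from the mnemonic table and the listing comments: there is one accumulator per lane, the lane is taken from the register's bank (n mod 4), and GMUL2 is treated as a plain lane-wise multiply. The class and function names are hypothetical.

```python
from typing import List, Sequence


class GTEModel:
    """Functional (untimed) model of the GTE register file and accumulators."""

    def __init__(self) -> None:
        self.gr: List[float] = [0.0] * 64  # GR00..GR63
        self.acc: List[float] = [0.0] * 4  # one accumulator per lane x, y, z, w

    def gmula(self, srcs: Sequence[int], s: int) -> None:
        # GMULAn srcs, s : acc[lane] = GR[src] * GR[s] (old value discarded)
        for r in srcs:
            self.acc[r % 4] = self.gr[r] * self.gr[s]

    def gmac(self, srcs: Sequence[int], s: int) -> None:
        # GMACn srcs, s : acc[lane] += GR[src] * GR[s]
        for r in srcs:
            self.acc[r % 4] += self.gr[r] * self.gr[s]

    def gmacf(self, dsts: Sequence[int], srcs: Sequence[int], s: int) -> None:
        # GMACFn dsts, srcs, s : GR[dst] = acc[lane] + GR[src] * GR[s]
        for d, r in zip(dsts, srcs):
            self.gr[d] = self.acc[r % 4] + self.gr[r] * self.gr[s]

    def gdiv(self, d: int, a: int, b: int) -> None:
        # GDIV d, a, b : GR[d] = GR[a] / GR[b]
        self.gr[d] = self.gr[a] / self.gr[b]

    def gmul(self, dsts: Sequence[int], srcs: Sequence[int], s: int) -> None:
        # GMULn dsts, srcs, s : GR[dst] = GR[src] * GR[s]
        for d, r in zip(dsts, srcs):
            self.gr[d] = self.gr[r] * self.gr[s]


def load_matrix(g: GTEModel, m: Sequence[Sequence[float]]) -> None:
    """Rows (a,b,c), (d,e,f), (g,h,i), (j,k,l) -> GR32-34, 36-38, 40-42, 44-46."""
    for row in range(4):
        for col in range(3):
            g.gr[32 + 4 * row + col] = m[row][col]


def transform_vertex(g: GTEModel, v: int, tmp: int, out: int) -> None:
    """One vertex block of [0049], without load, fixed-point conversion, store."""
    g.gmula(range(32, 35), v)                                  # x term
    g.gmac(range(36, 39), v + 1)                               # y term
    g.gmac(range(40, 43), v + 2)                               # z term
    g.gmacf(range(tmp, tmp + 3), range(44, 47), v + 3)         # + (j, k, l) * 1
    g.gdiv(tmp + 3, 15, tmp + 2)                               # 1 / w' (GR15 = 1)
    g.gmul(range(out, out + 2), range(tmp, tmp + 2), tmp + 3)  # (x', y') * 1/w'


if __name__ == "__main__":
    g = GTEModel()
    load_matrix(g, [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]])
    g.gr[12:16] = [640.0, 480.0, 0.0, 1.0]  # constants, GR15 = 1
    g.gr[0:4] = [2.0, 4.0, 2.0, 1.0]        # vertex 1
    transform_vertex(g, 0, 16, 48)
    print(g.gr[48], g.gr[49])  # prints: 1.0 2.0
```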

【００５１】[0051]

That is, adopting this configuration makes it possible to provide a coordinate conversion processing device (GTE) that can (1) transfer data efficiently between the graphic data storage device and the coordinate conversion processing device, and (2) efficiently execute the inner-product operations involved in the matrix operations for coordinate conversion.

【００５２】[0052]

FIG. 7 is a block diagram showing the configuration of the fourth invention. The configuration and operation of an embodiment of the proposed arithmetic scheme are described with reference to FIG. 7. The floating-point numbers handled by this floating-point unit are single-precision (32-bit) numbers as defined in the IEEE 754 floating-point arithmetic standard.

【００５３】[0053]

Component 701 is a floating-point unit that includes the functions of the present invention and is one of the arithmetic units constituting the coordinate conversion processing device. The unit 701 includes a sign determination means 702, an exponent determination means 703, and a constant generation means 704. The sign determination means 702 determines whether the input value is positive or negative from the sign field of the input floating-point number. In the embodiment, a sign bit of '1' means negative and '0' means positive. Therefore, in the embodiment no dedicated circuit is required, and the sign bit can be used directly as the determination result. The result is input to the constant generation means 704.

【００５４】[0054]

The exponent determination means 703 is a comparator that determines from the exponent field whether the absolute value of the input is 1 or greater. Because the input floating-point numbers are normalized, in the embodiment an exponent field of 127 means that the absolute value of the input is the mantissa (1.xxxx…) × 2⁰, where each x is '0' or '1'. Therefore, if the input is a normalized number whose exponent field is 127 or more (≥ 127), its absolute value is determined to be 1 or greater. Accordingly, in the embodiment the exponent determination means can be a comparison circuit (comparator) that determines the magnitude relation between the exponent field and the constant 127. The result is input to the constant generation means 704.

【００５５】[0055]

The constant generation means 704 outputs the floating-point number '0' or '+1' as the operation result, depending on the results of the sign determination means 702 and the exponent determination means 703. When the sign determination means 702 determines 'negative', the constant generation means 704 sets the three fields (sign, exponent, and mantissa) so that they represent the floating-point number '0'. When the sign determination means 702 determines 'positive' and the exponent determination means 703 determines that the absolute value is 1 or greater, it sets the three fields so that they represent '+1' and outputs '+1' as the result. Therefore, in this embodiment the constant generation means can be implemented as a selection circuit that, according to the results of the sign and exponent determination means, selects the constant '0', the constant '1', or the input value (the input floating-point number).
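The sign/exponent selection can be sketched in Python by working on the binary32 bit pattern directly. This is an illustrative model, not the circuit: the `if` statements stand in for the selection circuit of the constant generation means 704, the helper names are hypothetical, and inputs are assumed to be representable in binary32 (Python's `struct` raises OverflowError for finite values outside that range).

```python
import struct


def f32_bits(x: float) -> int:
    """IEEE 754 binary32 bit pattern of x (x is rounded to single precision)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Inverse of f32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def clamp01_f32(x: float) -> float:
    """Clamp to [0, 1] by inspecting only the sign and exponent fields."""
    bits = f32_bits(x)
    sign = bits >> 31                 # sign determination (702): the bit itself
    exponent = (bits >> 23) & 0xFF    # biased exponent field
    if sign == 1:                     # negative (including -0.0) -> constant 0
        return 0.0
    if exponent >= 127:               # exponent determination (703): |x| >= 1
        return 1.0                    # constant +1, bit pattern 0x3F800000
    return bits_f32(bits)             # otherwise pass the input through


if __name__ == "__main__":
    for v in (-3.5, 0.25, 1.0, 7.0):
        print(v, "->", clamp01_f32(v))
```

No magnitude comparison on the full 32-bit value is needed: one bit and an 8-bit compare against 127 decide the result.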

【００５６】[0056]

The above embodiment deals with single-precision numbers. For double-precision numbers, the exponent determination means compares the exponent field with the constant 1023 instead of 127. In addition, although the above description deals with floating-point numbers based on the IEEE 754 standard, floating-point numbers expressed in other formats can be handled by the same procedure.
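The same procedure carries over to other IEEE 754 widths by changing only the field widths and the bias (15, 127, or 1023). The following Python sketch is a hedged generalization, assuming the standard binary16/32/64 layouts reachable through the `struct` format codes `e`, `f`, and `d`; the `FORMATS` table and the function name are hypothetical.

```python
import struct

# format name -> (float struct code, same-width unsigned code, exp bits, frac bits)
FORMATS = {
    "binary16": ("<e", "<H", 5, 10),
    "binary32": ("<f", "<I", 8, 23),
    "binary64": ("<d", "<Q", 11, 52),
}


def clamp01(x: float, fmt: str = "binary64") -> float:
    """Clamp x to [0, 1] in the given format using only the sign bit and a
    comparison of the biased exponent with the format's bias."""
    fcode, icode, exp_w, frac_w = FORMATS[fmt]
    bits = struct.unpack(icode, struct.pack(fcode, x))[0]
    sign = bits >> (exp_w + frac_w)
    exponent = (bits >> frac_w) & ((1 << exp_w) - 1)
    bias = (1 << (exp_w - 1)) - 1      # 15, 127, or 1023
    if sign == 1:
        return 0.0
    if exponent >= bias:               # biased exponent >= bias <=> |x| >= 1
        return 1.0
    return struct.unpack(fcode, struct.pack(icode, bits))[0]


if __name__ == "__main__":
    print(clamp01(0.999), clamp01(1e300), clamp01(-0.1, "binary32"))
```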

【００５７】[0057]

As described above, the coordinate conversion processing device of this embodiment can clamp a value to a specific range (in the embodiment, [0, 1]) according to the input, with only a small amount of additional hardware. Clamping, which was conventionally performed with compare and conditional-branch instructions, can therefore be executed at high speed without the pipeline disruption caused by branches. In particular, when it is used for the clamp to [0, 1] that occurs frequently in computer graphics, such as in luminance calculations for lighting, blending of colors and the like, and anti-aliasing, lighting processing can be executed at high speed.

【００５８】[0058]

EFFECT OF THE INVENTION

According to the present invention, compared with conventional methods, it is possible to provide a coordinate conversion processing device (GTE) that can execute at high speed (1) efficient data transfer between the graphic data storage device and the coordinate conversion processing device, (2) the inner-product operations involved in the matrix operations for perspective transformation and the division by 'depth', and (3) the clamping of the luminances R, G, and B in lighting processing.

[Brief description of drawings]

FIG. 1 is a block diagram of the coordinate conversion processing device according to the present invention.

FIG. 2 is a block diagram showing the internal storage unit 110.

FIG. 3 is a block diagram showing the data holding unit 120.

FIG. 4 is a block diagram showing the arithmetic unit 130.

FIG. 5 is an execution timing diagram (part 1) of coordinate conversion using the arithmetic units of the present embodiment.

FIG. 6 is an execution timing diagram (part 2) of coordinate conversion using the arithmetic units of the present embodiment.

FIG. 7 is a block diagram of the floating-point unit of the coordinate conversion processing device according to the present invention.

FIG. 8 shows an example (part 1) of a conventional coordinate conversion processing device.

FIG. 9 shows an example (part 2) of a conventional coordinate conversion processing device.

[Explanation of symbols]

100 Coordinate conversion processing device
110 Internal storage unit
111 Direct memory controller
112 Internal memory
113 Address generator
114 Bus between internal memory and register file
120 Data holding unit
121 Load/store circuit
122 Register file
130 Arithmetic unit
131, 132, 133, 134 Product-sum calculators
135 Divider
136 Input/output port
137 Bypass network
141, 142, 143 Bus networks
151, 152 Crossbar switches
200 External storage device
701 Floating-point unit
702 Sign determination means
703 Exponent determination means
704 Constant generation means
801, 901 Arithmetic unit
802, 902 Register file
803, 903 Internal storage device
804, 904 Input/output interface
805, 905 Data bus

Continuation of front page
(51) Int.Cl.⁷, identification code, FI: G09G 5/36 530C
(72) Inventor: Maki Ueno, c/o Toshiba Microelectronics Corporation, 25-1 Ekimaehonmachi, Kawasaki-ku, Kawasaki-shi, Kanagawa
(56) References cited: JP-A-4-348485; JP-A-7-200404; JP-A-62-205482; JP-A-4-144361; JP-A-5-73706; JP-A-57-193840; JP-A-58-51352; JP-A-5-46783; JP-T-6-511330
(58) Fields searched (Int.Cl.⁷, DB name): G06T 15/00; G06F 7/00; G09G 5/36; G06F 12/00-12/12

Claims

(57) [Claims]

1. A coordinate conversion processing device that performs predetermined geometric operation processing on vertex data of figures, the vertex data being represented in homogeneous coordinates and stored in an external storage device, the device comprising: an internal storage unit that has a first storage block and a second storage block and inputs and outputs data for each of these storage blocks, the internal storage unit receiving the predetermined vertex data from the external storage device into a storage block, holding it there, and switching the connection destination of that block to a data holding unit in order to output the vertex data; a data holding unit that temporarily stores part of the vertex data stored in a predetermined storage block of the internal storage unit; and an arithmetic unit that receives the vertex data stored in the data holding unit and performs predetermined processing to generate graphic data; wherein the first storage block is connected to the external storage device and receives vertex data; when data reception into the first storage block is complete, the first storage block is connected to the data holding unit and outputs the vertex data, the arithmetic unit performs the predetermined processing, and the processing result is written back to the first storage block; the second storage block is connected to the external storage device and receives the vertex data to be processed next; and when data reception into the second storage block is complete, the first storage block is connected to the external storage device, writes its processing result back to the external storage device, and then receives the vertex data to be processed after that, while the second storage block is connected to the data holding unit and outputs its vertex data, the arithmetic unit performs the predetermined processing, and the processing result is written back to the second storage block.
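Claim 1 describes what is commonly called double buffering (ping-pong buffering): one storage block feeds the arithmetic side while the other exchanges data with external storage, and the two swap roles when a transfer completes. The Python sketch below models that schedule sequentially. The function name `double_buffered_run`, the batch granularity, and the `process` callback are hypothetical stand-ins for the hardware, and real hardware would overlap the compute and transfer steps rather than perform them in the logged order.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Homogeneous vertex (x, y, z, w); grouping vertices into batches is an assumption.
Vertex = Tuple[float, float, float, float]


def double_buffered_run(
    batches: Sequence[List[Vertex]],
    process: Callable[[List[Vertex]], List[Vertex]],
) -> Tuple[List[List[Vertex]], List[str]]:
    """Sequential model of two storage blocks used in ping-pong fashion.

    Returns the results as written back to external storage, in batch order,
    plus a log of block operations. Hypothetical model, not the patent's
    actual control logic.
    """
    n = len(batches)
    written_back: List[List[Vertex]] = []
    log: List[str] = []
    if n == 0:
        return written_back, log

    blocks: List[Optional[List[Vertex]]] = [None, None]
    # Prologue: block 0 is attached to external storage and receives batch 0.
    blocks[0] = list(batches[0])
    log.append("load batch 0 -> block 0")

    for i in range(n):
        cur, other = i % 2, 1 - i % 2
        # Compute side: the freshly filled block feeds the data holding unit,
        # and the result is written back into the same block.
        blocks[cur] = process(blocks[cur])
        log.append(f"process block {cur} (batch {i})")
        # Transfer side: the other block first drains its previous result...
        if i >= 1:
            written_back.append(blocks[other])
            log.append(f"store block {other} (batch {i - 1})")
        # ...and then receives the next batch, if there is one.
        if i + 1 < n:
            blocks[other] = list(batches[i + 1])
            log.append(f"load batch {i + 1} -> block {other}")

    # Epilogue: drain the block that holds the final result.
    last = (n - 1) % 2
    written_back.append(blocks[last])
    log.append(f"store block {last} (batch {n - 1})")
    return written_back, log
```

Because each block is refilled only after its previous result has been stored, no result is overwritten before it reaches external storage, which is the ordering the claim spells out.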

2. A coordinate conversion processing device comprising: first, second, and third product-sum calculators that perform product-sum operations on x, y, and z, corresponding to at least x, y, and z of a homogeneous coordinate system; at least one divider; first, second, and third register files that store vertex data of figures, corresponding to at least x, y, and z of the homogeneous coordinate system; a first bus network that interconnects the product-sum calculators, the divider, and the register files and supplies first operand data to the product-sum calculators and the divider; a second bus network that interconnects the product-sum calculators, the divider, and the register files and supplies second operand data to the product-sum calculators and the divider; and a third bus network that interconnects the product-sum calculators, the divider, and the register files and writes the operation results of the product-sum calculators and the divider back to the register files; wherein the first read ports of the first, second, and third register files are connected by the first bus network to the first-operand input terminals of the corresponding first, second, and third product-sum calculators and of the divider; the second read ports of the first, second, and third register files are connected to the second-operand input terminals of the first, second, and third product-sum calculators and of the divider by the second bus network, which includes a crossbar switch, so that the second read ports can form both one-to-one interconnections, in which register files and arithmetic units are paired in mutually exclusive combinations, and one-to-many interconnections, in which a specific register is connected to a plurality of arithmetic units; the output terminals of the first, second, and third product-sum calculators and of the divider are connected to the respective write ports of the first, second, and third register files; and at least one of the output terminals of the first, second, and third product-sum calculators and the output terminal of the divider can be exclusively connected to any of the write ports of the first, second, and third register files and can write to a predetermined address of that register file.
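The second bus network in claim 2 lets the crossbar switch route each register file's second read port either one-to-one (a permutation) or one-to-many (a broadcast). One way a broadcast can be used, offered here as an illustrative assumption rather than a schedule stated in the patent, is a 3×3 matrix-vector product: if register file i supplies row i of the matrix through its fixed first read port and register file j broadcasts vertex component v[j] through the crossbar, each of the three product-sum calculators accumulates one output component in three steps. The names `crossbar`, `is_one_to_one`, and `mat3_vec3` are hypothetical.

```python
from typing import List, Sequence


def crossbar(second_ports: Sequence[float], select: Sequence[int]) -> List[float]:
    """Route register-file second read ports to second-operand inputs.

    select[k] is the index of the register file that feeds arithmetic unit k.
    """
    return [second_ports[s] for s in select]


def is_one_to_one(select: Sequence[int]) -> bool:
    """True when select is a permutation (mutually exclusive pairing);
    False when some register file is broadcast to several units."""
    return sorted(select) == list(range(len(select)))


def mat3_vec3(rows: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Hypothetical 3-step schedule for M @ v on three product-sum calculators.

    Step j: unit i takes rows[i][j] as its first operand (fixed first port of
    register file i) and v[j] as its second operand (broadcast from register
    file j through the crossbar), then accumulates the product.
    """
    acc = [0.0, 0.0, 0.0]
    for j in range(3):
        second = crossbar(v, [j, j, j])  # one-to-many: register file j -> all units
        for i in range(3):
            acc[i] += rows[i][j] * second[i]  # product-sum in unit i
    return acc
```

With a permutation such as `[2, 0, 1]` the same switch instead gives each unit a different register file, which corresponds to the one-to-one mode.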

3. The coordinate conversion processing device according to claim 2, further comprising: a first bypass network that directly connects the output terminals of the first, second, and third product-sum calculators and the divider to the second input terminals of the first, second, and third product-sum calculators and the divider and, before or in parallel with writing the operation results back to the register files, supplies the results directly to the first input terminals of the first, second, and third product-sum calculators and the divider as their operands; and a second bypass network that directly connects the output terminals of the first, second, and third product-sum calculators and the divider to the first and second bus networks and, in parallel with writing back the operation results, supplies the results directly to the first or second input terminals of the first, second, and third product-sum calculators or the divider as their operands.
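The bypass networks of claim 3 amount to what is usually called operand forwarding: a result is handed straight from a calculator's output to an input instead of being read back from the register file after write-back. The toy cycle model below counts when each operation in a chain of dependent product-sum operations can issue, with and without forwarding. The function `chain_issue_cycles` and its latencies `exec_latency` and `writeback_latency` are hypothetical parameters, not figures from the patent.

```python
from typing import List


def chain_issue_cycles(
    n_ops: int, exec_latency: int, writeback_latency: int, bypass: bool
) -> List[int]:
    """Issue cycle of each operation in a dependency chain (toy model).

    Operation k consumes the result of operation k-1. Without a bypass the
    operand can be read only from the register file after write-back; with a
    bypass it is taken directly from the producing unit's output.
    """
    cycles: List[int] = []
    operand_ready = 0
    for _ in range(n_ops):
        issue = operand_ready
        cycles.append(issue)
        result_ready = issue + exec_latency
        # Forwarding skips the extra wait for the register-file write-back.
        operand_ready = result_ready if bypass else result_ready + writeback_latency
    return cycles
```

With an assumed execution latency of 2 and write-back latency of 1, a three-step chain issues at cycles 0, 2, 4 with forwarding and 0, 3, 6 without it, so each dependent step saves the write-back delay.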

4. A coordinate conversion processing device comprising: first, second, third, and fourth product-sum calculators that perform product-sum operations on x, y, z, and w, corresponding to at least x, y, z, and w of a homogeneous coordinate system; at least one divider; first, second, third, and fourth register files that store vertex data of figures, corresponding to at least x, y, z, and w of the homogeneous coordinate system; a first bus network that interconnects the product-sum calculators, the divider, and the register files and supplies first operand data to the product-sum calculators and the divider; a second bus network that interconnects the product-sum calculators, the divider, and the register files and supplies second operand data to the product-sum calculators and the divider; and a third bus network that interconnects the product-sum calculators, the divider, and the register files and writes the operation results of the product-sum calculators and the divider back to the register files; wherein the first read ports of the first, second, third, and fourth register files are connected by the first bus network to the first-operand input terminals of the corresponding first, second, third, and fourth product-sum calculators and of the divider; the second read ports of the first, second, third, and fourth register files are connected to the second-operand input terminals of the first, second, third, and fourth product-sum calculators and of the divider by the second bus network, which includes a crossbar switch, so that the second read ports can form both one-to-one interconnections, in which register files and arithmetic units are paired in mutually exclusive combinations, and one-to-many interconnections, in which a specific register is connected to a plurality of arithmetic units; the output terminals of the first, second, third, and fourth product-sum calculators and of the divider are connected to the respective write ports of the first, second, third, and fourth register files; and at least one of the output terminals of the first, second, third, and fourth product-sum calculators and the output terminal of the divider can be exclusively connected to any of the write ports of the first, second, third, and fourth register files and can write to a predetermined address of that register file.
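Claim 4 widens the arrangement to four product-sum calculators and four register files so that w is handled alongside x, y, and z. The sketch below combines the broadcast pattern with the divider to transform a homogeneous vertex by a 4×4 matrix and then project it. Computing a single reciprocal 1/w on the divider and broadcasting it back to the product-sum calculators is an assumed schedule, chosen to limit the divider to one operation per vertex; the claim itself does not prescribe it. `transform_and_project` is a hypothetical name.

```python
from typing import Sequence, Tuple


def transform_and_project(
    m: Sequence[Sequence[float]], v: Sequence[float]
) -> Tuple[float, float, float]:
    """Transform homogeneous vertex v = (x, y, z, w) by 4x4 matrix m, then divide by w.

    Hypothetical schedule: four product-sum calculators each accumulate one row
    over four broadcast steps, the divider forms 1/w once, and the calculators
    scale x, y, z by that broadcast reciprocal.
    """
    acc = [0.0, 0.0, 0.0, 0.0]
    for j in range(4):
        broadcast = v[j]  # one-to-many: register file j feeds all four units
        for i in range(4):
            acc[i] += m[i][j] * broadcast  # product-sum in unit i
    x, y, z, w = acc
    if w == 0.0:
        raise ZeroDivisionError("w is zero: the vertex cannot be projected")
    inv_w = 1.0 / w  # the single divider
    return (x * inv_w, y * inv_w, z * inv_w)
```

For example, with the minimal perspective matrix whose last row is (0, 0, 1, 0), so that w' = z, the vertex (4, 8, 2, 1) projects to (2.0, 4.0, 1.0).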

5. The coordinate conversion processing device according to claim 4, further comprising: a first bypass network that directly connects the output terminals of the first, second, third, and fourth product-sum calculators and the divider to the second input terminals of the first, second, third, and fourth product-sum calculators and the divider and, before or in parallel with writing the operation results back to the register files, supplies the results directly to the first input terminals of the first, second, third, and fourth product-sum calculators and the divider as their operands; and a second bypass network that directly connects the output terminals of the first, second, third, and fourth product-sum calculators and the divider to the first and second bus networks and, in parallel with writing back the operation results, supplies the results directly to the first or second input terminals of the first, second, third, and fourth product-sum calculators or the divider as their operands.