JP4325038B2

JP4325038B2 - Image processing device

Info

Publication number: JP4325038B2
Application number: JP29891899A
Authority: JP
Inventors: 辰己光下; 裕幸小沢; 俊男堀岡; 謙士朗荒瀬; 悦和黒瀬; 睦弘大森
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-10-20
Filing date: 1999-10-20
Publication date: 2009-09-02
Anticipated expiration: 2019-10-20
Also published as: JP2001118056A

Description

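[0001]
[Technical Field of the Invention]
The present invention relates to an image processing device that can provide high-quality images with a small-scale configuration.
[0002]
[Prior Art]
Computer graphics are often used in various CAD (Computer Aided Design) systems, amusement devices, and the like. In particular, with recent advances in image processing technology, systems using three-dimensional computer graphics have spread rapidly.
In such three-dimensional computer graphics, to determine the color corresponding to each pixel, a rendering process is performed that calculates the color value of each pixel and writes the calculated value to the address of the display buffer (frame buffer) corresponding to that pixel.
One rendering technique is polygon rendering. In this technique, a three-dimensional model is represented as a combination of triangular unit graphics (polygons), and the colors of the display screen are determined by drawing in units of these polygons.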
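[0003]
In polygon rendering, for each vertex of a triangle in the physical coordinate system, the coordinates (x, y, z), the color data (R, G, B), the homogeneous coordinates (s, t) of the texture data representing the image pattern to be pasted, and the value of the homogeneous term q are interpolated inside the triangle.
Here, the homogeneous term q is, simply put, something like a scaling ratio. The coordinates in the UV coordinate system of the actual texture buffer, that is, the texture coordinate data (U, V), correspond to the result of multiplying (s/q, t/q) = (u, v), obtained by dividing the homogeneous coordinates (s, t) by the homogeneous term q, by the texture sizes USIZE and VSIZE, respectively.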
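By way of a non-limiting illustration, the division by the homogeneous term q and the scaling by the texture sizes may be sketched as follows. The function name `texture_coords` and the floating-point representation are assumptions made for this sketch; the text does not specify a number format at this point.

```python
def texture_coords(s, t, q, usize, vsize):
    """Divide homogeneous coordinates (s, t) by q, then scale to texel space.

    Hypothetical helper for illustration; names are not taken from the source.
    """
    if q == 0:
        raise ValueError("homogeneous term q must be non-zero")
    u = s / q          # u = s/q
    v = t / q          # v = t/q
    return u * usize, v * vsize   # (U, V) in texture-buffer coordinates


# Example: (s, t, q) = (0.5, 0.25, 2.0) on a 256 x 128 texture
print(texture_coords(0.5, 0.25, 2.0, 256, 128))
```

A larger q moves the sampled position toward the texture origin, which matches the description of q as something like a scaling ratio.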
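[0004]
In a three-dimensional computer graphics system using such polygon rendering, drawing involves a texture mapping process in which texture data is read from a texture buffer and pasted onto the surface of the three-dimensional model to obtain highly realistic image data.
Note that when texture mapping is applied to a three-dimensional model, the scaling ratio of the image represented by the pasted texture data changes from pixel to pixel.
[0005]
One technique for obtaining high image quality in texture mapping is MIP (Multum In Parvo) MAP (multi-resolution texture) filtering.
As shown in Fig. 22, MIPMAP filtering prepares in advance a plurality of filtered texture data 200, 201, 202, 203, each corresponding to a different reduction ratio, and selects (205) the texture data corresponding to the reduction ratio 204 of each pixel, so that the optimum texture data 206 for the reduction ratio 204 is used. This suppresses the aliasing caused by the loss of information that accompanies image reduction.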
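[0006]
A conventional three-dimensional computer graphics system employing the MIPMAP filtering described above is explained below.
Fig. 23 is a diagram for explaining the configuration of the conventional three-dimensional computer graphics system, and Fig. 24 is a flowchart of the processing in the texture mapping device 210 shown in Fig. 23.
As shown in Fig. 23, in the conventional system, a texture mapping device 210, a texture buffer 211, and a display buffer 213, each built into a different semiconductor chip, are connected to one another via wiring.
[0007]
The processing in the texture mapping device 210 is as follows.
Step S1: First, the texture mapping device 210 receives the data (s1, t1, q1), (s2, t2, q2), (s3, t3, q3) indicating the homogeneous coordinates and the homogeneous term of each vertex of the triangle.
Step S2: Next, the texture mapping device 210 linearly interpolates the received vertex data (s1, t1, q1), (s2, t2, q2), (s3, t3, q3) to obtain the (s, t, q) data indicating the homogeneous coordinates and the homogeneous term of each pixel inside the triangle.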
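[0008]
Step S3: In its built-in reduction ratio calculation device 212, the texture mapping device 210 obtains the reduction ratio lod of each pixel from the (s, t, q) data of each pixel inside the triangle.
Step S4: For the (s, t, q) data of each pixel, the texture mapping device 210 calculates u data by dividing the s data by the q data and v data by dividing the t data by the q data, thereby obtaining the texture coordinate data (u, v).
Next, from the reduction ratio lod calculated by the reduction ratio calculation device 212 and the texture coordinate data (u, v), the texture mapping device 210 obtains the texture address (U, V), which is a physical address in the texture buffer 211.
[0009]
Step S5: The texture mapping device 210 outputs the texture address (U, V) to the texture buffer 211 and reads the texture data (R, G, B).
Step S6: The texture mapping device 210 writes to the display buffer 213 the pixel data S210 obtained by applying predetermined processing to the texture data read in step S5.
As a result, among the plurality of texture data stored in the texture buffer 211 for the respective reduction ratios, the texture data corresponding to the reduction ratio lod is accessed.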
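[0010]
There is also a high-speed texture mapping device that, in order to achieve high-speed drawing, performs texture mapping on a plurality of pixels in parallel and writes their pixel data to the display buffer simultaneously.
In such a high-speed texture mapping device, as shown in Fig. 25, the data (s1, t1, q1), (s2, t2, q2), (s3, t3, q3) for the vertices of the triangle are processed in parallel by n texture mapping devices 210_1 to 210_n, and the resulting pixel data S210_1 to S210_n are written to the display buffer simultaneously.
That is, texture mapping for a plurality of pixels is performed in parallel (simultaneously).
[0011]
Texture mapping is performed in units of triangles, which are the unit graphics, and processing conditions such as the reduction ratio of the texture data are determined per triangle. Among the plurality of pixels processed simultaneously, only the results for pixels located inside the triangle are treated as valid; the results for pixels located outside the triangle are treated as invalid.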
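[0012]
[Problems to Be Solved by the Invention]
However, in the conventional three-dimensional computer graphics system described above, the data transfer rate between the texture mapping device 210 and the texture buffer 211 and display buffer 213 has been a bottleneck in raising the processing capability of the system as a whole.
[0013]
In addition, because the texture mapping device 210, the texture buffer 211, and the display buffer 213 are built into different semiconductor chips, the conventional system becomes large in scale.
[0014]
Furthermore, the computation of the reduction ratio lod includes many multiplications and divisions and requires an enormous amount of computation.
Accordingly, if each of the n texture mapping devices 210_1 to 210_n incorporates its own reduction ratio calculation device 212_1 to 212_n as shown in Fig. 25, high-speed processing can be achieved, but the device becomes large in scale.
To solve this problem, one conceivable method is to build a reduction ratio calculation device into only one of the texture mapping devices operating in parallel, to treat the pixel processed by that texture mapping device as the representative point for obtaining the reduction ratio, and to use the reduction ratio obtained by that calculation device in all of the texture mapping devices.
In this case, the position of the representative pixel among the plurality of pixels processed simultaneously is fixed.
Consequently, a pixel located outside the triangle, which is the unit graphic described above, may become the representative point among the pixels processed simultaneously.
However, the reduction ratio can differ greatly between the inside and the outside of the triangle. If a pixel located outside the triangle being processed becomes the representative point, the optimum texture data cannot be selected for the pixels located inside the triangle, and the image quality deteriorates significantly.
[0015]
The present invention has been made in view of the problems of the prior art described above, and its object is to provide an image processing device that can stably provide high image quality with a small-scale device configuration.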
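[0016]
[Means for Solving the Problems]
According to the present invention, there is provided an image processing device that represents a display model by combining unit graphics, each consisting of a plurality of pixels to which common processing conditions are applied, and that generates pixel data corresponding to the pixels, using texture data as necessary, wherein
the display model is a three-dimensional model,
the unit graphic is a triangle, and
the device comprises:
a storage circuit that stores display data and a plurality of texture data corresponding to mutually different reduction ratios for the same pattern;
a reduction ratio calculation circuit that calculates a reduction ratio used in common for a plurality of pixel data to be processed simultaneously;
a read circuit that reads the texture data corresponding to the calculated reduction ratio from the storage circuit;
an image processing circuit that processes the plurality of pixel data simultaneously using the read texture data to generate display data;
a write circuit that writes the generated display data to the storage circuit; and
a representative point determination circuit that determines a pixel serving as a representative point from among those pixels, out of the pixels corresponding to the plurality of pixel data to be processed simultaneously, that are located inside the unit graphic being processed,
wherein, in order to substantially calculate the LOD indicating the reduction ratio on the basis of the following expression,
LOD = Clamp(((log2(1/q)) + maxe) << L + K)
where
LOD is a symbol indicating an unsigned reduction ratio composed of an integer part and a fractional part,
Clamp is a symbol indicating clamping in the clamp circuit recited below,
q is a symbol indicating the homogeneous term,
maxe is data consisting only of an integer part that indicates the maximum exponent of the homogeneous coordinates (s, t) and the homogeneous term q of the vertices of the unit graphic being processed,
<< L indicates that data is shifted by L bits in the shift circuits recited below, and
K is signed data composed of an integer part and a fractional part and used for addition in the adder circuit recited below,
the reduction ratio calculation circuit comprises:
a normalization circuit that normalizes the homogeneous term data q to generate an exponent qe and a mantissa qm;
a first shift circuit that shifts data obtained by bit-concatenating the exponent qe and the mantissa qm toward the MSB (Most Significant Bit) by the value indicated by the data L;
a first inverting circuit that inverts the output of the first shift circuit;
data output means that receives the mantissa qm and outputs data μ indicating "log2({1, qm}) - qm";
a second shift circuit that shifts data obtained by bit-concatenating the data maxe and the data μ toward the MSB by the value indicated by the data L;
a second inverting circuit that inverts the output of the second shift circuit;
an adder circuit that adds data obtained by bit-concatenating the data K and the binary number "10", the output of the first inverting circuit, and the output of the second inverting circuit; and
a clamp circuit that clamps the output of the adder circuit to within a predetermined number of bits to generate the reduction ratio LOD,
and wherein the read circuit reads from the storage circuit, for each of the plurality of pixel data processed simultaneously, the texture data specified by the determined reduction ratio, the homogeneous coordinates (s, t), and the homogeneous term q.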
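[0038]
[Embodiments of the Invention]
In the present embodiment, the image processing device of the present invention is applied to a three-dimensional computer graphics system, used in home game machines and the like, that displays a desired three-dimensional image of an arbitrary three-dimensional object model at high speed on a display such as a CRT (Cathode Ray Tube).
Fig. 1 is a system configuration diagram of the three-dimensional computer graphics system 1 of the present embodiment.
The three-dimensional computer graphics system 1 performs polygon rendering: it represents a three-dimensional model as a combination of triangles (polygons), which are unit graphics, determines the color of each pixel of the display screen by drawing these polygons, and displays the result on a display.
The three-dimensional computer graphics system 1 represents a three-dimensional object using, in addition to the (x, y) coordinates that express a position on a plane, a z coordinate that expresses depth, and specifies an arbitrary point in three-dimensional space with the three coordinates (x, y, z).
[0039]
As shown in Fig. 1, in the three-dimensional computer graphics system 1, a main memory 2, an I/O interface circuit 3, a main processor 4, and a rendering circuit 5 are connected via a main bus 6.
Here, the rendering circuit 5 corresponds to the image processing device of the present invention.
The function of each component is described below.
The main processor 4 reads the necessary graphic data from the main memory 2 according to, for example, the progress of a game, and performs clipping, lighting, geometry processing, and the like on this graphic data to generate polygon rendering data. The main processor 4 outputs the polygon rendering data S4 to the rendering circuit 5 via the main bus 6.
The I/O interface circuit 3 receives polygon rendering data from outside as necessary and outputs it to the rendering circuit 5 via the main bus 6.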
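[0040]
Here, the polygon rendering data includes the data (x, y, z, R, G, B, COE_blend, s, t, q, COE_fog) for each of the three vertices of a polygon.
The (x, y, z) data indicate the three-dimensional coordinates of a polygon vertex, and the (R, G, B) data indicate the luminance values of red, green, and blue, respectively, at those three-dimensional coordinates.
The data COE_blend indicates the blend (mixing) coefficient between the R, G, B data of the pixel about to be drawn and those of the pixel already stored in the display buffer 21.
Of the (s, t, q) data, (s, t) indicate the homogeneous coordinates of the corresponding texture, and q indicates the homogeneous term. The texture coordinate data (u, v) are obtained by multiplying "s/q" and "t/q" by the texture sizes USIZE and VSIZE, respectively. The texture data stored in the texture buffer 20 is accessed using the texture coordinate data (u, v).
The data COE_fog indicates the mixing coefficient used in the fogging process.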
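[0041]
The rendering circuit 5 is described in detail below.
As shown in Fig. 1, the rendering circuit 5 has a DDA (Digital Differential Analyzer) setup circuit 10, a triangle DDA circuit 11, a texture engine circuit 12, a memory I/F circuit 13, a CRT controller circuit 14, a RAMDAC circuit 15, a DRAM 16, an SRAM 17, and a clock signal generation circuit 18, all of which are embedded together in a single semiconductor chip.
Here, the texture engine circuit 12 and the DRAM 16 constitute the image processing device of the present invention, and the DRAM 16 corresponds to the storage circuit of the present invention. By embedding the components in a single semiconductor chip as described above, the rendering circuit 5 achieves higher performance through faster data transfer between components, as well as a smaller circuit scale.
[0042]
The DRAM 16 functions as a texture buffer 20, a display buffer 21, a z buffer 22, and a texture CLUT buffer 23.
The clock signal S18 from the clock signal generation circuit 18 is used as the signal that drives each component in the rendering circuit 5.
[0043]
DRAM 16
The DRAM 16 functions as the texture buffer 20 that stores texture data, the display buffer 21 that stores display data to be output to the CRT and shown on the display, the z buffer 22 that stores z data, and the texture CLUT buffer 23 that stores color lookup data.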
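[0044]
DDA setup circuit 10
Before the triangle DDA circuit 11 in the following stage linearly interpolates the values at the vertices of a triangle in the physical coordinate system to obtain the color and depth information of each pixel inside the triangle, the DDA setup circuit 10 performs a setup operation that obtains, for the (z, R, G, B, COE_blend, s, t, q, COE_fog) data indicated by the polygon rendering data S4, quantities such as the differences along the sides of the triangle and in the horizontal direction.
Specifically, this setup operation uses the value at the start point, the value at the end point, and the distance between the start point and the end point to calculate the variation of the value being sought per unit length of movement.
[0045]
That is, the DDA setup circuit 10 generates, for each pixel, dsdx, dtdx, and dqdx, which are the variations of the (s, t, q) data in the x direction, and dsdy, dtdy, and dqdy, which are the variations in the y direction.
The DDA setup circuit 10 outputs the calculated variation data S10 to the triangle DDA circuit 11.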
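The per-unit-length variations described above may, for example, be obtained from the plane through the three vertices. The following is only a sketch under that assumption: the text states that start value, end value, and distance are used, but does not give the exact arithmetic, and the function name `gradients` is hypothetical.

```python
def gradients(p0, p1, p2):
    """Return (dv/dx, dv/dy) for a value v given at three vertices.

    Each vertex is (x, y, v). Solves the plane v = v0 + dvdx*(x-x0) + dvdy*(y-y0).
    This plane-equation form is an assumption for illustration.
    """
    (x0, y0, v0), (x1, y1, v1), (x2, y2, v2) = p0, p1, p2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    dvdx = ((v1 - v0) * (y2 - y0) - (v2 - v0) * (y1 - y0)) / det
    dvdy = ((v2 - v0) * (x1 - x0) - (v1 - v0) * (x2 - x0)) / det
    return dvdx, dvdy


# s varies as s = 2x + 3y + 1 over the triangle, so dsdx = 2 and dsdy = 3
dsdx, dsdy = gradients((0, 0, 1), (4, 0, 9), (0, 4, 13))
print(dsdx, dsdy)
```

The same routine would be applied to s, t, and q in turn to obtain the six variations named in the text.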
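[0046]
Triangle DDA circuit 11
The triangle DDA circuit 11 uses the variation data S10 received from the DDA setup circuit 10 to calculate the linearly interpolated (z, R, G, B, COE_blend, s, t, q, COE_fog) data at each pixel inside the triangle.
The triangle DDA circuit 11 also generates, for the eight pixels processed in parallel, 1-bit valid bit data I1 to I8, each indicating whether the corresponding pixel is located inside the triangle being processed.
The valid bit data I1 to I8 are set, for example, to "1" for a pixel located inside the triangle and to "0" for a pixel located outside the triangle.
Specifically, as shown in Fig. 2, the valid bit data I1 to I8 are determined for a triangle 250 located in the x, y coordinate system.
In Fig. 2, the solid lines indicate the rectangular regions to which the 8 (= 2 x 4) pixels processed simultaneously belong.
The triangle DDA circuit 11 outputs to the texture engine circuit 12, as DDA data S11, the (x, y) data of each pixel, the (z, R, G, B, COE_blend, s, t, q, COE_fog) data at those (x, y) coordinates, the valid bit data I1 to I8, and maxe data S11c indicating the maximum exponent of the s, t, q data of the vertices of the triangle being processed.
Here, the (R, G, B) data S11b and the (s, t, q) data S11a_1 to S11a_8 shown in Fig. 4, described later, are obtained from the (z, R, G, B, COE_blend, s, t, q, COE_fog) data.
In the present embodiment, the triangle DDA circuit 11 outputs the DDA data S11 to the texture engine circuit 12 in units of the 8 (= 2 x 4) pixels located within the rectangle that is processed in parallel.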
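As an illustration of how the valid bit data I1 to I8 could be produced for one 2 x 4 block, the sketch below uses a standard edge-function inside test. The inside test itself, the treatment of pixels on an edge as inside, and the row-major ordering of I1 to I8 within the block are all assumptions of this sketch; the text specifies only that each bit is "1" inside the triangle and "0" outside.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def inside(tri, px, py):
    """True if (px, py) is inside or on the boundary of the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, px, py)
    e1 = edge(x1, y1, x2, y2, px, py)
    e2 = edge(x2, y2, x0, y0, px, py)
    # Accept either winding order of the vertices.
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def valid_bits(tri, block_x, block_y):
    """Return [I1, ..., I8] for the 2-row x 4-column block at (block_x, block_y).

    Assumed ordering: row block_y first (4 pixels), then row block_y + 1.
    """
    return [1 if inside(tri, block_x + col, block_y + row) else 0
            for row in range(2) for col in range(4)]


tri = [(0, 0), (4, 0), (0, 4)]
print(valid_bits(tri, 2, 1))
```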
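[0047]
Texture engine circuit 12
The texture engine circuit 12 performs, in order and for example in a pipelined manner, the selection of the reduction ratio of the texture data, the calculation of "s/q" and "t/q", the calculation of the texture coordinate data (u, v), the calculation of the texture address (U, V), the reading of (R, G, B, tα) data from the texture buffer 20, the MIPMAP process, and the texture function process.
The texture engine circuit 12 processes the eight pixels located within a predetermined rectangular region simultaneously and in parallel.
The texture engine circuit 12 uses texture data of the same pattern for the pixels located within the triangle being processed. However, the reduction ratio of the texture data to be selected is determined in units of the eight pixels located within the rectangular region that is processed simultaneously.
[0048]
The texture engine circuit 12 performs the MIPMAP (multi-resolution texture) process and the texture function process using the (R, G, B) data S17 read from the SRAM 17 or the texture buffer 20.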
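[0049]
The MIPMAP process consists of a four-neighbor interpolation process, which calculates the (R, G, B) data of a pixel at a desired two-dimensional position from the (R, G, B) data S17, and a level interpolation process, which interpolates between levels of the reduction ratio LOD (Level Of Detail).
The SRAM 17 and the texture buffer 20 store texture data corresponding to a plurality of reduction ratios based on MIPMAP, for example, as shown in Fig. 3, texture data 100 at the level of reduction ratio LOD 1.0, texture data 101 at the level of reduction ratio LOD 2.0, and texture data 102 at the level of reduction ratio LOD 3.0.
Which reduction ratio LOD of texture data to use is determined from the reduction ratio LOD calculated per polygon using a predetermined algorithm.
The texture data 100, 101, 102 have already been filtered and represent display patterns in which the aliasing caused by the loss of information accompanying image reduction has been suppressed.
[0050]
First, the four-neighbor interpolation process of the MIPMAP process performed by the texture engine circuit 12 is described.
In the four-neighbor interpolation process, the (R, G, B) data of the four points neighboring the coordinates of the pixel to which texture data is assigned are obtained from those coordinates.
For example, when the reduction ratio LOD is 1.0, the (R, G, B) data S17 of the texture data 100 shown in Fig. 3 are read from the SRAM 17 or the texture buffer 20 into the texture engine circuit 12.
Then, the four-neighbor interpolation data C_pixel0, which is the (R, G, B) data at position pixel0 shown in Fig. 3, is obtained on the basis of equations (3) to (5) below using the (R, G, B) data C_A0, C_B0, C_C0, C_D0 of the four neighboring points A0, B0, C0, D0 of position pixel0.
The (R, G, B) data C_A0, C_B0, C_C0, C_D0 are obtained from the (R, G, B) data S17 of the texture data 100.
In equations (3) to (5) below, a and b denote the fractional parts of the u coordinate and the v coordinate of position pixel0, respectively.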
[0051]
[Equation 3]
C_AB0 = C_B0 × a + C_A0 × (1 - a) … (3)
[0052]
[Equation 4]
C_CD0 = C_D0 × a + C_C0 × (1 - a) … (4)
[0053]
[Equation 5]
C_pixel0 = C_CD0 × b + C_AB0 × (1 - b) … (5)
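Equations (3) to (5) amount to a bilinear blend of the four neighboring texels. A minimal sketch, assuming one color channel held as a floating-point number and a hypothetical function name `four_neighbor_interp`, is:

```python
def four_neighbor_interp(c_a, c_b, c_c, c_d, a, b):
    """Four-neighbor interpolation following equations (3) to (5).

    c_a..c_d: one color channel of the neighbors A, B, C, D.
    a, b: fractional parts of the u and v coordinates (0 <= a, b < 1).
    """
    c_ab = c_b * a + c_a * (1 - a)      # equation (3)
    c_cd = c_d * a + c_c * (1 - a)      # equation (4)
    return c_cd * b + c_ab * (1 - b)    # equation (5)


def four_neighbor_interp_rgb(rgb_a, rgb_b, rgb_c, rgb_d, a, b):
    """Apply the same interpolation to each of the R, G, B channels."""
    return tuple(four_neighbor_interp(ca, cb, cc, cd, a, b)
                 for ca, cb, cc, cd in zip(rgb_a, rgb_b, rgb_c, rgb_d))


print(four_neighbor_interp(0, 100, 200, 300, 0.5, 0.5))
```

With a = b = 0 the result is exactly C_A0, which is a quick way to check which neighbor corresponds to which weight.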
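[0054]
Next, the level interpolation process for the reduction ratio is described.
Here, the level interpolation process called tri-linear is used as an example.
For example, when the reduction ratio LOD is 1.5, the texture engine circuit 12 calculates the four-neighbor interpolation data C_pixel0 at position pixel0 using the texture data 100 with reduction ratio LOD 1.0, as described above, and also calculates, using the texture data 101 with reduction ratio LOD 2.0, the four-neighbor interpolation data C_pixel1 at position pixel1 on the texture data 101, which corresponds to position pixel0 on the texture data 100. It then linearly interpolates between C_pixel0 and C_pixel1 to calculate the level interpolation data C_pixel for reduction ratio LOD 1.5.
[0055]
That is, following the calculation of the four-neighbor interpolation data C_pixel0 described above, the (R, G, B) data S17 of the texture data 101 shown in Fig. 3 are read from the SRAM 17 or the texture buffer 20 into the texture engine circuit 12.
The texture engine circuit 12 then obtains the four-neighbor interpolation data C_pixel1, which is the (R, G, B) data at position pixel1 in Fig. 3, on the basis of equations (6) to (8) below using the (R, G, B) data C_A1, C_B1, C_C1, C_D1 of the four neighboring points A1, B1, C1, D1 of position pixel1.
The (R, G, B) data C_A1, C_B1, C_C1, C_D1 are obtained from the (R, G, B) data S17 of the texture data 101.
In equations (6) to (8) below, c and d denote the fractional parts of the u and v coordinates of position pixel1, respectively.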
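[0056]
[Equation 6]
C_AB1 = C_B1 × c + C_A1 × (1 - c) … (6)
[0057]
[Equation 7]
C_CD1 = C_D1 × c + C_C1 × (1 - c) … (7)
[0058]
[Equation 8]
C_pixel1 = C_CD1 × d + C_AB1 × (1 - d) … (8)
[0059]
Next, the texture engine circuit 12 performs level interpolation between the texture data 100 and 101 using equation (9) below and obtains the level interpolation data C_pixel, which is the (R, G, B) data of the corresponding position (pixel) after level interpolation. In equation (9), the mipmap coefficient COE_mipmap indicates the fractional part of the reduction ratio LOD, which is 0.5 in this example.
[0060]
[Equation 9]
C_pixel = C_pixel1 × COE_mipmap + C_pixel0 × (1 - COE_mipmap) … (9)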
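Equation (9) may be sketched as follows, taking the mipmap coefficient as the fractional part of the reduction ratio LOD. The floating-point LOD and the function name `level_interp` are assumptions for illustration only.

```python
import math


def level_interp(c_pixel0, c_pixel1, lod):
    """Level (tri-linear) interpolation following equation (9).

    c_pixel0: four-neighbor result from the level floor(lod).
    c_pixel1: four-neighbor result from the next level.
    """
    coe_mipmap = lod - math.floor(lod)   # fractional part of LOD
    return c_pixel1 * coe_mipmap + c_pixel0 * (1 - coe_mipmap)


# LOD 1.5 lies halfway between the levels 1.0 and 2.0
print(level_interp(100, 200, 1.5))
```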
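[0061]
Next, the texture function process performed by the texture engine circuit 12 is described.
The texture function process performed by the texture engine circuit 12 includes, for example, modulate processing, decal processing, highlight processing, fogging processing, and alpha blending processing.
Modulate processing modulates the color indicated by the fragment data with the color indicated by the texture data.
In the present embodiment, the fragment data is the (R, G, B) data S11b included in the DDA data S11 received from the triangle DDA circuit 11.
Decal processing replaces the color indicated by the fragment data with the color indicated by the texture data.
Highlight processing adds addition data Hi to the multiplication result in order to produce a highlight effect.
Fogging processing produces the effect of blurring distant objects.
Alpha blending processing mixes the color indicated by source data and the color indicated by destination data at a predetermined mixing ratio.
Here, the color indicated by the source data is the color indicated by the data stored in the display buffer 21 shown in Fig. 1, and the color indicated by the destination data is the color indicated by the data about to be drawn into the display buffer 21.
[0062]
Letting C_tex be the texture data, C_flag the fragment data, Hi the addition data for highlight processing, C_mod the data after modulate processing, C_dcl the data after decal processing, and C_hgh the data after highlight processing, these texture function processes can be expressed by equations (10) to (12) below.
In equation (12), Hi denotes the addition data for highlighting.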
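[0063]
[Equation 10]
C_mod = C_tex × C_flag … (10)
[0064]
[Equation 11]
C_dcl = C_tex … (11)
[0065]
[Equation 12]
C_hgh = C_tex × C_flag + Hi … (12)
[0066]
Fogging processing and alpha blending processing are expressed by equations (13) and (14) below, where C_flag is the fragment data, C_fog the fog data, COE_fog the fog coefficient data, C_src the source (color) data, C_dst the destination (color) data, COE_blend the blending coefficient, C_fogged the data after fogging processing, and C_blend the data after blending processing.
[0067]
[Equation 13]
C_fogged = C_flag × COE_fog + C_fog × (1 - COE_fog) … (13)
[0068]
[Equation 14]
C_blend = C_src × COE_blend + C_dst × (1 - COE_blend) … (14)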
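[0069]
As described above, the level interpolation process of the MIPMAP process and the texture function processes given by equations (9) to (14) can all be expressed by equation (15) below using data A, B, COE, C, and D. For example, highlight processing in equation (12) corresponds to A = C_tex, COE = C_flag, B = 0, and C = Hi.
The present embodiment exploits this fact so that, as described later, the LIP circuit 61 is shared between the level interpolation process and the texture function process.
[0070]
[Equation 15]
D = A × COE + B × (1 - COE) + C … (15)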
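To make the sharing concrete, the sketch below expresses each of equations (9) to (14) through a single function of the operands A, B, COE, and C, with C as the additive term that the text lists among the operands. Values are assumed to be normalized floating-point numbers in [0, 1], and all function names are hypothetical.

```python
def lip(a, b, coe, c=0.0):
    """Single shared operation: D = A*COE + B*(1 - COE) + C."""
    return a * coe + b * (1.0 - coe) + c


def level_interpolate(c_pixel0, c_pixel1, coe_mipmap):   # equation (9)
    return lip(c_pixel1, c_pixel0, coe_mipmap)


def modulate(c_tex, c_flag):                              # equation (10)
    return lip(c_tex, 0.0, c_flag)


def decal(c_tex):                                         # equation (11)
    return lip(c_tex, 0.0, 1.0)


def highlight(c_tex, c_flag, hi):                         # equation (12)
    return lip(c_tex, 0.0, c_flag, hi)


def fog(c_flag, c_fog, coe_fog):                          # equation (13)
    return lip(c_flag, c_fog, coe_fog)


def alpha_blend(c_src, c_dst, coe_blend):                 # equation (14)
    return lip(c_src, c_dst, coe_blend)


print(modulate(0.5, 0.5), highlight(0.5, 0.5, 0.25), fog(1.0, 0.0, 0.25))
```

Because every process reduces to one call of `lip` with different operands, a single arithmetic unit with input multiplexers is sufficient, which is the design choice the description goes on to develop.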
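[0071]
Fig. 4 is a partial circuit diagram of the texture engine circuit 12.
As shown in Fig. 4, the texture engine circuit 12 has, for example, a reduction ratio calculation circuit 50, a read circuit 51, LIP (Linear Interpolator) circuits 52, 53, 54, and an LIP/texture function circuit 55.
Here, the reduction ratio calculation circuit 50 corresponds to the reduction ratio calculation circuit of the present invention, the read circuit 51 corresponds to the read circuit of the present invention, and the LIP circuits 52, 53, 54 and the LIP/texture function circuit 55 correspond to the image processing circuit of the present invention.
Each component in the texture engine circuit 12 operates on the basis of the clock signal S18 from the clock signal generation circuit 18 shown in Fig. 1.
Using the configuration shown in Fig. 4, the texture engine circuit 12 performs some or all of the MIPMAP process, modulate processing, decal processing, highlight processing, fogging processing, texture blending processing, alpha blending processing, and the like.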
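[0072]
The components of the texture engine circuit 12 shown in Fig. 4 are described in detail below.
[Representative point determination circuit 301]
The representative point determination circuit 301 determines the pixel serving as the representative point from the valid bit data I1 to I8 included in the DDA data S11 received from the triangle DDA circuit 11, and outputs representative point indication data S301 indicating the determined representative point to the q data selection circuit 302.
Specifically, among the eight pixels (2 rows x 4 columns) processed simultaneously, the representative point determination circuit 301 determines as the representative point the pixel that is located inside the triangle being processed and is closest to the center of the rectangular region in which the eight pixels are arranged.
[0073]
Fig. 5 is a flowchart of the representative point determination process in the representative point determination circuit 301.
Step S11: First, the representative point determination circuit 301 judges whether at least one of the valid bit data I1 to I8 indicates "1", and if so, executes the processing of step S12.
Step S12: The representative point determination circuit 301 judges whether exactly one of the valid bit data I1 to I8 indicates "1", and if so, executes the processing of step S15. In step S15, the pixel corresponding to the valid bit data indicating "1" is determined as the representative point.
[0074]
Step S13: When two or more of the valid bit data I1 to I8 indicate "1", the representative point determination circuit 301 determines as the representative point the pixel that, among the pixels corresponding to the valid bit data indicating "1", is closest to the center of the rectangular region in which the simultaneously processed pixels are arranged.
If there are a plurality of pixels closest to the center of the rectangular region, it is judged whether their x coordinates are the same, and if their x coordinates differ, the processing of step S16 is executed. In step S16, of the plurality of pixels closest to the center of the rectangular region, the one with the smallest x coordinate is determined as the representative point.
[0075]
Step S14: When there are a plurality of pixels closest to the center of the rectangular region and their x coordinates are the same, the representative point determination circuit 301 determines as the representative point the one with the smallest y coordinate among them.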
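[0076]
The determination of the representative point in the representative point determination circuit 301 is described below using specific examples.
Fig. 6 is a diagram for explaining the determination of the representative point in the representative point determination circuit 301.
The arrangement of the pixels corresponding to the valid bit data I1 to I8 is set as shown in Fig. 6(A). Here, the center of the rectangular region of the simultaneously processed pixels is A.
For example, as shown in Fig. 6(B), when only the valid bit data I4 is "1", the representative point determination circuit 301 determines the pixel corresponding to the valid bit data I4 as the representative point.
[0077]
As shown in Fig. 6(C), when the valid bit data I6 and I7 are "1" and the x coordinates of the corresponding pixels differ, the pixel corresponding to the valid bit data I6, which has the smaller x coordinate, is determined as the representative point.
As shown in Fig. 6(D), when the valid bit data I3 and I7 are "1" and the x coordinates of the corresponding pixels are the same, the pixel corresponding to the valid bit data I7, which has the smaller y coordinate, is determined as the representative point.
Further, as shown in Fig. 6(E), when the valid bit data I2, I3, I6, and I7 are "1", the pixel corresponding to the valid bit data I6, which has the smallest x and y coordinates, is determined as the representative point.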
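The selection rule of steps S11 to S16 can be sketched as a minimum over the valid pixels, ordered by distance to the block center, then x, then y. The pixel layout used here (I1 to I4 in the row with the larger y, I5 to I8 in the row with the smaller y, x increasing from left to right) is inferred from the examples of Fig. 6(C) to 6(E) and is an assumption of this sketch, as are the names `POSITIONS` and `representative_point`.

```python
# Assumed (x, y) position of the pixels for I1..I8, inferred from Fig. 6 examples.
POSITIONS = {1: (0, 1), 2: (1, 1), 3: (2, 1), 4: (3, 1),
             5: (0, 0), 6: (1, 0), 7: (2, 0), 8: (3, 0)}
CENTER = (1.5, 0.5)   # center A of the 2 x 4 rectangular region


def representative_point(valid):
    """Return the index (1..8) of the representative pixel, or None.

    valid: list [I1, ..., I8] of 0/1 valid bits.
    """
    candidates = [i for i in range(1, 9) if valid[i - 1] == 1]
    if not candidates:
        return None   # no pixel of the block lies inside the triangle

    def key(i):
        x, y = POSITIONS[i]
        dist2 = (x - CENTER[0]) ** 2 + (y - CENTER[1]) ** 2
        return (dist2, x, y)   # nearest to center, then smallest x, then smallest y

    return min(candidates, key=key)


print(representative_point([0, 1, 1, 0, 0, 1, 1, 0]))   # case of Fig. 6(E)
```

Under the assumed layout, the sketch reproduces the four cases described for Fig. 6(B) to 6(E).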
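[0078]
For the triangle 250 shown in Fig. 2, the representative points are determined for each unit of eight pixels on the basis of the algorithm shown in Fig. 5, as shown in Fig. 7. In Fig. 7, the pixels whose "1" is circled are the representative points.
In this way, because the representative point determination circuit 301 dynamically determines the representative point, on the basis of the valid bit data I1 to I8, from among those of the simultaneously processed pixels that are located inside the triangle being processed, the representative point is reliably placed inside the triangle.
As a result, appropriate texture data can be reliably selected for the pixels located inside the triangle, and high image quality can be provided stably.
In addition, although eight pixels are processed simultaneously in the present embodiment, only one reduction ratio calculation circuit 304 needs to be provided, so the device does not become large in scale.
[0079]
[q data selection circuit 302]
The q data selection circuit 302 receives the (s, t, q) data S11a_1 to S11a_8 for eight pixels included in the DDA data S11, selects from them the q data corresponding to the pixel indicated by the representative point indication data S301, and outputs it as q data S302 to the reduction ratio calculation circuit 304.
[0080]
[Reduction ratio calculation circuit 304]
The reduction ratio calculation circuit 304 calculates the reduction ratio LOD of the texture data on the basis of the maxe data S11c from the triangle DDA circuit 11 and the q data S302 from the q data selection circuit 302.
Here, the maxe data S11c indicates the largest of the exponents of the s, t, q data of the vertices of the triangle being processed, such as the triangle shown in Fig. 2.
In this way, the reduction ratio calculation circuit 304 calculates the reduction ratio using the q data of the pixel determined as the representative point by the representative point determination circuit 301 and the maxe data S11c, and outputs it as the reduction ratio LOD to the read circuit 51.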
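[0081]
Here, the reduction ratio LOD indicates how far the texture data of the original image has been reduced; taking the reduction ratio of the original image as 1/1, it takes the values 1/2, 1/4, 1/8, and so on.
The calculation of the reduction ratio LOD in the reduction ratio calculation circuit 304 is given by equation (16) below.
[0082]
[Equation 16]
LOD = Clamp((((log2(1/q)) - maxe) << L) + K) … (16)
[0083]
In equation (16):
LOD: the reduction ratio; unsigned data with a 3-bit integer part and a 4-bit fractional part.
maxe: the maximum exponent of s, t, q of the vertices of the triangle shown in Fig. 2; unsigned data with an 8-bit integer part.
q: signed data with a 10-bit integer part and a 5-bit fractional part.
L: 2-bit unsigned data; the maximum value of L is decimal "3".
K: signed data with an 8-bit integer part and a 4-bit fractional part.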
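As a floating-point reference for equation (16), the following sketch treats the shift "<< L" as multiplication by 2^L and clamps the result to the range representable by an unsigned value with a 3-bit integer part and a 4-bit fractional part. Saturating at the ends of that range is an assumption of this sketch; the function name `lod_reference` is hypothetical. The sample operands are the ones the description uses in its worked example.

```python
import math

LOD_MAX = 7 + 15 / 16   # largest unsigned value with 3 integer and 4 fractional bits


def lod_reference(q, maxe, shift_l, k):
    """Floating-point model of LOD = Clamp(((log2(1/q) - maxe) << L) + K)."""
    if q <= 0:
        raise ValueError("q must be positive for log2(1/q)")
    raw = (math.log2(1.0 / q) - maxe) * (1 << shift_l) + k
    return min(max(raw, 0.0), LOD_MAX)   # assumed saturating clamp


# q = 78.625, maxe = 1, L = 1, K = 19
print(lod_reference(78.625, 1, 1, 19.0))
```

A fixed-point circuit that approximates log2 with a small table would be expected to land within a few hundredths of this reference value.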
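[0084]
The reduction ratio calculation circuit 50 is described in detail below.
Fig. 8 is a configuration diagram of the reduction ratio calculation circuit 50.
As shown in Fig. 8, the reduction ratio calculation circuit 50 has, for example, a priority encoder 101, a shift circuit 102, shift circuits 201 and 202, a table 203, inverters 204 and 205, an adder circuit 206, and a clamp circuit 109.
Here, the priority encoder 101 and the shift circuit 102 correspond to the normalization circuit of the present invention, the shift circuit 201 to the first shift circuit, the inverter 204 to the first inverting circuit, the table 203 to the data output means, the shift circuit 202 to the second shift circuit, the inverter 205 to the second inverting circuit, the adder circuit 206 to the adder circuit, and the clamp circuit 109 to the clamp circuit of the present invention.
The reduction ratio calculation circuit 50 performs the calculation of equation (16) and outputs the result, the reduction ratio LOD, to the read circuit 51 shown in Fig. 4.
[0085]
The priority encoder 101 obtains the base-2 logarithm "log2 q" of the q data received from the q data selection circuit 302 shown in Fig. 4, and outputs the integer value "int(log2 q)" of that logarithm, that is, the exponent qe, as data a to the shift circuits 102 and 201.
[0086]
The shift circuit 102 shifts the data q received from the q data selection circuit 302 shown in Fig. 4 toward the LSB by the exponent qe received from the priority encoder 101, and outputs the fractional part of the result, data qm, as data b2 to the shift circuit 201 and the table 203.
[0087]
The data a output by the priority encoder 101 and the data b2 output by the shift circuit 102 are bit-concatenated, and the resulting data {a, b2} is output to the shift circuit 201.
[0088]
The shift circuit 201 shifts the received data {a, b2} toward the MSB by the received data L and outputs the result, data δ2, to the inverter 204.
[0089]
The inverter 204 inverts the data δ2 and outputs the result, data ￣δ2, to the adder circuit 206.
[0090]
The table 203 holds a correspondence table between data qm and "log2({1, qm}) - qm". Using the data qm (= b2) received from the shift circuit 102 as a key, it obtains the corresponding "log2({1, qm}) - qm" from the table and outputs it as data μ to the shift circuit 202.
Instead of the table 203, a program that generates "log2({1, qm}) - qm" automatically from the received data qm may be used.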
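[0091]
The maxe data S11c received from the triangle DDA circuit 11 and the data μ output from the table 203 are combined with the maxe data S11c as the integer part and, as the fractional part, the data μ preceded by (000) so that the fractional part is 7 bits. The bit-concatenated data {maxe, 3'b0, μ} is output to the shift circuit 202. Here, "3'b0" is Verilog-HDL notation for a 3-bit binary 0.
[0092]
The shift circuit 202 shifts the received data {maxe, 3'b0, μ} toward the MSB by the received data L and outputs the result, γ2, to the inverter 205.
[0093]
The inverter 205 inverts the data γ2 and outputs the result, data ￣γ2, to the adder circuit 206.
[0094]
The data K and "10" are combined with a 1-bit "0" added in front of "10" so that the fractional part is 7 bits, and the bit-concatenated data {K, 1'b0, 2'b10} is output to the adder circuit 206.
[0095]
The adder circuit 206 adds the data {K, 1'b0, 2'b10}, the data ￣δ2, and the data ￣γ2, and outputs the sum ε2 to the clamp circuit 109.
[0096]
The clamp circuit 109 clamps (rounds) the data ε2 received from the adder circuit 206 to data with a 3-bit integer part and a 4-bit fractional part, and outputs the result as the reduction ratio LOD to the read circuit 51 shown in Fig. 4.
[0097]
In the reduction ratio calculation circuit 50 shown in Fig. 8, the table 203 outputs μ (= "log2({1, qm}) - qm") corresponding to the received data qm, and the maxe data S11c, which consists only of an integer part, is bit-concatenated with the data μ, which consists only of a fractional part. This eliminates an addition for the maxe data S11c. The reduction ratio calculation circuit 50 can therefore reduce the gate count and speed up the calculation.
In addition, in the reduction ratio calculation circuit 50, the fractional part b of "log2(1/q)" in equation (16) approximates the upper 4 bits (= 2^-4) of the mantissa qm, and the lower 4 bits of the 7-bit fractional part of "log2(1/q)", which determine the precision of the lower 4 bits of the 7-bit fractional part of the reduction ratio LOD, are obtained using the table 203. As a result, the error of "log2(1/q)" in equation (16), which is the source of error in the reduction ratio LOD, can be held to 2^-7, and even when the data L takes its maximum value "3", the error in the reduction ratio LOD can be held to about 2^-4.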
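[0098]
Fig. 9 is a diagram for explaining the processing in the reduction ratio calculation circuit 50 shown in Fig. 8.
The operation of the reduction ratio calculation circuit 50 shown in Fig. 8 is described below with a specific example, with reference to Fig. 9.
Here, the calculation of equation (16) in the reduction ratio calculation circuit 50 is illustrated using the following values.
[0099]
q = (0001001110.10100)
maxe = (00000001.0000000)
L = (01)
K = (00010011.0000)
[0100]
The q data S302 output by the q data selection circuit 302, q (0001001110.10100), is input to the priority encoder 101 and the shift circuit 102 shown in Fig. 8.
Next, the priority encoder 101 obtains the exponent qe (000110) of q (0001001110.10100) and outputs a (00000110), which is 8-bit data formed by adding (00) on the MSB side of the exponent qe (000110).
[0101]
Next, in the shift circuit 102, q (0001001110.10100) is shifted toward the LSB by a (00000110), and the mantissa qm (0011101), which is the fractional part after the shift, is output as data b2 to the table 203.
Next, the table 203 obtains μ (1000), the 4-bit data μ ("log2({1, qm}) - qm") corresponding to the mantissa qm (0011101), and outputs μ (1000).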
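[0102]
Next, the data maxe (00000001) received from the triangle DDA circuit 11 and the data μ (1000) are combined, with (000) placed in front of the data μ so that the fractional part is 7 bits, and the bit-concatenated data {maxe, 3'b0, μ} = (00000001.0001000) is output to the shift circuit 202.
[0103]
Next, the shift circuit 202 shifts the received data {maxe, 3'b0, μ} = (00000001.0001000) toward the MSB by the received data L = (01), and outputs the result γ2 (00000010.0010000) to the inverter 205.
Next, the inverter 205 inverts the data γ2 (00000010.0010000) and outputs the result, data ￣γ2 (11111101.1101111), to the adder circuit 206.
[0104]
Meanwhile, the data a (00000110) output by the priority encoder 101 and the data b2 (0011101) output by the shift circuit 102 are bit-concatenated, and the resulting data {a, b2} = (00000110.0011101) is output to the shift circuit 201.
[0105]
Next, the shift circuit 201 shifts the data {a, b2} = (00000110.0011101) toward the MSB by the received data L (01), and outputs the result, data δ2 (00001100.0111010), to the inverter 204.
[0106]
Next, the inverter 204 inverts the data δ2 (00001100.0111010) and outputs the result, data ￣δ2 (11110011.1000101), to the adder circuit 206.
[0107]
The data K (00010011.0000) and "10" are combined with a 1-bit "0" added in front of "10" so that the fractional part is 7 bits, and the bit-concatenated data {K, 1'b0, 2'b10} = (00010011.0000010) is output to the adder circuit 206.
[0108]
Next, the adder circuit 206 adds the data {K, 1'b0, 2'b10}, the data ￣δ2, and the data ￣γ2, and outputs the sum ε2 (00000100.0110110) to the clamp circuit 109. The appended "10" supplies the two least-significant-bit increments needed for the two bit inversions to act as two's-complement negations.
[0109]
Next, the clamp circuit 109 clamps (rounds) the data ε2 received from the adder circuit 206 to data with a 3-bit integer part and a 4-bit fractional part, and outputs the result (100.0110) as the reduction ratio LOD to the read circuit 51 shown in Fig. 4.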
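The worked example above can be reproduced with integer arithmetic. The following sketch models the data path of Fig. 8 on 15-bit words (8 integer bits and 7 fractional bits). Several points are assumptions made so that the sketch matches the example: the table entry μ is taken as the truncated (floor) value, q is restricted to values of 1 or greater, negative sums clamp to 0, and large sums saturate. The names `mu_table` and `lod_fixed` are hypothetical.

```python
import math

FRAC = 7                      # fractional bits of the internal words
WIDTH = 15                    # 8 integer bits + 7 fractional bits
MASK = (1 << WIDTH) - 1


def mu_table(qm):
    """Table 203 model: log2({1, qm}) - qm with 7 fractional bits (assumed truncation)."""
    x = qm / 128.0
    return math.floor((math.log2(1.0 + x) - x) * 128.0)


def lod_fixed(q_fixed, maxe, shift_l, k_fixed):
    """q_fixed: q with 5 fractional bits; k_fixed: K with 4 fractional bits.

    Returns LOD as a 7-bit integer (3 integer bits, 4 fractional bits).
    """
    if q_fixed < (1 << 5):
        raise ValueError("this sketch only handles q >= 1")
    msb = q_fixed.bit_length() - 1
    qe = msb - 5                                   # priority encoder 101
    qm = ((q_fixed - (1 << msb)) << FRAC) >> msb   # shift circuit 102: 7-bit mantissa
    delta2 = (((qe << FRAC) | qm) << shift_l) & MASK                  # shift circuit 201
    gamma2 = (((maxe << FRAC) | mu_table(qm)) << shift_l) & MASK      # shift circuit 202
    k_ext = ((k_fixed << 3) | 0b010) & MASK                           # {K, 1'b0, 2'b10}
    eps2 = (k_ext + (~delta2 & MASK) + (~gamma2 & MASK)) & MASK       # adder circuit 206
    if eps2 >= 1 << (WIDTH - 1):   # negative in two's complement: assumed clamp to 0
        return 0
    return min(eps2 >> 3, 0x7F)    # clamp circuit 109: keep 4 fractional bits


lod = lod_fixed(0b000100111010100, 1, 1, 0b000100110000)
print(format(lod, "07b"))
```

The result corresponds to the bit pattern 100.0110 given in the text, that is, 4.375, against a floating-point value of about 4.41 for the same operands.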
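[0110]
[Read circuit 51]
The read circuit 51 uses the address (u, v), calculated from the (s, t, q) data included in the DDA data S11, the reduction ratio LOD, and the predetermined texture sizes USIZE and VSIZE, to read (R, G, B) data from the corresponding address in the SRAM 17 or the texture buffer 20, and outputs it as texture data to the LIP circuits 52 and 53.
When the fractional part of the reduction ratio LOD received from the reduction ratio calculation circuit 50 is not 0, the read circuit 51 reads, one after the other and each in one clock cycle of the clock signal S18, the two texture data whose reduction ratios correspond to the integers immediately below and above the reduction ratio LOD, and outputs them to the LIP circuits 52 and 53.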
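The choice between reading one level or two can be sketched from the 7-bit LOD word (3 integer bits, 4 fractional bits). The return format and the name `mip_levels` are assumptions for illustration; the text does not describe how the upper level is bounded, so no bound is applied here.

```python
def mip_levels(lod_fixed_value):
    """Return (levels to read, mipmap coefficient) for a 7-bit LOD word.

    lod_fixed_value: LOD with a 3-bit integer part and a 4-bit fractional part.
    """
    level = lod_fixed_value >> 4          # integer part
    frac = lod_fixed_value & 0xF          # fractional part in sixteenths
    if frac == 0:
        return (level,), 0.0              # a single read suffices
    return (level, level + 1), frac / 16  # two reads, one per clock cycle


print(mip_levels(0b0011000))   # LOD = 1.5
```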
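[0111]
[LIP circuits 52, 53]
For the pixel being processed, the LIP circuit 52 performs the four-neighbor interpolation calculation corresponding to equation (3) within one clock cycle to generate interpolation data S52, and outputs the interpolation data S52 to the LIP circuit 54.
The LIP circuit 52 then performs, for the pixel being processed, the four-neighbor interpolation calculation corresponding to equation (6) within one clock cycle to generate interpolation data S52, and outputs it to the LIP circuit 54.
[0112]
For the pixel being processed, the LIP circuit 53 performs the four-neighbor interpolation calculation corresponding to equation (4) within one clock cycle to generate interpolation data S53, and outputs the interpolation data S53 to the LIP circuit 54.
The LIP circuit 53 then performs, for the pixel being processed, the four-neighbor interpolation calculation corresponding to equation (7) within one clock cycle to generate interpolation data S53, and outputs it to the LIP circuit 54.
The calculation in the LIP circuit 53 is performed in parallel with the calculation in the LIP circuit 52.
[0113]
[LIP circuit 54]
Using the interpolation data S52 and S53 from the LIP circuits 52 and 53, the LIP circuit 54 performs the four-neighbor interpolation calculation corresponding to equation (5) within one clock cycle to generate the four-neighbor interpolation data C_pixel0, and outputs C_pixel0 to the LIP/texture function circuit 55.
When the fractional part of the reduction ratio LOD is not 0, the LIP circuit 54 uses the interpolation data S52 and S53 to generate, in order, the four-neighbor interpolation data C_pixel0 and C_pixel1 used for the level interpolation process.
For example, when the reduction ratio LOD is 1.5 as described above, the LIP circuit 54 generates the four-neighbor interpolation data C_pixel0 in one clock cycle on the basis of equation (5), and then generates the four-neighbor interpolation data C_pixel1 in one clock cycle on the basis of equation (8).
The configuration and processing of the LIP circuits 52, 53, 54 are basically the same as those of the LIP circuit 61 described later.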
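[0114]
[LIP/texture function circuit 55]
Fig. 10 is a configuration diagram of the LIP/texture function circuit 55.
Using the four-neighbor interpolation data C_pixel0 (and, where necessary, C_pixel1) from the LIP circuit 54, the LIP/texture function circuit 55 performs the level interpolation process of the MIPMAP process and some or all of the texture function processes such as modulate processing, decal processing, highlight processing, fogging processing, texture blending processing, and alpha blending processing.
Specifically, when the fractional part of the reduction ratio LOD is 0, the LIP/texture function circuit 55 performs the required texture function processes using the four-neighbor interpolation data C_pixel0 received from the LIP circuit 54.
When the fractional part of the reduction ratio is not 0, the LIP/texture function circuit 55 performs the level interpolation process using the four-neighbor interpolation data C_pixel0 and C_pixel1 received from the LIP circuit 54, and then performs the required texture function processes.
[0115]
As shown in Fig. 10, the LIP/texture function circuit 55 has a preprocessing circuit 60, an LIP circuit 61, and a register 62.
As shown in Fig. 10, the preprocessing circuit 60 has a mode controller 70, a register 74, multiplexers 75 to 78, and registers 85 to 88.
As shown in Fig. 10, the mode controller 70 has a decoder 71, a counter 72, and a decoder 73.
[0116]
The decoder 71 monitors the count value of the counter 72 and, at the timing when the count value becomes "0", sets an initial value "0", "1", or "2" according to the number of processes sharing the LIP circuit 61.
For example, the decoder 71 sets the initial value "0" in the counter 72 when the LIP circuit 61 performs only one process, the initial value "1" when two processes share the LIP circuit 61, and the initial value "2" when three processes share the LIP circuit 61.
Although the present embodiment illustrates the use of "0", "1", and "2" as the initial values set in the counter 72, the initial value can be set arbitrarily according to the number of processes sharing the LIP circuit 61.
The decoder 71 receives function mode data FMD from, for example, the main processor 4 shown in Fig. 1 or a main controller (not shown) in the texture engine circuit 12.
The function mode data FMD designates, in each clock cycle, one of the modes "1" to "8" shown for example in Fig. 11, and is used, as described later, to control the input of the data for each mode to the LIP circuit 61. That is, the content of the processing performed by the LIP circuit 61 is determined on the basis of the function mode data FMD. The contents of Fig. 11 are described in detail later.
The decoder 71 decrements the count value of the counter 72 by 1 each time the LIP circuit 61 finishes the processing of one mode, for example on the basis of the function mode data FMD.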
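[0117]
The decoder 73 receives the function mode data FMD and fog enable data FED from the main processor 4 shown in Fig. 1 or the main controller (not shown) in the texture engine circuit 12.
The decoder 73 also receives mipmap number data MND from the LIP circuit 54 or the read circuit 51.
[0118]
As described above, the function mode data FMD designates, in each clock cycle, one of the modes "1" to "8" shown for example in Fig. 11, and is used to control the input of the data for each mode to the LIP circuit 61. The example in Fig. 11 illustrates the case in which the LIP circuit 61 performs the level interpolation process of the MIPMAP process, modulate processing, highlight processing, decal processing, and fogging processing.
In this case, as shown in Fig. 11, modulate processing and highlight processing, for example, are assigned different modes depending on whether the process is performed alone or following the level interpolation process of the MIPMAP process. Fogging processing is likewise assigned different modes depending on whether it is performed alone or following modulate processing. This is because the decoder 73 must decide whether to feed back the processing result of the LIP circuit 61 shown in Fig. 10 and write it to the register 88.
The modes shown in Fig. 11 are examples, and various other modes can also be designated.
[0119]
The fog enable data FED indicates, for example, the logical value "1" when fogging processing is performed and the logical value "0" when it is not.
[0120]
The mipmap number data MND indicates the logical value "1" at the timing at which the four-neighbor interpolation data C_pixel0 is input when the LIP circuit 61 does not perform the level interpolation process (when the fractional part of the reduction ratio LOD is 0), and at the timing at which the four-neighbor interpolation data C_pixel1 is input when the level interpolation process is performed.
The mipmap number data MND indicates the logical value "0" at the timing at which the four-neighbor interpolation data C_pixel0 is input when the level interpolation process is performed.
As described later, the mipmap number data MND is used by the decoder 73 to control the multiplexers 77 and 78.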
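[0121]
On the basis of the function mode data FMD, the mipmap number data MND, and the fog enable data FED, the decoder 73 controls the multiplexers 75 to 78 so that the data the LIP circuit 61 needs to perform the process designated by the function mode data FMD are supplied to the LIP circuit 61.
[0122]
Specifically, while the mipmap number data MND indicates the logical value "0", the decoder 73 controls the multiplexer 77 so that the four-neighbor interpolation data C_pixel0 received from the LIP circuit 54 is not output to the register 87. At this time, the four-neighbor interpolation data C_pixel0 is written to the register 74.
When the function mode data FMD indicates "1" of Fig. 11 and the LIP circuit 61 performs the level interpolation process of the MIPMAP process, the decoder 73 controls the multiplexers 78 and 77 so that, while the mipmap number data MND indicates the logical value "1", the four-neighbor interpolation data C_pixel0 read from the register 74 is output to the register 88 and the four-neighbor interpolation data C_pixel1 received from the LIP circuit 54 is output to the register 87.
Also, while the function mode data FMD indicates "1" of Fig. 11 and the mipmap number data MND indicates the logical value "1", the decoder 73 controls the multiplexer 76 so that the mipmap coefficient COE_mipmap received from the reduction ratio calculation circuit 50 shown in Fig. 4 is output to the register 86. At the same time, the decoder 73 controls the multiplexer 75 so that the logical value "0" is output to the register 85.
As a result, the four-neighbor interpolation data C_pixel0 and C_pixel1 and the mipmap coefficient COE_mipmap are written simultaneously to the registers 88, 87, and 86, respectively, and the LIP circuit 61 performs the level interpolation process using C_pixel0 and C_pixel1.
[0123]
Fig. 12 is a timing chart for explaining the timing at which the four-neighbor interpolation data C_pixel0 and C_pixel1 are input from the LIP circuit 54 to the mode controller 70 shown in Fig. 10, and the timing at which the level interpolation process is executed in the LIP/texture function circuit 55.
In Fig. 12, data labeled with the same (a), (b), or (c) belong to the same level interpolation process.
[0124]
For example, on the basis of the clock signal S18 shown in Fig. 12(A), the four-neighbor interpolation data C_pixel0 subject to level interpolation, input from the LIP circuit 54 to the LIP/texture function circuit 55 at the timing shown in Fig. 12(B), is stored in the register 74.
In the next clock cycle, the four-neighbor interpolation data C_pixel0 read from the register 74 is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88, and the four-neighbor interpolation data C_pixel1 received from the LIP circuit 54 is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
In the following clock cycle, as shown in Fig. 12(C), the LIP circuit 61 performs the level interpolation process using the mipmap data C_pixel0 and C_pixel1.
As can be seen from Fig. 12(C), the throughput of the four-neighbor interpolation process of the MIPMAP process performed by the LIP circuits 52, 53, 54 is two clock cycles, whereas the LIP circuit 61 performs the level interpolation process of the MIPMAP process in one clock cycle. Therefore, if the LIP circuit 61 performs only the level interpolation process, it has idle time in which it performs no processing. In this embodiment, as described later, this idle time is used to have the LIP circuit 61 perform the texture function process. That is, the four-neighbor interpolation process of the MIPMAP process and the texture function process are interleaved.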
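[0125]
When the function mode data FMD indicates "2" of Fig. 11 and the LIP circuit 61 performs only modulate processing, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the four-neighbor interpolation data C_pixel0 received from the LIP circuit 54 is output to the register 88 via the register 74, and the (R, G, B) data S11b (fragment data C_flag) included in the DDA data S11 received from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that the logical value "0" is output to the register 87 and the logical value "0" is output to the register 85.
[0126]
When the function mode data FMD indicates "3" of Fig. 11 and the LIP circuit 61 performs modulate processing following the level interpolation process of the MIPMAP process, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the level interpolation data fed back from the OUT terminal of the LIP circuit 61 is output to the register 88 and the fragment data C_flag is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that the logical value "0" is output to the register 87 and the logical value "0" is output to the register 85.
[0127]
When the function mode data FMD indicates "4" of Fig. 11 and the LIP circuit 61 performs only highlight processing, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the four-neighbor interpolation data C_pixel0 received from the LIP circuit 54 is output to the register 88 via the register 74, and the (R, G, B) data S11b (fragment data C_flag) included in the DDA data S11 received from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that the logical value "0" is output to the register 87 and the addition data Hi for the highlight calculation, received from the main processor 4 or the main controller (not shown) in the texture engine circuit 12, is output to the register 85.
[0128]
When the function mode data FMD indicates "5" of Fig. 11 and the LIP circuit 61 performs highlight processing following the level interpolation process of the MIPMAP process, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the level interpolation data fed back from the OUT terminal of the LIP circuit 61 is output to the register 88 and the fragment data C_flag is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that the logical value "0" is output to the register 87 and the addition data Hi for the highlight calculation, received from the main processor 4 or the main controller (not shown) in the texture engine circuit 12, is output to the register 85.
[0129]
When the function mode data FMD indicates "6" of Fig. 11 and the LIP circuit 61 performs only decal processing, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the four-neighbor interpolation data C_pixel0 received from the LIP circuit 54 is output to the register 88 via the register 74, and the value 0xFF is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that the logical value "0" is output to the register 87 and the logical value "0" is output to the register 85.
[0130]
When the function mode data FMD indicates "7" of Fig. 11 and the LIP circuit 61 performs only fogging processing, the decoder 73 controls the multiplexers 78 and 77 so that, during the corresponding one clock cycle, the (R, G, B) data S11b (fragment data C_flag) included in the DDA data S11 received from the triangle DDA circuit 11 is output to the register 88 via the register 74, and the fog data C_fog set, for example, in a fog register (not shown) is output to the register 87.
At the same time, the decoder 73 controls the multiplexer 76 so that the fogging coefficient COE_fog included in the DDA data S11 received from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexer 75 so that the logical value "0" is output to the register 85.
[0131]
When the function mode data FMD indicates "8" of Fig. 11 and the LIP circuit 61 performs fogging processing following modulate processing, the decoder 73 controls the multiplexers 78 and 77 so that, during the corresponding one clock cycle, the modulated data fed back from the OUT terminal of the LIP circuit 61 is output to the register 88 and the fog data C_fog read from the fog register (not shown) is output to the register 87.
At the same time, the decoder 73 controls the multiplexer 76 so that the fogging coefficient COE_fog included in the DDA data S11 received from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexer 75 so that the logical value "0" is output to the register 85.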
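[0132]
When the LIP circuit 61 is shared by the level interpolation process of the MIPMAP process and two or more texture function processes, that is, by three or more processes in total, the decoder 73 outputs a wait instruction to, for example, the read circuit 51 shown in Fig. 4 and the triangle DDA circuit 11 shown in Fig. 1 for a number of clock cycles corresponding to the number of shared processes.
For example, when the LIP circuit 61 is shared by the level interpolation process and two texture function processes, the wait instruction is output to the read circuit 51 and the triangle DDA circuit 11 during the one clock cycle in which the LIP circuit 61 is processing the second texture function process.
[0133]
When performing the calculation of equation (15), the LIP circuit 61 receives the 8-bit data A, B, COE, and C from the IN_A terminal, the IN_B terminal, the IN_coeff terminal, and the IN_C terminal, respectively, and outputs the 8-bit data D from the OUT terminal.
[0134]
As shown in Fig. 13, the LIP circuit 61 performs the calculation of equation (15) by shifting and adding correction data F, partial products out_0 to out_7, each of which is data A or data B selected according to the logical value of the corresponding bit of data COE, and data C, which is the multiply-accumulate term.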
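[0135]
The correction data F takes the value of data A when data COE = 0xFF (COE = 1.0), and the value of data B otherwise.
In a system in which an 8-bit value with all bits at logical "1" is regarded as "1", the correction data F is used, for example, to correct the calculation shown in equation (17) below so that it becomes equation (18) below. That is, the correction ensures that "X × 1.0 = X".
[0136]
[Equation 17]
0xFF × 0xFF = 0xFE … (17)
[0137]
[Equation 18]
0xFF × 0xFF = 0xFF … (18)
[0138]
Each of the partial products out_0 to out_7 is data A if the corresponding bit 0 to 7 of data COE is logical "1", and data B if it is logical "0".
Here, the LSB of data COE is bit 0 and the MSB is bit 7.
The partial product out_n (0 ≤ n ≤ 7) is generated, for example, using eight multiplexers 80_0 to 80_7 as shown in Fig. 14.
Specifically, for 0 ≤ m ≤ 7, the multiplexer 80_m receives bit data A[m] of bit m of data A, bit data B[m] of bit m of data B, and bit data COE[n] of bit n of data COE. If the bit data COE[n] is logical "1", it selects the bit data A[m], and otherwise the bit data B[m], and outputs the selection as bit data out_n[m].
The bit data out_n[0] to out_n[7] make up the partial product out_n.
[0139]
The partial product out_n is shifted by n bits toward the MSB and then output to an adder circuit 81 that employs a wallace_tree architecture.
Data C, the multiply-accumulate term, is shifted by 8 bits toward the MSB, as shown in Fig. 13, so that it is added to the upper 8 bits of the 8-bit × 8-bit multiplication result, and is then output to the adder circuit 81.
[0140]
The adder circuit 81 employs a wallace_tree architecture: it gathers its inputs three at a time and reduces them to two outputs, a sum and a carry, so that the final addition can be performed in the adder circuit 82 using two-input adders.
As a result, even though partial products for the correction data F and the multiply-accumulate term C are added, the circuit scale hardly increases and the addition speed hardly decreases.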
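The shift-and-add scheme with the correction data F can be checked with plain integers. The sketch below follows the description (F, the eight selected partial products shifted by n bits, and C shifted by 8 bits are summed, and the upper 8 bits of the 16-bit result are taken). Discarding any overflow beyond 16 bits is an assumption of this sketch, and the name `lip8` is hypothetical.

```python
def lip8(a, b, coe, c=0):
    """8-bit model of D = A*COE + B*(1 - COE) + C, where COE = 0xFF means 1.0."""
    for value in (a, b, coe, c):
        if not 0 <= value <= 0xFF:
            raise ValueError("operands must be 8-bit values")
    f = a if coe == 0xFF else b            # correction data F
    total = f
    for n in range(8):
        out_n = a if (coe >> n) & 1 else b  # partial product selected by COE bit n
        total += out_n << n                 # shifted n bits toward the MSB
    total += c << 8                         # C is added to the upper 8 bits
    return (total >> 8) & 0xFF              # upper 8 bits of the 16-bit result


# Without the correction, 0xFF x 0xFF would give 0xFE as in equation (17)
print(hex((0xFF * 0xFF) >> 8), hex(lip8(0xFF, 0x00, 0xFF)))
```

For COE other than 0xFF the sum equals A × COE + B × (256 - COE), so the weights of A and B always add up to 256; for COE = 0xFF the correction makes the sum exactly 256 × A.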
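[0141]
Fig. 15 is a partial configuration diagram of the adder circuit 81 employing the wallace_tree architecture.
Fig. 15 shows only the configuration that adds the bit data in the vertical direction indicated by the arrows 92, 93, 94 in Fig. 13; the parts that perform the other additions are omitted.
The addition of the bit data in the vertical direction indicated by the arrow 91 in Fig. 13 is performed in the adder circuit 82.
As shown in Fig. 15, the adder circuit 81 has adders 100_0 to 100_6.
The adder 100_0 performs the addition of the arrow 92: it adds bit 1 of the correction data F, bit 1 of the partial product out_0, and bit 0 of the partial product out_1, outputs the sum Sum to the adder circuit 82, and outputs the carry Carry to the adder 100_1.
[0142]
The adders 100_1, 100_2, 100_3 perform the addition of the part indicated by the arrow 93.
The adder 100_1 adds bit 2 of the correction data F and bit 2 of the partial product out_0, outputs the sum Sum to the adder 100_3, and outputs the carry Carry to the adder 100_4.
The adder 100_2 adds bit 1 of the partial product out_1 and bit 0 of the partial product out_2, outputs the sum Sum to the adder 100_3, and outputs the carry Carry to the adder 100_5.
The adder 100_3 adds the sum Sum from the adder 100_1 and the sum Sum from the adder 100_2, and outputs the sum Sum and the carry Carry to the adder circuit 82.
[0143]
The adders 100_4, 100_5, 100_6 perform the addition of the part indicated by the arrow 94.
The adder 100_4 adds bit 3 of the correction data F and bit 3 of the partial product out_0, outputs the sum Sum to the adder 100_6, and outputs the carry Carry to an adder in the following stage.
The adder 100_5 adds bit 2 of the partial product out_1 and bit 1 of the partial product out_2, outputs the sum Sum to the adder 100_6, and outputs the carry Carry to an adder in the following stage.
The adder 100_6 adds the sum Sum from the adder 100_4 and the sum Sum from the adder 100_5, and outputs the sum Sum and the carry Carry to the adder circuit 82.
[0144]
The adder circuit 82 adds bit 0 of the correction data F, bit 0 of the partial product out_0, and the sums Sum and carries Carry received from the adder circuit 81 using a plurality of two-input adders to calculate the 16-bit data that is the result of equation (15), and outputs the upper 8 bits of that 16-bit data as data D.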
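The idea of reducing three inputs to a sum and a carry, and performing a two-input addition only at the end, can be sketched at word level with carry-save additions. This is a simplified illustration of the principle, not the bit-column wiring of Fig. 15; the names `csa` and `wallace_sum` are hypothetical.

```python
def csa(x, y, z):
    """3-to-2 compression: returns (sum word, carry word) with x + y + z == sum + carry."""
    s = x ^ y ^ z
    carry = ((x & y) | (y & z) | (x & z)) << 1
    return s, carry


def wallace_sum(operands):
    """Reduce the operands three at a time until two remain, then add them once."""
    ops = list(operands)
    while len(ops) > 2:
        reduced = []
        for i in range(0, len(ops) - 2, 3):
            reduced.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        reduced.extend(ops[len(ops) - len(ops) % 3:])   # leftovers pass through
        ops = reduced
    return sum(ops)   # final two-input addition (adder circuit 82 in the text)


# Ten operands: correction F, eight shifted partial products, and C << 8
operands = [0xAB] + [0xAB << n for n in range(8)] + [0x05 << 8]
print(wallace_sum(operands) == sum(operands))
```

Since each reduction stage has no carry propagation, adding two more operands (F and C) lengthens the tree only marginally, which is consistent with the remark that circuit scale and speed are hardly affected.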
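For example, when the counter 72 shown in Fig. 10 indicates the count value "0", the LIP circuit 61 outputs the calculated data D from the OUT terminal to the register 62 shown in Fig. 10; otherwise, it feeds the calculated data D back to the multiplexer 78 shown in Fig. 10.
[0145]
The operation modes of the texture engine circuit 12 shown in Fig. 10 are described below.
First operation mode
This operation mode covers the case in which the LIP circuit 61 is shared by the level interpolation process of the MIPMAP process and modulate processing.
In this case, function mode data FMD indicating the modes "1" and "3" alternately, one per clock cycle, is output from the main processor 4 shown in Fig. 1 or the main controller (not shown) in the texture engine circuit 12 to the decoders 71 and 73 shown in Fig. 10.
The decoder 71 sets "1" as the initial count value of the counter 72, and sets "1" in the counter 72 each time its count value becomes "0".
[0146]
Specifically, for example, in the first clock cycle, the four-neighbor interpolation data C_pixel0 from the LIP circuit 54 shown in Fig. 4 is written to the register 74.
The count value of the counter 72 is set to "1".
[0147]
Next, in the second clock cycle following the first, the function mode data FMD indicates mode "1", and the four-neighbor interpolation data C_pixel0 is read from the register 74 and output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88. At the same time, the four-neighbor interpolation data C_pixel1 from the LIP circuit 54 shown in Fig. 4 is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
At the same time, the data COE_mipmap from the reduction ratio calculation circuit 50 shown in Fig. 4 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
The LIP circuit 61 then performs the calculation of equation (9) and calculates the level interpolation data C_pixel.
Because the count value of the counter 72 is "1", the level interpolation data C_pixel is fed back to the multiplexer 78.
The count value of the counter 72 is then decremented to "0".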
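[0148]
Next, in the third clock cycle, the function mode data FMD indicates mode "3", and the four-neighbor interpolation data for the next pixel from the LIP circuit 54 shown in Fig. 4 is written to the register 74.
At the same time, the level interpolation data C_pixel calculated in the second clock cycle (corresponding to C_tex in equation (10)) is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88.
At the same time, the (R, G, B) data S11b (fragment color value C_flag) included in the DDA data S11 from the triangle DDA circuit 11 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
The LIP circuit 61 then performs the calculation of equation (10) and calculates the color value C_mod after modulate processing.
Because the count value of the counter 72 is "0", the color value C_mod is output from the OUT terminal of the LIP circuit 61 to the register 62.
The color value C_mod is read from the register 62 and output as pixel data S12 to the memory I/F circuit 13 in the following stage.
The count value of the counter 72 is then set to "1".
Thereafter, the processing of the second clock cycle and the processing of the third clock cycle described above are repeated alternately.
[0149]
As described above, in this operation mode the LIP circuit 61 can be shared by the level interpolation process of the MIPMAP process and modulate processing. The circuit scale can therefore be reduced compared with connecting a circuit for level interpolation and a circuit for modulate processing in series. In the present embodiment, the four-neighbor interpolation process of the MIPMAP process is performed by a single system over two clock cycles, and the circuit scale for that process is the same as in the prior art.
Furthermore, in this operation example, the LIP circuit 61 performs modulate processing during the idle time in which it is not performing the level interpolation process, so the processing time is not lengthened.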
【０１５０】
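Second operation mode
In this operation mode, the LIP circuit 61 is shared among the level interpolation processing of the MIPMAP processing, the modulation processing, and the fogging processing.
In this case, function mode data FMD indicating modes "1", "3", and "8" in repeating order, one per clock cycle, is output from the main processor 4 shown in FIG. 1, or from a main controller (not shown) in the texture engine circuit 12, to the decoders 71 and 73 shown in FIG. 10.
The decoder 71 sets "2" as the initial count value of the counter 72 and reloads "2" into the counter 72 each time its count value reaches "0".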
【０１５１】
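Specifically, for example, in a first clock cycle, the four-point neighborhood interpolation data C_pixel0 from the LIP circuit 54 shown in FIG. 4 is written into the register 74.
The count value of the counter 72 is then set to "2".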
【０１５２】
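Next, in a second clock cycle, the function mode data FMD indicates mode "1", and the four-point neighborhood interpolation data C_pixel0 is read from the register 74 and output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88. At the same time, the four-point neighborhood interpolation data C_pixel1 from the LIP circuit 54 shown in FIG. 4 is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
At the same time, the data COE_mipmap from the reduction ratio calculation circuit 50 shown in FIG. 4 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
The LIP circuit 61 then performs the calculation of equation (9) above to obtain the level interpolation data C_pixel.
Because the count value of the counter 72 is "2", the level interpolation data C_pixel is fed back to the multiplexer 78.
The count value of the counter 72 is then decremented to "1".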
【０１５３】
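Next, in a third clock cycle, the function mode data FMD indicates mode "3", and the four-point neighborhood interpolation data C_pixel0 for the next pixel from the LIP circuit 54 shown in FIG. 4 is written into the register 74.
At the same time, the level interpolation data C_pixel calculated in the second clock cycle (corresponding to C_tex in equation (10)) is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88.
At the same time, the (R, G, B) data S11b (fragment color value C_flag) contained in the DDA data S11 from the triangle DDA circuit 11 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
The LIP circuit 61 then performs the calculation of equation (10) above to obtain the modulated color value C_mod.
Because the count value of the counter 72 is "1", the color value C_mod is fed back to the multiplexer 78.
The count value of the counter 72 is then decremented to "0".
In addition, a wait instruction to hold the output of the four-point neighborhood interpolation data C_pixel0 for one clock cycle is output to the readout circuit 51 shown in FIG. 4, and a wait instruction to hold the output of the fragment data C_flag for one clock cycle is output to the triangle DDA circuit 11 shown in FIG. 1.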
【０１５４】
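Next, in a fourth clock cycle, the function mode data FMD indicates mode "8", and if the fog enable data FED has the logical value "1", the color value C_mod calculated in the third clock cycle (corresponding to C_flag in equation (13)) is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88.
At the same time, fog data C_fog read, for example, from a fog register (not shown) is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
At the same time, the fogging coefficient COE_fog contained, for example, in the DDA data S11 from the triangle DDA circuit 11 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
The LIP circuit 61 then performs the calculation of equation (13) above to obtain the fogged color value C_fogged.
Because the count value of the counter 72 is "0", the color value C_fogged is output from the OUT terminal of the LIP circuit 61 to the register 62.
The color value C_fogged is read from the register 62 and output as pixel data S12 to the memory I/F circuit 13 in the following stage.
Thereafter, the processing of the second, third, and fourth clock cycles is repeated in turn.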
【０１５５】
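As described above, in this operation mode the LIP circuit 61 can be shared among the level interpolation processing of the MIPMAP processing, the modulation processing, and the fogging processing. The number of gates, and hence the circuit scale, can therefore be reduced compared with connecting a level interpolation circuit and a modulation circuit in series.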
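The counter-driven feedback of the second operation mode can be modeled at cycle level as follows. This is a rough sketch, not the circuit itself: the class and function names are hypothetical, the blend formula is an assumed form of the LIP operation (equations (9), (10), and (13) are not reproduced in this part of the text), and the register and multiplexer stages are collapsed into plain attribute assignments.

```python
class SharedLip:
    """Toy model of one interpolator shared by several operations per pixel."""

    def __init__(self, num_ops):
        self.reload = num_ops - 1    # value the decoder loads into the counter
        self.counter = self.reload
        self.feedback = None         # result routed back to the IN_A input
        self.out = None              # models the output register

    def clock(self, in_a, in_b, in_coeff):
        # Use the fed-back result as IN_A when one is pending.
        a = self.feedback if self.feedback is not None else in_a
        d = in_coeff * a + (1.0 - in_coeff) * in_b   # assumed blend formula
        if self.counter == 0:
            # Last operation for this pixel: emit the result and reload.
            self.out = d
            self.feedback = None
            self.counter = self.reload
            return d
        # Intermediate operation: feed the result back and count down.
        self.feedback = d
        self.counter -= 1
        return None


def run_pixel(lip, c_pixel0, c_pixel1, coe_mipmap, c_flag, c_fog, coe_fog):
    schedule = [
        (c_pixel0, c_pixel1, coe_mipmap),  # mode "1": level interpolation
        (None, 0.0, c_flag),               # mode "3": modulation (IN_B assumed 0)
        (None, c_fog, coe_fog),            # mode "8": fogging
    ]
    result = None
    for in_a, in_b, coeff in schedule:
        result = lip.clock(in_a, in_b, coeff)
    return result
```

With `num_ops=3` the counter runs 2, 1, 0, matching the sequence described for modes "1", "3", and "8"; with `num_ops=2` it reproduces the alternation of the first operation mode.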
【０１５６】
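In this way, the texture engine circuit 12 reduces circuit scale by sharing the LIP circuit 61 shown in FIG. 10 between the level interpolation processing of the MIPMAP processing and the texture function processing. When the LIP circuit 61 is shared between the level interpolation processing and a single texture function process, the processing time is not lengthened.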
【０１５７】
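In addition, because the LIP circuits 52, 53, and 61 shown in FIG. 4 perform their calculations using the correction data F as shown in FIG. 13, a system that treats a value whose bits are all logical "1" as "1" can perform the calculation of equation (15) above accurately when COE is "1.0", with almost no increase in circuit scale.
That is, obtaining a correct result without correction would require adding one bit, using 9 bits and treating "0x100" as "1", which would increase the gate count of the preceding pipeline registers and of the circuit as a whole. In this embodiment the number of bits does not need to be increased, so this problem does not arise.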
【０１５８】
なお、テクスチャエンジン回路１２は、フルカラー方式の場合には、ＳＲＡＭ１７あるいはテクスチャバッファ２０から読み出した（Ｒ，Ｇ，Ｂ）データを直接用いる。一方、テクスチャエンジン回路１２は、インデックスカラー方式の場合には、予め作成したカラールックアップテーブル（ＣＬＵＴ）をテクスチャＣＬＵＴバッファ２３から読み出して、内蔵するＳＲＡＭに転送および記憶し、このカラールックアップテーブルを用いて、ＳＲＡＭ１７あるいはテクスチャバッファ２０から読み出したカラーインデックスに対応する（Ｒ，Ｇ，Ｂ）データを得る。
【０１５９】
メモリＩ／Ｆ回路１３
メモリＩ／Ｆ回路１３は、テクスチャエンジン回路１２から入力した画素データＳ１２に対応するｚデータと、ｚバッファ２２に記憶されているｚデータとの比較を行い、入力した画素データＳ１２によって描画される画像が、前回、ディスプレイバッファ２１に書き込まれた画像より、手前（視点側）に位置するか否かを判断し、手前に位置する場合には、画像データＳ１２に対応するｚデータでｚバッファ２２に記憶されたｚデータを更新する。
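The depth comparison performed by the memory I/F circuit 13 can be sketched as below. The sketch assumes that a smaller z value means nearer to the viewpoint, which the text does not state, and it uses plain nested lists as stand-ins for the z buffer 22 and the display buffer 21; the function name is hypothetical.

```python
def z_test_and_write(x, y, z, color, z_buffer, display_buffer):
    """Write the pixel only if it is nearer than what was drawn before.

    Assumption: smaller z is nearer to the viewpoint.
    Returns True when the buffers were updated.
    """
    if z < z_buffer[y][x]:
        z_buffer[y][x] = z            # update the stored depth
        display_buffer[y][x] = color  # write the new pixel color
        return True
    return False
```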
【０１６０】
ＣＲＴコントローラ回路１４
ＣＲＴコントローラ回路１４は、与えられた水平および垂直同期信号に同期して、図示しないＣＲＴに表示するアドレスを発生し、ディスプレイバッファ２１から表示データを読み出す要求をメモリＩ／Ｆ回路１３に出力する。この要求に応じて、メモリＩ／Ｆ回路１３は、ディスプレイバッファ２１から一定の固まりで表示データを読み出す。ＣＲＴコントローラ回路１４は、ディスプレイバッファ２１から読み出した表示データを記憶するＦＩＦＯ(First In First Out)回路を内蔵し、一定の時間間隔で、ＲＡＭＤＡＣ回路１５に、ＲＧＢのインデックス値を出力する。
【０１６１】
ＲＡＭＤＡＣ回路１５
ＲＡＭＤＡＣ回路１５は、各インデックス値に対応するＲ，Ｇ，Ｂデータを記憶しており、ＣＲＴコントローラ回路１４から入力したＲＧＢのインデックス値に対応するデジタル形式のＲ，Ｇ，Ｂデータを、Ｄ／Ａコンバータに転送し、アナログ形式のＲ，Ｇ，Ｂデータを生成する。ＲＡＭＤＡＣ回路１５は、この生成されたＲ，Ｇ，ＢデータをＣＲＴに出力する。
【０１６２】
レンダリング回路５の実現手法
以下、本実施形態に係る同一半導体チップ内に混載されるレンダリング回路５のロジック回路とＤＲＡＭ１６およびＳＲＡＭ１７等からなる２次メモリとの好適な構成、配置および配線方法について、図１６〜図１８に関連付けて説明する。
【０１６３】
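In this embodiment, the DRAM 16 is divided, for example as shown in FIG. 16, into four DRAM modules 1471 to 1474, and the memory I/F circuit 13 is provided with memory controllers 1441 to 1444 corresponding to the respective DRAM modules 1471 to 1474, together with a distributor 1445 that distributes data to these memory controllers 1441 to 1444.
The memory I/F circuit 13 arranges the pixel data across the DRAM modules 1471 to 1474, as shown in FIG. 16, so that adjacent portions of the display area fall in different DRAM modules.
As a result, when a planar figure such as a triangle is drawn, an area can be processed simultaneously, so the probability that each DRAM module is operating is very high.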
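One simple mapping with the property described above (adjacent display positions land in different DRAM modules) is a 2 × 2 checkerboard. The layout in FIG. 16 is not reproduced here, so the mapping below is a hypothetical example rather than the one used in the embodiment, and the granularity (single pixels instead of larger tiles) is also an assumption.

```python
def dram_module(x, y):
    """Hypothetical interleave: module index 0..3 from the low bits of x and y."""
    return (x & 1) | ((y & 1) << 1)


def modules_touched(x0, y0, width, height):
    """Set of modules used by a rectangular screen area."""
    return {dram_module(x, y)
            for y in range(y0, y0 + height)
            for x in range(x0, x0 + width)}
```

Under this mapping any 2 × 2 area touches all four modules, so a filled shape keeps all four busy instead of queueing on one.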
【０１６４】
前述した描画処理においては、最終的にはピクセルの一つ一つのアクセスにまで集約されてくることになる。したがって、ピクセル一つ一つの処理が同時並行処理されることにより、描画性能は並行処理の数だけ増加できることが理想である。
そのために、３次元コンピュータグラフィックスシステム１におけるメモリシステムを構成するメモリＩ／Ｆ回路１３においても、同時並行処理が行える構成がとられている。
【０１６５】
グラフィック描画処理においては、前述したように、ピクセルに打ち込むところの処理回路は、ＤＲＡＭと頻繁にデータのやりとりを行う必要があることがわかる。
そのため、本実施形態では、図１７に示すように、ピクセル処理を制御する機能ブロックであるピクセル処理モジュール１４４６，１４４７，１４４８，１４４９をメモリコントローラから物理的に分離し、かつ、これらピクセル処理モジュール１４４６，１４４７，１４４８，１４４９を対応するＤＲＡＭモジュール１４７１，１４７２，１４７３，１４７４の近くに配置（近接配置）している。
【０１６６】
ピクセル処理モジュール１４４６，１４４７，１４４８，１４４９は、（Ｒ，Ｇ，Ｂ）カラーのリード（Ｒｅａｄ）／モディファイ（Ｍｏｄｉｆｙ）／ライト（Ｗｒｉｔｅ）処理および、隠面処理のための以前に描画している深さデータと、今から描画しようとしているデータの深さを比較して、その結果により書き戻したりする作業に関する処理を全て行う。
これら作業をすべてピクセル処理モジュール１４４６，１４４７，１４４８，１４４９で行うことで、ＤＲＡＭモジュール１４７１，１４７２，１４７３，１４７４との配線長が短いモジュール内で、ＤＲＡＭとのやりとりを完結することが可能となる。
そのため、ＤＲＡＭとの配線数、すなわち、転送のビット数を多くとっても、面積に対する配線が占める割合を、少なく抑えることができることから、動作速度向上および、配線面積の縮小化が可能となっている。
【０１６７】
ディストリビュータ等を含むＤＲＡＭ間制御モジュール１４５０に関しては、描画処理としての、ＤＤＡセットアップ回路１０のＤＤＡセットアップ演算、トライアングルＤＤＡ回路１１のトライアングルＤＤＡ演算、テクスチャエンジン回路１２のテクスチャ貼り付け、並びに、ＣＲＴコントロール回路１４による表示処理等に比較して、それぞれのＤＲＡＭモジュール（ＤＲＡＭ＋ピクセル処理）との関連も強く、ＤＲＡＭモジュール１４７１，１４７２，１４７３，１４７４との間の信号線が最も多くなるところである。
そのため、ＤＲＡＭ間制御モジュール１４５０は、それぞれのＤＲＡＭモジュール１４７１，１４７２，１４７３，１４７４の中心付近に配置して、最長配線長ができるだけ短くなるように考慮している。
【０１６８】
また、ピクセル処理モジュール１４４６，１４４７，１４４８，１４４９とＤＲＡＭ間制御モジュール１４５０との接続のための信号入出力端子については、図１７に示すように、それぞれのピクセル処理モジュール１４４６，１４４７，１４４８，１４４９における入出力端子を同じにするのではなく、個々のピクセル処理モジュールと、ＤＲＡＭ間制御モジュール１４５０間が最適（最短）に配線されるように、個々のピクセル処理モジュールにおける信号の入出力端子位置を調整してある。
【０１６９】
具体的には、ピクセル処理モジュール１４４６は、図１７においてモジュール下縁部の右端側に入出力端子Ｔ１４４６ａが形成されている。そして、この入出力端子Ｔ１４４６ａがＤＲＡＭ間制御モジュール１４５０の上縁部の左端側に形成された入出力端子Ｔ１４５０ａと対向するように配置されて、両端子Ｔ１４４６ａおよびＴ１４５０ａが最短距離をもって接続されている。
そして、ピクセル処理モジュール１４４６には、図１７において上縁部の中央部にＤＲＡＭモジュール１４７１との接続用入出力端子Ｔ１４４６ｂが形成されている。
【０１７０】
ピクセル処理モジュール１４４７は、図１７においてモジュール下縁部の左端側に入出力端子Ｔ１４４７ａが形成されている。そして、この入出力端子Ｔ１４４７ａがＤＲＡＭ間制御モジュール１４５０の上縁部の右端側に形成された入出力端子Ｔ１４５０ｂと対向するように配置されて、両端子Ｔ１４４７ａおよびＴ１４５０ｂが最短距離をもって接続されている。
そして、ピクセル処理モジュール１４４７には、図１７において上縁部の中央部にＤＲＡＭモジュール１４７２との接続用入出力端子Ｔ１４４７ｂが形成されている。
【０１７１】
ピクセル処理モジュール１４４８は、図１７においてモジュール上縁部の右端側に入出力端子Ｔ１４４８ａが形成されている。そして、この入出力端子Ｔ１４４８ａがＤＲＡＭ間制御モジュール１４５０の下縁部の左端側に形成された入出力端子Ｔ１４５０ｃと対向するように配置されて、両端子Ｔ１４４８ａおよびＴ１４５０ｃが最短距離をもって接続されている。
そして、ピクセル処理モジュール１４４８には、図１７において下縁部の中央部にＤＲＡＭモジュール１４７３との接続用入出力端子Ｔ１４４８ｂが形成されている。
【０１７２】
ピクセル処理モジュール１４４９は、図１７においてモジュール上縁部の左端側に入出力端子Ｔ１４４９ａが形成されている。そして、この入出力端子Ｔ１４４９ａがＤＲＡＭ間制御モジュール１４５０の下縁部の右端側に形成された入出力端子Ｔ１４５０ｄと対向するように配置されて、両端子Ｔ１４４９ａおよびＴ１４５０ｄが最短距離をもって接続されている。
そして、ピクセル処理モジュール１４４９には、図１７において下縁部の中央部にＤＲＡＭモジュール１４７４との接続用入出力端子Ｔ１４４９ｂが形成されている。
【０１７３】
なお、ピクセル処理モジュール１４４６，１４４７，１４４８，１４４９は、各ＤＲＡＭモジュール１４７１，１４７２，１４７３，１４７４からＤＲＡＭ間制御モジュール１４５０に至る経路を、上記のようにして最適な長さにしても、処理速度要求が満足できない処理に関しては、たとえばレジスタで分断した少なくとも１段のパイプライン処理をとり得、所望の処理速度を達成できるように構成されている。
【０１７４】
また、本実施形態に係るＤＲＡＭモジュール１４７１〜１４７４は図１８に示すように構成されている。なお、ここでは、ＤＲＡＭモジュール１４７１を例に説明するが、他のＤＲＡＭモジュール１４７２〜１４７４も同様の構成を有することから、その説明は省略する。
【０１７５】
ＤＲＡＭモジュール１４７１は、図１８に示すように、メモリセルがマトリクス状に配置され、ロウアドレスＲＡ、カラムアドレスＣＡに基づいて選択される図示しないワード線およびビット線を通してアクセスされるＤＲＡＭコア１４８０、ロウデコーダ１４８１、センスアンプ１４８２、カラムデコーダ１４８３、およびＳＲＡＭ等からなるいわゆるキャッシュメモリと同様の機能を備えた２次メモリ１４８４を有している。
【０１７６】
本実施形態のように、ＤＲＡＭモジュール毎に、グラフィックス描画におけるピクセル処理を制御する機能ブロックとしてのピクセル処理モジュール１４４６〜１４４９と、ＤＲＡＭモジュールの２次メモリ１４８４とがＤＲＡＭモジュールに近接配置されている。
そして、この場合、ＤＲＡＭのいわゆる長辺方向が、ＤＲＡＭコア１４８０のカラム方向になるように配置されている。
【０１７７】
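Considering a random read in the configuration of FIG. 18, a control signal and the necessary address signal S1446 are supplied from the pixel processing module 1446 to the DRAM module 1471 through the address control path, the row address RA and the column address CA are generated from them, and the DRAM data corresponding to the desired row is read out through the sense amplifier 1482.
The data that has passed through the sense amplifier 1482 is narrowed to the required columns by the column decoder according to the desired column address CA, and the DRAM data D1471 corresponding to the desired row and column is transferred from the random access port to the pixel processing module 1446 through the path.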
【０１７８】
２次メモリにデータを書き込む場合は、ピクセル処理モジュール１４４６から制御信号と必要なアドレス信号Ｓ１４４６が、アドレス制御パスからＤＲＡＭモジュール１４７１に供給され、それをもとにロウアドレスのみが生成され、１ロウ分のデータが一気にＤＲＡＭ１６からＳＲＡＭ１７等からなる２次メモリ１４８４に書き込まれる。
この場合、ＤＲＡＭのいわゆる長辺方向が、ＤＲＡＭコア１４８０のカラム方向になるように配置されていることから、ロウ方向に配置する場合に比較して、ロウアドレス指定のみで、そのロウアドレスに対応している１ロウ分のデータを、一度に２次メモリ１４８４にロードできるビット数が格段に増加する。
【０１７９】
また、テクスチャ処理モジュールとしてのテクスチャエンジン回路１４３への２次メモリ（ＳＲＡＭ）１４８４からのデータＤ１４８４の読み込みは、テクスチャエンジン回路１４３から、制御信号と必要なアドレス信号が、アドレス制御パスからＤＲＡＭに供給され、それに対応したデータＤ１４８４がデータパスを介してテクスチャエンジン回路１４３へ転送される。
【０１８０】
また、本実施形態においては、図１８に示すように、ピクセル処理モジュールとＤＲＡＭモジュールの２次メモリとが、それぞれ互いにＤＲＡＭモジュールの長辺側の同一側に近接配置されている。
これにより、ピクセル処理モジュールとＤＲＡＭモジュールの２次メモリへのデータは、同一のセンスアンプを使うことができるため、ＤＲＡＭコア１４８０の面積増加を最小限に抑えて２ポート化することが可能となっている。
【０１８１】
以下、３次元コンピュータグラフィックシステム１の全体動作について説明する。
ポリゴンレンダリングデータＳ４が、メインバス６を介してメインプロセッサ４からＤＤＡセットアップ回路１０に出力され、ＤＤＡセットアップ回路１０において、ポリゴンの辺と水平方向の差分などを示す変分データＳ１０が生成される。
この変分データＳ１０は、トライアングルＤＤＡ回路１１に出力され、トライアングルＤＤＡ回路１１において、ポリゴン内部の各画素における線形補間された（ｚ，Ｒ，Ｇ，Ｂ，ＣＯＥ_blend，ｓ，ｔ，ｑ，ＣＯＥ_fog）データが算出される。そして、この算出された（ｚ，Ｒ，Ｇ，Ｂ，ＣＯＥ_blend，ｓ，ｔ，ｑ，ＣＯＥ_fog）データと、ポリゴンの各頂点の（ｘ，ｙ）データとが、ＤＤＡデータＳ１１として、トライアングルＤＤＡ回路１１からテクスチャエンジン回路１２に出力される。
【０１８２】
次に、図４に示すテクスチャエンジン回路１２の代表点決定回路３０１において代表点が決定され、当該代表点を示す代表点指示データＳ３０１に基づいて代表点決定回路３０１においてｑデータＳ３０２が選択される。
次に、図４および図８に示す縮小率演算回路５０において、ｑデータＳ３０２およびｍａｘｅデータＳ１１ｃを用いて縮小率ＬＯＤが算出される。
次に、図４に示すテクスチャエンジン回路１２の第２のバッファメモリ５１において、ＤＤＡデータＳ１１に含まれる（ｓ，ｔ，ｑ）データＳ１１ａ₁〜Ｓ１１ａ₈について、ｓデータをｑデータで除算する演算と、ｔデータをｑデータで除算する演算とが行われる。
そして、除算結果「ｓ／ｑ」および「ｔ／ｑ」に、それぞれテクスチャサイズＵＳＩＺＥおよびＶＳＩＺＥが乗算され、テクスチャ座標データ（ｕ，ｖ）が生成される。
次に、第２のバッファメモリ５１によって、前記生成されたテクスチャ座標データ（ｕ，ｖ）を含む読み出し要求が出力され、メモリＩ／Ｆ回路１３を介して、ＤＲＡＭ１６あるいはＳＲＡＭ１７に記憶された（Ｒ，Ｇ，Ｂ）データＳ１７が読み出される。
そして、このとき、上述したように、図４および図１０に示す構成を用いて、前述したＭＩＰＭＡＰ処理およびテクスチャファンクション処理が行われ、画素データＳ１２が生成される。
この画素データＳ１２は、テクスチャエンジン回路１２からメモリＩ／Ｆ回路１３に出力される。
【０１８３】
そして、メモリＩ／Ｆ回路１３において、テクスチャエンジン回路１２から入力した画素データＳ１２に対応するｚデータと、ｚバッファ２２に記憶されているｚデータとの比較が行なわれ、入力した画素データＳ１２によって描画される画像が、前回、ディスプレイバッファ２１に書き込まれた画像より、手前（視点側）に位置するか否かが判断され、手前に位置する場合には、画像データＳ１２がディスプレイバッファ２１に書き込まれると共に、対応するｚデータでｚバッファ２２に記憶されたｚデータが更新される。
【０１８４】
本発明は上述した第１実施形態には限定されない。
例えば、上述した実施形態では、図５に示すモード「１」〜「８」を指定するファンクションモードデータＦＭＤに基づいて、ＬＩＰ回路６１が動作する場合を例示したが、例えば、ＬＩＰ回路６１がアルファブレンディング処理を行うようにしてもよい。
【０１８５】
また、ＬＩＰ回路６１を共用する処理の内容および数は任意である。例えば、ＬＩＰ回路６１において、テクスチャファンクション処理として、デカル処理やアルファブレンディング処理などを行うようにしてもよい。
【０１８６】
第２実施形態
本実施形態の３次元コンピュータグラフィックシステム５０１は、基本的に前述した第１実施形態の３次元コンピュータグラフィックシステム１と同じであるが、図１に示すテクスチャエンジン回路１２およびメモリＩ／Ｆ回路１３においてパイプライン処理を行う点に特徴を有している。
以下、３次元コンピュータグラフィックシステム５０１の構成要素の機能のうち、前述した第１実施形態の３次元コンピュータグラフィックシステム１と異なる点を説明する。
【０１８７】
ＤＤＡセットアップ回路１０
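The DDA setup circuit 10 also determines, for each of the eight pixels processed simultaneously, 1-bit validity indication data val indicating whether the pixel lies inside the triangle being processed. Specifically, val is set to "1" for a pixel inside the triangle and to "0" for a pixel outside the triangle.
The DDA setup circuit 10 outputs the calculated variation data S10 and the validity indication data val of each pixel to the triangle DDA circuit 11.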
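A common way to decide whether a pixel lies inside a triangle is the edge-function test, sketched below for a block of eight pixels. The text does not say how the DDA setup circuit 10 makes this decision, so this is a swapped-in standard technique rather than the circuit's method. The 4 × 2 block shape, the use of integer pixel coordinates, and the rule that pixels exactly on an edge count as inside are all assumptions.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: sign tells which side of edge A->B the point P is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def block_pixels(x0, y0, width=4, height=2):
    """Coordinates of the pixels in one block, row by row (assumed 4 x 2)."""
    return [(x0 + i, y0 + j) for j in range(height) for i in range(width)]


def val_bits(tri, pixels):
    """Return 1 for each pixel inside (or on the edge of) the triangle, else 0."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    bits = []
    for px, py in pixels:
        e0 = edge(x0, y0, x1, y1, px, py)
        e1 = edge(x1, y1, x2, y2, px, py)
        e2 = edge(x2, y2, x0, y0, px, py)
        # Accept either winding order: all terms non-negative or all non-positive.
        inside = (e0 >= 0 and e1 >= 0 and e2 >= 0) or \
                 (e0 <= 0 and e1 <= 0 and e2 <= 0)
        bits.append(1 if inside else 0)
    return bits
```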
【０１８８】
トライアングルＤＤＡ回路１１
トライアングルＤＤＡ回路１１は、ＤＤＡセットアップ回路１０から入力した変分データＳ１０を用いて、三角形内部の各画素の線形補間された（ｚ，Ｒ，Ｇ，Ｂ，α，ｓ，ｔ，ｑ）データを算出する。
トライアングルＤＤＡ回路１１は、各画素の（ｘ，ｙ）データと、当該（ｘ，ｙ）座標の画素についての（ｚ，Ｒ，Ｇ，Ｂ，α，ｓ，ｔ，ｑ，ｖａｌ）データとを、ＤＤＡデータ（補間データ）Ｓ１１としてテクスチャエンジン回路１２に出力する。
本実施形態では、トライアングルＤＤＡ回路１１は、並行して処理を行う矩形内に位置する８画素分のＤＤＡデータＳ１１を単位としてテクスチャエンジン回路１２に出力する。
【０１８９】
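Here, the (z, R, G, B, α, s, t, q, val) data of the DDA data S11 is 161-bit data, as shown in FIG. 19.
Specifically, the R, G, B, and α data are 8 bits each, the z, s, t, and q data are 32 bits each, and the val data is 1 bit.
In the following, of the (z, R, G, B, α, s, t, q, val) data for the eight pixels processed in parallel, the val data is referred to as val data S220₁ to S220₈, and the (z, R, G, B, α, s, t, q) data is referred to as operand data S221₁ to S221₈.
That is, the triangle DDA circuit 11 outputs DDA data S11 consisting of the (x, y) data for eight pixels, the val data S220₁ to S220₈, and the operand data S221₁ to S221₈ to the texture engine circuit 12 shown in FIG. 1.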
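The 161-bit figure follows from the stated field widths: 4 × 8 + 4 × 32 + 1 = 161. The sketch below packs and unpacks one pixel's record as a Python integer to make that arithmetic concrete. The field order within FIG. 19 is not given in this part of the text, so the order used here is an assumption, as are the function names.

```python
# (name, width in bits); the order is assumed, the widths come from the text.
FIELDS = [("z", 32), ("R", 8), ("G", 8), ("B", 8), ("alpha", 8),
          ("s", 32), ("t", 32), ("q", 32), ("val", 1)]

TOTAL_BITS = sum(width for _, width in FIELDS)  # 161


def pack(values):
    """Pack a dict of unsigned field values into one integer, first field highest."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word):
    """Inverse of pack(): peel fields off from the least significant end."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```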
【０１９０】
テクスチャエンジン回路１２およびメモリＩ／Ｆ回路１３
テクスチャエンジン回路１２による、ＤＤＡデータＳ１１を用いた、縮小率ＬＯＤの算出処理と、「ｓ／ｑ」および「ｔ／ｑ」の算出処理、テクスチャ座標データ（ｕ，ｖ）の算出処理、および、テクスチャバッファ２０からの（Ｒ，Ｇ，Ｂ，α）データの読み出し処理と、メモリＩ／Ｆ回路１３によるｚ比較処理とを、図３に示す演算ブロック２００，２０１，２０２，２０４，２０５でパイプライン方式で順に実行する。
ここで、演算ブロック１９９，２００，２０１，２０２，２０４，２０５は、それぞれ８個の演算サブブロックを内蔵しており、８画素分の演算処理を並行して行う。
ここで、テクスチャエンジン回路１２が演算ブロック１９９，２００，２０１，２０２を内蔵し、メモリＩ／Ｆ回路１３が演算ブロック２０４を内蔵している。
また、演算ブロック１９９は図４に示す縮小率演算回路５０、代表点決定回路３０１およびｑデータ選択回路３０２に対応し、演算ブロック２００，２０１，２０２は図４に示す読み出し回路５１に対応し、演算ブロック２０３は図４に示すＬＩＰ回路５２，５３，５４およびＬＩＰ／テクスチャファンクション回路５５に対応している。
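In the LIP/texture function circuit 55, for example, the level interpolation processing of the MIPMAP processing and one of the texture function processes, such as modulation, decal, highlight, fogging, texture blending, and alpha blending, are performed selectively.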
【０１９１】
〔演算ブロック１９９〕
演算ブロック１９９は、第１実施形態で説明した縮小率演算回路３０４、代表点決定回路３０１およびｑデータ選択回路３０２に対応する処理を行ってテクスチャデータの縮小率ＬＯＤを算出して演算ブロック２００に出力する。
縮小率ＬＯＤは、演算ブロック２００，２０１，２０２，２０３に順にシフトされる。
また、演算ブロック１９９は、トライアングルＤＤＡ回路１１から入力したＤＤＡデータＳ１１を後段の演算ブロック２００に出力する。
演算ブロック１９９は、ｖａｌデータＳ２２０₁〜Ｓ２２０₈が示す値とは無関係に常に動作する。
【０１９２】
〔演算ブロック２００〕
演算ブロック２００は、ＤＤＡデータＳ１１に含まれる（ｓ，ｔ，ｑ）データを用いて、ｓデータをｑデータで除算する演算と、ｔデータをｑデータで除算する演算とを行う。
演算ブロック２００は、図２０に示すように、８個の演算サブブロック２００₁〜２００₈を内蔵する。
ここで、演算サブブロック２００₁は、被演算データＳ２２１₁およびｖａｌデータＳ２２０₁を入力し、ｖａｌデータＳ２２０₁が「１」、すなわち有効であることを示す場合には、「ｓ／ｑ」および「ｔ／ｑ」を算出し、その算出結果を除算結果Ｓ２００₁として演算ブロック２０１の演算サブブロック２０１₁に出力する。
【０１９３】
また、演算サブブロック２００₁は、ｖａｌデータＳ２２０₁が「０」、すなわち無効であることを示す場合には、演算は行わず、除算結果Ｓ２００₁を出力しないか、あるいは、所定の仮値を示す除算結果Ｓ２００₁を演算ブロック２０１の演算サブブロック２０１₁に出力する。
また、演算サブブロック２００₁は、ｖａｌデータＳ２２０₁を後段の演算サブブロック２０１₁に出力する。
なお、演算サブブロック２００₂〜２００₈も、それぞれ対応する画素について、演算サブブロック２００₁と同じ演算を行い、それぞれ除算結果Ｓ２００₂〜Ｓ２００₈およびｖａｌデータＳ２２０₂〜Ｓ２２０₈を後段の演算ブロック２０１の演算サブブロック２０１₂〜２０１₈にそれぞれ出力する。
【０１９４】
図２１は、演算サブブロック２００₁の内部構成図である。
なお、図３に示す、全ての演算サブブロックは、基本的に、図２１に示す構成をしている。
図２１に示すように、演算サブブロック２００₁は、クロックイネーブラ２１０₁、データ用フリップフロップ２２２、プロセッサエレメント２２３およびフラグ用フリップフロップ２２４を有する。
クロックイネーブラ２１０₁は、システムクロック信号Ｓ２２５を基準としたタイミングでｖａｌデータＳ２２０₁を入力し、ｖａｌデータＳ２２０₁のレベルを検出する。そして、クロックイネーブラ２１０₁は、ｖａｌデータＳ２２０₁が、「１」である場合には、例えば、クロック信号Ｓ２１０₁にパルス発生させ、「０」である場合には、クロック信号Ｓ２１０₁にパルス発生させない。
【０１９５】
データ用フリップフロップ２２２は、クロック信号Ｓ２１０₁のパルスを検出すると、被演算データＳ２２１₁を取り込み、プロセッサエレメント２２３に出力する。
プロセッサエレメント２２３は、入力した被演算データＳ２２１₁を用いて前述した除算を行い、除算結果Ｓ２００₁を演算サブブロック２０１₁のデータ用フリップフロップ２２２に出力する。
フラグ用フリップフロップ２２４は、システムクロック信号Ｓ２２５を基準としたタイミングで、ｖａｌデータＳ２２０₁を取り込み、後段の演算ブロック２０１の演算サブブロック２０１₁のフラグ用フリップフロップ２２４に出力する。
なお、図２１に示すシステムクロック信号Ｓ２２５は、図２０に示す演算ブロック１９９、並びに全ての演算サブブロック２００₁〜２００₈，２０１₁〜２０１₈，２０２₁〜２０２₈，２０４₁〜２０４₈のクロックイネーブラおよびフラグ用フリップフロップ２２４に供給される。
すなわち、演算サブブロック２００₁〜２００₈，２０１₁〜２０１₈，２０２₁〜２０２₈，２０４₁〜２０４₈における処理は同期して行われ、同一の演算ブロックに内蔵された８個の演算サブブロックは並行して処理を行う。
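The behavior of one clock-gated operation sub-block can be modeled as below. This is a behavioral sketch only: the class name, the `active_cycles` counter, and the `divide` helper are hypothetical, and the clock enabler is reduced to an `if` on the val bit. The flag flip-flop is updated every cycle, while the data flip-flop and the processor element work only when val is "1".

```python
class SubBlock:
    """Toy model of a sub-block with a clock enabler, data FF, PE, and flag FF."""

    def __init__(self, op):
        self.op = op              # stands in for the processor element
        self.data_ff = None       # data flip-flop
        self.flag_ff = 0          # flag flip-flop
        self.active_cycles = 0    # how often the gated clock pulsed

    def clock(self, val, data):
        # The flag flip-flop is driven by the system clock on every cycle.
        self.flag_ff = val
        if val == 1:
            # The clock enabler produces a pulse, so the data is latched
            # and the processor element computes a result.
            self.data_ff = data
            self.active_cycles += 1
            return self.flag_ff, self.op(self.data_ff)
        # No pulse: nothing is latched and no result is produced.
        return self.flag_ff, None


def divide(d):
    """The s/q and t/q step performed in operation block 200."""
    return (d["s"] / d["q"], d["t"] / d["q"])
```

Feeding eight such sub-blocks the val pattern 0, 0, 0, 1, 0, 0, 1, 1 activates only the fourth, seventh, and eighth, while all eight still pass their val bit on to the next stage.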
【０１９６】
〔演算ブロック２０１〕
演算ブロック２０１は、演算サブブロック２０１₁〜２０１₈を有し、演算ブロック２００から入力した除算結果Ｓ２００₁〜Ｓ２００₈が示す「ｓ／ｑ」および「ｔ／ｑ」に、それぞれテクスチャサイズＵＳＩＺＥおよびＶＳＩＺＥを乗じて、テクスチャ座標データ（ｕ，ｖ）を生成する。
演算サブブロック２０１₁〜２０１₈は、それぞれクロックイネーブラ２１１₁〜２１１₈によりｖａｌデータＳ２２０₁〜Ｓ２２０₈のレベル検出を行った結果、当該レベルが「１」の場合にのみ演算を行い、それぞれ演算結果であるテクスチャ座標データＳ２０１₁〜Ｓ２０１₈を、演算ブロック２０２の演算サブブロック２０２₁〜２０２₈に出力する。
【０１９７】
〔演算ブロック２０２〕
演算ブロック２０２は、演算サブブロック２０２₁〜２０２₈を有し、メモリＩ／Ｆ回路１３を介して、ＳＲＡＭ１７あるいはＤＲＡＭ１６に、演算ブロック２０１で生成したテクスチャ座標データ（ｕ，ｖ）を含む読み出し要求を出力し、メモリＩ／Ｆ回路１３を介して、ＳＲＡＭ１７あるいはテクスチャバッファ２０に記憶されているテクスチャデータを読み出すことで、（ｕ，ｖ）データに対応したテクスチャアドレスに記憶された（Ｒ，Ｇ，Ｂ，α）データＳ１７を得る。
なお、テクスチャバッファ２０には、ＭＩＰＭＡＰ（複数解像度テクスチャ）などの複数の縮小率に対応したテクスチャデータが記憶されている。ここで、何れの縮小率のテクスチャデータを用いるかは、所定のアルゴリズムを用いて、前記三角形を単位として決定される。
また、ＳＲＡＭ１７には、テクスチャバッファ２０に記憶されているテクスチャデータのコピーが記憶されている。
演算サブブロック２０２₁〜２０２₈は、それぞれクロックイネーブラ２１２₁〜２１２₈によりｖａｌデータＳ２２０₁〜Ｓ２２０₈のレベル検出を行った結果、当該レベルが「１」の場合にのみ読み出し処理を行い、それぞれ読み出した（Ｒ，Ｇ，Ｂ，α）データＳ１７を、（Ｒ，Ｇ，Ｂ，α）データＳ２０２₁〜Ｓ２０２₈として、それぞれ演算ブロック２０３の演算サブブロック２０３₁〜２０３₈に出力する。
【０１９８】
なお、テクスチャエンジン回路１２は、フルカラー方式の場合には、テクスチャバッファ２０から読み出した（Ｒ，Ｇ，Ｂ，α）データを直接用いる。一方、テクスチャエンジン回路１２は、インデックスカラー方式の場合には、予め作成したカラールックアップテーブル（ＣＬＵＴ）をテクスチャＣＬＵＴバッファ２３から読み出して、内蔵するＳＲＡＭに転送および記憶し、このカラールックアップテーブルを用いて、テクスチャバッファ２０から読み出したカラーインデックスに対応する（Ｒ，Ｇ，Ｂ）データを得る。
【０１９９】
〔演算ブロック２０３〕
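The operation block 203 has operation sub-blocks 203₁ to 203₈ and, using the LIP circuits 52, 53, and 54 and the LIP/texture function circuit 55 shown in FIG. 4, selectively performs, for example, the level interpolation processing of the MIPMAP processing and one of the texture function processes, such as modulation, decal, highlight, fogging, texture blending, and alpha blending.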
【０２００】
そして、演算ブロック２０３は、処理結果である（Ｒ，Ｇ，Ｂ，α）データＳ２０３₁〜Ｓ２０３₈を、演算ブロック２０４に出力する。
演算サブブロック２０３₁〜２０３₈は、それぞれクロックイネーブラ２１３₁〜２１３₈によりｖａｌデータＳ２２０₁〜Ｓ２２０₈のレベル検出を行った結果、当該レベルが「１」の場合にのみ混合テクスチャファンクション処理および（Ｒ，Ｇ，Ｂ，α）データＳ２０３₁〜Ｓ２０３₈の出力を行う。
【０２０１】
〔演算ブロック２０４〕
演算ブロック２０４は、演算サブブロック２０４₁〜２０４₈を有し、入力した（Ｒ，Ｇ，Ｂ，α）データＳ２０３₁〜Ｓ２０３₈について、ｚバッファ２２に記憶されたｚデータの内容を用いて、ｚ比較を行い、（Ｒ，Ｇ，Ｂ，α）データＳ２０３₁〜Ｓ２０３₈によって描画する画像が、前回、ディスプレイバッファ２１に描画した値よりも手前（視点側）に位置する場合には、ｚバッファ２２を更新すると共に、（Ｒ，Ｇ，Ｂ，α）データＳ２０３₁〜Ｓ２０３₈を、（Ｒ，Ｇ，Ｂ，α）データＳ２０４₁〜Ｓ２０４₈として、ディスプレイバッファ２１に書き込む。
演算サブブロック２０４₁〜２０４₈は、それぞれクロックイネーブラ２１４₁〜２１４₈によりｖａｌデータＳ２２０₁〜Ｓ２２０₈のレベル検出を行った結果、当該レベルが「１」の場合にのみ上述したｚ比較およびディスプレイバッファ２１への書き込みを行う。
なお、メモリＩ／Ｆ回路１３によるＤＲＡＭ１６に対してのアクセスは、１６画素について同時に行なわれる。
【０２０２】
以下、３次元コンピュータグラフィックシステム５０１の全体動作について説明する。
ポリゴンレンダリングデータＳ４が、メインバス６を介してメインプロセッサ４からＤＤＡセットアップ回路１０に出力され、ＤＤＡセットアップ回路１０において、三角形の辺と水平方向の差分などを示す変分データＳ１０が生成される。
この変分データＳ１０は、トライアングルＤＤＡ回路１１に出力され、トライアングルＤＤＡ回路１１において、三角形内部の各画素における線形補間された（ｚ，Ｒ，Ｇ，Ｂ，α，ｓ，ｔ，ｑ）データが算出される。そして、この算出された（ｚ，Ｒ，Ｇ，Ｂ，α，ｓ，ｔ，ｑ）データと、三角形の各頂点の（ｘ，ｙ）データとが、ＤＤＡデータＳ１１として、トライアングルＤＤＡ回路１１からテクスチャエンジン回路１２に出力される。
【０２０３】
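Next, in the texture engine circuit 12 and the memory I/F circuit 13, the DDA data S11 is used to perform the calculation of the reduction ratio LOD, the calculation of "s/q" and "t/q", the calculation of the texture coordinate data (u, v), the reading of the (R, G, B, α) data as digital data from the texture buffer 20, the texture function processing, and the z comparison, in order and in pipeline fashion, in the operation blocks 199, 200, 201, 202, 203, and 204 shown in FIG. 20.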
【０２０４】
次に、図１に示すテクスチャエンジン回路１２およびメモリＩ／Ｆ回路１３のパイプライン処理の動作について説明する。
ここでは、例えば、図７に示す矩形２５１内の８画素について同時処理する場合を考える。この場合には、ｖａｌデータＳ２２０₁，Ｓ２２０₂，Ｓ２２０₃，Ｓ２２０₅，Ｓ２２０₆が「０」を示し、ｖａｌデータＳ２２０₄，Ｓ２２０₇，Ｓ２２０₈が「１」を示している。
【０２０５】
そして、ｖａｌデータＳ２２０₁〜Ｓ２２０₈および被演算データＳ２２１₁〜Ｓ２２１₈が、演算ブロック１９９に入力され、演算ブロック１９９において、Ｉ₇の画素を代表点として縮小率ＬＯＤが算出され、当該算出された縮小率ＬＯＤ、ｖａｌデータＳ２２０₁〜Ｓ２２０₈および被演算データＳ２２１₁〜Ｓ２２１₈が演算ブロック２００に出力される。
【０２０６】
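Next, the val data S220₁ to S220₈ and the operand data S221₁ to S221₈ are input to the clock enablers 210₁ to 210₈ of the corresponding operation sub-blocks 200₁ to 200₈.
The clock enablers 210₁ to 210₈ detect the levels of the val data S220₁ to S220₈, respectively. Specifically, "1" is detected by the clock enablers 210₄, 210₇, and 210₈, and "0" is detected by the clock enablers 210₁, 210₂, 210₃, 210₅, and 210₆.
As a result, "s/q" and "t/q" are calculated only in the operation sub-blocks 200₄, 200₇, and 200₈, using the operand data S221₄, S221₇, and S221₈, and the division results S200₄, S200₇, and S200₈ are output to the operation sub-blocks 201₄, 201₇, and 201₈ of the operation block 201.
No division is performed in the operation sub-blocks 200₁, 200₂, 200₃, 200₅, and 200₆.
In synchronization with the output of the division results S200₄, S200₇, and S200₈, the val data S220₁ to S220₈ are output to the operation sub-blocks 201₁ to 201₈ of the operation block 201.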
【０２０７】
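Next, the clock enablers 211₁ to 211₈ of the operation sub-blocks 201₁ to 201₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection results, only the operation sub-blocks 201₄, 201₇, and 201₈ multiply "s/q" and "t/q", indicated by the division results S200₄, S200₇, and S200₈, by the texture sizes USIZE and VSIZE, respectively, to generate the texture coordinate data S201₄, S201₇, and S201₈, which are output to the operation sub-blocks 202₄, 202₇, and 202₈ of the operation block 202.
No calculation is performed in the operation sub-blocks 201₁, 201₂, 201₃, 201₅, and 201₆.
In synchronization with the output of the texture coordinate data S201₄, S201₇, and S201₈, the val data S220₁ to S220₈ are output to the operation sub-blocks 202₁ to 202₈ of the operation block 202.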
【０２０８】
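Next, the clock enablers 212₁ to 212₈ of the operation sub-blocks 202₁ to 202₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection results, only the operation sub-blocks 202₄, 202₇, and 202₈ read the texture data stored in the SRAM 17 or the texture buffer 20, obtaining the (R, G, B, α) data stored at the texture addresses corresponding to the (u, v) data. The (R, G, B, α) data S202₄, S202₇, and S202₈ thus read are output to the operation sub-blocks 203₄, 203₇, and 203₈ of the operation block 203.
No read processing is performed in the operation sub-blocks 202₁, 202₂, 202₃, 202₅, and 202₆.
In synchronization with the output of the (R, G, B, α) data S202₄, S202₇, and S202₈, the val data S220₁ to S220₈ are output to the operation sub-blocks 203₁ to 203₈ of the operation block 203.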
【０２０９】
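Next, the clock enablers 213₁ to 213₈ of the operation sub-blocks 203₁ to 203₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection results, the texture function processing is performed only in the operation sub-blocks 203₄, 203₇, and 203₈, and the resulting (R, G, B, α) data S203₄, S203₇, and S203₈ are output to the operation block 204.
No texture function processing is performed in the operation sub-blocks 203₁, 203₂, 203₃, 203₅, and 203₆.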
【０２１０】
次に、演算サブブロック２０４₁〜２０４₈のクロックイネーブラ２１４₁〜２１４₈において、それぞれｖａｌデータＳ２２０₁〜Ｓ２２０₈のレベルが検出される。
そして、この検出結果に基づいて、演算サブブロック２０４₄，２０４₇，２０４₈においてのみ、（Ｒ，Ｇ，Ｂ，α）データＳ２０３₄，Ｓ２０３₇，Ｓ２０３₈について、ｚバッファ２２に記憶されたｚデータの内容を用いて、ｚ比較が行なわれ、（Ｒ，Ｇ，Ｂ，α）データＳ２０３₄，Ｓ２０３₇，Ｓ２０３₈によって描画する画像が、前回、ディスプレイバッファ２１に描画した値よりも手前に位置する場合には、ｚバッファ２２が更新されると共に、（Ｒ，Ｇ，Ｂ，α）データＳ２０３₄，Ｓ２０３₇，Ｓ２０３₈がディスプレイバッファ２１に書き込まれる。
【０２１１】
すなわち、テクスチャエンジン回路１２およびメモリＩ／Ｆ回路１３では、図６に示す矩形２５１の画素について同時に処理を行なう場合に、三角形２５０の外に位置する画素についての処理は行なわない。すなわち、図６に示す矩形２５１内の画素についての演算を行なっている間は、演算サブブロック２００₁，２００₂，２００₃，２００₅，２００₆，２０１₁，２０１₂，２０１₃，２０１₅，２０１₆，２０２₁，２０２₂，２０２₃，２０２₅，２０２₆，２０４₁，２０４₂，２０４₃，２０４₅，２０４₆は停止した状態になり、これらの演算サブブロックは電力を消費しない。
【０２１２】
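As described above, the three-dimensional computer graphic system 501 has the following effects in addition to those of the three-dimensional computer graphic system 1 of the first embodiment.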
すなわち、３次元コンピュータグラフィックシステム５０１によれば、テクスチャエンジン回路１２におけるパイプライン処理において、同時処理する８画素のうち、処理対象となる三角形の外部に位置する画素についての演算は行なわないようにすることができる。
そのため、テクスチャエンジン回路１２における消費電力を大幅に低減できる。その結果、３次元コンピュータグラフィックシステム５０１の電源として、簡単かつ安価なものを用いることができる。
なお、テクスチャエンジン回路１２は、図２０および図２１に示すように、各演算サブブロックに、クロックイネーブラおよび１ビットのフラグ用フリップフロップを組み込むことで、上述した機能を実現するが、クロックイネーブラおよび１ビットのフラグ用フリップフロップの回路規模は小さいため、テクスチャエンジン回路１２の回路規模が大幅に増大することはない。
【０２１３】
本発明は上述した実施形態には限定されない。
上述した実施形態では、同時に処理を行おうとする複数の画素データについて共通の縮小率を用いて記憶回路からテクスチャデータを読み出す場合を例示したが、前記複数の画素データについての個別に縮小率を算出する複数の縮小率算出回路を設けてもよい。この場合に、読み出し回路は、同時に処理を行おうとする複数の画素データにそれぞれ対応した複数設けられ、記憶回路、複数の縮小率算出回路および複数の読み出し回路が一つの半導体チップに混載される。
また、読み出し回路によって読み出したテクスチャデータを用いて同時に処理を行って複数の表示データを生成する複数の画像処理回路と、当該生成した表示データをＤＲＡＭなどの記憶回路に書き込む複数の書き込み回路とをさらに設け、記憶回路、複数の縮小率算出回路、複数の読み出し回路、複数の画像処理回路および複数の書き込み回路を一つの半導体チップに混載してもよい。
【０２１４】
【発明の効果】
以上説明したように、本発明の画像処理装置によれば、小規模な装置構成で、高画質を安定して提供できる。
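[Brief description of the drawings]
[FIG. 1] FIG. 1 is a configuration diagram of a three-dimensional computer graphic system according to an embodiment of the present invention.
[FIG. 2] FIG. 2 is a diagram for explaining how the valid bit data is generated in the DDA setup circuit shown in FIG. 1.
[FIG. 3] FIG. 3 is a diagram for explaining the texture data used for MIPMAP processing that is stored in the SRAM and the texture buffer shown in FIG. 1.
[FIG. 4] FIG. 4 is a configuration diagram of the texture engine circuit shown in FIG. 1.
[FIG. 5] FIG. 5 is a flowchart of the processing in the representative point determination circuit shown in FIG. 4.
[FIG. 6] FIG. 6 is a diagram for explaining the processing in the representative point determination circuit shown in FIG. 4.
[FIG. 7] FIG. 7 is a diagram for explaining a specific example of the representative point when the triangle shown in FIG. 2 is being processed.
[FIG. 8] FIG. 8 is a configuration diagram of the reduction ratio calculation circuit shown in FIG. 4.
[FIG. 9] FIG. 9 is a diagram for explaining the processing of the reduction ratio calculation circuit shown in FIG. 8.
[FIG. 10] FIG. 10 is a configuration diagram of the LIP/texture function circuit shown in FIG. 4.
[FIG. 11] FIG. 11 is a diagram for explaining the data input to the LIP circuit in each mode.
[FIG. 12] FIG. 12 is a diagram for explaining the timing at which mipmap data is input from the LIP circuit to the LIP/texture function circuit and the timing at which level interpolation is executed.
[FIG. 13] FIG. 13 is a diagram for explaining the processing of the LIP circuit shown in FIG. 4.
[FIG. 14] FIG. 14 is a diagram for explaining the processing of the LIP circuit shown in FIG. 4.
[FIG. 15] FIG. 15 is a partial configuration diagram of the preceding-stage adder circuit shown in FIG. 13.
[FIG. 16] FIG. 16 is a diagram for explaining how data is stored in the DRAM shown in FIG. 1.
[FIG. 17] FIG. 17 is a diagram for explaining a suitable configuration, arrangement, and wiring method for the logic circuit of the rendering circuit shown in FIG. 1, the DRAM, and the secondary memory.
[FIG. 18] FIG. 18 is a diagram for explaining the configuration of the DRAM module shown in FIG. 17.
[FIG. 19] FIG. 19 is a diagram for explaining the format of the DDA data output from the triangle DDA circuit shown in FIG. 1 in the three-dimensional computer graphic system of the second embodiment of the present invention.
[FIG. 20] FIG. 20 is a partial configuration diagram of the texture engine circuit and the memory I/F circuit in the three-dimensional computer graphic system of the second embodiment of the present invention.
[FIG. 21] FIG. 21 is a configuration diagram of the operation sub-block shown in FIG. 20.
[FIG. 22] FIG. 22 is a diagram for explaining MIPMAP filtering.
[FIG. 23] FIG. 23 is a diagram for explaining a conventional general texture mapping apparatus.
[FIG. 24] FIG. 24 is a flowchart of the processing in the texture mapping apparatus shown in FIG. 23.
[FIG. 25] FIG. 25 is a diagram for explaining a texture mapping apparatus that achieves high-speed processing.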
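[Explanation of symbols]
1: three-dimensional computer graphic system, 2: main memory, 3: I/O interface circuit, 4: main processor, 5: rendering circuit, 10: DDA setup circuit, 11: triangle DDA circuit, 12: texture engine circuit, 13: memory I/F circuit, 14: CRT controller circuit, 15: RAMDAC circuit, 16: DRAM, 17: SRAM, 20: texture buffer, 21: display buffer, 22: Z buffer, 23: texture CLUT buffer, 301: representative point determination circuit, 302: stq selection circuit, 50: reduction ratio calculation circuit, 51: texture data readout circuit, 55: LIP/texture function circuit
[0001]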
BACKGROUND OF THE INVENTION
The present invention relates to an image processing apparatus capable of providing a high-quality image with a small configuration.
[0002]
[Prior art]
Computer graphics are often used in various CAD (Computer Aided Design) systems and amusement machines. In particular, with the recent development of image processing technology, systems using three-dimensional computer graphics are rapidly spreading.
In such three-dimensional computer graphics, determining the color of each pixel involves rendering: the color value of each pixel is calculated, and the calculated value is written to the address of the display buffer (frame buffer) corresponding to that pixel.
One of the rendering processing methods is polygon rendering. In this method, a three-dimensional model is expressed as a combination of triangular unit graphics (polygons), and the color of the display screen is determined by drawing with the polygon as a unit.
[0003]
In polygon rendering, the coordinates (x, y, z), the color data (R, G, B), the homogeneous coordinates (s, t) of the texture data indicating the image pattern to be pasted, and the value of the homogeneous term q, given for each vertex of a triangle in the physical coordinate system, are interpolated inside the triangle.
Here, the homogeneous term q is, put simply, something like an enlargement/reduction ratio. The coordinates in the UV coordinate system of the actual texture buffer, that is, the texture coordinate data (U, V), are obtained by dividing the homogeneous coordinates (s, t) by the homogeneous term q to give (s/q, t/q) = (u, v), and then multiplying u and v by the texture sizes USIZE and VSIZE, respectively.
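The conversion from the interpolated (s, t, q) values to texture buffer coordinates can be written as a short function. This is a minimal sketch of the arithmetic just described; the function name is hypothetical, and any rounding or wrapping of the result to a valid texel address is left out because the text does not describe it at this point.

```python
def texture_coords(s, t, q, usize, vsize):
    """Return (U, V) = (s/q * USIZE, t/q * VSIZE) for one pixel."""
    u = s / q   # perspective divide of the homogeneous coordinates
    v = t / q
    return u * usize, v * vsize
```

For example, `texture_coords(1.0, 0.5, 2.0, 256, 128)` divides first to get (0.5, 0.25) and then scales by the texture size.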
[0004]
In such a three-dimensional computer graphic system using polygon rendering, when drawing is performed, texture data is read from the texture buffer, and the read texture data is pasted on the surface of the three-dimensional model to obtain highly realistic image data. Perform texture mapping.
When texture mapping is performed on a three-dimensional model, the enlargement/reduction ratio of the image indicated by the texture data to be pasted changes from pixel to pixel.
[0005]
By the way, there is MIP (Multum In Parvo) MAP (Multiple Resolution Texture) filtering as a technique for obtaining high image quality when performing texture mapping.
In this MIPMAP filtering, as shown in FIG. 22, a plurality of filtered texture data 200, 201, 202, 203 corresponding to a plurality of different reduction ratios are prepared in advance, and the texture data corresponding to the reduction ratio 204 of each pixel is selected (205), so that the optimum texture data 206 for the reduction ratio 204 is used. This suppresses the aliasing caused by the loss of information when an image is reduced.
[0006]
Hereinafter, a conventional three-dimensional computer graphic system employing the above-described MIPMAP filtering will be described.
FIG. 23 is a diagram for explaining the configuration of a conventional three-dimensional computer graphic system, and FIG. 24 is a flowchart of processing in the texture mapping apparatus 210 shown in FIG.
As shown in FIG. 23, in a conventional three-dimensional computer graphic system, a texture mapping device 210, a texture buffer 211, and a display buffer 213 incorporated in different semiconductor chips are connected to each other via wiring.
[0007]
Hereinafter, processing in the texture mapping apparatus 210 will be described.
Step S1: First, the texture mapping device 210 receives as input the (s1, t1, q1), (s2, t2, q2), (s3, t3, q3) data indicating the homogeneous coordinates and homogeneous terms of the vertices of a triangle.
Step S2: Next, the texture mapping device 210 linearly interpolates the input (s1, t1, q1), (s2, t2, q2), (s3, t3, q3) data of the vertices to obtain (s, t, q) data indicating the homogeneous coordinates and homogeneous term of each pixel.
[0008]
Step S3: The texture mapping device 210 obtains the reduction rate lod of each pixel from the (s, t, q) data of each pixel inside the triangle in the built-in reduction rate calculation device 212.
Step S4: The texture mapping apparatus 210 calculates u data obtained by dividing s data by q data and v data obtained by dividing t data by q data for (s, t, q) data of each pixel, and texture Coordinate data (u, v) is obtained.
Next, the texture mapping device 210 obtains a texture address (U, V) that is a physical address in the texture buffer 211 from the reduction rate lod calculated by the reduction rate calculation device 212 and the texture coordinate data (u, v). .
[0009]
Step S5: The texture mapping device 210 outputs the texture address (U, V) to the texture buffer 211 and reads the texture data (R, G, B).
Step S6: The texture mapping apparatus 210 writes pixel data S210 obtained by performing predetermined processing on the read texture data in step S5 to the display buffer 213.
Thereby, access to the texture data corresponding to the reduction rate lod among the plurality of texture data stored in the texture buffer 211 and corresponding to each of a plurality of different reduction rates is realized.
[0010]
In order to realize high-speed drawing, there is a high-speed texture mapping apparatus that performs texture mapping processing on a plurality of pixels in parallel and simultaneously writes the pixel data in a display buffer.
In such a high-speed texture mapping apparatus, as shown in FIG. 25, the (s1, t1, q1), (s2, t2, q2), (s3, t3, q3) data for the vertices of a triangle are processed in parallel by n texture mapping devices 210₁ to 210_n, and the resulting pixel data S210₁ to S210_n are written to the display buffer simultaneously.
That is, texture mapping processing for a plurality of pixels is performed in parallel (simultaneously).
[0011]
Note that texture mapping is performed in units of triangles, which are the unit graphics, and processing conditions such as the reduction rate of the texture data are determined per triangle. Among the pixels processed simultaneously, only the results for pixels located inside the triangle are valid; the results for pixels located outside the triangle are invalid.
[0012]
[Problems to be solved by the invention]
However, in the above-described conventional three-dimensional computer graphic system, the data transfer speed between the texture mapping apparatus 210, the texture buffer 211, and the display buffer 213 is a bottleneck that limits the processing capacity of the system as a whole.
[0013]
Further, in the conventional three-dimensional computer graphic system described above, the texture mapping device 210, the texture buffer 211, and the display buffer 213 are incorporated in different semiconductor chips.
[0014]
In addition, the calculation for obtaining the reduction ratio lod includes a large number of multiplications and divisions and requires a huge amount of calculation.
Accordingly, as shown in FIG. 25, when each of the n texture mapping devices 210₁ to 210_n incorporates its own reduction rate calculation device 212₁ to 212_n, the scale of the apparatus becomes large.
To solve this problem, one could incorporate a reduction rate calculation device in only one of the texture mapping devices that operate in parallel, treat the pixel processed by that device as the representative point for obtaining the reduction rate, and use the reduction rate obtained by its calculation device in all of the texture mapping devices.
In this case, the position of the representative-point pixel among the pixels processed simultaneously is fixed.
Consequently, the representative point may be a pixel located outside the triangle that is the unit graphic.
However, the reduction rate can differ greatly between the inside and the outside of a triangle, so when a pixel outside the triangle being processed is the representative point, the optimum texture data cannot be selected for the pixels inside the triangle. As a result, image quality is greatly reduced.
[0015]
The present invention has been made in view of the above-described problems of the prior art, and an object thereof is to provide an image processing apparatus that can stably provide high image quality with a small-scale apparatus configuration.
[0016]
[Means for Solving the Problems]
  According to the present invention, in an image processing apparatus that represents a display model by combining unit graphics, each composed of a plurality of pixels to which common processing conditions are applied, and that generates pixel data corresponding to the pixels using texture data as necessary,
  The display model is a three-dimensional model;
  The unit figure is a triangle;
  A storage circuit for storing display data and a plurality of texture data corresponding to different reduction ratios for the same pattern;
  A reduction ratio calculation circuit for calculating a reduction ratio commonly used for a plurality of pixel data to be processed simultaneously;
  A readout circuit for reading out the texture data corresponding to the calculated reduction ratio from the storage circuit;
  An image processing circuit that generates display data by simultaneously processing the plurality of pixel data using the read texture data;
  A writing circuit for writing the generated display data into the storage circuit;
  A representative point determination circuit for determining, as a representative point, a pixel located inside the unit graphic being processed from among the pixels corresponding to the plurality of pixel data to be processed simultaneously;
  Have
  The reduction rate calculation circuit substantially calculates LOD indicating the reduction rate based on the following formula,
    LOD = Clamp(((log₂(1/q) + maxe) << L) + K)
      where
      LOD is composed of an integer part and a decimal part, is unsigned, and is a symbol indicating the reduction rate,
      Clamp is a symbol indicating clamping in the clamp circuit described below,
      q is a symbol indicating the homogeneous term,
      maxe is data consisting only of an integer part indicating the maximum coordinate of the homogeneous coordinates (s, t) and the homogeneous term q of the vertex of the unit graphic to be processed;
      << L indicates that data is shifted by L bits in the shift circuits described below,
      K is composed of an integer part and a decimal part, is signed, and is data used for addition in the addition circuit described below,
  A normalization circuit that normalizes the homogeneous term data q to generate an exponent qe and a mantissa qm;
  A first shift circuit that shifts data obtained by bit-combining the exponent qe and the mantissa qm toward a MSB (Most Significant Bit) by a value indicated by the data L;
  A first inverting circuit for inverting the output of the first shift circuit;
  Data output means that receives the mantissa qm and outputs data μ indicating “log₂({1, qm}) − qm”;
  A second shift circuit that shifts data obtained by bit-combining the data maxe and the data μ toward the MSB by a value indicated by the data L;
  A second inverting circuit for inverting the output of the second shift circuit;
  An addition circuit for adding the data obtained by bit-combining the data K and the binary number “10”, the output of the first inversion circuit, and the output of the second inversion circuit;
  A clamp circuit for clamping the output of the adder circuit within a predetermined bit to generate the reduction ratio LOD;
  Have
  The readout circuit reads out, from the storage circuit, the texture data specified by the determined reduction ratio, the homogeneous coordinates (s, t), and the homogeneous term q for each of the plurality of pixel data to be processed simultaneously,
  An image processing apparatus is provided.
[0038]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, this embodiment describes a case where the image processing apparatus of the present invention is applied to a three-dimensional computer graphic system, used in home game machines and the like, that displays a desired three-dimensional image of an arbitrary three-dimensional object model on a display such as a CRT (Cathode Ray Tube) at high speed.
FIG. 1 is a system configuration diagram of a three-dimensional computer graphic system 1 of the present embodiment.
The three-dimensional computer graphic system 1 represents a three-dimensional model as a combination of triangles (polygons) that are unit figures, determines the color of each pixel on the display screen by drawing these polygons, and performs polygon rendering processing for display on the display.
Further, in the three-dimensional computer graphic system 1, a three-dimensional object is represented using the z coordinate representing depth in addition to the (x, y) coordinates representing the position on the plane, and an arbitrary point in the three-dimensional space is specified by these three coordinates (x, y, z).
[0039]
As shown in FIG. 1, a three-dimensional computer graphic system 1 includes a main memory 2, an I / O interface circuit 3, a main processor 4, and a rendering circuit 5 connected via a main bus 6.
Here, the rendering circuit 5 corresponds to the image processing apparatus of the present invention.
Hereinafter, the function of each component will be described.
The main processor 4 reads out necessary graphic data from the main memory 2 according to, for example, the progress of the game, and performs clipping processing, lighting processing, geometry processing, and the like on the graphic data to generate polygon rendering data. The main processor 4 outputs the polygon rendering data S4 to the rendering circuit 5 via the main bus 6.
The I / O interface circuit 3 inputs polygon rendering data from the outside as required, and outputs it to the rendering circuit 5 via the main bus 6.
[0040]
Here, the polygon rendering data consists of (x, y, z, R, G, B, COE_blend, s, t, q, COE_fog) data for each of the three vertices of the polygon.
Here, the (x, y, z) data indicates the three-dimensional coordinates of a vertex of the polygon, and the (R, G, B) data indicates the red, green, and blue luminance values at those three-dimensional coordinates.
The data COE_blend indicates the blend coefficient between the R, G, B data of the pixel about to be drawn and that of the pixel already stored in the display buffer 21.
Of the (s, t, q) data, (s, t) indicates the homogeneous coordinates of the corresponding texture, and q indicates the homogeneous term. Here, “s / q” and “t / q” are multiplied by the texture sizes USIZE and VSIZE, respectively, to obtain texture coordinate data (u, v). Access to the texture data stored in the texture buffer 20 is performed using the texture coordinate data (u, v).
The data COE_fog indicates a mixing coefficient used in the fogging process.
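The conversion from (s, t, q) to texture coordinate data (u, v) described above can be sketched as follows. This is only an illustrative floating-point model; the function name `texture_coords` and the sample values are assumptions for illustration and do not appear in the specification.

```python
def texture_coords(s, t, q, usize, vsize):
    """Map homogeneous texture coordinates (s, t) and homogeneous term q to (u, v)."""
    if q == 0:
        raise ValueError("homogeneous term q must be non-zero")
    # "s / q" and "t / q" are scaled by the texture sizes USIZE and VSIZE.
    u = (s / q) * usize
    v = (t / q) * vsize
    return u, v


# Hypothetical sample values: a 256 x 128 texture.
print(texture_coords(0.5, 0.25, 2.0, 256, 128))  # (64.0, 16.0)
```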
[0041]
Hereinafter, the rendering circuit 5 will be described in detail.
As shown in FIG. 1, the rendering circuit 5 includes a DDA (Digital Differential Analyzer) setup circuit 10, a triangle DDA circuit 11, a texture engine circuit 12, a memory I / F circuit 13, a CRT controller circuit 14, a RAMDAC circuit 15, a DRAM 16, an SRAM 17, and a clock signal generation circuit 18, and these are mounted together in one semiconductor chip.
Here, the texture engine circuit 12 and the DRAM 16 constitute the image processing apparatus of the present invention. The DRAM 16 corresponds to the storage circuit of the present invention. In the rendering circuit 5, as described above, embedding each component in one semiconductor chip makes it possible to achieve high performance by speeding up data transfer between the components and to reduce the circuit scale.
[0042]
The DRAM 16 functions as a texture buffer 20, a display buffer 21, a z buffer 22, and a texture CLUT buffer 23.
The clock signal S18 from the clock signal generation circuit 18 is used as a signal for driving each component in the rendering circuit 5.
[0043]
DRAM 16
The DRAM 16 functions as a texture buffer 20 that stores texture data, a display buffer 21 that stores display data to be output to the CRT and displayed on the display, a z buffer 22 that stores z data, and a texture CLUT buffer 23 that stores color lookup data.
[0044]
DDA setup circuit 10
The triangle DDA circuit 11 at the subsequent stage obtains the color and depth information of each pixel inside the triangle by linearly interpolating the values at the vertices of the triangle in the physical coordinate system. Prior to this, the DDA setup circuit 10 performs a setup calculation on the (z, R, G, B, COE_blend, s, t, q, COE_fog) data indicated by the polygon rendering data S4 to obtain the differences along the sides of the triangle and in the horizontal direction.
Specifically, this setup calculation uses the start point value, the end point value, and the distance between the start point and the end point to calculate the variation of the value to be obtained per unit length of movement.
[0045]
That is, the DDA setup circuit 10 generates, for each pixel, dsdx, dtdx, and dqdx, which are the variations of the (s, t, q) data in the x direction, and dsdy, dtdy, and dqdy, which are the variations in the y direction.
The DDA setup circuit 10 outputs the calculated variation data S10 to the triangle DDA circuit 11.
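As a rough illustration of the setup calculation described above (start value, end value, and distance giving the variation per unit length), the following sketch computes the x-direction variations of (s, t, q) along a horizontal span. The function names and the sample span are hypothetical and only show the arithmetic, not the actual circuit.

```python
def variation_per_unit(start, end, distance):
    """Variation of a value when moving by one unit length."""
    if distance == 0:
        raise ValueError("distance between start and end must be non-zero")
    return (end - start) / distance


def setup_stq(stq_start, stq_end, distance):
    """Return (ds, dt, dq) per unit length between two points."""
    return tuple(
        variation_per_unit(a, b, distance) for a, b in zip(stq_start, stq_end)
    )


# Hypothetical span of 8 pixels in the x direction.
dsdx, dtdx, dqdx = setup_stq((0.0, 0.0, 1.0), (4.0, 2.0, 2.0), 8)
print(dsdx, dtdx, dqdx)  # 0.5 0.25 0.125
```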
[0046]
Triangle DDA circuit 11
The triangle DDA circuit 11 uses the variation data S10 input from the DDA setup circuit 10 to calculate the linearly interpolated (z, R, G, B, COE_blend, s, t, q, COE_fog) data at each pixel inside the triangle.
In addition, the triangle DDA circuit 11 generates 1-bit valid bit data I₁ to I₈, each indicating whether the corresponding one of the eight pixels processed in parallel is positioned inside the triangle to be processed.
The valid bit data I₁ to I₈ is, for example, “1” for a pixel located inside the triangle and “0” for a pixel located outside the triangle.
Specifically, as shown in FIG. 2, the valid bit data I₁ to I₈ is determined for a triangle 250 located in the x, y coordinate system.
In FIG. 2, a solid line indicates a rectangular area to which 8 (= 2 × 4) pixels to be processed simultaneously belong.
The triangle DDA circuit 11 outputs the (x, y) data of each pixel, the (z, R, G, B, COE_blend, s, t, q, COE_fog) data at those (x, y) coordinates, the valid bit data I₁ to I₈, and maxe data S11c indicating the maximum exponent of the s, t, and q data of the vertices of the triangle to be processed to the texture engine circuit 12 as DDA data S11.
Here, the (R, G, B) data S11b and the (s, t, q) data S11a₁ to S11a₈ shown in the figure are obtained from the (z, R, G, B, COE_blend, s, t, q, COE_fog) data.
In this embodiment, the triangle DDA circuit 11 outputs the DDA data S11 to the texture engine circuit 12 in units of 8 (= 2 × 4) pixels located in a rectangle that performs processing in parallel.
[0047]
Texture engine circuit 12
The texture engine circuit 12 sequentially performs, for example in a pipelined manner, selection of the texture data reduction ratio, calculation of “s / q” and “t / q”, calculation of the texture coordinate data (u, v), calculation of the texture address (U, V), reading of (R, G, B, tα) data from the texture buffer 20, MIPMAP processing, and texture function processing.
Note that the texture engine circuit 12 simultaneously performs processing for eight pixels located in a predetermined rectangular area in parallel.
The texture engine circuit 12 uses the same pattern of texture data for pixels located within the processing target triangle. However, the reduction ratio of the texture data to be selected is determined in units of 8 pixels located in the rectangular area to be processed simultaneously.
[0048]
The texture engine circuit 12 performs MIPMAP (multiple resolution texture) processing and texture function processing using the (R, G, B) data S17 read from the SRAM 17 or the texture buffer 20.
[0049]
In the MIPMAP processing, 4-point neighborhood interpolation processing, which calculates the (R, G, B) data of a pixel at a desired two-dimensional position from the (R, G, B) data S17, and level interpolation processing, which interpolates between levels of the reduction ratio LOD (Level Of Detail), are performed.
In the SRAM 17 and the texture buffer 20, for example, as shown in FIG. 3, texture data corresponding to a plurality of reduction ratios based on MIPMAP is stored, that is, the texture data 100 with a reduction ratio LOD of 1.0, the texture data 101 with a reduction ratio LOD of 2.0, and the texture data 102 with a reduction ratio LOD of 3.0.
Which reduction rate LOD texture data is to be used is determined using the reduction rate LOD calculated in units of polygons using a predetermined algorithm.
Note that the texture data 100, 101, and 102 are data indicating display patterns that have been subjected to filtering processing and suppressed the influence of aliasing due to information loss due to image reduction or the like.
[0050]
First, the 4-point neighborhood interpolation process of the MIPMAP process performed by the texture engine circuit 12 will be described.
In the 4-point neighborhood interpolation process, the (R, G, B) data of the four points nearest to the coordinates of the pixel to which the texture data is assigned are obtained from those coordinates.
For example, when the reduction ratio LOD is 1.0, the (R, G, B) data S17 of the texture data 100 shown in FIG. 3 is read from the SRAM 17 or the texture buffer 20 to the texture engine circuit 12.
Then, the 4-point neighborhood interpolation data C_pixel0, which is the (R, G, B) data at the position pixel0 shown in the figure, is obtained from the (R, G, B) data C_A0, C_B0, C_C0, C_D0 of the four neighboring points A0, B0, C0, D0 of the position pixel0, based on the following formulas (3) to (5).
At this time, the (R, G, B) data C_A0, C_B0, C_C0, C_D0 is obtained from the (R, G, B) data S17 of the texture data 100.
In the following formulas (3) to (5), a and b indicate the decimal part of the u coordinate and v coordinate of the position pixel0, respectively.
[0051]
[Equation 3]
C_AB0 = C_B0 × a + C_A0 × (1 − a)   ... (3)
[0052]
[Equation 4]
C_CD0 = C_D0 × a + C_C0 × (1 − a)   ... (4)
[0053]
[Equation 5]
C_pixel0 = C_CD0 × b + C_AB0 × (1 − b)   ... (5)
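Formulas (3) to (5) amount to two interpolations along u followed by one along v. A minimal sketch is given below; the function names and the sample (R, G, B) values are illustrative assumptions, and floating-point arithmetic is used in place of whatever fixed-point format the circuit uses.

```python
def lerp(c0, c1, w):
    """Per-channel c1 * w + c0 * (1 - w) for (R, G, B) tuples."""
    return tuple(y * w + x * (1.0 - w) for x, y in zip(c0, c1))


def four_point_interpolation(c_a, c_b, c_c, c_d, a, b):
    """Formulas (3) to (5): a and b are the decimal parts of u and v."""
    c_ab = lerp(c_a, c_b, a)      # formula (3)
    c_cd = lerp(c_c, c_d, a)      # formula (4)
    return lerp(c_ab, c_cd, b)    # formula (5)


# Hypothetical grey-scale texels at points A0, B0, C0, D0.
c_pixel0 = four_point_interpolation(
    (0, 0, 0), (100, 100, 100), (200, 200, 200), (300, 300, 300), 0.5, 0.5
)
print(c_pixel0)  # (150.0, 150.0, 150.0)
```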
[0054]
Next, the level interpolation processing of the reduction ratio will be described.
Here, a level interpolation process called tri-linear will be described as an example.
For example, when the reduction ratio LOD is 1.5, the texture engine circuit 12 calculates the 4-point neighborhood interpolation data C_pixel0 at the position pixel0 using the texture data 100 with the reduction ratio LOD of 1.0 as described above, and calculates the 4-point neighborhood interpolation data C_pixel1 at the position pixel1 on the texture data 101, which corresponds to the position pixel0 on the texture data 100, using the texture data 101 with the reduction ratio LOD of 2.0. Next, the 4-point neighborhood interpolation data C_pixel0 and C_pixel1 are linearly interpolated to calculate the level interpolation data C_pixel for the reduction ratio LOD of 1.5.
[0055]
That is, following the calculation of the 4-point neighborhood interpolation data C_pixel0 described above, the (R, G, B) data S17 of the texture data 101 shown in FIG. 3 is read out to the texture engine circuit 12.
Then, the texture engine circuit 12 obtains the 4-point neighborhood interpolation data C_pixel1, which is the (R, G, B) data at the position pixel1 shown in the figure, from the (R, G, B) data C_A1, C_B1, C_C1, C_D1 of the four neighboring points A1, B1, C1, D1 of the position pixel1, based on the following formulas (6) to (8).
At this time, the (R, G, B) data C_A1, C_B1, C_C1, C_D1 is obtained from the (R, G, B) data S17 of the texture data 101.
In the following formulas (6) to (8), c and d indicate the decimal part of the u and v coordinates of the position pixel1, respectively.
[0056]
[Equation 6]
C_AB1 = C_B1 × c + C_A1 × (1 − c)   ... (6)
[0057]
[Equation 7]
C_CD1 = C_D1 × c + C_C1 × (1 − c)   ... (7)
[0058]
[Equation 8]
C_pixel1 = C_CD1 × d + C_AB1 × (1 − d)   ... (8)
[0059]
Next, the texture engine circuit 12 performs level interpolation between the texture data 100 and 101 using the following equation (9) to obtain the level interpolation data C_pixel, which is the (R, G, B) data of the corresponding position (pixel) after the level interpolation. In the following equation (9), the mipmap coefficient COE_mipmap indicates the decimal part, 0.5, of the reduction ratio LOD.
[0060]
[Equation 9]
C_pixel = C_pixel1 × COE_mipmap + C_pixel0 × (1 − COE_mipmap)   ... (9)
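The tri-linear step can be sketched as splitting the reduction ratio LOD into the two neighboring levels and the mipmap coefficient, then applying equation (9). The helper names below are hypothetical, and the sample colors are made up for illustration.

```python
import math


def split_lod(lod):
    """Return (lower level, upper level, COE_mipmap) for a reduction ratio LOD."""
    level0 = math.floor(lod)
    coe_mipmap = lod - level0            # decimal part of the LOD
    level1 = level0 + 1 if coe_mipmap > 0 else level0
    return level0, level1, coe_mipmap


def level_interpolate(c_pixel0, c_pixel1, coe_mipmap):
    """Equation (9), applied per (R, G, B) channel."""
    return tuple(
        c1 * coe_mipmap + c0 * (1.0 - coe_mipmap)
        for c0, c1 in zip(c_pixel0, c_pixel1)
    )


level0, level1, coe = split_lod(1.5)
print(level0, level1, coe)                                      # 1 2 0.5
print(level_interpolate((100, 40, 0), (200, 80, 0), coe))       # (150.0, 60.0, 0.0)
```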
[0061]
Next, texture function processing performed by the texture engine circuit 12 will be described.
The texture function processing performed in the texture engine circuit 12 includes, for example, modulation processing, decal processing, highlight processing, fogging processing, and alpha blending processing.
Here, the modulation process is a process of modulating the color indicated by the fragment data with the color indicated by the texture data.
In the present embodiment, the fragment data is (R, G, B) data S11b included in the DDA data S11 input from the triangle DDA circuit 11.
The decal process is a process for replacing the color indicated by the fragment data with the color indicated by the texture data.
The highlight process is a process of adding the addition data Hi to the multiplication result in order to produce a highlight effect.
The fogging process is a process that produces an effect of blurring a distant object.
The alpha blending process is a process of mixing the color indicated by the source data and the color indicated by the destination data at a predetermined mixing ratio.
Here, the color indicated by the source data is the color indicated by the data stored in the display buffer 21 shown in FIG. 1, and the color indicated by the destination data is the color indicated by the data to be drawn in the display buffer 21.
[0062]
These texture function processes can be expressed by the following formulas (10) to (12), where C_tex is the texture data, C_flag is the fragment data, Hi is the addition data of the highlight processing, C_mod is the data after modulation processing, C_dcl is the data after decal processing, and C_hgh is the data after highlight processing.
In Expression (12), Hi indicates addition data for highlighting.
[0063]
[Equation 10]
C_mod = C_tex × C_flag   ... (10)
[0064]
[Equation 11]
C_dcl = C_tex   ... (11)
[0065]
[Equation 12]
C_hgh = C_tex × C_flag + Hi   ... (12)
[0066]
In addition, the fogging process and the alpha blending process are given by the following equations (13) and (14), where C_flag is the fragment data, C_fog is the fog data, COE_fog is the fog coefficient, C_src is the source (color) data, C_dst is the destination (color) data, COE_blend is the blending coefficient, C_fogged is the data after fogging, and C_blend is the data after blending.
[0067]
[Equation 13]
C_fogged = C_flag × COE_fog + C_fog × (1 − COE_fog)   ... (13)
[0068]
[Equation 14]
C_blend = C_src × COE_blend + C_dst × (1 − COE_blend)   ... (14)
[0069]
As described above, the level interpolation process of the MIPMAP process and the texture function process expressed by the expressions (9) to (14) can be expressed by the following expression (15) using the data A, B, COE, C, and D.
In the present embodiment, utilizing this, the LIP circuit 61 is shared by the level interpolation process and the texture function process, as will be described later.
[0070]
[Expression 15]
D = A × COE + B × (1 − COE)   ... (15)
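To illustrate why a single interpolator can serve several of these operations, the sketch below evaluates expression (15) with a single function and maps equations (9), (13), and (14) onto it. The function name `lip` and all numeric values are assumptions for illustration only.

```python
def lip(a, b, coe):
    """Expression (15): D = A x COE + B x (1 - COE)."""
    return a * coe + b * (1.0 - coe)


# Equation (9): A = C_pixel1, B = C_pixel0, COE = COE_mipmap
c_pixel = lip(200.0, 100.0, 0.5)

# Equation (13): A = C_flag, B = C_fog, COE = COE_fog
c_fogged = lip(80.0, 255.0, 0.25)

# Equation (14): A = C_src, B = C_dst, COE = COE_blend
c_blend = lip(64.0, 128.0, 0.75)

print(c_pixel, c_fogged, c_blend)  # 150.0 211.25 80.0
```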
[0071]
FIG. 4 is a partial circuit diagram of the texture engine circuit 12.
As shown in FIG. 4, the texture engine circuit 12 includes, for example, a reduction ratio calculation circuit 50, a readout circuit 51, LIP (Linear Interpolator) circuits 52, 53, and 54, and a LIP / texture function circuit 55.
Here, the reduction ratio calculation circuit 50 corresponds to the reduction ratio calculation circuit of the present invention, the readout circuit 51 corresponds to the readout circuit of the present invention, and the LIP circuits 52, 53, 54 and the LIP / texture function circuit 55 correspond to the image processing circuit of the present invention.
Each component in the texture engine circuit 12 operates based on the clock signal S18 from the clock signal generation circuit 18 shown in FIG.
The texture engine circuit 12 performs part or all of MIPMAP processing, modulation processing, decal processing, highlight processing, fogging processing, texture blending processing, and alpha blending processing, using the configuration shown in FIG. 4.
[0072]
Hereinafter, the components of the texture engine circuit 12 shown in FIG. 4 will be described in detail.
[Representative point determination circuit 301]
The representative point determination circuit 301 receives the valid bit data I₁ to I₈ included in the DDA data S11 input from the triangle DDA circuit 11, determines a pixel to be the representative point, and outputs representative point instruction data S301 indicating the determined representative point to the stq selection circuit 302.
Specifically, among the 8 pixels of 2 rows × 4 columns that are processed simultaneously, the representative point determination circuit 301 determines as the representative point the pixel that is located inside the triangle to be processed and is closest to the center of the rectangular area in which the 8 pixels are arranged.
[0073]
FIG. 5 is a flowchart of representative point determination processing in the representative point determination circuit 301.
Step S11: First, the representative point determination circuit 301 determines whether at least one of the valid bit data I₁ to I₈ indicates “1”, and if so, executes the process of step S12.
Step S12: The representative point determination circuit 301 determines whether exactly one of the valid bit data I₁ to I₈ indicates “1”. If so, the process of step S15 is executed; otherwise, the process of step S13 is executed. In step S15, the pixel corresponding to the valid bit data indicating “1” is determined as the representative point.
[0074]
Step S13: When two or more of the valid bit data I₁ to I₈ indicate “1”, the representative point determination circuit 301 determines as the representative point the pixel that, among the pixels corresponding to the valid bit data indicating “1”, is closest to the center of the rectangular area in which the pixels to be processed simultaneously are arranged.
At this time, if there are a plurality of pixels closest to the center of the rectangular area, it is determined whether or not their x coordinates are the same. If the x coordinates are different, the process of step S16 is executed. In step S16, the pixel having the smallest x coordinate among the plurality of pixels closest to the center of the rectangular area is determined as the representative point.
[0075]
Step S14: When there are a plurality of pixels closest to the center of the rectangular area and their x coordinates are the same, the representative point determination circuit 301 determines the pixel with the smallest y coordinate among the plurality of pixels as the representative point.
[0076]
Hereinafter, the determination of the representative point in the representative point determination circuit 301 will be described with a specific example.
FIG. 6 is a diagram for explaining representative point determination in the representative point determination circuit 301.
The pixels corresponding to the valid bit data I₁ to I₈ are arranged as shown in FIG. 6. Here, A is the center of the rectangular area of the pixels to be processed simultaneously.
For example, as shown in FIG. 6, when only the valid bit data I₄ is “1”, the representative point determination circuit 301 determines the pixel corresponding to the valid bit data I₄ as the representative point.
[0077]
As shown in FIG. 6C, when the valid bit data I₆ and I₇ are “1” and the x coordinates of the corresponding pixels are different, the pixel corresponding to the valid bit data I₆, which has the smaller x coordinate, is determined as the representative point.
As shown in FIG. 6D, when the valid bit data I₃ and I₇ are “1” and the x coordinates of the corresponding pixels are the same, the pixel corresponding to the valid bit data I₇, which has the smaller y coordinate, is determined as the representative point.
Further, as shown in FIG. 6E, when the valid bit data I₂, I₃, I₆, and I₇ are “1”, the pixel corresponding to the valid bit data I₆, which has the smallest x coordinate and y coordinate, is determined as the representative point.
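The selection rule of steps S11 to S16 can be sketched in software as below. The pixel layout is an assumption inferred from the examples above (I₁ to I₄ in one row, I₅ to I₈ in the row with the smaller y coordinate, x increasing with the index within a row); the actual arrangement is the one in FIG. 6. The function name is hypothetical.

```python
def representative_point(valid_bits):
    """Return the 1-based index k of the representative pixel I_k, or None.

    valid_bits: sequence of eight 0/1 values for I1..I8.
    Assumed layout: I1..I4 at y = 1, I5..I8 at y = 0, x = 0..3 within a row.
    """
    candidates = []
    for k, bit in enumerate(valid_bits, start=1):
        if not bit:
            continue
        x = (k - 1) % 4
        y = 1 if k <= 4 else 0
        # Squared distance to the centre (1.5, 0.5), scaled by 4 to stay integer.
        dist2 = (2 * x - 3) ** 2 + (2 * y - 1) ** 2
        candidates.append((dist2, x, y, k))
    if not candidates:
        return None  # no pixel of this block lies inside the triangle
    # Closest to the centre first, then smallest x, then smallest y.
    return min(candidates)[3]


print(representative_point([0, 0, 0, 1, 0, 0, 0, 0]))  # 4
print(representative_point([0, 0, 0, 0, 0, 1, 1, 0]))  # 6
print(representative_point([0, 0, 1, 0, 0, 0, 1, 0]))  # 7
print(representative_point([0, 1, 1, 0, 0, 1, 1, 0]))  # 6
```

Sorting on the tuple (distance, x, y) folds the separate branches of the flowchart into a single comparison, which is a software convenience rather than a description of the circuit.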
[0078]
Further, for the triangle 250 shown in FIG. 2, the representative points are determined in units of 8 pixels as shown in FIG. 7 based on the algorithm shown in FIG. In FIG. 7, a pixel in which “1” is circled is a representative point.
In this way, the representative point determination circuit 301 uses the valid bit data I₁ to I₈ to dynamically determine the representative point from among those pixels, of the plurality of pixels to be processed simultaneously, that are located inside the triangle to be processed, so that the representative point is reliably determined inside the triangle.
As a result, appropriate texture data can be reliably selected for the pixels located inside the triangle, and high image quality can be stably provided.
In this embodiment, simultaneous processing is performed for 8 pixels, but only one reduction ratio calculation circuit 304 is required, and the apparatus does not become large-scale.
[0079]
[q data selection circuit 302]
The q data selection circuit 302 selects, from the (s, t, q) data S11a₁ to S11a₈ for 8 pixels included in the DDA data S11, the q data corresponding to the pixel indicated by the representative point instruction data S301, and outputs it to the reduction ratio calculation circuit 304 as q data S302.
[0080]
[Reduction ratio calculation circuit 304]
The reduction ratio calculation circuit 304 calculates the texture data reduction ratio LOD based on the maxe data S11c from the triangle DDA circuit 11 and the q data S302 from the q selection circuit 302.
Here, the maxe data S11c indicates the maximum index among the indices of the s, t, and q data of the vertices of the triangle to be processed as shown in FIG.
As described above, the reduction ratio calculation circuit 304 calculates the reduction ratio using the q data of the pixel determined as the representative point by the representative point determination circuit 301 and the maxe data S11c, and outputs this to the readout circuit 51 as the reduction ratio LOD.
[0081]
Here, the reduction ratio LOD indicates how much the texture data of the original image is reduced. When the reduction ratio of the original image is 1/1, 1/2, 1 / 4, 1/8,...
The reduction rate LOD calculation processing in the reduction rate calculation circuit 304 is expressed by the following equation (16).
[0082]
[Equation 16]
LOD = Clamp(((log₂(1/q)) − maxe) << L + K)   ... (16)
[0083]
In the above formula (16),
LOD: Indicates reduction ratio, integer part 3 bits, decimal point part 4 bits, unsigned data,
maxe: indicates the maximum exponent of s, t, q of the vertices of the triangle shown in FIG. 2, integer part 8 bits, unsigned data,
q: integer part 10 bits, decimal part 5 bits, signed data,
L: 2 bits, unsigned data, the maximum value of L is decimal number “3”
K: integer part 8 bits, decimal part 4 bits, signed data
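As a reading aid for equation (16), the following floating-point reference model treats “<< L” as multiplication by 2^L applied before K is added, and models Clamp as saturation to the unsigned range of 3 integer bits and 4 decimal bits with the lower bits dropped. Both readings are assumptions made for illustration, as are the function name and the sample values; the actual fixed-point circuit is described in the following paragraphs.

```python
import math


def lod_reference(q, maxe, shift_l, k):
    """Floating-point model of LOD = Clamp(((log2(1/q)) - maxe) << L + K)."""
    value = (math.log2(1.0 / q) - maxe) * (1 << shift_l) + k
    # Saturate to an unsigned value with 3 integer bits and 4 decimal bits.
    clamped = min(max(value, 0.0), 8.0 - 1.0 / 16.0)
    # Quantise to steps of 1/16 by dropping the remaining fraction.
    return math.floor(clamped * 16) / 16


# Hypothetical sample values: q = 78.625, maxe = 1, L = 1, K = 19.
print(lod_reference(78.625, 1, 1, 19.0))  # 4.375
```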
[0084]
Hereinafter, the reduction ratio calculation circuit 50 will be described in detail.
FIG. 8 is a configuration diagram of the reduction ratio calculation circuit 50.
As illustrated in FIG. 8, the reduction ratio calculation circuit 50 includes, for example, a priority encoder 101, a shift circuit 102, shift circuits 201 and 202, a table 203, inverters 204 and 205, an adder circuit 206, and a clamp circuit 109.
Here, the priority encoder 101 and the shift circuit 102 correspond to the normalization circuit of the present invention, the shift circuit 201 corresponds to the first shift circuit of the present invention, the inverter 204 corresponds to the first inversion circuit of the present invention, the table 203 corresponds to the data output means of the present invention, the shift circuit 202 corresponds to the second shift circuit of the present invention, the inverter 205 corresponds to the second inversion circuit of the present invention, the adder circuit 206 corresponds to the adder circuit of the present invention, and the clamp circuit 109 corresponds to the clamp circuit of the present invention.
The reduction ratio calculation circuit 50 performs the calculation of the equation (16) and outputs the reduction ratio LOD as the calculation result to the readout circuit 51 shown in FIG. 4.
[0085]
The priority encoder 101 obtains the integer part “int(log₂q)” of the base-2 logarithm “log₂q” of the q data input from the q data selection circuit 302 shown in FIG. 4, that is, the exponent qe, and outputs it to the shift circuits 102 and 201 as data a.
[0086]
The shift circuit 102 shifts the data q input from the q data selection circuit 302 shown in FIG. 4 toward the LSB by the exponent qe input from the priority encoder 101, and outputs the data qm, which is the decimal part of the shift result, to the shift circuit 201 and the table 203 as data b2.
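The normalization performed by the priority encoder 101 and the shift circuit 102 can be sketched as follows. The sketch assumes that q is a positive fixed-point value with 5 decimal bits held in an integer and that the mantissa qm keeps 7 bits; the function name and parameters are hypothetical.

```python
def normalize_q(q_raw, frac_bits=5, mant_bits=7):
    """Split a positive fixed-point q into exponent qe and mantissa qm.

    q_raw holds q scaled by 2**frac_bits. The result satisfies
    q ~= 2**qe * (1 + qm / 2**mant_bits), with qm truncated to mant_bits bits.
    """
    if q_raw <= 0:
        raise ValueError("this sketch only handles q > 0")
    msb = q_raw.bit_length() - 1          # position of the leading 1
    qe = msb - frac_bits                  # int(log2 q)
    below = q_raw - (1 << msb)            # bits below the leading 1
    if msb >= mant_bits:
        qm = below >> (msb - mant_bits)   # keep the top mant_bits bits
    else:
        qm = below << (mant_bits - msb)   # pad with zeros on the LSB side
    return qe, qm


qe, qm = normalize_q(0b000100111010100)   # q = 0001001110.10100 = 78.625
print(qe, format(qm, "07b"))              # 6 0011101
```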
[0087]
The data “a” output from the priority encoder 101 and the data “b2” output from the shift circuit 102 are bit-coupled, and the resulting data {a, b2} is output to the shift circuit 201.
[0088]
The shift circuit 201 outputs data δ2 that is a result of shifting the input data {a, b2} by the input data L toward the MSB to the inverter 204.
[0089]
The inverter 204 inverts the data δ2 and outputs the result data ￣δ2 to the adder circuit 206.
[0090]
The table 203 holds a correspondence table between the data qm and “log₂({1, qm}) − qm”; using the data qm (= b2) input from the shift circuit 102 as a key, it obtains the corresponding “log₂({1, qm}) − qm” from the correspondence table and outputs it to the shift circuit 202 as data μ.
In addition, instead of the table 203, a program that automatically generates “log₂({1, qm}) − qm” from the input data qm may be used.
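One way such a table could be generated is sketched below. It assumes that {1, qm} denotes the value 1 + qm/128 for a 7-bit qm, and that μ is held as a 4-bit value in units of 2^-7 obtained by truncation; the truncation is an assumption, chosen because it yields μ = 1000 for qm = 0011101. The function name is hypothetical.

```python
import math


def build_mu_table(mant_bits=7):
    """Tabulate mu = log2(1.qm) - 0.qm in units of 2**-mant_bits (truncated)."""
    scale = 1 << mant_bits
    table = []
    for qm in range(scale):
        m = qm / scale                         # mantissa as a fraction in [0, 1)
        mu = math.log2(1.0 + m) - m            # correction term, about 0.086 at most
        table.append(int(mu * scale))          # truncate to an integer count of LSBs
    return table


mu_table = build_mu_table()
print(format(mu_table[0b0011101], "04b"))      # 1000
print(max(mu_table))                           # 11, so every entry fits in 4 bits
```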
[0091]
The maxe data S11c input from the triangle DDA circuit 11 and the data μ output from the table 203 are bit-combined, with the maxe data S11c as the integer part and, as the decimal part, the data μ preceded by (000) so that the decimal part becomes 7 bits, and the resulting data {maxe, 3′b0, μ} is output to the shift circuit 202. Here, “3′b0” is Verilog-HDL notation for a 3-bit binary 0.
[0092]
The shift circuit 202 outputs data γ2, which is the result of shifting the input data {maxe, 3′b0, μ} toward the MSB by the input data L, to the inverter 205.
[0093]
The inverter 205 inverts the data γ2 and outputs the result data ￣γ2 to the adder circuit 206.
[0094]
Further, the data K and “10” are bit-combined, with a 1-bit “0” added before “10” so that the decimal part becomes 7 bits, and the resulting data {K, 3′b0, 10} is output to the adder circuit 206.
[0095]
The adder circuit 206 adds the data {K, 3′b0, 10}, the data ￣δ2, and the data ￣γ2, and outputs the addition result ε2 to the clamp circuit 109.
[0096]
The clamp circuit 109 clamps (rounds) the data ε2 input from the adder circuit 206 to data with a 3-bit integer part and a 4-bit decimal part, and outputs the result as the reduction ratio LOD to the readout circuit 51 shown in FIG. 4.
[0097]
In the reduction ratio calculation circuit 50 shown in FIG. 8, the table 203 outputs μ (= “log₂({1, qm}) − qm”) corresponding to the input data qm, so that the maxe data S11c consisting only of an integer part and the data μ consisting only of a fractional part can simply be bit-combined, and the addition process for the maxe data S11c is reduced. Thereby, according to the reduction ratio calculation circuit 50, it is possible to reduce the number of gates and speed up the calculation process.
In the reduction ratio calculation circuit 50, the lower 4 bits of the 7-bit decimal part of “log₂(1/q)” in the equation (16), which determine the accuracy of the reduction ratio LOD, are obtained using the table 203. As a result, the error of “log₂(1/q)”, which is the cause of error in the reduction ratio LOD, is suppressed to about 2^-7, and even when the data L has the maximum value “3”, the error of the reduction ratio LOD is suppressed to about 2^-4.
[0098]
FIG. 9 is a diagram for explaining processing in the reduction ratio calculation circuit 50 shown in FIG.
Hereinafter, the operation of the reduction ratio calculation circuit 50 shown in FIG. 8 will be described using a specific example with reference to FIG.
Here, a case where the calculation process of the equation (16) is performed in the reduction ratio calculation circuit 50 using the following values will be exemplified.
[0099]
q = (0001001110.10100):
maxe = (00000001.0000000):
L = (01):
K = (00010011.0000):
[0100]
q (0001001110.10100) which is q data S302 output from the q selection circuit 302 is input to the priority encoder 101 and the shift circuit 102 shown in FIG.
Next, in the priority encoder 101, the exponent qe (000110) of q (0001001110.10100) is obtained, and a (00000110), which is 8-bit data obtained by adding (00) to the MSB side of the exponent qe (000110), is output.
[0101]
Next, in the shift circuit 102, q (0001001110.10100) is shifted toward the LSB by a (00000110), and the mantissa qm (0011101), which is the fractional part after the shift, is output to the table 203 as data b2.
Next, in the table 203, the 4-bit data μ (1000), which is “log₂({1, qm}) − qm” corresponding to the mantissa qm (0011101), is obtained and output.
[0102]
Next, the data maxe (00000001) input from the triangle DDA circuit 11 and the data μ (1000) are bit-combined, with (000) placed before the data μ so that the decimal part becomes 7 bits, and the resulting data {maxe, 3′b0, μ} = (00000001.0001000) is output to the shift circuit 202.
[0103]
Next, the shift circuit 202 shifts the input data {maxe, 3′b0, μ} = (00000001.0001000) toward the MSB by the input data L = (01), and outputs the resulting data γ2 (00000010.0010000) to the inverter 205.
Next, in the inverter 205, the data γ2 (00000010.0010000) is inverted, and the resulting data ￣γ2 (11111101.1101111) is output to the adder circuit 206.
[0104]
Also, the data a (00000110) output from the priority encoder 101 and the data b2 (0011101) output from the shift circuit 102 are bit-combined, and the resulting data {a, b2} = (00000110.0011101) is output to the shift circuit 201.
[0105]
Next, the shift circuit 201 shifts the data {a, b2} = (00000110.0011101) toward the MSB by the input data L (01) and outputs the resulting data δ2 (00001100.0111010) to the inverter 204.
[0106]
Next, the inverter 204 inverts the data δ2 (00001100.0111010) and outputs the resulting data ￣δ2 (11110011.1000101) to the adder circuit 206.
[0107]
Further, the data K (00010011.0000) and “10” are bit-concatenated, with a 1-bit “0” placed before “10” so that the fractional part becomes 7 bits, and the resulting data {K, 3′b0, 10} = (00010011.0000010) is output to the adder circuit 206.
[0108]
Next, the adder circuit 206 adds the data {K, 3′b0, 10}, the data ￣δ2, and the data ￣γ2, and outputs the addition result ε2 (00000100.0110110) to the clamp circuit 109.
[0109]
Next, the clamp circuit 109 clamps (rounds) the data ε2 input from the adder circuit 206 to data with a 3-bit integer part and a 4-bit fractional part, and outputs the result (100.0110) as the reduction ratio LOD to the read circuit 51 shown in FIG.
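The worked example above can be checked with plain integer arithmetic. The following is only a minimal sketch, not the circuit itself: the function name, the packing of q and K into integers, and the saturating clamp are assumptions made for illustration, the integer part of q is assumed to be non-zero, and μ is passed in as the output of the table 203 rather than computed.

```python
MASK15 = (1 << 15) - 1  # 8 integer bits + 7 fractional bits


def reduction_ratio(q, maxe, mu, shift_l, k):
    """Follow the data path of the worked example with plain integers.

    q       : 15-bit value, 10 integer bits + 5 fractional bits
    maxe    : 8-bit integer
    mu      : 4-bit output of the table 203 (taken as given)
    shift_l : shift amount L
    k       : 12-bit value, 8 integer bits + 4 fractional bits
    Returns (eps2, lod); lod has 3 integer bits + 4 fractional bits.
    """
    # Priority encoder 101: position of the leading 1 in the integer part of q.
    qe = (q >> 5).bit_length() - 1
    # Shift circuit 102: the 7 bits after the leading 1 form the mantissa qm.
    qm = ((q << 7) >> (qe + 5)) & 0x7F
    # {maxe, 3'b0, mu} and {a, b2}, each shifted toward the MSB by L.
    gamma2 = (((maxe << 7) | mu) << shift_l) & MASK15
    delta2 = (((qe << 7) | qm) << shift_l) & MASK15
    # K extended to 7 fractional bits, with binary 10 in the two lowest bits.
    k_term = (k << 3) | 0b10
    # Adder circuit 206: K term plus the two inverted values, kept to 15 bits.
    eps2 = (k_term + (~delta2 & MASK15) + (~gamma2 & MASK15)) & MASK15
    # Clamp circuit 109, assumed here to drop 3 fractional bits and saturate.
    lod = min(eps2 >> 3, 0x7F)
    return eps2, lod


eps2, lod = reduction_ratio(0b000100111010100, 0b00000001, 0b1000, 0b01, 0b000100110000)
print(format(eps2, "015b"), format(lod, "07b"))
```

Adding the two inverted values together with binary 10 in the lowest two bits is the two's-complement form of K − δ2 − γ2, so the result corresponds to roughly 19 − 12.45 − 2.125 ≈ 4.42 in decimal.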
[0110]
[Read circuit 51]
The read circuit 51 reads (R, G, B) data from the texture buffer 20 at the address (u, v), which is calculated from the (s, t, q) data included in the DDA data S11, the reduction ratio LOD, and the predetermined texture sizes USIZE and VSIZE, and outputs the data to the LIP circuits 52 and 53 as texture data.
At this time, when the fractional part of the reduction ratio LOD input from the reduction ratio calculation circuit 50 is not 0, the read circuit 51 reads the two sets of texture data whose reduction ratios correspond to the integers immediately below and above the reduction ratio LOD, one after the other, each in one clock cycle of the clock signal S18, and outputs them to the LIP circuits 52 and 53.
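The level selection rule can be sketched as follows. This is an illustrative assumption about how the 3.4 fixed-point LOD might be split; the function name is hypothetical, and no limit on the highest available level is modeled.

```python
def levels_to_read(lod):
    """Return the texture levels to read for a 3.4 fixed-point LOD value."""
    level = lod >> 4        # integer part: 3 bits
    fraction = lod & 0xF    # fractional part: 4 bits
    if fraction == 0:
        return [level]              # one read is enough
    return [level, level + 1]       # two reads, one per clock cycle


print(levels_to_read(0b1000110))
```

For the value 100.0110 obtained in the worked example, the fractional part is non-zero, so two levels are read.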
[0111]
[LIP circuits 52 and 53]
The LIP circuit 52 performs the four-point neighborhood interpolation calculation corresponding to the above-described equation (3) for the target pixel within one clock cycle to generate the interpolation data S52, and outputs the interpolation data S52 to the LIP circuit 54.
Subsequently, the LIP circuit 52 performs the four-point neighborhood interpolation calculation corresponding to the above-described equation (6) for the target pixel within one clock cycle to generate the interpolation data S52, and outputs the interpolation data S52 to the LIP circuit 54.
[0112]
The LIP circuit 53 performs the four-point neighborhood interpolation calculation corresponding to the above-described equation (4) for the target pixel within one clock cycle to generate the interpolation data S53, and outputs the interpolation data S53 to the LIP circuit 54.
Subsequently, the LIP circuit 53 performs the four-point neighborhood interpolation calculation corresponding to the above-described equation (7) for the target pixel within one clock cycle to generate the interpolation data S53, and outputs the interpolation data S53 to the LIP circuit 54.
The operation of the LIP circuit 53 is performed in parallel with the operation of the LIP circuit 52.
[0113]
[LIP circuit 54]
The LIP circuit 54 uses the interpolation data S52 and S53 from the LIP circuits 52 and 53 to perform the four-point neighborhood interpolation calculation corresponding to the above-described equation (5) within one clock cycle, thereby generating the four-point neighborhood interpolation data C_pixel0, and outputs the four-point neighborhood interpolation data C_pixel0 to the LIP / texture function circuit 55.
At this time, if the fractional part of the reduction ratio LOD is not 0, the LIP circuit 54 uses the interpolation data S52 and S53 to generate, in order, the four-point neighborhood interpolation data C_pixel0 and C_pixel1 used for the level interpolation processing.
For example, when the reduction ratio LOD is 1.5 as described above, the LIP circuit 54 generates the four-point neighborhood interpolation data C_pixel0 based on the above equation (5) in one clock cycle, and then generates the four-point neighborhood interpolation data C_pixel1 based on the above equation (8) in one clock cycle.
The configuration and processing of the LIP circuits 52, 53, and 54 are basically the same as the configuration and processing of the LIP circuit 61 described later.
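The division of work among the three LIP circuits can be pictured with the sketch below. Equations (3) to (5) are not restated in this section, so the sketch assumes the usual bilinear form (two horizontal interpolations followed by one vertical interpolation); the function names, argument order, and floating-point weights are illustrative assumptions only.

```python
def lerp(a, b, w):
    """Linear interpolation between a (w = 0) and b (w = 1)."""
    return a * (1.0 - w) + b * w


def four_point_interpolation(t00, t10, t01, t11, fu, fv):
    """Assumed bilinear form of the four-point neighborhood interpolation.

    t00..t11 : the four neighboring texel values
    fu, fv   : fractional parts of the texture address
    """
    s52 = lerp(t00, t10, fu)    # role assumed for the LIP circuit 52
    s53 = lerp(t01, t11, fu)    # role assumed for the LIP circuit 53 (in parallel)
    return lerp(s52, s53, fv)   # role assumed for the LIP circuit 54


print(four_point_interpolation(0, 100, 100, 200, 0.5, 0.5))
```

When two texture levels are read, the same three steps would simply run once per level, which matches the two-cycle sequence described above.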
[0114]
[LIP / texture function circuit 55]
FIG. 10 is a configuration diagram of the LIP / texture function circuit 55.
The LIP / texture function circuit 55 uses the four-point neighborhood interpolation data C_pixel0 (and, if necessary, the four-point neighborhood interpolation data C_pixel1) from the LIP circuit 54 to perform part or all of the level interpolation processing of the MIPMAP processing and the texture function processing such as modulation processing, decal processing, highlight processing, fogging processing, texture blending processing, and alpha blending processing.
Specifically, when the fractional part of the reduction ratio LOD is 0, the LIP / texture function circuit 55 performs the necessary texture function processing using the four-point neighborhood interpolation data C_pixel0 input from the LIP circuit 54.
Further, when the fractional part of the reduction ratio LOD is not 0, the LIP / texture function circuit 55 performs the level interpolation processing using the four-point neighborhood interpolation data C_pixel0 and C_pixel1 input from the LIP circuit 54, and then performs the necessary texture function processing.
[0115]
As shown in FIG. 10, the LIP / texture function circuit 55 includes a preprocessing circuit 60, an LIP circuit 61, and a register 62.
As shown in FIG. 10, the preprocessing circuit 60 includes a mode controller 70, a register 74, multiplexers 75 to 78, and registers 85 to 88.
As shown in FIG. 10, the mode controller 70 includes a decoder 71, a counter 72, and a decoder 73.
[0116]
The decoder 71 monitors the count value of the counter 72 and, at the timing when the count value reaches “0”, sets the counter 72 to the initial value “0”, “1”, or “2” according to the number of processes that share the LIP circuit 61.
For example, the decoder 71 sets the initial value “0” in the counter 72 when the LIP circuit 61 performs only one process, the initial value “1” when the LIP circuit 61 is shared by two processes, and the initial value “2” when the LIP circuit 61 is shared by three processes.
In this embodiment, the case where “0”, “1”, and “2” are used as the initial values set in the counter 72 is exemplified. However, the initial value can be set to any number according to the number of processes that share the LIP circuit 61.
The decoder 71 receives function mode data FMD from, for example, the main processor 4 shown in FIG. 1 or a main controller (not shown) in the texture engine circuit 12.
The function mode data FMD designates, for example, one of the modes “1” to “8” shown in FIG. 11 for each clock cycle, and is used to control the input of data corresponding to each mode to the LIP circuit 61, as will be described later. That is, the content of the process performed by the LIP circuit 61 is determined based on the function mode data FMD. The contents of FIG. 11 will be described in detail later.
For example, based on the function mode data FMD, the decoder 71 decreases the count value of the counter 72 by 1 each time processing of one mode is completed in the LIP circuit 61.
[0117]
The decoder 73 receives function mode data FMD and fog enable data FED from the main processor 4 or the main controller (not shown) in the texture engine circuit 12 shown in FIG.
The decoder 73 receives the mipmap number data MND from the LIP circuit 54 or the reading circuit 51.
[0118]
As described above, the function mode data FMD designates, for example, one of the modes “1” to “8” shown in FIG. 11 for each clock cycle, and is used to control the input of data corresponding to each mode to the LIP circuit 61, as described later. In the example illustrated in FIG. 11, the LIP circuit 61 performs the level interpolation process of the MIPMAP process, the modulation process, the highlight process, the decal process, and the fogging process.
In this case, as shown in FIG. 11, the modulation process and the highlight process, for example, are each assigned different modes depending on whether only that process is performed or it is performed following the level interpolation process of the MIPMAP process. Likewise, the fogging process is assigned different modes depending on whether only the fogging process is performed or it is performed following the modulation process. This is because the decoder 73 needs to determine whether or not to feed back the processing result of the LIP circuit 61 shown in FIG.
Note that the mode shown in FIG. 11 is an example, and various other modes can be designated.
[0119]
For example, the fog enable data FED indicates a logical value “1” when the fogging process is performed, and indicates a logical value “0” when the fogging process is not performed.
[0120]
Further, the mipmap number data MND indicates the logical value “1” at the timing of inputting the four-point neighborhood interpolation data C_pixel0 when the LIP circuit 61 does not perform level interpolation processing (when the fractional part of the reduction ratio LOD is 0), and at the timing of inputting the four-point neighborhood interpolation data C_pixel1 when level interpolation processing is performed.
The mipmap number data MND indicates the logical value “0” at the timing of inputting the four-point neighborhood interpolation data C_pixel0 when level interpolation processing is performed.
The mipmap number data MND is used for controlling the multiplexers 77 and 78 by the decoder 73, as will be described later.
[0121]
Based on the function mode data FMD, the mipmap number data MND, and the fog enable data FED, the decoder 73 controls the multiplexers 75 to 78 so that the data the LIP circuit 61 needs in order to perform the processing specified by the function mode data FMD is supplied to the LIP circuit 61.
[0122]
Specifically, while the mipmap number data MND indicates the logical value “0”, the decoder 73 controls the multiplexer 77 so that the 4-point neighborhood interpolation data C_pixel0 input from the LIP circuit 54 is not output to the register 87. At this time, the 4-point neighborhood interpolation data C_pixel0 is written into the register 74.
When the function mode data FMD indicates “1” shown in FIG. 11 and the LIP circuit 61 performs the level interpolation process of the MIPMAP process, the decoder 73 controls the multiplexers 78 and 77 so that, while the mipmap number data MND indicates the logical value “1”, the 4-point neighborhood interpolation data C_pixel0 read from the register 74 is output to the register 88 and the 4-point neighborhood interpolation data C_pixel1 input from the LIP circuit 54 is output to the register 87.
Further, while the function mode data FMD indicates “1” shown in FIG. 11 and the mipmap number data MND indicates the logical value “1”, the decoder 73 controls the multiplexer 76 so that the mipmap coefficient COE_mipmap input from the reduction ratio calculation circuit 50 shown in FIG. 4 is output to the register 86. At the same time, the decoder 73 controls the multiplexer 75 so as to output a logical value “0” to the register 85.
As a result, the 4-point neighborhood interpolation data C_pixel0 and C_pixel1 and the mipmap coefficient COE_mipmap are simultaneously written into the registers 88, 87, and 86, respectively, and the LIP circuit 61 performs the level interpolation processing using the 4-point neighborhood interpolation data C_pixel0 and C_pixel1.
[0123]
FIG. 12 is a timing chart for explaining the timing at which the four-point neighborhood interpolation data C_pixel0 and C_pixel1 are input from the LIP circuit 54 to the mode controller 70 shown in FIG., and the execution timing of the level interpolation processing in the LIP / texture function circuit 55.
In FIG. 12, data with the same (a), (b), and (c) indicate data related to the same level interpolation processing.
[0124]
For example, based on the clock signal S18 shown in FIG. 12A, the four-point neighborhood interpolation data C_pixel0 that is the target of level interpolation, input from the LIP circuit 54 to the LIP / texture function circuit 55 at the timing shown in FIG., is stored in the register 74.
Then, in the next clock cycle, the 4-point neighborhood interpolation data C_pixel0 read from the register 74 is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88, and the 4-point neighborhood interpolation data C_pixel1 output from the LIP circuit 54 is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
Then, in the next clock cycle, as shown in FIG. 12C, the LIP circuit 61 performs the level interpolation processing using the mipmap data C_pixel0 and C_pixel1.
As can be seen from FIG. 12C, the throughput of the 4-point neighborhood interpolation process of the MIPMAP process performed using the LIP circuits 52, 53, and 54 is 2 clock cycles, whereas the LIP circuit 61 performs the level interpolation process of the MIPMAP process in one clock cycle. Accordingly, when only the level interpolation process is performed in the LIP circuit 61, idle time occurs during which the LIP circuit 61 performs no processing. In this embodiment, as will be described later, the LIP circuit 61 is made to perform texture function processing during this idle time. That is, the 4-point neighborhood interpolation process of the MIPMAP process and the texture function process are interleaved.
[0125]
Further, when the function mode data FMD indicates “2” shown in FIG. 11 and the LIP circuit 61 performs only the modulation processing, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the 4-point neighborhood interpolation data C_pixel0 input from the LIP circuit 54 is output to the register 88 through the register 74, and the (R, G, B) data S11b (fragment data C_flag) included in the DDA data S11 input from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 to output the logical value “0” to the register 87 and output the logical value “0” to the register 85.
[0126]
In addition, when the function mode data FMD indicates “3” shown in FIG. 11 and the LIP circuit 61 performs the modulation process following the level interpolation process of the MIPMAP process, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the level interpolation data fed back from the OUT terminal of the LIP circuit 61 is output to the register 88 and the fragment data C_flag is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 to output the logical value “0” to the register 87 and output the logical value “0” to the register 85.
[0127]
Further, when the function mode data FMD indicates “4” shown in FIG. 11 and the LIP circuit 61 performs only the highlight processing, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the 4-point neighborhood interpolation data C_pixel0 input from the LIP circuit 54 is output to the register 88 via the multiplexer 78, and the (R, G, B) data S11b (fragment data C_flag) included in the DDA data S11 input from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that a logical value “0” is output to the register 87 and the addition data Hi of the highlight operation, input from the main processor 4 or the main controller (not shown) in the texture engine circuit 12, is output to the register 85.
[0128]
In addition, when the function mode data FMD indicates “5” shown in FIG. 11 and the LIP circuit 61 performs the highlight process following the level interpolation process of the MIPMAP process, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the level interpolation data fed back from the OUT terminal of the LIP circuit 61 is output to the register 88 and the fragment data C_flag is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 so that a logical value “0” is output to the register 87 and the addition data Hi of the highlight operation, input from the main processor 4 or the main controller (not shown) in the texture engine circuit 12, is output to the register 85.
[0129]
Further, when the function mode data FMD indicates “6” shown in FIG. 11 and the LIP circuit 61 performs only the decal processing, the decoder 73 controls the multiplexers 78 and 76 so that, during the corresponding one clock cycle, the 4-point neighborhood interpolation data C_pixel0 input from the LIP circuit 54 is output to the register 88 via the multiplexer 78 and the value “0xFF” is output to the register 86.
At the same time, the decoder 73 controls the multiplexers 77 and 75 to output the logical value “0” to the register 87 and output the logical value “0” to the register 85.
[0130]
Further, when the function mode data FMD indicates “7” shown in FIG. 11 and the LIP circuit 61 performs only the fogging processing, the decoder 73 controls the multiplexers 78 and 77 so that, during the corresponding one clock cycle, the (R, G, B) data S11b (fragment data C_flag) included in the DDA data S11 input from the triangle DDA circuit 11 is output to the register 88 via the register 74, and, for example, the fog data C_fog set in a fog register (not shown) is output to the register 87.
At the same time, the decoder 73 controls the multiplexer 76 so that the fogging coefficient COE_fog included in the DDA data S11 input from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexer 75 so as to output the logical value “0” to the register 85.
[0131]
When the function mode data FMD indicates “8” shown in FIG. 11 and the LIP circuit 61 performs the fogging process following the modulation process, the decoder 73 controls the multiplexers 78 and 77 so that, during the corresponding one clock cycle, the level interpolation data fed back from the OUT terminal of the LIP circuit 61 is output to the register 88 and the fog data C_fog read from the fog register (not shown) is output to the register 87.
At the same time, the decoder 73 controls the multiplexer 76 so that the fogging coefficient COE_fog included in the DDA data S11 input from the triangle DDA circuit 11 is output to the register 86.
At the same time, the decoder 73 controls the multiplexer 75 so as to output the logical value “0” to the register 85.
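The multiplexer settings described for modes “1” to “8” can be collected into a single lookup table. The sketch below is only a summary aid: the dictionary layout, key names, and helper function are hypothetical, and the pairing of the register 85 with the IN_C terminal is an assumption (the text states the terminals only for the registers 88, 87, and 86).

```python
# Source selected for each register, keyed by the FMD mode number.
# Register 88 feeds IN_A, 87 feeds IN_B, 86 feeds the coefficient input;
# register 85 feeding IN_C is an assumption.
MODE_INPUTS = {
    1: {"reg88": "C_pixel0", "reg87": "C_pixel1", "reg86": "COE_mipmap", "reg85": "0"},
    2: {"reg88": "C_pixel0", "reg87": "0", "reg86": "C_flag", "reg85": "0"},
    3: {"reg88": "feedback", "reg87": "0", "reg86": "C_flag", "reg85": "0"},
    4: {"reg88": "C_pixel0", "reg87": "0", "reg86": "C_flag", "reg85": "Hi"},
    5: {"reg88": "feedback", "reg87": "0", "reg86": "C_flag", "reg85": "Hi"},
    6: {"reg88": "C_pixel0", "reg87": "0", "reg86": "0xFF", "reg85": "0"},
    7: {"reg88": "C_flag", "reg87": "C_fog", "reg86": "COE_fog", "reg85": "0"},
    8: {"reg88": "feedback", "reg87": "C_fog", "reg86": "COE_fog", "reg85": "0"},
}


def uses_feedback(mode):
    """True when the mode takes its first operand from the OUT terminal."""
    return MODE_INPUTS[mode]["reg88"] == "feedback"


print([mode for mode in sorted(MODE_INPUTS) if uses_feedback(mode)])
```

Laid out this way, the modes that follow another process (“3”, “5”, and “8”) differ from their stand-alone counterparts only in the source of the register 88.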
[0132]
Further, when the LIP circuit 61 is shared by the level interpolation process of the MIPMAP process and two or more texture function processes, that is, when the LIP circuit 61 is shared by a total of three or more processes, the decoder 73 outputs a wait instruction to, for example, the read circuit 51 shown in FIG. 4 and the triangle DDA circuit 11 shown in FIG.
For example, when the LIP circuit 61 is shared by the level interpolation process and two texture function processes, the wait instruction is output to the read circuit 51 and the triangle DDA circuit 11 during the one clock cycle in which the LIP circuit 61 is processing the second texture function process.
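The relationship between the number of shared processes and the wait cycles can be sketched as below. The text states only the three-process case explicitly; extending the same rule to other counts is an assumption, as are the function and constant names.

```python
# Clock cycles the LIP circuits 52 to 54 need per pixel when two levels are read.
FOUR_POINT_CYCLES = 2


def cycles_and_waits(shared_processes):
    """Return (cycles per pixel in the LIP circuit 61 stage, wait cycles).

    Assumes one clock cycle per shared process and that upstream circuits
    are stalled for every cycle beyond the two they need themselves.
    """
    period = max(FOUR_POINT_CYCLES, shared_processes)
    waits = period - FOUR_POINT_CYCLES
    return period, waits


print(cycles_and_waits(3))
```

With two shared processes no wait is needed because the second process fits into the otherwise idle cycle; with three, one wait cycle per pixel appears.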
[0133]
When the LIP circuit 61 performs the calculation of the equation (15), the 8-bit data A, B, COE, and C are input from the IN_A terminal, the IN_B terminal, the IN_coe terminal, and the IN_C terminal, respectively, and the 8-bit data D is output from the OUT terminal.
[0134]
As shown in FIG. 13, the LIP circuit 61 performs the calculation shown in the equation (15) by shifting and adding the correction data F, the partial products out_0 to out_7, in each of which the data A or B is selected based on the logical value of the corresponding bit of the data COE, and the data C that is a product-sum operation term.
[0135]
The correction data F takes the value of the data A when the data COE = 0xFF (COE = 1.0), and the value of the data B otherwise.
In a system in which a value whose 8 bits are all the logical value “1” is regarded as “1.0”, the correction data F is used to correct, for example, the calculation shown in the following formula (17) into the one shown in the following formula (18). That is, correction is performed so that “X × 1.0 = X”.
[0136]
[Expression 17]
0xFF × 0xFF = 0xFE (17)
[0137]
[Formula 18]
0xFF × 0xFF = 0xFF (18)
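The effect of the correction can be reproduced numerically. The sketch below is an illustration only: it treats the pure multiplication case (data B equal to 0), keeps the upper 8 bits of the 16-bit product as the result, and uses hypothetical function names.

```python
def scale_uncorrected(x, coe):
    """Upper 8 bits of the plain 8-bit x 8-bit product, as in formula (17)."""
    return (x * coe) >> 8


def scale_corrected(x, coe):
    """Same product with the correction data added, as in formula (18)."""
    f = x if coe == 0xFF else 0   # correction data F; data B is 0 here
    return (x * coe + f) >> 8


print(hex(scale_uncorrected(0xFF, 0xFF)), hex(scale_corrected(0xFF, 0xFF)))
```

Because 0xFF stands for 1.0 but is numerically 255/256, the plain product loses one step; adding one more copy of X when COE = 0xFF turns X × 255 into X × 256 and restores X exactly.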
[0138]
Each partial product out_n (0 ≦ n ≦ 7) indicates the data A if bit n of the data COE is the logical value “1”, and indicates the data B if it is the logical value “0”.
Here, the LSB of the data COE is bit 0 and the MSB is bit 7.
The partial product out_n (0 ≦ n ≦ 7) is generated using, for example, the multiplexers 80_0 to 80_7 as shown in FIG.
Specifically, for 0 ≦ m ≦ 7, the multiplexer 80_m receives bit data A [m] of bit m of the data A, bit data B [m] of bit m of the data B, and bit data COE [n] of bit n of the data COE. If the bit data COE [n] is the logical value “1”, the bit data A [m] is selected and output as bit data out_n [m]; otherwise, the bit data B [m] is selected and output as bit data out_n [m].
Note that the partial product out_n is configured by the bit data out_n [0] to out_n [7].
[0139]
The partial product out_n is shifted by n bits toward the MSB and then output to the adder circuit 81, which adopts a Wallace tree architecture.
Further, the data C as the product-sum operation term is shifted by 8 bits toward the MSB, so as to be added to the upper 8 bits of the 8-bit × 8-bit multiplication result, and is output to the adder circuit 81 as shown in FIG.
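The shift-and-add structure described above can be modeled bit by bit. This is a minimal sketch under stated assumptions: the function name is hypothetical, the sum is truncated to 16 bits before the upper 8 bits are taken, and any overflow handling in the actual circuit is not modeled.

```python
def lip_partial_products(a, b, coe, c=0):
    """Model the LIP circuit 61 as a sum of selected, shifted operands."""
    # Correction data F: data A when COE is 0xFF, data B otherwise.
    total = a if coe == 0xFF else b
    # Product-sum term C, aligned with the upper 8 bits of the result.
    total += c << 8
    for n in range(8):
        # Partial product out_n: A if bit n of COE is 1, else B, shifted by n.
        out_n = a if (coe >> n) & 1 else b
        total += out_n << n
    # Data D: upper 8 bits of the 16-bit result.
    return ((total & 0xFFFF) >> 8) & 0xFF


print(lip_partial_products(200, 100, 0x80))
```

Summing the selected operands gives A × COE + B × (0xFF − COE) + F, so for COE below 0xFF the result equals (A × COE + B × (0x100 − COE)) >> 8, a linear interpolation between B and A, and for COE = 0xFF it returns A exactly.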
[0140]
The adder circuit 81 adopts a Wallace tree architecture: it takes three inputs at a time and reduces them to two outputs, a sum and a carry, so that the final addition can be performed in the adder circuit 82 using two-input adders.
As a result, even when the correction data F and the product-sum operation term C are added together with the partial products, the circuit scale hardly increases and the addition speed hardly decreases.
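The three-to-two reduction can be illustrated on whole integers with carry-save addition. The sketch below is a generic word-level illustration of the technique, not the bit-level wiring of the adder circuit 81; both function names are hypothetical.

```python
def carry_save(x, y, z):
    """Reduce three non-negative operands to a sum word and a carry word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1
    return sum_word, carry_word


def wallace_sum(operands):
    """Add many operands by repeated 3:2 reduction, then one final add."""
    ops = list(operands)
    while len(ops) > 2:
        reduced = []
        for i in range(0, len(ops) - 2, 3):
            reduced.extend(carry_save(ops[i], ops[i + 1], ops[i + 2]))
        # Operands left over after grouping in threes pass through unchanged.
        reduced.extend(ops[len(ops) - len(ops) % 3:])
        ops = reduced
    # Final two-input addition (the role of the adder circuit 82).
    return sum(ops)


print(wallace_sum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Each reduction step needs no carry propagation, which is why adding two more operands (F and C) to the eight partial products costs little in delay.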
[0141]
FIG. 15 is a partial configuration diagram of the adder circuit 81 adopting the Wallace tree architecture.
FIG. 15 shows only a configuration for adding bit data in the vertical direction in the figure indicated by arrows 92, 93, and 94 shown in FIG. 13, and the other parts for addition are omitted.
Addition of bit data in the vertical direction in the figure indicated by an arrow 91 shown in FIG.
As shown in FIG. 15, the adder circuit 81 has adders 100_0 to 100_6.
The adder 100_0 performs the addition of the arrow 92 portion: it adds bit 1 of the correction data F, bit 1 of the partial product out_0, and bit 0 of the partial product out_1, outputs the sum Sum to the adder circuit 82, and outputs the carry Carry to the adder 100_1.
[0142]
The adders 100_1, 100_2, and 100_3 perform the addition of the arrow 93 portion.
The adder 100_1 adds bit 2 of the correction data F and bit 2 of the partial product out_0, and outputs the sum Sum to the adder 100_3 and the carry Carry to the adder 100_4.
The adder 100_2 adds bit 1 of the partial product out_1 and bit 0 of the partial product out_2, and outputs the sum Sum to the adder 100_3 and the carry Carry to the adder 100_5.
The adder 100_3 adds the outputs received from the adder 100_1 and the adder 100_2, and outputs the sum Sum and the carry Carry to the adder circuit 82.
[0143]
The adders 100_4, 100_5, and 100_6 perform the addition of the arrow 94 portion.
The adder 100_4 adds bit 3 of the correction data F and bit 3 of the partial product out_0, and outputs the sum Sum to the adder 100_6 and the carry Carry to the adder at the subsequent stage.
The adder 100_5 adds bit 2 of the partial product out_1 and bit 1 of the partial product out_2, and outputs the sum Sum to the adder 100_6 and the carry Carry to the adder at the subsequent stage.
The adder 100_6 adds the outputs received from the adder 100_4 and the adder 100_5, and outputs the sum Sum and the carry Carry to the adder circuit 82.
[0144]
The adder circuit 82 adds bit 0 of the correction data F, bit 0 of the partial product out_0, and the sum Sum and the carry Carry input from the adder circuit 81 using a plurality of 2-input adders to calculate the 16-bit data that is the calculation result of the above formula (15), and outputs the upper 8 bits of the 16-bit data as data D.
For example, when the counter 72 shown in FIG. 10 indicates the count value “0”, the LIP circuit 61 outputs the calculated data D from the OUT terminal shown in FIG. 4 to the register 62; otherwise, the calculated data D is fed back to the multiplexer 78 shown in FIG.
[0145]
Hereinafter, an operation mode of the texture engine circuit 12 shown in FIG. 10 will be described.
First mode of operation
In this operation mode, a case will be described in which the LIP circuit 61 is shared by the level interpolation processing of the MIPMAP processing and the modulation processing.
In this case, function mode data FMD that alternates between the modes “1” and “3” every clock cycle is output from the main processor 4 shown in FIG. 1 or the main controller (not shown) in the texture engine circuit 12 to the decoders 71 and 73 shown in FIG. 10.
In addition, the decoder 71 sets “1” as the initial value of the count value of the counter 72, and sets “1” in the counter 72 every time the count value of the counter 72 becomes “0”.
[0146]
Specifically, for example, in the first clock cycle, the 4-point neighborhood interpolation data C_pixel0 from the LIP circuit 54 shown in FIG. is written to the register 74.
Further, “1” is set as the count value of the counter 72.
[0147]
Next, in the second clock cycle following the first clock cycle, the function mode data FMD indicates the mode “1”, and the 4-point neighborhood interpolation data C_pixel0 is read from the register 74 and output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88. At the same time, the four-point neighborhood interpolation data C_pixel1 from the LIP circuit 54 shown in FIG. is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
At the same time, the data COE_mipmap from the reduction ratio calculation circuit 50 shown in FIG. is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
Then, in the LIP circuit 61, the calculation shown in the above equation (9) is performed, and the level interpolation data C_pixel is calculated.
Since the counter 72 has the count value “1”, the level interpolation data C_pixel is fed back to the multiplexer 78.
Then, the count value of the counter 72 is decreased to “0”.
[0148]
Next, in the third clock cycle, the function mode data FMD indicates the mode “3”, and the 4-point neighborhood interpolation data C_pixel0 relating to the next pixel from the LIP circuit 54 shown in FIG. is written to the register 74.
At the same time, the level interpolation data C_pixel calculated in the second clock cycle (corresponding to C_tex in formula (10)) is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88.
At the same time, the (R, G, B) data S11b (fragment color value C_flag) included in the DDA data S11 from the triangle DDA circuit 11 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
In the LIP circuit 61, the calculation shown in the above equation (10) is performed, and the color value C_mod after the modulation processing is calculated.
Since the count value of the counter 72 is “0”, the color value C_mod is output from the OUT terminal of the LIP circuit 61 to the register 62.
The color value C_mod is read from the register 62 and output to the memory I / F circuit 13 at the subsequent stage as pixel data S12.
Then, “1” is set as the count value of the counter 72.
Thereafter, the process of the second clock cycle and the process of the third clock cycle described above are alternately repeated.
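The interplay between the function mode data FMD and the counter 72 can be traced cycle by cycle. The sketch below is illustrative only: the function name and the tuple layout are hypothetical, and the numbering starts at the second clock cycle because the first cycle only loads the register 74.

```python
def trace_cycles(modes, cycles):
    """List (clock cycle, FMD mode, counter value, destination of data D).

    modes  : the modes issued in order, one per clock cycle, then repeated
    cycles : number of clock cycles to trace
    """
    reload_value = len(modes) - 1   # initial value set by the decoder 71
    count = reload_value
    rows = []
    for i in range(cycles):
        mode = modes[i % len(modes)]
        if count == 0:
            destination = "register 62"
        else:
            destination = "feedback to multiplexer 78"
        rows.append((i + 2, mode, count, destination))
        # Reload when the counter has reached 0, otherwise count down by 1.
        count = reload_value if count == 0 else count - 1
    return rows


for row in trace_cycles((1, 3), 4):
    print(row)
```

For the modes “1” and “3”, the trace alternates between a fed-back level interpolation result and a modulated color value written to the register 62, as in the description above.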
[0149]
As described above, in this operation mode, the LIP circuit 61 can be shared by the level interpolation process of the MIPMAP process and the modulation process. Therefore, the circuit scale can be reduced as compared with the case where the level interpolation processing circuit and the modulation processing circuit are connected in series. In this embodiment, the 4-point neighborhood interpolation process of the MIPMAP process is performed over 2 clock cycles in one system, and the circuit scale related to that process is the same as the conventional one.
Further, in this operation mode, the LIP circuit 61 performs the modulation process in the idle time during which the level interpolation process is not performed, so that the processing time is not prolonged.
[0150]
Second operation mode
In this operation mode, a case will be described in which the LIP circuit 61 is shared by the level interpolation process of the MIPMAP process, the modulation process, and the fogging process.
In this case, the main processor 4 shown in FIG. 1 or the main controller (not shown) in the texture engine circuit 12 outputs function mode data FMD indicating the modes “1”, “3”, and “8” in order.
Further, the decoder 71 sets “2” as the initial value of the count value of the counter 72, and sets “2” in the counter 72 every time the count value of the counter 72 becomes “0”.
[0151]
Specifically, for example, in the first clock cycle, the 4-point neighborhood interpolation data C_pixel0 from the LIP circuit 54 shown in FIG. is written to the register 74.
Then, “2” is set as the count value of the counter 72.
[0152]
Next, in the second clock cycle, the function mode data FMD indicates the mode “1”, and the 4-point neighborhood interpolation data C_pixel0 is read from the register 74 and output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88. At the same time, the four-point neighborhood interpolation data C_pixel1 from the LIP circuit 54 shown in FIG. is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
At the same time, the data COE_mipmap from the reduction ratio calculation circuit 50 shown in FIG. is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
Then, in the LIP circuit 61, the calculation shown in the above equation (9) is performed, and the level interpolation data C_pixel is calculated.
Since the counter 72 has the count value “2”, the level interpolation data C_pixel is fed back to the multiplexer 78.
Then, the count value of the counter 72 is decreased to “1”.
[0153]
Next, in the third clock cycle, the function mode data FMD indicates the mode “3”, and the 4-point neighborhood interpolation data C relating to the next pixel from the LIP circuit 54 shown in FIG._pixel0Is written to the register 74.
At the same time, the level interpolation data C calculated in the second clock cycle_pixel(C in formula (10)_texCorresponds to IN of the LIP circuit 61 via the multiplexer 78 and the register 88._AOutput to the terminal.
At the same time, (R, G, B) data S11b (fragment color value C) included in the DDA data S11 from the triangle DDA circuit 11_flag) Of the LIP circuit 61 through the multiplexer 76 and the register 86._coeffIs output.
In the LIP circuit 54, the calculation shown in the above equation (10) is performed, and the color value C after the modulation processing is performed._modIs calculated.
And the color value C_modThe counter 72 is fed back to the multiplexer 78 because the count value is “1”.
Then, the count value of the counter 72 is decreased to “0”.
In addition, an instruction to hold the four-point neighborhood interpolation data C_pixel0 for one clock cycle is output to the read circuit 51 shown in FIG. 4, and an instruction to hold the fragment data C_flag1 for one clock cycle is output to the triangle DDA circuit 11 shown in FIG.
[0154]
Next, in the fourth clock cycle, if the function mode data FMD indicates mode “8” and the fog enable data FED is the logical value “1”, the color value C_mod calculated in the third clock cycle (corresponding to C_flag in equation (13)) is output to the IN_A terminal of the LIP circuit 61 via the multiplexer 78 and the register 88.
At the same time, for example, the fog data C_fog read from a fog register (not shown) is output to the IN_B terminal of the LIP circuit 61 via the multiplexer 77 and the register 87.
At the same time, for example, the fogging coefficient COE_fog included in the DDA data S11 from the triangle DDA circuit 11 is output to the IN_coeff terminal of the LIP circuit 61 via the multiplexer 76 and the register 86.
In the LIP circuit 54, the calculation shown in the above equation (13) is performed, and the color value C_fogged after the fogging process is calculated.
Since the count value of the counter 72 is “0”, the color value C_fogged is output from the OUT terminal of the LIP circuit 61 to the register 62.
The color value C_fogged is read from the register 62 and output to the memory I/F circuit 13 at the subsequent stage as pixel data S12.
Thereafter, the processes of the second, third, and fourth clock cycles described above are repeated in turn.
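The three-cycle sequence above can be sketched in software as a single interpolation unit reused under control of a counter. This is only an illustrative model: the interpolation form `coeff * A + (1 - coeff) * B`, the use of 0 on IN_B during modulation, and the function names `lip` and `shade_pixel` are assumptions, since equations (9), (10), and (13) are not reproduced in this passage.

```python
def lip(in_a, in_b, in_coeff):
    """Linear interpolation unit (assumed form): coeff * A + (1 - coeff) * B."""
    return in_coeff * in_a + (1.0 - in_coeff) * in_b


def shade_pixel(c_pixel0, c_pixel1, coe_mipmap, c_frag, c_fog, coe_fog):
    """Run clock cycles 2 to 4 for one pixel on one shared interpolation unit."""
    # (IN_B, IN_coeff) presented in each cycle; IN_A is the fed-back value.
    schedule = [
        (c_pixel1, coe_mipmap),  # cycle 2: MIPMAP level interpolation
        (0.0, c_frag),           # cycle 3: modulation (IN_B = 0 is an assumption)
        (c_fog, coe_fog),        # cycle 4: fogging
    ]
    counter = 2          # counter 72 is set to 2 at the end of the first cycle
    in_a = c_pixel0      # read from register 74 in cycle 2
    for in_b, in_coeff in schedule:
        out = lip(in_a, in_b, in_coeff)
        if counter > 0:
            in_a = out   # result is fed back to multiplexer 78
            counter -= 1
        else:
            return out   # counter is 0: result goes to register 62


result = shade_pixel(0.75, 0.25, 0.5, 0.5, 1.0, 0.5)
```

With these inputs the level interpolation gives 0.5, the modulation gives 0.25, and the fogged value is 0.625, all from the same `lip` function.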
[0155]
As described above, in this operation mode, the LIP circuit 61 is shared by the level interpolation process of the MIPMAP process, the modulation process, and the fogging process. Therefore, the number of gates can be reduced and the circuit scale can be reduced as compared with the case where a level interpolation processing circuit and a modulation processing circuit are connected in series.
[0156]
Thus, in the texture engine circuit 12, the circuit scale can be reduced by sharing the LIP circuit 61 shown in FIG. 10 between the level interpolation process of the MIPMAP process and the texture function process. In this case, if the LIP circuit 61 is shared by the level interpolation process and one texture function process, the processing time is not prolonged.
[0157]
In the texture engine circuit 12, the LIP circuits 52, 53, and 61 shown in FIG. 4 perform the calculation using the correction data F as shown in FIG. 13. Therefore, in a system in which a value whose bits are all the logical value “1” is regarded as “1”, the calculation of the above equation (15) when COE is “1.0” can be performed accurately without substantially increasing the circuit scale.
That is, if one bit were added to make 9 bits and “0x100” were regarded as “1” in order to obtain a proper result without correction, the number of gates of the pipeline registers in the preceding stage would increase, and the overall gate count would grow. In this embodiment, however, the number of bits need not be increased, and this problem does not occur.
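The rounding problem described here can be illustrated with 8-bit integers. The sketch below is not the correction data F of FIG. 13, which is not reproduced in this passage; the correction term used (adding the value once before truncation) is a hypothetical stand-in chosen only to show that an 8-bit coefficient of 0xFF can behave as exactly 1.0 without widening the coefficient to 9 bits. The function names are likewise assumptions.

```python
def scale_naive(value, coe):
    """8-bit value times 8-bit coefficient, truncated back to 8 bits."""
    return (value * coe) >> 8


def scale_corrected(value, coe):
    """Same product with a hypothetical correction term added before truncation."""
    return (value * coe + value) >> 8


naive = scale_naive(200, 0xFF)          # coefficient meant as 1.0, result falls short
corrected = scale_corrected(200, 0xFF)  # result equals the input value
```

Without a correction, multiplying 200 by 0xFF and truncating yields 199, while the corrected form returns 200 and still returns 0 for a coefficient of 0.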
[0158]
Note that the texture engine circuit 12 directly uses the (R, G, B) data read from the SRAM 17 or the texture buffer 20 in the case of the full color system. On the other hand, in the case of the index color system, the texture engine circuit 12 reads a color lookup table (CLUT) created in advance from the texture CLUT buffer 23, transfers it to and stores it in the built-in SRAM, and uses this color lookup table to obtain the (R, G, B) data corresponding to the color index read from the SRAM 17 or the texture buffer 20.
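The difference between the two color systems can be sketched as follows. The table contents, the list-based representation, and the function name `apply_clut` are illustrative assumptions, not details taken from the embodiment.

```python
def apply_clut(indices, clut):
    """Index color: map each color index to the (R, G, B) entry of the table."""
    return [clut[i] for i in indices]


# Hypothetical 4-entry color lookup table held in the built-in SRAM.
clut = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

# Index color system: the texture memory holds indices, not colors.
texels = apply_clut([1, 3, 0], clut)

# Full color system: the (R, G, B) data read from memory is used as is.
full_color_texels = [(255, 0, 0), (0, 0, 255), (0, 0, 0)]
```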
[0159]
Memory I/F circuit 13
The memory I/F circuit 13 compares the z data corresponding to the pixel data S12 input from the texture engine circuit 12 with the z data stored in the z buffer 22, and determines whether the image drawn by the input pixel data S12 is positioned nearer (on the viewpoint side) than the image previously written to the display buffer 21. If it is nearer, the z data stored in the z buffer 22 is updated with the z data corresponding to the pixel data S12.
[0160]
CRT controller circuit 14
The CRT controller circuit 14 generates an address to be displayed on a CRT (not shown) in synchronization with the supplied horizontal and vertical synchronization signals, and outputs a request to read display data from the display buffer 21 to the memory I/F circuit 13. In response to this request, the memory I/F circuit 13 reads display data from the display buffer 21 in chunks of a fixed size. The CRT controller circuit 14 incorporates a FIFO (First In First Out) circuit that stores the display data read from the display buffer 21, and outputs RGB index values to the RAMDAC circuit 15 at regular time intervals.
[0161]
RAMDAC circuit 15
The RAMDAC circuit 15 stores R, G, B data corresponding to each index value, and transfers the digital R, G, B data corresponding to the RGB index value input from the CRT controller circuit 14 to a D/A converter to generate R, G, B data in analog format. The RAMDAC circuit 15 outputs the generated R, G, B data to the CRT.
[0162]
Realization method of rendering circuit 5
Hereinafter, a preferred configuration, arrangement, and wiring method of the logic circuit of the rendering circuit 5 and the secondary memory including the DRAM 16 and the SRAM 17, which are mounted together in the same semiconductor chip according to the present embodiment, will be described with reference to FIGS.
[0163]
In the present embodiment, for example, as shown in FIG. 16, the DRAM 16 is divided into four DRAM modules 1471 to 1474, and the memory I/F circuit 13 is provided with memory controllers 1441 to 1444 corresponding to the DRAM modules 1471 to 1474 and a distributor 1445 for distributing data to these memory controllers 1441 to 1444.
Then, as shown in FIG. 16, the memory I/F circuit 13 arranges the pixel data across the DRAM modules 1471 to 1474 so that adjacent portions of the display area are assigned to different DRAM modules.
As a result, when a plane such as a triangle is drawn, its pixels can be processed simultaneously, so that the utilization of each DRAM module is very high.
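One way to place adjacent display positions in different modules is a 2 x 2 interleave, sketched below. The exact assignment of FIG. 16 is not reproduced in this passage, so the mapping and the function name `dram_module_for_pixel` are assumptions made for illustration.

```python
def dram_module_for_pixel(x, y):
    """Return 0..3, standing for DRAM modules 1471..1474 (assumed 2 x 2 interleave)."""
    return (x & 1) | ((y & 1) << 1)


# Any 2 x 2 group of neighboring pixels touches all four modules once,
# so the four modules can be accessed at the same time.
block = {dram_module_for_pixel(x, y) for y in (10, 11) for x in (20, 21)}
```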
[0164]
The drawing process described above ultimately comes down to accesses to individual pixels. Ideally, therefore, processing the pixels independently and in parallel raises the drawing performance in proportion to the number of parallel processes.
Therefore, the memory I / F circuit 13 that constitutes the memory system in the three-dimensional computer graphics system 1 is also configured to perform simultaneous parallel processing.
[0165]
In the graphic drawing process, as described above, the processing circuit that writes pixels needs to exchange data with the DRAM frequently.
Therefore, in the present embodiment, as shown in FIG. 17, the pixel processing modules 1446, 1447, 1448, and 1449, which are functional blocks that control pixel processing, are physically separated from the memory controller, and these pixel processing modules 1446, 1447, 1448, and 1449 are arranged in the vicinity of (close to) the corresponding DRAM modules 1471, 1472, 1473, and 1474.
[0166]
The pixel processing modules 1446, 1447, 1448, and 1449 perform all of the read/modify/write processing of the (R, G, B) color and, for hidden surface processing, all of the work of comparing the previously drawn depth data with the depth of the data about to be drawn and writing back according to the result.
Because all of these operations are performed by the pixel processing modules 1446, 1447, 1448, and 1449, the exchange with the DRAM can be completed within modules that have a short wiring length to the DRAM modules 1471, 1472, 1473, and 1474.
For this reason, even if the number of wirings to the DRAM, that is, the number of transfer bits, is increased, the ratio of the wiring to the area can be kept small, so that the operation speed can be improved and the wiring area can be reduced.
[0167]
The inter-DRAM control module 1450, which includes the distributor and the like, is more closely related to each DRAM module (DRAM + pixel processing) than the drawing processes, namely the DDA setup calculation of the DDA setup circuit 10, the triangle DDA calculation of the triangle DDA circuit 11, and the texture pasting of the texture engine circuit 12, or the display processing by the CRT control circuit 14, and it has the largest number of signal lines to the DRAM modules 1471, 1472, 1473, and 1474.
Therefore, the inter-DRAM control module 1450 is arranged near the center of each DRAM module 1471, 1472, 1473, 1474 so that the longest wiring length is as short as possible.
[0168]
As shown in FIG. 17, the positions of the signal input/output terminals that connect the pixel processing modules 1446, 1447, 1448, and 1449 to the inter-DRAM control module 1450 are adjusted in each pixel processing module so that the wiring between that pixel processing module and the inter-DRAM control module 1450 is optimal (shortest).
[0169]
Specifically, in the pixel processing module 1446, an input/output terminal T1446a is formed on the right end side of the lower edge of the module in FIG. The input/output terminal T1446a is arranged to face the input/output terminal T1450a formed on the left end side of the upper edge of the inter-DRAM control module 1450, and the two terminals T1446a and T1450a are connected over the shortest distance.
In the pixel processing module 1446, an input/output terminal T1446b for connection to the DRAM module 1471 is formed at the center of the upper edge in FIG.
[0170]
In the pixel processing module 1447, an input/output terminal T1447a is formed on the left end side of the lower edge of the module in FIG. The input/output terminal T1447a is arranged to face the input/output terminal T1450b formed on the right end side of the upper edge of the inter-DRAM control module 1450, and the two terminals T1447a and T1450b are connected over the shortest distance.
In the pixel processing module 1447, an input/output terminal T1447b for connection to the DRAM module 1472 is formed at the center of the upper edge in FIG.
[0171]
In the pixel processing module 1448, an input/output terminal T1448a is formed on the right end side of the upper edge of the module in FIG. The input/output terminal T1448a is arranged to face the input/output terminal T1450c formed on the left end side of the lower edge of the inter-DRAM control module 1450, and the two terminals T1448a and T1450c are connected over the shortest distance.
In the pixel processing module 1448, an input/output terminal T1448b for connection to the DRAM module 1473 is formed at the center of the lower edge in FIG.
[0172]
In the pixel processing module 1449, an input/output terminal T1449a is formed on the left end side of the upper edge of the module in FIG. The input/output terminal T1449a is arranged to face the input/output terminal T1450d formed on the right end side of the lower edge of the inter-DRAM control module 1450, and the two terminals T1449a and T1450d are connected over the shortest distance.
In the pixel processing module 1449, an input/output terminal T1449b for connection to the DRAM module 1474 is formed at the center of the lower edge in FIG.
[0173]
Note that, for processing that cannot meet the required processing speed even when the paths from the DRAM modules 1471, 1472, 1473, and 1474 through the pixel processing modules 1446, 1447, 1448, and 1449 to the inter-DRAM control module 1450 have the optimum length as described above, pipeline processing of at least one stage delimited by registers can be adopted, for example, so that the desired processing speed is achieved.
[0174]
Further, the DRAM modules 1471 to 1474 according to the present embodiment are configured as shown in FIG. Here, the DRAM module 1471 is described as an example, but the other DRAM modules 1472 to 1474 have the same configuration, and thus the description thereof is omitted.
[0175]
As shown in FIG. 18, the DRAM module 1471 includes a DRAM core 1480 in which memory cells are arranged in a matrix and accessed through a word line and a bit line (not shown) selected based on a row address RA and a column address CA, a decoder 1481, a sense amplifier 1482, a column decoder 1483, and a secondary memory 1484 that is composed of an SRAM or the like and has a function similar to a so-called cache memory.
[0176]
As in the present embodiment, for each DRAM module, the pixel processing modules 1446 to 1449 as functional blocks that control pixel processing in graphics rendering and the secondary memory 1484 of the DRAM module are arranged in proximity to the DRAM module.
In this case, the so-called long side direction of the DRAM is arranged to be the column direction of the DRAM core 1480.
[0177]
In a random read in the configuration of FIG. 18, a control signal and the necessary address signal S1446 are supplied from the pixel processing module 1446 to the DRAM module 1471 via the address control path, the row address RA and the column address CA are generated based on them, and the DRAM data corresponding to the desired row is read through the sense amplifier 1482.
The data that has passed through the sense amplifier 1482 is narrowed down by the column decoder in accordance with the desired column address CA, and the DRAM data D1471 corresponding to the desired row and column is transferred from the random access port to the pixel processing module 1446 via the path.
[0178]
When data is written to the secondary memory, a control signal and the necessary address signal S1446 are supplied from the pixel processing module 1446 to the DRAM module 1471 via the address control path, only a row address is generated based on them, and one row of data is written at a time from the DRAM 16 to the secondary memory 1484 comprising the SRAM 17 and the like.
In this case, since the so-called long side direction of the DRAM is arranged to be the column direction of the DRAM core 1480, the number of bits of one row of data that can be loaded into the secondary memory 1484 at a time merely by specifying the row address is far larger than when the long side is arranged in the row direction.
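The two access paths described above, a random read that uses both a row and a column address and a whole-row load into the secondary memory that uses only a row address, can be modeled in a few lines. The class name, method names, and array sizes are illustrative assumptions and do not describe the actual circuit of FIG. 18.

```python
class DramModule:
    """Toy model of one DRAM module with a row-wide secondary memory."""

    def __init__(self, rows, cols):
        self.core = [[0] * cols for _ in range(rows)]  # DRAM core 1480
        self.secondary = [0] * cols                    # secondary memory 1484

    def random_read(self, ra, ca):
        """Random access port: select a row, then pick one column from it."""
        sensed_row = self.core[ra]   # the sense amplifier holds the whole row
        return sensed_row[ca]        # the column decoder narrows it to one entry

    def load_secondary(self, ra):
        """Row address only: copy the whole sensed row into the secondary memory."""
        self.secondary = list(self.core[ra])


module = DramModule(rows=4, cols=8)
module.core[2] = [10, 11, 12, 13, 14, 15, 16, 17]
pixel = module.random_read(ra=2, ca=5)
module.load_secondary(ra=2)
```

In this model a wider row (more columns) directly increases how much data one `load_secondary` call transfers, which is the benefit the text attributes to making the long side the column direction.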
[0179]
In addition, when the data D1484 is read from the secondary memory (SRAM) 1484 to the texture engine circuit 143 serving as the texture processing module, a control signal and the necessary address signal are supplied from the texture engine circuit 143 to the DRAM via the address control path, and the corresponding data D1484 is transferred to the texture engine circuit 143 via the data path.
[0180]
Further, in the present embodiment, as shown in FIG. 18, the pixel processing module and the secondary memory of the DRAM module are arranged close to each other on the same side of the long side of the DRAM module.
As a result, the same sense amplifier can be used both for data going to the pixel processing module and for data going to the secondary memory of the DRAM module, so that the increase in the area of the DRAM core 1480 caused by providing two ports is kept to a minimum.
[0181]
Hereinafter, the overall operation of the three-dimensional computer graphic system 1 will be described.
Polygon rendering data S4 is output from the main processor 4 to the DDA setup circuit 10 via the main bus 6, and the DDA setup circuit 10 generates variation data S10 indicating the difference between the sides of the polygon and the horizontal direction.
The variation data S10 is output to the triangle DDA circuit 11, and the triangle DDA circuit 11 calculates the linearly interpolated (z, R, G, B, COE_blend, s, t, q, COE_fog) data at each pixel inside the polygon. The calculated (z, R, G, B, COE_blend, s, t, q, COE_fog) data and the (x, y) data of each vertex of the polygon are output from the triangle DDA circuit 11 to the texture engine circuit 12 as DDA data S11.
[0182]
Next, a representative point is determined by the representative point determination circuit 301 of the texture engine circuit 12 shown in FIG. 4, and the q data S302 is selected by the representative point determination circuit 301 based on the representative point instruction data S301 indicating the representative point.
Next, in the reduction ratio calculation circuit 50 shown in FIGS. 4 and 8, the reduction ratio LOD is calculated using the q data S302 and the maxe data S11c.
Next, in the second buffer memory 51 of the texture engine circuit 12 shown in FIG. 4, the (s, t, q) data S11a₁ to S11a₈ included in the DDA data S11 are used to perform the operation of dividing the s data by the q data and the operation of dividing the t data by the q data.
The division results “s/q” and “t/q” are multiplied by the texture sizes USIZE and VSIZE, respectively, to generate texture coordinate data (u, v).
Next, a read request including the generated texture coordinate data (u, v) is output from the second buffer memory 51, and the (R, G, B) data S17 stored in the DRAM 16 or the SRAM 17 is read out via the memory I/F circuit 13.
At this time, as described above, the above-described MIPMAP processing and texture function processing are performed using the configuration shown in FIGS. 4 and 10, and pixel data S12 is generated.
The pixel data S12 is output from the texture engine circuit 12 to the memory I / F circuit 13.
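The texture coordinate step in this flow, dividing s and t by q and then scaling by the texture size, can be written out as a short sketch. The function name and the use of floating point are assumptions; the hardware data formats are not specified in this passage.

```python
def texture_coords(s, t, q, usize, vsize):
    """Return (u, v) = (s / q * USIZE, t / q * VSIZE) for one pixel."""
    u = (s / q) * usize
    v = (t / q) * vsize
    return u, v


u, v = texture_coords(s=1.0, t=0.5, q=2.0, usize=256, vsize=128)
```

Here s/q is 0.5 and t/q is 0.25, so the pixel samples the texture at (u, v) = (128.0, 32.0).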
[0183]
Then, the memory I/F circuit 13 compares the z data corresponding to the pixel data S12 input from the texture engine circuit 12 with the z data stored in the z buffer 22, and determines whether the image drawn by the input pixel data S12 is positioned nearer (on the viewpoint side) than the image previously written to the display buffer 21. If it is nearer, the pixel data S12 is written to the display buffer 21, and the z data stored in the z buffer 22 is updated with the corresponding z data.
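The depth comparison and conditional write can be sketched as below. The convention that a smaller z value is nearer to the viewpoint, the nested-list buffers, and the function name `write_pixel` are assumptions for illustration only.

```python
def write_pixel(x, y, z, color, z_buffer, display_buffer):
    """Write the pixel only if it is nearer than what is already stored."""
    if z < z_buffer[y][x]:          # assumption: smaller z means nearer
        z_buffer[y][x] = z          # update z buffer 22
        display_buffer[y][x] = color  # write to display buffer 21
        return True
    return False                    # hidden: both buffers stay unchanged


FAR = float("inf")
z_buffer = [[FAR, FAR], [FAR, FAR]]
display_buffer = [[None, None], [None, None]]

first = write_pixel(0, 1, 5.0, (255, 0, 0), z_buffer, display_buffer)
hidden = write_pixel(0, 1, 9.0, (0, 255, 0), z_buffer, display_buffer)
```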
[0184]
The present invention is not limited to the first embodiment described above.
For example, the above-described embodiment illustrates the case where the LIP circuit 61 operates based on the function mode data FMD specifying the modes “1” to “8” shown in FIG. 5, but the LIP circuit 61 may also be made to perform a blending process.
[0185]
The contents and number of processes sharing the LIP circuit 61 are arbitrary. For example, the LIP circuit 61 may perform a decal process or an alpha blending process as the texture function process.
[0186]
Second embodiment
The three-dimensional computer graphic system 501 of this embodiment is basically the same as the three-dimensional computer graphic system 1 of the first embodiment described above, but is characterized in that pipeline processing is performed in the texture engine circuit 12 and the memory I/F circuit 13 shown in FIG.
Hereinafter, of the functions of the components of the three-dimensional computer graphic system 501, differences from the three-dimensional computer graphic system 1 of the first embodiment will be described.
[0187]
DDA setup circuit 10
Further, the DDA setup circuit 10 determines 1-bit valid instruction data val indicating whether or not each of the eight pixels to be processed simultaneously is positioned inside the triangle to be processed. Specifically, the valid instruction data val is “1” for a pixel located inside the triangle and “0” for a pixel located outside the triangle.
The DDA setup circuit 10 outputs the calculated variation data S10 and the valid instruction data val of each pixel to the triangle DDA circuit 11.
[0188]
Triangle DDA circuit 11
The triangle DDA circuit 11 uses the variational data S10 input from the DDA setup circuit 10 to obtain linearly interpolated (z, R, G, B, α, s, t, q) data of each pixel inside the triangle. calculate.
The triangle DDA circuit 11 outputs the (x, y) data of each pixel and the (z, R, G, B, α, s, t, q, val) data of the pixel at those (x, y) coordinates to the texture engine circuit 12 as DDA data (interpolated data) S11.
In the present embodiment, the triangle DDA circuit 11 outputs to the texture engine circuit 12 DDA data S11 for eight pixels located in a rectangle that performs processing in parallel.
[0189]
Here, (z, R, G, B, α, s, t, q, val) data of the DDA data S11 is 161-bit data as shown in FIG.
Specifically, R, G, B, and α data are each 8 bits, z, s, t, and q data are each 32 bits, and val data is 1 bit.
Hereinafter, among the (z, R, G, B, α, s, t, q, val) data for the 8 pixels that are processed in parallel, the val data are referred to as val data S220₁ to S220₈, and the (z, R, G, B, α, s, t, q) data are referred to as operand data S221₁ to S221₈.
That is, the triangle DDA circuit 11 outputs the (x, y) data for 8 pixels, the val data S220₁ to S220₈, and the operand data S221₁ to S221₈ to the texture engine circuit 12 shown in FIG.
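The 161-bit total follows from the field widths given above: four 8-bit fields, four 32-bit fields, and one 1-bit flag. The sketch below packs and unpacks one pixel's data to confirm the arithmetic. The field order is an assumption, since the layout figure is not reproduced here, and the 32-bit fields are treated as raw unsigned bit patterns.

```python
# (name, width in bits); the ordering is assumed for illustration.
FIELDS = [("z", 32), ("R", 8), ("G", 8), ("B", 8), ("alpha", 8),
          ("s", 32), ("t", 32), ("q", 32), ("val", 1)]
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(values):
    """Concatenate the fields into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        field = values[name]
        if not 0 <= field < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | field
    return word


def unpack(word):
    """Split the integer back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


pixel = {"z": 0x80000000, "R": 255, "G": 128, "B": 0, "alpha": 64,
         "s": 1, "t": 2, "q": 3, "val": 1}
word = pack(pixel)
```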
[0190]
Texture engine circuit 12 and memory I/F circuit 13
The processes performed by the texture engine circuit 12 (calculating the reduction ratio LOD using the DDA data S11, calculating “s/q” and “t/q”, calculating the texture coordinate data (u, v), and reading the (R, G, B, α) data from the texture buffer 20) and the z comparison process performed by the memory I/F circuit 13 are executed sequentially in a pipeline manner by the operation blocks 200, 201, 202, 204, 205 shown in FIG.
Here, each of the operation blocks 199, 200, 201, 202, 204, and 205 incorporates 8 operation sub-blocks, and performs operation processing for 8 pixels in parallel.
Here, the texture engine circuit 12 includes calculation blocks 199, 200, 201, and 202, and the memory I / F circuit 13 includes a calculation block 204.
The arithmetic block 199 corresponds to the reduction ratio arithmetic circuit 50, the representative point determination circuit 301, and the q data selection circuit 302 shown in FIG. 4, and the arithmetic blocks 200, 201, and 202 correspond to the readout circuit 51 shown in FIG. The arithmetic block 203 corresponds to the LIP circuits 52, 53, and 54 and the LIP / texture function circuit 55 shown in FIG.
In the LIP/texture function circuit 55, for example, one of the level interpolation processing of the MIPMAP processing and texture function processes such as modulation processing, decal processing, highlight processing, fogging processing, texture blending processing, and alpha blending processing is selectively performed.
[0191]
[Calculation block 199]
The calculation block 199 performs processing corresponding to the reduction ratio calculation circuit 304, the representative point determination circuit 301, and the q data selection circuit 302 described in the first embodiment, calculates the reduction ratio LOD of the texture data, and outputs it to the calculation block 200.
The reduction ratio LOD is passed in sequence to the calculation blocks 200, 201, 202, and 203.
The calculation block 199 outputs the DDA data S11 input from the triangle DDA circuit 11 to the calculation block 200 at the subsequent stage.
The calculation block 199 always operates regardless of the values indicated by the val data S220₁ to S220₈.
[0192]
[Calculation block 200]
The arithmetic block 200 performs an operation of dividing the s data by the q data and an operation of dividing the t data by the q data using the (s, t, q) data included in the DDA data S11.
As shown in FIG. 20, the calculation block 200 incorporates eight calculation sub-blocks 200₁ to 200₈.
Here, the calculation sub-block 200₁ receives the operand data S221₁ and the val data S220₁ and, when the val data S220₁ is “1”, that is, indicates that the pixel is valid, calculates “s/q” and “t/q” and outputs the calculation result as the division result S200₁ to the calculation sub-block 201₁ of the calculation block 201.
[0193]
When the val data S220₁ is “0”, that is, indicates that the pixel is invalid, the calculation sub-block 200₁ does not perform the calculation, and either does not output the division result S200₁ or outputs a division result S200₁ indicating a predetermined provisional value to the calculation sub-block 201₁ of the calculation block 201.
The calculation sub-block 200₁ also outputs the val data S220₁ to the calculation sub-block 201₁ at the subsequent stage.
Note that the calculation sub-blocks 200₂ to 200₈ perform the same operation as the calculation sub-block 200₁ on their corresponding pixels, and output the division results S200₂ to S200₈ and the val data S220₂ to S220₈ to the calculation sub-blocks 201₂ to 201₈ of the calculation block 201 at the subsequent stage.
[0194]
FIG. 21 is a diagram showing the calculation sub-block 200₁.
Note that all the arithmetic sub-blocks shown in FIG. 3 basically have the configuration shown in FIG.
As shown in FIG. 21, the calculation sub-block 200₁ includes a clock enabler 210₁, a data flip-flop 222, a processor element 223, and a flag flip-flop 224.
The clock enabler 210₁ receives the val data S220₁ at a timing based on the system clock signal S225 and detects the level of the val data S220₁. When the val data S220₁ is “1”, the clock enabler 210₁ generates, for example, a pulse in the clock signal S210₁; when it is “0”, the clock enabler 210₁ does not generate a pulse in the clock signal S210₁.
[0195]
When the data flip-flop 222 detects a pulse of the clock signal S210₁, it captures the operand data S221₁ and outputs it to the processor element 223.
The processor element 223 performs the above-described division using the input operand data S221₁, and outputs the division result S200₁ to the data flip-flop 222 of the calculation sub-block 201₁.
The flag flip-flop 224 receives the val data S220₁ at a timing based on the system clock signal S225, and outputs it to the flag flip-flop 224 of the calculation sub-block 201₁ of the calculation block 201 at the subsequent stage.
Note that the system clock signal S225 shown in FIG. 21 is supplied from the arithmetic block 199 shown in FIG. to the clock enablers and the flag flip-flops 224 of the calculation sub-blocks 200₁ to 200₈, 201₁ to 201₈, 202₁ to 202₈, and 204₁ to 204₈.
That is, the processes in the calculation sub-blocks 200₁ to 200₈, 201₁ to 201₈, 202₁ to 202₈, and 204₁ to 204₈ are performed synchronously, and the eight calculation sub-blocks built into the same calculation block perform their processes in parallel.
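The gating behavior of a single sub-block can be mimicked in software: the valid flag is latched on every system clock, while the operand is latched and processed only when the flag is 1. The class below is a behavioral sketch under that reading; the class name, the `evaluations` counter, and the use of `None` for "no result" are assumptions and do not model real flip-flop timing.

```python
class SubBlock:
    """Behavioral model of one sub-block with a clock enabler."""

    def __init__(self, processor):
        self.processor = processor  # stands in for processor element 223
        self.data_ff = None         # stands in for data flip-flop 222
        self.flag_ff = 0            # stands in for flag flip-flop 224
        self.evaluations = 0        # how often the processor element ran

    def clock(self, operand, val):
        """One system clock: returns (result or None, val for the next stage)."""
        self.flag_ff = val          # the flag is latched on every system clock
        if val == 1:                # clock enabler generates a pulse
            self.data_ff = operand  # data flip-flop captures the operand
            self.evaluations += 1
            return self.processor(self.data_ff), self.flag_ff
        return None, self.flag_ff   # no pulse: the processor element stays idle


def divide(operand):
    s, t, q = operand
    return s / q, t / q


sub_block = SubBlock(divide)
valid_out = sub_block.clock((1.0, 2.0, 4.0), 1)
invalid_out = sub_block.clock((9.0, 9.0, 0.0), 0)  # q = 0 is never divided
```

Because the second operand is never latched, the division by zero it would cause is never attempted, and the processor element has run only once.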
[0196]
[Calculation block 201]
The calculation block 201 has calculation sub-blocks 201₁ to 201₈, and multiplies the division results S200₁ to S200₈ input from the calculation block 200 by the texture sizes USIZE and VSIZE, respectively, to generate texture coordinate data (u, v).
The calculation sub-blocks 201₁ to 201₈ perform the calculation only when the level of the val data S220₁ to S220₈ detected by the clock enablers 211₁ to 211₈ is “1”, and output the texture coordinate data S201₁ to S201₈ as the calculation results to the calculation sub-blocks 202₁ to 202₈ of the calculation block 202.
[0197]
[Calculation block 202]
The calculation block 202 has calculation sub-blocks 202₁ to 202₈, outputs a read request including the texture coordinate data (u, v) generated by the calculation block 201 to the SRAM 17 or the DRAM 16 via the memory I/F circuit 13, and reads the texture data stored in the SRAM 17 or the texture buffer 20 via the memory I/F circuit 13, thereby obtaining the (R, G, B, α) data S17 stored at the texture address corresponding to the (u, v) data.
The texture buffer 20 stores texture data corresponding to a plurality of reduction ratios such as MIPMAP (multi-resolution texture). Here, which reduction rate of texture data is used is determined in units of the triangles using a predetermined algorithm.
The SRAM 17 stores a copy of the texture data stored in the texture buffer 20.
The calculation sub-blocks 202₁ to 202₈ perform the read process only when the level of the val data S220₁ to S220₈ detected by the clock enablers 212₁ to 212₈ is “1”, and output the read (R, G, B, α) data S17 as (R, G, B, α) data S202₁ to S202₈ to the calculation sub-blocks 203₁ to 203₈ of the calculation block 203.
[0198]
The texture engine circuit 12 directly uses the (R, G, B, α) data read from the texture buffer 20 in the case of the full color system. On the other hand, in the case of the index color system, the texture engine circuit 12 reads a color lookup table (CLUT) created in advance from the texture CLUT buffer 23, transfers it to and stores it in the built-in SRAM, and uses this color lookup table to obtain the (R, G, B) data corresponding to the color index read from the texture buffer 20.
[0199]
[Calculation block 203]
The calculation block 203 has calculation sub-blocks 203₁ to 203₈ and, using the LIP circuits 52, 53, 54 and the LIP/texture function circuit 55 shown in FIG. 4, selectively performs, for example, one of the level interpolation processing of the MIPMAP processing and texture function processes such as modulation processing, decal processing, highlight processing, fogging processing, texture blending processing, and alpha blending processing.
[0200]
Then, the calculation block 203 outputs the (R, G, B, α) data S203₁ to S203₈ as the processing results to the calculation block 204.
The calculation sub-blocks 203₁ to 203₈ perform the texture function process and output the (R, G, B, α) data S203₁ to S203₈ only when the level of the val data S220₁ to S220₈ detected by the clock enablers 213₁ to 213₈ is “1”.
[0201]
[Calculation block 204]
The calculation block 204 has calculation sub-blocks 204₁ to 204₈, and performs a z comparison on the input (R, G, B, α) data S203₁ to S203₈ using the z data stored in the z buffer 22. When the image drawn by the (R, G, B, α) data S203₁ to S203₈ is positioned nearer (on the viewpoint side) than the image previously drawn in the display buffer 21, the calculation block 204 updates the z buffer 22 and writes the (R, G, B, α) data S203₁ to S203₈ to the display buffer 21 as (R, G, B, α) data S204₁ to S204₈.
The calculation sub-blocks 204₁ to 204₈ perform the above-described z comparison and the writing to the display buffer 21 only when the level of the val data S220₁ to S220₈ detected by the clock enablers 214₁ to 214₈ is “1”.
Note that the memory I / F circuit 13 accesses the DRAM 16 simultaneously for 16 pixels.
[0202]
The overall operation of the three-dimensional computer graphic system 501 will be described below.
Polygon rendering data S4 is output from the main processor 4 to the DDA setup circuit 10 via the main bus 6, and the DDA setup circuit 10 generates variation data S10 indicating the difference between the sides of the triangle and the horizontal direction.
The variation data S10 is output to the triangle DDA circuit 11, where the linearly interpolated (z, R, G, B, α, s, t, q) data at each pixel inside the triangle is calculated. Then, the calculated (z, R, G, B, α, s, t, q) data and the (x, y) data of each vertex of the triangle are output from the triangle DDA circuit 11 to the texture engine circuit 12 as DDA data S11.
[0203]
Next, using the DDA data S11, the texture engine circuit 12 and the memory I/F circuit 13 perform the calculation of the reduction ratio LOD, the calculation of “s/q” and “t/q”, the calculation of the texture coordinate data (u, v), the reading of the (R, G, B, α) data as digital data from the texture buffer 20, the texture function processing, and the z comparison processing. These processes are executed sequentially in a pipeline manner in the calculation blocks 199, 200, 201, 202, 203, 204 shown in FIG.
[0204]
Next, the pipeline processing operations of the texture engine circuit 12 and the memory I/F circuit 13 shown in FIG. 1 will be described.
Here, for example, consider a case where 8 pixels in a rectangle 251 shown in FIG. 7 are processed simultaneously. In this case, the val data S220₁, S220₂, S220₃, S220₅, and S220₆ indicate “0”, and the val data S220₄, S220₇, and S220₈ indicate “1”.
[0205]
The val data S220₁ to S220₈ and the operand data S221₁ to S221₈ are input to the calculation block 199. In the calculation block 199, the reduction ratio LOD is calculated with the pixel I₇ as the representative point, and the calculated reduction ratio LOD, the val data S220₁ to S220₈, and the operand data S221₁ to S221₈ are output to the calculation block 200.
[0206]
Next, the val data S220₁ to S220₈ and the operand data S221₁ to S221₈ are input to the clock enablers 210₁ to 210₈ of the corresponding calculation sub-blocks 200₁ to 200₈, respectively.
The clock enablers 210₁ to 210₈ then detect the levels of the val data S220₁ to S220₈, respectively. Specifically, “1” is detected in the clock enablers 210₄, 210₇, and 210₈, and “0” is detected in the clock enablers 210₁, 210₂, 210₃, 210₅, and 210₆.
As a result, only the calculation sub-blocks 200₄, 200₇, and 200₈ calculate “s/q” and “t/q” using the operand data S221₄, S221₇, and S221₈, and the division results S200₄, S200₇, and S200₈ are output to the calculation sub-blocks 201₄, 201₇, and 201₈ of the calculation block 201.
On the other hand, no division is performed in the calculation sub-blocks 200₁, 200₂, 200₃, 200₅, and 200₆.
In synchronization with the output of the division results S200₄, S200₇, and S200₈, the val data S220₁ to S220₈ are output to the calculation sub-blocks 201₁ to 201₈ of the calculation block 201.
[0207]
Next, in the calculation sub-blocks 201₁ to 201₈, the clock enablers 210₁ to 210₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection result, only the calculation sub-blocks 201₄, 201₇, and 201₈ multiply the division results S200₄, S200₇, and S200₈ by the texture sizes USIZE and VSIZE, respectively, to generate the texture coordinate data S202₄, S202₇, and S202₈, which are output to the calculation sub-blocks 202₄, 202₇, and 202₈ of the calculation block 202.
On the other hand, no calculation is performed in the calculation sub-blocks 201₁, 201₂, 201₃, 201₅, and 201₆.
In synchronization with the output of the texture coordinate data S202₄, S202₇, and S202₈, the val data S220₁ to S220₈ are output to the calculation sub-blocks 202₁ to 202₈ of the calculation block 202.
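The gated division and multiplication in the calculation blocks 200 and 201 can be sketched in software as follows. This is an illustrative model only, assuming hypothetical names (`texture_coords`, `USIZE`, `VSIZE` as Python values) and an example texture size of 256; it shows that lanes whose val bit is “0” perform neither the division nor the multiplication.

```python
USIZE, VSIZE = 256, 256  # example texture sizes (assumed values)

def texture_coords(val, stq, usize=USIZE, vsize=VSIZE):
    """Compute (u, v) only for lanes whose val bit is 1.

    val: list of 0/1 flags, one per simultaneously processed pixel
    stq: list of (s, t, q) tuples, one per pixel
    Returns a list holding (u, v) for valid lanes and None for idle lanes.
    """
    result = []
    for flag, (s, t, q) in zip(val, stq):
        if flag != 1:
            result.append(None)  # idle lane: no division, no multiplication
            continue
        u = (s / q) * usize  # "s/q" followed by scaling with USIZE
        v = (t / q) * vsize  # "t/q" followed by scaling with VSIZE
        result.append((u, v))
    return result
```

Because idle lanes are skipped before the division, an invalid lane with q = 0 causes no error in this sketch, which mirrors the point that no operation takes place for pixels outside the triangle.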
[0208]
Next, in the calculation sub-blocks 202₁ to 202₈, the clock enablers 212₁ to 212₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection result, only the calculation sub-blocks 202₄, 202₇, and 202₈ read the texture data stored in the SRAM 17 or the texture buffer 20, that is, the (R, G, B, α) data stored at the texture address corresponding to the (s, t) data. The read (R, G, B, α) data S202₄, S202₇, and S202₈ are output to the calculation sub-blocks 203₄, 203₇, and 203₈ of the calculation block 203.
On the other hand, the reading process is not performed in the calculation sub-blocks 202₁, 202₂, 202₃, 202₅, and 202₆.
In synchronization with the output of the (R, G, B, α) data S202₄, S202₇, and S202₈, the val data S220₁ to S220₈ are output to the calculation sub-blocks 203₁ to 203₈ of the calculation block 203.
[0209]
Next, in the calculation sub-blocks 203₁ to 203₈, the clock enablers 212₁ to 212₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection result, only the calculation sub-blocks 203₄, 203₇, and 203₈ perform the texture function processing, and the resulting (R, G, B, α) data S203₄, S203₇, and S203₈ are output to the calculation block 204.
On the other hand, texture function processing is not performed in the calculation sub-blocks 203₁, 203₂, 203₃, 203₅, and 203₆.
[0210]
Next, in the calculation sub-blocks 204₁ to 204₈, the clock enablers 214₁ to 214₈ detect the levels of the val data S220₁ to S220₈, respectively.
Based on the detection result, only the calculation sub-blocks 204₄, 204₇, and 204₈ perform the z comparison on the (R, G, B, α) data S203₄, S203₇, and S203₈ using the contents of the z data stored in the z buffer 22. When the image drawn by the (R, G, B, α) data S203₄, S203₇, and S203₈ is positioned in front of the image previously drawn in the display buffer 21, the z buffer 22 is updated and the (R, G, B, α) data S203₄, S203₇, and S203₈ are written into the display buffer 21.
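The val-gated z comparison can be modeled with a short sketch. The function name `z_test_and_write` and the list-based buffers are hypothetical, and the sketch assumes that a smaller z value means a position nearer to the viewpoint, a convention the text does not specify.

```python
def z_test_and_write(val, z_in, rgba_in, z_buffer, display_buffer):
    """Update z_buffer and display_buffer in place for valid lanes only.

    val:      list of 0/1 flags, one per simultaneously processed pixel
    z_in:     incoming z value per pixel
    rgba_in:  incoming (R, G, B, alpha) tuple per pixel
    """
    for i, flag in enumerate(val):
        if flag != 1:
            continue  # lane idle: no comparison, no write
        # Assumption: smaller z means nearer to the viewpoint.
        if z_in[i] < z_buffer[i]:
            z_buffer[i] = z_in[i]
            display_buffer[i] = rgba_in[i]
```

A lane writes only when it is both valid and in front of the stored depth; every other lane leaves the two buffers untouched.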
[0211]
That is, in the texture engine circuit 12 and the memory I/F circuit 13, when the pixels of the rectangle 251 shown in FIG. 6 are processed simultaneously, no processing is performed on the pixels located outside the triangle 250. In other words, while the calculation is performed on the pixels in the rectangle 251, the calculation sub-blocks 200₁, 200₂, 200₃, 200₅, 200₆, 201₁, 201₂, 201₃, 201₅, 201₆, 202₁, 202₂, 202₃, 202₅, 202₆, 204₁, 204₂, 204₃, 204₅, and 204₆ are stopped, and these calculation sub-blocks do not consume power.
[0212]
As described above, the three-dimensional computer graphic system 501 further has the following effects in addition to the effects of the three-dimensional computer graphic system 1 of the first embodiment described above.
That is, according to the three-dimensional computer graphic system 501, in the pipeline processing in the texture engine circuit 12, the calculation can be skipped for those of the eight simultaneously processed pixels that are located outside the triangle to be processed.
Therefore, power consumption in the texture engine circuit 12 can be significantly reduced. As a result, a simple and inexpensive power source for the three-dimensional computer graphic system 501 can be used.
As shown in FIGS. 20 and 21, the texture engine circuit 12 implements the above-described functions by incorporating a clock enabler and a 1-bit flag flip-flop into each arithmetic sub-block. Since the circuit scale of the 1-bit flag flip-flop is small, the circuit scale of the texture engine circuit 12 does not increase significantly.
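The combination of a clock enabler and a 1-bit flag flip-flop in each calculation sub-block can be modeled roughly as follows. This is a behavioral sketch only, not a description of the circuits in FIGS. 20 and 21: the class `SubBlock`, the helper `run_lane`, and the `active_cycles` counter are hypothetical names introduced to make the gating visible.

```python
class SubBlock:
    """One calculation sub-block: a 1-bit flag register plus a gated data register."""

    def __init__(self, operation):
        self.operation = operation
        self.flag = 0           # 1-bit flag flip-flop holding the val bit
        self.data = None        # data register, clocked only when val is 1
        self.active_cycles = 0  # counts how often the data path was clocked

    def clock(self, val, operand):
        self.flag = val  # the val bit is always latched and passed on
        if val == 1:     # clock enabler: gate the data path
            self.data = self.operation(operand)
            self.active_cycles += 1
        return self.flag, self.data


def run_lane(stages, val, operand):
    """Pass one pixel through a chain of sub-blocks, forwarding its val bit."""
    data = operand
    for stage in stages:
        val, data = stage.clock(val, data)
    return data if val == 1 else None
```

In this model only the single-bit flag toggles in an idle lane, which corresponds to the observation that the added flip-flop is small compared with the data path it switches off.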
[0213]
The present invention is not limited to the embodiment described above.
In the above-described embodiment, the case where texture data is read from the storage circuit using a common reduction ratio for a plurality of pixel data to be processed simultaneously has been described. However, a plurality of reduction ratio calculation circuits may be provided instead. In this case, a plurality of readout circuits are provided, each corresponding to one of the plurality of pixel data to be processed simultaneously, and the storage circuit, the plurality of reduction ratio calculation circuits, and the plurality of readout circuits are mounted together on one semiconductor chip.
In addition, a plurality of image processing circuits that simultaneously perform processing using the texture data read by the readout circuits to generate a plurality of display data, and a plurality of writing circuits that write the generated display data to a storage circuit such as a DRAM, may be further provided. The storage circuit, the plurality of reduction ratio calculation circuits, the plurality of readout circuits, the plurality of image processing circuits, and the plurality of writing circuits may then be mounted together on one semiconductor chip.
[0214]
[Effects of the invention]
As described above, according to the image processing apparatus of the present invention, high image quality can be stably provided with a small-scale apparatus configuration.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a three-dimensional computer graphic system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a method for generating effective bit data in the DDA setup circuit shown in FIG. 1.
FIG. 3 is a diagram for explaining texture data used for the MIPMAP processing stored in the SRAM and the texture buffer shown in FIG. 1.
FIG. 4 is a configuration diagram of the texture engine circuit shown in FIG. 1.
FIG. 5 is a flowchart of processing in the representative point determination circuit shown in FIG. 4.
FIG. 6 is a diagram for explaining processing in the representative point determination circuit shown in FIG. 4.
FIG. 7 is a diagram for explaining a specific example of representative points when the triangle shown in FIG. 2 is a processing target.
FIG. 8 is a configuration diagram of the reduction ratio calculation circuit shown in FIG. 4.
FIG. 9 is a diagram for explaining the processing contents of the reduction ratio calculation circuit shown in FIG. 8.
FIG. 10 is a configuration diagram of the LIP/texture function circuit shown in FIG. 4.
FIG. 11 is a diagram for explaining data input to the LIP circuit in each mode.
FIG. 12 is a diagram for explaining the input timing of mipmap data from the LIP circuit to the LIP/texture function circuit and the execution timing of the level interpolation process.
FIG. 13 is a diagram for explaining processing of the LIP circuit shown in FIG. 4.
FIG. 14 is a diagram for explaining processing of the LIP circuit shown in FIG. 4.
FIG. 15 is a partial configuration diagram of the preceding-stage adder circuit shown in FIG. 13.
FIG. 16 is a diagram for explaining a data storage method for the DRAM shown in FIG. 1.
FIG. 17 is a diagram for explaining a preferable configuration, arrangement, and wiring method of the logic circuit of the rendering circuit shown in FIG. 1, the DRAM, and the secondary memory.
FIG. 18 is a diagram for explaining the configuration of the DRAM module shown in FIG. 17.
FIG. 19 is a diagram for explaining the format of DDA data output from the triangle DDA circuit shown in FIG. 1 in the three-dimensional computer graphic system according to the second embodiment of the present invention.
FIG. 20 is a partial configuration diagram of a texture engine circuit and a memory I/F circuit in the three-dimensional computer graphic system according to the second embodiment of the present invention.
FIG. 21 is a configuration diagram of a calculation sub-block shown in FIG. 20.
FIG. 22 is a diagram for explaining MIPMAP filtering processing.
FIG. 23 is a diagram for explaining a conventional general texture mapping apparatus.
FIG. 24 is a flowchart of processing in the texture mapping apparatus shown in FIG. 23.
FIG. 25 is a diagram for explaining a texture mapping apparatus that realizes high-speed processing.
[Explanation of symbols]
1 ... Three-dimensional computer graphic system, 2 ... Main memory, 3 ... I/O interface circuit, 4 ... Main processor, 5 ... Rendering circuit, 10 ... DDA setup circuit, 11 ... Triangle DDA circuit, 12 ... Texture engine circuit, 13 ... Memory I/F circuit, 14 ... CRT controller circuit, 15 ... RAMDAC circuit, 16 ... DRAM, 17 ... SRAM, 20 ... Texture buffer, 21 ... Display buffer, 22 ... Z buffer, 23 ... Texture CLUT buffer, 301 ... Representative point determination circuit, 302 ... Stq selection circuit, 50 ... Reduction ratio calculation circuit, 51 ... Texture data read circuit, 55 ... LIP/texture function circuit

Claims

In an image processing apparatus that represents a display model by combining unit graphics composed of a plurality of pixels to which common processing conditions are applied, and generates pixel data corresponding to the pixels using texture data as necessary.
The display model is a three-dimensional model;
The unit figure is a triangle;
A storage circuit for storing display data and a plurality of texture data corresponding to different reduction ratios for the same pattern;
A reduction ratio calculation circuit for calculating a reduction ratio commonly used for a plurality of pixel data to be processed simultaneously;
A readout circuit for reading out the texture data corresponding to the calculated reduction ratio from the storage circuit;
An image processing circuit that generates display data by simultaneously processing the plurality of pixel data using the read texture data;
A writing circuit for writing the generated display data into the storage circuit;
A representative point determination circuit for determining, as a representative point, a pixel from among those pixels corresponding to the plurality of pixel data to be processed simultaneously that are located inside the unit graphic to be processed;
The reduction ratio calculation circuit substantially calculates the LOD indicating the reduction ratio based on the following formula,
LOD = Clamp(((log₂ 1/q) + maxe) << L + K)
where
LOD is composed of an integer part and a decimal part, and is a symbol indicating an unsigned reduction rate,
Clamp is a symbol indicating clamping in the following clamp circuit,
q is a symbol indicating the homogeneous term,
maxe is data consisting only of an integer part indicating the maximum coordinate of the homogeneous coordinates (s, t) and the homogeneous term q of the vertex of the unit graphic to be processed;
<< L indicates that data is shifted by L bits in the shift circuit described below.
K is composed of an integer part and a decimal part, is signed, and is data used for addition in the following addition circuit,
A normalization circuit that normalizes the homogeneous term data q to generate an exponent qe and a mantissa qm;
A first shift circuit that shifts data obtained by bit-combining the exponent qe and the mantissa qm toward a MSB (Most Significant Bit) by a value indicated by the data L;
A first inverting circuit for inverting the output of the first shift circuit;
Data output means for inputting the mantissa qm and outputting data μ indicating “log₂({1, qm}) − qm”;
A second shift circuit that shifts data obtained by bit-combining the data maxe and the data μ toward the MSB by a value indicated by the data L;
A second inverting circuit for inverting the output of the second shift circuit;
An addition circuit for adding the data obtained by bit-combining the data K and the binary number “10”, the output of the first inversion circuit, and the output of the second inversion circuit;
A clamp circuit for clamping the output of the adder circuit within a predetermined bit to generate the reduction ratio LOD;
These circuits, from the normalization circuit to the clamp circuit, are included in the reduction ratio calculation circuit;
The readout circuit reads from the storage circuit, for each of the plurality of pixel data to be simultaneously processed, the texture data specified by the determined reduction ratio, the homogeneous coordinates (s, t), and the homogeneous term q,
Image processing device.

The storage circuit, the reduction ratio calculation circuit, the readout circuit, the image processing circuit, the writing circuit, and the representative point determination circuit are mounted together in one semiconductor chip;
The image processing apparatus according to claim 1.

The image processing apparatus according to claim 2, wherein, among the pixels corresponding to the plurality of pixel data to be processed simultaneously, only the processing results of the pixels located inside the unit graphic to be processed are used as valid.

The image processing circuit includes:
A first image processing circuit that performs a first interpolation process of interpolating first texture data indicating a display pattern of a first reduction ratio to calculate first pixel data corresponding to a pixel at a predetermined two-dimensional position, and interpolating second texture data indicating a display pattern of a second reduction ratio to calculate second pixel data corresponding to the pixel;
A second image processing circuit that performs a second interpolation process using the first pixel data and the second pixel data to calculate third pixel data corresponding to the pixel in accordance with a display pattern of a third reduction ratio between the first reduction ratio and the second reduction ratio,
The second image processing circuit feeds back the third pixel data when the second interpolation process can be performed in a shorter time than the first interpolation process, and performs predetermined image processing using the fed-back third pixel data;
The image processing apparatus according to claim 1.

The first image processing circuit repeatedly performs the first interpolation processing a plurality of times;
The second image processing circuit sequentially repeats the second interpolation processing and the predetermined image processing a plurality of times.
The image processing apparatus according to claim 4.

The second image processing circuit includes:
As the predetermined image processing, at least one of modulation processing, decal processing, highlight processing, fogging processing, and alpha blending processing is performed.
The image processing apparatus according to claim 4.