JP2005078357A

JP2005078357A - Image processor and its method

Info

Publication number: JP2005078357A
Application number: JP2003307649A
Authority: JP
Inventors: Tanio Nagasaki; 多仁生長崎; Seigo Iwasaki; 誠吾岩崎
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-08-29
Filing date: 2003-08-29
Publication date: 2005-03-24
Anticipated expiration: 2023-08-29
Also published as: JP4419480B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor and its method capable of avoiding increases in area and power consumption, simplifying the memory access mechanism, reducing design load, and being independent of the memory system.
SOLUTION: A rasterization engine 32 accesses the memory through a memory interface 22. A cache system 324 is provided between a texture/pixel engine, which performs texture-level and pixel-level processing, and the memory interface. The function of generating real addresses, which depends on the image memory architecture, is split off behind a hand-over of commands and data in a form that is independent of the image memory architecture, so that memory access in image processing is separated into a cache that does not depend on the image memory architecture.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an image processing apparatus and method that express a model as a combination of unit graphics, generate pixels within a drawing target area of a screen coordinate system, and perform rendering processing on a memory.

Together with improvements in computing speed and enhanced drawing functions in recent computer systems, “computer graphics (CG)” technology, which creates and processes graphics and images using computer resources, has been actively researched and developed and has been put to practical use.

For example, three-dimensional graphics expresses, with a mathematical model, the optical phenomena that occur when a three-dimensional object is illuminated by a predetermined light source, and on the basis of this model adds shading and tone to the object surface, or pastes a pattern onto it, to generate a more realistic, three-dimensional-looking two-dimensional high-definition image.
Such computer graphics is increasingly used in CAD/CAM in development fields such as science, engineering, and manufacturing, and in various other application fields.
In recent years, computer graphics technology has also come to be used in mobile phones, personal digital assistants (PDAs), car navigation systems, and the like.

Three-dimensional graphics generally consists of a “geometry subsystem” positioned as the front end and a “raster subsystem” positioned as the back end.

The geometry subsystem is the stage that performs geometric calculations such as the position and orientation of a three-dimensional object to be displayed on a display screen.
In the geometry subsystem, an object is generally handled as a collection of many polygons, and geometric calculations such as “coordinate transformation”, “clipping”, and “light source calculation” are performed polygon by polygon.

The raster subsystem, on the other hand, is the stage that fills in each pixel constituting an object.
Rasterization is realized, for example, by interpolating the image parameters of all the pixels inside a polygon from the image parameters obtained for each vertex of the polygon.
The image parameters referred to here include color (drawing color) data expressed in, for example, the so-called RGB format, and a z value representing distance in the depth direction.
In recent high-definition three-dimensional graphics processing, the image parameters also include f (fog), which creates a sense of perspective, and t (texture), which adds realism by expressing the material feel and pattern of an object's surface.

The process of generating the pixels inside a polygon from the polygon's vertex information is often performed using a linear interpolation technique called DDA (Digital Differential Analyzer).
In the DDA process, the slope of the data along a polygon edge is obtained from the vertex information, and the data on the edge are calculated using this slope. The slope in the raster scanning direction (X direction) is then calculated, and the interior pixels are generated by repeatedly adding the parameter change obtained from this slope to the parameter value at the scan start point.
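The X-direction step of the DDA process can be illustrated with a minimal sketch. The function name `dda_span` and its arguments are hypothetical and do not come from the patent, and floating point is used for readability where an actual circuit would likely use fixed-point arithmetic.

```python
def dda_span(start_value, end_value, x_start, x_end):
    """Interpolate one image parameter across a raster span (X direction).

    Returns a list of (x, value) pairs for every pixel from x_start to x_end.
    """
    length = x_end - x_start
    if length <= 0:
        # Degenerate span: a single pixel carrying the start value.
        return [(x_start, start_value)]

    # Slope in the raster scanning direction: parameter change per pixel.
    slope = (end_value - start_value) / length

    pixels = []
    value = start_value
    for x in range(x_start, x_end + 1):
        pixels.append((x, value))
        value += slope  # add the per-pixel change to the running value
    return pixels
```

For example, `dda_span(0.0, 8.0, 0, 4)` steps the parameter by 2.0 per pixel. The same incremental addition applies to each interpolated parameter (z, color, texture coordinates).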

In three-dimensional computer graphics, determining the color for each pixel involves rendering: the color of each pixel is calculated, and the calculated color value is written to the address of the display buffer (frame buffer) corresponding to that pixel.
A pixel engine (drawing engine) is provided to perform this rendering processing.

The drawing speed in three-dimensional computer graphics is affected by the speed of writing from the drawing engine to the frame buffer; if frame buffer access is slow, the drawing speed drops.
Using expensive high-speed memory for a large-capacity frame buffer to solve this problem raises the price of the system. An image processing apparatus that uses a cache mechanism, on the premise that inexpensive general-purpose memory is used, has therefore been proposed (see, for example, Patent Document 1).

The image processing apparatus described in Patent Document 1 temporarily stores the image data generated by a drawing engine in a prefetchable FIFO memory or the like, provides a cache memory between this prefetchable FIFO memory or the like and the frame buffer, prefetches the contents of the FIFO memory or the like into the cache memory, and controls reading and writing of the cache memory.
JP-A-9-212661

However, conventional three-dimensional computer graphics has realized its functions with either a dedicated memory system or an architecture that depends on (is tightly coupled to) the memory system.
Even where general-purpose memory is supported, as described in Patent Document 1 above, the cache mechanism adopted presupposes a general-purpose DRAM.

Meanwhile, to cope with higher integration and more functions, as in an SOC, the three-dimensional computer graphics (3DCG) function is also required to be available as a high-function module, typified by IP. With the conventional approach, when the memory system differs, the IP must either carry its own dedicated memory or be redesigned to suit the different system memory.
Adding a dedicated memory increases area and power and complicates the memory access mechanism. Redesign imposes a design load that extends even to the structure that realizes the three-dimensional computer graphics function.
Thus, the conventional image processing architecture for three-dimensional computer graphics depends on the memory architecture and cannot be configured separately from it, which has the disadvantage that it is difficult to apply readily to systems such as high-function modules typified by IP.

The present invention has been made in view of such circumstances, and its object is to provide an image processing apparatus and method that can suppress increases in area and power, simplify the memory access mechanism, reduce design load, and be readily adapted to various systems without depending on the memory system.

To achieve the above object, a first aspect of the present invention is an image processing apparatus that performs rendering processing on a memory, comprising: a pixel engine that generates pixel data based on information about a primitive to be drawn and outputs two-dimensional coordinates as information about access to the memory; and a cache that receives the two-dimensional coordinates from the pixel engine, holds the pixel data in a two-dimensional structure corresponding to the two-dimensional coordinates, and, in order to access the memory, outputs a cache command standardized into a command and data corresponding to the two-dimensional coordinate information.

Preferably, the cache command includes general-purpose information needed for the real address of the memory.

Preferably, the apparatus has a memory interface that receives the cache command from the cache and generates a real address of the memory.

Preferably, the apparatus has a memory interface that receives the cache command from the cache and generates a real address of the memory that depends on the memory architecture.

A second aspect of the present invention is an image processing apparatus that performs rendering processing on a memory, comprising: an external interface that handles the interface with an external device; a memory interface that handles the interface with the memory; a pixel engine that generates pixel data based on information about a primitive to be drawn, input via the external interface, and outputs two-dimensional coordinates as information about access to the memory; and a cache that receives the two-dimensional coordinates from the pixel engine, holds the pixel data in a two-dimensional structure corresponding to the two-dimensional coordinates, and, in order to access the memory, outputs a cache command standardized into a command and data corresponding to the two-dimensional coordinate information, wherein the memory interface receives the cache command from the cache and generates a real address of the memory.

A third aspect of the present invention is an image processing method that performs rendering processing on a memory, comprising: a first step of generating pixel data based on information about a primitive to be drawn; a second step of generating two-dimensional coordinates as information about access to the memory and supplying them to a cache; and a third step of outputting, from the cache, in order to access the memory, a cache command standardized into a command and data corresponding to the two-dimensional coordinate information.

According to the present invention, the pixel engine generates pixel data based on information about a primitive to be drawn, such as various data (z, color, and so on).
The pixel engine outputs two-dimensional coordinates to the cache as information about access to the memory.
The cache receives the two-dimensional coordinates from the pixel engine, holds the pixel data in a two-dimensional structure corresponding to those coordinates, and, in order to access the memory, outputs a cache command standardized into a command and data corresponding to the two-dimensional coordinate information.
The memory interface then receives the cache command from the cache and generates a real address of the memory that depends on the memory architecture.
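The flow from pixel engine to cache to memory interface can be sketched as a small cache whose lines are two-dimensional pixel blocks addressed by block coordinates. This is only an illustration under several assumptions that are not in the patent: the class name `BlockCache2D`, the 4x4 block size, the least-recently-used replacement policy, and the `memory` dictionary that stands in for whatever sits behind the memory interface. The point it shows is that the cache side speaks only in coordinates and fill/write-back commands, never in real addresses.

```python
from collections import OrderedDict


class BlockCache2D:
    """Tiny LRU cache whose lines are 2D pixel blocks keyed by block coordinates."""

    def __init__(self, block_w=4, block_h=4, capacity=2):
        self.block_w = block_w
        self.block_h = block_h
        self.capacity = capacity
        self.lines = OrderedDict()  # (bx, by) -> {"dirty": bool, "pixels": dict}
        self.commands = []          # commands handed to the memory interface
        self.memory = {}            # stand-in for memory behind the interface

    def _line(self, x, y):
        key = (x // self.block_w, y // self.block_h)
        if key in self.lines:
            self.lines.move_to_end(key)  # hit: mark as most recently used
            return self.lines[key]
        if len(self.lines) >= self.capacity:
            # Evict the least recently used block; write it back if modified.
            old_key, old = self.lines.popitem(last=False)
            if old["dirty"]:
                self.commands.append(("write_back", old_key))
                self.memory[old_key] = old["pixels"]
        # Miss: request the block by its 2D coordinates, not by address.
        self.commands.append(("fill", key))
        line = {"dirty": False, "pixels": dict(self.memory.get(key, {}))}
        self.lines[key] = line
        return line

    def write(self, x, y, value):
        line = self._line(x, y)
        line["pixels"][(x, y)] = value
        line["dirty"] = True

    def read(self, x, y):
        return self._line(x, y)["pixels"].get((x, y))
```

Because every entry in `commands` carries only block coordinates, a different memory system would need a different translation behind the interface but no change to the cache itself.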

The present invention has the advantages that increases in area and power can be suppressed, the memory access mechanism can be simplified, and the design load can be reduced.
As a result, when image processing that was conventionally configured in dependence on the memory architecture, that is, the three-dimensional computer graphics architecture, is separated from the memory and made into IP for use in a general-purpose SOC or the like, the specification of the functional blocks that depend on the memory system is standardized, so a configuration adapted to the system can be adopted easily.

The present embodiment is described below with reference to the accompanying drawings.
In the present embodiment, the function of generating real addresses, which depends on the image memory architecture, is split off behind a hand-over of commands and data in a form independent of the image memory architecture, so that memory access in image processing is separated into a cache that does not depend on the image memory architecture.
Consequently, when image processing that was conventionally configured in dependence on the memory architecture, that is, the three-dimensional computer graphics architecture, is separated from the memory and made into IP for use in a general-purpose SOC or the like, the specification of the functional blocks that depend on the memory system is standardized, so a configuration adapted to the system can be adopted easily.

FIG. 1 is a configuration diagram showing an embodiment of the main part of an image processing apparatus according to the present invention.

The image processing apparatus 10 is applied to a three-dimensional computer graphics system. For example, it expresses a three-dimensional model as a combination of triangles (polygons), which are unit figures, determines the color of each pixel of the display screen by drawing these polygons, and performs polygon rendering processing on the memory.
In a three-dimensional computer graphics system, a three-dimensional object is represented using a z coordinate representing depth in addition to the (x, y) coordinates representing a position on a plane, and any point in three-dimensional space is specified by these three coordinates (x, y, z).

As shown in FIG. 1, the image processing apparatus 1 has a system-dependent block (System-depend block) 2 and a core block (Core block) 3 as its main components.

As shown in FIG. 1, the system-dependent block 2 consists of a CPU interface (BB) 21 serving as an external interface and a memory interface (MEMIF) 22.

The CPU interface 21 is connected, for example via a CPU bus 4, to a CPU (main processor) serving as a host device (not shown), and performs interface processing such as data exchange between the CPU and the core block 3 or between functional blocks within the core block 3.

The memory interface 22 performs interface processing such as data exchange between the core block 3 and a memory connected through a memory bus or system bus (not shown).

As shown in FIG. 2, the core block 3 consists of a geometry engine (GE: Geometry Engine) 31 and a rasterization engine (RE: Rasterization Engine) 32.

The geometry engine 31 receives vertex data (three-dimensional coordinates, normal vectors, and texture coordinates) sent, for example, by a CPU (not shown) over the CPU bus 4 and through the CPU interface, and performs, for example, floating-point operations on the vertex data.
Typical operations include coordinate transformation, which handles deformation of objects and projection onto the screen, lighting calculations, and clipping calculations.
The geometry engine 31 transfers the results of these operations, for example as polygon rendering data, to the rasterization engine 32 through the CPU interface 21.

This polygon rendering data includes (x, y, z, R, G, B, α, s, t, q) data for each of the three vertices of a polygon.
Here, the (x, y, z) data indicate the three-dimensional coordinates of a polygon vertex, and the (R, G, B) data indicate the red, green, and blue luminance values at those three-dimensional coordinates.
α indicates a blend value (coefficient).
Of the (s, t, q) data, (s, t) indicate the homogeneous coordinates of the corresponding texture and q indicates the homogeneous term. The actual texture coordinate data (u, v) are obtained by multiplying “s/q” and “t/q” by the texture sizes USIZE and VSIZE, respectively.
The texture data stored in the memory of the rasterization engine 32 are accessed using the texture coordinate data (u, v).
In other words, the polygon rendering data consist of the physical coordinate values of each vertex of a triangle, together with the color and texture data of each vertex.
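The conversion from homogeneous texture coordinates to actual texture coordinates is a single division and multiplication per axis. The sketch below writes it out; the function name `texture_coords` is hypothetical, while `usize` and `vsize` correspond to the texture sizes USIZE and VSIZE named in the text.

```python
def texture_coords(s, t, q, usize, vsize):
    """Convert homogeneous texture coordinates (s, t, q) to texel coordinates (u, v)."""
    u = (s / q) * usize  # divide by the homogeneous term, then scale by width
    v = (t / q) * vsize  # same for the vertical axis
    return u, v
```

With `s = 1.0`, `t = 0.5`, `q = 2.0` and a 256 x 128 texture, this gives `(u, v) = (128.0, 32.0)`.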

The rasterization engine 32 receives the polygon rendering data from the geometry engine 31, calculates DDA parameters such as the slopes of the various data needed for rasterization (depth (Z), color (C), texture coordinates, and so on), rasterizes these data, performs perspective correction (Perspective Correction) of the texture coordinates, and performs texture filtering.
Perspective correction of the texture coordinates also includes calculating the mipmap (MIPMAP) level by LOD (Level of Detail) calculation and calculating the (u, v) address for texture access. Texture filtering applies a filter such as 4-neighbor interpolation to the data read from a memory (not shown), using the fractional part obtained when the (u, v) address was calculated.
The rasterization engine 32 then performs pixel-level processing. In this processing, per-pixel operations are performed using the filtered texture data and the rasterized data (Z, C). In addition to pixel operations such as pixel-level lighting, this processing includes an alpha test, a Z (depth) test, alpha blending, and the like.
The rasterization engine 32 then writes the pixel data that have passed the tests in the pixel-level processing to an external memory through the memory interface 22.
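The pixel-level tests can be sketched as follows. The patent names the alpha test, Z test, and alpha blending but does not give their comparison functions or blend equation, so the choices here are assumptions: a pixel passes the alpha test when its alpha is at least `alpha_ref`, passes the depth test when its z is smaller than the stored z, and is blended as `src * alpha + dst * (1 - alpha)`. The function name `process_pixel` is hypothetical.

```python
def process_pixel(src_rgba, src_z, dst_rgb, dst_z, alpha_ref=0.0):
    """Return (new_rgb, new_z) to write back, or None if the pixel is rejected."""
    r, g, b, a = src_rgba

    # Alpha test (assumed rule: reject pixels more transparent than alpha_ref).
    if a < alpha_ref:
        return None

    # Depth test (assumed rule: smaller z is nearer, so it wins).
    if src_z >= dst_z:
        return None

    # Alpha blending of the incoming color over the stored color.
    blended = tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), dst_rgb))
    return blended, src_z
```

Only pixels for which this function returns a value would be written to memory; rejected pixels generate no write.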

The rasterization engine 32 in the present embodiment accesses the memory through the memory interface 22. A cache mechanism is provided between the texture/pixel engine, which performs the texture-level and pixel-level processing described above, and the memory interface. The function of generating real addresses, which depends on the image memory architecture, is split off behind a hand-over of commands and data in a form independent of the image memory architecture, so that memory access in image processing is separated into a cache that does not depend on the image memory architecture.

Specifically, the rasterization engine 32 in the present embodiment has the following characteristic features:
1) Addresses up to the cache mechanism are handled as virtual two-dimensional screen coordinates.
2) Memory access from the cache is standardized as commands and data based on those coordinates.
3) General-purpose information needed for the real address (such as the memory start address) is also carried in the command, and is thereby standardized.
4) The real address space is split off into a dedicated mechanism and implemented at the minimum necessary scale.
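The separation listed above can be sketched by giving one coordinate-based command to two different memory interfaces. Everything in this sketch is an assumption for illustration: the `CacheCommand` fields, the two interface classes, and the linear and tiled address layouts are not taken from the patent, which does not specify any particular memory layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCommand:
    x: int             # virtual 2D screen coordinate
    y: int
    base_address: int  # general-purpose info carried in the command
    write_back: bool   # False = fill (read), True = write back


class LinearMemoryInterface:
    """Real address for a row-major frame buffer."""

    def __init__(self, pitch_bytes, bytes_per_pixel):
        self.pitch_bytes = pitch_bytes
        self.bytes_per_pixel = bytes_per_pixel

    def real_address(self, cmd):
        return cmd.base_address + cmd.y * self.pitch_bytes + cmd.x * self.bytes_per_pixel


class TiledMemoryInterface:
    """Real address for a buffer stored as consecutive rectangular tiles."""

    def __init__(self, tiles_per_row, tile_w, tile_h, bytes_per_pixel):
        self.tiles_per_row = tiles_per_row
        self.tile_w = tile_w
        self.tile_h = tile_h
        self.bytes_per_pixel = bytes_per_pixel

    def real_address(self, cmd):
        tile_x, in_x = divmod(cmd.x, self.tile_w)
        tile_y, in_y = divmod(cmd.y, self.tile_h)
        tile_index = tile_y * self.tiles_per_row + tile_x
        tile_bytes = self.tile_w * self.tile_h * self.bytes_per_pixel
        offset = (in_y * self.tile_w + in_x) * self.bytes_per_pixel
        return cmd.base_address + tile_index * tile_bytes + offset
```

The same `CacheCommand(x=5, y=2, base_address=0x1000, write_back=False)` resolves to different real addresses under the two interfaces, while the code that produced the command is unchanged. That is the design choice the list describes: only the address-generating block needs to be adapted to a new memory system.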

A more specific configuration of the rasterization engine 32, including this characteristic cache mechanism, is described below.

FIG. 2 is a block diagram showing a specific configuration example of the rasterization engine 32 according to the present embodiment.
As shown in FIG. 2, the rasterization engine 32 has a DDA (Digital Differential Analyzer) setup circuit 321 as an initial-setting calculation block for linear interpolation, a triangle DDA circuit 322 as a linear interpolation processing block, a texture map engine (TME) / pixel operation engine (POE) (hereinafter, TME_POE) 323, and a cache system 324.

Before the downstream triangle DDA circuit 322 linearly interpolates the values at the vertices of a triangle in the physical coordinate system to obtain the color and depth information of each pixel inside the triangle, the DDA setup circuit 321 performs a setup calculation on the (z, R, G, B, α, s, t, q) data indicated by polygon rendering data S11, obtaining, for example, the differences along the triangle's edges and in the horizontal direction.
Specifically, this setup calculation uses the value at the start point, the value at the end point, and the distance between the start point and the end point to calculate the variation of the desired value per unit length of movement.
The DDA setup circuit 321 outputs setup data S321, which is information about the primitive and includes the calculated variation data, to the triangle DDA circuit 322.

The function of the DDA setup circuit 321 is described further with reference to FIG. 3.
As described above, the main task of the DDA setup circuit 321 is to obtain the variations inside the triangle formed by the three vertices P0 (x0, y0), P1 (x1, y1), and P2 (x2, y2), each of which carries various information (color, texture coordinates) and has been brought down to physical coordinates by the preceding geometry processing, and thereby to calculate the basic data for the subsequent linear interpolation.
Drawing a triangle reduces to drawing its individual pixels, and for this the initial value at the drawing start point must be obtained.
The various information at the first drawing point is the sum of the horizontal distance from a vertex to that first drawing point multiplied by the horizontal variation and the vertical distance multiplied by the vertical variation. Once the value at one integer grid point inside the target triangle is known, the values at the other grid points inside the triangle can be obtained by adding integer multiples of the variations.
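The setup step can be sketched for a single interpolated parameter. The patent does not give the formulas the circuit uses, so this sketch assumes one common formulation: the horizontal and vertical variations are the gradients of the plane through the three vertices. The function names `setup_gradients` and `value_at` are hypothetical, and each vertex is passed as `(x, y, value)`.

```python
def setup_gradients(p0, p1, p2):
    """Return (dv/dx, dv/dy) of the plane through three (x, y, value) vertices."""
    (x0, y0, v0), (x1, y1, v1), (x2, y2, v2) = p0, p1, p2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    dvdx = ((v1 - v0) * (y2 - y0) - (v2 - v0) * (y1 - y0)) / det
    dvdy = ((v2 - v0) * (x1 - x0) - (v1 - v0) * (x2 - x0)) / det
    return dvdx, dvdy


def value_at(p0, gradients, x, y):
    """Value at grid point (x, y): vertex value plus distance times variation."""
    x0, y0, v0 = p0
    dvdx, dvdy = gradients
    return v0 + (x - x0) * dvdx + (y - y0) * dvdy
```

After the first grid value is computed with `value_at`, neighboring grid values differ from it by whole multiples of `dvdx` and `dvdy`, which is what lets the downstream interpolation proceed by addition alone.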

The data for each vertex of a triangle consist of, for example, 16 bits each for the x and y coordinates, 24 bits for the z coordinate, 12 bits (= 8 + 4) for each RGB color value, and a 32-bit floating-point value (IEEE format) for each of the s, t, and q texture coordinates.

Note that the DDA setup circuit 321 is implemented using an ASIC approach rather than the DSP structure used conventionally.
Specifically, as shown in FIG. 4, it is configured as full data-path logic in which arithmetic unit groups 3212-1 to 3212-3, each consisting of multiple arithmetic units arranged in parallel, are inserted between registers 3211-1 to 3211-4 arranged in multiple stages; in other words, as a time-parallel structure using a synchronous pipeline.

Based on the setup data S321 input from the DDA setup circuit 321, which is information about the primitive and includes the variation data, the triangle DDA circuit 322 calculates the linearly interpolated (z, R, G, B, α, s, t, q) data at each pixel inside the triangle.
The triangle DDA circuit 322 outputs the (x, y) data of each pixel and the (z, R, G, B, α, s, t, q) data at those (x, y) coordinates to the TME_POE 323 as DDA data (interpolation data) S322.

In other words, the triangle DDA circuit 322 performs rasterization (Rasterization), interpolating the image parameters of all the pixels inside a polygon from the image parameters obtained for each vertex of the polygon.
Specifically, the triangle DDA circuit 322 rasterizes the various data (z, texture coordinates, color, and so on).

The TME_POE 323 performs, for example, texture blending, an alpha test, a depth test, alpha blending, and pixel operations. In the pixel operations, pixel data that have passed the tests are written to an external memory through the cache system 324 and the memory interface 22.

As described above, after the predetermined processing in the DDA setup circuit 321, the triangle DDA circuit 322, and the TME_POE 323, the final memory accesses are made in units of drawing picture elements called pixels (Pixel; Picture Cell Element).

The cache system 324 has a texture cache (T cache) 3241, a color cache (C cache) 3242, and a depth cache (Z cache) 3243.

Texture data for blending in the TME_POE 323 are accessed through the T cache 3241, which is organized by texture coordinates, and then through the memory interface 22.
Color data for color/depth operations in the TME_POE 323 are accessed through the C cache 3242 and the Z cache 3243, which are organized by two-dimensional pixel coordinates, and then through the memory interface 22.
Thus, the TME_POE 323 according to the present embodiment adopts a coordinate-based cache system separated from the memory-system-dependent address block (MEMIF), together with a command-based interface between the cache system 324 and the memory interface 22.

In this embodiment, the coordinate space used for pixel operations in the TME_POE 323 does not depend on the memory system in use.
The coordinate-based cache system 324 can be configured for any memory system.
The address-dependent block (MEMIF) can be separated from the cache system 324.

FIG. 5 is a diagram showing the command-based cache data access structure in which the TME_POE 323 (RE) side of the cache system 324 is separated from the memory interface (MEMIF) 22, which is the memory access block.

As shown in FIG. 6, the cache command (C·CMD) shown in FIG. 5 includes, for example, a 1-bit register set bit, multi-bit coordinates assigned to the cache, tag (TAG) address information, line information, a 1-bit fill/write-back direction, and multi-bit data mask bits for write-back.
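The field list above can be pictured as a packed command word. The following Python sketch is illustrative only: the text gives just the two 1-bit widths, so every other width, the field order, the fill/write-back encoding, and the names `FIELDS`, `pack_command`, and `unpack_command` are assumptions, not the layout of FIG. 6.

```python
# Field name -> width in bits. Only the two 1-bit widths come from the text;
# all other widths and the field order are hypothetical.
FIELDS = (
    ("reg_set", 1),     # register set bit
    ("x", 11),          # cache coordinate x (width assumed)
    ("y", 11),          # cache coordinate y (width assumed)
    ("tag", 14),        # tag address information (width assumed)
    ("line", 4),        # line information (width assumed)
    ("write_back", 1),  # direction: 0 = fill, 1 = write back (encoding assumed)
    ("mask", 8),        # write-back data mask bits (width assumed)
)


def pack_command(values):
    """Pack a dict of field values into one integer, lowest field first."""
    word = 0
    shift = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_command(word):
    """Split a packed command word back into its fields."""
    values = {}
    for name, width in FIELDS:
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Because the command carries only coordinates, tag and line information, a direction, and a mask, nothing in it depends on how the memory behind the MEMIF is organized.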

With this structure, systems that depend on address features, such as page allocation and the bus protocol, need to be implemented only in the memory interface (MEMIF) 22.
A system that is easy to design and low in cost can therefore be built.

As described above, memory access in the rasterization engine 32 of this embodiment is divided into pixels carrying color (C), Z carrying three-dimensional depth, and T carrying texture, and is first directed to a cache mechanism that operates on a two-dimensional coordinate system decoupled from real addresses.

FIG. 7 is a diagram showing a specific access control system for the C cache and the Z cache in this embodiment.
The access control system 100 includes a POE controller (POECTL) 101, a cache line tag/flag unit 102, cache lines 103-1 to 103-n (where n is, for example, 8, 16, 32, ...), a control register (CTLREG) 105, and a DMA 104.
The DMA 104 corresponds to the memory interface 22 in FIGS. 1 and 2 and is the system-dependent part.

In the access control system 100, the unit of access is one pixel, specified by two-dimensional (x, y) coordinates. Each cache line 103 (-1 to -n) is 128 bits or 256 bits wide and holds pixel data in a two-dimensional structure corresponding to the two-dimensional coordinates.
The POE controller (POECTL) 101 performs control in response to pixel reads and writes from the TME_POE 33.
The control register (CTLREG) 105 fixes the pixel format of the cache and the access attribute of the cache (R/W, W).
As described above, addressing in the real address space is handled by the DMA 104, and these settings are conveyed by DMA commands.
The tags and flags in the cache line tag/flag unit 102 associate each cache line with memory in two dimensions and indicate the status of the cache line.
For example, 128-bit access is provided on the DMA 104 side and access of up to 32 bits on the POE 33 side.

Next, an example of two-dimensional (2D) map processing for the C (color) / Z (depth) caches is described.
This processing calculates the C/Z cache line number corresponding to a two-dimensional address and determines the reference position used when the POE 33 makes a read/write (R/W) reference.
Depending on the register setting in the control register 105, the line map falls into one of the following three types, which differ by pixel format.

That is, the three types are:
1) 16 lines: 16-bit pixel format in the C cache, 16-bit pixel format in the Z cache;
2) 8 lines (4 × 2): 32-bit pixel format in the C cache, 32-bit pixel format in the Z cache;
3) 8 lines (4 × 4): 8-bit pixel format in the C cache.

Here, the specific processing is described using the 16-line C/Z cache as an example. The 8-line (4 × 2) and 8-line (4 × 4) cases are handled in the same way as the 16-line case, so their detailed description is omitted in this embodiment.

In the 16-line C/Z cache, the access coordinates (addresses) are grouped into 4 × 2 blocks so as to match the cache size. The block address (xb, yb) is related to the reference address (xo, yo) as follows.
xb = xo >> 2
yb = yo >> 1

For this block address (xb, yb), the tag number num is calculated as follows.
num = (xb & 0x3) | ((yb & 0x3) << 2)

Because two-dimensional coordinate values are used as addresses (the page address of the physical address used by the DMA 104 for memory references is fixed on the DMA 104 side), the address stored in the tag is the upper part of the block address (xb, yb).
xtag = xb >> 2
ytag = yb >> 2

The cache hit check is performed by comparing these values after tag addressing.
FIG. 8 shows the relationship with the input reference coordinates in this case.
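The 16-line mapping can be traced in a few lines of Python. The shifts and masks below are taken directly from the formulas in the text; the `TagStore` class, its `lookup` and `fill` methods, and the use of `None` for an empty line are hypothetical scaffolding added only to show how the tag comparison could decide hit or miss.

```python
def map_16_line(xo, yo):
    """Return (num, (xtag, ytag)) for reference address (xo, yo)."""
    xb = xo >> 2                           # 4 pixels per block horizontally
    yb = yo >> 1                           # 2 pixels per block vertically
    num = (xb & 0x3) | ((yb & 0x3) << 2)   # tag number, 0..15
    xtag = xb >> 2                         # upper part of the block address
    ytag = yb >> 2
    return num, (xtag, ytag)


class TagStore:
    """Hypothetical tag array for the 16-line case."""

    def __init__(self):
        self.tags = [None] * 16            # None marks an empty line

    def lookup(self, xo, yo):
        """Return (line number, hit?) by comparing the stored tag."""
        num, tag = map_16_line(xo, yo)
        return num, self.tags[num] == tag

    def fill(self, xo, yo):
        """Record that the line now holds the block containing (xo, yo)."""
        num, tag = map_16_line(xo, yo)
        self.tags[num] = tag
        return num
```

For example, (xo, yo) = (37, 11) gives block (9, 5), line number 5, and tag (2, 1). The pixel at (53, 11) selects the same line 5 but carries tag (3, 1), so in this sketch it would miss against a line filled for (37, 11).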

Next, pixel data allocation in the 16-line C/Z cache is described.
Regarding how the 128-bit data of one cache line is allocated to the two-dimensional arrangement of 8 pixels on the DMA 104 side and the POE 33 side, the pixel allocation for the 128-bit line data on the DMA side is as shown in FIG. 9.
Note that the allocation for 32-bit pixel data uses the 16 bits on the LSB side.

Next, stamp access in the 16-line C/Z cache is described.
A read/write (R/W) reference from the POE 33 is made one pixel, that is, one 16-bit stamp, at a time.
By nature, references come as read/write (R/W) pairs; when processing results in no write being performed, a write must still be issued, marked as invalid.
For stamp reference coordinates (xso, yso) (xso: 11 bits, yso: 11 bits), the stamp reference block address (xsb, ysb) is given as follows.
xsb = xso >> 2
ysb = yso >> 1

Within a 128-bit line, the stamp reference relationship is determined by the LSB-side bits of the stamp reference coordinates (xso, yso), as shown in FIG. 10.
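The following Python sketch illustrates one way the LSB-side bits could select a 16-bit pixel inside a 128-bit line. The block-address shifts come from the text. The slot ordering (x in the low two bits, y in the next bit, slot 0 at the LSB of the line) is an assumption, since FIG. 10 is not reproduced here, and the function names are hypothetical.

```python
PIXEL_BITS = 16
PIXEL_MASK = (1 << PIXEL_BITS) - 1


def stamp_reference(xso, yso):
    """Return ((xsb, ysb), slot) for stamp reference coordinates."""
    xsb = xso >> 2
    ysb = yso >> 1
    # Assumed ordering of the 4 x 2 pixels within the line (not from FIG. 10).
    slot = (xso & 0x3) | ((yso & 0x1) << 2)
    return (xsb, ysb), slot


def read_stamp(line_data, slot):
    """Extract the 16-bit pixel at the given slot of a 128-bit line."""
    return (line_data >> (slot * PIXEL_BITS)) & PIXEL_MASK


def write_stamp(line_data, slot, pixel, valid=True):
    """Return the line with the pixel replaced; an invalid write changes nothing."""
    if not valid:
        return line_data
    shift = slot * PIXEL_BITS
    return (line_data & ~(PIXEL_MASK << shift)) | ((pixel & PIXEL_MASK) << shift)
```

The `valid` flag mirrors the rule that a write is always issued to complete the R/W pair, but is marked invalid when the pixel operation produced nothing to store.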

Next, the memory interface (MEMIF) 22 according to this embodiment is described.
As described above, C/Z cache accesses are issued as general-purpose commands and interfaced with the memory interface (MEMIF) 22, which implements access to the actual system (that is, the transfers are matched). T cache accesses are likewise issued as similar general-purpose commands and interfaced with the memory interface (MEMIF) 22.
The memory interface (MEMIF) 22 conforms to the memory access interface (IF) specification of the SOC and adopts a real address organization that takes into account the memory access characteristics of the two-dimensional cache mechanism (on a cache miss), thereby realizing a C/Z/T interface for three-dimensional computer graphics (3DCG) that is independent of the memory system.

The functional specification of the memory interface (MEMIF) 22 is separate from the 3DCG functions. For C/Z, for example, it specifies that general-purpose commands are processed so as to yield addresses that match the bus specification and the SDRAM specification of the target SOC.
To produce the real memory addresses in the memory interface (MEMIF) 22, a head address and address calculation coefficients are set by general-purpose commands. The T cache is handled in the same way.
What is notable in this embodiment is that the 3DCG portion, which is the largest and most complex, is completely separated, and a functional block matching the requirements of the target system is obtained simply by fitting it to the interface specification defined by the 3DCG functional block. Supporting other functional requirements (another SOC, for example) therefore requires designing only the part corresponding to the MEMIF.
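As a rough illustration of how a head address and address calculation coefficients could turn a block address into a real address, the Python sketch below assumes a simple linear form. The text does not give the actual formula, so the linear mapping, the class name `MemIfConfig`, and the coefficient names are all hypothetical.

```python
class MemIfConfig:
    """Hypothetical MEMIF settings delivered by a general-purpose command."""

    def __init__(self, head_address, coeff_x, coeff_y):
        self.head_address = head_address  # start of the buffer in real memory
        self.coeff_x = coeff_x            # bytes per step in block x (assumed)
        self.coeff_y = coeff_y            # bytes per step in block y (assumed)

    def real_address(self, xb, yb):
        """Map a 2D block address to a real byte address (assumed linear form)."""
        return self.head_address + yb * self.coeff_y + xb * self.coeff_x


# Example values: 16-byte (128-bit) lines and 160 blocks per row, which would
# correspond to a 640-pixel-wide buffer with 4-pixel-wide blocks.
config = MemIfConfig(head_address=0x10000000, coeff_x=16, coeff_y=160 * 16)
```

With these example values, block (9, 5) maps to 0x10003290. Porting to a different SOC would, under this assumption, change only the configuration values and the formula inside the MEMIF, while the cache keeps issuing the same 2D block addresses.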

Next, the operation of the above configuration is described.

First, when vertex data such as three-dimensional coordinates, normal vectors, and texture coordinates, transferred over the CPU bus 4 by a CPU (not shown) and passed through the CPU interface, is input to the geometry engine 31, floating-point operations, for example, are performed on the vertex data. For example, as necessary, geometry processing such as coordinate conversion, clipping, and lighting is performed on data for graphics drawing and the like in the main processor 11 or the like.
The result of the arithmetic processing in the geometry engine 31 is then transferred to the rasterization engine 32 through the CPU interface 21 as, for example, polygon rendering data.

In the rasterization engine 32, the DDA setup circuit 321 first generates, based on the polygon rendering data, variation data indicating quantities such as the differences between the sides of the triangle and the horizontal direction.
Specifically, using the value at the start point, the value at the end point, and the distance between them, it calculates the variation, that is, the change in the value of interest per unit length of movement, and outputs it to the triangle DDA circuit 322 as setup data S321 including the variation data.

In the triangle DDA circuit 322, linearly interpolated (z, R, G, B, α, s, t, q) data is calculated for each pixel inside the triangle using the setup data S321 including the variation data.
The calculated (z, R, G, B, α, s, t, q) data and the (x, y) data of each vertex of the triangle are then output from the triangle DDA circuit 322 to the TME_POE 323 as DDA data S322.
That is, the triangle DDA circuit 322 performs rasterization, interpolating the image parameters (z, texture coordinates, color, and so on) of all pixels contained inside the polygon based on the image parameters obtained for each vertex of the polygon.
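The setup-then-step flow described above can be sketched in Python for a single span. This is a simplified illustration, not the circuit itself: it interpolates a tuple of attributes along one line between a start point and an end point, and the function names are hypothetical.

```python
def setup_variation(start, end, distance):
    """Per-unit-length change of each attribute (the role of the DDA setup)."""
    if distance == 0:
        raise ValueError("distance must be non-zero")
    return tuple((e - s) / distance for s, e in zip(start, end))


def dda_interpolate(start, variation, steps):
    """Step from the start values, adding the variation once per unit length."""
    values = tuple(start)
    results = []
    for _ in range(steps + 1):
        results.append(values)
        values = tuple(v + d for v, d in zip(values, variation))
    return results


# Two attributes (for example z and one color channel) over a distance of 4.
variation = setup_variation((0.0, 10.0), (1.0, 18.0), 4)
span = dda_interpolate((0.0, 10.0), variation, 4)
```

Here `variation` is (0.25, 2.0), and `span` runs from (0.0, 10.0) to (1.0, 18.0) in five evenly spaced samples. The per-pixel work is a single addition per attribute, which is the reason for computing the variation once in the setup stage.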

In the TME_POE 323, texture blending, an alpha test, a depth test, alpha blending, and pixel operations, for example, are performed. In the pixel operations, pixel data that has passed the various tests is written to external memory through the cache system 324 and the memory interface 22.
Texture data for blending in the TME_POE 323 is accessed through the T cache 3241, which is organized by texture coordinates, and then via the memory interface 22.
Color data for color/depth operations in the TME_POE 323 is accessed through the C cache 3242 and the Z cache 3243, which are organized by two-dimensional pixel coordinates, and then via the memory interface 22.

As described above, in this embodiment the rasterization engine 32 accesses memory through the memory interface 22, and the cache system 324 is placed between the memory interface and the texture/pixel engine, which performs texture-level and pixel-level processing. The function that generates real addresses in a manner dependent on the image memory architecture is split off behind a hand-off of commands and data whose form is independent of the image memory architecture. Memory access in image processing is thereby separated into a cache that does not depend on the image memory architecture, which provides the following effects.

Namely, increases in area and power can be suppressed, the memory access mechanism can be simplified, and the design load can be reduced.
As a result, when image processing that was conventionally built around a particular memory architecture, that is, the three-dimensional computer graphics architecture, is separated from the memory and turned into an IP for use in general-purpose SOCs and the like, the specification of the functional block that depends on the memory system is standardized, so a configuration adapted to the system can be adopted easily.

FIG. 1 is a configuration diagram showing an embodiment of the main part of an image processing apparatus according to the present invention.
FIG. 2 is a block diagram showing a configuration example of the core block according to this embodiment.
FIG. 3 is a diagram for explaining the function of the triangle DDA circuit according to this embodiment.
FIG. 4 is a block diagram showing the main configuration of the triangle DDA circuit according to this embodiment.
FIG. 5 is a diagram showing the command-based cache data access structure in which the TME_POE (RE) side of the cache system according to this embodiment is separated from the memory interface (MEMIF), which is the memory access block.
FIG. 6 is a diagram showing an example of the structure of the cache command (C·CMD) in FIG. 5.
FIG. 7 is a diagram showing a specific access control system for the C cache and the Z cache in this embodiment.
FIG. 8 is a diagram showing two-dimensional map processing for the 16-line C/Z cache.
FIG. 9 is a diagram showing the DMA-side pixel allocation for the 16-line C/Z cache.
FIG. 10 is a diagram showing an example of the stamp reference relationship for the 16-line C/Z cache.

Explanation of symbols

1: image processing apparatus, 2: system-dependent block, 21: CPU interface (BB), 22: memory interface (MEMIF), 3: core block, 31: geometry engine (GE), 32: rasterization engine (RE), 321: triangle setup circuit, 322: triangle DDA circuit, 323: texture map engine (TME) / pixel operation engine (POE) (TME_POE), 324: cache system.

Claims

An image processing apparatus that performs rendering processing on a memory, comprising:
a pixel engine that generates pixel data based on information about a primitive to be drawn and outputs two-dimensional coordinates as information about access to the memory; and
a cache that receives the two-dimensional coordinates from the pixel engine, holds the pixel data in a two-dimensional structure corresponding to the two-dimensional coordinates, and outputs, for access to the memory, a cache command standardized into a command and data corresponding to the two-dimensional coordinate information.

The image processing apparatus according to claim 1, wherein the cache command includes general-purpose information necessary for a real address of the memory.

The image processing apparatus according to claim 1, further comprising a memory interface that receives a cache command from the cache and generates a real address of the memory.

The image processing apparatus according to claim 1, further comprising: a memory interface that receives a cache command from the cache and generates a real address of the memory depending on a memory architecture.

The image processing apparatus according to claim 2, further comprising a memory interface that receives a cache command from the cache and generates a real address of the memory.

The image processing apparatus according to claim 3, further comprising a memory interface that receives a cache command from the cache and generates a real address of the memory depending on a memory architecture.

An image processing apparatus that performs rendering processing on a memory, comprising:
an external interface that controls the interface with external devices;
a memory interface that controls the interface with the memory;
a pixel engine that generates pixel data based on information about a primitive to be drawn, input via the external interface, and outputs two-dimensional coordinates as information about access to the memory; and
a cache that receives the two-dimensional coordinates from the pixel engine, holds the pixel data in a two-dimensional structure corresponding to the two-dimensional coordinates, and outputs, for access to the memory, a cache command standardized into a command and data corresponding to the two-dimensional coordinate information,
wherein the memory interface receives the cache command from the cache and generates a real address of the memory.

The external interface and the memory interface that controls the interface with the memory constitute a system-dependent block,
The image processing apparatus according to claim 7, wherein the pixel engine and the cache constitute a system-independent core block.

The image processing apparatus according to claim 7, wherein the cache command includes general-purpose information necessary for a real address of the memory.

The image processing apparatus according to claim 9, further comprising: a memory interface that receives a cache command from the cache and generates a real address of the memory depending on a memory architecture.

An image processing method for performing rendering processing on a memory, comprising:
a first step of generating pixel data based on information about a primitive to be drawn;
a second step of generating two-dimensional coordinates as information about access to the memory and supplying them to a cache; and
a third step of outputting, from the cache, a cache command for access to the memory, standardized into a command and data corresponding to the two-dimensional coordinate information.

The image processing method according to claim 11, further comprising a fourth step of receiving the cache command and generating a real address of the memory depending on a memory architecture.