JP2016524241A

JP2016524241A - Fragment shader that performs vertex shader operations

Info

Publication number: JP2016524241A
Application number: JP2016519563A
Authority: JP
Inventors: Cerny, Mark Evan; Simpson, David; Scanlin, Jason
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2013-06-10
Filing date: 2014-06-06
Publication date: 2016-08-12
Anticipated expiration: 2034-06-06
Also published as: WO2014200866A1; EP3008700B1; EP3008700A4; US20190035050A1; EP3008700A1; CN105556565A; US10733691B2; CN110097625A; US20140362101A1; US10096079B2; JP6230702B2; CN105556565B; CN110097625B

Abstract

Graphics processing includes executing a vertex shader and a pixel shader with a GPU. Vertex indices output from the vertex shader are written to a cache. The vertex indices written to the cache are accessed by the pixel shader, and vertex parameter values associated with the vertex indices are accessed by the pixel shader from a memory unit. It is emphasized that this abstract is provided to comply with rules requiring an abstract that will allow a searcher or other reader to quickly ascertain the subject matter of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. [Selected figure] FIG. 2B

Description

Aspects of the present disclosure relate to computer graphics. In particular, the present disclosure relates to the use of vertex shaders and pixel shaders in a graphics processing unit.

Graphics processing typically involves the coordination of two processors, a central processing unit (CPU) and a graphics processing unit (GPU). A GPU is a specialized electronic circuit designed to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are used in embedded systems, tablet computers, portable game devices, mobile phones, personal computers, workstations, and game consoles. A GPU is typically designed to be efficient at manipulating computer graphics. GPUs often have a highly parallel processing architecture that makes them more effective than a general-purpose CPU for algorithms in which large blocks of data are processed in parallel.

The CPU may send the GPU commands to carry out certain graphics processing tasks, for example, to render a particular texture that has changed relative to a previous frame of an image. These draw commands may be coordinated by the CPU together with a graphics application programming interface (API) in order to issue graphics rendering commands that correspond to the state of the particular application's virtual environment.

In order to render textures for a particular program, the GPU may perform a series of processing tasks in a "graphics pipeline" to translate the visuals in the virtual environment into images that can be rendered onto a display. A typical graphics pipeline may include performing certain rendering or shading operations on virtual objects in the virtual space, transforming and rasterizing the virtual objects in the scene to produce pixel data in a form suitable for output to a display, and performing additional rendering tasks on the pixels (or fragments) before the rendered image is output to the display.

Virtual objects in an image are often described in virtual space in terms of shapes known as primitives, which together make up the shapes of the objects in the virtual scene. For example, objects in a three-dimensional virtual world to be rendered may be reduced to a series of distinct triangle primitives whose vertices are defined by their coordinates in three-dimensional space, so that these polygons make up the surfaces of the objects. Each polygon may have an associated index that the graphics processing system uses to distinguish a given polygon from other polygons. Likewise, each vertex may have an associated index used to distinguish a given vertex from other vertices. The graphics pipeline may perform certain operations on these primitives to produce visuals for the virtual scene and transform this data into a two-dimensional format suitable for reproduction by the pixels of the display. As used herein, the term graphics primitive information (or simply "primitive information") refers to data representative of graphics primitives. Such data includes, but is not limited to, vertex information (e.g., data representing vertex positions or vertex indices) and polygon information, e.g., polygon indices and information that associates particular vertices with particular polygons.

The GPU may perform the rendering tasks of the graphics pipeline by executing programs commonly known as shaders. A typical graphics pipeline may include vertex shaders, which may manipulate certain properties of the primitives on a per-vertex basis, as well as pixel shaders (also known as "fragment shaders"), which operate downstream from the vertex shaders in the graphics pipeline and may manipulate certain values on a per-pixel basis before the pixel data is transmitted to a display. The pipeline may also include other shaders at various stages, such as geometry shaders, which use the output of the vertex shaders to generate a new set of primitives (or corresponding primitive information), and compute shaders (CS), which may be executed by the GPU to perform certain other general computational tasks.

One challenge associated with processing graphics in the pipeline is that certain bottlenecks may occur and degrade performance as data is input to and output from the various shaders in the pipeline. Additionally, it is desirable to give developers of particular applications that run visuals more control over how shaders use various visual parameters and underlying data, so that they can optimize the rendering process.

It is within this context that aspects of the present disclosure arise.

According to aspects of the present disclosure, a computer graphics processing method may include writing vertex indices output from a vertex shader to a cache, accessing, with a pixel shader, the vertex indices written to the cache, and accessing, with the pixel shader, vertex parameter values associated with the vertex indices from a memory unit.
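The data flow of the method described above can be sketched as a minimal Python model, assuming a dictionary stands in for the vertex buffer in memory and a list stands in for the cache. The names `vertex_buffer`, `run_vertex_stage`, and `pixel_shader_fetch` and the parameter values are hypothetical; the sketch only shows that the cache carries indices while the parameter values stay in memory until the pixel shader fetches them.

```python
# Hypothetical raw per-vertex parameters kept in a vertex buffer in memory.
vertex_buffer = {0: 0.0, 1: 10.0, 3: 4.0, 9: 8.0}

def vertex_shader(vertex_index):
    # Only the index is forwarded for later parameter lookup; no parameter values are written.
    return vertex_index

def run_vertex_stage(triangles):
    """Write each triangle's vertex indices (the vertex shader output) to a cache."""
    cache = []
    for tri in triangles:
        cache.append(tuple(vertex_shader(v) for v in tri))
    return cache

def pixel_shader_fetch(cache, tri_slot, memory):
    """Pixel shader side: read indices from the cache, then fetch parameters from memory."""
    return tuple(memory[v] for v in cache[tri_slot])

cache = run_vertex_stage([(0, 1, 3), (1, 3, 9)])
print(pixel_shader_fetch(cache, 1, vertex_buffer))  # (10.0, 4.0, 8.0)
```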

In some implementations of the present disclosure, the computer graphics processing method may further include performing, with the pixel shader, vertex shader computations on the vertex parameter values.

In some implementations, the computer graphics processing method may further include interpolating the vertex parameter values with the pixel shader.

In some implementations, accessing the vertex indices includes copying the vertex indices from the cache to a local memory unit of the GPU, and accessing the indices from the local memory unit with the pixel shader.

In some implementations, the vertex shader computations include manipulating visual effects of the vertices of primitives in three-dimensional virtual space.

In some implementations, accessing the vertex parameter values includes accessing the parameter values of all three vertices of a triangle primitive.

In some implementations, the method further includes, after accessing the vertex parameter values, performing vertex shader computations on the vertex parameter values with the pixel shader, interpolating the parameter values with the pixel shader, and performing pixel shader computations on the interpolated parameter values with the pixel shader.
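A minimal sketch of that three-step order inside the pixel shader follows. `vertex_op` and `pixel_op` are hypothetical stand-ins for whatever vertex shader and pixel shader computations an application would perform; only the order (vertex computations on raw values, then interpolation, then pixel computations) is taken from the text.

```python
def vertex_op(p):
    # Hypothetical per-vertex computation deferred to the pixel shader (e.g. a lighting scale).
    return 2.0 * p + 1.0

def interpolate(p0, p1, p2, i, j):
    # Barycentric interpolation: P0 + (P1 - P0) * i + (P2 - P0) * j.
    return p0 + (p1 - p0) * i + (p2 - p0) * j

def pixel_op(value):
    # Hypothetical per-pixel computation: normalize and clamp to [0, 1].
    return min(max(value / 20.0, 0.0), 1.0)

def pixel_shader(raw_params, i, j):
    # 1) vertex shader computations on the raw vertex parameter values
    v0, v1, v2 = (vertex_op(p) for p in raw_params)
    # 2) interpolation at the pixel's barycentric coordinates (i, j)
    value = interpolate(v0, v1, v2, i, j)
    # 3) pixel shader computations on the interpolated value
    return pixel_op(value)

print(pixel_shader((0.0, 10.0, 4.0), 0.5, 0.25))  # 0.65
```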

In some implementations, the vertex shader output is limited to vertex positions and vertex indices, and the pixel shader performs the remaining vertex shader computations after accessing the vertex indices.

In some implementations, the memory unit is the main memory of the system.

In some implementations, the parameter values are stored in a vertex buffer in the main memory.

According to aspects of the present disclosure, a graphics processing system may include a graphics processing unit (GPU), a memory unit, and a cache. The system may be configured to implement a graphics processing method that includes implementing a vertex shader and a pixel shader with the GPU, writing vertex indices output from the vertex shader to the cache, accessing, with the pixel shader, the vertex indices written to the cache, and accessing, with the pixel shader, vertex parameter values associated with the vertex indices from the memory unit.

In some implementations, the memory unit is the main memory unit of the system.

In some implementations, the GPU includes a plurality of compute units and a plurality of local memory units, each local memory unit being associated with a respective one of the compute units.

In some implementations, accessing the vertex indices includes copying the vertex indices from the cache to the local memory units, and accessing the vertex indices from the local memory units with the pixel shader.

In some implementations, the cache is integrated with the GPU.

According to aspects of the present disclosure, a non-transitory computer readable medium may have computer readable instructions embodied therein, the instructions being configured to implement a graphics processing method when executed. The graphics processing method includes writing vertex indices output from a vertex shader to a cache, accessing, with a pixel shader, the vertex indices written to the cache, and accessing, with the pixel shader, vertex parameter values associated with the vertex indices from a memory unit.

The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIGS. 1A-1C are schematic diagrams of triangle primitives illustrating various graphics processing techniques.
FIGS. 1D-1E are flow diagrams of conventional graphics processing techniques.
FIG. 2A is a schematic diagram of a conventional graphics processing technique having similarities to the implementation of FIG. 1E.
FIG. 2B is a schematic diagram of a graphics processing technique in accordance with aspects of the present disclosure.
FIG. 3 is a flow diagram of a graphics processing technique in accordance with aspects of the present disclosure.
FIG. 4 is a schematic diagram of a system for implementing a graphics processing technique in accordance with aspects of the present disclosure.

Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.

Introduction
According to aspects of the present disclosure, the vertex shader output is reduced to only the output position and the vertex index, and the pixel shader performs the remaining vertex shader computations in addition to the usual pixel shader computations, which results in accelerated rendering of objects.

According to aspects of the present disclosure, in order to perform vertex shader computations on the raw parameter values of the vertices, the pixel shader accesses the parameter values directly in system memory. The pixel shader then interpolates the parameter values and performs the pixel shader computations before the rendered pixels are output from the pixel shader to the frame buffer. Bottlenecks associated with throughput and with copying parameter values are mitigated by outputting vertex indices, rather than the full set of output parameters, from the vertex shader, and by using these indices in the pixel shader to identify the parameter values in memory.

FIGS. 1A-1C illustrate various aspects of graphics processing techniques and how interpolation of vertex parameters may be used to process graphics and render virtual objects in an image. To define parameter values at various locations on a virtual object to be displayed, a graphics processing technique may use barycentric interpolation. By way of example, and not by way of limitation, the parameter values may be the position, color, texture coordinates, lighting, and the like at each vertex of a primitive located in virtual space, and barycentric interpolation of these vertex parameters may be used to determine a parameter value at any location within the primitive. For example, when a virtual scene is rendered to the pixels of a display, any number of pixels may be located within a primitive, and interpolation of the vertex parameter values may be used to determine the corresponding parameter values at the pixel locations within the primitive.

Illustrative aspects of an interpolation process using a barycentric coordinate system are depicted in FIG. 1A. FIG. 1A depicts a polygon (e.g., a triangle) 102 that may be used as a primitive for processing graphics with a GPU. Triangles are commonly used as primitives in graphics processing because they are two-dimensional shapes with the minimum number of vertices (three), and each triangle is guaranteed to be planar. The surface of a virtual object, such as a three-dimensional object, in an image to be rendered may be made up of a large number of triangle primitives 102 oriented in virtual space. The triangle 102 includes vertices 104a, 104b, and 104c, having parameter values P0, P1, and P2, respectively.

By interpolating the vertex parameter values P0, P1, and P2, a parameter value P_{i,j} at any point within the triangle 102 may be defined using a linear relationship between the parameters at the corners of the shape. The coordinates i, j correspond to the location of a pixel (or pixel center) when the image containing the virtual object is rendered in screen space on the display. Accordingly, this interpolation process may be used to determine the parameter value for any of the pixels located within the primitive 102. Any given triangle 102 of a virtual object may contain any number of pixel centers; for example, zero, one, ten, or more pixels may be located within a primitive.

To interpolate the vertex parameters at location i, j, the parameter value of one vertex is subtracted from the parameter values of the other vertices, and these differences are multiplied by the respective barycentric coordinates within the triangle 102 that correspond to the location of the desired parameter value. Mathematically, this may be expressed as follows, where the vertex parameter P0 is subtracted from the other two vertex parameters P1 and P2, the differences are multiplied by the corresponding coordinate values i and j, and the products are added to P0:

$$P_{i,j} = P_0 + (P_1 - P_0)\,i + (P_2 - P_0)\,j$$
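The formula above can be written as a small function. This sketch assumes each parameter is a tuple of components (for example an RGB color) and interpolates each component independently; the vertex colors in the usage line are hypothetical.

```python
def interpolate(p0, p1, p2, i, j):
    """Component-wise P_{i,j} = P0 + (P1 - P0) * i + (P2 - P0) * j."""
    return tuple(a + (b - a) * i + (c - a) * j for a, b, c in zip(p0, p1, p2))

# Hypothetical vertex colors (RGB) at the three vertices.
red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
print(interpolate(red, green, blue, 0.25, 0.5))  # (0.25, 0.25, 0.5)
```

Because the result equals (1 - i - j)·P0 + i·P1 + j·P2, setting (i, j) to (0, 0), (1, 0), or (0, 1) returns P0, P1, or P2, respectively.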

FIG. 1B depicts a plurality of triangles 102a-d, similar to the triangle 102 of FIG. 1A, that may be used to render a virtual object for a graphics processing application. FIG. 1B and the description below are simplified schematics intended to illustrate various aspects of how vertex parameter data is used and stored when implementing graphics processing techniques.

Each of the triangles 102a-d has three vertices, each of which has a corresponding parameter value. Moreover, the triangles 102a-d share many common vertices, so many of the parameter values are common to different triangles. Rather than storing the parameter values multiple times so that they are associated with each of the triangles, each vertex is assigned an identifying index. By way of a simplified example, the vertices shown in FIG. 1B are assigned the identifying indices 0, 1, 3, 9, 10, and 4. These indices and their corresponding parameter values are stored in what is commonly known as a "vertex buffer." Moreover, each of the triangles 102a-d is identified by its corresponding vertex indices; e.g., triangle 102a is identified by (0, 1, 3), triangle 102b by (1, 3, 9), and so on, and this information is stored in what is commonly known as an "index buffer." Thus, the common vertex parameter values are associated with each individual triangle 102a-d through the respective indices identified in the buffers.
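The vertex buffer and index buffer arrangement described above can be modeled with two plain Python containers. The parameter values are hypothetical placeholders, and only triangles 102a and 102b are included because the text gives their indices; the indices of 102c and 102d are not stated.

```python
# Vertex buffer: each shared vertex stored exactly once (hypothetical parameter values).
vertex_buffer = {0: 0.5, 1: 1.0, 3: 0.25, 9: 0.75, 10: 0.0, 4: 0.125}

# Index buffer entries for triangles 102a and 102b, as given in the text.
index_buffer = [(0, 1, 3), (1, 3, 9)]

def triangle_params(tri_slot):
    """Resolve a triangle's three vertex indices to their parameter values."""
    return tuple(vertex_buffer[v] for v in index_buffer[tri_slot])

# Vertex references made by these triangles vs. distinct vertices actually stored for them.
references = sum(len(tri) for tri in index_buffer)        # 6
distinct = len({v for tri in index_buffer for v in tri})  # 4

print(triangle_params(1))  # (1.0, 0.25, 0.75)
```

Vertices 1 and 3 are shared by both triangles, so six references resolve to four stored values instead of six copies.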

FIG. 1B also depicts a series of pixel locations a-f overlaid on the primitives 102a-d. Interpolation of the parameter values, e.g., as described above with respect to FIG. 1A, may be used to determine the parameter value at each of the pixel locations a-f within each primitive, based on the vertex parameter values and the indices identifying each primitive. By way of example, and not by way of limitation, the triangles 102a-d may be oriented in a three-dimensional virtual environment, and the pixel locations a-f may correspond to pixels of a two-dimensional screen used to display an image of the rendered virtual environment.

To illustrate various aspects of how parameter values are assigned to the pixels a, b, and c located within the triangles, FIG. 1C depicts the triangles 102a and 102b of FIG. 1B. As shown in FIG. 1C, the vertex parameter values P0, P1, and P2 are uniquely assigned to each individual triangle 102a and 102b and are identified on the basis of the indices 0, 1, 3, and 9 stored in the index buffer. Interpolation is performed by accessing the corresponding parameter values from the vertex buffer and subtracting the parameter value P0 from the remaining vertex parameters P1 and P2, as described above with respect to FIG. 1A.

As an alternative to interpolating the parameter values of each primitive, a technique known as "flat shading" may be used. With flat shading, a "provoking vertex," e.g., P0, is defined for each triangle, and the differences to the remaining vertices, e.g., P1-P0 and P2-P0, are simply set to zero. Any pixel located within the triangle is output from the vertex shader with the parameter value of the provoking vertex. This can save a significant amount of the overhead associated with the interpolation computations, but it gives the virtual object a faceted look, which is undesirable in many applications.
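The difference between flat shading and interpolation can be shown by forcing the two differences to zero. This is a sketch with hypothetical scalar parameter values; P0 plays the role of the provoking vertex, as in the description above.

```python
def interpolate(p0, p1, p2, i, j):
    """Smooth shading: P0 + (P1 - P0) * i + (P2 - P0) * j."""
    return p0 + (p1 - p0) * i + (p2 - p0) * j

def flat_shade(p0, p1, p2, i, j):
    """Flat shading: the differences to the remaining vertices are set to zero,
    so every covered pixel receives the provoking vertex's value P0."""
    d10, d20 = 0.0, 0.0  # would be P1 - P0 and P2 - P0 with interpolation
    return p0 + d10 * i + d20 * j

print(interpolate(2.0, 6.0, 10.0, 0.25, 0.5))  # 7.0
print(flat_shade(2.0, 6.0, 10.0, 0.25, 0.5))   # 2.0
```

Every pixel of the triangle gets the same value under flat shading, which is what produces the faceted appearance.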

FIG. 1D is a flow diagram depicting illustrative aspects of performing interpolation of vertex parameters according to a conventional method 100a, in which the entire interpolation is performed before the parameters are received by the pixel shader. The method 100a of FIG. 1D uses the triangles 102a-d depicted in FIGS. 1B and 1C to illustrate how the vertex parameters are interpolated in coordination with the vertex shader 110 and the pixel shader 112 to determine the parameter values for the pixels a-f. (Strictly speaking, a-f are fragments or pre-pixels, since the pixel shader performs further modifications before they are output to the frame buffer, but they are referred to here simply as pixels for convenience of explanation.)

The method 100a includes performing certain vertex shader computations 114 with the vertex shader 110, which include certain per-vertex manipulations of the vertex parameters of the virtual objects in accordance with draw commands received from a graphics API that coordinates the graphics to be rendered in the application's virtual environment. The vertex shader 110 outputs the corresponding vertex parameter values P0, P1, and P2 for each of the triangles 102a-d, as shown in FIG. 1D.

These vertex parameter values P0, P1, and P2 are interpolated at 116 for each triangle in order to determine the parameter values P_a-P_f at the pixel locations a-f within the corresponding triangles 102a-d. The interpolation at 116 includes, e.g., as described with respect to FIG. 1A, subtracting the vertex parameter value P0 from the other two vertex parameter values P1 and P2, multiplying these differences by their corresponding barycentric coordinates, and adding the products to P0 to interpolate the parameter at the pixel location defined by the coordinates. In the technique depicted in FIG. 1D, the interpolation 116 is performed entirely by a parameter interpolation hardware component associated with the GPU before the pixel shader program 112 receives the parameter values as input. The pixel shader 112 further manipulates each pixel a-f by performing certain pixel shader computations at 118 on each of the pixels, i.e., on a per-pixel basis, resulting in output pixels 120. The output pixels 120 are stored in a frame buffer and output as an image rendered on the display.

FIG. 1E is a flow diagram of illustrative aspects of performing interpolation of vertex parameters according to another conventional method 100b. The conventional method 100b depicted in FIG. 1E is similar to the conventional method 100a of FIG. 1D, except that only the subtraction portion 122 of the interpolation 116 is performed before the parameters reach the pixel shader 112. In this technique 100b, the subtraction portion 122 of the interpolation 116 is performed by a parameter interpolation hardware component associated with the GPU, and the pixel shader program 112 then receives the subtracted parameter values as input and performs the remainder of the interpolation 116. The remainder of the vertex parameter interpolation 116 is thereby reduced to a simple multiply-and-add operation at 124 using the absolute vertex parameter P0, the relative parameter values P10 and P20 obtained by subtracting P0, and the coordinates i, j of the desired parameter P. For each of the corresponding triangles 102a-d, P10 = P1 - P0 and P20 = P2 - P0. This yields the desired parameter values P_a-P_f described above, which are further manipulated by the pixel shader at 118 to produce the output pixels 120.
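The split in method 100b can be sketched as two functions: one standing in for the subtraction performed by the interpolation hardware (122) and one for the multiply-and-add performed in the pixel shader (124). The function names and values are hypothetical, and the sketch uses P20 = P2 - P0, which is what the formula for FIG. 1A requires.

```python
def hw_subtract(p0, p1, p2):
    """Subtraction portion (122): done before the pixel shader in method 100b."""
    return p0, p1 - p0, p2 - p0  # P0, P10, P20

def ps_multiply_add(p0, p10, p20, i, j):
    """Remainder of the interpolation (124): a multiply-and-add in the pixel shader."""
    return p0 + p10 * i + p20 * j

p0, p10, p20 = hw_subtract(2.0, 6.0, 10.0)      # (2.0, 4.0, 8.0)
print(ps_multiply_add(p0, p10, p20, 0.25, 0.5))  # 7.0
```

The combined result is the same as evaluating the full formula in one place; only the division of work between hardware and shader changes.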

FIG. 2A is a schematic diagram of a method 200a implemented with various hardware and software components configured to process graphics according to a conventional technique. The method 200a depicted in FIG. 2A is similar to the method 100b depicted in FIG. 1E.

The vertex shader 210 performs various vertex shader computations 214, which include determining the positions 230 of the vertices of the primitives in screen space and various other rendering effects 234 on the vertices of each primitive, such as manipulating the lighting, shadows, and colors of the vertices. The parameters P0, P1, and P2 resulting from the vertex shader computations 214 are written to a parameter cache 236 for temporary storage, and a parameter interpolation hardware component 222 of the system performs part of the interpolation by subtracting the parameter values before writing each set of parameters from the parameter cache 236 to a small local memory unit 237 of each compute unit of the GPU. Each local memory unit 237 is a small but fast local memory, also known as a local data share (LDS), associated with a respective compute unit of the GPU; there may be a plurality of such memory units and compute units running shader programs in parallel.

The vertex shader output positions 230 are used by a hardware component 238 that generates the barycentric coordinates i, j of the pixels for each primitive so that they can be used to interpolate the parameter values, e.g., as described herein. The pixel shader 212 accesses the absolute parameter value P0 and the relative parameter values P10 and P20 from the local data share 237 and completes the interpolation by performing multiply-and-add operations 224 using the coordinates i, j of each desired parameter. The pixel shader 212 then performs certain further pixel shader computations 218 to further manipulate the pixels before outputting them, e.g., to a frame buffer.
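The text does not say how hardware component 238 computes i and j. One common approach, shown here as an assumption rather than as the hardware's actual method, solves p = v0 + (v1 - v0)·i + (v2 - v0)·j with 2D cross products (edge functions). The triangle and grid in the usage lines are hypothetical; the counting helper treats pixel centers exactly on an edge as inside, whereas real rasterizers apply a tie-breaking fill rule.

```python
def cross(ax, ay, bx, by):
    """2D cross product (z component of a x b)."""
    return ax * by - ay * bx

def barycentric_ij(v0, v1, v2, p):
    """Solve p = v0 + (v1 - v0) * i + (v2 - v0) * j for (i, j)."""
    e1x, e1y = v1[0] - v0[0], v1[1] - v0[1]
    e2x, e2y = v2[0] - v0[0], v2[1] - v0[1]
    px, py = p[0] - v0[0], p[1] - v0[1]
    area = cross(e1x, e1y, e2x, e2y)  # twice the signed triangle area
    if area == 0:
        raise ValueError("degenerate triangle")
    i = cross(px, py, e2x, e2y) / area
    j = cross(e1x, e1y, px, py) / area
    return i, j

def covers(i, j):
    """Inside test; points exactly on an edge count as inside."""
    return i >= 0 and j >= 0 and i + j <= 1

def count_covered_pixels(v0, v1, v2, width, height):
    """Count pixel centers (x + 0.5, y + 0.5) of a width x height grid inside the triangle."""
    count = 0
    for y in range(height):
        for x in range(width):
            i, j = barycentric_ij(v0, v1, v2, (x + 0.5, y + 0.5))
            if covers(i, j):
                count += 1
    return count

print(barycentric_ij((0, 0), (4, 0), (0, 4), (1.5, 0.5)))  # (0.375, 0.125)
print(count_covered_pixels((0, 0), (4, 0), (0, 4), 4, 4))  # 10
```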

One drawback of technique 200a is that certain bottlenecks arise in the throughput of parameters to the pixel shader, which can slow the rendering of virtual objects. For one, parameter write throughput to the parameter cache has been recognized as a bottleneck. For example, each parameter may be a large attribute variable, such as a 32-bit floating-point number. The vertex shader writes these attribute variables to parameter cache 236 as a series of wavefronts, e.g., four at a time. Moreover, using the parameter cache further limits the number of vertex shader wavefronts that can be stored, creating an additional bottleneck. The parameters are then copied to local data share 237 and held there temporarily before the pixel shader accesses them. Here the limited throughput and the total local data share usage create yet another bottleneck by limiting the number of pixel shader wavefronts.
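The last point, that local data share usage caps how many pixel shader wavefronts can be resident, comes down to a simple division. The sketch below uses hypothetical values that are not taken from the disclosure: the LDS capacity, the hardware wavefront cap, and the number of triangles each wavefront touches are all assumed. It does follow the text in storing three vectors (P0, P10, P20) per attribute per triangle.

```python
def resident_wavefronts(lds_bytes: int, bytes_per_wavefront: int, hw_limit: int) -> int:
    """Wavefronts one compute unit can keep resident, capped by LDS space and hardware."""
    return min(hw_limit, lds_bytes // bytes_per_wavefront)


LDS_BYTES = 64 * 1024      # hypothetical LDS capacity per compute unit
HW_LIMIT = 40              # hypothetical hardware cap on wavefronts per compute unit
TRIANGLES_PER_WAVE = 16    # hypothetical number of triangles touched by one wavefront
FLOAT4_BYTES = 16          # one 4-component vector of 32-bit floats

for attrs in (2, 4, 8, 16):
    # Technique 200a stores P0, P10 and P20 (three vectors) per attribute per triangle.
    footprint = TRIANGLES_PER_WAVE * 3 * attrs * FLOAT4_BYTES
    print(attrs, resident_wavefronts(LDS_BYTES, footprint, HW_LIMIT))
```

Under these assumptions the output runs 40, 21, 10, 5. Once LDS space rather than the hardware cap is the limit, doubling the attribute count roughly halves the number of resident wavefronts.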

Another drawback of technique 200a is that the subtracted parameter values P10 and P20 are computed before they reach pixel shader 212. As a result, the pixel shader cannot directly access the raw parameter values P1 and P2, which limits the kinds of rendering effects it can perform.

Example
FIG. 2B illustrates an example of aspects of the present disclosure. It is a schematic diagram of a technique 200b implemented by various hardware and software components configured to process graphics in accordance with various aspects of the present disclosure. In the example of FIG. 2B, bottlenecks in parameter value throughput, such as those described above, are addressed by giving the pixel shader direct access to the vertex parameter values. The pixel shader can then perform many operations traditionally associated with the vertex shader on these raw vertex parameter values, as well as interpolate the vertex parameters, before performing its ordinary pixel shader operations.

As shown in FIG. 2B, vertex shader 210 performs vertex shader operations, as indicated at 214a. In some cases, these operations may be limited to determining vertex positions 230 and outputting indices, so that other parameter values are omitted from the vertex shader output. Rather than writing vertex shader output parameters to parameter cache 236, vertex shader 210 is configured to write only the vertex indices I0, I1, and I2 that identify each primitive (e.g., a triangle or other polygon) to parameter cache 236. These indices are then sent to local data share 237 so that the pixel shader can access them locally. Using indices I0, I1, and I2, pixel shader 212 can directly access the associated raw parameter values P0, P1, and P2 in the system's main memory, e.g., in a vertex buffer. With access to these parameter values, pixel shader 212 can perform the remaining vertex shader operations 214b, which may include other visual effects 234 on the vertices of the triangle. The pixel shader then performs interpolation 216 of the parameter values produced by the remaining vertex shader operations 214b, using coordinates i, j to determine the parameter value at each pixel location. Finally, pixel shader 212 performs further pixel shader operations 218 on the interpolated values, including further visual effects on the pixels, to generate output pixels.
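The data flow of technique 200b can be modeled in Python as a hypothetical sketch, not the disclosed implementation. Every function name is invented for illustration, the remaining vertex shader operations 214b are replaced by a trivial scaling, and i, j are the barycentric weights of vertices 1 and 2 relative to vertex 0. What the sketch shows is the ordering:

- the vertex shader emits only positions and indices;
- the pixel shader fetches the raw parameters by index;
- it runs the deferred vertex work on all three vertices;
- only then does it interpolate.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, ...]
Point = Tuple[float, float]


@dataclass
class VertexShaderOutput:
    """Reduced output of operations 214a: a screen-space position and an index."""
    position: Point
    index: int


def vertex_shader_214a(index: int, positions: List[Point]) -> VertexShaderOutput:
    # Hypothetical: positions are already in screen space, so 214a only forwards them.
    return VertexShaderOutput(positions[index], index)


def barycentric_ij(px: float, py: float, a: Point, b: Point, c: Point) -> Tuple[float, float]:
    """Model of hardware 238: solve (p - a) = i*(b - a) + j*(c - a) for i and j."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    i = ((px - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (py - a[1])) / det
    j = ((b[0] - a[0]) * (py - a[1]) - (px - a[0]) * (b[1] - a[1])) / det
    return i, j


def remaining_vertex_ops_214b(p: Vec) -> Vec:
    # Hypothetical stand-in for deferred per-vertex work such as lighting.
    return tuple(0.5 * v for v in p)


def pixel_shader_212(indices: Tuple[int, int, int], vertex_buffer: List[Vec],
                     i: float, j: float) -> Vec:
    # Use I0, I1, I2 to fetch the raw P0, P1, P2 straight from the vertex buffer.
    raw = [vertex_buffer[k] for k in indices]
    # Run the remaining vertex shader operations 214b on all three vertices.
    q0, q1, q2 = (remaining_vertex_ops_214b(p) for p in raw)
    # Interpolation 216 with barycentric weights (1 - i - j, i, j).
    w0 = 1.0 - i - j
    return tuple(w0 * x + i * y + j * z for x, y, z in zip(q0, q1, q2))


positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
vertex_buffer = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # raw P per index

vs_out = [vertex_shader_214a(k, positions) for k in range(3)]
cache = (vs_out[0].index, vs_out[1].index, vs_out[2].index)  # only indices are cached
i, j = barycentric_ij(1.0, 1.0, *(o.position for o in vs_out))
print(pixel_shader_212(cache, vertex_buffer, i, j))  # prints (0.25, 0.125, 0.125)
```

Because the pixel shader holds all three raw values, it could apply effects that need P1 and P2 individually, which the differences P10 and P20 in technique 200a do not expose directly.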

In this illustrative example, the indices I0, I1, and I2 sent to the pixel shader are a much smaller amount of data than the attribute variables used for the parameter values, since an index is essentially a single number. This reduces the bottleneck associated with parameter value throughput.
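A rough per-triangle byte count makes the size difference concrete. The sketch assumes 4-component attribute vectors of 32-bit floats and 32-bit indices. Both are illustrative assumptions, since the disclosure says only that an index is essentially one number.

```python
FLOAT_BYTES = 4   # one 32-bit floating-point component
INDEX_BYTES = 4   # hypothetical: one 32-bit integer per vertex index


def parameter_bytes_per_triangle(attributes: int, components: int = 4) -> int:
    """Bytes carried when each of the 3 vertices ships `attributes` float vectors."""
    return 3 * attributes * components * FLOAT_BYTES


def index_bytes_per_triangle() -> int:
    """Bytes carried when each vertex is represented only by its index I0, I1 or I2."""
    return 3 * INDEX_BYTES


for attrs in (1, 4, 8):
    p = parameter_bytes_per_triangle(attrs)
    # attribute count, parameter bytes, index bytes, ratio
    print(attrs, p, index_bytes_per_triangle(), p // index_bytes_per_triangle())
```

With these assumptions the index payload is 4, 16, and 32 times smaller for 1, 4, and 8 attributes. The raw parameter values are still read, but the pixel shader now reads them from the vertex buffer in memory rather than receiving them through the parameter cache and local data share.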

However, having pixel shader 212 perform vertex shader operations 214b and interpolation 216 as shown in FIG. 2B increases the computational load on the pixel shader, and on the shaders as a whole. For example, an image usually has more pixels than visible vertices. In the described method, vertex shader operations 214b are performed by pixel shader 212 rather than by vertex shader 210. A vertex shader would execute them once per vertex; here they are executed three times per pixel, once for each vertex of the triangle primitive containing the pixel. Increasing the computational load in this way therefore runs counter to conventional wisdom, which would say that the added load degrades performance and is undesirable. However, the parameter-throughput bottleneck limits rendering speed more than the added computation does. So even though the pixel shader must perform more operations, overall performance actually improves and object rendering is accelerated.
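The "three times per pixel" statement can be checked with simple arithmetic. The frame size and visible-vertex count below are hypothetical, and overdraw is ignored.

```python
from typing import Tuple


def vertex_op_invocations(visible_vertices: int, shaded_pixels: int) -> Tuple[int, int]:
    """Executions of the deferred vertex shader operations under each scheme."""
    in_vertex_shader = visible_vertices   # once per vertex, as a vertex shader would
    in_pixel_shader = 3 * shaded_pixels   # three per pixel: one per vertex of its triangle
    return in_vertex_shader, in_pixel_shader


# Hypothetical frame: 1920x1080 pixels shaded once each, 500,000 visible vertices.
vs, ps = vertex_op_invocations(500_000, 1920 * 1080)
print(vs, ps, ps // vs)  # prints 500000 6220800 12
```

Under these assumptions the deferred vertex work runs roughly 12 times more often. That is the extra arithmetic the text argues is outweighed by removing the parameter-throughput bottleneck.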

Furthermore, in conventional flat shading techniques such as those described above, the pixel shader cannot directly access all of the indices and all of the vertex parameters for each triangle. At best, it can access only the provoking vertex. This limits the kinds of rendering that can be performed and the visual effects that can be produced, for example by preventing the pixel shader from performing vertex shader operations 214b as shown in FIG. 2B.

FIG. 3 is a flow diagram of a method 300 for processing graphics with a vertex shader and a pixel shader in accordance with various aspects of the present disclosure. Method 300 is similar to graphics processing technique 200b shown in FIG. 2B.

The described method 300 includes performing vertex shader operations 314a with vertex shader 310. Many ordinary vertex shader operations are omitted from operations 314a and are instead performed per pixel by pixel shader 312. The output of vertex shader 310 is limited to the vertex output positions and vertex indices for each of primitives 302a-d. The primitives are triangles, each having three vertices identified by indices I0, I1, and I2, similar to triangles 102a-d shown in FIG. 1B.
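The reduced vertex stage of method 300 can be sketched as follows. The topology (four triangles sharing six vertices), the coordinates, and all names are hypothetical; the layout of FIG. 1B is not reproduced here. The point is that vertex shader 310 emits nothing but a position and an index per vertex. The only per-primitive payload forwarded toward pixel shader 312 is therefore the index triple.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
VsOut = Tuple[Point, int]

# Hypothetical index buffer: four triangles 302a-d sharing six vertices.
INDEX_BUFFER: List[Tuple[int, int, int]] = [(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5)]
POSITIONS: List[Point] = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                          (1.0, 1.0), (2.0, 0.0), (2.0, 1.0)]


def vertex_shader_314a(index: int) -> VsOut:
    """Reduced vertex shader 310: output only a position and the vertex index."""
    return POSITIONS[index], index


def run_vertex_stage(
    index_buffer: List[Tuple[int, int, int]],
) -> Tuple[Dict[int, VsOut], List[Tuple[int, ...]]]:
    # One vertex shader invocation per unique vertex, even when vertices are shared.
    unique = sorted({k for tri in index_buffer for k in tri})
    outputs = {k: vertex_shader_314a(k) for k in unique}
    # Each primitive is handed on as its triple (I0, I1, I2); no parameter values.
    triples = [tuple(outputs[k][1] for k in tri) for tri in index_buffer]
    return outputs, triples


outputs, triples = run_vertex_stage(INDEX_BUFFER)
print(len(outputs), triples)  # prints 6 [(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5)]
```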

The indices are sent to pixel shader 312, which can use indices I0, I1, and I2 to directly access each of the vertex parameter values P0, P1, and P2 for each of the primitives 302a-d. Pixel shader 312 accesses the raw parameter values directly, e.g., by reading system memory in which the parameter values are stored in a vertex buffer associated with the vertex indices. The pixel shader uses the parameter values to perform the remaining vertex shader operations 314b, which include rendering certain visual effects for each vertex of each primitive. After performing vertex shader operations 314b, pixel shader 312 interpolates the resulting parameter values to obtain the parameter values P_a through P_f at each pixel location within the primitives. Pixel shader 312 then produces additional visual effects for the pixels by performing pixel shader operations 318 with the interpolated parameter values. Finally, it outputs the rendered pixels 320, e.g., to a frame buffer in system memory.

Aspects of the present disclosure include graphics processing systems configured to implement the features described above. By way of example, and not by way of limitation, FIG. 4 is a block diagram of a computer system 400 that may be used to implement graphics processing according to aspects of the present disclosure. According to aspects of the present disclosure, system 400 may be an embedded system, mobile phone, personal computer, tablet computer, portable game device, workstation, game console, or the like.

System 400 generally includes a central processing unit (CPU) 470, a graphics processing unit (GPU) 471, and a main memory 472 that is accessible to both the CPU and the GPU. CPU 470 and GPU 471 may each include one or more processor cores, e.g., a single core, two cores, four cores, eight cores, or more. Main memory 472 may be in the form of an integrated circuit that provides addressable memory, e.g., RAM, DRAM, and the like.

By way of example, and not by way of limitation, CPU 470 and GPU 471 may access main memory 472 over a data bus 476. In some cases, it may be useful for system 400 to include two or more different buses. Main memory 472 contains data that can be accessed by CPU 470 and GPU 471. Main memory temporarily stores data in buffers, including vertex buffer 463, index buffer 466, and frame buffer 464.

The CPU is configured to execute CPU code. This code includes an application 460 that uses rendered graphics, and a driver/compiler 461 and graphics API 462 for issuing draw commands to programs executed by the GPU. The CPU code may also implement physics simulations and other functions.

The GPU is configured to operate as described above with respect to the illustrative implementations of the present disclosure. In particular, the GPU executes GPU code that implements vertex shader 410 and pixel shader 412 as described above. These shaders interface with data in main memory 472, and the pixel shader outputs rendered pixels to frame buffer 464, where they are held temporarily before being output to a display. The GPU includes a plurality of compute units (CUs) 465 configured to perform graphics processing tasks in parallel. Each compute unit includes its own dedicated local memory store, such as the local data share (LDS) 437 described above.

System 400 also includes a cache 436 for temporarily storing vertex index data 468. The data is copied from cache 436 to each LDS 437, and the compute units run shader programs that use the data in parallel. Parameter cache 436 may be integrated with the GPU, or it may be separate from the GPU and accessible to it, e.g., via bus 476. The GPU may also execute other programs, such as geometry shaders and compute shaders.

System 400 may also include well-known support functions 477, which may communicate with other components of the system, e.g., via bus 476. Such support functions may include, but are not limited to, input/output (I/O) elements 479, a power supply (P/S) 480, and a clock (CLK) 481.

System 400 may optionally include a mass storage device 484, such as a disk drive, CD-ROM drive, flash memory, or tape drive, that stores programs and/or data. System 400 may also include a display unit 486 and a user interface unit 488 to facilitate interaction between the system and a user. Display unit 486 may be a flat panel display, a cathode ray tube (CRT) screen, a touch screen, or another device capable of displaying text, numerals, graphical symbols, or images, and it displays rendered images 487 processed by the various techniques described herein. User interface 488 may include a keyboard, mouse, joystick, light pen, game controller, or other device used in conjunction with a graphical user interface (GUI).

System 400 may also include a network interface 490 that enables it to communicate with other devices over a network. The network may be, for example, a local area network (LAN), a wide area network such as the internet, a personal area network such as a Bluetooth® network, or another type of network. These components may be implemented in hardware, software, firmware, or a combination of two or more of these.

While the above is a complete description of the preferred embodiment of the present invention, various alternatives, modifications, and equivalents may be used. The scope of the present invention should therefore be determined not with reference to the above description, but with reference to the appended claims along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article "A" or "An" refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for."

Claims

1. A computer graphics processing method, comprising:
writing a vertex index output from a vertex shader to a cache;
accessing, with a pixel shader, the vertex index written to the cache; and accessing, with the pixel shader, a vertex parameter value associated with the vertex index in a memory unit.

2. The method of claim 1, further comprising performing, with the pixel shader, a vertex shader operation on the vertex parameter value.

3. The method of claim 2, wherein the vertex shader operation includes manipulating visual effects of primitive vertices in a three-dimensional virtual space.

4. The method of claim 1, further comprising interpolating the vertex parameter value with the pixel shader.

5. The method of claim 1, wherein accessing the vertex index includes copying the vertex index from the cache to a local memory unit of a GPU and accessing the vertex index in the local memory unit with the pixel shader.

6. The method of claim 1, wherein accessing the vertex parameter value includes accessing parameter values for all three vertices of a triangle primitive.

7. The method of claim 1, further comprising, after accessing the vertex parameter value:
performing, with the pixel shader, a vertex shader operation on the vertex parameter value;
interpolating the parameter value with the pixel shader; and performing, with the pixel shader, a pixel shader operation on the interpolated parameter value.

8. The method of claim 1, wherein the vertex shader output is limited to a vertex position and the vertex index, and the pixel shader performs the remaining vertex shader operations after accessing the vertex index.

9. The method of claim 1, wherein the memory unit is a main memory of the system.

10. The method of claim 9, wherein the parameter value is stored in a vertex buffer in the main memory.

11. A graphics processing system, comprising:
a graphics processing unit (GPU);
a memory unit; and a cache,
wherein the system is configured to implement a graphics processing method, the method comprising:
implementing a vertex shader and a pixel shader with the GPU;
writing a vertex index output from the vertex shader to the cache;
accessing, with the pixel shader, the vertex index written to the cache; and accessing, with the pixel shader, a vertex parameter value associated with the vertex index in the memory unit.

12. The system of claim 11, wherein the memory unit is a main memory of the system.

13. The system of claim 11, wherein the GPU includes a plurality of compute units and a plurality of local memory units, each of the local memory units being associated with one of the compute units.

14. The system of claim 13, wherein accessing the vertex index includes copying the vertex index from the cache to the local memory unit and accessing the vertex index from the local memory unit with the pixel shader.

15. The system of claim 11, wherein the cache is integrated with the GPU.

16. The system of claim 11, wherein the method further comprises performing, with the pixel shader, a vertex shader operation on the vertex parameter value.

17. The system of claim 11, wherein the method further comprises interpolating the vertex parameter value with the pixel shader.

18. The system of claim 11, wherein the system is an embedded system, a mobile phone, a personal computer, a tablet computer, a portable game device, a workstation, or a game console.

19. A non-transitory computer-readable medium having computer-readable instructions embodied therein, the computer-readable instructions being configured to implement a graphics processing method when executed, the graphics processing method comprising:
writing a vertex index output from a vertex shader to a cache;
accessing, with a pixel shader, the vertex index written to the cache; and accessing, with the pixel shader, a vertex parameter value associated with the vertex index from a memory unit.