JP2016072826A

JP2016072826A - Image decoding device, graphic processing device, image decoding method, and graphic processing method

Info

Publication number: JP2016072826A
Application number: JP2014200724A
Authority: JP
Inventors: Hitoshi Sato (佐藤 仁)
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2014-09-30
Filing date: 2014-09-30
Publication date: 2016-05-09
Anticipated expiration: 2034-09-30
Also published as: JP6465606B2

Abstract

PROBLEM TO BE SOLVED: To provide a graphics processing technique for efficiently decompressing a compressed texture. SOLUTION: An image decoding device includes a main memory 300 and a graphics processing unit (GPU) 200. The GPU 200 includes a variable length decoding unit 30 and an inverse discrete cosine transform (IDCT) unit 40. The variable length decoding unit 30 performs variable length decoding on a compressed texture using an encoding table. The table assigns a code to each pair of a run-number range and a level-value range, together with an immediate field that holds an immediate value of the run number, an immediate value of the level value, or both. The IDCT unit 40 restores the texture by applying an inverse discrete cosine transform to the variable-length-decoded texture. The main memory 300 includes a texture pool that partially caches the restored texture. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for decoding images, and more particularly to a graphics processing technique for decompressing compressed textures.

High-quality graphics are increasingly used on personal computers and dedicated game consoles. Examples include games and simulations that use high-quality 3D computer graphics, and video content that combines live-action footage with computer graphics.

In general, graphics processing is performed by a CPU and a graphics processing unit (GPU) working together. The CPU is a general-purpose processor. The GPU, in contrast, is a dedicated processor for advanced graphics operations. The CPU performs geometry calculations, such as projection transformation, based on the three-dimensional model of an object. The GPU receives vertex data and other data from the CPU and performs rendering. The GPU is built from dedicated hardware such as a rasterizer and pixel shaders, and it executes graphics processing as a pipeline. In some recent GPUs the shader functions are programmable (so-called programmable shaders), and graphics libraries are generally provided to support shader programming.

Graphics processing uses texture mapping, in which a texture is applied to the surface of an object to represent the look of that surface. As the images used in applications such as games become sharper, higher-resolution textures are used, and the volume of texture data keeps growing. For example, the textures used in a game are on the order of GiB (gibibytes), so it is difficult to hold all the required texture data in memory.

For this reason, uncompressed textures, or lightly compressed textures that the GPU can handle directly, are stored in a storage device such as a hard disk. They are loaded into a texture buffer in memory for drawing as needed. Loading a texture from a hard disk usually takes tens of milliseconds and sometimes several seconds, so the latency is unpredictable. If a texture cannot be loaded from the hard disk in time, the texture that should be displayed is not available.

A highly compressed texture, on the other hand, can be held in memory even when its uncompressed size exceeds the main memory capacity. It can therefore be used without loading from the hard disk. However, the GPU generally cannot handle highly compressed textures directly, so dedicated hardware is needed to decompress them in real time. Without such hardware, the CPU must decompress the texture and expand it into the texture buffer. That takes time and makes real-time drawing difficult.

The present invention has been made in view of these problems. Its object is to provide a graphics processing technique that can decompress compressed textures efficiently.

To solve the above problems, an image decoding apparatus according to one aspect of the present invention includes a variable length decoding unit and an inverse spatial frequency transform unit. The variable length decoding unit performs variable length decoding of a compressed image using an encoding table. The table assigns a code to each pair of a run-number range and a level-value range, together with an immediate field that holds an immediate value of the run number, an immediate value of the level value, or both. The inverse spatial frequency transform unit restores the image by applying an inverse spatial frequency transform to the variable-length-decoded image.

Another aspect of the present invention is a graphics processing apparatus that includes a main memory and a graphics processing unit. The graphics processing unit includes a variable length decoding unit and an inverse spatial frequency transform unit. The variable length decoding unit performs variable length decoding of a compressed texture using an encoding table. The table assigns a code to each pair of a run-number range and a level-value range, together with an immediate field that holds an immediate value of the run number, an immediate value of the level value, or both. The inverse spatial frequency transform unit restores the texture by applying an inverse spatial frequency transform to the variable-length-decoded texture. The main memory includes a texture pool that partially caches the restored texture.

Yet another aspect of the present invention is an image decoding method. The method includes two steps. The first step performs variable length decoding of a compressed image using an encoding table. The table assigns a code to each pair of a run-number range and a level-value range, together with an immediate field that holds an immediate value of the run number, an immediate value of the level value, or both. The second step restores the image by applying an inverse spatial frequency transform to the variable-length-decoded image.

Yet another aspect of the present invention is a graphics processing method in a graphics processing apparatus that includes a main memory and a graphics processing unit. In this method, the graphics processing unit uses a compute shader to carry out three operations. First, it performs variable length decoding of a compressed texture using an encoding table. The table assigns a code to each pair of a run-number range and a level-value range, together with an immediate field that holds an immediate value of the run number, an immediate value of the level value, or both. Second, it restores the texture by applying an inverse spatial frequency transform to the variable-length-decoded texture. Third, it stores the restored texture in a texture pool in the main memory that partially caches textures.

Any combination of the above components is also a valid aspect of the present invention. The same applies to the expression of the present invention converted between a method, an apparatus, a system, a computer program, a data structure, a recording medium, and the like.

According to the present invention, an encoded image, in particular a compressed texture, can be decompressed efficiently.

FIG. 1 is a block diagram of a graphics processing apparatus according to an embodiment.
FIGS. 2(a) to 2(c) illustrate mipmap textures.
FIG. 3 illustrates the PRT mechanism of the present embodiment.
FIG. 4 shows an example of the encoding table with an immediate field of FIG. 1.
FIG. 5 shows another example of the encoding table with an immediate field of FIG. 1.
FIG. 6 shows yet another example of the encoding table with an immediate field of FIG. 1.
FIG. 7 illustrates, for comparison, how threads execute when branch destinations are not biased.
FIG. 8 illustrates how threads execute when branch destinations are biased.
FIG. 9 illustrates the branches taken when searching for the row of the encoding table with an immediate field of FIG. 6 that matches the encoded data.
FIG. 10 shows program code containing the branches described with reference to FIG. 9.
FIG. 11 shows a working example of the encoding table with an immediate field of FIG. 1.

FIG. 1 is a block diagram of a graphics processing apparatus according to an embodiment. The graphics processing apparatus includes a main processor 100, a graphics processing unit (GPU) 200, and a main memory 300.

The main processor 100 may be a single main processor. It may also be a multiprocessor system including a plurality of processors, or a multicore processor in which a plurality of processor cores are integrated in one package. The main processor 100 can read data from and write data to the main memory 300 via a bus.

The GPU 200 is a graphics chip with a graphics processor core. It can read data from and write data to the main memory 300 via a bus.

The main processor 100 and the GPU 200 are connected by a bus and exchange data with each other over it.

FIG. 1 shows only the components related to texture processing; components related to other graphics processing are omitted.

The memory area of the main memory 300 is memory-mapped into the address space of the GPU 200, so the GPU 200 can read texture data from the main memory 300. Texture data is partially cached in the main memory 300 using a method called PRT (Partially Resident Textures).

The main processor 100 includes a graphics calculation unit 20 and a PRT control unit 10. The graphics calculation unit 20 receives an LOD (level of detail) value from the graphics processing unit 50 of the GPU 200. This value indicates the level of detail of a texture, and the graphics calculation unit 20 passes it to the PRT control unit 10. Based on this LOD value, the PRT control unit 10 determines which mipmap textures are likely to be needed next. It updates the PRT mapping by directing that these textures be expanded into the PRT cache 320, which serves as the texture pool, and by unmapping pages that are no longer used.

FIGS. 2(a) to 2(c) illustrate mipmap textures. A mipmap is a set of textures whose resolutions differ according to the level of detail (LOD). The mipmap texture 340 in FIG. 2(a) is a high-resolution texture. The mipmap texture 342 in FIG. 2(b) is a medium-resolution texture whose width and height are each half those of the mipmap texture 340 in FIG. 2(a). The mipmap texture 344 in FIG. 2(c) is a low-resolution texture whose width and height are each half those of the mipmap texture 342 in FIG. 2(b).
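The Python function below is a minimal sketch of the halving rule above and is not part of the patent. It lists the width and height of every level of a mipmap chain. The name `mip_chain` is an illustrative assumption, as is the convention that LOD 0 is the full-resolution level. Clamping each side to at least one texel is likewise assumed.

```python
def mip_chain(width: int, height: int) -> list[tuple[int, int]]:
    """Return (width, height) for every mip level, halving each axis per level.

    Each side is clamped to 1 so non-square textures still end at a 1x1 level.
    """
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


if __name__ == "__main__":
    for lod, (w, h) in enumerate(mip_chain(1024, 512)):
        print(f"LOD {lod}: {w}x{h}")
```

For a 1024x512 texture this yields 11 levels. The height reaches 1 before the width does, and it stays at 1 while the width keeps halving.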

Returning to FIG. 1, the PRT control unit 10 instructs the GPU 200 to read the mipmap texture at the level of detail specified by the graphics calculation unit 20. More specifically, the PRT control unit 10 controls the variable length decoding unit 30 and the inverse discrete cosine transform (IDCT) unit 40 of the GPU 200. It also controls swap-in and swap-out of the PRT cache 320 stored in the main memory 300.

The GPU 200 includes the variable length decoding unit 30, the IDCT unit 40, and the graphics processing unit 50.

The variable length decoding unit 30 reads from the main memory 300 the compressed texture 310 for the level of detail specified by the PRT control unit 10. It decodes the compressed texture 310 by referring to the encoding table 60 with an immediate field (hereinafter sometimes simply called the "encoding table 60"). It then stores the result in a DCT block ring buffer 80.

The IDCT unit 40 applies an inverse discrete cosine transform to the variable-length-decoded DCT blocks of the texture stored in the DCT block ring buffer 80, and stores the result in the PRT cache 320.

The graphics processing unit 50 reads the mipmap textures it needs from the PRT cache 320. The PRT cache 320 is a texture tile pool that partially caches textures: textures that are needed are swapped in, and textures that are not needed are swapped out.

FIG. 3 illustrates the PRT mechanism of the present embodiment.

The regions of the mipmap textures 340, 342, and 344 are laid out in virtual memory. Each texture region is divided into fixed-size chunks, and a page table 330 ensures that only the required texture regions are stored in a texture tile pool 360. The texture resides in the main memory 300 only as the compressed texture 310. Caching a texture region in the texture tile pool 360 therefore requires decompressing the compressed texture 310. In response to requests from the graphics processing unit 50, the PRT control unit 10 has the variable length decoding unit 30 and the IDCT unit 40 decompress the compressed texture 310 as needed.

In the example in FIG. 3, chunk 352 of the high-resolution mipmap texture 340 is associated with page 332 of the page table 330. Chunk 358 of the medium-resolution mipmap texture 342 is associated with page 338. Physical memory from the texture tile pool 360 is mapped to both pages.

Chunk 354 of the high-resolution mipmap texture 340 is associated with page 334 of the page table 330, and chunk 356 of the medium-resolution mipmap texture 342 with page 336. However, no physical memory from the texture tile pool 360 has been mapped to either page yet. In this case, as described above, the PRT control unit 10 uses the LOD values received from the graphics processing unit 50 to make sure that the required textures are present in the texture tile pool 360. Physical memory in the texture tile pool 360 is allocated, and the required texture data is decompressed from the compressed texture 310 and stored there. Separately, the graphics processing unit 50 reads mipmap textures from the texture tile pool 360 using LOD values that it computes itself, without involving the main processor 100. If the mipmap texture for the computed LOD value is not present in the texture tile pool 360, the graphics processing unit 50 falls back. It lowers the requested level of detail, reads a lower-resolution mipmap texture from the texture tile pool 360, and draws with it.
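The fallback behaviour can be illustrated with a small Python model of the page table 330. Everything here is hypothetical, because the patent does not specify the page-table layout. The dictionary keyed by `(lod, chunk_x, chunk_y)` and the function `resolve_tile` are choices made only for this sketch. The sketch also assumes that chunks have a fixed size in texels, so moving one level coarser halves the chunk coordinates.

```python
from typing import Dict, Optional, Tuple

# Hypothetical page table 330: (lod, chunk_x, chunk_y) -> tile index in the
# texture tile pool 360, or None when no physical memory is mapped yet.
PageTable = Dict[Tuple[int, int, int], Optional[int]]


def resolve_tile(page_table: PageTable, lod: int, chunk_x: int, chunk_y: int,
                 coarsest_lod: int) -> Tuple[int, int]:
    """Return (lod_used, tile_index), falling back to coarser mip levels.

    Assumes LOD 0 is the finest level and chunks have a fixed size in texels,
    so moving one level coarser halves the chunk coordinates.
    """
    while lod <= coarsest_lod:
        tile = page_table.get((lod, chunk_x, chunk_y))
        if tile is not None:
            return lod, tile
        # Not resident: lower the requested level of detail and retry.
        lod += 1
        chunk_x //= 2
        chunk_y //= 2
    raise LookupError("no resident mip level covers this chunk")


if __name__ == "__main__":
    table: PageTable = {
        (0, 3, 1): None,  # like chunk 354: page exists, no physical memory yet
        (1, 1, 0): 7,     # coarser chunk already decompressed into tile 7
    }
    print(resolve_tile(table, 0, 3, 1, coarsest_lod=2))  # (1, 7)
```

In the usage example, chunk (3, 1) at LOD 0 is not resident. The lookup therefore drops to chunk (1, 0) at LOD 1 and draws from tile 7.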

The texture data format is described next. The original texture data before compression is given, for example, in a 32-bit RGB format. The GPU 200 can directly handle textures compressed with the texture compression methods known as BC5 or BC7. These reduce the data amount to about 1/4 of the original while keeping quality relatively good. If lower quality is acceptable, textures compressed with the method known as BC1 or DXT1 can be used instead, reducing the data amount to about 1/8 of the original.
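The approximate ratios follow from the fixed block sizes of these formats. BC1 stores each 4x4 texel block in 64 bits, while BC5 and BC7 use 128 bits per block. The short Python check below compares these sizes with the 32-bit source format mentioned in the text. The helper names are illustrative only.

```python
# Fixed-rate block compression: every 4x4 texel block occupies a fixed number of bits.
TEXELS_PER_BLOCK = 4 * 4
BITS_PER_BLOCK = {"BC1": 64, "BC5": 128, "BC7": 128}
SOURCE_BITS_PER_TEXEL = 32  # 32-bit source texels, as in the text


def bits_per_texel(fmt: str) -> float:
    """Average storage cost of one texel in the given format."""
    return BITS_PER_BLOCK[fmt] / TEXELS_PER_BLOCK


def size_ratio(fmt: str) -> float:
    """Compressed size as a fraction of the 32-bit source."""
    return bits_per_texel(fmt) / SOURCE_BITS_PER_TEXEL


if __name__ == "__main__":
    for fmt in BITS_PER_BLOCK:
        print(fmt, bits_per_texel(fmt), "bits/texel, ratio", size_ratio(fmt))
    # A 1/20 ratio, as quoted below for JPEG, corresponds to 1.6 bits per texel.
    print(SOURCE_BITS_PER_TEXEL / 20)
```

BC1 therefore costs 4 bits per texel (1/8 of the source), and BC5 and BC7 cost 8 bits per texel (1/4).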

A JPEG-compressed texture, by contrast, cannot be handled directly by the GPU 200, but it reduces the data amount to about 1/20 of the original. Running a complex algorithm such as JPEG decompression on the compute shaders of the GPU 200 is inefficient. Without dedicated hardware capable of JPEG decompression, it is difficult to decompress such textures in real time for graphics processing.

In the present embodiment, a DCT followed by variable length coding with an immediate-field encoding table reduces the data amount to about 1/20. At this compression level, the compressed texture 310 can be kept resident in the main memory 300. The GPU 200 reads the compressed texture 310 from the main memory 300. Its compute shaders then restore the texture in real time, performing variable length decoding with the immediate-field encoding table followed by an inverse discrete cosine transform (IDCT).

The GPU 200 cannot use a JPEG-compressed texture directly, so a JPEG decoder must first decode it. A graphics device with a JPEG codec can handle JPEG-compressed textures, but a JPEG codec is generally not available. JPEG compression applies a discrete cosine transform to an image, quantizes the result, and then applies Huffman coding. Huffman coding is a complex compression algorithm. Huffman decoding of a JPEG-compressed texture on the compute shaders of the GPU 200 would therefore require an enormous amount of computation.

In contrast, the immediate field keeps the encoding table 60 small. Unlike ordinary Huffman coding, variable length coding with this table can therefore be executed efficiently by the compute shaders of the GPU 200.

In ordinary Huffman coding, one Huffman code is assigned to each combination of a run number (Run) and a level value (Level). The run number is the count of consecutive zeros, and the level value is the nonzero value that follows them. Short codes go to frequently occurring run/level combinations and long codes to rare ones, which minimizes the average code length.
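To make the run/level terminology concrete, the sketch below turns a sequence of quantized coefficients into (run, level) pairs followed by an end-of-block marker. It assumes the coefficients are already in one-dimensional scan order; the patent does not state the scan order. The function name `run_level_pairs` is hypothetical.

```python
from typing import List, Tuple, Union

Token = Union[Tuple[int, int], str]
EOB = "EOB"


def run_level_pairs(coeffs: List[int]) -> List[Token]:
    """Turn scanned, quantized coefficients into (run, level) pairs plus EOB.

    run   = number of zeros immediately before a nonzero coefficient
    level = that nonzero coefficient, sign included
    Trailing zeros are not coded; the EOB marker stands for them.
    """
    tokens: List[Token] = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            tokens.append((run, c))
            run = 0
    tokens.append(EOB)
    return tokens


if __name__ == "__main__":
    print(run_level_pairs([12, 0, 0, -3, 1, 0, 0, 0]))
    # [(0, 12), (2, -3), (0, 1), 'EOB']
```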

In contrast, variable length coding with the encoding table 60 builds each code from a run/level pair plus an Exp-Golomb-like immediate field. This greatly reduces the number of rows in the table, which has at most about 12 rows. Each row represents a run-number range paired with a level-value range, both fixed for that row. The actual run number and level value are carried as immediate values in the row's immediate field. Frequently occurring range pairs get short codes, and rarely occurring ones get long codes.

To decode with the encoding table 60, the decoder first searches for the row that matches the encoded data. The matching row identifies the run-number range and level-value range, and the decoder then reads the immediate run number and level value from that row's immediate field. An ordinary Huffman table has many rows, and its complex table search is difficult to execute on the GPU 200. The encoding table 60, however, has few rows, so fewer conditional branches are needed. The GPU 200 can therefore decode efficiently by running many threads in parallel.

FIG. 4 shows an example of the encoding table 60 with an immediate field. This encoding table 60 has four rows and assigns codes of different bit lengths to pairs of a run-number range and a level-value range. Here the DCT block is 16 × 16, and the 256 12-bit DCT coefficients of each block are coded as one unit. The run number therefore ranges from 0 to 255, and the level value from 0 to 4095.

Code 1, "1RRsLL", corresponds to run numbers 0 to 3 (2 bits) paired with level values 0 to 3 (2 bits), and is 6 bits long. The leading "1" identifies code 1. "RR" is the immediate value of the run number, from 0 to 3. "LL" is the immediate value of the level value, from 0 to 3. "s" is a sign bit indicating whether the level value is positive or negative (the same applies below).

Code 2, "01RRRRRsLLLLL", corresponds to run numbers 0 to 31 (5 bits) paired with level values 0 to 31 (5 bits), and is 13 bits long. The leading "01" identifies code 2. "RRRRR" is the immediate value of the run number, from 0 to 31. "LLLLL" is the immediate value of the level value, from 0 to 31.

Code 3, "001RRRRRRRRsLLLLLLLLLLLL", corresponds to run numbers 0 to 255 (8 bits) paired with level values 0 to 4095 (12 bits), and is 24 bits long. The leading "001" identifies code 3. "RRRRRRRR" is the immediate value of the run number, from 0 to 255. "LLLLLLLLLLLL" is the immediate value of the level value, from 0 to 4095.

Code 4, "0001", is the EOB (End of Block) code. It marks the end of the block, meaning that all remaining coefficients are 0, and it is 4 bits long. The bit pattern "0001" itself identifies code 4.

Each row of the encoding table 60 thus has two parts. The first is a code identifier for a pair of a run-number range and a level-value range. The second is an immediate field holding the run-number immediate, the level-value immediate, and the sign bit of the level value.
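A matching encoder for the FIG. 4 layout can be sketched as follows. The text does not say how a row is chosen when more than one row could represent a pair. This sketch therefore assumes the shortest row whose run and level ranges both fit. It also assumes that a sign bit of 1 denotes a negative level. The names `encode_pair` and `encode_block` are hypothetical.

```python
from typing import List, Tuple

# FIG. 4 data rows in order of increasing length: (prefix, run_bits, level_bits).
FIG4_CODES = [("1", 2, 2), ("01", 5, 5), ("001", 8, 12)]
FIG4_EOB = "0001"


def encode_pair(run: int, level: int) -> str:
    """Encode one (run, level) pair with the shortest FIG. 4 row that fits.

    Assumes sign bit 1 = negative level (the polarity is not stated in the text).
    """
    magnitude = abs(level)
    for prefix, run_bits, level_bits in FIG4_CODES:
        if run < (1 << run_bits) and magnitude < (1 << level_bits):
            sign = "1" if level < 0 else "0"
            return (prefix + format(run, f"0{run_bits}b") + sign
                    + format(magnitude, f"0{level_bits}b"))
    raise ValueError(f"(run={run}, level={level}) exceeds the table ranges")


def encode_block(pairs: List[Tuple[int, int]]) -> str:
    """Encode all pairs of one block and terminate it with EOB."""
    return "".join(encode_pair(r, l) for r, l in pairs) + FIG4_EOB


if __name__ == "__main__":
    for pair in [(1, 2), (0, -17), (200, 3)]:
        code = encode_pair(*pair)
        print(pair, code, len(code))
    print(encode_block([]))  # "0001": a block with no nonzero coefficients
```

The three sample pairs produce codes of 6, 13, and 24 bits, and EOB is 4 bits. These match the lengths given for codes 1 to 4.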

In the texture compression of the present embodiment, a discrete cosine transform (DCT) is applied to each image block. The result is then quantized and variable-length coded. When a natural image is transformed with the DCT, most of the energy is concentrated in the low-frequency region, and the high-frequency components become negligibly small. Quantization in particular drives the high-frequency DCT coefficients to almost zero. As a result, the input to the variable length coding often contains long runs of zeros.
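The energy compaction described above can be seen even in one dimension. The sketch below applies an orthonormal 8-point DCT-II to a smooth gradient and quantizes the result with a uniform step of 16. The 1-D transform is a simplification, since the embodiment uses 16 × 16 blocks. Both it and the step size are assumptions chosen for illustration, not parameters taken from the patent.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(scale * s)
    return out


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform quantization: divide by the step and round to the nearest integer."""
    return [round(c / step) for c in coeffs]


if __name__ == "__main__":
    ramp = [10.0 * i for i in range(8)]     # a smooth 8-sample gradient
    print(quantize(dct_ii(ramp), step=16))  # [6, -4, 0, 0, 0, 0, 0, 0]
```

Only the first two coefficients survive quantization. The trailing zeros are the kind of long zero run that run/level coding with an EOB code handles cheaply.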

In one test, the quantized DCT coefficients of a texture image were variable-length coded with the encoding table 60 of FIG. 4. Code 1 occurred 7,200 times, code 2 810 times, code 3 62 times, and code 4 260 times. Multiplying each count by the code's bit length and summing gives a total of 56,258 bits for the entire compressed texture.
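The total above can be checked directly. It is the sum over all codes of bit length times occurrence count, using the figures reported for FIG. 4.

```python
# FIG. 4: bit length and occurrence count of each code for the sample texture.
FIG4_STATS = {
    "code 1 (1RRsLL)": (6, 7_200),
    "code 2 (01RRRRRsLLLLL)": (13, 810),
    "code 3 (001RRRRRRRRsLLLLLLLLLLLL)": (24, 62),
    "code 4 (EOB, 0001)": (4, 260),
}


def total_bits(stats: dict) -> int:
    """Code amount = sum of (bit length x occurrence count) over all codes."""
    return sum(length * count for length, count in stats.values())


print(total_bits(FIG4_STATS))  # 56258
```

Code 1 alone accounts for 43,200 of the 56,258 bits. This motivates the refinement in FIG. 5.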

FIG. 5 shows another example of the encoding table 60 with an immediate field. With the encoding table 60 of FIG. 4, code 1 occurred very frequently; it covers run numbers 0 to 3 paired with level values 0 to 3. The encoding table 60 of FIG. 5 therefore adds a new 3-bit code 1 to the four-row table of FIG. 4, giving a five-row table. The new code covers run numbers 0 to 1 paired with level value 1.

With the encoding table 60 of FIG. 5, data are encoded with the following codes:

- code 1 “1Rs” (3 bits)
- code 2 “01RRsLL” (7 bits)
- code 3 “001RRRRRsLLLLL” (14 bits)
- code 4 “0001RRRRRRRRsLLLLLLLLLLLL” (25 bits)
- code 5 “00001” (5 bits, EOB)

With the table of FIG. 4, code 1 (6 bits) occurred 7,200 times, for a total of 43,200 bits. With the table of FIG. 5, these occurrences split into 3,900 instances of code 1 (3 bits) and 3,300 instances of code 2 (7 bits). The combined total for codes 1 and 2 therefore drops to 11,700 + 23,100 = 34,800 bits. With the table of FIG. 5, the code amount of the entire compressed texture is 48,990 bits, which is smaller than with the table of FIG. 4.
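The code-amount arithmetic above can be checked with a short calculation. In this sketch, each code's bit length is taken from its layout string, which is the prefix followed by the R, s and L fields given for FIGS. 4 and 5. The per-code counts for codes 3 to 5 of FIG. 5 are an assumption: they are taken to equal the counts of codes 2 to 4 of FIG. 4. With that assumption, the sketch reproduces the stated total of 48,990 bits. All names are hypothetical.

```python
# Code layouts from the text: prefix, then R (run), s (sign), L (level) fields.
FIG4_LAYOUT = {
    1: "1" + "RR" + "s" + "LL",
    2: "01" + "R" * 5 + "s" + "L" * 5,
    3: "001" + "R" * 8 + "s" + "L" * 12,
    4: "0001",                                  # EOB
}
FIG5_LAYOUT = {
    1: "1" + "R" + "s",
    2: "01" + "RR" + "s" + "LL",
    3: "001" + "R" * 5 + "s" + "L" * 5,
    4: "0001" + "R" * 8 + "s" + "L" * 12,
    5: "00001",                                 # EOB
}

FIG4_COUNTS = {1: 7200, 2: 810, 3: 62, 4: 260}
# Assumption: occurrences of FIG. 4 codes 2-4 move unchanged to FIG. 5 codes 3-5.
FIG5_COUNTS = {1: 3900, 2: 3300, 3: 810, 4: 62, 5: 260}


def code_amount(counts, layout):
    """Sum of (occurrences x bit length) over all codes."""
    return sum(n * len(layout[code]) for code, n in counts.items())


print(code_amount(FIG4_COUNTS, FIG4_LAYOUT))  # 56258
print(code_amount(FIG5_COUNTS, FIG5_LAYOUT))  # 48990
```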

FIG. 6 shows yet another example of the encoding table 60 with an immediate field. Compared with the table of FIG. 5, it has more rows. In other words, it has more combinations of run-number range and level-value range: ten rows, that is, ten codes.

Code 1 “1Rs” corresponds to the pair of run range 0 to 1 (1 bit) and level value 1, and is 3 bits long. “R” takes the value 0 or 1 and directly represents the immediate value of the run number. Code 1 thus encodes (Run, Level) = (0, 1) and (1, 1).

Code 2 “010RsL” corresponds to the pair of run range 0 to 1 (1 bit) and level range 2 to 3 (1 bit), and is 6 bits long. “R” takes the value 0 or 1 and directly represents the immediate value of the run number. “L” takes the value 0 or 1, and adding an offset of 2 gives the immediate value of the level value.

Code 3 “011RRsLL” corresponds to the pair of run range 2 to 5 (2 bits) and level range 1 to 4 (2 bits), and is 8 bits long. “RR” takes a value from 0 to 3, and adding an offset of 2 gives the immediate value of the run number. “LL” takes a value from 0 to 3, and adding an offset of 1 gives the immediate value of the level value.

Code 4 “0010RsLL” corresponds to the pair of run range 0 to 1 (1 bit) and level range 4 to 7 (2 bits), and is 8 bits long. “R” takes the value 0 or 1 and directly represents the immediate value of the run number. “LL” takes a value from 0 to 3, and adding an offset of 4 gives the immediate value of the level value.

Code 5 “0011RRsLL” corresponds to the pair of run range 6 to 9 (2 bits) and level range 1 to 4 (2 bits), and is 9 bits long. “RR” takes a value from 0 to 3, and adding an offset of 6 gives the immediate value of the run number. “LL” takes a value from 0 to 3, and adding an offset of 1 gives the immediate value of the level value.

Code 6 “00010RRRRRRs” corresponds to the pair of run range 10 to 73 (6 bits) and level value 1, and is 12 bits long. “RRRRRR” takes a value from 0 to 63, and adding an offset of 10 gives the immediate value of the run number.

Code 7 “00011RRRRRsLLLLL” corresponds to the pair of run range 0 to 31 (5 bits) and level range 0 to 31 (5 bits), and is 16 bits long. “RRRRR” takes a value from 0 to 31 and directly represents the immediate value of the run number. “LLLLL” takes a value from 0 to 31 and directly represents the immediate value of the level value.

Code 8 “00001sLLLLLLLLLLLL” corresponds to the pair of run number 0 and level range 0 to 4095 (12 bits), and is 18 bits long. “LLLLLLLLLLLL” takes a value from 0 to 4095 and directly represents the immediate value of the level value.

Code 9 “000001” corresponds to EOB (End of Block), which signals that all remaining coefficients of the block are 0. It is 6 bits long.

Code 10 “0000001RRRRRRRRsLLLLLLLLLLLL” corresponds to the pair of run range 0 to 255 (8 bits) and level range 0 to 4095 (12 bits), and is 28 bits long. “RRRRRRRR” takes a value from 0 to 255 and directly represents the immediate value of the run number. “LLLLLLLLLLLL” takes a value from 0 to 4095 and directly represents the immediate value of the level value.

The number of occurrences and the total number of bits for each of codes 1 to 10 are as shown in the figure. With the table of FIG. 6, more occurrences fall under short codes, which keeps the total bit count of each row small. As a result, the code amount of the entire compressed texture is 43,536 bits, even smaller than with the table of FIG. 5.

Every encoding table 60 with an immediate field in this embodiment allows the run-number ranges and level-value ranges of different codes to overlap. When a (run, level) pair falls within the ranges of two or more codes, the code with the shorter code length is used preferentially.
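The shortest-code rule for overlapping ranges can be illustrated with a small encoder for the FIG. 6 table. This is a sketch under stated assumptions, not the embodiment's encoder. Each row is modeled as a prefix followed by the run field, the sign bit `s`, and the level field, with the offsets given in the text. The sign convention (`s` = 1 for a negative level) and all names are hypothetical.

```python
# FIG. 6 rows: (code, prefix, run_lo, run_bits, level_lo, level_bits).
# A field of 0 bits means the value is fixed at its *_lo value.
FIG6_ROWS = [
    (1, "1", 0, 1, 1, 0),
    (2, "010", 0, 1, 2, 1),
    (3, "011", 2, 2, 1, 2),
    (4, "0010", 0, 1, 4, 2),
    (5, "0011", 6, 2, 1, 2),
    (6, "00010", 10, 6, 1, 0),
    (7, "00011", 0, 5, 0, 5),
    (8, "00001", 0, 0, 0, 12),
    (10, "0000001", 0, 8, 0, 12),
]
EOB_CODE = "000001"  # code 9


def field(value, width):
    """Fixed-width binary field; empty when the width is 0."""
    return format(value, f"0{width}b") if width else ""


def encode_pair(run, level):
    """Return (code number, bit string) of the shortest row covering the pair."""
    mag = abs(level)
    sign = "1" if level < 0 else "0"  # assumed sign convention
    best = None
    for code, prefix, run_lo, rb, lev_lo, lb in FIG6_ROWS:
        r, l = run - run_lo, mag - lev_lo
        if 0 <= r < (1 << rb) and 0 <= l < (1 << lb):
            bits = prefix + field(r, rb) + sign + field(l, lb)
            if best is None or len(bits) < len(best[1]):
                best = (code, bits)  # keep the shortest candidate so far
    if best is None:
        raise ValueError(f"(run={run}, level={level}) is not representable")
    return best
```

Every row that covers the pair is tried. A pair such as (20, 1) fits both code 6 and code 7, so it resolves to the 12-bit code 6.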

Variable length decoding with an encoding table 60 with an immediate field is described here using the table of FIG. 4. To determine which of codes 1 to 4 of FIG. 4 the encoded data matches, the decoder checks the bit position at which the first 1 appears in the encoded bit string.

If the first 1 appears at the first bit (called “branch A”), the data is code 1. The decoder then reads three values in order from the remaining 5-bit immediate field: the immediate value of the run number (2 bits), the sign bit, and the immediate value of the level value (2 bits).

If the first 1 appears at the second bit (called “branch B”), the data is code 2. The decoder then reads three values in order from the remaining 11-bit immediate field: the immediate value of the run number (5 bits), the sign bit, and the immediate value of the level value (5 bits).

If the first 1 appears at the third bit (called “branch C”), the data is code 3. The decoder then reads three values in order from the remaining 21-bit immediate field: the immediate value of the run number (8 bits), the sign bit, and the immediate value of the level value (12 bits).

If the first 1 appears at the fourth bit (called “branch D”), the data is code 4, which is EOB.
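The four branches above amount to a leading-one search followed by fixed-width field reads. The sketch below decodes one FIG. 4 codeword from a string of '0'/'1' characters. The string representation, the function name, and the sign convention (`s` = 1 for negative) are assumptions for illustration.

```python
# Field widths (run bits, level bits) for branches A, B, C of FIG. 4.
FIG4_FIELDS = [(2, 2), (5, 5), (8, 12)]


def decode_fig4(bits, pos=0):
    """Decode one FIG. 4 codeword starting at bits[pos].

    Returns ((run, level), next_pos), or ("EOB", next_pos) for code 4.
    """
    lz = bits.index("1", pos) - pos      # zeros before the first 1
    p = pos + lz + 1                     # first bit after the prefix
    if lz == 3:                          # branch D: code 4 = EOB
        return "EOB", p
    if lz > 3:
        raise ValueError("invalid prefix")
    run_w, lev_w = FIG4_FIELDS[lz]       # branch A, B or C
    run = int(bits[p:p + run_w], 2)
    p += run_w
    negative = bits[p] == "1"            # assumed: s = 1 means negative
    p += 1
    level = int(bits[p:p + lev_w], 2)
    p += lev_w
    return (run, -level if negative else level), p
```

To decode a block, call `decode_fig4` repeatedly, passing back the returned position, until it returns "EOB".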

The occurrence counts given for FIG. 4 show which branch dominates. When compressed texture data coded with the table of FIG. 4 is decoded, the great majority of codewords take branch A: 7,200 of the 8,332 codewords in the example above, or about 86%. This property of variable length coding with a table like that of FIG. 4 lets the compute shader of the GPU 200 decode efficiently. The GPU 200 has a SIMD (Single Instruction Multiple Data) architecture, in which multiple threads execute the same instruction at the same time on different data. When branch outcomes are skewed, the effective degree of parallelism rises and execution efficiency improves.

In the GPU 200, a single program counter (PC) points to an instruction held in the instruction cache. A group of ALUs (Arithmetic Logic Units), for example 16, executes the instruction at the PC at the same time. For each branch of an if-then-else or switch-case statement, that branch's instructions are issued to all 16 threads together.

- **if-then-else:** the threads first execute the if branch in parallel, with only the threads whose pixels satisfy the if condition (True) enabled. They then execute the else branch in parallel, with only the threads whose pixels satisfy the else condition (False) enabled.
- **switch-case:** each case is executed in parallel, with only the threads whose pixels satisfy that case's condition enabled.

Suppose the if condition and the else condition of an if-then-else branch hold roughly equally often. The set of enabled threads must then be swapped frequently between the True and False cases. If the outcomes are skewed, for example 80% if and 20% else, the set of threads enabled for the True case can be reused repeatedly, which improves execution efficiency. Likewise, in a switch-case branch where every case holds roughly equally often, the enabled threads must be swapped frequently from case to case. If the case frequencies are skewed, the set of threads enabled for the most frequent case can be reused repeatedly, again improving efficiency. This point is explained in more detail with reference to FIGS. 7 and 8.

For comparison, FIG. 7 illustrates how threads execute when the branch destinations are not skewed.

The GPU 200 contains multiple computing units. The number of threads that one computing unit executes simultaneously is determined by the number of arithmetic units it contains, which is taken here to be 16. A group of up to 16 threads that can be issued to one computing unit at the same time is called a “thread set”. Every thread in a thread set runs the same shader program but processes different data. Where the program branches, the threads may therefore take different branch destinations. In any given cycle, one computing unit executes one thread set (here, at most 16 threads) in parallel.

Each branch destination may need only a few instructions. Even so, there is only one program counter, and every arithmetic unit in the computing unit executes the same instruction (a SIMD structure). Each instruction of each branch is therefore executed in turn, with a thread mask selecting which threads are active.

As an example, assume the following instruction counts when decoding with the encoding table 60 of FIG. 4: branch A takes 4 instructions, branch B 4, branch C 4, and branch D 2. FIG. 7 shows the case in which the branch destinations of the 16 threads in thread set 450 are, in order, A, A, B, A, A, A, B, C, B, A, B, A, B, A, B, D.

In cycle 1, only the threads that execute branch A (eight threads here) are enabled. The four instructions A-1, A-2, A-3, and A-4 of branch A are executed as the program counter advances one step at a time.

In cycle 5, only the threads that execute branch B (six threads here) are enabled. The four instructions B-1, B-2, B-3, and B-4 of branch B are executed as the program counter advances one step at a time.

In cycle 9, only the thread that executes branch C (one thread here) is enabled. The four instructions C-1, C-2, C-3, and C-4 of branch C are executed as the program counter advances one step at a time.

In cycle 13, only the thread that executes branch D (one thread here) is enabled. The two instructions D-1 and D-2 of branch D are executed as the program counter advances one step at a time.

Thus, in the example of FIG. 7, the 16 threads of the thread set need 14 cycles to execute all instructions of the four branches A to D.

FIG. 8 illustrates how threads execute when the branch destinations are skewed. It shows the case in which the branch destinations of the 16 threads in thread set 452 are, in order, A, A, B, A, A, A, B, B, B, A, B, A, A, A, A, A. The shader program has four branch destinations. However, the pixels that satisfy the branch conditions are skewed, so only two destinations actually occur: branch A and branch B. The 16 threads of the thread set therefore need to execute only these two branches.

In cycle 1, only the threads that execute branch A (11 threads here) are enabled. The four instructions A-1, A-2, A-3, and A-4 of branch A are executed as the program counter advances one step at a time.

In cycle 5, only the threads that execute branch B (five threads here) are enabled. The four instructions B-1, B-2, B-3, and B-4 of branch B are executed as the program counter advances one step at a time.

Thus, in the example of FIG. 8, the 16 threads of the thread set only have to execute the instructions of the two branches A and B. The required number of cycles drops to 8.
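The cycle counts of FIGS. 7 and 8 follow from a simple model. Every branch taken by at least one thread in the thread set is executed once, one instruction per cycle, with the other threads masked off. The sketch below implements that model with the instruction counts assumed in the example (4, 4, 4 and 2 for branches A to D). The function names are hypothetical.

```python
# Instructions per branch, as assumed in the FIG. 7 / FIG. 8 example.
INSTR = {"A": 4, "B": 4, "C": 4, "D": 2}


def simd_cycles(branches, instr=INSTR):
    """Cycles for one thread set: each distinct branch is run once, masked."""
    return sum(instr[b] for b in set(branches))


def active_threads(branches, b):
    """Threads enabled (unmasked) while branch b executes."""
    return sum(1 for x in branches if x == b)


fig7 = "AABAAABCBABABABD"   # thread set 450
fig8 = "AABAAABBBABAAAAA"   # thread set 452
print(simd_cycles(fig7), simd_cycles(fig8))  # 14 8
```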

When the nature of the input data skews the program's branch destinations in this way, the same thread mask can be reused for successive instructions, improving execution efficiency. When the branch destinations vary, the thread mask must be switched at every branch, and execution efficiency drops.

Natural-image DCT coefficients have a characteristic layout. Nonzero values concentrate in the low-frequency components at the upper left of the DCT coefficient matrix, while zeros fill the high-frequency components at the lower right. When each DCT-transformed image block is rearranged into a one-dimensional array in zigzag order, the coefficients of every block therefore tend to form a sequence that starts with nonzero values and ends in a run of zeros.
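A JPEG-style zigzag scan visits the coefficients along anti-diagonals, starting at the upper-left (low-frequency) corner, and can be written as below. The text does not specify the exact zigzag pattern used in this embodiment, so this ordering is an assumption. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) visiting order of a JPEG-style zigzag scan of an n x n block."""
    order = []
    for s in range(2 * n - 1):                   # s = row + col (anti-diagonal)
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # diag runs top-right to bottom-left; even diagonals are walked upward
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square coefficient block into zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

A block whose energy sits in the upper-left corner yields a scan that starts with nonzero values and ends in zeros. The run-level stage and the EOB code exploit exactly this shape.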

The thread set is organized around this tendency in two ways:

- The variable-length-coded data is distributed so that each thread of a thread set processes the DCT coefficients of a different DCT block.
- All threads decode DCT coefficients at the same relative position within their respective blocks.

With the encoding table 60 of FIG. 4, every branch destination is one of branches A to D. Coefficients at the same relative position in different blocks behave similarly, so the branch destinations of the threads in a thread set tend to coincide. The destinations are therefore skewed as in FIG. 8 rather than scattered as in FIG. 7, and the thread set stays in an efficient execution state for longer. As a result, the thread set performs variable length decoding efficiently.
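The effect of this thread assignment can be approximated with a lockstep model. Each thread owns one DCT block, and at step k every thread decodes the k-th codeword of its own block. A step then costs the sum of the instruction counts of the distinct branches taken at that step. This is a simplified sketch: for example, it ignores the fact that variable-length codewords do not map one-to-one onto coefficient positions. The per-branch instruction counts are those assumed for FIG. 7. The branch sequences in the usage check are toy inputs, not data from the embodiment.

```python
INSTR = {"A": 4, "B": 4, "C": 4, "D": 2}  # per-branch counts assumed for FIG. 7


def lockstep_cycles(block_branches, instr=INSTR):
    """block_branches[t] is the branch sequence of the block owned by thread t.

    At step k all threads decode the k-th codeword of their own block; threads
    whose block has already ended are masked off.
    """
    steps = max(len(seq) for seq in block_branches)
    total = 0
    for k in range(steps):
        taken = {seq[k] for seq in block_branches if k < len(seq)}
        total += sum(instr[b] for b in taken)   # one pass per distinct branch
    return total


# Toy inputs: aligned streams cost less than misaligned ones.
print(lockstep_cycles(["AAD", "AAD"]))  # 10
print(lockstep_cycles(["AB", "BA"]))    # 16
```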

The procedure of variable length decoding with the encoding table 60 of FIG. 6 is now described in detail. FIG. 9 illustrates the branches taken when searching for the row of the FIG. 6 table that the encoded data matches. The decoder checks the bit position at which the first 1 appears in the encoded bit string.

The bit position of the first 1 selects the case:

- first bit: case 0
- second bit: case 1
- third bit: case 2
- fourth bit: case 3
- fifth bit: case 4
- sixth bit: case 5
- seventh bit: case 6

Case 0 corresponds to code 1, case 4 to code 8, case 5 to code 9, and case 6 to code 10. In these cases the code is identified immediately, and the immediate values of the run number and level value are read from the remaining immediate field as appropriate.

Case 1 corresponds to codes 2 and 3. If the third bit is 0 the data is code 2, and if it is 1 the data is code 3. Once the code is known, the immediate values of the run number and level value are read from the remaining immediate field.

Similarly, case 2 corresponds to codes 4 and 5: code 4 if the fourth bit is 0, and code 5 if the fourth bit is 1. Case 3 corresponds to codes 6 and 7: code 6 if the fifth bit is 0, and code 7 if the fifth bit is 1. Once the code has been identified, the immediate values of the run number and level value are read from the remaining immediate field as appropriate.

FIG. 10 shows program source code that implements the branches described with FIG. 9. The expression clz = FirstSetBit_Hi_MSB(code) computes the column number clz of the first 1 in the encoded bit string. Column numbers are counted from 0, so case 0 to case 6 of the switch statement in the program correspond to case 0 to case 6 of FIG. 9. The function BITAT(code, n-1, m) reads the m-bit field that ends at the n-th bit of the encoded bit string. That field consists of the n-th bit and the m-1 bits that precede it (toward the MSB).

Consider case 1 of the switch statement. The test if (BITAT(code,2,1)==0) checks whether the third bit of the encoded bit string is 0, which identifies code 2 in FIG. 9. For code 2, the immediate value of the run number is the fourth bit, so run=BITAT(code,3,1) is executed. The immediate value of the level value is the sixth bit, and an offset of 2 must be added, so level=BITAT(code,5,1)+2 is executed. The sign bit is the fifth bit, so sign=BITAT(code,4,1) is executed.

If if (BITAT(code,2,1)==0) does not hold, the third bit of the encoded bit string is 1, and the data is code 3 in FIG. 9. In this case the else branch is executed. For code 3, the immediate value of the run number occupies the fourth and fifth bits, and an offset of 2 must be added, so run=BITAT(code,4,2)+2 is executed. Note that BITAT(code,4,2) reads the 2-bit field that ends at the fifth bit; it therefore reads the fourth and fifth bits. The immediate value of the level value occupies the seventh and eighth bits, and an offset of 1 must be added, so level=BITAT(code,7,2)+1 is executed. The sign bit is the sixth bit, so sign=BITAT(code,5,1) is executed.

Cases 2 to 6 of the switch statement work the same way. Using the run-number and level-value ranges defined for each row, the program reads the immediate values of the run number and level value from the immediate field and adds the appropriate offsets.
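The switch statement for all seven cases of FIG. 9 can be transcribed into Python as sketched below. `first_set_bit` and `bitat` are hypothetical stand-ins that mimic the behavior described for FirstSetBit_Hi_MSB and BITAT: bit positions are 0-based and counted from the MSB, and BITAT returns the field that ends at the given position. The 32-bit window, the return format, and the sign convention are assumptions. The field positions for cases 2 to 6 are derived from the code layouts of FIG. 6.

```python
WIDTH = 32  # assumption: a 32-bit window whose MSB is the next stream bit


def first_set_bit(code):
    """Mimics FirstSetBit_Hi_MSB: 0-based MSB-first index of the first 1."""
    return WIDTH - code.bit_length()


def bitat(code, p, m):
    """Mimics BITAT(code, p, m): the m-bit field ending at MSB-first index p."""
    return (code >> (WIDTH - 1 - p)) & ((1 << m) - 1)


def decode_fig6(code):
    """Decode the FIG. 6 codeword at the top of the window.

    Returns (run, level, sign, length), or ("EOB", length) for code 9.
    """
    clz = first_set_bit(code)
    if clz == 0:                                   # code 1: 1 R s
        return bitat(code, 1, 1), 1, bitat(code, 2, 1), 3
    if clz == 1:
        if bitat(code, 2, 1) == 0:                 # code 2: 010 R s L
            return bitat(code, 3, 1), bitat(code, 5, 1) + 2, bitat(code, 4, 1), 6
        # code 3: 011 RR s LL
        return bitat(code, 4, 2) + 2, bitat(code, 7, 2) + 1, bitat(code, 5, 1), 8
    if clz == 2:
        if bitat(code, 3, 1) == 0:                 # code 4: 0010 R s LL
            return bitat(code, 4, 1), bitat(code, 7, 2) + 4, bitat(code, 5, 1), 8
        # code 5: 0011 RR s LL
        return bitat(code, 5, 2) + 6, bitat(code, 8, 2) + 1, bitat(code, 6, 1), 9
    if clz == 3:
        if bitat(code, 4, 1) == 0:                 # code 6: 00010 R*6 s
            return bitat(code, 10, 6) + 10, 1, bitat(code, 11, 1), 12
        # code 7: 00011 R*5 s L*5
        return bitat(code, 9, 5), bitat(code, 15, 5), bitat(code, 10, 1), 16
    if clz == 4:                                   # code 8: 00001 s L*12
        return 0, bitat(code, 17, 12), bitat(code, 5, 1), 18
    if clz == 5:                                   # code 9: EOB
        return "EOB", 6
    if clz == 6:                                   # code 10: 0000001 R*8 s L*12
        return bitat(code, 14, 8), bitat(code, 27, 12), bitat(code, 15, 1), 28
    raise ValueError("no valid prefix in the window")
```

For example, the 8-bit codeword "01111110" goes through case 1 to code 3 and decodes to run 5, level 3, sign bit 1. The returned length tells the caller how far to advance the window.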

FIG. 11 shows a working example of the encoding table 60 with an immediate field. It adds two rows to the table of FIG. 6 and encodes data with 12 kinds of code. The run-number range, level-value range, bit length, and occurrence count of each row's code are as shown in the figure.

The properties of the codes in the encoding table 60 of FIG. 11 can be summarized as follows.
(1) Codes of 3 to 12 bits are assigned to a level value of 1 preceded by 0 to 73 consecutive zeros.
(2) Codes of 6 to 9 bits are assigned to level values of 2 to 4 preceded by 0 to 9 consecutive zeros.
(3) An 8-bit code is assigned to level values of 4 to 7 preceded by at most one zero.
(4) A 16-bit code is assigned to level values of 0 to 31 preceded by 0 to 31 consecutive zeros.
(5) An 18-bit code is assigned to level values of 32 or more that follow directly, with no preceding zeros.
(6) A 29-bit code is assigned to any other level value preceded by any other number of consecutive zeros.

Huffman coding generates an encoding table dynamically for each image. It assigns short codes to frequently occurring combinations of run number and level value and long codes to rare ones. In the variable length coding of this embodiment, by contrast, the encoding table 60 with an immediate field is not generated dynamically; a predetermined table is used. Two variations are possible. Several different encoding tables 60 with an immediate field may be prepared and switched according to some condition. Alternatively, each candidate table may be used to actually encode the given image, and the one that yields the smallest code amount may be selected as the optimal table.
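Selecting the table with the smallest code amount can be sketched as follows. Each (run, level) pair is costed with the shortest row of a table that covers it, as the overlap rule requires, and one EOB is added per block. The row ranges and bit lengths for FIG. 4 and FIG. 6 follow the text. The table representation, the function names, and the toy blocks in the usage check are hypothetical.

```python
# Each table: ([(run_lo, run_hi, level_lo, level_hi, bits), ...], eob_bits).
FIG4 = ([(0, 3, 0, 3, 6), (0, 31, 0, 31, 13), (0, 255, 0, 4095, 24)], 4)
FIG6 = ([(0, 1, 1, 1, 3), (0, 1, 2, 3, 6), (2, 5, 1, 4, 8), (0, 1, 4, 7, 8),
         (6, 9, 1, 4, 9), (10, 73, 1, 1, 12), (0, 31, 0, 31, 16),
         (0, 0, 0, 4095, 18), (0, 255, 0, 4095, 28)], 6)


def pair_bits(rows, run, level):
    """Length of the shortest code whose ranges cover (run, |level|)."""
    mag = abs(level)
    fits = [b for rl, rh, ll, lh, b in rows if rl <= run <= rh and ll <= mag <= lh]
    if not fits:
        raise ValueError(f"({run}, {level}) not representable")
    return min(fits)


def table_code_amount(table, blocks):
    """Total bits for blocks of (run, level) pairs, one EOB per block."""
    rows, eob_bits = table
    return sum(sum(pair_bits(rows, r, l) for r, l in blk) + eob_bits
               for blk in blocks)


def select_table(tables, blocks):
    """Name of the table giving the smallest code amount for these blocks."""
    return min(tables, key=lambda name: table_code_amount(tables[name], blocks))
```

Because the selection simply re-costs the same symbols under each fixed table, the decoder needs only an identifier for the chosen table rather than a transmitted code table.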

The graphics processing apparatus of this embodiment uses textures that are variable-length coded after the discrete cosine transform with an encoding table with an immediate field, which greatly reduces texture size. The compute shader of the GPU 200 decodes the compressed textures with that table and then applies the inverse discrete cosine transform. Compressed textures can therefore be decompressed quickly and fed into graphics processing. Highly compressed textures can stay resident in memory, so large textures need not be read from a storage device such as a hard disk, and PRT can run entirely in memory. Suppose compressed textures are read on demand, decompressed, and swapped into the PRT cache. Because the compressed textures are already in memory, latency stays short even in that configuration, so texture processing can run in real time.

The present invention has been described above on the basis of embodiments. The embodiments are illustrative. Those skilled in the art will understand that the combinations of their components and processes admit various modifications, and that such modifications also fall within the scope of the present invention.

In the above embodiment, the compressed textures are stored in memory, but they may instead be stored on a recording medium such as a hard disk or an optical disc. Because the textures are highly compressed, they need little storage capacity. Latency when reading from the recording medium also stays reasonably low, though it does not match the in-memory case.

In the above embodiment, the discrete cosine transform serves as an example of a spatial frequency transform, which converts an image from the spatial domain into the spatial frequency domain. Another spatial frequency transform, such as the discrete Fourier transform, may be used instead.

The above embodiment describes how compressed textures are decompressed in a configuration where the GPU 200 includes the variable length decoding unit 30 and the IDCT unit 40. Variable length decoding with the encoding table 60 with an immediate field of this embodiment is not limited to that use. Besides decompressing compressed textures in a graphics processing apparatus, it can also decode variable-length-coded images in general image processing apparatuses.

10 PRT control unit, 20 graphics operation unit, 30 variable length decoding unit, 40 inverse discrete cosine transform unit, 50 graphics processing unit, 60 encoding table with immediate field, 80 DCT block ring buffer, 100 main processor, 200 GPU, 300 main memory, 310 compressed texture, 320 PRT cache, 330 page table, 340 mipmap texture, 360 texture tile pool.

Claims

1. An image decoding apparatus comprising:
a variable length decoding unit that executes variable length decoding of a compressed image based on an encoding table that assigns a code corresponding to a pair of a run-number range and a level-value range, together with an immediate field indicating at least one of an immediate value of the run number and an immediate value of the level value; and
an inverse spatial frequency transform unit that restores the image by performing an inverse spatial frequency transform on the variable-length-decoded image.

2. A graphics processing apparatus including a main memory and a graphics processing unit, wherein
the graphics processing unit includes a variable length decoding unit that performs variable length decoding of a compressed texture based on an encoding table that assigns a code corresponding to a pair of a run-number range and a level-value range, together with an immediate field indicating at least one of an immediate value of the run number and an immediate value of the level value, and an inverse spatial frequency transform unit that restores the texture by performing an inverse spatial frequency transform on the variable-length-decoded texture, and
the main memory includes a texture pool that partially caches the restored texture.

3. The graphics processing apparatus according to claim 2, wherein the variable length decoding unit is executed by a plurality of threads of a compute shader.

4. The graphics processing apparatus according to claim 2, wherein the compressed texture is stored in the main memory, and the variable length decoding unit reads the compressed texture from the main memory.

5. An image decoding method comprising:
a step of executing variable length decoding of a compressed image based on an encoding table that assigns a code corresponding to a pair of a run-number range and a level-value range, together with an immediate field indicating at least one of an immediate value of the run number and an immediate value of the level value; and
a step of restoring the image by performing an inverse spatial frequency transform on the variable-length-decoded image.

6. A graphics processing method in a graphics processing apparatus including a main memory and a graphics processing unit, wherein the graphics processing unit:
performs, by a compute shader, variable length decoding of a compressed texture based on an encoding table that assigns a code corresponding to a pair of a run-number range and a level-value range, together with an immediate field indicating at least one of an immediate value of the run number and an immediate value of the level value;
restores the texture by performing an inverse spatial frequency transform on the variable-length-decoded texture; and
stores the restored texture in a texture pool in the main memory that partially caches the texture.

7. A program causing a compute shader of a graphics processing unit to execute:
a step of performing variable length decoding of a compressed texture based on an encoding table that assigns a code corresponding to a pair of a run-number range and a level-value range, together with an immediate field indicating at least one of an immediate value of the run number and an immediate value of the level value; and
a step of restoring the texture by performing an inverse spatial frequency transform on the variable-length-decoded texture and storing the restored texture in a texture pool that partially caches the texture.
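The claims state only that the restored texture is stored in a texture pool that "partially caches" it. The following is a minimal sketch of that idea, assuming a fixed number of tile slots, tiles keyed by a hypothetical (mip level, tile x, tile y) tuple, and least-recently-used eviction; none of these choices comes from the patent, and the page table 330 and PRT cache 320 of the embodiment are not modeled. On a miss, the pool calls a caller-supplied `decode_tile` function, standing in for variable length decoding followed by the inverse transform.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List

Tile = List[int]


class TexturePool:
    """Fixed-capacity cache of restored texture tiles (LRU eviction assumed)."""

    def __init__(self, capacity: int, decode_tile: Callable[[Hashable], Tile]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.decode_tile = decode_tile  # stands in for VLC decode + inverse transform
        self.slots: "OrderedDict[Hashable, Tile]" = OrderedDict()
        self.decode_count = 0

    def get(self, tile_id: Hashable) -> Tile:
        if tile_id in self.slots:             # hit: mark as most recently used
            self.slots.move_to_end(tile_id)
            return self.slots[tile_id]
        tile = self.decode_tile(tile_id)      # miss: restore from compressed data
        self.decode_count += 1
        if len(self.slots) >= self.capacity:  # pool full: evict least recently used
            self.slots.popitem(last=False)
        self.slots[tile_id] = tile
        return tile

    def resident(self, tile_id: Hashable) -> bool:
        return tile_id in self.slots
```

With a two-slot pool, requesting tiles A, B, A, C decodes three times and evicts B, because A was touched more recently. Only the tiles currently needed stay resident in the pool; the rest remain available in compressed form and are decoded again on demand.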