JP2005218055A - Apparatus, method and program for processing image

Info

Publication number: JP2005218055A
Application number: JP2004025897A
Authority: JP
Inventors: Tomoya Kodama; 知也児玉; Noboru Yamaguchi; 昇山口; Tadaaki Masuda; 忠昭増田; Atsushi Matsumura; 淳松村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2004-02-02
Filing date: 2004-02-02
Publication date: 2005-08-11

Abstract

PROBLEM TO BE SOLVED: To make effective use of the parallel processing performance of a processor, and to prevent adverse effects on the processing of adjacent pixels, when a parallel processing instruction processes image data containing fewer pixels than can be processed concurrently.

SOLUTION: An image processing apparatus using a SIMD parallel processor includes: a conversion unit 105 that, when a parallel processing instruction of the SIMD processor is to concurrently process a plurality of pixel data containing fewer pixels than the number N of concurrently processable pixels, rearranges the plurality of pixel data so that the pixel data of the respective components alternate every N/M pixels (where M is an integer); a read control unit 106 that reads 2N pixels of the rearranged pixel data in units of N pixels; and a write control unit 107 that executes a byte shuffle instruction of the SIMD processor to extract the N/M pixels to be processed from the 2N pixels read, and compensates for the shift of the extracted N/M pixels.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program that perform image processing on image data using a SIMD (Single Instruction Multiple Data) parallel processor that reads from and writes to memory at predetermined address positions.

In recent years, encoding and decoding of moving image data such as MPEG-1, MPEG-2, and MPEG-4 have commonly been performed in software. Here, motion compensation is a process that divides a decoded image into rectangular blocks called macroblocks and, for each macroblock, copies from a previously decoded reference image the macroblock located at a position displaced by the motion vector (information indicating in which direction and by how much each macroblock in the video is moving) (see, for example, Patent Document 1).

In a typical MPEG decoding process, image data is stored in a frame sequential (planar) format in which the luminance component (Y) and the color difference components (Cb/Cr) are each arranged as a group for each frame or field. The macroblock size is 16 × 16 pixels for the luminance component and 8 × 8 pixels for the color difference components.

Intel processors provide a SIMD (Single Instruction Multiple Data) instruction set. A SIMD instruction applies a common operation to a plurality of data items with a single instruction. When the same processing is applied to many pixels, as in MPEG, processing them simultaneously with SIMD instructions speeds up the processing.

The instruction set of Intel processors includes SIMD instructions that use 64-bit registers, called MMX/SSE (Streaming SIMD Extensions), and SIMD instructions that use 128-bit registers, called SSE2. Because one pixel is represented by 8 bits, SSE2 can be used for the luminance component to process the 16 horizontal pixels of a macroblock simultaneously, and MMX/SSE can be used for the color difference components to process the 8 pixels of a block simultaneously.

On the other hand, RISC processors that have only 128-bit SIMD instructions may be adopted as the CPUs of next-generation game consoles, home servers, digital televisions, and the like. The AltiVec instruction set of the PowerPC processor is a representative example. The SIMD instructions available on such RISC processors operate on individual data elements obtained by dividing 128 bits, that is, 16 bytes, into words of 8, 16, or 32 bits. The amount of data that a processor can handle simultaneously also tends to increase, so RISC processors with SIMD instructions that process 256 bits (32 bytes) at a time may appear in the future.

Patent Document 1: JP 11-215509 A

However, when a RISC processor with such a 128-bit SIMD instruction set executes the motion compensation part of MPEG decoding, its SIMD instructions can operate only in units of 128 bits. Motion compensation of the luminance component, whose macroblock is 16 × 16 pixels, therefore uses all 128 bits effectively, but motion compensation of a color difference component, whose macroblock is 8 × 8 pixels, uses only 64 of the 128 bits and leaves the remaining 64 bits idle. In other words, although the area of the color difference (Cb/Cr) macroblocks to be motion compensated is 1/2 that of the luminance macroblock, a 128-bit SIMD instruction set cannot fully exploit its parallelism for the color difference components. As a result, the color difference components incur roughly the same processing load as motion compensation of the luminance component, and the efficiency of parallel processing on the RISC processor cannot be improved.
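The utilization figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming one byte per pixel and a 128-bit register as stated in the text; the function name `lane_utilization` is hypothetical and introduced only for illustration.

```python
REGISTER_BITS = 128   # SIMD register width described in the text
BITS_PER_PIXEL = 8    # one pixel is represented by 8 bits

def lane_utilization(pixels_per_row: int) -> float:
    """Return the fraction of a 128-bit register filled by one block row."""
    lanes = REGISTER_BITS // BITS_PER_PIXEL  # 16 byte lanes per register
    return pixels_per_row / lanes

print(lane_utilization(16))  # one row of a 16 x 16 luminance macroblock
print(lane_utilization(8))   # one row of an 8 x 8 color difference block
```

Under these assumptions a luminance row fills the register completely, while a color difference row fills only half of it, which is the inefficiency the paragraph describes.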

Furthermore, in a typical RISC processor, memory can be addressed only at addresses aligned to the processing unit of the processor, unlike Intel processors, which can address memory freely. Consequently, when data is read from memory, the read address is 16 × n (n is an integer).

Consequently, when such a RISC processor performs MPEG decoding at the MPEG-2 SDTV resolution (typically 720 horizontal × 480 vertical pixels), the number of horizontal pixels of a color difference component is not a multiple of 16. Unlike motion compensation on a processor that can fetch data from an arbitrary address, such as an Intel CPU, processing the 8 pixels at the right end under 16-byte alignment therefore adversely affects the pixels at the left end of the next line.
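The right-edge problem can be illustrated numerically. The sketch below assumes a color difference plane of width 720 / 2 = 360 whose rows are packed contiguously with no padding and whose base address is 16-byte aligned; neither assumption is stated in the text, and the helper name `bytes_spilling_into_next_row` is hypothetical.

```python
CHROMA_WIDTH = 720 // 2   # 360 pixels per color difference row at SDTV resolution
VECTOR_BYTES = 16         # aligned access unit of the processor

def bytes_spilling_into_next_row(block_x: int, row: int,
                                 width: int = CHROMA_WIDTH) -> int:
    """Count bytes of the next row covered by one aligned 16-byte access.

    Assumes rows are stored back to back, starting at an aligned base address.
    """
    start = row * width + block_x
    aligned_start = start - start % VECTOR_BYTES  # round down to a 16-byte boundary
    aligned_end = aligned_start + VECTOR_BYTES
    row_end = (row + 1) * width
    return max(0, aligned_end - row_end)

print(CHROMA_WIDTH % VECTOR_BYTES)            # remainder of the row width modulo 16
print(bytes_spilling_into_next_row(352, 0))   # rightmost 8-pixel block of row 0
```

With these assumptions, the aligned access that covers the last 8 pixels of a row also covers the first 8 pixels of the following row, so a 16-byte write there would overwrite pixels that belong to the next line.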

The present invention has been made in view of the above. Its object is to provide an image processing apparatus, an image processing method, and an image processing program that, when performing image processing on fewer pixels than a parallel processing instruction can process simultaneously, make effective use of the parallel processing performance of the processor and prevent adverse effects on the processing of adjacent pixels.

To solve the problems described above and achieve the object, the present invention provides an image processing method for performing image processing on image data using a SIMD (Single Instruction Multiple Data) parallel processor that processes a plurality of data with a single instruction and reads from and writes to memory at predetermined address positions. The method includes: a conversion step of, when a plurality of pixel data that belong to different color components of the image data and that contain fewer pixels than the number N of pixels the parallel processing instruction of the SIMD processor can process simultaneously are to be processed simultaneously, rearranging the plurality of pixel data so that the pixel data of the respective components alternate every N/M pixels (M is an integer); a reading step of reading 2N pixels of the pixel data rearranged in the conversion step, in units of N pixels; and an image processing step of executing a byte shuffle instruction provided by the SIMD processor to extract the pixel data of the N/M pixels to be processed from the 2N pixels read in the reading step, and performing image processing on the extracted pixel data of the N/M pixels.

The present invention also provides an image processing apparatus including: a SIMD (Single Instruction Multiple Data) parallel processor that processes a plurality of data with a single instruction and reads from and writes to an image memory at predetermined address positions; a memory that stores a program and is accessible by the SIMD parallel processor; and an image memory that stores image data and is accessible by the SIMD parallel processor. The program causes the SIMD parallel processor to function as: conversion means for, when a plurality of pixel data that belong to different color components of the image data and that contain fewer pixels than the number N of pixels the parallel processing instruction of the SIMD processor can process simultaneously are to be processed simultaneously, rearranging the plurality of pixel data so that the pixel data of the respective components alternate every N/M pixels (M is an integer); reading means for reading 2N pixels of the pixel data rearranged by the conversion means, in units of N pixels; and image processing means for executing a byte shuffle instruction provided by the SIMD processor to extract the pixel data of the N/M pixels to be processed from the 2N pixels read by the reading means, and performing image processing on the extracted pixel data of the N/M pixels.

The present invention also provides an image processing program for performing image processing on image data using a SIMD (Single Instruction Multiple Data) parallel processor that processes a plurality of data with a single instruction and reads from and writes to memory at predetermined address positions. The program causes a computer to execute: a conversion procedure of, when a plurality of pixel data that belong to different color components of the image data and that contain fewer pixels than the number N of pixels the parallel processing instruction of the SIMD processor can process simultaneously are to be processed simultaneously, rearranging the plurality of pixel data so that the pixel data of the respective components alternate every N/M pixels (M is an integer); a reading procedure of reading 2N pixels of the pixel data rearranged in the conversion procedure, in units of N pixels; and an image processing procedure of executing a byte shuffle instruction provided by the SIMD processor to extract the pixel data of the N/M pixels to be processed from the 2N pixels read in the reading procedure, and performing image processing on the extracted pixel data of the N/M pixels.

According to the present invention, when a plurality of pixel data that belong to different color components of image data and that contain fewer pixels than the number N of pixels the parallel processing instruction of the SIMD processor can process simultaneously are to be processed simultaneously, the plurality of pixel data are rearranged so that the pixel data of the respective components alternate every N/M pixels (M is an integer), 2N pixels of the rearranged pixel data are read in units of N pixels, a byte shuffle instruction is executed to extract the pixel data of the N/M pixels to be processed from the 2N pixels read, and image processing is performed on the extracted pixel data. Image processing of image data of different components can thus be performed simultaneously without leaving any unused register area, so that motion compensation can be performed while using the parallel processing of the SIMD processor effectively.

In addition, because the image decoding apparatus of the present embodiment rearranges the plurality of pixel data so that the pixel data of the respective components alternate every N/M pixels (M is an integer), image processing can be performed without adversely affecting pixel data in other areas when the pixel data at the right end is processed, even on a processor that cannot access memory at arbitrary addresses.

Preferred embodiments of the image processing apparatus, image processing method, and image processing program according to the present invention are described in detail below with reference to the accompanying drawings. In the present embodiment, the image processing apparatus, image processing method, and image processing program of the present invention are applied to an image decoding method, an image decoding apparatus, and an image decoding program that perform MPEG decoding.

FIG. 1 is a block diagram showing the functional configuration of the image decoding apparatus according to the embodiment of the present invention. As shown in FIG. 1, the image decoding apparatus mainly includes a stream buffer 120, a stream buffer control unit 130, a decoder control unit 110, a decoder 100, and an image memory 140.

As shown in FIG. 1, the decoder 100 includes a variable length decoding unit 101, an inverse quantization unit 102, an inverse scan unit 103, an inverse DCT unit 104, and a motion compensation unit 108.

The stream buffer 120 is a memory such as a DRAM (Dynamic Random Access Memory) that accumulates an encoded stream (MPEG-2 bit stream) input from outside the apparatus via a recording medium such as a DVD, a network such as the Internet, or broadcast waves such as satellite broadcasts.

The stream buffer control unit 130 is a processing unit that, based on instructions from the decoder control unit 110, evens out variations in the input timing of the encoded stream and inputs the encoded stream to the decoder 100.

The decoder control unit 110 is a processing unit that causes the encoded stream to be output from the stream buffer 120 to the decoder 100 via the stream buffer control unit 130.

In outline, the decoder 100 is a processing unit that decodes the encoded stream input from the stream buffer control unit 130 based on instructions from the decoder control unit 110 and outputs the result to the image memory 140. This decoding is carried out through processing by the variable length decoding unit 101, the inverse quantization unit 102, the inverse scan unit 103, the inverse DCT unit 104, and the motion compensation unit 108. The processing of each processing unit of the decoder 100 is described below.

The variable length decoding unit 101 is a processing unit that decodes the variable length encoded data contained in the encoded stream to restore quantized DCT coefficients. Specifically, following instructions from the decoder control unit 110, it separates the encoded stream input from the stream buffer control unit 130 into macroblocks, decodes the quantized DCT coefficients of each macroblock, and outputs the decoded quantized DCT coefficients to the inverse quantization unit 102.

The variable length decoding unit 101 also decodes parameters such as the prediction mode and the prediction vector, and outputs the decoded prediction mode and prediction vector to the motion compensation unit 108.

The inverse quantization unit 102 is a processing unit that inversely quantizes the quantized DCT coefficients input from the variable length decoding unit 101 to recover the DCT coefficients, and outputs the recovered DCT coefficients to the inverse scan unit 103.

The inverse scan unit 103 is a processing unit that inversely scans the DCT coefficients input from the inverse quantization unit 102 and outputs the inversely scanned DCT coefficients to the inverse DCT unit 104. Although FIG. 1 shows a configuration in which the inverse scan is performed after inverse quantization, this order is not mandatory, and inverse quantization may instead be performed after the inverse scan.

The inverse DCT unit 104 is a processing unit that applies an inverse DCT to the DCT coefficients input from the inverse scan unit 103 to decode image data (data having pixel values), which is the real image component before encoding, and outputs the decoded image data to the motion compensation unit 108.

The motion compensation unit 108 is a processing unit that performs motion compensation based on the image data input from the inverse DCT unit 104 and the prediction mode and prediction vector input from the variable length decoding unit 101, and writes the motion-compensated image data to the image memory 140. Functionally, it consists of a conversion unit 105, a write control unit 106, and a read control unit 107. The conversion unit 105 constitutes the conversion means of the present invention, the write control unit 106 constitutes the image processing means, and the read control unit 107 constitutes the reading means.

The conversion unit 105 of the motion compensation unit 108 places the pixel data of the color difference components Cb and Cr in the frame memory (Cb/Cr) 144 so that the two components alternate every 8 pixels.
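The rearrangement performed by the conversion unit can be sketched as follows. This is only an illustration under stated assumptions: each component row is held as a Python `bytes` object, the row length is a multiple of 8, and the function name `interleave_row` is hypothetical rather than taken from the specification.

```python
GROUP = 8  # pixels of one component placed before switching to the other

def interleave_row(cb_row: bytes, cr_row: bytes, group: int = GROUP) -> bytes:
    """Interleave one Cb row and one Cr row in groups of `group` pixels."""
    if len(cb_row) != len(cr_row) or len(cb_row) % group:
        raise ValueError("rows must have equal length, a multiple of the group size")
    out = bytearray()
    for i in range(0, len(cb_row), group):
        out += cb_row[i:i + group]   # 8 Cb pixels
        out += cr_row[i:i + group]   # followed by the 8 Cr pixels at the same position
    return bytes(out)

cb = bytes(range(0, 16))
cr = bytes(range(100, 116))
print(list(interleave_row(cb, cr)))
```

Each 16-byte unit of the result then holds one 8-pixel Cb row and the matching 8-pixel Cr row, so a single 128-bit operation covers both color difference components.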

The read control unit 107 of the motion compensation unit 108 reads a plurality of image data from the image memory. Specifically, the read control unit 107 reads 16 pixels of luminance component Y pixel data from the frame memory (Y) 141 of the image memory 140, and reads 16 pixels of pixel data in which the color difference components Cb and Cr alternate every 8 pixels from the frame memory (Cb/Cr) 144.

The write control unit 106 of the motion compensation unit 108 executes a byte shuffle instruction on the 16 pixels of luminance pixel data read by the read control unit 107 to extract the pixels required for motion compensation. Likewise, the write control unit 106 executes a byte shuffle instruction on the 16 pixels of color difference Cb/Cr pixel data read by the read control unit 107 to extract the pixels required for motion compensation.
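The extraction step can be pictured with a software model of a byte shuffle. The sketch below is an assumption-laden illustration: it models a permute-style instruction that selects 16 bytes from the concatenation of two 16-byte vectors, which is how the 2N-pixel read described earlier could be consumed. The names `byte_shuffle` and `extract_unaligned` are hypothetical, and the actual instruction semantics of the target processor are not specified in this document.

```python
N = 16  # bytes per SIMD vector

def byte_shuffle(a: bytes, b: bytes, control: list[int]) -> bytes:
    """Pick bytes from the 32-byte concatenation a + b using 5-bit indices."""
    src = a + b
    return bytes(src[i & 0x1F] for i in control)

def extract_unaligned(a: bytes, b: bytes, shift: int) -> bytes:
    """Return the 16 bytes that start `shift` bytes into the aligned vector a."""
    control = [shift + i for i in range(N)]
    return byte_shuffle(a, b, control)

a = bytes(range(0, 16))    # first aligned 16-byte read
b = bytes(range(16, 32))   # second aligned 16-byte read
print(list(extract_unaligned(a, b, 5)))
```

The point of the design is that both memory reads stay 16-byte aligned, and the misalignment introduced by the motion vector is absorbed entirely by the control vector of the shuffle.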

The image memory 140 stores the image data subject to motion compensation. The image memory 140 reserves a frame memory (Y) 141, an area that stores the pixel data of the luminance component Y, and a frame memory (Cb/Cr) 144, an area that stores image data in which the pixel data of the color difference component Cb and the pixel data of the color difference component Cr are arranged alternately. Although the figure shows one frame memory (Y) 141 and one frame memory (Cb/Cr) 144, in practice a frame memory for the reference image and a frame memory for the decoded image are reserved for each of them in order to perform motion compensation.

The image decoding apparatus of the present embodiment is configured as a moving image playback device or a computer that has a RISC SIMD (Single Instruction Multiple Data) parallel processor that reads from and writes to memory with 16-byte alignment, a recording device such as a hard disk drive (HDD), and storage devices such as the image memory 140, a RAM (Random Access Memory), and a ROM (Read Only Memory).

When the image decoding apparatus of the present embodiment is a moving image playback device, the image decoding program it executes is provided preinstalled in the ROM. When the image decoding apparatus is configured using a computer, the image decoding program is provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk).

The image decoding program executed by the image decoding apparatus of the present embodiment may also be stored on a computer connected to a network such as the Internet and provided by download over the network. The image decoding program may likewise be provided or distributed via a network such as the Internet.

The image decoding program executed by the image decoding apparatus of the present embodiment is read from the ROM or from the recording medium described above and executed, whereby it is loaded into the main storage device, and the motion compensation unit 108, the variable length decoding unit 101, the inverse quantization unit 102, the inverse scan unit 103, and the inverse DCT unit 104 described above are created in the main storage device. Some or all of the motion compensation unit 108, the variable length decoding unit 101, the inverse quantization unit 102, the inverse scan unit 103, and the inverse DCT unit 104 may instead be implemented in hardware.

Next, the processing performed by the image decoding apparatus of the present embodiment configured as described above is explained. The image decoding apparatus according to the present embodiment executes motion compensation in the motion compensation unit 108 after variable length decoding by the variable length decoding unit 101 of the decoder 100, inverse quantization by the inverse quantization unit 102, inverse scanning by the inverse scan unit 103, and the inverse DCT by the inverse DCT unit 104.

In the image decoding apparatus of the present embodiment, motion compensation is executed by the motion compensation unit 108. In the present embodiment, the conversion unit 105 puts the image data in the frame memory into a format in which the color difference components Cb and Cr alternate every 8 pixels (8 bytes). FIG. 2 is an explanatory diagram showing the ordinary frame sequential format. As shown in FIG. 2, in the frame sequential format the pixels of the luminance component (Y) and of the color difference components (Cb/Cr) are each arranged as a group for each frame or field. FIG. 3, by contrast, is an explanatory diagram showing the format of the image data used in the image decoding apparatus of the present embodiment. As shown in FIG. 3, the format of the luminance component in the frame memory (Y) 141 is the same as the frame sequential format shown in FIG. 2, but in the frame memory (Cb/Cr) 144 the pixels of the color difference components Cb and Cr alternate every 8 bytes. Decoding of the color difference components in MPEG is normally performed in units of 8 × 8 pixel blocks, so storing the color difference pixel data in the format shown in FIG. 3 adds no processing overhead.

The image decoding apparatus of the present embodiment first performs motion compensation on the luminance signal (Y) and then performs motion compensation on the color difference signals (Cb/Cr). Motion compensation of the luminance signal (Y) is described later; motion compensation of the color difference signals (Cb/Cr) is described first. FIG. 4 is a flowchart showing the procedure of motion compensation for the color difference signals (Cb/Cr).

First, the read control unit 106 calculates the address radr of the upper left corner of the reference block in the frame memory (Cb/Cr) 144 of the reference image according to Equation 1 (step S401).

Here, (x, y) are the coordinates of the upper left corner of the decoded block, (mvx, mvy) is the motion vector, and width is the width of the luminance component image (720 pixels for SDTV). mvx' is obtained from the horizontal component mvx of the motion vector by Equation 2.

offset(ref, c) denotes the address of the upper left corner of the frame memory for the color difference components (c for Color) of the reference image, and is always a 16-byte aligned address. Because the color difference components Cb/Cr are processed in units of 8 pixels, x is always a multiple of 16. Motion vectors normally have half-pixel precision, but one-pixel precision is assumed here for simplicity. As a result, the address radr calculated by the above equation is in general not 16-byte aligned.

Next, the read control unit 106 calculates the address wadr of the upper left corner of the decoded block in the frame memory (Cb/Cr) 144 of the decoded image according to Equation 3 (step S402).

Here, offset(cur, c) denotes the address of the upper left corner of the frame memory (Cb/Cr) 144 for the color difference components of the decoded image (cur for CURrent). The calculated wadr is always 16-byte aligned.

Next, the read control unit 106 reads 16 bytes (128 bits) of image data from the address obtained by clearing the lower 4 bits of the reference block's upper left address radr, that is, ((radr / 16) * 16), and loads them into register R0 (step S403).

The lower 4 bits of radr are cleared because the address radr calculated in step S401 is not at a 16-byte aligned position; the 16-byte aligned address immediately preceding radr is used so that the image data can be loaded into register R0.

Next, the read control unit 106 reads 16 bytes (128 bits) of image data from the adjacent address 16 bytes beyond the address obtained by clearing the lower 4 bits of radr, that is, ((radr / 16) * 16 + 16), and loads them into register R1 (step S404).
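Steps S403 and S404 can be sketched as follows. This is only an illustration in which the frame memory is modelled as a Python byte string; the function name `load_aligned_pair` and the memory contents are hypothetical.

```python
def load_aligned_pair(memory, radr):
    """Return (R0, R1): the two consecutive 16-byte aligned units that cover radr."""
    base = (radr // 16) * 16          # same as radr & ~0x0f: clears the lower 4 bits
    r0 = memory[base:base + 16]       # step S403
    r1 = memory[base + 16:base + 32]  # step S404
    return r0, r1


memory = bytes(range(64))             # hypothetical frame memory contents
r0, r1 = load_aligned_pair(memory, radr=20)
print(list(r0), list(r1))             # bytes 16..31 and bytes 32..47
```

Both loads use only 16-byte aligned addresses, so the 16 bytes that actually start at radr always lie somewhere inside the 32 bytes held by R0 and R1.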

The write control unit 107 then reads the (radr & 0x07)-th byte shuffle control word from the control table and loads it into register R2 (step S405). The byte shuffle control word specifies which of the pixels loaded into R0 and R1 are to be extracted for motion compensation by the byte shuffle instruction, provided by the SIMD instruction set, that is executed in the next step. For each address difference between the reference block address radr and the 16-byte aligned position, the control table holds in advance the positions, within the concatenation of registers R0 and R1, of the 16 pixels to be extracted from the 32 pixels formed by the pixel data loaded into register R0 and register R1 (16 pixels each).

The address difference between the reference block address radr and the 16-byte aligned position is calculated as radr & 0x07. In this embodiment the control table is held in memory, but it may instead be held within the image decoding program.

FIG. 5 is an explanatory diagram showing the contents of the control table. The value of radr & 0x07 is the address difference between the reference block address radr and the 16-byte aligned position, that is, the address obtained by clearing the lower 4 bits of radr.

The 16 values of each control word give byte offsets from the start of the concatenation of registers R0 and R1.
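FIG. 5 itself is not reproduced in this text, but the two control words quoted in the description (for offsets 4 and 6) follow a regular pattern. The sketch below generates a table with that pattern; the function name `chroma_control_word` is hypothetical, and the entries for the other offsets are an extrapolation from those two quoted words rather than values taken from FIG. 5.

```python
def chroma_control_word(k):
    """Control word for a pixel offset k (0..7) into an 8-byte Cb or Cr group."""
    # Cb: positions k..7 come from R0, the remainder from the start of R1 (0x10...).
    cb = [k + i if k + i < 8 else 0x10 + (k + i - 8) for i in range(8)]
    # Cr sits 8 bytes after Cb inside each 16-byte register.
    cr = [pos + 8 for pos in cb]
    return cb + cr


CONTROL_TABLE = [chroma_control_word(k) for k in range(8)]
print(" ".join(f"{b:02x}" for b in CONTROL_TABLE[4]))
# prints "04 05 06 07 10 11 12 13 0c 0d 0e 0f 18 19 1a 1b"
```

Only eight entries are needed because the offset is taken within an 8-pixel group, which matches indexing the table with radr & 0x07.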

Next, the write control unit 107 executes a byte shuffle instruction (for example, the bshuffle instruction), extracts the 16 pixels needed for motion compensation from the pixels stored in registers R0 and R1 according to the control word in register R2, and stores the result, namely the 16 extracted pixels, in register R3 (step S406).

The byte shuffle instruction is described next. On a RISC processor with SIMD instructions, a byte shuffle instruction selects, for each byte of the result, an arbitrary byte from either of two source registers. It is provided so that data can be read from or written to an arbitrary memory address even when memory can be accessed only at 16-byte aligned addresses. Examples include the bshuffle instruction of the VIS instruction set, the SIMD instructions implemented in the UltraSPARC-III of Sun Microsystems (R), and the similar vperm instruction of the AltiVec instruction set in the PowerPC architecture.

FIG. 6 is an explanatory diagram of the operation of the byte shuffle instruction. Suppose, as shown in FIG. 6, that register RA (16 bytes) holds the one-byte values A0 to AF and register RB (16 bytes) holds the one-byte values B0 to BF. Consider RA and RB concatenated, and number the bytes from the start as 00 to 1f. The byte shuffle instruction outputs the data at the positions named by these numbers in the control word.

In the example of FIG. 6, the control word is set to 0x0f00030e0018061a051a1b061203080c, so executing the byte shuffle instruction extracts the data at positions "0f", "00", "03", "0e", "00", "18", "06", "1a", "05", "1a", "1b", "06", "12", "03", "08", and "0c" of registers RA and RB, as shown in FIG. 6 for the state after execution of the byte shuffle instruction.
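The behaviour described for FIG. 6 can be modelled in a few lines. This sketch follows the description in this text only; it is not an emulation of the exact operand encoding of any particular instruction set, and the function name `byte_shuffle` is hypothetical.

```python
def byte_shuffle(ra, rb, control):
    """Pick element control[i] from the concatenation RA || RB for each output slot."""
    joined = list(ra) + list(rb)          # positions 0x00..0x1f
    return [joined[c] for c in control]


ra = [f"A{d:X}" for d in range(16)]       # A0..AF
rb = [f"B{d:X}" for d in range(16)]       # B0..BF
control = list(bytes.fromhex("0f00030e0018061a051a1b061203080c"))
result = byte_shuffle(ra, rb, control)
print(" ".join(result))
# prints "AF A0 A3 AE A0 B8 A6 BA A5 BA BB A6 B2 A3 A8 AC"
```

Positions below 0x10 select from RA and positions from 0x10 upward select from RB, and the same source byte may appear more than once in the result.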

The above processing is now explained with concrete examples. FIG. 7 is an explanatory diagram showing an example of the values of registers R0 and R1 after steps S403 and S404, and of register R3 after the byte shuffle instruction of step S406, when the lower 4 bits of radr are 4, that is, when radr is offset by 4 bytes from a 16-byte aligned address.

When the lower 4 bits of radr are 4, the control word is, from FIG. 5, the entry for radr & 0x07 = 4: "04 05 06 07 10 11 12 13 0c 0d 0e 0f 18 19 1a 1b". As shown in FIG. 7, executing the byte shuffle instruction therefore extracts into R3, as Cb pixels, the pixels at positions 04, 05, 06, 07 from register R0 and at positions 10, 11, 12, 13 from register R1, and, as Cr pixels, the pixels at positions 0c, 0d, 0e, 0f from register R0 and at positions 18, 19, 1a, 1b from register R1.

FIG. 8 is an explanatory diagram showing an example of the values of registers R0 and R1 after steps S403 and S404, and of register R3 after the byte shuffle instruction of step S406, when the lower 4 bits of radr are 6, that is, when radr is offset by 6 bytes from a 16-byte aligned address.

When the lower 4 bits of radr are 6, the control word is, from FIG. 5, the entry for radr & 0x07 = 6: "06 07 10 11 12 13 14 15 0e 0f 18 19 1a 1b 1c 1d". As shown in FIG. 8, executing the byte shuffle instruction therefore extracts into R3, as Cb pixels, the pixels at positions 06, 07 from register R0 and at positions 10, 11, 12, 13, 14, 15 from register R1, and, as Cr pixels, the pixels at positions 0e, 0f from register R0 and at positions 18, 19, 1a, 1b, 1c, 1d from register R1. In this way the Cb and Cr components needed for motion compensation are obtained at the same time.

After executing the byte shuffle instruction, the write control unit 107 writes the contents of register R3 starting at the address wadr of the upper left corner of the decoded block (step S407). The read control unit 106 then obtains the start address of the next line of the reference block and stores it in radr (step S408), and obtains the start address of the next line of the decoded block and stores it in wadr (step S409). Steps S403 to S409 are repeated eight times so that all pixels of the macroblock, that is, 8 × 8 pixels, are processed. This completes the motion compensation processing for the color difference components.
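The loop of steps S403 to S409 can be sketched end to end as below. This is a simplified model, not the implementation of the embodiment: memory is a Python bytearray, the line pitch `stride` is an assumed parameter (the equations for radr and wadr are not reproduced in this text), radr is assumed to point into a Cb group, and the control word is generated with the same hypothetical helper pattern that reproduces the two words quoted for offsets 4 and 6.

```python
def chroma_control_word(k):
    """Control word for a pixel offset k (0..7) into an 8-byte Cb/Cr group."""
    cb = [k + i if k + i < 8 else 0x10 + (k + i - 8) for i in range(8)]
    return cb + [pos + 8 for pos in cb]


def mc_chroma_block(ref, cur, radr, wadr, stride, rows=8):
    """Copy an 8x8 Cb block and an 8x8 Cr block from ref to cur, one line per pass."""
    for _ in range(rows):
        base = (radr // 16) * 16                  # clear the lower 4 bits of radr
        joined = ref[base:base + 32]              # R0 || R1 (steps S403, S404)
        word = chroma_control_word(radr & 0x07)   # step S405
        cur[wadr:wadr + 16] = bytes(joined[c] for c in word)  # steps S406, S407
        radr += stride                            # step S408 (assumed line pitch)
        wadr += stride                            # step S409


# Hypothetical 16-pixel-wide chroma plane: each 32-byte line is Cb0-7, Cr0-7, Cb8-15, Cr8-15.
stride, rows = 32, 8
cb = [[r * 16 + i for i in range(16)] for r in range(rows)]
cr = [[128 + r * 16 + i for i in range(16)] for r in range(rows)]
ref = bytearray()
for r in range(rows):
    ref += bytes(cb[r][:8] + cr[r][:8] + cb[r][8:] + cr[r][8:])
cur = bytearray(len(ref))
mc_chroma_block(ref, cur, radr=4, wadr=0, stride=stride)
print(list(cur[:16]))  # line 0: Cb pixels 4..11 followed by Cr pixels 4..11
```

Each pass writes one full 16-byte unit that already has the Cb/Cr layout of the decoded frame memory, so eight passes cover both 8 × 8 color difference blocks.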

Next, the motion compensation processing for the luminance signal, which is executed before that for the color difference signals, is described. FIG. 9 is a flowchart showing the procedure of the motion compensation processing for the luminance component. As shown in FIG. 2, the luminance component of the image data is stored in the frame memory (Y) in the frame sequential format, as in the conventional arrangement.

First, the read control unit 106 calculates the address radr of the upper left corner of the reference block in the frame memory (Y) 141 of the reference image according to the following equation (step S901).

Here, (x, y) are the coordinates of the upper left corner of the decoded block, (mvx, mvy) is the motion vector, and width is the width of the luminance component image (720 pixels for SDTV).

offset(ref, Y) denotes the address of the upper left corner of the frame memory for the luminance component of the reference image (ref for REFerence). offset(ref, Y) is always a 16-byte aligned address.

Next, the read control unit 106 calculates the address wadr of the upper left corner of the decoded block in the frame memory (Y) 141 of the decoded image according to Equation 5 (step S902).

Here, offset(cur, Y) denotes the address of the upper left corner of the frame memory (Y) 141 for the luminance component of the decoded image (cur for CURrent). The calculated wadr is always 16-byte aligned.

Next, the read control unit 106 reads 16 bytes (128 bits) of image data from the address obtained by clearing the lower 4 bits of the reference block's upper left address radr, and loads them into register R0 (step S903).

Next, the read control unit 106 reads 16 bytes (128 bits) of image data from the adjacent address 16 bytes beyond the address obtained by clearing the lower 4 bits of radr, and loads them into register R1 (step S904).

The write control unit 107 then reads a byte shuffle control word from the control table and loads it into register R2 (step S905). The control table used in the motion compensation processing for the luminance component differs from the control table of FIG. 5 used for the color difference components.
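The contents of the luminance control table are not listed in this text. Because the luminance plane is not interleaved, one plausible form, given here purely as an assumption, is a table whose k-th word selects 16 consecutive bytes starting at offset k within the concatenation of R0 and R1. The function name `luma_control_word` is hypothetical.

```python
def luma_control_word(k):
    """Hypothetical luminance control word: 16 consecutive positions starting at k (0..15)."""
    return [k + i for i in range(16)]


LUMA_TABLE = [luma_control_word(k) for k in range(16)]
print(" ".join(f"{b:02x}" for b in LUMA_TABLE[5]))   # positions 05 through 14 (hex)
```

Under this assumption the table would be indexed by the lower 4 bits of radr and would need 16 entries rather than the 8 used for the color difference components.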

Next, the write control unit 107 executes a byte shuffle instruction (for example, the bshuffle instruction), extracts the 16 pixels needed for motion compensation from the pixels stored in registers R0 and R1 according to the control word in register R2, and stores the result, namely the 16 extracted pixels, in register R3 (step S906).

After executing the byte shuffle instruction, the write control unit 107 writes the contents of register R3 starting at the address wadr of the upper left corner of the decoded block (step S907). The read control unit 106 then obtains the start address of the next line of the reference block and stores it in radr (step S908), and obtains the start address of the next line of the decoded block and stores it in wadr (step S909). Steps S903 to S909 are repeated 16 times so that all pixels of the macroblock, that is, 16 × 16 pixels, are processed. This completes the motion compensation processing for the luminance component.

The processing relating to the byte shuffle instruction in steps S904 to S906 is performed in the same way as in the motion compensation processing for the color difference components.

As described above, in the image decoding apparatus of this embodiment, the read control unit 106 of the motion compensation unit 108 loads image data, in which the color difference components Cb and Cr have been rearranged to alternate every 8 pixels, into registers R0 and R1 in units of 16 pixels, and the write control unit 107 uses the byte shuffle instruction to extract the 16 pixels to be motion compensated according to a predetermined byte shuffle control word. A single 16-byte register therefore holds 8 pixels each of Cb and Cr, and motion compensation for Cb and Cr can be performed simultaneously without leaving any part of the register unused. Because the motion compensation processing is identical for Cb and Cr, the subsequent processing takes half the time needed when Cb and Cr are processed independently, and the parallel processing of the SIMD processor is used effectively. In a conventional image decoding apparatus the luminance and color difference components required the same number of motion compensation loops, so no gain in efficiency could be expected; here the number of loops for the color difference components is half that for the luminance component, and the processing speed of motion compensation as a whole can be improved by about 25%.
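The figure of about 25% can be checked with simple arithmetic on the loop counts stated above. The sketch counts one pass per 16-byte line processed per macroblock and assumes every pass costs the same, which is a simplification; the variable names are hypothetical.

```python
luma_loops = 16                  # 16 lines of the 16 x 16 luminance block
chroma_loops_separate = 8 * 2    # 8 lines each for Cb and Cr when handled separately
chroma_loops_combined = 8        # 8 lines with Cb and Cr sharing one register

before = luma_loops + chroma_loops_separate
after = luma_loops + chroma_loops_combined
saving = (before - after) / before
print(f"{before} -> {after} passes, {saving:.0%} fewer")
# prints "32 -> 24 passes, 25% fewer"
```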

Furthermore, in the image decoding apparatus of this embodiment, the pixel data of the color difference components Cb and Cr, originally in the frame sequential format, is arranged to alternate every 8 pixels and is thus mixed within one line, so the total number of pixels in one line is a multiple of 16. As a result, even on a SIMD processor that can access the image memory only with 16-byte alignment, motion compensation can be performed without adversely affecting pixel data in other regions when the pixel data at the right edge is processed.
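A short calculation illustrates the right-edge point for SDTV. The luminance width of 720 pixels is taken from the description; the chroma width of 360 pixels assumes 2:1 horizontal subsampling, which is not stated explicitly in this text.

```python
luma_width = 720                   # SDTV luminance width from the description
chroma_width = luma_width // 2     # assumption: chroma is subsampled 2:1 horizontally

planar_remainder = chroma_width % 16             # a Cb (or Cr) line stored on its own
interleaved_remainder = (2 * chroma_width) % 16  # Cb and Cr sharing one line
print(planar_remainder, interleaved_remainder)
# prints "8 0"
```

Under this assumption a separately stored chroma line would end 8 bytes into a 16-byte unit, whereas the mixed line ends exactly on a 16-byte boundary.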

In the image decoding apparatus of this embodiment, frame sequential image data is rearranged so that the color difference components alternate; the motion compensation processing described above can also be applied to image data in a dot sequential format, in which the color difference components alternate pixel by pixel.

In the image decoding apparatus of this embodiment, the image processing of the present invention is applied to motion compensation of the MPEG color difference components, but the invention can also be applied to other image processing, provided that image data of different components is processed simultaneously.

As described above, the image processing apparatus, image processing method, and image processing program according to the present invention are useful for image processing of image data, and are particularly suited to an image decoding apparatus that performs motion compensation processing on the color difference components Cb and Cr.

FIG. 1 is a block diagram showing the functional configuration of the image decoding apparatus according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the frame sequential format.
FIG. 3 is an explanatory diagram showing the format of the image data used in the image decoding apparatus of this embodiment.
FIG. 4 is a flowchart showing the procedure of the motion compensation processing for the color difference signals (Cb/Cr).
FIG. 5 is an explanatory diagram showing the contents of the control table.
FIG. 6 is an explanatory diagram of the operation of the byte shuffle instruction.
FIG. 7 is an explanatory diagram showing an example of the values of registers R0 and R1 after steps S403 and S404, and of register R3 after the byte shuffle instruction of step S406, when the lower 4 bits of radr are 4.
FIG. 8 is an explanatory diagram showing an example of the values of registers R0 and R1 after steps S403 and S404, and of register R3 after the byte shuffle instruction of step S406, when the lower 4 bits of radr are 6.
FIG. 9 is a flowchart showing the procedure of the motion compensation processing for the luminance component.

Explanation of symbols

100 Decoder
101 Variable length decoding unit
102 Inverse quantization unit
103 Inverse scan unit
104 Inverse DCT unit
105 Conversion unit
106 Read control unit
107 Write control unit
108 Motion compensation unit
110 Decoder control unit
120 Stream buffer
130 Stream buffer control unit
140 Image memory
141 Frame memory (Y)
144 Frame memory (Cb/Cr)

Claims

An image processing method for performing image processing on image data by a SIMD (Single Instruction Multiple Data) type parallel processing processor that processes a plurality of data with one instruction and performs read/write processing on a memory at predetermined address positions, the method comprising:
a conversion step of, when a plurality of pixel data that are different color components of the image data and are fewer than the number N of pixels that can be processed simultaneously by a parallel processing instruction of the SIMD type processor are to be processed simultaneously, converting the arrangement of the plurality of pixel data so that the pixel data of the respective components alternate every N/M (M: integer) pixels;
a reading step of reading 2N pixels of the pixel data rearranged in the conversion step, in units of N pixels; and
an image processing step of extracting, by executing a byte shuffle instruction provided by the SIMD type processor, the pixel data of the N/M pixels to be image-processed from the 2N pixels of pixel data read in the reading step, and performing image processing on the extracted pixel data of the N/M pixels.

The image processing method according to claim 1, wherein, when a plurality of pixel data that are different color difference components of the image data and are fewer than the number N of pixels are to be processed simultaneously, the arrangement of the plurality of pixel data is converted so that the pixel data of the respective components alternate every N/M (M: integer) pixels.

The image processing method as described, wherein the image processing step executes the byte shuffle instruction based on control information determined in advance for each address difference between the predetermined address position and the pixel address, extracts the pixel data of the N/M pixels to be image-processed from the 2N pixels of pixel data read in the reading step, and performs image processing on the extracted pixel data of the N/M pixels.

The image processing method according to claim 1, wherein the image processing step extracts, by executing the byte shuffle instruction, the pixel data of the N/M pixels to be motion compensated from the 2N pixels of pixel data read in the reading step, and performs motion compensation processing on the extracted pixel data of the N/M pixels.

The image processing method according to claim 4, wherein the conversion step, when a plurality of pixel data that are different color difference components of the image data and are fewer than the number N of pixels are to be processed simultaneously, converts the arrangement of the plurality of pixel data so that the pixel data of the respective components alternate every N/2 pixels, and
the image processing step extracts, by executing the byte shuffle instruction, the pixel data of the N/2 pixels to be motion compensated from the 2N pixels of pixel data read in the reading step, and performs motion compensation processing on the extracted pixel data of the N/2 pixels.

An image processing apparatus comprising:
a SIMD (Single Instruction Multiple Data) type parallel processing processor that processes a plurality of data with one instruction and performs read/write processing on an image memory at predetermined address positions;
a memory that stores a program and is accessible by the SIMD type parallel processing processor; and
the image memory, which stores image data and is accessible by the SIMD type parallel processing processor,
wherein the program causes the SIMD type parallel processing processor to function as:
conversion means for, when a plurality of pixel data that are different color components of the image data and are fewer than the number N of pixels that can be processed simultaneously by a parallel processing instruction of the SIMD type processor are to be processed simultaneously, converting the arrangement of the plurality of pixel data so that the pixel data of the respective components alternate every N/M (M: integer) pixels;
reading means for reading 2N pixels of the pixel data rearranged by the conversion means, in units of N pixels; and
image processing means for extracting, by executing a byte shuffle instruction provided by the SIMD type processor, the pixel data of the N/M pixels to be image-processed from the 2N pixels of pixel data read by the reading means, and performing image processing on the extracted pixel data of the N/M pixels.

The image processing apparatus according to claim 6, wherein the conversion means, when a plurality of pixel data that are different color difference components of the image data and are fewer than the number N of pixels are to be processed simultaneously, converts the arrangement of the plurality of pixel data so that the pixel data of the respective components alternate every N/M (M: integer) pixels.

The image processing apparatus as described, wherein the image processing means executes the byte shuffle instruction based on control information determined in advance for each address difference between the predetermined address position and the pixel address, extracts the pixel data of the N/M pixels to be image-processed from the 2N pixels of pixel data read by the reading means, and performs image processing on the extracted pixel data of the N/M pixels.

The image processing apparatus according to claim 6, wherein the image processing means extracts, by executing the byte shuffle instruction, the pixel data of the N/M pixels to be motion compensated from the 2N pixels of pixel data read by the reading means, and performs motion compensation processing on the extracted pixel data of the N/M pixels.

The image processing apparatus according to claim 9, wherein the conversion means, when a plurality of pixel data that are different color difference components of the image data and are fewer than the number N of pixels are to be processed simultaneously, converts the arrangement of the plurality of pixel data so that the pixel data of the respective components alternate every N/2 pixels, and
the image processing means extracts, by executing the byte shuffle instruction, the pixel data of the N/2 pixels to be motion compensated from the 2N pixels of pixel data read by the reading means, and performs motion compensation processing on the extracted pixel data of the N/2 pixels.

An image processing program for performing image processing on image data by a SIMD (Single Instruction Multiple Data) type parallel processing processor that processes a plurality of data with one instruction and performs read/write processing on a memory at predetermined address positions, the program causing a computer to execute:
a conversion procedure of, when a plurality of pixel data that are different color components of the image data and are fewer than the number N of pixels that can be processed simultaneously by a parallel processing instruction of the SIMD type processor are to be processed simultaneously, converting the arrangement of the plurality of pixel data so that the pixel data of the respective components alternate every N/M (M: integer) pixels;
a reading procedure of reading 2N pixels of the pixel data rearranged by the conversion procedure, in units of N pixels; and
an image processing procedure of extracting, by executing a byte shuffle instruction provided by the SIMD type processor, the pixel data of the N/M pixels to be image-processed from the 2N pixels of pixel data read by the reading procedure, and performing image processing on the extracted pixel data of the N/M pixels.