JP2009170992A

JP2009170992A - Image processing apparatus and its method, and program

Info

Publication number: JP2009170992A
Application number: JP2008003870A
Authority: JP
Inventors: Keiko Yamaguchi; 恵子山口
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-01-11
Filing date: 2008-01-11
Publication date: 2009-07-30

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus capable of reducing loads for coefficient rearrangement and achieving high-speed processing, and to provide a method and a program for the image processing apparatus.

SOLUTION: The image processing apparatus 100 comprises a decoding unit 111 for decoding a quantized and encoded conversion coefficient and outputting a one-dimensional coefficient string, a coefficient scanning unit 112 for performing coefficient scanning of the conversion coefficient outputted from the decoding unit 111 from the one-dimensional coefficient string into a two-dimensional coefficient string, a dequantization unit 114 for dequantizing the conversion coefficient obtained by the coefficient scanning unit 112, and a conversion processing unit 115 capable of converting the one-dimensional coefficient string into the two-dimensional coefficient string with respect to dequantized coefficient data.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an image processing apparatus and method for processing digital images, and to a program.

In recent years, apparatuses that handle image information digitally have become widespread both for information distribution by broadcasting stations and for information reception in ordinary households. To transmit and store information efficiently, these apparatuses conform to schemes such as MPEG (Moving Picture Experts Group), which exploit the redundancy peculiar to image information and compress it by an orthogonal transform such as the discrete cosine transform (DCT) combined with motion compensation.
In particular, MPEG2 (ISO/IEC 13818-2) is defined as a general-purpose image coding scheme. It is a standard that covers both interlaced and progressive scan images as well as standard-definition and high-definition images, and it is currently used in a wide range of professional and consumer applications.

FIG. 1 is a functional block diagram of a typical MPEG codec decoder.

The typical MPEG codec decoder 1 shown in FIG. 1 combines an IDCT (inverse discrete cosine transform) with motion-compensated prediction. The decoder 1 includes an IDCT processing unit 2, a motion-compensated prediction unit 3, an adding unit 4, and a frame buffer 5.

The IDCT processing unit 2 includes a variable length decoding unit 21, an AC/DC prediction unit 22, an inverse quantization (IQ) unit 23, a coefficient scanning (ZigZag-Scan/AlternativeScan) unit 24, and an IDCT unit 25.
The motion-compensated prediction unit 3 includes a motion vector decoding unit 31, a motion vector prediction unit 32, and a motion compensation unit 33.
Finally, the adding unit 4 adds the output of the IDCT processing unit 2 to the output of the motion-compensated prediction unit 3 to generate pixel data.

In the variable length encoding unit of an encoder (not shown), the quantized coefficients are rearranged into a one-dimensional coefficient sequence and variable-length encoded to reduce redundancy.
Specifically, when the variable length encoding unit arranges the coefficients as a one-dimensional array, it generates run-length data for each combination of run (RUN) information, which indicates a number of zeros, and level (LEVEL) information, which indicates the magnitude of a coefficient value.
Codewords of different lengths are assigned to these combinations of RUN information and coefficient value, so that the data is variable-length encoded and its volume is reduced.
Accordingly, the variable length decoding unit 21 of the decoder 1 generates as many zero coefficients as the decoded RUN value indicates and, for each non-zero coefficient, uses the LEVEL value as the coefficient value, thereby generating the decoded coefficient sequence.
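The run-length expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoder's actual implementation: the function name `expand_run_level`, the 64-coefficient block size, and the zero padding after the last pair are all hypothetical choices made for the example.

```python
def expand_run_level(pairs, block_size=64):
    """Expand (RUN, LEVEL) pairs into a one-dimensional coefficient sequence.

    Each pair contributes RUN zero coefficients followed by one coefficient
    whose value is LEVEL. Padding the tail with zeros up to block_size is an
    assumption standing in for an end-of-block marker.
    """
    coeffs = []
    for run, level in pairs:
        coeffs.extend([0] * run)   # RUN zero coefficients
        coeffs.append(level)       # one non-zero coefficient
    if len(coeffs) > block_size:
        raise ValueError("run/level pairs overflow the block")
    coeffs.extend([0] * (block_size - len(coeffs)))
    return coeffs


sequence = expand_run_level([(0, 12), (2, -3), (1, 5)])
print(sequence[:8])
```

The result is still in scan order; arranging it into the 8 × 8 frequency layout is the job of the coefficient scanning stage.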

The DCT coefficients to be decoded are variable-length decoded together in 8 × 8 units, AC/DC prediction is applied by the AC/DC prediction unit 22, and the inverse quantization unit 23 then dequantizes them by multiplying them by the quantization coefficients.
After inverse quantization, the coefficient scanning unit 24 refers to a coefficient scan table and rearranges the coefficients into the frequency arrangement that serves as input to the IDCT processing. The IDCT unit 25 then applies the IDCT to transform the data from the frequency domain to the spatial domain, thereby decoding the pixel data.
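The conventional ordering (inverse quantization first, coefficient scan second) might look like the sketch below. It is illustrative only: `zigzag_order` and `dequantize_then_scan` are hypothetical names, the quantization values are assumed to be given in scan order, inverse quantization is reduced to a plain multiplication, and AC/DC prediction and the IDCT itself are left out.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals; the direction alternates on each diagonal.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def dequantize_then_scan(levels, quant):
    """Multiply in scan order (IQ), then rearrange into an 8 x 8 block (scan)."""
    dequantized = [lv * q for lv, q in zip(levels, quant)]
    block = [[0] * 8 for _ in range(8)]
    for (r, c), value in zip(zigzag_order(), dequantized):
        block[r][c] = value
    return block


block = dequantize_then_scan([5, 3, 2] + [0] * 61, [2] * 64)
print(block[0][:3], block[1][:3])
```

Note that every one of the 64 positions is multiplied here, including the zeros, which is the cost the packed approach described next tries to avoid.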

In the MPEG codec decoder described above, decoding has been accelerated by omitting the IQ operation for zero coefficients.
In this case, only the non-zero coefficients are packed into a predetermined area of data memory and subjected to AC/DC prediction, the IQ operation, and so on; they are then rearranged (coefficient scanning) into a two-dimensional data array that takes the zero coefficients into account.
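A rough sketch of this packed approach follows. The names `packed_iq_then_scan` and `zigzag_order`, the scan-ordered quantization list, and the use of a single multiplication as the IQ operation are assumptions for illustration; AC/DC prediction is omitted.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def packed_iq_then_scan(pairs, quant_in_scan_order):
    """Dequantize only the packed non-zero coefficients, then scatter them."""
    # Pack: remember only the non-zero levels and their scan positions.
    packed, scan_pos = [], 0
    for run, level in pairs:
        scan_pos += run
        packed.append((scan_pos, level))
        scan_pos += 1
    # IQ touches the packed values only, so zeros cost no multiplications.
    dequantized = [(pos, lv * quant_in_scan_order[pos]) for pos, lv in packed]
    # Late coefficient scan: scatter into the zero-filled 8 x 8 block.
    order = zigzag_order()
    block = [[0] * 8 for _ in range(8)]
    for pos, value in dequantized:
        r, c = order[pos]
        block[r][c] = value
    return block, len(dequantized)


block, multiplications = packed_iq_then_scan([(0, 4), (2, -1)], [3] * 64)
print(multiplications, block[0][0], block[2][0])
```

The scatter loop at the end is the rearrangement step whose memory accesses grow with the number of non-zero coefficients.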

This technique avoids wasted IQ operations, but it increases the effort spent on rearranging coefficients.
In particular, an MPEG codec stream encoded at a high bit rate contains a low proportion of zero coefficients, so little is gained by skipping the zero coefficients to lighten the IQ processing. Conversely, the overhead of the memory accesses needed to rearrange the array becomes relatively heavy, so the processing load increases at high bit rates.

An object of the present invention is to provide an image processing apparatus, a method for the same, and a program that reduce the effort of rearranging coefficients and enable faster processing.

A first aspect of the present invention is an image processing apparatus that inversely quantizes image information obtained by dividing an input image signal into blocks and quantizing it block by block, and decodes it by applying an inverse transform. The apparatus includes: a decoding unit that decodes quantized and encoded transform coefficients and outputs a one-dimensional coefficient sequence; a coefficient scanning unit that performs coefficient scanning of the transform coefficients output from the decoding unit, from the one-dimensional coefficient sequence to a two-dimensional coefficient sequence; an inverse quantization unit that inversely quantizes the transform coefficients obtained by the coefficient scanning unit; and a conversion processing unit capable of performing conversion processing from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence on the inversely quantized coefficient data.

Preferably, the coefficient scanning unit has a function of rearranging the coefficients in advance, at the stage where they are loaded into the internal memory of an arithmetic unit having an instruction set that computes a plurality of elements in parallel, into a two-dimensional array in a format that can be input directly to the conversion processing unit.

Preferably, for the non-zero DCT coefficient values variable-length decoded and output by the decoding unit, addresses are generated, using a zigzag scan table and RUN information, for converting the one-dimensional coefficient sequence into the frequency arrangement of the two-dimensional coefficient sequence that serves as the input sequence of the conversion processing unit.

Preferably, in the series of decoding processes from the decoding unit to the conversion processing unit, zero-coefficient masking is applied to the two-dimensional data containing zero coefficients, at least as preprocessing for the inverse quantization (IQ) processing.

Preferably, the zero-coefficient mask for the IQ processing masks the IQ processing when, at the time of the IQ operation, all of the elements to be operated on by the arithmetic unit, which computes a plurality of elements in parallel, are zero.

Preferably, in the arithmetic unit, the determination of whether the IQ processing is enabled or disabled is implemented as a dedicated hardware (H/W) instruction applied to the plurality of elements to be operated on.

Preferably, in accordance with the enable/disable determination for the IQ processing, the arithmetic unit loads only the quantization table corresponding to the coefficient groups for which the IQ operation is enabled.

Preferably, an AC/DC prediction unit that performs AC/DC prediction is provided downstream of the coefficient scanning unit and upstream of the conversion processing unit, and when, in accordance with the enable/disable determination for the IQ processing, the group of coefficients targeted by the AC/DC prediction is entirely zero, the arithmetic unit moves the predetermined pixel values from the adjacent block as they are at the time of AC/DC prediction.

Preferably, the arithmetic unit loads the one-dimensional coefficient sequence output by the decoding unit directly into the target internal memory and executes the coefficient scan into the two-dimensional coefficient sequence by means of move (mv) instructions between internal memories.

Preferably, the arithmetic unit has a function of selectively switching between a first process, in which the coefficient scanning of the coefficient scanning unit is performed at the output stage of the decoding unit, and a second process, in which it is performed at the input stage of the conversion processing unit.

Preferably, when the number of coefficients output from the decoding unit, which outputs the one-dimensional coefficient sequence, exceeds a preset threshold, the arithmetic unit switches the coefficient scanning from the second process to the first process.

Preferably, until it is decided whether the first process or the second process is to be adopted, the arithmetic unit holds the values obtained from the decoding unit unmodified in its internal memory.
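The threshold-based switching between the two scan placements could be modelled roughly as below. This is a speculative sketch: the names `FIRST`, `SECOND` and `choose_scan_stage` are invented for the example, and it assumes that the counted items are the non-zero coefficients emitted by the decoding unit.

```python
FIRST = "scan_at_decoder_output"    # first process: scan right after decoding
SECOND = "scan_at_transform_input"  # second process: scan just before the transform


def choose_scan_stage(decoded_levels, threshold):
    """Pick where the coefficient scan runs for one block.

    The decoded values are held unmodified until the choice is made. The scan
    starts as the second process and switches to the first process once more
    than `threshold` coefficients have been output.
    """
    held = []
    mode = SECOND
    for level in decoded_levels:
        held.append(level)          # keep the raw decoder output as-is
        if len(held) > threshold:
            mode = FIRST
    return mode, held


print(choose_scan_stage([9, -1, 4], threshold=2)[0])
print(choose_scan_stage([9, -1], threshold=2)[0])
```

A dense block (many coefficients) therefore takes the early-scan path, while a sparse block keeps the packed representation for as long as possible.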

A second aspect of the present invention is an image processing method that inversely quantizes image information obtained by dividing an input image signal into blocks and quantizing it block by block, and decodes it by applying an inverse transform. The method includes: a decoding step of decoding quantized and encoded transform coefficients to obtain a one-dimensional coefficient sequence; a coefficient scanning step of performing coefficient scanning of the transform coefficients obtained in the decoding step, from the one-dimensional coefficient sequence to a two-dimensional coefficient sequence; an inverse quantization step of inversely quantizing the transform coefficients obtained in the coefficient scanning step; and a conversion processing step of performing conversion processing from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence on the inversely quantized coefficient data.

Preferably, a program causes a computer to execute image processing that inversely quantizes image information obtained by dividing an input image signal into blocks and quantizing it block by block, and decodes it by applying an inverse transform. The image processing includes: decoding processing of decoding quantized and encoded transform coefficients to obtain a one-dimensional coefficient sequence; coefficient scanning processing of performing coefficient scanning of the transform coefficients obtained in the decoding processing, from the one-dimensional coefficient sequence to a two-dimensional coefficient sequence; inverse quantization processing of inversely quantizing the transform coefficients obtained in the coefficient scanning processing; and conversion processing of performing conversion from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence on the inversely quantized coefficient data.

According to the present invention, the decoding unit decodes the quantized and encoded transform coefficients, and the resulting one-dimensional coefficient sequence is input to the coefficient scanning unit.
The coefficient scanning unit performs coefficient scanning (rearrangement of the coefficients) from the one-dimensional coefficient sequence to a two-dimensional coefficient sequence and outputs the result to the inverse quantization unit.
The inverse quantization unit inversely quantizes the transform coefficients decoded by the decoding unit and outputs them to the conversion processing unit.
The conversion processing unit then performs conversion processing from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence on the inversely quantized coefficient data.

According to the present invention, efficient parallel processing across a plurality of processing devices can be realized.

Embodiments of the present invention are described below with reference to the drawings.

The embodiments of the present invention apply to a decoding apparatus that runs on an arithmetic unit having an instruction set for computing a plurality of elements in parallel (SIMD operations and the like) and that includes a decoding unit outputting a one-dimensional coefficient sequence (a variable length decoding unit, an arithmetic coding unit, or the like), coefficient scanning (ZigZagScan), inverse quantization (IQ), and conversion processing from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence (inverse discrete cosine transform processing, inverse wavelet transform, or the like).
Unlike a typical decoding apparatus, the embodiments basically convert the one-dimensional coefficient sequence output by the decoding unit (a variable length decoding unit, an arithmetic coding unit, or the like) into a two-dimensional coefficient sequence immediately after variable length decoding.
That is, at the stage where the coefficients are loaded into the internal memory of the arithmetic unit, they are rearranged in advance into a two-dimensional array in a format that can be input directly to the conversion processing unit (an IDCT unit or the like), which eliminates redundant rearrangement at intermediate stages.
In other words, in the present embodiment, after variable length decoding (VLC), the DCT block is divided into a plurality of sub-DCT blocks and each sub-DCT block is assigned the quantization table corresponding to its processor elements, so that the IQ code can be executed efficiently with SIMD operations.
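One way to picture the division into sub-DCT blocks is the sketch below. It assumes, purely for illustration, that a sub-DCT block is one group of eight consecutive raster-order coefficients (one per SIMD lane, the lane standing in for a processor element) and that inverse quantization is a plain multiplication; `split_into_sub_blocks` and `dequantize_sub_block` are hypothetical names.

```python
def split_into_sub_blocks(block, qmatrix, lanes=8):
    """Pair each group of `lanes` coefficients with its quantization values."""
    flat_c = [v for row in block for v in row]
    flat_q = [v for row in qmatrix for v in row]
    return [
        (flat_c[i:i + lanes], flat_q[i:i + lanes])
        for i in range(0, len(flat_c), lanes)
    ]


def dequantize_sub_block(coeffs, qvalues):
    """Lane-wise multiply, standing in for one SIMD IQ operation."""
    return [c * q for c, q in zip(coeffs, qvalues)]


block = [[0] * 8 for _ in range(8)]
block[1][2] = 3
qmatrix = [[16] * 8 for _ in range(8)]
subs = split_into_sub_blocks(block, qmatrix)
print(len(subs), dequantize_sub_block(*subs[1]))
```

Because the coefficients are already in their two-dimensional positions, each lane always meets the same quantization table entry, so the table slices can be assigned once per sub-block.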

The specific configuration and functions of an MPEG codec decoder serving as an image processing apparatus according to an embodiment of the present invention are described below.

<First Embodiment>
FIG. 2 is a functional block diagram showing the configuration of an MPEG codec decoder serving as an image processing apparatus according to an embodiment of the present invention.

As shown in FIG. 2, the decoder (image processing apparatus) 100 combines an IDCT (inverse discrete cosine transform) with motion-compensated prediction.
The decoder 100 includes an IDCT processing unit 110, a motion-compensated prediction unit 120, an adding unit 130, a frame buffer memory 140, and an arithmetic unit (processor) 200 having a SIMD operation function.

The IDCT processing unit 110 includes a variable length decoding unit 111, a coefficient scanning (ZigZag-Scan/AlternativeScan) unit 112, an AC/DC prediction unit 113, an inverse quantization (IQ) unit 114, and an IDCT unit 115.
The motion-compensated prediction unit 120 includes a motion vector decoding unit 121 and a motion compensation prediction unit 122.
Finally, the adding unit 130 adds the output of the IDCT processing unit 110 to the output of the motion-compensated prediction unit 120 to generate pixel data, which is stored in the frame buffer memory 140.

The decoder (image processing apparatus) 100 according to the present embodiment targets image decoding in software on a configuration that has an arithmetic unit computing a plurality of elements in parallel. The present invention applies to the decoding process from the variable length decoding unit to the IDCT unit.

The variable length decoding unit 111 receives data encoded by an encoding apparatus (not shown), performs variable length decoding, and outputs the resulting quantized data to the coefficient scanning unit 112.

In the variable length encoding unit of an encoder (not shown), the quantized coefficients are rearranged into a one-dimensional coefficient sequence and variable-length encoded to reduce redundancy.
Specifically, when the variable length encoding unit arranges the coefficients as a one-dimensional array, it generates run-length data for each combination of run (RUN) information, which indicates a number of zeros, and level (LEVEL) information, which indicates the magnitude of a coefficient value.
Codewords of different lengths are assigned to these combinations of RUN information and coefficient value, so that the data is variable-length encoded and its volume is reduced.
Accordingly, the variable length decoding unit 111 of the decoder 100 generates as many zero coefficients as the decoded RUN value indicates and, for each non-zero coefficient, uses the value of the DCT coefficient (LEVEL) as the coefficient value, thereby generating the decoded coefficient sequence.

For the non-zero DCT coefficient (= LEVEL) values variable-length decoded and output by the variable length decoding unit 111, the coefficient scanning unit 112 uses the zigzag scan table and the RUN information to generate addresses for converting the one-dimensional coefficient sequence into the frequency arrangement of the two-dimensional coefficient sequence that serves as the input sequence of the IDCT.
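The address generation performed by the coefficient scanning unit 112 can be approximated as follows. The sketch assumes a raster-order (row-major) 8 × 8 destination and the zigzag scan only; `generate_addresses` and `zigzag_order` are hypothetical names, and the alternate scan is not covered.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def generate_addresses(pairs, n=8):
    """Map each non-zero LEVEL to its destination address in the 2-D block.

    RUN advances the scan position past the skipped zeros; the zigzag table
    then converts the scan position into a row-major address.
    """
    order = zigzag_order(n)
    result, scan_pos = [], 0
    for run, level in pairs:
        scan_pos += run
        r, c = order[scan_pos]
        result.append((r * n + c, level))
        scan_pos += 1
    return result


print(generate_addresses([(0, 7), (1, -2), (3, 1)]))
```

Only the non-zero coefficients produce an address, so the zeros never have to be materialized in the one-dimensional sequence before the scan.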

The AC/DC prediction unit 113 performs AC/DC prediction on the output of the coefficient scanning unit 112 and outputs the result to the inverse quantization unit 114.

The inverse quantization unit 114 inversely quantizes the quantized data obtained by the variable length decoding unit 111 for each macroblock (MB), for example in block units of 8 pixels × 8 lines, and outputs the resulting DCT (discrete cosine transform) coefficient data to the IDCT unit 115.

The IDCT unit 115 performs IDCT processing on the DCT coefficient data supplied from the inverse quantization unit 114 and outputs the resulting pixel data to the adding unit 130.

The motion vector decoding unit 121 decodes motion vectors from the data supplied by the variable length decoding unit 111 and controls the operation of the motion compensation prediction unit 122 according to the result.

The operation of the motion compensation prediction unit 122 is controlled by the motion vector decoding unit 121. When the adding unit 130 is currently processing an I picture, the motion compensation prediction unit 122 supplies no data to the adding unit 130.
When the adding unit 130 is currently processing a P picture, the motion compensation prediction unit 122 accesses the frame buffer memory 140, reads out image data corresponding to a past frame, performs predetermined arithmetic processing on it, and supplies the resulting operation data to the adding unit 130.
When the adding unit 130 is currently processing a B picture, the motion compensation prediction unit 122 accesses the frame buffer memory 140, reads out image data corresponding to past and future frames, performs predetermined arithmetic processing on it, and supplies the resulting operation data to the adding unit 130.

Incidentally, the frame buffer memory 140 is configured to hold, among the decoded image data sequentially output from the adding unit 130, the image data corresponding to I pictures and P pictures.
When the picture being processed is an I picture, the adding unit 130 outputs the pixel data from the IDCT unit 115 as decoded image data without modification.
When the picture being processed is a P picture or a B picture, the adding unit 130 adds the pixel data from the IDCT unit 115 to the operation data from the motion compensation prediction unit 122 to obtain and output the decoded image data.
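The behaviour of the adding unit 130 for the three picture types can be summarized in a few lines. This is only a sketch: the function name `reconstruct` and the list-of-lists pixel layout are assumptions, and any range clipping of the summed pixels is deliberately left out because the text does not describe it.

```python
def reconstruct(picture_type, idct_pixels, prediction=None):
    """Combine IDCT output with motion-compensated prediction data.

    I pictures pass the IDCT output through unchanged; P and B pictures add
    the prediction supplied by the motion compensation prediction unit.
    """
    if picture_type == "I":
        return [row[:] for row in idct_pixels]
    if picture_type in ("P", "B"):
        if prediction is None:
            raise ValueError("P and B pictures need prediction data")
        return [
            [r + p for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(idct_pixels, prediction)
        ]
    raise ValueError(f"unknown picture type: {picture_type}")


print(reconstruct("I", [[1, 2], [3, 4]]))
print(reconstruct("P", [[1, -2]], [[10, 20]]))
```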

In the decoding process from the variable length decoding unit 111 to the IDCT unit 115, the decoder (image processing apparatus) 100 of the present embodiment also has a function of speeding up processing such as inverse quantization (IQ) and AC/DC prediction by applying a zero-coefficient mask to the two-dimensional data containing zero coefficients as preprocessing for that processing.
The zero-coefficient mask for the IQ processing masks the IQ processing when, at the time of the IQ operation, all of the elements to be operated on by the arithmetic unit computing a plurality of elements in parallel are zero.
The determination of whether the IQ processing is enabled or disabled is accelerated by implementing it as a dedicated hardware (H/W) instruction applied simultaneously to the plurality of elements to be operated on.
In addition, in accordance with the IQ mask determination, only the quantization table corresponding to the coefficient groups for which the IQ operation is enabled is loaded, which eliminates redundant memory-to-register loads and speeds up the IQ processing.
Furthermore, when the IQ mask determination shows that the group of coefficients targeted by AC/DC prediction is entirely zero, the predetermined pixel values are simply moved (mv) from the adjacent block during AC/DC prediction, and no operation such as addition with the zero coefficients is performed.
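The zero-coefficient mask for IQ can be emulated in software as shown below. The dedicated hardware instruction is replaced here by Python's built-in `any()`, the group width of eight lanes is an assumption, inverse quantization is reduced to a multiplication, and `masked_dequantize` is a hypothetical name. The counter shows how many quantization table slices were actually loaded.

```python
def masked_dequantize(coeffs, qtable, lanes=8):
    """Dequantize group by group, skipping groups that are entirely zero.

    coeffs and qtable are flat, raster-order lists of equal length. For a
    masked group neither the table slice is loaded nor the multiply executed.
    """
    out, table_loads = [], 0
    for start in range(0, len(coeffs), lanes):
        group = coeffs[start:start + lanes]
        if not any(group):                   # mask decision for the whole group
            out.extend(group)                # all zeros: pass through untouched
            continue
        q = qtable[start:start + lanes]      # load only the slice that is needed
        table_loads += 1
        out.extend(c * x for c, x in zip(group, q))
    return out, table_loads


coeffs = [0] * 64
coeffs[0], coeffs[9] = 5, -2
result, loads = masked_dequantize(coeffs, list(range(1, 65)))
print(loads, result[0], result[9])
```

In this example six of the eight groups are masked, so only two table slices are loaded, which is the saving in memory-to-register traffic that the mask determination is meant to provide.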

In the decoding process of the present embodiment, the one-dimensional coefficient sequence output by the variable length decoding unit 111 is also loaded directly into the internal memory operated on by the arithmetic unit having an instruction set for computing a plurality of elements in parallel, and the conversion into a two-dimensional coefficient sequence (coefficient scanning: ZigZagScan) is implemented with mv instructions between internal memories, which speeds up the processing.

The decoding process according to the present embodiment is described in further detail below.

FIG. 3 is a functional block diagram of a processor (arithmetic unit) having a SIMD operation instruction set, which is an example of an instruction set for computing a plurality of elements in parallel according to the present embodiment.

The processor 200 in FIG. 3 includes a coefficient (Coeff) data buffer (processor internal memory) 201, a zigzag scanner conversion table 202, an MV instruction unit 203, a Q matrix 204, an intermediate value data buffer 205, a sub-macro-unit IQ mask determination and IQ instruction unit 206, an IDCT instruction unit 207, and an intermediate value buffer 208.
In FIG. 3, reference numeral 112a denotes an external coefficient (Coeff) data buffer.

FIG. 4 is a diagram schematically showing the decoding process according to the present embodiment.
FIG. 5 is a diagram showing the decoding process of an existing decoder for comparison with FIG. 4.
FIG. 6 is a diagram showing the decoding process when the IQ processing is masked in the present embodiment.
FIG. 7 is a diagram showing the decoding process after the IQ operation in the present embodiment.

係数走査部１１２は、可変長復号部１１１において可変長復号され出力された非ゼロのＤＣＴ係数（＝ＬＥＶＥＬ）値に対し、ジグザグスキャンテーブルおよびＲＵＮ情報を用い、一次元の係数列からＩＤＣＴの入力列となる二次元係数列の周波数配列に変換するためのアドレス生成を行う。 The coefficient scanning unit 112 uses the zigzag scan table and the RUN information for the non-zero DCT coefficient (= LEVEL) value that is variable-length decoded and output by the variable-length decoding unit 111 to input IDCT from a one-dimensional coefficient sequence. An address is generated for conversion to a frequency array of a two-dimensional coefficient sequence as a sequence.

In the present embodiment, unlike the configuration of the typical MPEG codec decoder shown in FIG. 1, the conversion from a one-dimensional to a two-dimensional coefficient sequence is performed immediately after variable-length decoding.
That is, as shown in FIG. 4, at the stage where the coefficients are loaded into the registers of the SIMD arithmetic processor 200, they are rearranged in advance into a two-dimensional array, a format that can be input directly to the IDCT, which eliminates redundant rearrangement at intermediate stages.

In the configuration of the typical MPEG codec decoder shown in FIG. 5, AC/DC prediction and IQ processing are applied to a one-dimensional array that stores only the non-zero coefficients, and the conversion to a two-dimensional array, in which zero coefficients are part of the array to be operated on, is performed immediately before the IDCT processing. Redundant operations on zero coefficients were thereby avoided.

In contrast, in the configuration of the decoder 100 according to the present embodiment, as shown in FIG. 4, coefficient scanning and IQ processing are performed with multiple coefficient elements mapped onto the processor 200, which is capable of SIMD operations that compute in parallel.
For this reason, the DCT coefficients are converted from one dimension to two dimensions immediately after variable-length decoding.
With this approach, the conversion to a two-dimensional array that contains zero coefficients takes place before IQ and AC/DC prediction, so redundant operations on zero coefficients could occur.
Therefore, a zero-coefficient mask is applied by a hardware (H/W) instruction as preprocessing for each of these processes, so that redundant operations are masked faster than with the existing method.

As described above, the conversion addresses are generated by software processing, but the actual coefficient rearrangement is executed using an mv instruction between internal memories of an arithmetic unit capable of multi-element parallel computation.
After the conversion addresses are generated, as shown in FIG. 4, the non-zero DCT coefficient sequence stored as a one-dimensional array is allocated to the internal memory 201 in the processor 200, which is the arithmetic unit.
The internal memory of the arithmetic unit has already been cleared to zero before this processing, so loading of zero coefficients can be omitted.
When the non-zero DCT coefficient sequence is loaded into the internal memory 201 in the processor 200, it is transferred linearly with a predetermined stride width, because the sequence is a one-dimensional array. In the example of FIG. 4, the stride width is 4.
The conversion address array generated by the coefficient scanning unit 112 is likewise allocated to the internal memory in the processor 200 and transferred linearly with a predetermined stride width. This operation is repeated for one block (8x8).

Next, each non-zero coefficient is moved (coefficient scanning) according to the corresponding conversion address loaded into the internal memory in the processor 200.
As shown in FIG. 4, this move is performed on multiple elements in parallel using the mv instruction between internal memories of the processor 200, an arithmetic unit capable of multi-element parallel computation.
However, because the coefficient scanning table is a coordinate conversion table defined over 8x8 elements, the mv instruction of the arithmetic unit in this configuration also handles the case where the destination address lies outside the current operation target area (but within the 8x8 range).
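The load and move steps can be modeled in Python as a scatter into a pre-cleared 64-element buffer, processed in groups of the stride width. This is a sketch under assumptions: the name `coefficient_scan` is hypothetical, the inner loop stands in for one multi-element mv instruction, and the stride of 4 is taken from the FIG. 4 example.

```python
STRIDE = 4  # elements handled per parallel mv step (4 in the FIG. 4 example)


def coefficient_scan(levels, addresses, block_size=64, stride=STRIDE):
    """Scatter non-zero levels to their 2-D destinations, `stride` at a time.

    The destination buffer models the internal memory, which is assumed to be
    zero-cleared beforehand, so zero coefficients are never loaded or moved.
    """
    block = [0] * block_size
    for start in range(0, len(levels), stride):
        group_levels = levels[start:start + stride]
        group_addrs = addresses[start:start + stride]
        # One iteration of the outer loop models one multi-element mv:
        # every element of the group moves to its own destination, which may
        # lie anywhere inside the 8x8 block.
        for value, dest in zip(group_levels, group_addrs):
            if not 0 <= dest < block_size:
                raise ValueError(f"destination {dest} is outside the block")
            block[dest] = value
    return block
```

For example, `coefficient_scan([12, -3, 5], [0, 8, 2])` places 12 at raster index 0, -3 at index 8, and 5 at index 2, leaving all other entries zero.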

After the rearrangement (coefficient scanning) by multi-element parallel instructions, AC/DC prediction and the IQ operation are likewise performed by applying the parallel computation instruction 206 to multiple elements.

Next, if the IQ mask determination shows that the target coefficient group is entirely zero at the time of AC/DC prediction, the predetermined pixel values are simply moved (mv) from the adjacent block. In this case, operations such as addition with zero coefficients are not performed.

When the IQ operation is performed, as shown in FIG. 6, the IQ processing is masked if the multiple coefficient elements targeted by the multi-element parallel operation are all zero.
This valid/invalid determination is implemented as the H/W dedicated instruction 206.
Only when the parallel operation element size of the processor 200 is 8x8 do the coded block pattern (CodedBlockPattern) and the IQ processing mask determination by the H/W dedicated instruction carry equivalent information.
In general, however, using the IQ processing mask determination instruction implemented as an H/W instruction often yields a gain in processing speed.
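The mask determination can be illustrated in Python as a per-group "any non-zero" test. This is only a software model of what the text implements as a dedicated H/W instruction; the name `iq_mask` and the default group size of 8 elements are assumptions for the example.

```python
def iq_mask(block, group_size=8):
    """Return one flag per group: True if IQ is needed, False if it is masked.

    A group is masked (False) when every coefficient in it is zero.
    """
    return [
        any(block[i:i + group_size])
        for i in range(0, len(block), group_size)
    ]
```

With `group_size=64` the function returns a single flag for the whole 8x8 block, which matches the statement that the mask carries the same information as the coded block pattern only when the parallel element size is 8x8. Smaller groups give a finer-grained mask.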

For the IQ operation, as shown in FIG. 6, the predetermined Q matrix table 204 is loaded into the register 210 in advance. As soon as AC/DC prediction finishes, the quantization coefficients of the Q matrix table loaded into the register are multiplied by the DCT coefficients for multiple elements in parallel by SIMD processing.
When the Q matrix table 204 is loaded, only the portions of the Q matrix table 204 that correspond to the element groups not masked for the IQ operation are loaded, which avoids unnecessary Q matrix table loads.
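A simplified Python model of the masked IQ step is shown below. It is a sketch, not a conforming dequantizer: the multiplication omits the quantizer scale and rounding steps of a real codec, and the name `inverse_quantize`, the group size, and the `loads` counter are assumptions introduced to make the skipped table loads visible.

```python
def inverse_quantize(block, q_matrix, group_size=8):
    """Multiply coefficients by quantization values, skipping all-zero groups.

    Returns the dequantized block and the number of Q-matrix group loads
    that were actually performed.
    """
    out = list(block)
    loads = 0
    for g in range(0, len(block), group_size):
        coeffs = block[g:g + group_size]
        if not any(coeffs):
            continue  # masked group: no Q-matrix load, no multiplication
        q_values = q_matrix[g:g + group_size]  # load only this group's entries
        loads += 1
        # Models one SIMD multiply over the whole group.
        out[g:g + group_size] = [c * q for c, q in zip(coeffs, q_values)]
    return out, loads
```

Because the coefficients are already in their two-dimensional positions, the Q-matrix slice for a group is found with the same index as the coefficients, with no per-coefficient address computation.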

After the IQ operation, as shown in FIG. 7, the coefficients in the internal memory of the processor 200 are already arranged as a two-dimensional frequency array, so the 8x8 IDCT operation is applied as is, without converting placement addresses, to generate pixel data.
For the IDCT operation, CodedBlockPattern is normally used as the operation mask. If, however, the IQ processing mask determination instruction used in the IQ operation gives a higher processing speed, CodedBlockPattern can be replaced by applying the invalid determination instruction to multiple elements simultaneously.

After the pixel data generated by the IDCT is stored in the frame buffer memory 140, the used internal memory of the processor 200 is cleared entirely to zero by an H/W dedicated instruction.

The first embodiment has been described above.
Next, a processing example for the case where a block contains almost no non-zero coefficients is described as a second embodiment.

Second Embodiment
The method described in the first embodiment (first method; first process) is particularly advantageous at high bit rates. On the other hand, when a block contains almost no non-zero coefficients, there are cases where the existing method (second method; second process) is more advantageous.
Therefore, in the second embodiment, the first method (first process) and the second method (second process) are switched dynamically during decoding by means of a predetermined threshold, so that the optimal data arrangement is always selected.

FIG. 8 is a diagram for explaining the configuration of the second embodiment.
The following description refers to FIG. 8.
The second embodiment is based on the first embodiment described above. Only the differences from the first embodiment are described here.

During VLD processing of the DCT coefficients in the variable-length decoding unit 111, when the number of coefficients output for an 8x8 unit exceeds a threshold Vth (ST1, ST2), processing switches from the second method to the first method (the method of mapping coefficients onto SIMD operations described in the first embodiment).
Until the number of DCT coefficients output by the VLD reaches the threshold Vth, the values obtained from the VLD are held unprocessed in, for example, internal registers (global registers) of the processor 200 (ST3).

By deferring the processing that follows variable-length decoding until the number of coefficients exceeds the threshold Vth, the processing can be switched adaptively while keeping extra memory accesses to a minimum (ST2 to ST5).

The threshold Vth is determined by weighing the number of global registers of the CPU, the number of elements per SIMD operation, and the latency of the SIMD operations.
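The switching rule of the second embodiment can be sketched in Python as follows. The function name `choose_method`, the string labels, and the list used to stand in for the global registers are assumptions for illustration; the value of Vth itself is left to the caller, since the text only names the factors that determine it.

```python
def choose_method(coefficient_stream, vth):
    """Decide between the first and second method for one 8x8 block.

    Coefficients coming out of variable-length decoding are held unprocessed
    (modeling the global registers) until either the block ends or the count
    exceeds `vth`.
    """
    held = []
    for coeff in coefficient_stream:
        held.append(coeff)          # keep the raw VLD output, no processing yet
        if len(held) > vth:
            return "first", held    # many coefficients: map onto SIMD operations
    return "second", held           # few coefficients: keep the existing method
```

In the "first" case the caller would continue decoding the rest of the block and feed everything, including the held values, into the SIMD path. In the "second" case the held values are already the complete one-dimensional sequence for the block.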

As described above, in the configuration of the decoder 100 according to the present embodiment, coefficient scanning and IQ processing are performed with multiple coefficient elements mapped onto the processor 200 capable of parallel SIMD operations, so the DCT coefficients are converted from one dimension to two dimensions immediately after variable-length decoding. Because this places the conversion to a two-dimensional array containing zero coefficients before IQ and AC/DC prediction, redundant operations on zero coefficients could occur; a zero-coefficient mask is therefore applied by a hardware (H/W) instruction as preprocessing for each process, masking redundant operations faster than the existing method. As a result, the following effects can be obtained.

In a system having an arithmetic unit capable of multi-element parallel operations, IQ processing can be masked efficiently by exploiting the uneven distribution of non-zero coefficients when the IDCT coefficients are arranged by frequency component, which speeds up processing.
That is, non-zero coefficients usually concentrate in the low-frequency region at IDCT time, so the non-zero data remains clustered to some extent even after rearrangement into two-dimensional data. Consequently, rather than packing only the non-zero coefficients together as in the existing method (FIG. 5), rearranging them into two-dimensional data at an early stage and applying operations to multiple elements in parallel offers the greater benefit.

The effort of looking up the quantization coefficient for each coefficient in the table is reduced, and processing is correspondingly faster.
That is, the existing method performs IQ processing sequentially on a one-dimensional array of non-zero coefficients, so it had to compute the two-dimensional placement address of each non-zero coefficient and look up the corresponding quantization coefficient value in the quantization table. In the present embodiment, IQ processing is performed with the IDCT coefficients already placed in two dimensions, so no quantization table reference address needs to be computed for individual coefficients.
The two-dimensionally arranged IDCT coefficients and the quantization table can therefore be operated on directly as vectors.
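The difference between the two lookup styles can be made concrete with a small Python comparison. Both functions are hypothetical illustrations (`iq_per_coefficient` for the existing method, `iq_vector` for the present embodiment), and the plain multiplication again stands in for the full inverse quantization of a real codec.

```python
def iq_per_coefficient(levels, addresses, q_matrix):
    """Existing style: one table lookup per non-zero coefficient.

    Each coefficient needs its own 2-D address to index the quantization table.
    """
    return {addr: level * q_matrix[addr] for level, addr in zip(levels, addresses)}


def iq_vector(block, q_matrix):
    """Embodiment style: coefficients are already in 2-D order.

    The block and the quantization table line up element by element, so the
    whole operation is a single element-wise (vector) multiply.
    """
    return [c * q for c, q in zip(block, q_matrix)]
```

Both produce the same values at the non-zero positions; the vector form trades multiplications by zero for the removal of per-coefficient address computation, and the zero groups can then be skipped by the mask determination.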

In addition, applying the operations of coefficient scanning to multiple elements in parallel speeds up processing.

Furthermore, because multiple elements are processed together in parallel, decoding can be carried out with a nearly constant amount of processing regardless of the proportion of non-zero coefficients in each DCT (= bit rate).
One resulting benefit is that, when each process is assigned to one of several processors and the whole decoding system is pipeline-controlled, the load balance among the processors is stable. This leads to stable performance and faster processing for the decoder as a whole.

The method described in detail above can also be implemented as a program that follows the above procedure and executed by a computer such as a CPU.
Such a program can be stored on a recording medium such as a semiconductor memory, a magnetic disk, an optical disk, or a floppy (registered trademark) disk, and accessed and executed by a computer in which the recording medium is set.

FIG. 1 is a functional block diagram of a typical MPEG codec decoder.
FIG. 2 is a functional block diagram showing the configuration of an MPEG codec decoder as an image processing apparatus according to an embodiment of the present invention.
FIG. 3 is a functional block diagram of a processor (arithmetic unit) having a SIMD instruction set, which is an example of an instruction set for computing multiple elements in parallel according to the present embodiment.
FIG. 4 is a diagram schematically showing the decoding process according to the present embodiment.
FIG. 5 is a diagram showing the decoding process of an existing decoder, for comparison with FIG. 4.
FIG. 6 is a diagram showing the decoding process of the present embodiment when IQ processing is masked.
FIG. 7 is a diagram showing the decoding process of the present embodiment after the IQ operation.
FIG. 8 is a diagram for explaining the second embodiment.

Explanation of symbols

100: image processing apparatus (decoder), 110: IDCT processing unit, 111: variable-length decoding unit, 112: coefficient scanning unit, 113: AC/DC prediction unit, 114: inverse quantization unit, 115: IDCT unit, 120: prediction unit with motion compensation, 121: motion vector decoding unit, 122: motion compensation prediction unit, 130: addition unit, 140: frame memory.

Claims

An image processing apparatus that decodes image information, obtained by dividing an input image signal into blocks and quantizing it in units of the blocks, by inversely quantizing it and applying inverse transform processing, the apparatus comprising:
a decoding unit that decodes quantized and encoded transform coefficients and outputs a one-dimensional coefficient sequence;
a coefficient scanning unit that performs coefficient scanning, from the one-dimensional coefficient sequence to a two-dimensional coefficient sequence, on the transform coefficients output from the decoding unit;
an inverse quantization unit that inversely quantizes the transform coefficients obtained by the coefficient scanning unit; and
a conversion processing unit capable of performing conversion processing from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence on the inversely quantized coefficient data.

The image processing apparatus according to claim 1, wherein the coefficient scanning unit has a function of rearranging the coefficients, in an internal memory of an arithmetic unit having an instruction set for calculating a plurality of elements in parallel, into a two-dimensional array that can be directly input to the conversion processing unit.

The image processing apparatus according to claim 2, wherein the coefficient scanning unit, using a zigzag scan table and RUN information, generates addresses for converting the non-zero DCT coefficient values output by the decoding unit after variable-length decoding from the one-dimensional coefficient sequence into the frequency arrangement of the two-dimensional coefficient sequence that serves as the input sequence of the conversion processing unit.

The image processing apparatus according to claim 3, wherein, in the series of decoding processes from the decoding unit to the conversion processing unit, zero-coefficient masking is performed on the two-dimensional data including zero coefficients as preprocessing of at least the inverse quantization (IQ) processing.

The image processing apparatus according to claim 4, wherein the zero-coefficient mask for the IQ processing masks the IQ processing when, at the time of the IQ operation, all the elements targeted by the arithmetic unit that calculates a plurality of elements in parallel are zero.

The image processing apparatus according to claim 5, wherein the computing unit implements the validity/invalidity determination of the IQ processing as a hardware (H/W) dedicated instruction applied to a plurality of elements to be calculated.

The image processing apparatus according to claim 6, wherein the computing unit loads only the quantization table corresponding to the coefficient groups that are valid for the IQ operation, in accordance with the validity/invalidity determination of the IQ processing.

The image processing apparatus further comprising an AC/DC prediction unit that performs AC/DC prediction at a stage after the coefficient scanning unit and before the conversion processing unit, wherein the computing unit, when the validity/invalidity determination of the IQ processing shows that the target coefficient group is entirely zero at the time of AC/DC prediction, moves the predetermined pixel values from an adjacent block as they are.

The image processing apparatus, wherein the computing unit loads the one-dimensional coefficient sequence output by the decoding unit directly into the target internal memory and executes the coefficient scan to the two-dimensional coefficient sequence by an inter-memory move (mv) instruction.

The image processing apparatus, wherein the computing unit has a function capable of selectively switching between a first process, in which the coefficient scanning process of the coefficient scanning unit is performed at the output stage of the decoding unit, and a second process, in which it is performed at the input stage of the conversion processing unit.

The image processing apparatus according to claim 10, wherein the computing unit switches the coefficient scanning process from the second process to the first process when the decoding unit, which outputs the one-dimensional coefficient sequence, has output a number of coefficients exceeding a preset threshold.

The image processing apparatus according to claim 11, wherein the computing unit holds the values obtained from the decoding unit, unprocessed, in the internal memory of the computing unit until it is determined whether the first process or the second process is to be adopted.

An image processing method that blocks an input image signal, dequantizes image information quantized in units of the block, performs inverse transform processing, and decodes the image information.
Decoding a quantized and encoded transform coefficient to obtain a one-dimensional coefficient sequence;
A coefficient scanning step for performing a coefficient scanning from the one-dimensional coefficient string to the two-dimensional coefficient string for the transform coefficient obtained in the decoding step;
An inverse quantization step for inversely quantizing the transform coefficient obtained in the coefficient scanning step;
A conversion processing step for performing conversion processing from the one-dimensional coefficient sequence to the two-dimensional coefficient sequence on the dequantized coefficient data;
An image processing method.

An image process that blocks an input image signal, dequantizes image information quantized in units of the block, performs an inverse transform process, and decodes the image information.
A decoding process for decoding a quantized and encoded transform coefficient to obtain a one-dimensional coefficient sequence;
A coefficient scanning process for performing the coefficient scanning from the one-dimensional coefficient string to the two-dimensional coefficient string for the transform coefficient obtained in the decoding step;
An inverse quantization process for inversely quantizing the transform coefficient obtained in the coefficient scanning step;
A program that causes a computer to execute image processing including conversion processing that performs conversion processing from a one-dimensional coefficient sequence to a two-dimensional coefficient sequence on inversely quantized coefficient data.