JP5222958B2

JP5222958B2 - Moving picture coding apparatus, moving picture coding method, moving picture decoding apparatus, and moving picture decoding method

Info

Publication number: JP5222958B2
Application number: JP2010542823A
Authority: JP
Inventors: 宏一浜田; 昌史高橋; 徹横山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-12-16
Filing date: 2009-11-25
Publication date: 2013-06-26
Anticipated expiration: 2029-11-25
Also published as: JPWO2010070818A1; WO2010070818A1

Description

The present invention relates to moving picture encoding and decoding technology for encoding and decoding moving pictures, and in particular to reducing the code amount in inter prediction.

Recent moving picture coding standards encode in units of variable-size blocks obtained by dividing the picture into small regions, using a variety of pixel value prediction methods that exploit the temporal and spatial correlation of images. The H.264/AVC (Advanced Video Coding) standard, established in 2003, defines each encoding mode as a combination of such a pixel value prediction method and a block size, and raises coding efficiency by selecting the optimum mode from the many available according to the properties of the image. H.264/AVC also defines sets of coding tools, called profiles, to limit the coding techniques that can be used according to the type of application assumed.

In H.264/AVC, an encoding mode is determined for each macroblock of 16×16 pixels. One of two prediction methods is applied: intra prediction, which compresses using the pixel correlation between blocks within a picture, or inter prediction, which uses the pixel correlation between blocks in different pictures. Two inter prediction methods are defined: forward prediction (Predictive prediction), which designates one reference image, and bi-directional prediction (Bi-directional predictive prediction), which can designate two reference images. When the Baseline profile is used, only Predictive prediction is available.

H.264/AVC further improves coding efficiency by dividing the 16×16-pixel block used for prediction into smaller blocks of 8×8, 8×4, 4×8, and 4×4 pixels. Regarding block division, Patent Document 1, for example, describes dividing an input image into first blocks of size M×N, further dividing each first block into second blocks of size m×n, and determining the division shape of the first block on the basis of feature information extracted from the images of the second blocks.

Patent Document 1: JP 2008-17305 A

In H.264/AVC, when inter prediction is used, prediction is performed while switching among multiple pixel value prediction methods and block sizes for each macroblock. The pixel value prediction method and the block size must therefore be encoded and added to the stream as inter prediction mode information for each macroblock. Specifically, a code of 1 to 8 bits per macroblock is allocated to this inter prediction mode information. Given that the average code amount per macroblock is only several bits, this prediction mode information is by no means small, and reducing it would reduce the code amount substantially.

An object of the present invention is to achieve a further reduction in code amount by reducing the code amount of this inter prediction mode information.

The moving picture encoding apparatus of the present invention comprises: a block dividing unit that divides an input image into blocks of a predetermined size selected from a plurality of sizes; an inter prediction unit that generates a predicted image for each block by inter prediction from a reference image; a subtraction unit that calculates the prediction difference between the predicted image and the input image; an encoded stream generation unit that generates an encoded stream by applying frequency transform, quantization, and variable-length coding to the prediction difference; and a frame memory that holds, for each block of an already encoded image, the prediction mode used when its predicted image was generated. The inter prediction unit executes a prediction mode skip operation that determines the prediction mode of the current block on the basis of the prediction mode of the block at the same position in the encoded image held in the frame memory.

When the inter prediction unit has executed the prediction mode skip operation, the encoded stream generation unit omits the prediction mode information from the encoded stream and adds a skip flag indicating that the prediction mode skip operation was executed.

The moving picture encoding method of the present invention comprises: a block dividing step of dividing an input image into blocks of a predetermined size selected from a plurality of sizes; an inter prediction step of generating a predicted image for each block by inter prediction from a reference image; a subtraction step of calculating the prediction difference between the predicted image and the input image; and an encoded stream generation step of generating an encoded stream by applying frequency transform, quantization, and variable-length coding to the prediction difference. The inter prediction step includes a prediction mode skip operation that determines the prediction mode of the current block on the basis of the prediction mode of the block at the same position in an already encoded image.

The moving picture decoding apparatus of the present invention comprises: a difference image decoding unit that generates a prediction difference by applying variable-length decoding, inverse quantization, and inverse frequency transform to an encoded stream; an inter prediction unit that divides the encoded stream into blocks to be decoded and generates a predicted image by inter prediction from a reference image; an addition unit that generates a decoded image by adding the predicted image and the prediction difference; and a frame memory that holds, for each block of an already decoded image, the prediction mode used when its predicted image was generated. The inter prediction unit executes a prediction mode skip operation that determines the prediction mode of the current block on the basis of the prediction mode of the block at the same position in the decoded image held in the frame memory.

The moving picture decoding method of the present invention comprises: a difference image decoding step of generating a prediction difference by applying variable-length decoding, inverse quantization, and inverse frequency transform to an encoded stream; an inter prediction step of dividing the encoded stream into blocks to be decoded and generating a predicted image by inter prediction from a reference image; and an addition step of generating a decoded image by adding the predicted image and the prediction difference. The inter prediction step includes a prediction mode skip operation that determines the prediction mode of the current block on the basis of the prediction mode of the block at the same position in an already decoded image.

The present invention provides moving picture encoding and decoding techniques that can further reduce the code amount without degrading image quality.

FIG. 1 is a configuration diagram showing an embodiment of the moving picture encoding apparatus according to the present invention (Example 1).
FIG. 2 is a diagram conceptually showing the operation of inter prediction processing.
FIG. 3 is a diagram showing the types of encoding modes available in the inter prediction method.
FIG. 4 is a conceptual diagram showing mode determination by prediction mode skip.
FIG. 5 is a diagram showing the data areas that can be eliminated by prediction mode skip.
FIG. 6 is a schematic diagram showing the structure of an encoded stream to which a skip flag has been added.
FIG. 7 is a configuration diagram showing an embodiment of the moving picture decoding apparatus according to the present invention (Example 2).
FIG. 8 shows an example of switching prediction mode skip with reference to a motion vector (Example 3).
FIG. 9 shows an example of switching prediction mode skip with reference to the prediction modes of adjacent blocks.
FIG. 10 is a diagram showing the reference pictures when the target image is a B picture (Example 4).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a configuration diagram showing an embodiment of the moving picture encoding apparatus according to the present invention. The moving picture encoding apparatus has an input image memory 102 that holds the input original image 101, and a block dividing unit 103 that divides the input image into small regions. It also has an intra prediction unit 105 that performs intra prediction on each divided block, an inter prediction unit 106 that performs inter prediction on each block on the basis of the amount of motion detected by a motion search unit 104, and a mode selection unit 107 that determines the predictive coding means (prediction method and block size) suited to the properties of the image. It further has a subtraction unit 108 that calculates the prediction difference between the input image and the predicted image, a frequency transform unit 109 and a quantization unit 110 that encode the prediction difference, and a variable-length coding unit 111 that performs coding according to the occurrence probability of symbols. In addition, it has an inverse quantization unit 112 and an inverse frequency transform unit 113 that decode the encoded prediction difference, an addition unit 114 that generates a decoded image using the decoded prediction difference, and a reference image memory 115 that holds the decoded image for use in later prediction. Finally, it has a frame memory 116 that holds the inter prediction modes of the previous picture.

First, the operation of the apparatus as a whole is described. The input image memory 102 holds one image from the original image 101 as the image to be encoded. The block dividing unit 103 divides it into small blocks and passes them to the motion search unit 104, the intra prediction unit 105, and the inter prediction unit 106. The motion search unit 104 calculates the amount of motion of each block using the reference image stored in the reference image memory 115 and passes the motion vector to the inter prediction unit 106. The intra prediction unit 105 and the inter prediction unit 106 execute intra prediction and inter prediction in block units of several sizes to generate predicted images, and the mode selection unit 107 selects the optimum predicted image from among them. The subtraction unit 108 then calculates the prediction difference between the input image and the predicted image and passes it to the frequency transform unit 109.

The frequency transform unit 109 and the quantization unit 110 apply a frequency transform such as the discrete cosine transform (DCT) and quantization, respectively, to the received prediction difference in block units of a specified size, and pass the result to the variable-length coding unit 111 and the inverse quantization unit 112. The variable-length coding unit 111 generates the encoded stream 117 by variable-length coding, on the basis of the occurrence probability of symbols, the prediction difference information represented by the frequency transform coefficients together with the information needed for decoding, such as the prediction direction used for intra prediction and the motion vector used for inter prediction. The inverse quantization unit 112 and the inverse frequency transform unit 113 apply inverse quantization and an inverse frequency transform such as the inverse DCT (IDCT), respectively, to the quantized frequency transform coefficients to recover the prediction difference, and send it to the addition unit 114. The addition unit 114 then generates a decoded image and stores it in the reference image memory 115.

Next, the operation of the inter prediction unit 106 is described.
FIG. 2 is a diagram conceptually showing the operation of inter prediction processing. In H.264/AVC, the image to be encoded is encoded block by block in raster-scan order. For inter prediction, the decoded image of an already encoded image belonging to the same image sequence 301 as the encoding target image 303 is used as the reference image 302, and the reference image is searched for a block (predicted image) 305 that is highly correlated with the target block 304 in the target image. In addition to the prediction difference calculated as the difference between the two blocks, the difference between the coordinates of the two blocks is encoded as a motion vector (MV) 306. Decoding follows the reverse procedure: the decoded image is generated by adding the decoded prediction difference to the block (predicted image) 305 in the reference image.
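The search described for FIG. 2 can be sketched as an exhaustive block-matching search. This is a minimal illustration only, not the search method of the patent or of any particular H.264/AVC encoder: the use of the sum of absolute differences (SAD) as the correlation measure, the block size, the search range, and the names `sad` and `motion_search` are all assumptions made for the example.

```python
def sad(target, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between the target block at (bx, by)
    and the reference block displaced from it by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(target[by + y][bx + x]
                         - reference[by + dy + y][bx + dx + x])
    return total


def motion_search(target, reference, bx, by, size=4, search_range=2):
    """Exhaustive search around (bx, by).

    Returns (motion_vector, prediction_difference), where motion_vector is
    the (dx, dy) displacement of the best-matching reference block.
    Assumes the target block itself lies inside the reference picture.
    """
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Ignore candidates that fall outside the reference picture.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(target, reference, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    dx, dy = best_mv
    # The prediction difference is what remains after subtracting the
    # predicted block; it is encoded together with the motion vector.
    residual = [[target[by + y][bx + x] - reference[by + dy + y][bx + dx + x]
                 for x in range(size)] for y in range(size)]
    return best_mv, residual
```

A decoder would reverse this by adding the decoded prediction difference to the reference block addressed by the motion vector.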

FIG. 3 is a diagram showing the types of encoding modes available for the inter prediction methods defined in H.264/AVC, taking the Baseline profile as an example. As inter-picture pixel value prediction methods, H.264/AVC defines forward prediction (Predictive prediction), which designates one reference image, and bi-directional prediction (Bi-directional predictive prediction), which can designate two reference images; with the Baseline profile, only Predictive prediction is available. In each frame, encoding proceeds sequentially in raster-scan order from the macroblock at the upper left of the picture to the macroblock at the lower right. A macroblock can be divided into smaller blocks (sub-blocks), and the optimum size is chosen for encoding from among several sizes predetermined for each type of prediction method.

The block sizes provided for inter prediction are 16×16 pixels (P16×16 mode), 16×8 pixels (P16×8 mode), 8×16 pixels (P8×16 mode), and 8×8 pixels (P8×8 mode). An 8×8 block can be further divided into sub-blocks of 8×8, 8×4, 4×8, or 4×4 pixels. In addition, a PSkip mode, which does not encode motion vector information, is provided for the 16×16 block size, and a P8×8ref0 mode, which does not encode the reference frame number, is provided for the 8×8 block size.

In inter predictive coding, the reference frame is searched for a region highly correlated with the current block, the amount of motion is detected, and motion compensation is performed to create the reference image. In this case, the motion vector information and the reference frame number are written to the header portion of the encoded stream. For each macroblock, a predicted image is generated in block units of one of the sizes above, and the prediction difference from the original image is encoded by applying frequency transform and quantization.

Because inter prediction thus switches among multiple pixel value prediction methods and block sizes for each macroblock, the pixel value prediction method and the block size have conventionally had to be encoded and attached to every macroblock as inter prediction mode information.

This embodiment therefore introduces a new prediction scheme that omits this prediction mode information: the prediction mode of the current macroblock is determined by referring to the prediction mode of the macroblock at the same position in the already encoded previous picture. This scheme is hereinafter called "prediction mode skip". In contrast, the conventional scheme, which selects the optimum block size for the current macroblock from the standpoint of image quality and code amount regardless of the prediction mode of the previous picture, is called "non-skip". To execute prediction mode skip, this embodiment provides a frame memory 116 that holds the prediction modes of the previous picture. In prediction mode skip, the inter prediction unit 106 refers to the prediction mode of the macroblock at the same position in the previous picture, stored in the frame memory 116, and generates the predicted image of the current macroblock in that same prediction mode. A "skip flag" indicating prediction mode skip is added to data encoded in this way. Because the skip flag requires very little code (1 bit suffices), the reduction gained by omitting the conventional prediction mode information outweighs it, and the code amount of the output stream is reduced as a result.

FIG. 4 is a conceptual diagram showing mode determination by inter prediction mode skip. Here, the mode is assumed to be selected from the forward prediction (Predictive prediction) partition modes P16×16, P16×8, P8×16, and P8×8, and the intra prediction mode (Intra).

As shown in (a), the prediction mode skip scheme determines the prediction mode of the current macroblock on the basis of the prediction mode of the co-located macroblock in the previous picture. Here, the prediction mode of macroblock MB1 in the previous picture, which is at the same position as the encoding target macroblock MB2 in the current picture, is P8×16, so P8×16 is adopted as the prediction mode of MB2. The modes of the other, neighboring macroblocks are determined in the same way.

As shown in (b), when the prediction mode of the co-located macroblock MB1 in the previous picture is the intra prediction mode (Intra), that mode cannot be used as is for the encoding target macroblock MB2, so the mode is changed to an inter prediction mode (for example, P16×16).
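The mode determination of FIG. 4 (a) and (b) can be sketched as a lookup into the stored modes of the previous picture. This is a minimal sketch under stated assumptions: the string mode names, the list standing in for frame memory 116, and the function name `skipped_mode` are hypothetical, and P16×16 is used as the substitute for Intra only because the text names it as an example.

```python
INTER_MODES = ("P16x16", "P16x8", "P8x16", "P8x8")
INTRA = "Intra"
# Assumption: the text gives P16x16 only as an example of the substitute mode.
FALLBACK_INTER_MODE = "P16x16"


def skipped_mode(prev_modes, mb_index):
    """Prediction mode of macroblock mb_index under prediction mode skip.

    prev_modes stands in for frame memory 116: one stored mode per
    macroblock of the previous picture, in raster-scan order.
    """
    co_located = prev_modes[mb_index]
    if co_located == INTRA:
        # Case (b): an intra mode cannot be reused for inter prediction,
        # so an inter mode is substituted.
        return FALLBACK_INTER_MODE
    if co_located not in INTER_MODES:
        raise ValueError(f"unknown prediction mode: {co_located!r}")
    # Case (a): reuse the co-located macroblock's mode unchanged.
    return co_located
```

For instance, with `prev_modes = ["P8x16", "Intra"]`, macroblock 0 inherits P8×16 and macroblock 1 falls back to P16×16.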

Whether to adopt prediction mode skip for the macroblock being encoded is decided from the standpoint of the image quality and code amount of the target block (the skip adoption rule). When the correlation with the previous picture is strong, the mode used in the previous picture is likely to be optimal for the current picture as well, so adopting prediction mode skip rarely degrades image quality. Specifically, the decision is made by a mechanism similar to mode selection, which determines the coding method for each macroblock, such as RD-optimization, one of the elemental techniques of moving picture coding. Alternatively, the similarity of mode information between pictures may be calculated, and prediction mode skip adopted when the similarity exceeds a threshold.
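Both forms of the skip adoption rule can be sketched briefly. The source does not give a cost function or a similarity measure, so everything concrete here is an assumption: the Lagrangian cost J = D + λR commonly used in RD-optimization, the fraction of matching macroblock modes as the similarity measure, the default threshold, and all function names.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def choose_skip_by_rd(skip_candidate, best_non_skip, lam):
    """Each candidate is a (distortion, rate_bits) pair for one macroblock.

    Returns True when coding with prediction mode skip costs no more than
    the best non-skip mode.
    """
    return rd_cost(*skip_candidate, lam) <= rd_cost(*best_non_skip, lam)


def mode_similarity(modes_a, modes_b):
    """Fraction of macroblock positions whose modes agree in two pictures."""
    if len(modes_a) != len(modes_b) or not modes_a:
        raise ValueError("mode lists must be non-empty and of equal length")
    same = sum(1 for a, b in zip(modes_a, modes_b) if a == b)
    return same / len(modes_a)


def choose_skip_by_similarity(modes_a, modes_b, threshold=0.8):
    """Adopt skip when the similarity exceeds the (hypothetical) threshold."""
    return mode_similarity(modes_a, modes_b) > threshold
```

In the RD form, the skip candidate spends fewer bits on mode information but may have higher distortion if the inherited mode fits poorly, and the comparison weighs the two.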

Adopting inter prediction mode skip reduces the code amount spent on inter prediction mode information, as explained below.
FIG. 5 is a diagram showing the data areas in the stream that can be eliminated by adopting prediction mode skip. (a) is the case where the block size is larger than 8×8: the macroblock type field, which indicates the prediction mode of the macroblock, is eliminated. (b) is the case where the block size is 8×8 or smaller: in addition to the field in (a), the sub-macroblock type field, which indicates the prediction mode of each sub-macroblock, is also eliminated. The code amount saved in each field is 1 to 8 bits, and because these fields exist for every macroblock, the saving over the whole stream is large.
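The effect on one picture can be estimated with simple arithmetic. The sketch below is an illustration only: the 1920×1080 picture size, the average of 4 bits of omitted mode information per macroblock (chosen from within the 1 to 8 bit range stated above), and the function names are hypothetical values picked for the example, not figures from the source.

```python
def macroblocks_per_picture(width, height, mb_size=16):
    """Number of macroblocks, rounding partial rows and columns up."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols * rows


def net_saving_bits(skipped_mbs, mode_bits_per_mb, flag_count, flag_bits=1):
    """Mode information bits omitted minus the bits spent on skip flags."""
    return skipped_mbs * mode_bits_per_mb - flag_count * flag_bits


# Hypothetical case: every macroblock of a 1920x1080 picture is skipped.
mbs = macroblocks_per_picture(1920, 1080)                  # 120 columns x 68 rows
per_mb_flags = net_saving_bits(mbs, 4, flag_count=mbs)     # one flag per macroblock
per_picture_flag = net_saving_bits(mbs, 4, flag_count=1)   # one flag for the picture
```

Under these assumptions the picture has 8160 macroblocks, and the net saving is 24480 bits with a flag on every macroblock or 32639 bits with a single flag covering the picture.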

FIG. 6 is a schematic diagram showing the structure of an encoded stream to which a skip flag has been added. One skip flag (SF) is added per range of data over which inter prediction mode skip is performed. Because 1 bit suffices for the skip flag (for example, skip = "1", non-skip = "0"), the increase in code amount is very small.

The range of data to which a skip flag applies can be specified not only per macroblock, as in (d), but also over multiple consecutive macroblocks: (a) per GOP, (b) per picture, or (c) per slice. When inter prediction mode skip is performed for the entire stream, the skip flag is added to the stream header. The decoding apparatus (decoder) determines the range of data subject to prediction mode skip from the position of the skip flag in the stream and decodes the stream accordingly. Because a single skip flag can thus cover multiple macroblocks rather than one, the increase in code amount can be suppressed further.
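How a decoder might resolve the flag that applies to one macroblock can be sketched as follows. The source states only that the flag position determines the range; the precedence used here (the innermost enclosing unit that carries a flag decides, and no flag anywhere means non-skip) is an assumption, as are the level names and the function name `effective_skip`.

```python
# Syntax levels from widest to narrowest, following FIG. 6 plus the stream header.
LEVELS = ("stream", "gop", "picture", "slice", "macroblock")


def effective_skip(flags):
    """Decide whether one macroblock uses prediction mode skip.

    flags maps a level name to the skip flag value (1 = skip, 0 = non-skip)
    carried by the unit at that level which encloses the macroblock.
    Levels that carry no flag are simply absent from the mapping.
    """
    unknown = set(flags) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    # Assumed precedence: the narrowest level carrying a flag wins.
    for level in reversed(LEVELS):
        if level in flags:
            return flags[level] == 1
    return False  # assumed default when no flag is present
```

With this rule, `{"picture": 1}` marks every macroblock of the picture as skipped with a single bit, while `{"picture": 1, "macroblock": 0}` would exempt one macroblock.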

FIG. 7 is a configuration diagram showing an embodiment of the moving picture decoding apparatus according to the present invention. The moving picture decoding apparatus has a variable-length decoding unit 202 that applies the reverse of the variable-length coding procedure to an encoded stream 201 generated, for example, by the moving picture encoding apparatus shown in FIG. 1, and an inverse quantization unit 203 and an inverse frequency transform unit 204 for decoding the prediction difference. To generate predicted images, it has an inter prediction unit 205 that performs inter prediction and an intra prediction unit 206 that performs intra prediction. It further has an addition unit 207 that generates a decoded image 210 by adding the prediction difference and the predicted image, and a reference image memory 208 that temporarily stores the decoded image 210. Finally, it has a frame memory 209 that holds the prediction modes of all macroblocks of the previous picture.

The operation of the inter prediction unit 205 is as follows. The input encoded stream 201 carries either the inter prediction mode information used at encoding time (the pixel value prediction method and the block size) or, where that information has been omitted, a skip flag indicating inter prediction mode skip. When inter prediction mode information is present, the predicted image is generated according to the specified prediction method and block size. When a skip flag is present, the frame memory 209 is consulted, the prediction mode of the current macroblock is determined from the prediction mode of the co-located macroblock in the previous picture, and the predicted image is generated.
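The decoder-side decision can be sketched as follows. The per-macroblock dictionaries stand in for already parsed stream syntax and are a hypothetical representation, as are the function name and the mode strings. Substituting P16×16 for a stored Intra mode mirrors the encoder-side handling described for FIG. 4 (b), and storing the resulting modes for the next picture is an assumption about how frame memory 209 is updated.

```python
def decode_picture_modes(mb_syntax, prev_modes):
    """Determine the prediction mode of every macroblock in one picture.

    mb_syntax:  one dict per macroblock in raster-scan order, either
                {"skip_flag": 1} or {"skip_flag": 0, "mode": "<mode name>"}.
    prev_modes: modes of the previous picture (role of frame memory 209).
    Returns the list of modes for this picture.
    """
    if len(mb_syntax) != len(prev_modes):
        raise ValueError("macroblock counts of the two pictures differ")
    modes = []
    for index, mb in enumerate(mb_syntax):
        if mb["skip_flag"] == 1:
            # No mode information in the stream: inherit the co-located mode.
            mode = prev_modes[index]
            if mode == "Intra":
                mode = "P16x16"  # assumed substitute, as on the encoder side
        else:
            # Mode information was transmitted: use it as specified.
            mode = mb["mode"]
        modes.append(mode)
    return modes
```

The returned list would then replace `prev_modes` before the next picture is decoded, so that encoder and decoder stay in step.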

In the description above, the decoding apparatus determines the prediction mode according to the encoding apparatus's decision on whether to adopt inter prediction mode skip (that is, the presence or absence of the skip flag). Alternatively, the decoding apparatus can decide on its own whether to adopt inter prediction mode skip, using the same skip adoption rule as the encoding apparatus. If both sides share the same rule (decision criterion), the decoder reproduces the encoder's skip decisions, and the skip flag can also be omitted from the transmitted encoded stream.

This embodiment describes specific examples of how to decide whether to adopt inter-screen prediction mode skip (the adoption rule).

FIG. 8 is an example in which adoption of inter-screen prediction mode skip is switched with reference to the magnitude of the motion vector. As shown in FIG. 2, the motion vector MV is the amount of positional displacement between the target block 304 in the encoding target image and the highly correlated block 305 in the reference image. For the decision, a threshold is set in advance for the motion vector MV.

(a) is the case where the motion vector MV of the previous picture is smaller than a predetermined value (threshold). When the MV is small, the image changes little between screens, and in inter-screen prediction the prediction mode of each macroblock is often the same. Inter-screen prediction mode skip is therefore adopted, and the prediction mode of the same-position macroblock MB1 of the previous picture is used as the prediction mode of the current macroblock MB2.

(b) is the case where the motion vector MV of the previous picture is larger than the predetermined value. When the MV is large, the image changes substantially between screens, and in inter-screen prediction the prediction mode of each macroblock often differs. Inter-screen prediction mode skip is therefore not adopted, and a "non-skip mode", in which the optimal prediction mode is determined independently, is set for the current macroblock MB2.
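The threshold rule of FIG. 8 can be sketched as a single predicate. The function name `use_mode_skip` is hypothetical. Two points are assumptions not stated in the text: the magnitude of the MV is taken as its Euclidean length, and an MV exactly equal to the threshold is treated as non-skip.

```python
import math
from typing import Tuple


def use_mode_skip(prev_mv: Tuple[float, float], threshold: float) -> bool:
    """Decide whether to adopt inter-screen prediction mode skip.

    prev_mv is the motion vector (x, y) of the previous picture.
    Assumption: magnitude is the Euclidean length of the vector.
    Assumption: a magnitude equal to the threshold means non-skip.
    """
    mv_x, mv_y = prev_mv
    return math.hypot(mv_x, mv_y) < threshold
```

Because the predicate depends only on data that both sides already hold (the previous picture's motion vector and a preset threshold), an encoder and a decoder evaluating it would reach the same decision, which is what allows the skip flag to be omitted when the rule is shared.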

FIG. 9 is an example in which adoption of inter-screen prediction mode skip is switched with reference to the inter-screen prediction mode of an adjacent macroblock. That is, if an adjacent macroblock in the current picture uses prediction mode skip, prediction mode skip is also adopted for the current macroblock.

(a) is the case where the adjacent macroblock MB3 adopts inter-screen prediction mode skip. Prediction mode skip is also adopted for the current macroblock MB2, which uses the prediction mode of the same-position macroblock MB1 of the previous picture.

(b) is the case where the adjacent macroblock MB3 does not adopt inter-screen prediction mode skip (non-skip mode). The non-skip mode is also set for the current macroblock MB2, and the optimal prediction mode is determined independently.
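The neighbour-based rule of FIG. 9 can be sketched as follows. The function name `mode_for_mb2` and the mode strings are hypothetical. The text does not say what happens when no adjacent macroblock is available (for example at a picture edge); this sketch assumes non-skip in that case.

```python
from typing import Callable, Optional


def mode_for_mb2(mb3_used_skip: Optional[bool],
                 mb1_mode: str,
                 find_own_mode: Callable[[], str]) -> str:
    """Choose the prediction mode of the current macroblock MB2.

    mb3_used_skip: whether the adjacent macroblock MB3 adopted prediction
        mode skip; None means no neighbour is available (assumption:
        treated as non-skip).
    mb1_mode: mode of the same-position macroblock MB1 of the previous picture.
    find_own_mode: callback that determines the mode independently
        (non-skip mode), e.g. a mode search on the encoding side.
    """
    if mb3_used_skip:
        # Case (a): follow the neighbour and inherit MB1's mode.
        return mb1_mode
    # Case (b): non-skip mode, determine the mode independently.
    return find_own_mode()
```

For example, `mode_for_mb2(True, "P16x16", lambda: "P8x8")` inherits `"P16x16"`, while passing `False` calls the callback instead.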

The above embodiments describe inter-screen prediction mode skip between forward prediction images (between P pictures), but the same method applies to inter-screen prediction mode skip between bi-directional prediction images (between B pictures). In the case of a B picture, however, more pictures can be referenced, so there is more freedom of choice. The pictures to be referenced are described in detail below.

FIG. 10 is a diagram showing the pictures referenced for inter-screen prediction mode skip when the encoding target image is a B picture. Three methods are shown here.

In case (a), a B picture, that is, a picture of the same picture type, is referenced. Here, for encoding target picture #4, the previous picture #3, which is also a B picture, is referenced. Then, based on the prediction mode of the same-position macroblock in picture #3, the prediction mode of the target macroblock in picture #4 is determined.

In case (b), a forward prediction image (P picture), which is a different picture type, can also be referenced. Here, for encoding target picture #4, P picture #6 of a different type can be referenced in addition to the previous picture #3 of the same type (B picture). Then, based on the prediction mode of the same-position macroblock in picture #6, the prediction mode of the target macroblock in picture #4 is determined. In this case, the choice between picture #3 and picture #6 only needs to follow the same rule on the encoding side and the decoding side.

In case (c), of two forward prediction images (P pictures), the temporally closer P picture is referenced. That is, for encoding target picture #8, the two P pictures #6 and #11 can be referenced. Here, picture #6, which is closer in time, is referenced. Then, based on the prediction mode of the same-position macroblock in picture #6, the prediction mode of the target macroblock in picture #8 is determined.
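The selection in case (c) can be sketched as picking the candidate with the smallest temporal distance. The function name `nearest_p_picture` is hypothetical. The sketch assumes that picture numbers reflect temporal order, so that the difference between two numbers measures how far apart the pictures are in time; on a tie it returns the candidate listed first, which is also an assumption.

```python
from typing import List


def nearest_p_picture(target: int, p_pictures: List[int]) -> int:
    """Return the number of the P picture temporally closest to the target.

    Assumption: picture numbers are in temporal order, so the absolute
    difference of two numbers is their temporal distance.
    Assumption: on a tie, the candidate listed first wins.
    """
    if not p_pictures:
        raise ValueError("at least one candidate P picture is required")
    return min(p_pictures, key=lambda number: abs(number - target))
```

With the values from the text, `nearest_p_picture(8, [6, 11])` selects picture #6, since its distance of 2 is smaller than the distance of 3 to picture #11.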

The skip adoption rule for inter-screen prediction mode skip between bi-directional prediction images (between B pictures) described above can be defined in the same way as for forward prediction images (between P pictures) described in the preceding embodiments. Because the referenced picture may be a picture other than the previous picture, the frame memories 116 and 209 of the encoding apparatus and the decoding apparatus hold the prediction modes not only of the previous picture but of every picture needed for reference.
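A frame memory that keeps prediction modes for several pictures, as required above, might look like the following sketch. The class name `ModeMemory` and its methods are hypothetical, and the fixed capacity with oldest-first eviction is an assumption; the text only states that the modes of the pictures needed for reference are held.

```python
from collections import OrderedDict
from typing import Dict


class ModeMemory:
    """Hypothetical stand-in for frame memories 116 and 209."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        # picture number -> {macroblock index -> prediction mode}
        self._pictures: "OrderedDict[int, Dict[int, str]]" = OrderedDict()

    def store(self, picture_no: int, modes: Dict[int, str]) -> None:
        """Record the per-macroblock modes of one picture."""
        self._pictures[picture_no] = dict(modes)
        self._pictures.move_to_end(picture_no)
        # Assumption: once full, the picture stored longest ago is dropped.
        while len(self._pictures) > self._capacity:
            self._pictures.popitem(last=False)

    def lookup(self, picture_no: int, mb_index: int) -> str:
        """Return the mode of the same-position macroblock in a held picture."""
        return self._pictures[picture_no][mb_index]

    def __contains__(self, picture_no: int) -> bool:
        return picture_no in self._pictures
```

With a capacity of 2, storing pictures #3 and #6 lets a B picture #4 consult either one, as in case (b) of FIG. 10.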

DESCRIPTION OF SYMBOLS: 101 ... original image, 102 ... input image memory, 103 ... block division unit, 104 ... motion search unit, 105 ... intra-screen prediction unit, 106 ... inter-screen prediction unit, 107 ... mode selection unit, 108 ... subtraction unit, 109 ... frequency conversion unit, 110 ... quantization unit, 111 ... variable length coding unit, 112 ... inverse quantization processing unit, 113 ... inverse frequency conversion unit, 114 ... addition unit, 115 ... reference image memory, 116 ... frame memory, 201 ... encoded stream, 202 ... variable length decoding unit, 203 ... inverse quantization processing unit, 204 ... inverse frequency conversion unit, 205 ... inter-screen prediction unit, 206 ... intra-screen prediction unit, 207 ... addition unit, 208 ... reference image memory, 209 ... frame memory, 210 ... decoded image.

Claims

A moving image encoding apparatus that encodes a prediction difference between an input image and a predicted image obtained by inter-screen prediction, comprising:
a block dividing unit that divides the input image into blocks of a predetermined size among a plurality of sizes;
an inter-screen prediction unit that generates, for each block, a predicted image by inter-screen prediction from a reference image;
a subtraction unit that calculates the prediction difference between the predicted image and the input image;
an encoded stream generation unit that generates an encoded stream by performing a frequency conversion process, a quantization process, and a variable length encoding process on the prediction difference; and
a frame memory that holds, for each block of an encoded image, the prediction mode used when the predicted image was generated,
wherein the inter-screen prediction unit performs a prediction mode skip operation that determines the prediction mode of the block based on the prediction mode of the block at the same position in the encoded image held in the frame memory, and
wherein, when the inter-screen prediction unit executes the prediction mode skip operation, the encoded stream generation unit removes the prediction mode information from the encoded stream and adds a skip flag indicating that the prediction mode skip operation has been executed.

The moving image encoding apparatus according to claim 1,
wherein the skip flag is assigned in any one of the following units: a block, a stream unit consisting of a plurality of consecutive blocks, a GOP, a picture, or a slice.

A moving image encoding method for encoding a prediction difference between a predicted image obtained by inter-screen prediction and an input image, comprising:
a block dividing step of dividing the input image into blocks of a predetermined size among a plurality of sizes;
an inter-screen prediction step of generating, for each block, a predicted image by inter-screen prediction from a reference image;
a subtraction step of calculating the prediction difference between the predicted image and the input image; and
an encoded stream generation step of generating an encoded stream by performing a frequency conversion process, a quantization process, and a variable length encoding process on the prediction difference,
wherein the inter-screen prediction step includes a prediction mode skip operation that determines the prediction mode of the block based on the prediction mode of the block at the same position in an encoded image, and
wherein, when the prediction mode skip operation is executed in the inter-screen prediction step, the prediction mode information is removed from the encoded stream and a skip flag indicating that the prediction mode skip operation has been executed is added.

The moving image encoding method according to claim 3,
wherein the skip flag is assigned in any one of the following units: a block, a stream unit consisting of a plurality of consecutive blocks, a GOP, a picture, or a slice.