JP2010011075A

JP2010011075A - Method and apparatus for encoding and decoding moving image

Info

Publication number: JP2010011075A
Application number: JP2008167884A
Authority: JP
Inventors: Akiyuki Tanizawa (谷沢 昭行); Takeshi Nakajo (中條 健)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-06-26
Filing date: 2008-06-26
Publication date: 2010-01-14

Abstract

PROBLEM TO BE SOLVED: To provide a moving image encoding method that separates a moving image into a background region and a moving region and performs motion-compensated prediction on the separated moving region.

SOLUTION: A binary moving region separation mask indicating the moving region and the background region is generated for each reference image signal. A single background image signal is generated or updated, either by comparing two or more reference image signals or according to the values of the binary moving region separation mask of each reference image signal. Using the moving region separation mask, motion compensation is performed on a first portion of the image to be predicted that corresponds to the moving region, and a second portion that corresponds to the background region is filled with a signal obtained by interpolating the background image signal, thereby generating a prediction image signal.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a moving image encoding and decoding method and apparatus that separate a background region and a moving region in a moving image and perform motion-compensated prediction on the separated moving region.

In recent years, a moving image coding method with greatly improved coding efficiency has been jointly recommended by ITU-T and ISO/IEC as ITU-T Rec. H.264 and ISO/IEC 14496-10 (hereinafter, H.264). In H.264, prediction, transform, and entropy coding are performed in units of rectangular blocks (16×16, 8×8, etc.). Consequently, when H.264 predicts an object that cannot be represented well by a rectangular block, it raises prediction efficiency by selecting a smaller prediction block shape (4×4, etc.). To predict such objects more effectively, methods have been proposed that provide multiple prediction patterns within a rectangular block, and methods that divide a block along an arbitrary line segment and apply motion compensation separately to each resulting partition.

As prediction methods that separate a background image from a foreground image, a method has been proposed that focuses on a coded slice (B-slice) located between two reference images, separates the foreground from the background, and motion-compensates each separately [Patent Document 1]. Another proposed method creates, from the differences between multiple already-encoded reference images, a background image mask and a background reference image for each reference image, and combines them during motion-compensated prediction [Non-Patent Document 1].

In the method of Patent Document 1, motion vector information and block partition information must be encoded for both the foreground and the background, so coding efficiency drops at low bit rates. In addition, the encoder must encode repeatedly to select the optimal prediction mode, which increases the amount of computation.

In the method of Non-Patent Document 1, regions are separated pixel by pixel based on the absolute difference between images. Noise contained in the video to be encoded, or quantization error arising from encoding at high compression, can therefore make it difficult to separate objects from the background region, which may reduce prediction efficiency. Furthermore, a background image mask and a background reference image memory must be generated for each reference image, which increases the memory required in the decoder.

Patent Document 1: JP 2002-359854 A
Non-Patent Document 1: R. Ding, F. Wang, Q. Dai, W. Xu, and D. Zhu, "Composite-Block Model and Joint-Prediction Algorithm for Inter-Frame Video Coding," ICASSP 2006, May 2006.
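The weakness of pixel-wise separation described above can be illustrated with a minimal sketch. It assumes a fixed threshold (a hypothetical parameter, not a value taken from Non-Patent Document 1) and shows how a few levels of noise or quantization error on an otherwise static background get misclassified as motion.

```python
def abs_diff_mask(ref_a, ref_b, threshold):
    """Per-pixel separation: 1 (moving) if |a - b| > threshold, else 0 (background)."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ref_a, ref_b)]


# Static background of value 100; an object (value 200) appears in column 1 of ref_b.
ref_a = [[100, 100, 100, 100],
         [100, 100, 100, 100]]
ref_b = [[100, 200, 100, 100],
         [100, 200, 100, 100]]
print(abs_diff_mask(ref_a, ref_b, threshold=10))    # [[0, 1, 0, 0], [0, 1, 0, 0]]

# Same scene, but the static background now carries 12-13 levels of noise / quantization error.
noisy_b = [[112, 200, 88, 100],
           [100, 200, 100, 113]]
print(abs_diff_mask(ref_a, noisy_b, threshold=10))  # [[1, 1, 1, 0], [0, 1, 0, 1]]
```

In the second case three static pixels are wrongly labelled as moving, which is the kind of separation failure the paragraph above attributes to absolute-difference thresholding.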

An object of the present invention is to generate, from a plurality of decoded reference images, a binary moving region separation mask for each reference image signal together with a single background image signal; to perform motion-compensated prediction on the region that the mask identifies as a moving region; and to fill the region identified as a background region with values interpolated from the background image signal. This prevents the increase in code amount caused by excessive block subdivision and improves prediction efficiency.
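A minimal sketch of maintaining only one background picture, assuming a simple hypothetical update rule: wherever a reference picture's mask marks a pixel as background (0), that pixel overwrites the background buffer, and pixels marked as moving (1) leave the previous background value untouched. The text here does not specify the update rule at this level of detail, and the function name is illustrative.

```python
def update_background(background, ref, mask):
    """Copy ref where mask == 0 (background); keep the old value where mask == 1 (moving)."""
    return [[r if m == 0 else b for b, r, m in zip(brow, rrow, mrow)]
            for brow, rrow, mrow in zip(background, ref, mask)]


background = [[0, 0, 0]]  # hypothetical initial state: no background known yet
refs_and_masks = [
    ([[10, 90, 30]], [[0, 1, 0]]),  # object covers the middle pixel
    ([[11, 20, 31]], [[0, 0, 1]]),  # object has moved to the right pixel
]
for ref, mask in refs_and_masks:
    background = update_background(background, ref, mask)
print(background)  # [[11, 20, 30]]
```

Memory stays at a single background buffer however many reference pictures are processed, in contrast with one background memory per reference picture.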

One aspect of the present invention provides a moving image encoding method that divides an input image signal into a plurality of pixel blocks, performs prediction on each pixel block using a reference image signal, and encodes the difference signal between the input image signal and a prediction image signal. The method comprises: a mask generation step of generating, for each reference image signal, a binary moving region separation mask indicating a moving region and a background region; a background image generation/update step of generating or updating a single background image signal, either by comparing two or more reference image signals or according to the values of the binary moving region separation mask of each reference image signal; and a prediction image generation step of generating the prediction image signal using the moving region separation mask by (1) performing motion compensation on a first portion of the image to be predicted that corresponds to the moving region, and (2) filling a second portion of the image to be predicted that corresponds to the background region with a signal obtained by interpolating the background image signal.
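The prediction image generation step can be sketched as a per-pixel selection. This is an illustration under stated assumptions, not the patented procedure: the motion vector is integer-valued (no sub-pel interpolation), the reference picture's mask is fetched with the same vector as its pixels, the background is taken at the co-located position, and the block stays inside the picture. The function names are hypothetical.

```python
def fetch_block(img, x, y, w, h):
    """Return the w x h block whose top-left corner is (x, y); assumes it lies inside img."""
    return [row[x:x + w] for row in img[y:y + h]]


def moving_region_prediction(ref, ref_mask, background, bx, by, mv, w, h):
    """Moving pixels (mask == 1) take the motion-compensated value; the rest take the background."""
    dx, dy = mv
    mc = fetch_block(ref, bx + dx, by + dy, w, h)         # motion-compensated pixels
    mask = fetch_block(ref_mask, bx + dx, by + dy, w, h)  # mask displaced with the object
    bg = fetch_block(background, bx, by, w, h)            # co-located background pixels
    return [[p if m == 1 else b for p, m, b in zip(pr, mr, br)]
            for pr, mr, br in zip(mc, mask, bg)]


ref = [[200, 60, 70, 80]]        # object (200) covers column 0 in the reference picture
ref_mask = [[1, 0, 0, 0]]
background = [[50, 60, 70, 80]]  # the single background picture
pred = moving_region_prediction(ref, ref_mask, background, bx=1, by=0, mv=(-1, 0), w=2, h=1)
print(pred)  # [[200, 70]]
```

Here the object has moved one pixel to the right, so in the current picture column 2 shows the background value 70. Plain motion compensation of the same block would copy `[[200, 60]]`, dragging along the pixel that sat next to the object in the reference.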

The method of the present invention prevents the excessive block division, and the resulting growth in block partition information, that would otherwise be needed to predict moving objects that do not fit rectangular blocks. In other words, by separating the moving region and the background region within a block without adding side information, and applying the most suitable prediction method to each, it improves both coding efficiency and subjective image quality.

First to sixth embodiments of the present invention are described below with reference to the drawings.

<Moving picture encoding apparatus>
FIG. 1 shows the configuration of a moving picture encoding apparatus 100 that implements moving region separation predictive coding according to the present invention. FIG. 2 is a detailed block diagram of the prediction unit 106 of the moving picture encoding apparatus 100. FIG. 3 is a block diagram of the inter prediction unit involved in the moving region separation predictive coding method. The embodiment of the moving region separation predictive coding method for moving picture encoding is first described with reference to FIGS. 1, 2, and 3.

(First embodiment)
The moving picture encoding apparatus according to the first embodiment is described with reference to FIG. 1. The apparatus divides each frame of an input image signal into a plurality of pixel blocks, compression-encodes the divided pixel blocks, and outputs a code string. Specifically, the moving picture encoding apparatus 100 includes a subtractor 101 that calculates the difference between an input image signal 110 and a prediction image signal 117 and outputs a prediction error signal 111; a transform/quantization unit 102 that transforms and quantizes the prediction error signal 111 and outputs transform coefficients 112; and an inverse quantization/inverse transform unit 103 that inversely quantizes and inversely transforms the transform coefficients 112 to generate a restored prediction error signal 113. The apparatus 100 further includes an adder 104 that adds the restored prediction error signal 113 and the prediction image signal 117 to generate a decoded image signal 114; a reference image memory 105 that stores the decoded image signal 114 as a reference image signal; and a prediction unit 106 that generates the prediction image signal 117 from a reference image signal 116 and the input image signal 110. The apparatus 100 also includes a code string encoding unit 108 that encodes the transform coefficients 112 into a code string and outputs it to an output buffer 109. The moving picture encoding apparatus 100 is controlled by an encoding control unit 107.

In the moving picture encoding apparatus configured as above, an input image signal 110 of a moving image or a still image is divided into small pixel blocks, for example macroblocks, and input to the apparatus 100. Here, the input image signal 110 denotes one unit of encoding processing (a picture), which may be either a frame or a field. The macroblock is taken as the basic processing block size of the encoding process. A macroblock is typically a 16×16 pixel block as shown in FIG. 4A, but it may instead be a 32×32 or 8×8 pixel block, and it need not be square. Hereinafter, the macroblock of the input image signal 110 to be encoded is simply called the target block. In this embodiment, for simplicity, encoding proceeds from the upper left to the lower right, as shown in FIG. 4A.
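The upper-left to lower-right (raster) processing order can be sketched as a simple iteration over block origins. The function name is hypothetical, and picture dimensions are assumed to be multiples of the macroblock size (padding is not handled).

```python
def macroblock_origins(width, height, mb=16):
    """Yield the top-left (x, y) of each mb x mb block in raster order."""
    for y in range(0, height, mb):
        for x in range(0, width, mb):
            yield x, y


print(list(macroblock_origins(48, 32)))
# [(0, 0), (16, 0), (32, 0), (0, 16), (16, 16), (32, 16)]
```

Passing `mb=32` or `mb=8` covers the alternative block sizes mentioned above.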

The moving picture encoding apparatus 100 provides a plurality of prediction modes that differ in block size and in how the prediction image signal 117 is generated. The generation methods fall broadly into two groups: intra prediction (intra-frame prediction), which generates the prediction image using only the frame (field) being encoded, and inter prediction (inter-frame prediction), which predicts from a plurality of temporally different reference frames (reference fields).

Next, the flow of encoding in the moving picture encoding apparatus 100 is described. The input image signal 110 is first input to the subtractor 101, which also receives the prediction image signal 117 for each prediction mode output from the prediction unit 106 described later. The subtractor 101 subtracts the prediction image signal 117 from the input image signal 110 to obtain the prediction error signal 111, which is input to the transform/quantization unit 102. The transform/quantization unit 102 applies an orthogonal transform such as the discrete cosine transform (DCT) to the prediction error signal 111 to generate transform coefficients.

The transform/quantization unit 102 quantizes the transform coefficients according to quantization information, such as the quantization parameter and quantization matrix, supplied by the encoding control unit 107. The quantized transform coefficients 112 are output from the transform/quantization unit 102 to both the code string encoding unit 108 and the inverse quantization/inverse transform unit 103. Although the transform in the transform/quantization unit 102 has been described as a discrete cosine transform like that used in H.264, other techniques such as the discrete sine transform, the wavelet transform, or independent component analysis may be used.
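A minimal sketch of transform and quantization on one residual block, assuming a floating-point orthonormal DCT-II and a plain uniform quantizer with a hypothetical step size `qstep`. H.264 itself uses an integer approximation of the DCT and QP-dependent scaling, which are not reproduced here.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_dct(block):
    """2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def quantize(coeffs, qstep):
    """Uniform scalar quantization, rounding to the nearest level."""
    return [[int(round(v / qstep)) for v in row] for row in coeffs]


residual = [[8] * 4 for _ in range(4)]  # flat 4x4 prediction error
levels = quantize(forward_dct(residual), qstep=10)
print(levels)  # [[3, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

A flat residual concentrates all of its energy in the DC coefficient (32 before quantization), so only one nonzero level survives.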

The code string encoding unit 108 applies entropy coding, for example Huffman coding or arithmetic coding, to the quantized transform coefficients 112 and to the various coding parameters used to encode the target block, including the prediction information 119 output from the encoding control unit 107, and thereby generates encoded data. Here, "coding parameters" means every parameter needed for decoding: not only the prediction information 119 but also information on the transform coefficients, information on quantization, and so on.
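Entropy coding gives shorter codewords to more frequent symbols. As a generic illustration of the Huffman option mentioned above (not H.264's CAVLC or CABAC), the sketch below builds a prefix-free code for a run of quantized levels using Python's `heapq`; the function name is illustrative.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from a symbol sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol gets a 1-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: code suffix built so far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


data = [0, 0, 0, 0, 0, 0, 3, 0, 1, -1]  # quantized levels dominated by zeros
code = huffman_code(data)
print(sum(len(code[s]) for s in data))  # 15 bits for 10 symbols
```

The frequent zero level receives a 1-bit codeword, and the three rare levels receive 2 or 3 bits.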

The encoded data 118 generated by the code string encoding unit 108 is output from the moving picture encoding apparatus 100, multiplexed with the parameters required for decoding by a multiplexer (not shown), and temporarily stored in the output buffer 109. The encoded data 118 in the output buffer 109 is output from the moving picture encoding apparatus 100 at output timings managed by the encoding control unit 107 and sent to a storage system (storage medium) or a transmission system (communication line), not shown.

Meanwhile, the quantized transform coefficients 112 output from the transform/quantization unit 102 are input to the inverse quantization/inverse transform unit 103, which first dequantizes them. For this, the same quantization information (quantization parameter, quantization matrix, etc.) used by the transform/quantization unit 102 is loaded from the encoding control unit 107.

The dequantized transform coefficients then undergo an inverse orthogonal transform, such as the inverse discrete cosine transform (IDCT), which reproduces the decoded prediction error signal 113. The decoded prediction error signal 113 is input to the adder 104, which adds it to the prediction image signal 117 output from the prediction unit 106 to generate the decoded image signal 114 (a locally decoded image signal). The decoded image signal 114 is stored in the reference image memory 105 as the reference image signal 116, which is output to the prediction unit 106 and referred to during prediction. The moving region separation mask 115 output from the prediction unit 106 is also input to the reference image memory 105 and stored there together with the decoded image signal 114 of the same time instant. Hereinafter, the reference image signal 116 denotes the set consisting of a decoded image signal 114 and a moving region separation mask 115 that were encoded or locally decoded at the same time instant.
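The point of this local decoding loop is that the encoder stores in its reference memory exactly what the decoder will reconstruct, not the original input. A minimal sketch, with the transform omitted for brevity and a hypothetical uniform step `qstep`:

```python
def decode_block(levels, pred, qstep):
    """Inverse quantization followed by adding the prediction back (the adder)."""
    return [[p + lv * qstep for p, lv in zip(rp, rl)] for rp, rl in zip(pred, levels)]


def encode_block(orig, pred, qstep):
    """Encoder side: residual -> quantize -> local decode (transform omitted)."""
    residual = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]  # subtractor
    levels = [[int(round(r / qstep)) for r in row] for row in residual]          # quantization
    decoded = decode_block(levels, pred, qstep)                                   # local decoding
    return levels, decoded


levels, decoded = encode_block([[104, 97]], [[100, 100]], qstep=4)
print(levels, decoded)  # [[1, -1]] [[104, 96]]
```

The stored reference is `[[104, 96]]`, not the original `[[104, 97]]`. Because `decode_block` is the same computation a decoder would run on the received levels, encoder and decoder predictions stay in sync.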

The prediction unit 106 performs inter prediction or intra prediction using the pixels of the reference image signal 116 stored in the reference image memory 105 (decoded reference pixels and pixels of the already-generated moving region separation mask) and generates the prediction image signals 117 selectable for the target block. However, for prediction modes in which the next prediction cannot be made until a locally decoded signal has been created within the target block, such as H.264 intra prediction for 4×4 pixel blocks (FIG. 4C) or for 8×8 pixel blocks (FIG. 4D), the prediction unit 106 may internally perform transform/quantization and inverse quantization/inverse transform, or decoding for each corresponding pixel block.

FIG. 2 is a block diagram of the prediction unit 106, which comprises an intra prediction unit 201, an inter prediction unit 202, a motion vector estimation unit 203, a mode determination switch 204, and a mode determination unit 205. When the reference image signal 116 is input to the prediction unit 106, the intra prediction unit 201 and the inter prediction unit 202 each generate prediction image signals for the prediction modes available for the pixel block; each prediction method is described later. The prediction image signals generated by the intra prediction unit 201 and by the inter prediction unit 202 are output to the mode determination switch 204, which selects which of them is used. The switch is controlled by prediction information 206 supplied from the mode determination unit 205, whose operation is described later.

As an example of the prediction modes of the intra prediction unit 201, H.264 intra prediction is described. H.264 defines 4×4 pixel intra prediction (see FIG. 4C), 8×8 pixel intra prediction (see FIG. 4D), and 16×16 pixel intra prediction (see FIG. 4B). In this intra prediction, interpolated pixels are created from the reference image signal 116 stored in the reference image memory 105 and copied along a spatial direction to generate the prediction values.
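Three of the nine H.264 4×4 intra modes show what copying along a spatial direction means. This sketch assumes all eight neighbouring pixels are available; the six other directional modes and the neighbour-availability rules are omitted, and the function name is illustrative.

```python
def intra4x4(mode, top, left):
    """4x4 intra prediction for H.264 modes 0 (vertical), 1 (horizontal), 2 (DC).

    top: 4 reconstructed pixels above the block; left: 4 reconstructed pixels to its left.
    """
    if mode == "vertical":    # mode 0: copy the row above downward
        return [list(top) for _ in range(4)]
    if mode == "horizontal":  # mode 1: copy the left column rightward
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":          # mode 2: rounded mean of the 8 neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"mode not covered by this sketch: {mode}")


top, left = [10, 20, 30, 40], [1, 2, 3, 4]
print(intra4x4("dc", top, left)[0])  # [14, 14, 14, 14]
```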

Next, the configuration and operation of the inter prediction unit 202 are described with reference to FIG. 3. The inter prediction unit 202 comprises a motion compensation unit 301, a moving region separation prediction unit 302, and a background image generation unit 303, to which the reference image signal 116 is input. For moving region separation prediction, the moving region separation prediction unit 302 receives the moving region separation mask 115, the reference image signal 116, the motion vector 207, and the background image signal 306. A prediction separation switch 305 selects between the motion compensation unit 301 and the moving region separation prediction unit 302, and is controlled by a prediction switching unit 304.

The inter prediction unit 202 configured as above performs interpolation based on the reference image signal 116 and on the motion vector 207 of the prediction target block, calculated by the motion vector estimation unit 203 in FIG. 2, to generate the prediction image signal 117. FIG. 5 shows an example of motion-compensated inter prediction. In inter prediction, interpolation is performed using the reference image signals 116 stored in the reference image memory 105, and the prediction image signal 117 is generated from the displacement between the interpolated image and the pixel block at the same position in the original image signal. Interpolation at, for example, 1/2-pixel or 1/4-pixel accuracy generates the interpolated pixel values by filtering the reference image signal 116. In H.264, for example, which supports interpolation of the luminance signal up to 1/4-pixel accuracy, the displacement is expressed in units of 1/4 pixel (four times integer-pixel precision). This displacement is called a motion vector.
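For luminance, H.264 derives half-sample values with the 6-tap filter (1, -5, 20, 20, -5, 1)/32 and quarter-sample values by averaging the nearest integer and half-sample values with upward rounding. The one-dimensional (horizontal-only) sketch below assumes edge samples are replicated outside the row; the 2-D half- and quarter-sample positions are omitted, and the function names are illustrative.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def half_sample(row, x):
    """Half-sample value between row[x] and row[x + 1]; edge samples are replicated."""
    acc = sum(t * row[clamp(x - 2 + k, 0, len(row) - 1)] for k, t in enumerate(TAPS))
    return clamp((acc + 16) >> 5, 0, 255)


def sample_qpel(row, x, mv):
    """Sample at integer position x displaced by mv, where mv is in quarter-sample units."""
    pos = 4 * x + mv
    ix, frac = pos >> 2, pos & 3  # integer part and quarter-sample phase
    full = row[clamp(ix, 0, len(row) - 1)]
    if frac == 0:
        return full
    half = half_sample(row, ix)
    if frac == 2:
        return half
    if frac == 1:                 # average with the integer sample on the left
        return (full + half + 1) >> 1
    right = row[clamp(ix + 1, 0, len(row) - 1)]
    return (half + right + 1) >> 1  # frac == 3: average with the sample on the right


row = [10, 10, 10, 50, 50, 50]
print([sample_qpel(row, 2, mv) for mv in range(5)])  # [10, 20, 30, 40, 50]
```

Splitting `pos` with `>> 2` and `& 3` is why a motion vector stored in quarter-sample units is "four times" the integer displacement.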

In inter prediction, the block size best suited to the current prediction target block can be selected from several prediction block sizes. FIG. 6A shows the motion compensation block sizes at the macroblock level, and FIG. 6B shows those at the sub-block level (8×8 pixel blocks and smaller). Because a motion vector can be obtained for each of these prediction block sizes, the prediction block shape and motion vector best suited to the local characteristics of the input image signal 110 can be used. The reference image signal for which a motion vector was calculated is indicated by Ref_idx, which can be changed at a granularity as fine as one 8×8 pixel block.

Next, the motion vector estimation unit 203 is described. It calculates a motion vector 207 suited to the prediction target block from the input image signal 110 and the reference image signal 116, by block matching between the prediction target block of the input image signal 110 and the interpolated image of the reference image signal 116. The matching criterion is the per-pixel difference between the input image signal 110 and the matched interpolated image, accumulated over the block. Besides this criterion, the optimal motion vector 207 may be determined using a transformed version of the difference between the prediction image and the original image, by taking the magnitude of the motion vector or its code amount into account, or by using equations (1) and (2) described later. The matching may be a full search over the range given by search range information supplied from outside the encoding apparatus, or it may be performed hierarchically at successive pixel accuracies.
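A minimal full-search block-matching sketch that uses the sum of absolute differences (SAD) as the accumulated matching criterion. It searches integer positions only, within a hypothetical `search_range`, and ignores the motion-vector cost and sub-pel refinement mentioned above.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Integer-pel full search; returns ((dx, dy), sad) of the best candidate."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would read outside the reference picture.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        ref[y][x] = 9          # 2x2 object in the reference picture
        cur[y + 1][x + 1] = 9  # the same object moved by (+1, +1)
print(full_search(cur, ref, bx=2, by=2, n=2, search_range=2))  # ((-1, -1), 0)
```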

The motion vectors 207 calculated in this way for a plurality of reference image signals (locally decoded image signals at different times) are input to the inter prediction unit 202 and used to generate the prediction image signal 117. Each calculated motion vector 207 is held by the encoding control unit 107 as prediction information 119, together with prediction-related information such as the corresponding pixel block shape, and is passed to the code string encoding unit 108, where it is entropy-coded and multiplexed into the encoded data.

Next, the mode determination unit 205 is outlined. According to information on the slice currently being encoded, it outputs switch switching information 206 to the mode determination switch 204. The switch switching information 206 specifies whether the switch is connected to the output of the intra prediction unit 201 or to the output of the inter prediction unit 202.

Next, the function of the mode determination unit 205 is described. If the slice currently being encoded is an intra-coded slice, the mode determination unit 205 connects the output of the mode determination switch 204 to the intra prediction unit 201. If it is an inter-coded slice, the mode determination unit 205 decides whether to connect the mode determination switch 204 to the output of the intra prediction unit 201 or to that of the inter prediction unit 202.

More specifically, in the latter case the mode determination unit 205 performs mode determination using a cost such as equation (1) below. Let OH be the code amount of the prediction information 119 required when a prediction mode is selected (for example, the code amount of the motion vector or of the block shape), and let SAD be the sum of absolute differences between the input image signal 110 and the prediction image signal 117 (that is, the sum of the absolute values of the prediction error signal 111). The following mode determination equation is then used:

$$K = \mathrm{SAD} + \lambda \times \mathrm{OH} \qquad (1)$$

Here, K is the cost and λ is a constant: a Lagrange multiplier determined from the quantization scale or the value of the quantization parameter. Mode determination is based on the cost K obtained in this way; the mode that gives the smallest K is selected as the optimal prediction mode.
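Equation (1) amounts to evaluating each candidate mode and keeping the minimum. The SAD and OH values below are invented for illustration, and λ is passed in directly rather than derived from the quantization parameter.

```python
def mode_cost(sad, oh_bits, lam):
    """Equation (1): K = SAD + lambda * OH."""
    return sad + lam * oh_bits


def choose_mode(candidates, lam):
    """candidates maps mode name -> (SAD, OH); return the mode with the smallest K."""
    return min(candidates, key=lambda mode: mode_cost(*candidates[mode], lam))


# Hypothetical numbers for one macroblock.
candidates = {
    "intra16x16": (900, 4),
    "inter16x16": (600, 20),
    "inter8x8": (520, 60),
}
print(choose_mode(candidates, lam=5))  # inter16x16 (K = 920, 700, 820)
print(choose_mode(candidates, lam=1))  # inter8x8   (K = 904, 620, 580)
```

A larger λ penalizes side information more heavily, so the mode with fewer overhead bits wins; a smaller λ favours the mode with the lower SAD.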

Instead of equation (1), the mode determination unit 205 may use (a) only the prediction information 119 or (b) only the SAD; in case (b), a value obtained by applying a Hadamard transform before summing the absolute differences, or an approximation of such a value, may also be used. The mode determination unit 205 may also build the cost from the activity (variance of the signal values) of the input image signal 110, or build a cost function from the quantization scale or quantization parameter.
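The Hadamard-based alternative is commonly called SATD (sum of absolute transformed differences). A minimal, unscaled sketch for a 4×4 difference block (some encoders halve the result); the example shows that SAD and SATD can rank residuals in opposite order.

```python
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]  # symmetric 4x4 Hadamard matrix


def satd4x4(diff):
    """Sum of |H * D * H^T| over all 16 coefficients (no normalization)."""
    hd = [[sum(H4[i][k] * diff[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    t = [[sum(hd[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
    return sum(abs(v) for row in t for v in row)


def sad4x4(diff):
    return sum(abs(v) for row in diff for v in row)


flat = [[1] * 4 for _ in range(4)]  # smooth residual
spike = [[4 if (x, y) == (0, 0) else 0 for x in range(4)] for y in range(4)]  # isolated error
print(sad4x4(flat), satd4x4(flat))    # 16 16
print(sad4x4(spike), satd4x4(spike))  # 4 64
```

The flat residual maps to a single DC coefficient, while the isolated error spreads over all 16 transform coefficients, so SATD reflects transform-domain coding cost that SAD does not.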

As yet another example, a provisional encoding unit may be provided, and mode determination may be performed using both the code amount obtained when the prediction error signal 111 generated in a given prediction mode is actually encoded by the provisional encoding unit and the squared error between the input image signal 110 and the decoded image signal 114. The mode determination formula in this case is as follows.

Here, J is the encoding cost and D is the encoding distortion, representing the squared error between the input image signal 110 and the decoded image signal 114. R is the code amount estimated by provisional encoding.

Using the encoding cost J of equation (2) requires provisional encoding and local decoding for every prediction mode, which increases the circuit scale or the amount of computation. On the other hand, because a more accurate code amount and encoding distortion are used, high encoding efficiency can be maintained. Instead of equation (2), the cost may be computed from R alone or D alone, or the cost function may be built from approximations of R or D.
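Assuming equation (2) takes the usual rate-distortion form J = D + λ·R, with D the sum of squared errors between the input and decoded blocks, the decision can be sketched as below. The provisional encoder itself is not modeled: each candidate mode is assumed to have already been provisionally encoded and locally decoded, yielding a decoded block and a bit count R. All names are hypothetical.

```python
def ssd(block_a, block_b):
    """Sum of squared differences (the distortion D)."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def select_mode_rd(input_block, trials, lam):
    """Return the mode minimizing J = D + lam * R.

    trials maps a mode name to (decoded_block, rate_bits), i.e. the result
    of provisionally encoding and locally decoding the block in that mode.
    """
    def cost(mode):
        decoded_block, rate_bits = trials[mode]
        return ssd(input_block, decoded_block) + lam * rate_bits

    return min(trials, key=cost)
```

The extra work per mode (a full encode and local decode to obtain each `trials` entry) is exactly the circuit-scale and computation cost the paragraph above refers to.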

In this way, it is determined whether to select the predicted image signal generated by the intra prediction unit 201 or that generated by the inter prediction unit 202, and the output terminal of the mode determination switch 204 is switched accordingly. The predicted image signal 117 of the selected prediction mode is output from the prediction unit 106 to both the subtractor 101 and the adder 104.

Next, the inter prediction unit 202 is described in more detail. FIG. 3 is a block diagram of the inter prediction unit 202. As described above, the inter prediction unit 202 comprises a motion compensation unit 301, a moving region separation prediction unit 302, a background image generation unit 303, a prediction switching unit 304, and a prediction separation switch 305.

The reference image signal 116 output from the reference image memory 105 is input to the prediction unit 106 and passed on to the inter prediction unit 202. At the same time, the motion vector 207 estimated by the motion vector estimation unit 203 is input. The motion compensation unit 301 first uses the motion vector 207 and equation (3) below to determine, from the position of the pixel block to be predicted, the position referenced by the motion vector 207. As noted above, H.264 quarter-pixel interpolation is used as the example here. That is, when a component of the motion vector is a multiple of 4, it points to an integer pixel position; otherwise, the predicted position corresponds to a fractional-precision interpolation position.

Here, (x, y) are the horizontal and vertical indices of the top-left position of the block to be predicted, (x_pos, y_pos) is the corresponding predicted position in the reference image signal, and (mv_x, mv_y) is the motion vector with quarter-pixel precision. Predicted pixels are then generated for the derived positions by copying or interpolating the corresponding pixels of the reference image signal 116.
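A minimal sketch of this position derivation, assuming equation (3) follows the H.264 convention of splitting each quarter-pel component into an integer displacement (arithmetic right shift by 2) and a fractional phase (the two low bits). The function name and return layout are hypothetical.

```python
def reference_position(x, y, mv_x, mv_y):
    """Split a quarter-pel MV into integer position and fractional phase.

    Returns ((x_int, y_int), (frac_x, frac_y)), with frac values in quarter
    pixels (0..3). A phase of (0, 0) means an integer pixel position, which
    happens exactly when both MV components are multiples of 4.
    """
    x_int = x + (mv_x >> 2)   # arithmetic shift = floor division by 4
    y_int = y + (mv_y >> 2)
    frac_x = mv_x & 3         # low two bits give 0..3, also for negative MVs
    frac_y = mv_y & 3
    return (x_int, y_int), (frac_x, frac_y)
```

For example, a horizontal component of -5 (i.e. -1.25 pixels) yields integer offset -2 and phase 3, since -2 + 3/4 = -1.25.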

FIG. 7 shows an example of H.264 predicted pixel generation. Squares labeled with capital letters (hatched) are pixels at integer positions, shaded squares are interpolated pixels at half-pixel positions, and white squares are interpolated pixels at quarter-pixel positions. For example, the half-pixel interpolation for the positions labeled b and h is computed by equation (4) below.

The quarter-pixel interpolation for the positions labeled a and d is computed by equation (5) below.

Thus, half-pixel interpolated pixels are generated with a 6-tap FIR filter (tap coefficients (1, -5, 20, 20, -5, 1)/32), and quarter-pixel interpolated pixels are computed with a 2-tap averaging filter (tap coefficients (1/2, 1/2)). The half-pixel sample j, which lies midway between four integer pixel positions, is generated by applying the 6-tap filter in both the vertical and horizontal directions. Interpolated values at the remaining positions are generated by the same rules. This completes the example of predicted image signal generation in the motion compensation unit 301.
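The interpolation rules above can be sketched as follows for 8-bit samples. The arithmetic follows the H.264/AVC luma interpolation (6-tap half-pel sum with rounding offset 16 and shift 5, e.g. b = Clip1((E - 5F + 20G + 20H - 5I + J + 16) >> 5); full-precision intermediate taps for j with offset 512 and shift 10; rounded averaging such as a = (G + b + 1) >> 1 for quarter-pel). It is offered as a plausible reading of equations (4) and (5); the function names are hypothetical, and sample names follow FIG. 7.

```python
TAPS = (1, -5, 20, 20, -5, 1)


def clip1(v, bit_depth=8):
    """Clamp to the valid sample range [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, v))


def six_tap(samples):
    """Unnormalized 6-tap sum, e.g. E - 5F + 20G + 20H - 5I + J."""
    return sum(t * s for t, s in zip(TAPS, samples))


def half_pel(samples):
    """Half-pel sample between samples[2] and samples[3] (b or h in FIG. 7).

    samples: six consecutive integer pixels along a row (E..J) or a
    column (A, C, G, M, R, T).
    """
    return clip1((six_tap(samples) + 16) >> 5)


def quarter_pel(p, q):
    """Rounded average of two neighbouring samples, e.g. a = (G + b + 1) >> 1."""
    return (p + q + 1) >> 1


def center_half_pel(window):
    """Half-pel sample j from a 6x6 window of integer pixels whose centre
    lies between window[2][2], window[2][3], window[3][2], window[3][3]."""
    # Vertical 6-tap per column, kept at full precision (no rounding/clipping).
    columns = [six_tap([window[r][c] for r in range(6)]) for c in range(6)]
    return clip1((six_tap(columns) + 512) >> 10)
```

Keeping the intermediate vertical sums unrounded before the horizontal pass is what makes j identical regardless of which direction is filtered first.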

Next, the background image generation unit 303 is described. The background image generation unit 303 generates the background image signal 306 and the moving region separation mask 115 from the input reference image signal 116, and also serves as a memory holding the generated background image signal 306. Generation of the moving region separation mask 115 is described first. One moving region separation mask 115 exists for each decoded image signal 114 decoded at each time instant provided by the reference image signal 116. The moving region separation mask 115 is a binary mask map: for each pixel of the decoded image signal 114 decoded at a given time, a pixel whose temporal luminance change (difference value) relative to previously decoded image signals 114 is smaller than a predetermined threshold TH is classified as a background pixel, and a pixel whose luminance change exceeds TH is classified as a moving pixel.

When a plurality of reference image signals 116 are available, difference values are computed for all pixels at the same spatial position along the time direction, a representative value (described below) is determined from them, and the representative value is compared with the threshold to decide whether the pixel is a background pixel or a moving pixel.

Here, LD denotes the locally decoded image signal, and s is an index representing displacement in the time direction, where s = 0 denotes the image to be predicted itself; s corresponds, for example, to the reference image index. FIG. 8 shows the correspondence between reference pixels and target pixels when difference values are computed over a plurality of reference image signals. w is a weight that depends on temporal distance. For example, temporal correlation can be taken into account by giving large weights to decoded image signals that are temporally close and small weights to those that are temporally distant. FIG. 9 shows an example in which the weight w varies with temporal distance from the pixel block to be predicted.
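A minimal sketch of the mask generation, assuming the representative value is a temporally weighted sum of absolute differences, Σ_s w_s·|LD_0(x, y) - LD_s(x, y)| over s ≥ 1, compared against TH. Pictures are plain lists of rows; the function name, the weighting form, and the treatment of a value exactly equal to TH (classified as background here) are assumptions.

```python
def motion_mask(frames, weights, th):
    """Binary mask for frames[0]: 1 = moving pixel, 0 = background pixel.

    frames[s] is the decoded picture at temporal displacement s (s = 0 is
    the picture the mask belongs to); weights[s - 1] is the weight w for
    frames[s], typically decreasing with temporal distance.
    """
    target = frames[0]
    height, width = len(target), len(target[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Weighted absolute temporal differences at the same position.
            diff = sum(w * abs(target[y][x] - ref[y][x])
                       for w, ref in zip(weights, frames[1:]))
            mask[y][x] = 1 if diff > th else 0
    return mask
```

With weights such as [0.75, 0.25], a change relative to the nearest reference picture counts three times as much as the same change relative to the older one.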

The above example classifies pixels using the difference values alone. Alternatively, the representative value may be determined from the absolute sum, maximum, mean, median, or variance of the pixel difference values across the available decoded image signals (time direction), or from the same statistics computed over the pixels spatially adjacent to the pixel being classified in the decoded image signal (spatial direction).
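The alternative representative values can be sketched with Python's standard `statistics` module. Which statistic operates on signed versus absolute differences is not specified in the text; the choice below (magnitudes for sum, max, mean and median, signed values for variance) and the method names are assumptions.

```python
import statistics


def representative(diffs, method):
    """Collapse a pixel's difference values into one representative value.

    diffs are signed difference values (temporal or spatial neighbours).
    """
    mags = [abs(d) for d in diffs]
    if method == "abs_sum":
        return sum(mags)
    if method == "max":
        return max(mags)
    if method == "mean":
        return statistics.mean(mags)
    if method == "median":
        return statistics.median(mags)
    if method == "variance":
        return statistics.pvariance(diffs)  # population variance
    raise ValueError(f"unknown method: {method}")


def is_moving(diffs, method, th):
    """1 (moving pixel) if the representative value exceeds TH, else 0."""
    return 1 if representative(diffs, method) > th else 0
```

The same difference list can be classified differently depending on the statistic: a single large outlier trips "max" but may not trip "mean" or "median".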

The moving region separation mask, once generated, may also be corrected. For example, for a pixel of the generated mask that is to be corrected, the mask values of the four adjacent points (above, below, left, and right), or of the nine points including the diagonal directions, may be used to correct regions that form isolated points, or the mask values at block boundaries may be adjusted to match the prediction block shape. An example is given by equation (8) below.

Here, (i, j) is the index of a pixel adjacent to the target pixel, with (i, j) = (0, 0) denoting the pixel being corrected. FIG. 10 shows the relationship between the target pixel and its neighbors; a denser circle indicates a greater distance from the target pixel. v is a weight that depends on the positional relationship of the neighboring pixel. It is used to take spatial correlation into account, for example by giving large weights to the spatially close positions (i, j) = (0, 1), (1, 0), (0, -1), (-1, 0) and small weights to the more distant positions (i, j) = (1, 1), (1, -1), (-1, 1), (-1, -1).

FIG. 11 shows an example in which the weight v varies with the city-block distance in the spatial direction. If the computed Diff exceeds a predetermined threshold TV, the mask values of the neighboring pixels differ from that of the target pixel and the spatial correlation can be judged low, so the mask value of the target pixel is changed. If Diff is smaller than TV, the spatial correlation is high and the value is left unchanged. By setting the spatial weights v appropriately in this way, the generated moving region separation mask can be corrected, enabling removal of isolated points, connection of discontinuities, expansion or shrinking of regions to rectangular blocks, edge correction, pixel filling, pixel matching, and the like. Although this embodiment changes the weights according to the city-block distance, the distance may be computed using any one of the Minkowski distances, such as the city-block (Manhattan) distance.
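A minimal sketch of the correction, assuming Diff is a weighted sum of absolute mask differences over the 3×3 neighborhood, Σ v(i, j)·|M(x+i, y+j) - M(x, y)|, with v = 1/d where d is the city-block distance (1 for the four direct neighbors, 1/2 for diagonals, in the spirit of FIG. 11). The weight function, boundary handling (out-of-picture neighbors are ignored), and threshold used in the example are assumptions.

```python
def correct_mask(mask, tv):
    """Flip mask values that disagree strongly with their 3x3 neighbourhood.

    Diff = sum over the 8 neighbours of v(i, j) * |M(y+j, x+i) - M(y, x)|,
    with v = 1 / (|i| + |j|). If Diff > tv the pixel is flipped (0 <-> 1).
    """
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]  # read from `mask`, write to `out`
    for y in range(height):
        for x in range(width):
            diff = 0.0
            for j in (-1, 0, 1):
                for i in (-1, 0, 1):
                    if (i, j) == (0, 0):
                        continue
                    yy, xx = y + j, x + i
                    if 0 <= yy < height and 0 <= xx < width:
                        v = 1.0 / (abs(i) + abs(j))  # city-block weight
                        diff += v * abs(mask[yy][xx] - mask[y][x])
            if diff > tv:
                out[y][x] = 1 - mask[y][x]
    return out
```

With TV = 3, an isolated pixel surrounded by the opposite value scores 4·1 + 4·0.5 = 6 and is flipped, while a pixel on a straight vertical edge scores at most 2 and is kept, so isolated points are removed without eroding region boundaries.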

Next, generation of the background image signal 306 is described. The background image signal 306 collects only the background regions, whose luminance changes little in the time direction, and is derived pixel by pixel from the moving region separation mask 115 and the temporally nearest decoded image signal 114. The background image signal 306 is generated from the moving region separation mask 115 described above using equation (9) below.

Here, BG denotes the background image signal 306, and LD denotes the decoded image signal 114 temporally nearest to the frame being updated.

As the equation shows, when the background image signal 306 is updated at a given time, the temporally nearest decoded image signal 114 among the reference image signals 116 and its moving region separation mask 115 are used. Only where the mask value is 0 (background pixel) is the background updated, with a weighted sum of the nearest decoded image signal 114 and the background image signal 306 before the update; setting, for example, wt = 1/2 makes this weighted sum an averaging filter. Where the mask value is 1 (moving pixel), no update is performed. The background image signal 306 may be initialized with a predetermined value (for example, 0 or the maximum value (255 for 8 bits) for the luma signal, or the mid-level value (128 for 8 bits) for the chroma signals), or with the values of an I-slice, which is encoded using intra prediction only. The background image signal 306 is refreshed when a scene change occurs in the input image signal or when an IDR picture is inserted. This embodiment shows an example in which the background image signal 306 is always refreshed at an I-slice. Through this process, the background image signal 306 is updated at appropriate times.
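A minimal sketch of the initialization and update, assuming equation (9) has the form BG'(x, y) = wt·LD(x, y) + (1 - wt)·BG(x, y) where the mask is 0 and BG'(x, y) = BG(x, y) where it is 1. Which term carries wt is an assumption (it does not matter for the wt = 1/2 example), as are the round-half-up rule and the function names.

```python
def init_background(height, width, value=128):
    """Fill the background memory with a fixed value (e.g. at an I-slice)."""
    return [[value] * width for _ in range(height)]


def update_background(bg, ld, mask, wt=0.5):
    """Blend the nearest decoded picture LD into BG where mask == 0.

    Moving pixels (mask == 1) keep their previous background value.
    Blended values are rounded to the nearest integer, halves rounded up.
    """
    return [[int(wt * l + (1 - wt) * b + 0.5) if m == 0 else b
             for b, l, m in zip(bg_row, ld_row, m_row)]
            for bg_row, ld_row, m_row in zip(bg, ld, mask)]
```

Because moving pixels never overwrite the memory, a background area that is temporarily occluded keeps its last observed value and becomes available again once the object moves away.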

The background image signal 306 is held in the internal memory of the background image generation unit 303, and the updated signal is output to the moving region separation prediction unit 302. The generated moving region separation mask 115 is output from the inter prediction unit 202 and, via the prediction unit 106, stored in the reference image memory 105 as part of the reference image signal 116 together with the decoded image signal 114 of the same time.

Although the above example uses the temporally nearest decoded image signal, any one of the following methods may be used instead:
(1) filling with the pixel values of the nearest reference image available in display order;
(2) filling with the pixel values of the nearest reference image available in encoding order;
(3) filling with the pixel values of the reference image temporally nearest to the image to be encoded next;
(4) filling with pixels generated as a linear sum of the pixels stored in the background image memory and the pixels of the nearest reference image available in display order;
(5) filling with pixels generated as a linear sum of the pixels stored in the background image memory and the pixels of the nearest reference image available in encoding order;
(6) filling with pixels generated as a linear sum of the pixels stored in the background image memory and the pixels of the reference image temporally nearest to the image to be encoded next.

Next, the moving region separation prediction unit 302 is described. The moving region separation prediction unit 302 receives the motion vector 207 output from the motion vector estimation unit 203, the reference image signal 116 output from the reference image memory 105, and the background image signal 306 output from the background image generation unit 303. Using the input moving region separation mask 115, it applies motion compensation to the moving region and fills the background region with the background image signal, then combines the signals predicted by these separate methods. The input motion vector 207 is also used to perform matching on the moving region separation mask 115; that is, the derivation of the interpolation position from the motion vector, described for the motion compensation unit 301, is also applied to the mask. Because the mask has integer-pixel precision only, a fractional-precision motion vector is mapped to integer-pixel precision. For quarter-pixel motion compensation, the mapping to integer pixel positions is given by equation (10) below.

Here, (mv_x, mv_y) are the horizontal and vertical components of the quarter-pixel-precision motion vector, and (imv_x, imv_y) are the horizontal and vertical components of the integer-pixel-precision motion vector. Moving region separation prediction is then performed with the derived integer-precision motion vector as in equation (11) below.
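Equation (10) is given in the patent as the mapping from quarter-pel to integer-pel precision; one plausible choice, assumed in the sketch below, is rounding to the nearest integer pixel with halves rounded up. Plain truncation (`mv >> 2`) would be an equally simple alternative. The function name is hypothetical.

```python
def to_integer_mv(mv_x, mv_y):
    """Map a quarter-pel MV (mv_x, mv_y) to integer-pel (imv_x, imv_y).

    (mv + 2) >> 2 rounds mv / 4 to the nearest integer, halves towards
    +infinity; Python's >> is an arithmetic (floor) shift, so negative
    components are handled consistently.
    """
    return (mv_x + 2) >> 2, (mv_y + 2) >> 2
```

For instance, a component of -5 (-1.25 pixels) maps to -1 and a component of 6 (1.5 pixels) maps to 2.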

Here, P denotes the predicted image signal generated by moving region separation prediction. MC is the predicted image signal generated by the motion-compensated prediction performed in the motion compensation unit 301; it has already been described in detail and is not repeated here. For example, the values of interpolated pixels such as a, b, and j and of integer pixels such as G, H, and M generated as in FIG. 7 form the predicted image signal MC. The motion vector 207 is applied to the decoded image signal 114 and the moving region separation mask 115 of the same time; by using ordinary motion-compensated prediction for the moving region and filling the background region with the background image signal 306, prediction accuracy can be improved regardless of the shape of the moving object. FIG. 12 shows examples of the decoded image signal 114, the moving region separation mask 115, and the background image signal 306 when four reference image signals are available in the time direction. The predicted image signal generated in this way is output from the moving region separation prediction unit 302, and the prediction information 119 used, such as the block shape and motion vector, is recorded in the encoding control unit 107.
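A per-pixel sketch of the combination, assuming equation (11) selects P = MC where the motion-displaced mask value is 1 and P = BG where it is 0. Reading the background at the co-located position (rather than the displaced one) is an assumption based on the background being static; the function name and argument layout are hypothetical, and positions are assumed to lie inside the picture.

```python
def separated_prediction(mc_block, background, mask, x0, y0, imv_x, imv_y):
    """Moving region separation prediction P for one block.

    mc_block   : motion-compensated prediction MC for the block (2-D list).
    background : full background picture BG.
    mask       : full moving-region mask of the reference picture (1 = moving).
    (x0, y0)   : top-left position of the block in the current picture.
    (imv_x, imv_y) : integer-precision motion vector.
    """
    pred = []
    for dy, mc_row in enumerate(mc_block):
        row = []
        for dx, mc in enumerate(mc_row):
            x, y = x0 + dx, y0 + dy
            if mask[y + imv_y][x + imv_x] == 1:
                row.append(mc)                # moving pixel: keep MC
            else:
                row.append(background[y][x])  # background pixel: fill with BG
        pred.append(row)
    return pred
```

The block therefore mixes both sources pixel by pixel, which is what lets the prediction follow an arbitrarily shaped object boundary inside a rectangular block.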

Next, the prediction switching unit 304 and the prediction separation switch 305 are described. The prediction switching unit 304 outputs prediction switching information 307 for controlling the prediction separation switch 305 on the basis of the input moving region separation mask 115. According to the prediction switching information 307, the prediction separation switch 305 connects its output terminal either to the motion compensation unit 301 or to the moving region separation prediction unit 302. More specifically, the proportion of moving pixels indicated by the mask within the pixel block to be predicted is calculated, and the prediction switching information 307 is updated according to whether this proportion is larger or smaller than a preset threshold TP. For example, if, of the 64 mask values in an 8×8 pixel block to be predicted, only 4 pixels are 0 and the remaining 60 are 1, more than 90% of the block is moving region, so the output terminal of the switch is connected to the motion compensation unit 301. In this way, the proportion of moving pixels in the block is calculated and the prediction unit to be connected is switched dynamically according to its value. FIG. 13 shows an example of switching with TP = 90%. The inter prediction method of the pixel block to be predicted (motion-compensated prediction or moving region separation prediction) is thus switched, and the predicted image signal 117 is output from the inter prediction unit 202.
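The switching rule can be sketched as a single decision per block. The strict ">" comparison, the string return values, and the function name are illustrative choices; the default TP = 0.9 mirrors the FIG. 13 example.

```python
def choose_inter_predictor(block_mask, tp=0.9):
    """Pick the inter predictor for one block from its mask values.

    block_mask : 2-D list of mask values for the block (1 = moving pixel).
    tp         : threshold TP on the proportion of moving pixels.
    """
    values = [v for row in block_mask for v in row]
    moving_ratio = sum(values) / len(values)
    if moving_ratio > tp:
        return "motion_compensation"   # block is almost entirely moving
    return "region_separation"         # mix MC and background per pixel
```

For the 8×8 example in the text, 60 moving pixels out of 64 give a ratio of 0.9375, which exceeds 0.9 and selects motion-compensated prediction.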

Next, the processing flow of the background image generation unit 303 in the inter prediction unit 202 is described with reference to FIG. 15. Generation of the moving region separation mask 115 and updating of the background image signal 306 by the background image generation unit 303 are performed after the encoding or local decoding of one frame or slice is completed, or immediately before encoding of the next frame or slice begins (S501). First, the background image generation unit 303 checks the slice type of the current coding slice (the next slice to be predicted). If the slice is an intra-coded slice (I-slice) (YES in S502), the background image signal 306 is initialized (S503). Otherwise (NO in S502), the moving region separation mask 115 is generated from the reference image signal 116 (S504), and the background image signal 306 is updated using the reference image signal 116, the generated mask 115, and so on (S505). The background image signal 306 is held in the internal memory of the background image generation unit 303. The generated moving region separation mask 115 is output (S506), and the background image signal 306 is output to the moving region separation prediction unit 302 (S507). It is then determined whether the slice belongs to the last frame to be encoded (S508). If not, the unit waits for the slice to be encoded and the process returns to S502; otherwise, the process ends (S509).

Next, the overall inter prediction flow in the inter prediction unit 202, omitting the detailed functions described above, is explained with reference to FIG. 16. When the motion vector 207, the reference image signal 116, and the background image signal 306 are input to the moving region separation prediction unit 302 (S601), the motion vector 207 is used to derive the predicted position of the corresponding decoded image signal 114 in the reference image signal 116 (S602). Next, an integer-precision motion vector is derived from the motion vector 207 and used to find the corresponding position in the moving region separation mask (S603). The proportion of moving pixels within the block to be predicted in the moving region separation mask 115 is then calculated (S604), and it is checked whether this proportion exceeds the preset threshold TP (S605). If it does (YES in S605), the pixel index idx is initialized to 0 (S613), motion-compensated prediction is performed for the pixel at idx (S614), and idx is incremented (S615). It is then determined whether idx has reached the end of the block to be predicted (S616); if not, motion-compensated prediction is performed for the pixel at the incremented idx (S614). If it has, the predicted image signal 117 is output (S617) and the process ends (S618).

If the determination in S605 is NO, idx is first initialized to 0. For the pixel at idx, the value at the corresponding position of the moving region separation mask is checked (S607). If the mask value indicates a moving pixel (YES in S607), motion-compensated prediction is performed for that pixel (S612). If it indicates a background pixel (NO in S607), the predicted position in the background image signal is derived (S608) and the pixel is filled with the background image signal at that position (S609). Next, idx is incremented (S610), and it is determined whether idx has reached the end of the block to be predicted (S611); if not, the mask value at the position corresponding to the incremented idx is checked again (S607). If it has, the predicted image signal 117 is output (S617) and the process ends (S618). In the flowchart, steps S604 and S605 are functions of the prediction switching unit 304, and steps S613 to S616 are functions of the motion compensation unit 301. Steps S602, S603, and S607 to S611 are mainly functions of the moving region separation prediction unit 302.

Next, the syntax structure used in the moving image encoding apparatus 100 is described. As shown in FIG. 23, the syntax consists mainly of three parts. The high-level syntax 1601 contains the syntax information of the layers above the slice, the slice-level syntax 1602 specifies the information required for each slice, and the macroblock-level syntax 1603 specifies the data required for each macroblock.

Each of these consists of more detailed syntax. The high-level syntax 1601 consists of sequence- and picture-level syntax such as the sequence parameter set syntax 1604 and the picture parameter set syntax 1605. The slice-level syntax 1602 includes the slice header syntax 1605, the slice data syntax 1606, and so on. The macroblock-level syntax 1603 includes the macroblock layer syntax 1607, the macroblock prediction syntax 1608, and so on.

FIG. 24 shows an example of the slice header syntax. The slice_motion_region_separation_flag shown in the figure is used for the prediction switching information 307 output from the prediction switching unit 304 in the inter prediction unit 202. When slice_motion_region_separation_flag is 0, the prediction switching unit 304 sets the prediction switching information 307 so that the output of the motion compensation unit 301 is always selected within the slice, and switches the prediction separation switch 305 accordingly. In other words, motion compensated prediction is always performed. When slice_motion_region_separation_flag is 1, motion compensated prediction and moving region separation prediction are switched dynamically within the slice, as described above, based on the moving region separation mask 115 output from the background image generation unit 303.

FIG. 25 shows an example of the macroblock layer syntax as an example of encoding parameters. mb_type in the table indicates macroblock type information, that is, whether the current macroblock is intra coded or inter coded and which block shape is used for prediction. coded_block_pattern indicates, for each 8×8 pixel block, whether transform coefficients are present; for example, a value of 0 means that the target block has no transform coefficients. mb_qp_delta carries information on the quantization parameter, expressed as the difference from the quantization parameter of the block encoded immediately before the target block. intra_pred_mode indicates the prediction mode, that is, the intra prediction method used. When inter prediction is selected, ref_idx_l0 and ref_idx_l1 give the indices of the reference images used to predict the target block. mv_l0 and mv_l1 carry motion vector information. transform_8x8_flag indicates whether the 8×8 transform is used for the target block.

Syntax elements not defined in the present invention may be inserted between the rows of the table, and descriptions of other conditional branches may also be included. The syntax table may be divided into several tables, or several tables may be merged. The same terms need not be used; they may be changed freely according to the application. Furthermore, each syntax element described in the macroblock layer syntax may be changed so that it is specified in the macroblock data syntax described later.

This concludes the description of the moving image encoding apparatus 100 according to the present invention.

(First Embodiment: Modification 1: Signaling of Switching Information)
The present embodiment described an example in which the prediction switching unit 304 dynamically switches between the two prediction methods of the inter prediction unit 202, namely the motion compensation unit 301 and the moving region separation prediction unit 302. An embodiment in which motion compensated prediction and moving region separation prediction are not switched dynamically is also possible. In this case, an index indicating which prediction method was used must be encoded. This index is described in the prediction switching information 307: the index for the selected predicted image signal 117 is written into the prediction switching information 307, which is held by the encoding control unit 107. When the predicted image signal 117 generated with that prediction method is encoded, the held prediction switching information 307 is loaded from the encoding control unit 107 as prediction information 119, input to the code string encoding unit 108, and encoded.

FIG. 14 shows an example of encoding, for each macroblock, an index indicating the prediction method used. When 90% or more of the pixels are moving pixels, motion compensated prediction is selected, and a macroblock occupied by background pixels uses moving region separation prediction. When the ratio of moving pixels to background pixels lies between the prescribed values THMIN and THMAX, an index indicating which prediction was used is encoded.

FIG. 26 shows an example of the macroblock layer syntax in the present embodiment. The mb_motion_region_separation_flag shown in the figure is used for the prediction switching information 307 output from the prediction switching unit 304 in the inter prediction unit 202. When mb_motion_region_separation_flag is 0, the prediction switching unit 304 sets the prediction switching information 307 so that the output of the motion compensation unit 301 is always selected for the macroblock, and switches the prediction separation switch 305 accordingly. In other words, motion compensated prediction is always performed. When mb_motion_region_separation_flag is 1, the prediction switching unit 304 sets the prediction switching information 307 so that the output of the moving region separation prediction unit 302 is always selected for the macroblock, and switches the prediction separation switch 305 accordingly. In other words, moving region separation prediction is always performed. SignalingFlag is an internal parameter that determines whether mb_motion_region_separation_flag is encoded. SignalingFlag equal to 1 means that the ratio of moving pixels lies between the prescribed values THMIN and THMAX; SignalingFlag equal to 0 means that it does not.
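The decision described for FIGS. 14 and 26 can be sketched as below. The text fixes only the 90% figure. The value TH_MIN = 0.1, the strict inequalities, and the rule that a macroblock outside the signalled range implies the flag value (0 at or above TH_MAX, 1 at or below TH_MIN) are assumptions made for illustration. The function names are hypothetical.

```python
TH_MAX = 0.9  # from the text: 90% or more moving pixels -> motion compensated prediction
TH_MIN = 0.1  # assumed value, not given in the text

def signaling_flag(mask_block):
    """Return (SignalingFlag, implied mb_motion_region_separation_flag or None).

    mask_block: rows of 0/1 values for one macroblock (1 = moving pixel).
    """
    pixels = [v for row in mask_block for v in row]
    ratio = sum(pixels) / len(pixels)
    if ratio >= TH_MAX:
        return 0, 0      # not signalled; flag inferred as 0 (motion compensated prediction)
    if ratio <= TH_MIN:
        return 0, 1      # not signalled; flag inferred as 1 (moving region separation)
    return 1, None       # signalled; the encoder chooses and writes the flag

def read_mb_flag(mask_block, read_bit):
    """Decoder side: parse the flag only when SignalingFlag == 1."""
    flag_needed, implied = signaling_flag(mask_block)
    return read_bit() if flag_needed else implied
```

Because the decoder can derive the mask from already decoded pictures, it can compute SignalingFlag itself. No bit is spent on macroblocks whose choice is implied.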

(First Embodiment: Modification 2: Reuse of the Predicted Image Signal)
In the present embodiment, the motion compensation unit 301 and the moving region separation prediction unit 302 are described as separate prediction methods. However, as the flowchart of FIG. 16 shows, the moving region separation prediction unit 302 internally uses the same prediction method as the motion compensation unit 301. To avoid the extra computation of performing the same process more than once, the predicted image signal 117 calculated by the motion compensation unit 301 may be input to the moving region separation prediction unit 302, as shown in FIG. 17. Alternatively, the function of the motion compensation unit 301 may be integrated into the moving region separation prediction unit 302.

(First Embodiment: Modification 3: Removal of the Switching Structure)
In the present embodiment, the motion compensation unit 301 and the moving region separation prediction unit 302 are described as separate prediction methods. Alternatively, the prediction method may be unified into moving region separation prediction 302 alone, and the prediction switching unit 304 removed. FIG. 18 shows an embodiment in which the motion compensation unit 301, the prediction switching unit 304, and the prediction separation switch 305 are removed. Because the prediction structure is simpler, an increase in hardware scale can be avoided.

(Second Embodiment: Global MC)
In the present embodiment, the structure of the moving image encoding apparatus 100 is the same as in FIG. 2, so its description is omitted. However, because the function of the prediction unit 106 differs, a prediction unit 701 is provided instead. FIG. 19 shows the structure of the prediction unit 701 in the second embodiment. Components having the same functions as those already described are given the same reference numerals, and their description is omitted. Because its function differs, the inter prediction unit is given the reference numeral 801 instead of 202 as in FIG. 2.

The prediction unit 701 includes a global vector estimation unit 802 in addition to the inter prediction unit 801. For each encoded frame, encoded slice, or macroblock, the global vector estimation unit 802 calculates a vector (global MV (motion vector) 803) that represents the change across the whole picture caused by motion of the imaging system, such as the camera. This embodiment describes a translation model as the framework for obtaining whole-picture motion. A model based on an affine, similarity, or projective transformation may also be used as the motion model. The translation model can handle camera panning and tilting, whereas an affine transformation model can also handle zooming in and out. This embodiment also describes the case where the global MV has integer-pixel precision, but, as described above, the extension to fractional precision is straightforward.
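To make the difference between the motion models concrete, the sketch below maps a pixel coordinate under two models. The first is the two-parameter translation model used in this embodiment (pan and tilt). The second is a six-parameter affine model, which can also express zoom. The affine parameter layout is an assumption for illustration; the document does not define one.

```python
def translate(x, y, params):
    """Translation model: params = (gmv_x, gmv_y)."""
    gmv_x, gmv_y = params
    return x + gmv_x, y + gmv_y

def affine(x, y, params):
    """Six-parameter affine model (assumed layout):
    x' = a*x + b*y + c,  y' = d*x + e*y + f
    """
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f
```

The translation model is the special case `(1, 0, gmv_x, 0, 1, gmv_y)` of the affine model. A zoom by a factor of 2 about the origin corresponds to `(2, 0, 0, 0, 2, 0)`.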

The basic vector estimation function of the global vector estimation unit 802 is the same as that of the motion vector estimation unit 203 already described. The unit adds a function that integrates the local motion vectors calculated for each region, such as a block, into the global MV 803. For example, a motion vector is calculated for each 4×4 pixel block in the picture, and a histogram of these motion vectors is built. A local motion vector calculated for a small block may not follow the camera motion because of moving objects in the picture. Therefore, to obtain a global motion vector, the most frequent motion vector in the histogram is set as the global MV 803. The global MV 803 calculated by the global vector estimation unit 802 is input to the inter prediction unit 801.
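The histogram-mode estimate can be sketched as follows. It assumes integer-pel full-search block matching with a SAD criterion over a small search window. The text says only that the basic estimation is the same as in the motion vector estimation unit 203, so the matching criterion, the search range, and the function names are assumptions. The vector convention is cur(x, y) ≈ ref(x + dx, y + dy).

```python
from collections import Counter
import numpy as np

def block_mv(cur, ref, bx, by, bs, search_range):
    """Full-search integer MV (dx, dy) for one bs x bs block, minimising SAD."""
    h, w = cur.shape
    block = cur[by:by + bs, bx:bx + bs].astype(np.int64)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + bs > h or x0 + bs > w:
                continue  # skip candidates that fall outside the reference picture
            sad = np.abs(block - ref[y0:y0 + bs, x0:x0 + bs]).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv

def estimate_global_mv(cur, ref, bs=4, search_range=2):
    """Global MV = most frequent local MV over all bs x bs blocks (histogram mode)."""
    h, w = cur.shape
    histogram = Counter(block_mv(cur, ref, bx, by, bs, search_range)
                        for by in range(0, h - bs + 1, bs)
                        for bx in range(0, w - bs + 1, bs))
    return histogram.most_common(1)[0][0]
```

Blocks covering a moving object vote for their own vectors. As long as the background covers most blocks, however, the mode follows the camera motion.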

Next, the inter prediction unit 801 will be described. FIG. 20 is a block diagram of the inter prediction unit 801. It is the same as FIG. 1 except that the global MV 803 is input to the counterparts of the background image generation unit 303 and the moving region separation prediction unit 302 of the first embodiment. These counterparts, the background image signal generation unit 901 and the moving region separation prediction unit 902, perform different processing.

First, the background image generation unit 901 will be described. The background image generation unit 901 receives the reference image signal 116 output from the reference image memory 105 and the global MV 803. By using the global MV 803, the background image generation unit 901 can generate the background image signal 306 even for video in which the camera is moving. The method of generating the moving region separation mask 115 is described first. The moving region separation mask 115 is calculated from the reference image signal 116 and the global MV 803 by equation (12).

Here, (gmv_x, gmv_y) are the horizontal and vertical components of the global MV 803. MCLD denotes the decoded image signal after motion compensation. When the global MV 803 has fractional precision, the motion compensation process described for the motion compensation unit 301 is applied. For example, with 1/4-pixel precision, (gmv_x, gmv_y) in the equation is replaced by (gmv_x/4, gmv_y/4). When the global MV 803 has integer precision, MCLD in equation (12) is replaced by LD.

The method described in the first embodiment can be applied as the criterion for determining the representative value of the differences. As in the first embodiment, a moving region separation mask that has already been generated may also be corrected.
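Equation (12) itself is not reproduced in this text. The sketch below therefore assumes one plausible form: a pixel is marked as moving when the absolute difference between the current decoded picture LD and the previous decoded picture displaced by an integer global MV (MCLD) exceeds a threshold. This form uses only two pictures and a plain threshold. The threshold, the border clamping, and the function names are assumptions.

```python
import numpy as np

def shift_with_clamp(img, gx, gy):
    """Return S with S[y, x] = img[y + gy, x + gx], clamping coordinates at the borders."""
    h, w = img.shape
    ys = np.clip(np.arange(h) + gy, 0, h - 1)
    xs = np.clip(np.arange(w) + gx, 0, w - 1)
    return img[np.ix_(ys, xs)]

def moving_region_mask(ld_cur, ld_prev, gmv, threshold):
    """1 = moving pixel, 0 = background pixel (assumed thresholded absolute difference)."""
    mc_prev = shift_with_clamp(ld_prev.astype(np.int64), *gmv)  # integer-precision MCLD
    diff = np.abs(ld_cur.astype(np.int64) - mc_prev)
    return (diff > threshold).astype(np.uint8)
```

Compensating the camera motion first keeps a panning background from being flagged as moving. Only pixels that change after alignment survive the threshold.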

Next, the generation of the background image signal 306 will be described. The background image signal 306 is derived by equation (13) using the moving region separation mask 115, the decoded image signal 114, and the global MV 803 described above.

Here, MCBG denotes the background image signal 306 after motion compensation with the global MV 803. As the equation shows, the update of the background image signal 306 at the current time uses two inputs: the decoded image signal 114 that is temporally closest among the reference image signals 116, and the moving region separation mask 115. Where the mask value is 0 (background pixel), the background is updated with a weighted sum of two terms: the nearest decoded image signal 114, and the pre-update background image signal 306 motion-compensated with the global MV 803.
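Equation (13) is not reproduced in this text, so the sketch below assumes a simple form of the weighted-sum update. Background pixels become w·LD + (1 − w)·MCBG. Moving pixels keep MCBG unchanged, since the text does not say what happens to them. The weight w = 0.25, the border clamping, and the names are assumptions.

```python
import numpy as np

def update_background(bg, ld, mask, gmv, w=0.25):
    """Hypothetical background update with an integer global MV.

    bg: background picture before the update; ld: nearest decoded picture
    mask: 1 = moving pixel, 0 = background pixel; gmv = (gmv_x, gmv_y)
    """
    height, width = bg.shape
    gx, gy = gmv
    ys = np.clip(np.arange(height) + gy, 0, height - 1)
    xs = np.clip(np.arange(width) + gx, 0, width - 1)
    mcbg = bg[np.ix_(ys, xs)].astype(np.float64)  # MCBG: background aligned by the GMV
    blended = w * ld + (1.0 - w) * mcbg           # weighted sum for background pixels
    return np.where(mask == 0, blended, mcbg)     # moving pixels: keep MCBG (assumption)
```

A small w makes the background a slow running average. Brief occlusions then leave little trace, while a static background converges toward the decoded pictures.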

Next, the moving region separation prediction unit 902 will be described. The moving region separation prediction unit 902 receives four inputs: the motion vector 207 output from the motion vector estimation unit 203, the reference image signal 116 output from the reference image memory 105, the background image signal 306 output from the background image signal generation unit 901, and the global MV 803. Using the input moving region separation mask 115, it applies motion compensation to the moving region and motion compensation with the global MV 803 to the background region. It then combines the signals predicted by these different methods. The input motion vector 207 is also used to match the moving region separation mask 115. That is, the derivation of the interpolation position from the motion vector, described for the motion compensation unit 301, is also applied to the mask. Because the mask has only integer-pixel precision, a fractional-precision motion vector is mapped to integer-pixel precision. For motion compensation with 1/4-pixel precision, the mapping to integer pixel positions is given by equation (11). Moving region separation prediction is then performed with the derived integer-precision motion vector by equation (14).

Here, P denotes the predicted image signal generated by moving region separation prediction. Ordinary motion compensated prediction is applied to the moving region. For the background region, the background image signal 306 is motion-compensated with the global MV 803. This improves prediction accuracy regardless of the shape of the moving object. The predicted image signal created in this way is output from the moving region separation prediction unit 902. The prediction information 119 used at this time, such as the block shape, the motion vector 207, and the global MV 803, is recorded in the encoding control unit 107, entropy coded, and finally multiplexed into the encoded data.
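The combination step can be sketched as follows. Equation (11) is not reproduced in this text, so the quarter-pel to integer mapping is assumed to be rounding to the nearest integer, written as `(v + 2) >> 2`. Following Modification 2, the moving-region prediction is taken as an already interpolated block from the motion compensation unit rather than recomputed. The global MV is assumed to be integer, and all names are hypothetical.

```python
import numpy as np

def qpel_to_int(v):
    """Map a 1/4-pel MV component to integer pel by rounding (assumed form of eq. (11))."""
    return (v + 2) >> 2

def global_mc_separation_predict(mc_block, mask, background, bx, by, mv_qpel, gmv):
    """Combine an interpolated MC block with a GMV-compensated background block.

    mc_block: prediction of the moving region (e.g. from the motion compensation unit)
    mask: full-picture moving region separation mask (1 = moving)
    mv_qpel: (mvx, mvy) in 1/4-pel units; gmv: integer global MV (gmv_x, gmv_y)
    """
    n_y, n_x = mc_block.shape
    ix, iy = qpel_to_int(mv_qpel[0]), qpel_to_int(mv_qpel[1])
    m = mask[by + iy: by + iy + n_y, bx + ix: bx + ix + n_x]          # mask matched by MV
    bg = background[by + gmv[1]: by + gmv[1] + n_y,
                    bx + gmv[0]: bx + gmv[0] + n_x]                   # background moved by GMV
    return np.where(m == 1, mc_block, bg)
```

Python's `>>` on negative integers is an arithmetic (floor) shift. Therefore `qpel_to_int(-3)` gives -1, the nearest integer to -0.75.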

FIG. 27 shows an example of the slice header syntax in the present embodiment. The slice_global_motion_flag shown in the figure indicates whether moving region separation prediction using the global MV 803 is performed. When slice_global_motion_flag is 0, the background image generation unit 901 and the moving region separation prediction unit 902 perform the same prediction as the background image generation unit 303 and the moving region separation prediction unit 302 of the first embodiment. In that case, the global MV 803 is neither transmitted nor used. When slice_global_motion_flag is 1, gmv_param is encoded NumOfGMP times, where NumOfGMP is the predetermined number of parameters of the global MV 803. Using this information, the background image generation unit 901 and the moving region separation prediction unit 902 generate the corresponding predicted image signal. This embodiment shows an example with NumOfGMP = 2, in which gmv_param[0] is the horizontal component and gmv_param[1] the vertical component of the motion vector. This information is calculated by the global vector estimation unit 802 and encoded by the code string encoding unit 108 as prediction information 119 supplied by the encoding control unit 107.
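The text does not give bitstream descriptors for these syntax elements. The sketch below assumes the flag is a single bit and each gmv_param is a signed Exp-Golomb code, as with se(v) in H.264, and uses a plain list of bits as the bitstream. The writer and reader functions are hypothetical.

```python
def ue(v):
    """Unsigned Exp-Golomb code (H.264 ue(v)) as a list of bits."""
    code = v + 1
    n = code.bit_length()
    return [0] * (n - 1) + [int(b) for b in bin(code)[2:]]

def se(v):
    """Signed Exp-Golomb (H.264 se(v)): v > 0 -> codeNum 2v - 1, v <= 0 -> -2v."""
    return ue(2 * v - 1 if v > 0 else -2 * v)

def write_slice_gm(flag, gmv_params):
    """Hypothetical writer: slice_global_motion_flag u(1), then gmv_param[i] se(v)."""
    bits = [flag]
    if flag:
        for p in gmv_params:  # NumOfGMP = len(gmv_params); 2 in this embodiment
            bits += se(p)
    return bits

def read_slice_gm(bits, num_of_gmp=2):
    """Hypothetical reader matching write_slice_gm."""
    it = iter(bits)

    def read_ue():
        zeros = 0
        while next(it) == 0:  # count leading zeros up to the marker bit
            zeros += 1
        value = 1
        for _ in range(zeros):
            value = (value << 1) | next(it)
        return value - 1

    def read_se():
        k = read_ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)

    flag = next(it)
    params = [read_se() for _ in range(num_of_gmp)] if flag else []
    return flag, params
```

When the flag is 0, only one bit is spent per slice. This keeps the overhead negligible for static-camera content.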

Although this embodiment gives gmv_param directly as the parameters of the global MV 803, the difference from the global MV 803 of the most recently encoded slice may be encoded instead. Alternatively, the global MV 803 may be predicted by a predetermined prediction method and the difference from that prediction encoded.
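The differential variant can be sketched as below. It assumes the predictor is the global MV of the most recently encoded slice, one of the two options in the text. The function names are hypothetical. The encoder writes the residual in place of gmv_param, and the decoder adds it back to the same predictor.

```python
def gmv_residual(gmv, pred):
    """Residual written as gmv_param when the previous slice's GMV is the predictor."""
    return tuple(g - p for g, p in zip(gmv, pred))

def gmv_reconstruct(residual, pred):
    """Decoder side: add the residual back to the same predictor."""
    return tuple(r + p for r, p in zip(residual, pred))
```

Camera motion usually changes slowly between slices, so residuals are small and cost fewer bits under Exp-Golomb-style codes.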

This concludes the description of the inter prediction unit 801 of the moving image encoding apparatus 100 according to the present invention.

(Third Embodiment: Adaptive Interpolation Filter)
In the present embodiment, the structure of the moving image encoding apparatus 100 is the same as in FIG. 2, so its description is omitted. However, because the function of the prediction unit 106 differs, a prediction unit 1001 is provided instead. FIG. 21 shows the prediction unit 1001 in the third embodiment. Components having the same functions as those already described are given the same reference numerals, and their description is omitted. Because its function differs, an inter prediction unit 1101 is provided in place of the inter prediction unit 202.

The prediction unit 1001 includes a motion compensation filter coefficient estimation unit 1102 in addition to the inter prediction unit 1101. For each encoded frame, encoded slice, or macroblock, the motion compensation filter coefficient estimation unit 1102 calculates the filter coefficients 1103 used in the motion compensation process of inter prediction. This embodiment uses a two-dimensional 6-tap FIR filter as the example motion compensation filter. However, the number of taps can be any N and may be chosen freely according to constraints such as the hardware used. One-dimensional, two-dimensional, and three-dimensional filters are also applicable.

The motion compensation filter coefficient estimation unit 1102 designs the filter coefficients according to the properties of the input image signal 110 and the predicted image signal 117. For example, as described for the motion compensation unit 301 in the first and second embodiments, the relationship between motion vectors and the prediction errors obtained with a fixed-coefficient motion compensation filter is accumulated. The filter coefficients are then calculated by the least squares method so that the prediction error is minimized for each fractional position indicated by the motion vectors. Equation (15) is used as the evaluation criterion.

Here, O denotes the input image signal 110, and MC is the predicted image signal calculated with the fixed filter. h denotes the filter coefficients 1103 to be derived, (i, j) is the fractional position at which filtering is performed, and (a, b) are fixed values giving the filter offset. The filter coefficients h are designed to minimize the squared cost of equation (15). The designed filter coefficients 1103 are input to the inter prediction unit 1101.
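Equation (15) is not reproduced in this text. The sketch below shows the general least-squares step for one fractional position in a simplified form. It assumes a 1-D 6-tap filter, with the six integer reference samples for each training pixel stacked into a matrix and the original pixel values as targets. This Wiener-style formulation is assumed here and is not taken from the patent's exact equation. The function name is hypothetical, and `numpy.linalg.lstsq` does the solving.

```python
import numpy as np

def design_filter(ref_rows, targets):
    """Least-squares 6-tap filter for one fractional position.

    ref_rows: (N, 6) array of the integer reference samples used for each of N
              training pixels whose motion vector points to this fractional position
    targets:  (N,) array of original (input image) pixel values
    Returns h minimising sum_n (targets[n] - ref_rows[n] @ h) ** 2.
    """
    h, *_ = np.linalg.lstsq(ref_rows, targets, rcond=None)
    return h
```

In an encoder, `ref_rows` would be collected with the motion vectors found using the fixed filter, and one such system would be solved per fractional position.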

This embodiment has described designing the filter by using an ordinary fixed motion compensation filter. The filter may instead be designed from feature values of the input image signal 110. For example, filter coefficient sets for high-, medium-, and low-frequency components may be prepared in advance, and one of them selected according to the frequency characteristics of the input image signal.

Next, the inter prediction unit 1101 will be described. FIG. 22 is a block diagram of the inter prediction unit 1101. It is the same as FIG. 1 except that the filter coefficients 1103 are input to the counterpart of the moving region separation prediction unit 302 of the first embodiment, so the rest of the description is omitted.

First, the moving region separation prediction unit 1201 will be described. The moving region separation prediction unit 1201 receives four inputs: the motion vector 207 output from the motion vector estimation unit 203, the reference image signal 116 output from the reference image memory 105, the background image signal 306 output from the background image generation unit 303, and the filter coefficients 1103. Using the input moving region separation mask 115, it applies adaptive motion compensation to the moving region and fills the background region with the background image signal 306. It then combines the signals predicted by these different methods. The input motion vector 207 is also used to match the moving region separation mask 115. That is, the derivation of the interpolation position from the motion vector, described for the motion compensation unit 301, is also applied to the mask. Because the mask has only integer-pixel precision, a fractional-precision motion vector is mapped to integer-pixel precision. For motion compensation with 1/4-pixel precision, the mapping to integer pixel positions is given by equation (10). The predicted image signal is then generated with the derived integer-precision motion vector by equation (16).

Here, AMC denotes the prediction value derived by adaptive motion compensated prediction. Adaptive motion compensated prediction is described in more detail below with reference to FIG. 7.

First, the prediction values at the pixel positions a, b, c, d, h, and n, which correspond to the 1/2-pixel positions, are generated with a 6-tap one-dimensional filter. For example, the prediction values at positions a and d are generated by equation (17).

Next, the prediction values at the remaining fractional positions e, f, g, i, j, k, p, q, and r are generated with a 6-tap two-dimensional filter. For example, the prediction value at position e is generated by equation (18).

When the predicted image is created by the above method, up to about 360 filter coefficients are produced. The filter coefficients are therefore merged by exploiting spatial symmetry. For example, the symmetry of pixels a, c, d, and l is used to merge the filter coefficients by equation (19).

By using coefficients that exploit such symmetry, the number of filter coefficients used in adaptive motion compensated prediction can be reduced.

The filter coefficients 1103, calculated by the motion compensation filter coefficient estimation unit 1102 and input in this way, are used to generate the AMC predicted image signal of equation (16).

For the moving region, adaptive motion compensation is performed with the calculated filter coefficients 1103, and the background region is filled with the background image signal 306. A suitable predicted image signal can thus be generated separately for moving objects and for the background region, which improves prediction accuracy. The predicted image signal 117 created in this way is output from the moving region separation prediction unit 1201. The prediction information 119 used at this time, such as the block shape, the motion vector 207, and the filter coefficients 1103, is recorded in the encoding control unit 107, entropy coded, and finally multiplexed into the encoded data.

FIG. 27 shows an example of the slice header syntax in this embodiment. The slice_adaptive_filter_flag shown in the figure indicates whether moving region separation prediction with adaptive motion compensation prediction is performed. When slice_adaptive_filter_flag is 0, the moving region separation prediction unit 1201 performs the same prediction as the moving region separation prediction unit 302 described in the first embodiment. In that case, no adaptive motion compensation prediction is applied to moving pixels and no filter coefficients are used. When slice_adaptive_filter_flag is 1, filter_coeff is encoded as many times as given by NumOfPosX and NumOfPosY, which specify the predetermined number of two-dimensional filter coefficients 1103. Using this information, the moving region separation prediction unit 1201 applies adaptive motion compensation prediction to the moving pixels and generates a predicted image signal. This information is calculated by the motion compensation filter coefficient estimation unit 1102. It is then encoded by the code string encoding unit 108 as part of the prediction information 119 supplied by the encoding control unit 107.
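The conditional structure of this slice header can be sketched as follows. The sketch makes several assumptions. `read_flag` and `read_coeff` are hypothetical callables standing in for the entropy decoder. The coefficients are read as NumOfPosY rows of NumOfPosX values, with the loop order assumed. The descriptor used to code each filter_coeff is not specified here.

```python
def parse_adaptive_filter_header(read_flag, read_coeff, num_pos_x, num_pos_y):
    """Parse slice_adaptive_filter_flag and, if set, the filter_coeff values.

    Returns None when the flag is 0 (no adaptive filter in this slice),
    otherwise a num_pos_y x num_pos_x list of coefficients.
    """
    if read_flag() == 0:
        return None  # behaves like the first embodiment's unit 302
    return [[read_coeff() for _ in range(num_pos_x)] for _ in range(num_pos_y)]
```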

Although this embodiment shows filter_coeff carrying the filter coefficients 1103 directly, two alternatives are possible. The difference from the filter coefficients 1103 of the most recently encoded slice may be encoded instead. Alternatively, the filter coefficients 1103 may be predicted by a predetermined prediction method and the difference from that prediction encoded.
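The differential variant can be sketched as a simple residual against a predictor. The predictor would be either the previous slice's coefficients or the output of some predetermined prediction method, which this section does not specify. The function names are hypothetical.

```python
def encode_coeff_residuals(coeffs, predictor):
    """Encoder side: send only the difference from the predicted coefficients."""
    return [c - p for c, p in zip(coeffs, predictor)]

def decode_coeffs(residuals, predictor):
    """Decoder side: add the same predictor back to recover the coefficients."""
    return [r + p for r, p in zip(residuals, predictor)]
```

Both sides must derive an identical predictor, for example the coefficients of the most recently decoded slice, for the round trip to be lossless.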

This concludes the description of the inter prediction unit 1101 of the video encoding apparatus 100 according to the present invention.

As described above, this embodiment avoids the excessive block partitioning, and the resulting growth in block partition information, that would otherwise be needed to predict moving objects poorly suited to rectangular blocks. It separates the moving region and the background region within a block without increasing side information, and it applies the best prediction method to each. This improves both coding efficiency and subjective image quality.

<Video decoding device>
Next, fourth to sixth embodiments relating to video decoding will be described.
(Fourth embodiment)
FIG. 29 shows a video decoding apparatus according to the fourth embodiment, which corresponds to the video encoding apparatuses of the first to third embodiments described with reference to FIGS. 1 to 28. The video decoding apparatus 400 includes the following units:
- a code string decoding unit 402 that decodes the encoded data 409 input from the input buffer 401;
- an inverse quantization/inverse transform unit 403 that inversely quantizes and inversely transforms the transform coefficients from the code string decoding unit 402;
- an adder 404 that adds the prediction error signal 411 from the inverse quantization/inverse transform unit 403 to the predicted image signal 415;
- a reference image memory 405 that stores the decoded image signal from the adder 404 as a reference image;
- a prediction unit 406 that receives the reference image signal 413, the moving region mask 414, the prediction information, and the motion vector 417, and generates the predicted image signal 415.

The video decoding apparatus 400 is controlled by the decoding control unit 408 and outputs the decoded image signal to the output buffer 407.

In this configuration, the encoded data 409 is sent from the video encoding apparatus 100 of FIG. 1 or a similar apparatus through a storage or transmission system. It is first stored in the input buffer 401, and the multiplexed encoded data is then input to the video decoding apparatus 400.

In the video decoding apparatus 400, the encoded data is input to the code string decoding unit 402 and parsed according to the syntax, frame by frame or field by field. The code string decoding unit 402 sequentially entropy-decodes the code string of each syntax element. This reproduces the prediction information 416, the transform coefficients 410, the encoding parameters of the target block, and so on. In this embodiment, "encoding parameters" means every parameter needed for decoding. That includes the prediction information 416 as well as information on the transform coefficients and on quantization.

The transform coefficients 410 decoded by the code string decoding unit 402 are input to the inverse quantization/inverse transform unit 403. The code string decoding unit 402 also decodes the quantization-related information, namely the quantization parameter and the quantization matrix. This information is set in the decoding control unit 408 and loaded when inverse quantization is performed. Using it, the inverse quantization/inverse transform unit 403 first performs inverse quantization. It then applies an inverse transform, for example an inverse discrete cosine transform, to the dequantized coefficients. An inverse orthogonal transform is described here. If the encoder used a wavelet transform or a similar transform, the inverse quantization/inverse transform unit 403 may instead perform the corresponding inverse quantization and inverse wavelet transform.

The prediction error signal 411 restored by the inverse quantization/inverse transform unit 403 is input to the adder 404. There it is added to the predicted image signal 415 generated by the prediction unit 406 (described later) to produce the decoded image signal 412. The decoded image signal 412 is output from the video decoding apparatus 400 and temporarily stored in the output buffer 407. It is then output at the timing managed by the decoding control unit 408. The decoded image signal 412 is also stored in the reference image memory 405 as the reference image signal 413. The reference image signal 413 is read out of the reference image memory 405 frame by frame or field by field and input to the prediction unit 406.
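The residual path of units 403 and 404 can be sketched generically. The sketch uses a single flat quantizer step and a floating-point orthonormal 2-D DCT instead of H.264's integer transform, scaling tables, and quantization matrices, so it illustrates the data flow only. It also assumes clipping to the valid sample range after the addition in the adder. The names `decode_residual` and `reconstruct` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def decode_residual(levels, qstep):
    """Inverse quantization (flat step, assumed) followed by an inverse 2-D DCT."""
    c = dct_matrix(levels.shape[0])
    coeffs = levels.astype(np.float64) * qstep
    return c.T @ coeffs @ c

def reconstruct(pred, residual, bit_depth=8):
    """Adder 404: prediction plus residual, rounded and clipped to the sample range."""
    out = np.rint(pred.astype(np.float64) + residual)
    return np.clip(out, 0, (1 << bit_depth) - 1).astype(np.int32)
```

A DC-only 4×4 block with level 8 and step 1 decodes to a flat residual of 8 × (1/2) × (1/2) = 2.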

Next, the prediction unit 406 is described. The prediction unit 406 receives the prediction information 416 indicating the prediction method, as decoded by the code string decoding unit 402. It also receives the already decoded image signal 412 stored in the reference image memory 405, which serves as the reference image signal 413. For simplicity, the figure shows the motion vector 417 as a separate input. The motion vector 417 is part of the prediction information 416 and is used in both motion compensation prediction and moving region separation prediction.

FIG. 30 is a block diagram of the prediction unit 406. The prediction unit 406 includes a prediction changeover switch 503, an intra prediction unit 501, and an inter prediction unit 502. The prediction changeover switch 503 selects the prediction method according to the prediction mode contained in the prediction information 416 input to the prediction unit 406. When the prediction mode is intra prediction, the switch connects to the intra prediction unit 501. When it is inter prediction, the switch connects to the inter prediction unit 502.

The intra prediction unit 501 generates the predicted image signal 415 by performing the processing described in the first embodiment. This embodiment defines 4×4 pixel intra prediction (see FIG. 4C), 8×8 pixel intra prediction (see FIG. 4D), and 16×16 pixel intra prediction (see FIG. 4B). In intra prediction, interpolated pixels are created from the reference image signal 413 stored in the reference image memory 405. These pixels are then copied along a spatial direction to produce the predicted values.

Next, the inter prediction unit 502 is described. Its structure is identical to that of the inter prediction unit 202 in the video encoding apparatus described with reference to FIG. 2. However, the prediction unit 406 only needs to generate the predicted image signal 415 for the prediction mode given by the prediction information 416. It does not need to generate predicted image signals for any other mode. For example, suppose the prediction mode given by the prediction information 416 is inter prediction. The decoder is then given the motion vector 417 decoded by the code string decoding unit 402, the block shape information in the prediction information 416, the index of the reference image signal to use, and so on. From this information it generates a single predicted image signal 415 for the target block.

The motion compensation unit 301 in the inter prediction unit 502 (202) is now described in more detail. Using the motion vector 417 (207) and equation (3), the motion compensation unit 301 first computes the position that the motion vector references, starting from the position of the prediction pixel block. H.264 interpolation at 1/4-pixel precision is used as the example here. When each component of the motion vector is a multiple of 4, the vector points to an integer pixel position. Otherwise it points to an interpolation position at fractional precision. A predicted pixel is then generated for the computed position by copying or interpolating the corresponding pixels of the reference image signal 413 (116). FIG. 8 shows an example of H.264 predicted pixel generation. The 1/2-pixel interpolation at the positions labeled b and h in the figure is computed by equation (4). The 1/4-pixel interpolation at the positions labeled a and d is computed by equation (5). Thus interpolated pixels at 1/2-pixel positions are generated with a 6-tap FIR filter with tap coefficients (1, -5, 20, 20, -5, 1)/32. Interpolated pixels at 1/4-pixel positions are generated with a 2-tap averaging filter with tap coefficients (1/2, 1/2). The 1/2-pixel position j lies at the center of four integer pixel positions. It is interpolated by applying the 6-tap filter in both the vertical and horizontal directions. Interpolated values for the remaining positions are generated by the same rules. This is an example of predicted image signal generation in the motion compensation unit 301.
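The H.264 luma interpolation rules above can be sketched in integer arithmetic. Half-pel samples use the 6-tap filter with rounding, (sum + 16) >> 5. The centre sample j applies the vertical 6-tap filter to unrounded horizontal sums, then rounds with (sum + 512) >> 10. Quarter-pel samples average two neighbours with rounding. Only positions b, h, j, and a are shown. The helper names are illustrative, and `ref` is assumed to be a 2-D NumPy array of 8-bit samples with enough margin around (y, x).

```python
import numpy as np

TAPS = (1, -5, 20, 20, -5, 1)

def clip1(v, bit_depth=8):
    return max(0, min((1 << bit_depth) - 1, v))

def six_tap_raw(samples):
    """Unrounded 6-tap sum over six consecutive samples."""
    return sum(t * int(s) for t, s in zip(TAPS, samples))

def half_pel_b(ref, y, x):
    """Horizontal half-pel b between G = ref[y, x] and H = ref[y, x + 1]."""
    return clip1((six_tap_raw(ref[y, x - 2:x + 4]) + 16) >> 5)

def half_pel_h(ref, y, x):
    """Vertical half-pel h between G = ref[y, x] and M = ref[y + 1, x]."""
    return clip1((six_tap_raw(ref[y - 2:y + 4, x]) + 16) >> 5)

def half_pel_j(ref, y, x):
    """Centre half-pel j: vertical 6-tap over unrounded horizontal sums."""
    rows = [six_tap_raw(ref[y + dy, x - 2:x + 4]) for dy in range(-2, 4)]
    return clip1((six_tap_raw(rows) + 512) >> 10)

def quarter_pel_a(ref, y, x):
    """Quarter-pel a: rounded average of G and b."""
    return (int(ref[y, x]) + half_pel_b(ref, y, x) + 1) >> 1
```

On a horizontal ramp with ref[y, x] = 10·x, position b between x = 4 and x = 5 evaluates to 45, as expected for a linear signal.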

Next, the background image generation unit 303 is described. It generates the background image signal 306 and the moving region separation mask 414 (115) from the input reference image signal 413 (116). It also serves as the memory that holds the generated background image signal 306. First, generation of the moving region separation mask 414 (115) is described. One moving region separation mask 414 (115) exists for each decoded image signal 412 (114) decoded at each time instant provided by the reference image signal 413 (116). The mask is the binary map given by equation (6). For each pixel of the decoded image signal 412, the temporal luminance change (difference) relative to the previously decoded image signal 412 is compared with a predetermined threshold TH. A pixel whose change is smaller than TH is classified as a background pixel, and a pixel whose change exceeds TH is classified as a moving pixel.

When multiple reference image signals 413 (116) are available, equation (7) is used to decide whether each pixel is a background pixel or a moving pixel. FIG. 9 shows the correspondence used when computing difference values across multiple reference image signals. FIG. 11 shows an example in which the weight w varies with temporal distance from the prediction target pixel block.
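Equations (6) and (7) are not reproduced in this section. The sketch below shows one plausible reading of both. A pixel is marked as moving when the weighted sum of its absolute luminance differences from earlier decoded frames exceeds TH. The single-reference case, equation (6), is the special case with one frame and weight 1. The frame ordering (nearest first), the weight values, and treating a difference exactly equal to TH as background are all assumptions. The other statistics listed below (maximum, mean, median, variance) could replace the weighted sum.

```python
import numpy as np

def motion_mask(current, previous_frames, weights, th):
    """Binary mask: 1 = moving pixel, 0 = background pixel.

    previous_frames: earlier decoded frames, nearest first (assumed order).
    weights: per-frame weights w_k, e.g. larger for nearer frames (assumed).
    th: threshold TH on the weighted absolute luminance difference.
    """
    cur = current.astype(np.int32)
    score = np.zeros(cur.shape, dtype=np.float64)
    for w, prev in zip(weights, previous_frames):
        score += w * np.abs(cur - prev.astype(np.int32))
    # A change exceeding TH means moving; a tie counts as background (assumption).
    return (score > th).astype(np.uint8)
```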

The example above classifies pixels using only the difference value. The representative value used for classification may instead be a statistic of the temporal differences among the available decoded image signals: the difference itself, or its maximum, mean, median, or variance. It may also be the same statistics (maximum, mean, median, or variance) computed over the spatial differences between the target pixel and its adjacent pixels in the decoded image signal.

The moving region separation mask may also be corrected after it is generated. For example, for each correction target pixel, the mask values of its four neighbors (above, below, left, and right) can be used, or those of the nine points that also include the diagonals. These values are used to correct isolated points, or to adjust the mask values at block boundaries to match the prediction block shape. Equation (8) gives an example. FIG. 11 shows the relationship between the target pixel and its neighboring pixels; in FIG. 10, denser circles indicate pixels farther from the target pixel. FIG. 12 shows an example in which the weight v varies with city-block distance in the spatial direction. Setting the spatial weight v appropriately in this way corrects the generated mask. This enables isolated-point removal, joining of discontinuities, expansion or shrinking of regions to rectangular blocks, edge correction, pixel filling, pixel matching, and so on. This embodiment varies the weight by city-block distance, but any Minkowski distance, such as the city-block (Manhattan) distance, may be used instead.
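Equation (8) and the weights of FIG. 12 are not reproduced here. One way to realize a distance-weighted correction is a weighted majority vote over the pixels within a city-block radius of the target. The weight function v(d) = 1/(1 + d), the 0.5 decision level, and edge replication at the picture border are hypothetical choices. With radius 1, this uses the target pixel plus its four neighbors.

```python
import numpy as np

def correct_mask(mask, radius=1, weight=lambda d: 1.0 / (1 + d)):
    """Weighted majority vote over the city-block ball |dx| + |dy| <= radius.

    weight(d) plays the role of the spatial weight v (hypothetical form).
    """
    h, w = mask.shape
    padded = np.pad(mask.astype(np.float64), radius, mode="edge")
    acc = np.zeros((h, w))
    norm = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            d = abs(dy) + abs(dx)  # city-block (Manhattan) distance
            if d > radius:
                continue
            v = weight(d)
            acc += v * padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            norm += v
    return (acc / norm > 0.5).astype(np.uint8)
```

An isolated moving pixel scores 1/3 and is removed. An isolated background hole inside a moving region scores 2/3 and is filled.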

Next, generation of the background image signal 306 is described. The background image signal 306 collects only the background regions whose luminance changes little over time. It is derived pixel by pixel from the moving region separation mask 414 (115) and the temporally nearest decoded image signal 412, using equation (9). To update the background image signal 306 for the current time, the temporally nearest decoded image signal 412 in the reference image signal 413 (116) and its moving region separation mask 414 (115) are used. Only where the mask value is 0 (background pixels) is the background updated, as a weighted sum of the nearest decoded image signal 412 (114) and the background image signal 306 before the update. Setting the weight to, for example, wt = 1/2 makes the weighted sum an averaging filter. Where the mask value is 1 (moving pixels), no update is performed. The background image signal 306 may be initialized with a predetermined value. For the luminance signal this could be 0 or the maximum value (255 for 8 bits); for the chrominance signals, the mid value (128 for 8 bits). Alternatively, it may be initialized with the luminance values of an I-slice, which is coded using intra prediction only. The background image signal 306 is refreshed when an I-slice or an IDR picture is inserted; this embodiment shows an example in which it is always refreshed at I-slice timing. Through this process, the background image signal 306 is updated at appropriate times.
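Equation (9) is not reproduced here. A minimal sketch of the update rule as described is shown below. Background pixels (mask 0) move toward the nearest decoded frame by a weighted sum, moving pixels (mask 1) keep their old value, and an I-slice or IDR refresh replaces the background outright. Applying `wt` to the decoded frame and `1 - wt` to the old background is an assumption; with wt = 1/2 the two orderings coincide.

```python
import numpy as np

def update_background(background, decoded, mask, wt=0.5):
    """Weighted update of background pixels only (mask == 0)."""
    bg = background.astype(np.float64)
    blended = wt * decoded.astype(np.float64) + (1.0 - wt) * bg
    return np.where(mask == 0, blended, bg)

def refresh_background(decoded):
    """Refresh at an I-slice or IDR picture: take the decoded picture as is."""
    return decoded.astype(np.float64).copy()
```

An initial background could be, for example, `np.full(shape, 128.0)` for a chrominance plane, following the mid-value option in the text.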

The background image signal 306 is held in the internal memory of the background image generation unit 303, and the updated signal is output to the moving region separation prediction unit 302. The generated moving region separation mask 414 (115) is output from the inter prediction unit 502 (202) and passes through the prediction unit 406 (106). It is then stored in the reference image memory 405 as part of the reference image signal 413, together with the decoded image signal 412 of the same time.

Next, the moving region separation prediction unit 302 is described. It receives three inputs: the motion vector 417 (207) decoded by the code string decoding unit 402, the reference image signal 413 (116) output from the reference image memory 405, and the background image signal 306 output from the background image generation unit 303. Using the input moving region separation mask 414 (115), it applies motion compensation to the moving region and fills the background region with the background image signal. It then combines the signals predicted by these two methods. The input motion vector 417 (207) is also used to align the moving region separation mask 414 (115): the derivation of the interpolation position from the motion vector, described for the motion compensation unit 301, is applied to the mask as well. Because the mask exists only at integer-pixel precision, a fractional-precision motion vector is mapped to integer-pixel precision. For 1/4-pixel motion compensation, this mapping is given by equation (10). Moving region separation prediction is then performed with the derived integer-precision motion vector as in equation (11).

For example, the MC term is filled with values such as the interpolated pixels a, b, and j and the integer pixels G, H, and M generated as in FIG. 8. FIG. 13 shows examples of the decoded image signal 412, the moving region separation mask 414 (115), and the background image signal 306 when four reference image signals are available in the temporal direction. The predicted image signal created in this way is output from the moving region separation prediction unit 302.
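Equations (10) and (11) are not reproduced here. The sketch below shows one plausible form. The quarter-pel vector is mapped to integer precision by rounding, (mv + 2) >> 2, which is an assumed rounding rule. The reference mask is read at the block position displaced by that integer vector. Each pixel then takes the motion-compensated value where the displaced mask says "moving" and the co-located background value elsewhere; reading the background at the co-located position is also an assumption. The function names are hypothetical, and bounds checking is omitted.

```python
import numpy as np

def to_integer_mv(mv_qpel):
    """Map a quarter-pel MV component to integer-pel by rounding (assumed rule)."""
    return (mv_qpel + 2) >> 2  # Python >> floors, so -3 maps to -1

def separation_predict(mc_block, background, ref_mask, x0, y0, mv_qpel):
    """Combine MC samples (moving pixels) with background fill (background pixels).

    mc_block: motion-compensated prediction for the block at (x0, y0).
    mv_qpel: (mvx, mvy) in quarter-pel units.
    """
    h, w = mc_block.shape
    mvx, mvy = (to_integer_mv(c) for c in mv_qpel)
    m = ref_mask[y0 + mvy:y0 + mvy + h, x0 + mvx:x0 + mvx + w]
    bg = background[y0:y0 + h, x0:x0 + w]
    return np.where(m == 1, mc_block, bg)
```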

Next, the prediction switching unit 304 and the prediction separation switch 305 are described. Based on the input moving region separation mask 414 (115), the prediction switching unit 304 outputs prediction switching information 307 that controls the prediction separation switch 305. According to the prediction switching information 307, the prediction separation switch 305 connects its output to either the motion compensation unit 301 or the moving region separation prediction unit 302. More specifically, the proportion of moving pixels in the moving region separation mask within the prediction target pixel block is computed. The prediction switching information 307 is then updated according to whether this proportion is above or below a preset threshold TP. FIG. 14 shows an example of switching with TP = 90%. In this way, the inter prediction method for the prediction target pixel block (motion compensation prediction or moving region separation prediction) is switched dynamically, and the predicted image signal 415 (117) is output from the inter prediction unit 502 (202).

Next, the syntax structure used in the video decoding apparatus 400 is described. As shown in FIG. 24, the syntax consists mainly of three parts. The high-level syntax 1601 contains the syntax information of the layers above the slice. The slice-level syntax 1602 specifies the information required for each slice, and the macroblock-level syntax 1603 specifies the data required for each macroblock.

FIG. 25 shows an example of the slice header syntax. The slice_motion_region_separation_flag shown in the figure is used for the prediction switching information 307 output from the prediction switching unit 304 in the inter prediction unit 502 (202). When slice_motion_region_separation_flag is 0, the prediction switching unit 304 sets the prediction switching information 307 so that the prediction separation switch 305 always selects the output of the motion compensation unit 301 throughout the slice. In other words, motion compensation prediction is always performed. When slice_motion_region_separation_flag is 1, motion compensation prediction and moving region separation prediction are switched dynamically within the slice, as described above. The switching is based on the moving region separation mask 414 (115) output from the background image generation unit 303.
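The switching decision of unit 304 under the slice flag can be sketched as follows. The text states only that the moving-pixel proportion is compared with TP (90% in FIG. 14). The direction used here is an assumption: a block that is almost entirely moving uses ordinary motion compensation, and any other block uses moving region separation prediction. The returned labels are hypothetical.

```python
import numpy as np

def select_inter_method(block_mask, slice_flag, tp=0.9):
    """Choose the inter prediction method for one prediction target block.

    block_mask: binary moving region separation mask for the block (1 = moving).
    slice_flag: slice_motion_region_separation_flag.
    """
    if slice_flag == 0:
        return "motion_compensation"  # always MC in this slice
    ratio = float(np.mean(block_mask))
    # Assumed direction: nearly all-moving blocks fall back to plain MC.
    return "motion_compensation" if ratio > tp else "motion_region_separation"
```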

FIG. 26 shows an example of the macroblock layer syntax as an example of encoding parameters. The syntax elements in the table are as follows:
- mb_type: macroblock type information, such as whether the current macroblock is intra-coded or inter-coded and which block shape is used for prediction.
- coded_block_pattern: indicates, for each 8×8 pixel block, whether transform coefficients are present. For example, a value of 0 means that the target block has no transform coefficients.
- mb_qp_delta: quantization parameter information, expressed as the difference from the quantization parameter of the block encoded immediately before the target block.
- intra_pred_mode: the intra prediction mode, which indicates the intra prediction method.
- ref_idx_l0 and ref_idx_l1: when inter prediction is selected, the indices of the reference images used to predict the target block.
- mv_l0 and mv_l1: motion vector information.
- transform_8x8_flag: indicates whether the target block uses the 8×8 transform.

Syntax elements not defined in the present invention may be inserted between the rows of the table, and descriptions of other conditional branches may be included. The syntax table may also be split into several tables or merged with others. The same terms need not be used; they may be changed as the implementation requires. Furthermore, each syntax element described in the macroblock layer syntax may be moved into the macroblock data syntax described later.

This concludes the description of the video decoding apparatus 400 according to the present invention.

(Fourth embodiment: Modification 1: signaling of switching information)
In this embodiment, the inter prediction unit 502 (202) uses the prediction switching unit 304 to switch dynamically between two prediction methods: the motion compensation unit 301 and the moving region separation prediction unit 302. An embodiment that does not switch dynamically between motion compensation prediction and moving region separation prediction is also possible. In that case, an index indicating which prediction method was used must be decoded. This index is carried in the prediction switching information 307, which holds the index of the selected predicted image signal 117.

FIG. 15 shows an example of decoding, for each macroblock, an index indicating the prediction method used. FIG. 27 shows an example of the macroblock layer syntax in this embodiment. mb_motion_region_separation_flag in the figure is used as the prediction switching information 307 output from the prediction switching unit 304 in the inter prediction unit 502 (202). When mb_motion_region_separation_flag is 0, the prediction switching unit 304 sets the prediction switching information 307 and switches the prediction separation switch 305 so that the output of the motion compensation unit 301 is always selected for the macroblock; that is, motion compensation prediction is always performed. When mb_motion_region_separation_flag is 1, the prediction switching unit 304 sets the prediction switching information 307 and switches the prediction separation switch 305 so that the output of the motion region separation prediction unit 302 is always selected for the macroblock; that is, motion region separation prediction is always performed. SignalingFlag is an internal parameter that determines whether mb_motion_region_separation_flag is coded. SignalingFlag equal to 1 means that the ratio of moving pixels lies between the prescribed values THMIN and THMAX; SignalingFlag equal to 0 means that it does not.
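The signaling rule can be sketched as a small parsing routine. This is a sketch under stated assumptions: the mask block is a 2-D list of 0/1 values with 1 marking a moving pixel, the THMIN/THMAX bounds are treated as inclusive, and `read_flag` is a hypothetical callback standing in for the entropy decoder. What the decoder does when the flag is absent is not specified in this passage, so the sketch simply returns None in that case.

```python
from typing import Callable, Optional


def moving_pixel_ratio(mask_block: list[list[int]]) -> float:
    """Fraction of pixels marked as moving (1) in a mask block."""
    total = sum(len(row) for row in mask_block)
    moving = sum(sum(row) for row in mask_block)
    return moving / total


def parse_mb_separation_flag(mask_block: list[list[int]],
                             th_min: float, th_max: float,
                             read_flag: Callable[[], int]) -> Optional[int]:
    """Read mb_motion_region_separation_flag only when SignalingFlag is 1."""
    ratio = moving_pixel_ratio(mask_block)
    signaling_flag = 1 if th_min <= ratio <= th_max else 0  # bounds assumed inclusive
    if signaling_flag == 1:
        return read_flag()  # 0: motion compensation, 1: region separation
    return None  # flag not present in the bitstream


def predictor_for(flag: int) -> str:
    """Map the decoded flag to the prediction unit it selects."""
    return "motion_region_separation" if flag == 1 else "motion_compensation"
```

For a block in which half the pixels are moving and bounds of 0.2 and 0.8, the flag is read from the stream; for a block in which every pixel is moving, nothing is read, which is where the bit saving comes from.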

(Fourth embodiment: Modification 2: reuse of the predicted image signal)
In this embodiment, the motion compensation unit 301 and the motion region separation prediction unit 302 are described as separate prediction methods, but the motion region separation prediction unit 302 internally uses the same prediction method as the motion compensation unit 301. To avoid the extra computation of performing the same processing more than once, the predicted image signal 415 (117) calculated by the motion compensation unit 301 may be input to the motion region separation prediction unit 302, as shown in FIG. 18. Alternatively, the function of the motion compensation unit 301 may be integrated into the motion region separation prediction unit 302.
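The reuse in Modification 2 amounts to computing the motion-compensated block once and letting the region separation step consume it. The sketch below assumes integer-pel motion vectors (no sub-pixel interpolation), image planes stored as 2-D lists, and in-bounds block positions; the function names are hypothetical, and the mask is fetched with the same displacement as the image, as the later embodiments describe.

```python
Plane = list[list[int]]


def motion_compensate(ref: Plane, x: int, y: int, mv: tuple[int, int],
                      w: int, h: int) -> Plane:
    """Integer-pel block copy from ref at (x + mv_x, y + mv_y)."""
    mvx, mvy = mv
    return [[ref[y + mvy + j][x + mvx + i] for i in range(w)] for j in range(h)]


def predict_both(ref: Plane, mask_ref: Plane, background: Plane,
                 x: int, y: int, mv: tuple[int, int], w: int, h: int):
    """Return (MC prediction, region-separation prediction), running MC once."""
    mc_pred = motion_compensate(ref, x, y, mv, w, h)    # computed only once
    mask = motion_compensate(mask_ref, x, y, mv, w, h)  # same displacement as the image
    # Moving pixels reuse mc_pred; background pixels come from the background image.
    sep_pred = [[mc_pred[j][i] if mask[j][i] else background[y + j][x + i]
                 for i in range(w)] for j in range(h)]
    return mc_pred, sep_pred
```

Passing `mc_pred` into the region-separation step, rather than recomputing it, is the whole point of the modification; the same structure works unchanged if `motion_compensate` is replaced by a sub-pixel interpolator.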

(Fourth embodiment: Modification 3: removal of the switching structure)
In this embodiment, the motion compensation unit 301 and the motion region separation prediction unit 302 are described as separate prediction methods, but the prediction may instead be unified into motion region separation prediction by the unit 302, with the prediction switching unit 304 removed. FIG. 19 shows an embodiment in which the motion compensation unit 301, the prediction switching unit 304, and the prediction separation switch 305 are removed. Because the prediction structure is simplified, an increase in hardware scale and the like can be avoided.

(Fifth embodiment: global MC)
In this embodiment, the prediction information 416 in the moving picture decoding apparatus 400 includes information on the global MV 1401. The structure of the moving picture decoding apparatus 400 is the same as in FIG. 29, so descriptions of identical components are omitted. However, because the function of the prediction unit 406 differs, a new prediction unit 1400 is provided, as shown in FIG. 31. The prediction unit 1400 has the same structure as the prediction unit 406 and differs only in that the global MV 1401 included in the prediction information 416 is input to the inter prediction unit 801.

The functions of the inter prediction unit 801 are described with reference to FIG. 20. First, the background image generation unit 901 is described. The background image generation unit 901 receives the reference image signal 413 (116) output from the reference image memory 405 (105) and the global MV 1401 (803). By using the global MV 1401 (803), the background image generation unit 901 can generate the background image signal 306 even for video in which the camera is moving. The method of generating the moving region separation mask 414 (115) is described first. The moving region separation mask 414 (115) is calculated by Expression (12) using the reference image signal 413 (116) and the global MV 1401 (803). The methods described in the fourth embodiment can be applied as the criterion for determining the representative value of the difference values. As in the fourth embodiment, the moving region separation mask may also be corrected after it has been generated.
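Expression (12) itself is not reproduced in this text. As a hedged illustration only, a mask that accounts for camera motion can be formed by comparing each pixel with the pixel displaced by the global MV in an earlier reference image and thresholding the absolute difference. The per-pixel absolute difference as the representative value, the clamping at picture borders, the sign convention of the global MV, and the threshold are all assumptions, and the function names are hypothetical.

```python
def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))


def global_mv_mask(ref_cur: list[list[int]], ref_prev: list[list[int]],
                   gmv: tuple[int, int], threshold: int) -> list[list[int]]:
    """Binary mask: 1 = moving, 0 = background.

    Pixel (x, y) of ref_cur is compared with pixel (x + gx, y + gy) of
    ref_prev (assumed sign convention), clamped to the picture border,
    so that a global camera pan is not mistaken for object motion."""
    gx, gy = gmv
    h, w = len(ref_cur), len(ref_cur[0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            py, px = clamp(y + gy, 0, h - 1), clamp(x + gx, 0, w - 1)
            diff = abs(ref_cur[y][x] - ref_prev[py][px])
            row.append(1 if diff > threshold else 0)
        mask.append(row)
    return mask
```

With a one-pixel horizontal pan, compensating by the global MV leaves only the genuinely changed pixel marked as moving, whereas ignoring the pan would mark most of the picture as moving.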

Next, generation of the background image signal 306 is described. The background image signal 306 is derived by Expression (13) from the moving region separation mask 414 (115) described above, the decoded image signal 412, and the global MV 1401 (803).
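Expression (13) is likewise not reproduced here. A minimal sketch of one plausible update rule, under the assumption that background pixels are simply refreshed from the decoded image while pixels covered by moving objects keep the previous background shifted by the global MV, is shown below. The replacement rule, the border clamping, and the function name are hypothetical; a weighted average would be an equally valid choice.

```python
def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))


def update_background(bg_prev: list[list[int]], decoded: list[list[int]],
                      mask: list[list[int]], gmv: tuple[int, int]) -> list[list[int]]:
    """Return the updated background image (assumed rule, not Expression (13)).

    mask[y][x] == 0 -> background is visible: take the decoded pixel.
    mask[y][x] == 1 -> background is occluded: carry over the old background
                       sample displaced by the global MV (clamped at borders)."""
    gx, gy = gmv
    h, w = len(decoded), len(decoded[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            if mask[y][x] == 0:
                row.append(decoded[y][x])
            else:
                row.append(bg_prev[clamp(y + gy, 0, h - 1)][clamp(x + gx, 0, w - 1)])
        out.append(row)
    return out
```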

Next, the motion region separation prediction unit 902 is described. The motion region separation prediction unit 902 receives the motion vector 417 (207), the reference image signal 413, the background image signal 306 output from the background image generation unit 901, and the global MV 1401 (803). Using the input moving region separation mask 414 (115), the motion region separation prediction unit 902 applies motion compensation to the moving region and motion compensation based on the global MV 1401 (803) to the background region, and combines the signals predicted by the two methods. The input motion vector 417 (207) is also used to perform matching on the moving region separation mask 414 (115); that is, the derivation of the interpolation position from the motion vector, described for the motion compensation unit 301, is applied to the moving region separation mask 414 (115) as well. Because the moving region separation mask has only integer-pixel precision, a fractional-precision motion vector is mapped to integer-pixel precision. For motion compensation with 1/4-pixel precision, this mapping to integer pixel positions is given by Expression (10). Motion region separation prediction is then performed by Expression (14) using the derived integer-precision motion vector.
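The steps above (mapping the motion vector to integer precision for the mask, then taking moving pixels from motion compensation and background pixels from the globally shifted background) can be sketched as follows. Expressions (10) and (14) are not reproduced here, so the round-to-nearest mapping `(mv + 2) >> 2`, the border clamping, and all function names are assumptions; `mc_pred` is taken as an already interpolated motion-compensated block.

```python
def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))


def mv_to_integer(mv_qpel: int) -> int:
    """Quarter-pel MV component -> integer-pel (assumed round to nearest,
    halves toward +infinity; Python's >> is an arithmetic shift)."""
    return (mv_qpel + 2) >> 2


def fetch(plane: list[list[int]], x: int, y: int, w: int, h: int) -> list[list[int]]:
    """w x h block at (x, y), with coordinates clamped to the picture."""
    H, W = len(plane), len(plane[0])
    return [[plane[clamp(y + j, 0, H - 1)][clamp(x + i, 0, W - 1)]
             for i in range(w)] for j in range(h)]


def separation_predict_gmc(mc_pred, mask_plane, bg_plane, x, y, mv_qpel, gmv):
    """Moving pixels from mc_pred; background pixels from the background
    image displaced by the global MV."""
    h, w = len(mc_pred), len(mc_pred[0])
    mx, my = mv_to_integer(mv_qpel[0]), mv_to_integer(mv_qpel[1])
    mask = fetch(mask_plane, x + mx, y + my, w, h)    # mask matched with the local MV
    bg = fetch(bg_plane, x + gmv[0], y + gmv[1], w, h)  # background shifted by global MV
    return [[mc_pred[j][i] if mask[j][i] else bg[j][i] for i in range(w)]
            for j in range(h)]
```

Keeping the mask at integer precision means only the image, not the binary mask, ever needs sub-pixel interpolation.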

Applying ordinary motion compensation prediction to the moving region, and motion-compensating the background image signal 306 with the global MV 1401 (803) for the background region, makes it possible to improve prediction accuracy regardless of the shape of the moving object.

FIG. 27 shows an example of the slice header syntax in this embodiment. slice_global_motion_flag in the figure indicates whether motion region separation prediction using the global MV 1401 (803) is performed. When slice_global_motion_flag is 0, the background image generation unit 901 and the motion region separation prediction unit 902 perform the same prediction as the background image generation unit 303 and the motion region separation prediction unit 302 described in the fourth embodiment; that is, the global MV 1401 (803) is not decoded and cannot be used.

When slice_global_motion_flag is 1, gmv_param is decoded NumOfGMP times, where NumOfGMP is the predetermined number of parameters of the global MV 1401 (803). Using this information, the background image generation unit 901 and the motion region separation prediction unit 902 generate the corresponding predicted image signal. This embodiment shows the case NumOfGMP = 2, in which gmv_param[0] is the horizontal component and gmv_param[1] the vertical component of the motion vector.

Although in this embodiment gmv_param directly gives the parameters of the global MV 1401 (803), the difference from the global MV 1401 (803) of the most recently decoded slice may be decoded instead, or the global MV 1401 (803) may be predicted by a predetermined prediction method and the difference from that prediction decoded.
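The slice-level parsing described above can be sketched as follows. This is an illustration under assumptions: a hypothetical iterator of already entropy-decoded integers stands in for a real bitstream reader, NUM_OF_GMP = 2 matches the example in the text, and the optional `prev_gmv` argument illustrates the differential variant (decoding a difference from the previous slice's global MV).

```python
from typing import Iterator, Optional

NUM_OF_GMP = 2  # gmv_param[0]: horizontal, gmv_param[1]: vertical


def parse_global_mv(symbols: Iterator[int],
                    prev_gmv: Optional[list[int]] = None) -> Optional[list[int]]:
    """Parse slice_global_motion_flag and, if it is 1, NUM_OF_GMP gmv_param values.

    symbols  -- hypothetical stream of already entropy-decoded integers
    prev_gmv -- if given, gmv_param is treated as a difference from the
                previous slice's global MV (differential variant)"""
    slice_global_motion_flag = next(symbols)
    if slice_global_motion_flag == 0:
        return None  # no global MV; prediction falls back to the fourth embodiment
    gmv_param = [next(symbols) for _ in range(NUM_OF_GMP)]
    if prev_gmv is not None:
        return [p + d for p, d in zip(prev_gmv, gmv_param)]
    return gmv_param
```

For example, the symbols `[1, 3, -2]` yield a global MV of (3, -2); in the differential variant, `[1, 1, 1]` applied to that previous MV yields (4, -1).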

The above is the description of the moving picture decoding apparatus according to the present invention.

(Sixth embodiment: adaptive interpolation filter)
In this embodiment of the present invention, the prediction information 416 in the moving picture decoding apparatus 400 includes information on the filter coefficients 1501. The structure of the moving picture decoding apparatus 400 is the same as in FIG. 29, so descriptions of identical components are omitted. However, because the function of the prediction unit 406 differs, the prediction unit is given the new reference numeral 1500 and is described with reference to FIG. 32. The prediction unit 1500 has the same structure as the prediction unit 406 and differs only in that the filter coefficients 1501 included in the prediction information 416 are input to the inter prediction unit 1101.

The functions of the inter prediction unit 1101 are described with reference to FIG. 22. The motion region separation prediction unit 1201 receives the motion vector 417 (207), the reference image signal 413 (116), the background image signal 306 output from the background image generation unit 901, and the filter coefficients 1501 (1103). Using the input moving region separation mask 414 (115), the motion region separation prediction unit 1201 applies adaptive motion compensation to the moving region and fills the background region with the background image signal 306, then combines the signals predicted by the two methods. The input motion vector 417 (207) is also used to perform matching on the moving region separation mask 414 (115); that is, the derivation of the interpolation position from the motion vector, described for the motion compensation unit 301, is applied to the moving region separation mask 414 (115) as well. Because the moving region separation mask has only integer-pixel precision, a fractional-precision motion vector is mapped to integer-pixel precision. For motion compensation with 1/4-pixel precision, this mapping to integer pixel positions is given by Expression (10). A predicted image signal is then generated by Expression (16) using the derived integer-precision motion vector.

Adaptive motion compensation prediction is described more specifically with reference to FIG. 8. First, predicted values at the 1/2-pixel positions a, b, c, d, h, and n are generated with a 6-tap one-dimensional filter; for example, the predicted values at positions a and d are generated by Expression (17). Next, predicted values at the remaining fractional positions e, f, g, i, j, k, p, q, and r are generated with a 6-tap two-dimensional filter; for example, the predicted value at position e is generated by Expression (18). Taking the symmetry of the filter into account, the filter coefficients 1501 (1103) are merged using Expression (19). Exploiting this symmetry reduces the number of filter coefficients 1501 (1103) needed for adaptive motion compensation prediction.
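Expressions (17) to (19) are not reproduced here. The sketch below shows the general shape of a 6-tap one-dimensional interpolation with symmetric coefficients, assuming (this is not stated in the text) that symmetry means h[i] = h[5 - i], so only three taps need to be transmitted. The H.264 fixed half-pel filter (1, -5, 20, 20, -5, 1)/32 is used as sample input; decoded adaptive coefficients would take its place. The rounding offset, 5-bit shift, and 8-bit clipping follow H.264 practice and are assumptions for the adaptive case.

```python
def expand_symmetric(half_taps: list[int]) -> list[int]:
    """Rebuild a 6-tap symmetric filter (h[i] == h[5 - i]) from its first 3 taps."""
    return half_taps + half_taps[::-1]


def interp_6tap(samples: list[int], taps: list[int],
                shift: int = 5, max_val: int = 255) -> int:
    """Weighted sum of 6 integer-position samples, rounded, shifted, clipped."""
    assert len(samples) == 6 and len(taps) == 6
    acc = sum(s * t for s, t in zip(samples, taps))
    val = (acc + (1 << (shift - 1))) >> shift
    return max(0, min(val, max_val))


if __name__ == "__main__":
    taps = expand_symmetric([1, -5, 20])            # [1, -5, 20, 20, -5, 1]
    print(interp_6tap([10] * 6, taps))              # 10 (flat signal preserved)
    print(interp_6tap([0, 0, 100, 200, 0, 0], taps))  # 188
```

Sending three taps instead of six halves the side information for each symmetric position, which is the saving the text attributes to Expression (19).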

For the moving region, adaptive motion compensation is performed with the decoded filter coefficients 1501 (1103); for the background region, the background image signal 306 is filled in. Because a suitable predicted image signal can thus be generated separately for the moving object and for the background region, prediction accuracy can be improved.

FIG. 28 shows an example of the slice header syntax in this embodiment. slice_adaptive_filter_flag in the figure indicates whether motion region separation prediction using adaptive motion compensation prediction is performed. When slice_adaptive_filter_flag is 0, the motion region separation prediction unit 1201 performs the same prediction as the motion region separation prediction unit 302 described in the third embodiment; that is, adaptive motion compensation prediction is not applied to moving pixels and no filter coefficients are used. When slice_adaptive_filter_flag is 1, filter_coeff is decoded for each of the predetermined number of two-dimensional filter coefficient positions, given by NumOfPosX and NumOfPosY. Using this information, the motion region separation prediction unit 1201 applies adaptive motion compensation prediction to the moving pixels and generates the predicted image signal.

Although in this embodiment filter_coeff directly gives the filter coefficients 1501 (1103), the difference from the filter coefficients 1501 (1103) of the most recently decoded slice may be decoded instead, or the filter coefficients may be predicted by a predetermined prediction method and the difference from that prediction decoded.
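A minimal sketch of the slice-level filter parsing, including the differential variant, follows. As with the global MV example, a hypothetical iterator of already entropy-decoded integers replaces a real bitstream reader, and the row-by-row (NumOfPosY rows of NumOfPosX entries) ordering of filter_coeff is an assumption.

```python
from typing import Iterator, Optional

Coeffs = list[list[int]]


def parse_adaptive_filter(symbols: Iterator[int], num_pos_x: int, num_pos_y: int,
                          prev: Optional[Coeffs] = None) -> Optional[Coeffs]:
    """Parse slice_adaptive_filter_flag and, if it is 1, a NumOfPosY x NumOfPosX
    array of filter_coeff values (assumed row-major order).

    prev -- if given, the decoded values are differences from the previous
            slice's coefficients (differential variant)."""
    slice_adaptive_filter_flag = next(symbols)
    if slice_adaptive_filter_flag == 0:
        return None  # no adaptive filter: ordinary motion region separation prediction
    filter_coeff = [[next(symbols) for _ in range(num_pos_x)] for _ in range(num_pos_y)]
    if prev is not None:
        return [[p + d for p, d in zip(prow, drow)]
                for prow, drow in zip(prev, filter_coeff)]
    return filter_coeff
```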

(Modifications of the first to sixth embodiments)
(1) In the first to sixth embodiments, the frame to be processed is divided into rectangular blocks of, for example, 16×16 pixels, and the blocks are encoded/decoded in order from the upper-left block toward the lower right, as shown in FIG. 4A; however, the encoding/decoding order is not limited to this. For example, encoding/decoding may proceed from the lower right toward the upper left, or in a spiral starting from the center of the screen. It may also proceed from the upper right toward the lower left, or from the periphery of the screen toward the center.
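The alternative coding orders in (1) can be illustrated by generating block visiting orders for a frame of `cols` × `rows` blocks. This is a sketch: the clockwise spiral direction and the reading of "upper right toward lower left" as row by row, right to left, are assumptions; the center-outward spiral is obtained by reversing the periphery-to-center spiral. Coordinates are (row, column) block indices.

```python
Order = list[tuple[int, int]]  # (row, col) block coordinates


def raster_order(cols: int, rows: int) -> Order:
    """Upper-left to lower-right, row by row (the default order of FIG. 4A)."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def reverse_raster_order(cols: int, rows: int) -> Order:
    """Lower-right to upper-left."""
    return raster_order(cols, rows)[::-1]


def right_to_left_order(cols: int, rows: int) -> Order:
    """Upper-right to lower-left, assumed to mean row by row, right to left."""
    return [(r, c) for r in range(rows) for c in reversed(range(cols))]


def inward_spiral_order(cols: int, rows: int) -> Order:
    """Periphery to center: clockwise spiral starting at the upper-left block."""
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    order: Order = []
    while top <= bottom and left <= right:
        order += [(top, c) for c in range(left, right + 1)]
        order += [(r, right) for r in range(top + 1, bottom + 1)]
        if top < bottom:
            order += [(bottom, c) for c in range(right - 1, left - 1, -1)]
        if left < right:
            order += [(r, left) for r in range(bottom - 1, top, -1)]
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order


def center_out_spiral_order(cols: int, rows: int) -> Order:
    """Spiral from the center outward: the inward spiral traversed backwards."""
    return inward_spiral_order(cols, rows)[::-1]
```

Every order visits each block exactly once, so encoder and decoder only need to agree on which generator is in use; what changes is which neighboring blocks are already decoded when a block is predicted.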

(2) In the first to sixth embodiments, the block sizes were described as 4×4 and 8×8 pixel blocks, but the target blocks need not have a uniform shape; block sizes such as 16×8, 8×16, 8×4, and 4×8 pixels may also be used. Nor is a uniform block size required within one macroblock; blocks of different sizes may be mixed. In that case, the code amount for encoding the partition information grows as the number of partitions increases, so the block size should be selected by weighing the code amount of the transform coefficients against the quality of the locally decoded image.

(3) In the first to sixth embodiments, the luminance and color difference signals were not treated separately, and the description was limited to a single color signal component. When the prediction processing differs between the luminance signal and the color difference signals, however, either different prediction methods or the same prediction method may be used. When different prediction methods are used, the prediction method selected for the color difference signals is encoded/decoded in the same manner as for the luminance signal.

(4) For the first and fourth embodiments, two modifications were shown: one in which the predicted image signal generated by the motion compensation unit 301 is reused by the motion region separation prediction unit 302, as described with reference to FIG. 17, and one in which the motion compensation unit 301 is removed and the motion region separation prediction unit 302 is always used, as described with reference to FIG. 18. The same frameworks are also applicable to the second, third, fifth, and sixth embodiments. In addition, motion compensation prediction using the global MV 803 of the second and fifth embodiments may be applied to the motion compensation unit 301, and adaptive motion compensation prediction using the filter coefficients 1103 of the third and sixth embodiments may likewise be applied to the motion compensation unit 301.

The present invention is not limited to the above embodiments as they are; at the implementation stage, the components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the components disclosed in the above embodiments. For example, some components may be omitted from all the components shown in an embodiment, and components from different embodiments may be combined as appropriate.

Block diagram showing the moving picture encoding apparatus according to the first embodiment
Block diagram showing the prediction unit according to the first embodiment
Block diagram showing the inter prediction unit according to the first embodiment
Diagram showing the flow of the encoding process
Diagram showing a 16×16 pixel block
Diagram showing a 4×4 pixel block
Diagram showing an 8×8 pixel block
Diagram showing the positional relationship between a reference image signal and the prediction target image, and its relation to the motion vector
Diagram showing the sizes of motion compensation blocks in macroblock units
Diagram showing the sizes of motion compensation blocks in sub-block units
Diagram showing the positional relationship between integer pixels and fractional pixels in motion compensation prediction
Diagram showing the relationship between a target pixel and the co-located pixels in a plurality of reference image signals
Diagram showing the relationship between temporal distance from the prediction target pixel block and weight
Diagram showing the spatial positional relationship and distance between a target pixel and adjacent pixels
Diagram showing the relationship between spatial distance from the prediction target pixel block and weight
Diagram showing an overview of the prediction of a moving region separation mask and a background image signal from a plurality of decoded image signals
Diagram showing that the prediction method changes with the ratio of moving pixels to background pixels on the moving region separation mask
Diagram showing prediction switching according to the ratio of moving pixels to background pixels on the moving region separation mask
Flowchart showing the processing flow of the background image signal generation unit
Flowchart showing the processing flow of the motion region separation prediction unit
Block diagram of an inter prediction unit shown as a modification of the first embodiment
Block diagram of an inter prediction unit shown as a modification of the first embodiment
Block diagram of the prediction unit provided in the moving picture encoding apparatus according to the second embodiment
Block diagram of the inter prediction unit provided in the prediction unit of FIG. 19
Block diagram of the prediction unit provided in the moving picture encoding apparatus according to the third embodiment
Block diagram of the inter prediction unit provided in the prediction unit of FIG. 21
Diagram showing the syntax structure
Diagram showing the information contained in the slice header
Diagram showing the information contained in the macroblock layer in the first embodiment
Diagram showing the information contained in the macroblock layer in a modification of the first embodiment
Diagram showing the information contained in the slice header syntax in the second embodiment
Diagram showing the information contained in the slice header syntax in the third embodiment
Block diagram of the moving picture decoding apparatus according to the fourth, fifth, and sixth embodiments
Block diagram showing the prediction unit provided in the moving picture decoding apparatus according to the fourth embodiment
Block diagram showing the prediction unit provided in the moving picture decoding apparatus according to the fifth embodiment
Block diagram showing the prediction unit provided in the moving picture decoding apparatus according to the sixth embodiment

Explanation of symbols

101: subtractor
102: transform/quantization unit
103: inverse transform/inverse quantization unit
104: adder
105: reference image memory
106: prediction unit
107: encoding control unit
108: code string encoding unit
109: output buffer
114: decoded image signal
115: moving region separation mask
116: reference image signal
117: predicted image signal
201: intra prediction unit
202: inter prediction unit
203: motion vector estimation unit
204: mode determination switch
205: mode determination unit
301: motion compensation unit
302: motion region separation prediction unit
303: background image generation unit
304: prediction switching unit
305: prediction separation switch
306: background image signal

Claims

A moving picture encoding method that divides an input image signal into a plurality of pixel blocks, performs prediction processing on each pixel block using a reference image signal, and encodes a difference signal between the input image signal and a predicted image signal, the method comprising:
a mask generation step of generating, for each reference image signal, a binary moving region separation mask indicating a moving region and a background region;
a background image generation/update step of generating or updating one background image signal by comparing two or more reference image signals or according to the values of the binary moving region separation mask for each reference image signal; and
a predicted image generation step of generating the predicted image signal by using the moving region separation mask to (1) perform motion compensation processing on a first portion of the prediction target image corresponding to the moving region, and (2) fill a second portion of the prediction target image corresponding to the background region with a signal obtained by interpolating the background image signal.

The moving picture encoding method according to claim 1, wherein, in the mask generation step, the criterion for determining whether a pixel of the reference image belongs to the moving region or the background region is determined according to a value derived from pixel difference values between two or more available reference images, or within the reference image.

The moving picture encoding method according to claim 1 or 2, wherein the mask generation step and the background image generation/update step include:
estimating a global vector that corrects the change between images caused by a change in the imaging system, between any of two or more available reference images and the prediction target image, and generating the moving region separation mask and generating or updating the background image signal using an image interpolated based on the estimated global vector; and
encoding information on the global vector in units of any one of a sequence, an image, a slice, and a block.

The moving picture encoding method according to claim 1, wherein the predicted image generation step includes, for pixels that the moving region separation mask determines to be in the moving region, a step of changing, for each pixel position, the coefficients of a filter that generates an integer-precision or fractional-precision interpolated image, and a step of encoding information on the changed filter coefficients in units of any one of a sequence, an image, a slice, and a block.

The moving picture encoding method according to any one of claims 1 to 4, wherein the mask generation step includes a step of correcting the generated moving region separation mask, for example by isolated-point removal, connection of discontinuous points, enlargement or reduction of regions to rectangular blocks, edge correction, pixel compensation, or pixel matching, based on a value derived from weights based on the distances between a mask pixel and a plurality of spatially or temporally neighboring pixels and from the difference values of those pixels.

The moving picture encoding method according to claim 4, wherein the predicted image generation step includes:
calculating the ratio of the moving region or the ratio of the background region in one or more moving region separation masks, taken at the same position as the block of the first portion of the prediction target image or at a position derived from a local motion vector converted to integer precision; and
switching the prediction method according to whether the ratio of the moving region or the ratio of the background region is larger or smaller than a predetermined value.
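
The ratio test and the switch could look like the following sketch. The mask is sampled either at the co-located block or at the block displaced by an integer motion vector. The two thresholds `low` and `high` and the three mode names are hypothetical, since the claim only speaks of "a predetermined value".

```python
def moving_ratio(mask, bx, by, size, mv=(0, 0)):
    """Fraction of moving-region pixels in the size x size block at (bx, by),
    displaced by an integer motion vector mv (coordinates clamped to the mask)."""
    h, w = len(mask), len(mask[0])
    dx, dy = mv
    count = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            count += mask[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
    return count / (size * size)


def choose_prediction(ratio, low=0.1, high=0.9):
    """Hypothetical switching rule based on the moving-region ratio."""
    if ratio >= high:
        return "motion_compensation"   # block is (almost) all moving region
    if ratio <= low:
        return "background"            # block is (almost) all background
    return "mask_based"                # mixed block: per-pixel prediction


mask = [[1, 1, 0, 0] for _ in range(4)]   # left half moving
print(choose_prediction(moving_ratio(mask, 0, 0, 2)))  # motion_compensation
print(choose_prediction(moving_ratio(mask, 1, 0, 2)))  # mask_based
```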

The moving picture encoding method according to any one of claims 1 to 5, wherein the predicted image generation step has:
a first prediction method that applies different prediction methods to the moving region and the background region based on the moving region separation mask, and a second prediction method that regards all values of the moving region separation mask within the block of the prediction target image as the moving region and predicts the block with a single prediction method; and further includes a step of encoding information indicating which of the first and second prediction methods is used.
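
Choosing between the two methods and signalling the choice can be sketched as follows. The encoder builds both predictions, keeps the one with the lower sum of absolute differences (SAD), and emits a one-bit flag that the decoder uses to rebuild the same prediction. SAD as the decision cost, the flag convention (1 = first method) and all function names are assumptions for illustration.

```python
def predict_block(mask_blk, mc_blk, bg_blk, use_mask):
    """First method (use_mask=True): motion-compensated sample where the mask
    says moving, background sample elsewhere. Second method: whole block is
    treated as moving region and motion-compensated."""
    if not use_mask:
        return [row[:] for row in mc_blk]
    return [[m if k else b for k, m, b in zip(k_row, m_row, b_row)]
            for k_row, m_row, b_row in zip(mask_blk, mc_blk, bg_blk)]


def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def encoder_choose(orig_blk, mask_blk, mc_blk, bg_blk):
    """Return (flag, prediction): flag 1 = first method, 0 = second method.
    The flag is the side information that would be written to the stream."""
    first = predict_block(mask_blk, mc_blk, bg_blk, True)
    second = predict_block(mask_blk, mc_blk, bg_blk, False)
    if sad(orig_blk, first) < sad(orig_blk, second):
        return 1, first
    return 0, second


orig = [[10, 50], [10, 50]]
mask = [[0, 1], [0, 1]]
mc = [[30, 50], [30, 50]]
bg = [[10, 0], [10, 0]]
flag, pred = encoder_choose(orig, mask, mc, bg)
print(flag, pred)  # 1 [[10, 50], [10, 50]]
```

A decoder would simply call `predict_block(mask, mc, bg, use_mask=bool(flag))` with the decoded flag.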

The moving picture encoding method according to claim 1, wherein the predicted image generation step includes:
generating the predicted image signal, for pixels that the moving region separation mask determines to belong to the background region, by using pixel values obtained by interpolating the background image signal based on the global vector; and
generating the predicted image signal, for pixels determined to belong to the moving region, by using pixel values obtained by interpolating the reference image signal based on the local motion vector.
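
The per-pixel composition just described can be sketched with integer vectors only. Moving-region pixels are fetched from the reference picture displaced by the local motion vector. Background pixels are fetched from the background picture displaced by the global vector. Two assumptions apply: the mask is already aligned with the prediction target picture, and sub-pixel interpolation is omitted. Both `fetch` and `composite_prediction` are hypothetical names.

```python
def fetch(img, x, y):
    """Sample img at integer (x, y), repeating edge pixels outside the picture."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def composite_prediction(mask, ref, background, local_mv, global_v, bx, by, size):
    """Predict the size x size block at (bx, by): local motion vector for
    moving pixels (mask 1), global vector on the background for mask 0."""
    ldx, ldy = local_mv
    gdx, gdy = global_v
    pred = []
    for y in range(by, by + size):
        row = []
        for x in range(bx, bx + size):
            if mask[y][x]:
                row.append(fetch(ref, x + ldx, y + ldy))
            else:
                row.append(fetch(background, x + gdx, y + gdy))
        pred.append(row)
    return pred


ref = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
background = [[100 + 10 * y + x for x in range(3)] for y in range(3)]
mask = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(composite_prediction(mask, ref, background, (1, 1), (0, 0), 0, 0, 2))
# [[5, 101], [110, 9]]
```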

The moving image encoding method according to any one of claims 1 to 4, wherein the predicted image generation step generates the predicted image signal using a prediction method in which the luminance and color difference components, or the individual color components, can use either the same moving region separation mask or different moving region separation masks.
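
In 4:2:0 video the chroma planes have half the luma resolution in each direction, so a mask for chroma must be either derived from the luma mask or generated separately. The sketch below derives it from the luma mask with two hypothetical rules. The "any" rule marks a chroma sample as moving if any of its 2x2 luma pixels is moving, and the "majority" rule requires at least half of them to be moving. The 4:2:0 format and both rules are assumptions, not details given in the claim.

```python
def chroma_mask_420(luma_mask, rule="any"):
    """Downsample a binary luma mask 2:1 in each direction for 4:2:0 chroma."""
    h, w = len(luma_mask), len(luma_mask[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            cells = [luma_mask[yy][xx] for yy in (y, y + 1) for xx in (x, x + 1)
                     if yy < h and xx < w]
            if rule == "any":
                row.append(int(any(cells)))
            else:  # "majority": at least half of the luma pixels are moving
                row.append(int(2 * sum(cells) >= len(cells)))
        out.append(row)
    return out


luma = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(chroma_mask_420(luma))              # [[1, 0], [0, 1]]
print(chroma_mask_420(luma, "majority"))  # [[0, 0], [0, 1]]
```

For 4:4:4 content, the same mask could be shared unchanged by all components, which corresponds to the "same moving region separation mask" option.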

A moving picture decoding method for decoding, by a prescribed method, moving picture encoded data obtained by encoding each frame of an input image signal in units of pixel blocks, the method comprising:
a mask generation step of generating, for each reference image signal, a binary moving region separation mask indicating a moving region and a background region, and a background image generation/update step of generating or updating a single background image signal by comparing two or more reference image signals or according to the values of the binary moving region separation mask of each reference image signal; and
a predicted image signal generation step of generating a predicted image signal by using the moving region separation mask to (1) perform motion compensation on a first portion of the prediction target image corresponding to the moving region and (2) supplement a second portion of the prediction target image corresponding to the background region with a signal obtained by interpolating the background image signal.
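
The background generation/update step can be sketched as a recursive blend. A reference pixel is mixed into the background only where the mask marks it as background, so moving objects do not smear into the background picture. The integer blending weight `shift`, copying the first reference as the initial background, and the function name are all hypothetical. The mask and background are derived only from reference pictures, which the decoder also holds, so presumably this step can be mirrored on both sides without extra side information.

```python
def update_background(background, ref, mask, shift=1):
    """Blend ref into the background where mask == 0 (shift must be >= 1).

    new = (bg * (2**shift - 1) + ref + rounding) >> shift, so shift=1 is a
    rounded average; moving-region pixels (mask 1) keep the old background.
    With no background yet, the reference picture is copied as a start.
    """
    if background is None:
        return [row[:] for row in ref]
    rnd = 1 << (shift - 1)
    return [[b if m else (b * ((1 << shift) - 1) + r + rnd) >> shift
             for b, r, m in zip(b_row, r_row, m_row)]
            for b_row, r_row, m_row in zip(background, ref, mask)]


bg = update_background(None, [[100, 100]], [[0, 0]])
print(update_background(bg, [[50, 200]], [[0, 1]]))           # [[75, 100]]
print(update_background(bg, [[50, 200]], [[0, 1]], shift=2))  # [[88, 100]]
```

A larger `shift` makes the background adapt more slowly, trading responsiveness for robustness to misclassified pixels.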

The moving picture decoding method according to claim 10, wherein, in the mask generation step, the criterion for classifying a pixel as belonging to the moving region or the background region is determined according to a value derived from pixel difference values between two or more available reference images, or from pixel difference values within a single reference image.

(Moving image decoding: camera-motion correction of the moving region separation mask; intermediate-level concept)
The moving picture decoding method according to claim 10 or 11, wherein the mask generation step and the background image generation/update step include:
estimating, between any one of two or more available reference image signals and the prediction target image, a global vector that corrects the inter-image change caused by a change in the imaging system, and generating the moving region separation mask and generating or updating the background image signal using an image interpolated based on the estimated global vector; and
encoding information on the global vector in units of a sequence, an image, a slice, or a block.

The moving picture decoding method according to any one of claims 10 to 12, wherein the predicted image generation step includes:
changing, for each pixel position, the coefficients of a filter that generates an integer-precision or fractional-precision interpolated image, for pixels that the moving region separation mask determines to belong to the moving region; and
encoding information on the changed filter coefficients in units of a sequence, an image, a slice, or a block.

The moving picture decoding method according to claim 10, wherein the mask generation step includes a step of correcting the generated moving region separation mask, by isolated point removal, connection of discontinuous points, enlargement or reduction of regions to rectangular blocks, edge correction, pixel compensation, pixel matching, or the like, based on a value derived from a weight that depends on the distance between a plurality of spatially or temporally neighboring pixels of the moving region separation mask and from the difference values of those pixels.

The moving picture decoding method according to claim 10, wherein the predicted image generation step includes: calculating the ratio of the moving region or the ratio of the background region in one or more moving region separation masks, taken at the same position as the block of the portion of the prediction target image or at a position derived from a local motion vector converted to integer precision; and switching the prediction method according to whether the ratio of the moving region or the ratio of the background region is larger or smaller than a predetermined value.

The moving picture decoding method according to any one of claims 10 to 14, wherein the predicted image generation step has:
a first prediction method that applies different prediction methods to the moving region and the background region based on the moving region separation mask, and a second prediction method that regards all values of the moving region separation mask within the block of the first portion of the prediction target image as the moving region and predicts the block with a single prediction method; and further includes a step of decoding information indicating which of the first and second prediction methods is used.

The moving picture decoding method according to any one of claims 10 to 12, wherein the predicted image generation step includes:
generating the predicted image signal, for pixels that the moving region separation mask determines to belong to the background region, by using pixel values obtained by interpolating the background image signal based on the global vector; and
generating the predicted image signal, for pixels determined to belong to the moving region, by using pixel values obtained by interpolating the reference image signal based on the local motion vector.

The moving picture decoding method according to any one of claims 10 to 13, wherein the predicted image generation step generates the predicted image signal using a prediction method in which the luminance and color difference components, or the individual color components, can use either the same moving region separation mask or different moving region separation masks.

A moving image encoding apparatus that divides an input image signal into a plurality of pixel blocks, performs prediction processing on each pixel block using a reference image signal, and encodes the difference signal between the input image signal and the predicted image signal, the apparatus comprising:
mask generating means for generating, for each reference image signal, a binary moving region separation mask indicating a moving region and a background region;
background image generating/updating means for generating or updating a single background image signal by comparing two or more reference image signals or according to the values of the binary moving region separation mask of each reference image signal; and
predicted image generating means for generating a predicted image signal by using the moving region separation mask to (1) perform motion compensation on a first portion of the prediction target image corresponding to the moving region and (2) supplement a second portion of the prediction target image corresponding to the background region with a signal obtained by interpolating the background image signal.

The moving picture coding apparatus according to claim 19, wherein the mask generating means includes means for determining the criterion for classifying a pixel of the reference image as belonging to the moving region or the background region according to a value derived from pixel difference values between two or more available reference images, or from pixel difference values within a single reference image.

The moving picture coding apparatus according to claim 19 or 20, further comprising: estimation means for estimating, between any one of two or more available reference images and the prediction target image, a global vector that corrects the inter-image change caused by a change in the imaging system; and encoding means for encoding information on the global vector in units of a sequence, an image, a slice, or a block;
wherein the mask generation means and the background image generation/update means consist of mask generation means that generates the moving region separation mask, and background image generation/update means that generates or updates the background image signal, using an image interpolated based on the estimated global vector.

The moving picture coding apparatus according to any one of claims 19 to 21, wherein the predicted image generation means includes: changing means for changing, for each pixel position, the coefficients of a filter that generates an integer-precision or fractional-precision interpolated image, for pixels that the moving region separation mask determines to belong to the moving region; and encoding means for encoding information on the changed filter coefficients in units of a sequence, an image, a slice, or a block.

The moving picture encoding apparatus according to any one of claims 19 to 22, wherein the mask generation means includes correcting means for correcting the generated moving region separation mask, by isolated point removal, connection of discontinuous points, enlargement or reduction of regions to rectangular blocks, edge correction, pixel compensation, pixel matching, or the like, based on a value derived from a weight that depends on the distance between a plurality of spatially or temporally neighboring pixels of the moving region separation mask and from the difference values of those pixels.

The moving picture coding apparatus according to claim 22 or 23, wherein the predicted image generation means includes:
calculating means for calculating the ratio of the moving region or the ratio of the background region in one or more moving region separation masks, taken at the same position as the block of the first portion of the prediction target image or at a position derived from a local motion vector converted to integer precision; and
switching means for switching the prediction method according to whether the ratio of the moving region or the ratio of the background region is larger or smaller than a predetermined value.

The moving picture coding apparatus according to any one of claims 19 to 23, wherein the predicted image generation means has:
a first prediction method that applies different prediction methods to the moving region and the background region based on the moving region separation mask, and a second prediction method that regards all values of the moving region separation mask within the block of the prediction target image as the moving region and predicts the block with a single prediction method; and further includes encoding means for encoding information indicating which of the first and second prediction methods is used.

The moving picture coding apparatus according to any one of claims 19 to 23, wherein the predicted image generation means includes:
generating means for generating the predicted image signal, for pixels that the moving region separation mask determines to belong to the background region, by using pixel values obtained by interpolating the background image signal based on the global vector; and
generating means for generating the predicted image signal, for pixels determined to belong to the moving region, by using pixel values obtained by interpolating the reference image signal based on the local motion vector.

The moving picture coding apparatus according to any one of claims 19 to 22, wherein the predicted image generation means generates the predicted image signal using a prediction method in which the luminance and color difference components, or the individual color components, can use either the same moving region separation mask or different moving region separation masks.

A moving image decoding apparatus that decodes, by a prescribed method, moving image encoded data obtained by encoding each frame of an input image signal in units of pixel blocks, the apparatus comprising:
mask generating means for generating, for each reference image signal, a binary moving region separation mask indicating a moving region and a background region, and background image generating/updating means for generating or updating a single background image signal by comparing two or more reference image signals or according to the values of the binary moving region separation mask of each reference image signal; and
predicted image signal generating means for generating a predicted image signal by using the moving region separation mask to (1) perform motion compensation on a first portion of the prediction target image corresponding to the moving region and (2) supplement a second portion of the prediction target image corresponding to the background region with a signal obtained by interpolating the background image signal.

The moving picture decoding apparatus according to claim 28, wherein the mask generation means includes means for determining the criterion for classifying a pixel as belonging to the moving region or the background region according to a value derived from pixel difference values between two or more available reference images, or from pixel difference values within a single reference image.

The moving picture decoding apparatus according to claim 28 or 29, further comprising: estimation means for estimating, between any one of two or more available reference image signals and the prediction target image, a global vector that corrects the inter-image change caused by a change in the imaging system; and encoding means for encoding information on the global vector in units of a sequence, an image, a slice, or a block;
wherein the mask generation means and the background image generation/update means consist of mask generation means that generates the moving region separation mask, and background image generation/update means that generates or updates the background image signal, using an image interpolated based on the estimated global vector.

The moving picture decoding apparatus according to any one of claims 28 to 30, wherein the predicted image generation means includes:
changing means for changing, for each pixel position, the coefficients of a filter that generates an integer-precision or fractional-precision interpolated image, for pixels that the moving region separation mask determines to belong to the moving region; and
encoding means for encoding information on the changed filter coefficients in units of a sequence, an image, a slice, or a block.

The moving picture decoding apparatus according to any one of claims 28 to 31, wherein the mask generation means includes correcting means for correcting the generated moving region separation mask, by isolated point removal, connection of discontinuous points, enlargement or reduction of regions to rectangular blocks, edge correction, pixel compensation, pixel matching, or the like, based on a value derived from a weight that depends on the distance between a plurality of spatially or temporally neighboring pixels of the moving region separation mask and from the difference values of those pixels.

The moving picture decoding apparatus according to any one of claims 28 to 32, wherein the predicted image generation means includes: calculating means for calculating the ratio of the moving region or the ratio of the background region in one or more moving region separation masks, taken at the same position as the block of the portion of the prediction target image or at a position derived from a local motion vector converted to integer precision; and switching means for switching the prediction method according to whether the ratio of the moving region or the ratio of the background region is larger or smaller than a predetermined value.

The moving picture decoding apparatus according to any one of claims 28 to 32, wherein the predicted image generation means has:
a first prediction method that applies different prediction methods to the moving region and the background region based on the moving region separation mask, and a second prediction method that regards all values of the moving region separation mask within the block of the first portion of the prediction target image as the moving region and predicts the block with a single prediction method; and further includes decoding means for decoding information indicating which of the first and second prediction methods is used.

The moving picture decoding apparatus according to any one of claims 28 to 30, wherein the predicted image generation means includes:
generating means for generating the predicted image signal, for pixels that the moving region separation mask determines to belong to the background region, by using pixel values obtained by interpolating the background image signal based on the global vector; and
generating means for generating the predicted image signal, for pixels determined to belong to the moving region, by using pixel values obtained by interpolating the reference image signal based on the local motion vector.

The moving picture decoding apparatus according to any one of claims 28 to 30, wherein the predicted image generation means generates the predicted image signal using a prediction method in which the luminance and color difference components, or the individual color components, can use either the same moving region separation mask or different moving region separation masks.