JP4960400B2

JP4960400B2 - Stereo image encoding method and stereo image decoding method

Info

Publication number: JP4960400B2
Application number: JP2009077279A
Authority: JP
Inventors: 玲子野田; 朋夫山影; 晋一郎古藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-03-26
Filing date: 2009-03-26
Publication date: 2012-06-27
Anticipated expiration: 2029-03-26
Also published as: JP2010232878A

Description

The present invention relates to an encoding method for efficiently predictively encoding stereo images, and to a decoding method for decoding the resulting encoded data.

A stereo (stereoscopic) image (hereinafter, "stereo image") is an image intended to create a visual sense of depth by displaying a left-eye image and a right-eye image so that each is seen by the corresponding eye. Consequently, displaying at the same resolution and frame rate requires twice as much image data as an ordinary planar image.

Existing transmission channels (for example, broadcasting, storage media, and wired or wireless communication lines) are designed to carry planar images and cannot send twice the amount of image data in parallel and in synchronization. For this reason, the doubled image data is reduced to 1/2 and sent over the existing transmission channel.

For example, the Stereo video information SEI (Supplemental Enhancement Information) message adopted in H.264/MPEG-4 AVC records, in the bitstream, additional information indicating which of the left and right images, transmitted alternately in frame units or field units, each one is.

In addition, US Patent Application Publication No. 2003/0223499 (Patent Document 1) discloses a stereo image format in which the left-eye image and the right-eye image are subsampled at grid positions that interpolate each other, squeezed in the horizontal direction, and arranged with the left-eye image in the left half of the screen and the right-eye image in the right half. This stereo image format is a single concatenated image of the left and right images that has the same number of pixels as the left-eye image or the right-eye image before subsampling (hereinafter, this image format is called the "side-by-side format").

As another example, there is a stereo image format in which the left-eye image and the right-eye image are subsampled at grid positions that interpolate each other and then merged pixel by pixel (hereinafter, this image format is called the "pixel interleave format"). Furthermore, there are various stereo image formats in which the left and right images are each subsampled to 1/2 the number of pixels in the spatial plane and then combined to give the same number of pixels as one image, such as a format in which the left-eye image and the right-eye image are each subsampled by 1/2 in the vertical direction, squeezed, and stacked one above the other.

US Patent Application Publication No. 2003/0223499

However, in the stereo image format described in the above patent document, parallax exists between the left-eye image and the right-eye image, so compression efficiency deteriorates if predictive encoding is performed on the single concatenated image without distinguishing between them. In addition, color bleeding occurs when a color signal containing parallax is encoded by an encoding scheme that includes an orthogonal transform.

The present invention was made in view of the above to solve these problems, and its object is to efficiently predictively encode a stereo image of the type in which left and right parallax images are concatenated into a single image.

To solve the problems described above and achieve the object, the present invention is characterized by comprising: a decoding step of decoding, from encoded data obtained by predictively encoding a picture in which a first parallax image and a second parallax image of a stereo image signal are combined, the first parallax image having pixels at positions of a first phase of a checkerboard pattern and the second parallax image having pixels at positions of a second phase of the checkerboard pattern, (A) a prediction residual for each parallax image, (B) prediction mode information for each parallax image, and (C) combination information indicating the arrangement of the first parallax image and the second parallax image on the picture; a prediction signal generation step of generating a prediction signal for each parallax image in accordance with the prediction mode information, by referring to pixel values on an already decoded parallax image and to pixel values interpolated from surrounding pixels at positions where no pixel exists on the already decoded parallax image; and a decoded image generation step of generating a decoded image signal by adding the prediction signal and the prediction residual in accordance with the combination information.

To solve the problems described above and achieve the object, the present invention is also characterized by comprising: a prediction signal generation step of generating, for a picture in which a first parallax image and a second parallax image of a stereo image signal are combined, the first parallax image having pixels at positions of a first phase of a checkerboard pattern and the second parallax image having pixels at positions of a second phase of the checkerboard pattern, a prediction signal for each parallax image in accordance with a prediction mode selected from a plurality of prediction modes, by referring to pixel values on an already encoded parallax image and to pixel values interpolated from surrounding pixels at positions where no pixel exists on the already encoded parallax image; a residual generation step of generating a prediction residual between each parallax image and its prediction signal; an encoding step of encoding (A) the prediction residual for each parallax image, (B) information on the selected prediction mode for each parallax image, and (C) combination information indicating the arrangement of the first parallax image and the second parallax image on the picture; a decoding step of decoding the prediction residual to generate a decoded prediction residual; and a decoded image generation step of generating a decoded image signal by adding the decoded prediction residual to the prediction signal on the basis of the information on the prediction mode and the combination information.

According to the stereo image encoding method and the stereo image decoding method of the present invention, a stereo image of the type in which left and right parallax images are concatenated into a single image can be efficiently predictively encoded.

FIG. 1 is a diagram illustrating an example of the functional configuration of an image encoding device 20 for moving image encoding according to the present embodiment.
FIG. 2 is a diagram illustrating an example of the detailed functional configuration of the predicted image generator 115.
FIG. 3 is a diagram illustrating the pixel interleave format.
FIG. 4 is a diagram illustrating squeezed stereo images.
FIG. 5 is a diagram illustrating an example of prediction processing for an interleave-format stereo image.
FIG. 6 is a diagram illustrating an example of prediction processing in the intra-frame predictor 1151 for a side-by-side-format stereo image.
FIG. 7 is a diagram illustrating an example (part 1) of prediction processing in the inter-frame predictor 1152 for a pixel-interleave-format stereo image.
FIG. 8 is a diagram illustrating an example (part 2) of prediction processing in the inter-frame predictor 1152 for a side-by-side-format stereo image.
FIG. 9 is a diagram illustrating an example of deblocking filter processing in the loop filter 113 for a pixel-interleave-format stereo image.
FIG. 10 is a diagram illustrating another example of deblocking filter processing in the loop filter 113, for a side-by-side-format stereo image squeezed in the horizontal direction.
FIG. 11 is a diagram illustrating the data structure of the information on the stereo image format contained in the encoded data of a stereo image.
FIG. 12 is a diagram illustrating the overall data structure of the encoded data of a stereo image.
FIG. 13 is a diagram illustrating the data structure of the information for each predetermined coding unit (slice) in the encoded data of a stereo image.
FIG. 14 is a diagram illustrating the data structure of the information for each predetermined coding unit (macroblock) in the encoded data of a stereo image.
FIG. 15 is a diagram illustrating a data structure containing the prediction mode information for each macroblock in the encoded data of a stereo image.
FIG. 16 is a diagram illustrating an example of the syntax for sub-macroblock prediction modes and motion vectors in the encoded data of a stereo image.
FIG. 17 is a flowchart explaining the image encoding method according to the present embodiment.
FIG. 18 is a flowchart showing part of the processing performed in step S103.
FIG. 19 is a diagram showing the image decoding device according to the present embodiment.
FIG. 20 is a diagram illustrating an example of the detailed functional configuration of the predicted image generator 204.
FIG. 21 is a flowchart explaining the image decoding method according to the present embodiment.
FIG. 22 is a flowchart showing part of the processing performed in step S201.

The present embodiment is described below with reference to the drawings. In the following embodiment, "sampling" and "subsampling" of pixels both refer to extracting a subset of pixels from a plurality of pixels.

[Embodiment]
FIG. 1 is a diagram illustrating an example of the functional configuration of an image encoding device 20 for moving image encoding according to the present embodiment. The image encoding device 20 compresses the input image signal 100 frame by frame to generate encoded data 117.

The image encoding device 20 includes an input frame buffer 118, a difference signal generator 101, an orthogonal transformer 104, a quantizer 106, an inverse quantizer 109, an inverse orthogonal transformer 110, a locally decoded image signal generator 111, a loop filter 113, a frame memory 114, a predicted image generator 115, and an entropy encoder 108.

The input image signal 100 is input, for example, in units of frames. The input frame buffer 118 temporarily stores the input image signal 100 and outputs it to the difference signal generator 101. The difference signal generator 101 takes the difference between the input image signal 100 and the predicted image signal 102 to generate a prediction error signal 103. The input image signal 100 from which the predicted image signal 102 is subtracted is a parallax-separated signal, that is, a signal separated into the two parallaxes, left and right.

The orthogonal transformer 104 performs an orthogonal transform on the prediction error signal 103. The orthogonal transform is, for example, the discrete cosine transform (hereinafter, "DCT"). The orthogonal transform yields orthogonal transform coefficient information 105, which is, for example, a set of DCT coefficients.

For each of the left and right parallax images, the orthogonal transformer 104 performs the orthogonal transform on each rectangle consisting of pixels of the same parallax image. This rectangle is, for example, a block or a macroblock.

For example, for one of the parallax images, the orthogonal transformer 104 generates by prediction the pixels at those pixel positions in the frame where no pixel exists, and performs the orthogonal transform on each rectangle containing the predicted pixels. The code data generated by the image encoding device 20 preferably includes codes for a predetermined number of pieces of orthogonal transform coefficient information, counted from the low-frequency side, out of the orthogonal transform coefficient information 105 generated by this transform. This removes the redundancy added by the pixels generated by prediction. For example, half of the orthogonal transform coefficient information may be discarded so that the number of pieces of orthogonal transform coefficient information 105 output equals the number of pixels before interpolation.
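The coefficient truncation described above can be sketched in one dimension. This is only an illustration under stated assumptions: an orthonormal DCT-II is used, a row of eight pixels stands in for the rectangle, and the helper names `dct_1d` and `keep_low_frequency` are hypothetical and do not come from the patent.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (assumed transform)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs

def keep_low_frequency(coeffs, count):
    """Keep only the `count` lowest-frequency coefficients."""
    return coeffs[:count]

# Four sampled pixels (10, 20, 30, 40) with the missing positions already
# filled in, giving eight pixels; only four coefficients are kept, which
# matches the pixel count before interpolation.
filled_row = [10, 15, 20, 25, 30, 35, 40, 40]
kept = keep_low_frequency(dct_1d(filled_row), len(filled_row) // 2)
```

Keeping the first half of the coefficients is one reading of "a predetermined number from the low-frequency side"; a two-dimensional block would need a corresponding rule for which coefficients count as low frequency.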

As another example, for one of the parallax images, the orthogonal transformer 104 squeezes out the pixel positions where no pixel exists and then performs the orthogonal transform on each group of pixels corresponding to pixel positions that, in a direction different from the squeeze direction, have the same phase within the picture.

More specifically, when squeezing is performed in the horizontal direction, a predetermined unit consisting of pixels belonging to odd lines and a predetermined unit consisting of pixels belonging to even lines are each used as a unit for the orthogonal transform. When squeezing is performed in the vertical direction, a predetermined unit consisting of pixels belonging to odd columns and a predetermined unit consisting of pixels belonging to even columns are each used as a unit for the orthogonal transform. In each predetermined unit, a plurality of pixels are arranged in a rectangle, and the unit contains, for example, the same number of pixels as a block or a macroblock.
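A small sketch of the horizontal case may help. It assumes one parallax image is held as rows in which unsampled positions are `None`, and that lines are indexed from 0; `squeeze_horizontal` and `split_by_line_parity` are hypothetical names used only for illustration.

```python
def squeeze_horizontal(view):
    """Drop the unsampled positions (None) from every row."""
    return [[p for p in row if p is not None] for row in view]

def split_by_line_parity(squeezed):
    """Group even-indexed and odd-indexed lines into separate transform units."""
    return squeezed[0::2], squeezed[1::2]

# Checkerboard-sampled view: even lines hold even columns, odd lines hold odd columns.
view = [
    [0, None, 2, None],
    [None, 11, None, 13],
    [20, None, 22, None],
    [None, 31, None, 33],
]
even_lines, odd_lines = split_by_line_parity(squeeze_horizontal(view))
```

After the squeeze, every pixel in `even_lines` came from the same horizontal phase, and likewise for `odd_lines`, so each group can be transformed without mixing phases. The vertical case is the same operation applied to columns.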

Because this orthogonal transform involves no pixel interpolation, there is no increase in redundancy from an increased pixel count. The generated code data therefore preferably includes codes for all of the orthogonal transform coefficient information 105 generated by this transform.

The quantizer 106 quantizes the orthogonal transform coefficient information 105 and outputs quantized orthogonal transform coefficient information 107. The quantized orthogonal transform coefficient information 107 is split into two paths. One path leads to the entropy encoder 108. On the other path, the inverse quantizer 109 and the inverse orthogonal transformer 110 successively apply the inverse of the processing of the quantizer 106 and the orthogonal transformer 104, yielding a signal similar to the prediction error signal; the locally decoded image signal generator 111 then adds this signal to the predicted image signal 102 to generate the locally decoded image signal 112.
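The local decoding loop can be sketched as follows. This is a simplified illustration, not the device itself: the orthogonal transform and its inverse are left out (treated as identity) to keep the example short, a uniform quantizer with a single step size is assumed, and all function names are hypothetical.

```python
def prediction_error(input_pixels, predicted_pixels):
    """Difference between the input signal and the predicted signal."""
    return [a - b for a, b in zip(input_pixels, predicted_pixels)]

def quantize(values, step):
    """Uniform quantization (assumed; the text does not specify the quantizer)."""
    return [int(round(v / step)) for v in values]

def dequantize(levels, step):
    """Inverse of the uniform quantizer above."""
    return [level * step for level in levels]

def locally_decode(levels, predicted_pixels, step):
    """Reconstruct the error signal and add the prediction back."""
    restored_error = dequantize(levels, step)
    return [p + e for p, e in zip(predicted_pixels, restored_error)]

input_pixels = [52, 55, 61, 66]
predicted_pixels = [50, 50, 60, 60]
error = prediction_error(input_pixels, predicted_pixels)
levels = quantize(error, step=3)            # sent on to entropy coding
local = locally_decode(levels, predicted_pixels, step=3)
```

The encoder predicts from `local` rather than from `input_pixels` so that it uses the same reference the decoder will reconstruct.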

The loop filter 113 filters the locally decoded image signal 112. The filtered locally decoded image signal 112 is stored in the frame memory 114. The filtering performed by the loop filter 113 is described later. The processing by the loop filter 113 may be omitted.

The predicted image generator 115 generates a predicted image signal based on prediction mode information from the input image signal 100 and the locally decoded image signal 112. The prediction mode information includes information indicating which of a plurality of prediction modes is used. The prediction modes are, for example, the multiple intra prediction modes of H.264/MPEG-4 AVC. They may also be, for example, the multiple inter-picture prediction modes of H.264/MPEG-4 AVC. The details of the predicted image generator 115 and the method of creating the predicted image signal are described later.

The generated predicted image signal 102 is output from the predicted image generator 115 together with information 116, which includes the motion vector information, prediction mode information, and stereo image format information for the predicted image signal.

Stereo image format information indicates how the pixels of each parallax image are arranged when image data containing a plurality of parallax images is stored in the data format used for image data consisting of a single parallax image. Stereo image formats include the interleave format and the side-by-side format.

In the entropy encoder 108, the quantized orthogonal transform coefficient information 107 and the information 116, which includes the motion vector information, prediction mode information, and stereo image format information, are entropy-encoded, and the resulting encoded data 117 is sent to a transmission system or storage system (not shown).

FIG. 2 is a diagram illustrating an example of the detailed functional configuration of the predicted image generator 115. The predicted image generator 115 includes a mode determiner 1150, an intra-frame predictor 1151, an inter-frame predictor 1152, a parallax image sampling position interpolator 1154, and a parallax image separator 1155.

The parallax image separator 1155 separates the locally decoded image signal 112 and the input image signal 100 each into signals containing only the pixels belonging to one parallax image, according to the sampling scheme and merge scheme of the stereo image being processed, and undoes the squeeze if squeezing has been applied.

The parallax image sampling position interpolator 1154 interpolates, in the locally decoded image signal 112 and the input image signal 100 separated per parallax image, the pixels at positions that were not sampled for each parallax. The interpolated input image signal 100 is output to the motion vector detector 1153, and the interpolated locally decoded image signal 112 is output to the intra-frame predictor 1151, the inter-frame predictor 1152, and the motion vector detector 1153.

The intra-frame predictor 1151 creates a predicted image signal based on intra-frame prediction from the locally decoded image signal 112, stored in the frame memory 114, of the region already encoded within the frame being processed.

The inter-frame predictor 1152 performs motion compensation on the reproduced image signal stored in the frame memory 114, based on the motion vector detected by the motion vector detector 1153, and creates a predicted image signal based on inter-frame prediction.

The intra-frame predictor 1151 processes M intra-frame prediction modes, and the inter-frame predictor 1152 processes N inter-frame prediction modes.

More specifically, the prediction processing in the intra-frame predictor 1151 and the inter-frame predictor 1152 is performed separately for each parallax image of the coding unit after separation or unsqueezing, and the predicted image signal is likewise output as a separated or unsqueezed signal for each parallax.

The mode determiner 1150 receives the output of the intra-frame predictor 1151 and the output of the inter-frame predictor 1152 and selects a prediction mode. The mode determiner 1150 outputs either a predicted image signal based on one prediction mode selected from the N inter-frame prediction modes or a predicted image signal based on one prediction mode selected from the M intra-frame prediction modes.

The mode determiner 1150 determines a prediction mode for each parallax image. The separated or unsqueezed predicted image signals are then merged or squeezed back into the original stereo image format, and the resulting predicted image signal 102 is finally output.
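Per-view mode selection might look like the sketch below. The text does not state the selection criterion, so the sum of absolute differences (SAD) used here is purely an assumption, as are the mode labels and function names.

```python
def sad(block, prediction):
    """Sum of absolute differences (assumed cost measure)."""
    return sum(abs(a - b) for a, b in zip(block, prediction))

def choose_mode(block, candidates):
    """Pick the candidate mode whose prediction has the lowest cost.

    candidates maps a mode label to its predicted pixel list.
    """
    return min(candidates, key=lambda mode: sad(block, candidates[mode]))

def choose_modes_per_view(views, candidates_per_view):
    """Select a mode independently for each parallax image."""
    return {
        name: choose_mode(pixels, candidates_per_view[name])
        for name, pixels in views.items()
    }

views = {"left": [10, 10, 12, 12], "right": [50, 40, 30, 20]}
candidates_per_view = {
    "left": {"intra_vertical": [10, 10, 12, 12], "inter": [0, 0, 0, 0]},
    "right": {"intra_vertical": [50, 50, 50, 50], "inter": [49, 41, 29, 21]},
}
modes = choose_modes_per_view(views, candidates_per_view)
```

The point of the sketch is only that the two parallax images may end up with different modes for the same coding unit.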

FIGS. 3 and 4 are diagrams showing examples of stereo image formats subsampled in a grid pattern. FIG. 3 is a diagram illustrating the pixel interleave format. In FIG. 3, the left-eye image L1 and the right-eye image R1 are subsampled in a grid pattern at positions whose phases differ from each other. The left-eye image L1 and the right-eye image R1 are interleaved pixel by pixel to form a single stereo image S1. In the stereo image S1, adjacent pixels always share the same vertical or horizontal phase but belong to different parallaxes. A pattern containing two sets of pixel positions, each subsampled in a grid at positions whose phases differ from each other, is called a checkerboard pattern. The stereo image S1 has pixels of the left-eye image at the positions of the first phase of the checkerboard pattern and pixels of the right-eye image at the positions of the second phase.
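The checkerboard arrangement can be illustrated with a short sketch. It assumes that the first phase means positions where row plus column is even and that the left-eye image occupies that phase; the text does not fix this mapping, and `interleave` and `separate` are hypothetical names.

```python
def interleave(left, right):
    """Build a checkerboard stereo image from two full-size views.

    Phase 0 ((row + col) even) takes left-eye pixels; phase 1 takes right-eye pixels.
    """
    return [
        [left[r][c] if (r + c) % 2 == 0 else right[r][c] for c in range(len(left[r]))]
        for r in range(len(left))
    ]

def separate(stereo, phase):
    """Extract one parallax image; positions of the other phase become None."""
    return [
        [pixel if (r + c) % 2 == phase else None for c, pixel in enumerate(row)]
        for r, row in enumerate(stereo)
    ]

left = [[1, 2, 3, 4], [5, 6, 7, 8]]
right = [[-1, -2, -3, -4], [-5, -6, -7, -8]]
stereo = interleave(left, right)
left_view = separate(stereo, phase=0)
right_view = separate(stereo, phase=1)
```

`separate` corresponds to the role of the parallax image separator for this format: each output contains only the pixels of one parallax image.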

FIG. 4 is a diagram illustrating squeezed stereo images. As in FIG. 3, the left-eye image L2 and the right-eye image R2 in FIG. 4 are subsampled in a grid pattern at positions whose phases differ from each other. In the stereo image S21 of FIG. 4, the pixels belonging to the left-eye image L2 are placed in the left half of the frame and the pixels belonging to the right-eye image R2 are placed in the right half. The stereo image S21 is a side-by-side-format stereo image. In the stereo image S21, pixels lined up in the horizontal direction have the same vertical phase but different horizontal phases.

In the stereo image S22, the pixels belonging to the right-eye image R2 are placed in the upper half of the frame and the pixels belonging to the left-eye image L2 are placed in the lower half. In the stereo image S22, pixels lined up in the vertical direction have the same horizontal phase but different vertical phases.
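The two squeezed layouts can be sketched as follows, assuming each checkerboard-sampled view is stored with `None` at unsampled positions and has an even number of rows and columns. The function names are hypothetical.

```python
def squeeze_rows(view):
    """Horizontal squeeze: drop unsampled positions within each row."""
    return [[p for p in row if p is not None] for row in view]

def squeeze_cols(view):
    """Vertical squeeze: drop unsampled positions within each column."""
    columns = [[p for p in col if p is not None] for col in zip(*view)]
    return [list(row) for row in zip(*columns)]

def pack_side_by_side(left_view, right_view):
    """Left-eye pixels in the left half, right-eye pixels in the right half (as in S21)."""
    return [l + r for l, r in zip(squeeze_rows(left_view), squeeze_rows(right_view))]

def pack_top_bottom(left_view, right_view):
    """Right-eye pixels in the upper half, left-eye pixels in the lower half (as in S22)."""
    return squeeze_cols(right_view) + squeeze_cols(left_view)

left_view = [[1, None, 3, None], [None, 6, None, 8]]
right_view = [[None, -2, None, -4], [-5, None, -7, None]]
s21 = pack_side_by_side(left_view, right_view)
s22 = pack_top_bottom(left_view, right_view)
```

Both packed frames have the same pixel count as a single full-size view, which is what allows them to travel over a channel intended for planar images.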

FIG. 5 is a diagram illustrating an example of prediction processing in the intra-frame predictor 1151 for an interleave-format stereo image. In FIG. 5, the rectangle indicates the block B to be predicted, and the circles indicate pixels contained in the parallax images. Inside block B only the left-eye pixels are shown, and the figure illustrates an example in which the left-eye pixels are predicted vertically from the adjacent pixels outside block B.

In the adjacent pixels, right-eye pixels are interleaved at every other pixel. For prediction, the reference image is an image in which the pixels at right-eye pixel positions have been interpolated from the adjacent left-eye pixels. For example, the value of an interpolated pixel may be the average of the two adjacent left-eye pixels. In FIG. 5, the circles with diagonal hatching are interpolated pixels.
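The averaging variant can be sketched for a single reference row. Here `is_own` marks which positions belong to the parallax image being predicted; when only one same-view neighbour exists (at the row ends), this sketch simply copies it, which is an assumption. The function name is hypothetical and no rounding rule is implied.

```python
def interpolate_reference_row(row, is_own):
    """Replace other-view positions with the mean of adjacent same-view pixels."""
    out = list(row)
    for i, own in enumerate(is_own):
        if own:
            continue
        neighbours = [
            row[j] for j in (i - 1, i + 1)
            if 0 <= j < len(row) and is_own[j]
        ]
        out[i] = sum(neighbours) / len(neighbours)
    return out

# Left-eye pixels at even positions; 99 marks right-eye pixels to be replaced.
row = [10, 99, 20, 99, 40]
is_own = [True, False, True, False, True]
reference = interpolate_reference_row(row, is_own)
```

The right-eye values (99) never influence the result, so the reference row contains only left-eye information.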

As another example, each pixel position may be interpolated by filtering six left-eye pixels adjacent to it on the left and right or above and below. More specifically, for example, three pixels on each of the left and right sides, six pixels in total, are selected for one pixel position. These six pixels may then be interpolated with the 6-tap filter used in H.264 for 1/2-pixel interpolation (1/32, -5/32, 20/32, 20/32, -5/32, 1/32).
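A sketch of the 6-tap variant follows. The tap weights are the ones quoted above; the integer rounding offset and the clipping to an 8-bit range mirror the usual H.264 half-sample computation and are assumptions here, since the text gives only the weights. `six_tap` is a hypothetical name.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # numerators of the /32 weights quoted in the text

def six_tap(pixels, max_value=255):
    """Interpolate the position between pixels[2] and pixels[3].

    pixels: six same-view pixels, three on each side of the missing position.
    """
    if len(pixels) != 6:
        raise ValueError("exactly six neighbouring pixels are required")
    total = sum(t * p for t, p in zip(TAPS, pixels))
    value = (total + 16) >> 5          # divide by 32 with rounding (assumed)
    return max(0, min(max_value, value))  # clip to the sample range (assumed 8-bit)

flat = six_tap([100, 100, 100, 100, 100, 100])
ramp = six_tap([10, 20, 30, 40, 50, 60])
edge = six_tap([0, 0, 255, 255, 0, 0])
```

On a flat region the filter returns the input value, on a linear ramp it returns the midpoint, and overshoot at a sharp edge is clipped.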

If the reference pixel position is at the edge of the picture and the adjacent pixels needed for interpolation are unavailable, then, for example, the outermost pixel value may be used as the value of pixels outside the picture, or pixels inside the picture may be mirror-copied to the outside, with the edge pixel as the point of symmetry, and used for interpolation. The block is then predicted using the reference pixels obtained by this interpolation.
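The two edge-handling options can be sketched as an index lookup. The function name `fetch` and the mode labels are hypothetical, and the mirror branch assumes the requested index lies less than one row length outside the picture.

```python
def fetch(row, index, mode="replicate"):
    """Return row[index], extending the row beyond its ends.

    replicate: repeat the outermost pixel.
    mirror: reflect about the edge pixel, so index -1 maps to 1 and len(row) maps to len(row) - 2.
    """
    n = len(row)
    if 0 <= index < n:
        return row[index]
    if mode == "replicate":
        return row[0] if index < 0 else row[-1]
    if mode == "mirror":
        reflected = -index if index < 0 else 2 * (n - 1) - index
        return row[reflected]
    raise ValueError("unknown mode: " + mode)

row = [1, 2, 3, 4]
```

For example, `fetch(row, -1)` yields the edge value 1, while `fetch(row, -1, "mirror")` yields 2, the pixel one step inside the edge.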

In FIG. 5, for vertical prediction, the value of the reference pixel adjacent to the top edge of the block is used directly as the predicted value, for example. Likewise, for the right-eye pixels in the block, the left-eye pixel positions occurring at every other reference pixel are interpolated from the adjacent right-eye pixels, and the result is used as the reference pixels to predict the right-eye pixel positions in the block.
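Vertical prediction for one parallax image inside an interleaved block can be sketched as below. It assumes the reference row has already been interpolated for that view, and that the block's top-left pixel sits at a position where row plus column is even; `predict_vertical` is a hypothetical name.

```python
def predict_vertical(reference_row, height, phase):
    """Copy each reference pixel straight down its column.

    Only positions belonging to the given checkerboard phase receive a
    predicted value; positions of the other view are left as None.
    """
    width = len(reference_row)
    return [
        [reference_row[c] if (r + c) % 2 == phase else None for c in range(width)]
        for r in range(height)
    ]

# Reference row for the left-eye view (interpolated at odd positions beforehand).
reference_row = [10, 15, 20, 25]
left_prediction = predict_vertical(reference_row, height=2, phase=0)
```

The right-eye pixels of the same block would be predicted with a second call using a reference row interpolated from right-eye pixels and `phase=1`.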

Here, the prediction method (prediction mode) may differ between the left-eye pixels and the right-eye pixels. An example of vertical prediction has been described, but any prediction technique may be used, provided that it predicts from a reference pixel row in which the pixels at unsampled positions have been interpolated from pixels belonging to the same parallax image. For example, H.264 intra prediction may be applied.

図６は、サイドバイサイド方式のステレオ画像に対するフレーム内予測器１１５１における予測処理の一例を示す図である。図６のｆ１は、サイドバイサイドに水平方向にスクイーズされた、一の視差画像の画素配置を示しており、図６のｆ２は、スクイーズされる前の、一の視差画像の画素配置を示している。 FIG. 6 is a diagram illustrating an example of prediction processing in the intra-frame predictor 1151 for a side-by-side stereo image. f1 in FIG. 6 shows the pixel arrangement of one parallax image squeezed horizontally for side-by-side packing, and f2 in FIG. 6 shows the pixel arrangement of that parallax image before squeezing.

図６で垂直方向に予測を行う例を示す。ｆ２において、一の視差画像に含まれる画素のを、白丸及び格子のハッチングを付した丸で示す。ｆ２において、サンプリングにより抽出されなかった画素位置Ｐ０１ないしＰ０４を、隣接するサンプリングされた画素Ｐ１０ないしＰ１３を用いて補間する。図６では、補間した画素にドットのハッチングを付す。そして、Ｐ０１からＰ０４及びＰ１０からＰ１４の参照画素列を用いて、垂直方向に参照画素値を予測値とすることで予測を行う。 FIG. 6 shows an example in which prediction is performed in the vertical direction. In f2, pixels included in one parallax image are indicated by white circles and circles with lattice hatching. At f2, pixel positions P01 through P04 that have not been extracted by sampling are interpolated using adjacent sampled pixels P10 through P13. In FIG. 6, the interpolated pixels are hatched with dots. Then, using the reference pixel columns P01 to P04 and P10 to P14, prediction is performed by setting the reference pixel value as a predicted value in the vertical direction.

また例えば水平方向に予測を行う場合は、垂直方向にはスクイーズが行われていないため、参照画素を補間せずそのまま予測に用いてもよい。また例えば、スクイーズ前にブロックの端から２画素離れている０行目、及び、２行目の参照画素（図中、白丸が示す画素）の位置を、例えば白丸の画素および１，３行目および左上の画素を用いて補間した画素に入れ替えて参照画素列として用いてもよい。 When predicting in the horizontal direction, for example, no squeezing has been performed in the vertical direction, so the reference pixels may be used for prediction as they are, without interpolation. Alternatively, for example, the reference pixels in rows 0 and 2 (the pixels shown as white circles in the figure), which before squeezing lie two pixels away from the edge of the block, may be replaced with pixels interpolated from, for example, the white-circle pixels, the pixels in rows 1 and 3, and the upper-left pixel, and the result used as the reference pixel column.

また、サイドバイサイドに垂直方向にスクイーズされたステレオ画像に対してフレーム内予測を行う場合は、ブロックの左に隣接する参照画素のスクイーズされる前の画素配置においてサンプリングされなかった画素位置を、隣接するサンプリングされた画素を用いて補間した参照画素列を用いることにより、水平方向の予測を行うことができる。 When intra-frame prediction is performed on a stereo image squeezed vertically for side-by-side packing, horizontal prediction can be performed by using a reference pixel column in which the pixel positions that were not sampled, in the pre-squeeze pixel arrangement of the reference pixels adjacent to the left of the block, are interpolated from the adjacent sampled pixels.

なお、予測は、水平方向や垂直方向の予測に限らず、スクイーズ前の画素配置においてサンプリングされなかった位置の画素を補間して予測に用いる手法であれば、どのような予測方法を用いてもよい。例えばＨ．２６４のイントラ予測などを適用してもよい。 The prediction is not limited to horizontal or vertical prediction; any prediction method may be used as long as it interpolates the pixels at positions that were not sampled in the pre-squeeze pixel arrangement and uses them for prediction. For example, H.264 intra prediction may be applied.

図７は、画素インタリーブ方式のステレオ画像に対するフレーム間予測器１１５２における予測処理の一例を示す図である。図７においてｆ３１及びｆ３２が参照画像であり、ｆ３０が符号化対象となる入力画像である。参照画像は左目画像ｆ３１と右目画像ｆ３２に分離され、サンプリングされなかった位置の画素が周辺画素を用いて補間されたものを用いる。ｆ３１及びｆ３２において、黒丸及び黒四角で示す画素が補間された画素である。 FIG. 7 is a diagram illustrating an example of a prediction process in the inter-frame predictor 1152 for a pixel interleaved stereo image. In FIG. 7, f31 and f32 are reference images, and f30 is an input image to be encoded. The reference image is separated into a left-eye image f31 and a right-eye image f32, and a pixel that has not been sampled is interpolated using surrounding pixels. In f31 and f32, pixels indicated by black circles and black squares are interpolated pixels.

入力画像ｆ３０の符号化単位であるブロックｂ３０内の、それぞれの視差画像に属する画素のみを用い、視差毎に分離され補間された参照画像に対して予測画像を得る。予測画像は、例えば、ブロックマッチングなどの手法を用いて視差画像ごとにそれぞれ動きベクトルを求め、その動きベクトルがそれぞれ指し示す位置の参照画像を、再び視差ごとに相補的に格子状にサンプリングし、サンプリングされなかった位置に異なる視差の画素をインタリーブすることで予測画像が得られる。 Using only the pixels belonging to each parallax image within block b30, which is the coding unit of the input image f30, a predicted image is obtained from the reference image that has been separated and interpolated for each parallax. For example, a motion vector is obtained for each parallax image by a technique such as block matching; the reference image at the position indicated by each motion vector is then sampled again in complementary grid patterns for each parallax, and the predicted image is obtained by interleaving the pixels of the other parallax into the positions that were not sampled.
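The separation and re-interleaving steps above can be sketched as below. Motion search and interpolation of the unsampled positions are deliberately left out. The function names and the `left_phase` parameter (which grid phase the left-eye pixels occupy) are assumptions made for this illustration.

```python
def split_checkerboard(frame, left_phase=0):
    """Separate a pixel-interleaved frame into left and right planes.

    Positions that do not belong to a view are left as None; a real
    encoder would fill them by interpolation from same-view neighbours.
    """
    left = [[None] * len(row) for row in frame]
    right = [[None] * len(row) for row in frame]
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if (x + y) % 2 == left_phase:
                left[y][x] = value
            else:
                right[y][x] = value
    return left, right


def interleave_checkerboard(left, right, left_phase=0):
    """Sample two full-resolution planes on complementary grids and merge."""
    return [
        [left[y][x] if (x + y) % 2 == left_phase else right[y][x]
         for x in range(len(left[y]))]
        for y in range(len(left))
    ]


if __name__ == "__main__":
    frame = [[1, 2, 3, 4], [5, 6, 7, 8]]
    l_plane, r_plane = split_checkerboard(frame)
    print(l_plane)
    print(interleave_checkerboard(l_plane, r_plane) == frame)
```

In this sketch the predicted block would be formed by calling `interleave_checkerboard` on the two motion-compensated, fully interpolated reference blocks, one per view.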

なお、動きベクトルの算出を、１／２画素精度や１／４画素精度で行う場合は、図７のｆ３１及びｆ３２のような、視差毎に分離され、サンプリングされなかった位置を周辺画素を用いて補完した参照画像において、１／２画素位置や１／４画素位置を補間した参照画像を用いる。この補間には、例えばＨ．２６４の補間フィルタなどを用いるとよい。 When motion vectors are calculated with 1/2-pixel or 1/4-pixel accuracy, the 1/2-pixel and 1/4-pixel positions are interpolated in a reference image such as f31 and f32 in FIG. 7, that is, one separated for each parallax with the unsampled positions interpolated from surrounding pixels, and the resulting reference image is used. For this interpolation, the H.264 interpolation filter may be used, for example.

図８は、サイドバイサイド方式のステレオ画像に対するフレーム間予測器１１５２における予測処理の図７とは異なる例を示す図である。図８のｆ４１が参照画像の一部（左目画像に対応する画素を図示）であり、ｆ４０が符号化対象となる入力画像の一部である。ｆ４１において、ドットのハッチングを付した丸が、補間画素である。 FIG. 8 is a diagram illustrating an example different from FIG. 7 of the prediction processing in the inter-frame predictor 1152 for the side-by-side stereo image. In FIG. 8, f41 is a part of the reference image (pixels corresponding to the left-eye image are shown), and f40 is a part of the input image to be encoded. In f41, a circle with dot hatching is an interpolation pixel.

参照画像ｆ４１は、スクイーズが解除され、サンプリングされなかった位置の画素が周辺画素を用いて補間されている。入力画像ｆ４０は、スクイーズが解除された状態の画素を示している。入力画像のある符号化単位に対して、それぞれの視差について補間された参照画像に対して、例えばブロックマッチングなどの手法を用いて視差画像ごとにそれぞれ動きベクトルを求め、その動きベクトルがそれぞれ指し示す位置の参照画像を、再び視差ごとに相補的に格子状にサンプリングし、スクイーズすることで予測画像が得られる。 In the reference image f41, squeezing is canceled, and pixels at positions where sampling is not performed are interpolated using peripheral pixels. The input image f40 shows pixels in a state where squeeze is released. For each encoding unit of the input image, for each reference image interpolated for each parallax, a motion vector is obtained for each parallax image using a technique such as block matching, and the position indicated by each motion vector The reference image is again sampled in a lattice pattern complementarily for each parallax and squeezed to obtain a predicted image.

なお、動きベクトルの算出を、１／２画素精度や１／４画素精度で行う場合は、ｆ４１のような、スクイーズが解除され、サンプリングされなかった位置を周辺画素を用いて補完した参照画像において、１／２画素位置や１／４画素位置を補間した参照画像を用いる。この補間は、例えばＨ．２６４の補間フィルタなどを用いるとよい。 When motion vectors are calculated with 1/2-pixel or 1/4-pixel accuracy, the 1/2-pixel and 1/4-pixel positions are interpolated in a reference image such as f41, that is, one in which the squeeze has been released and the unsampled positions have been interpolated from surrounding pixels, and the resulting reference image is used. For this interpolation, the H.264 interpolation filter may be used, for example.

次に、ループフィルタ１１３の詳細例について述べる。図９は、画素インタリーブ方式のステレオ画像に対するループフィルタ１１３におけるデブロッキングフィルタ処理の一例を示す図である。図９において、第１の視差画像の画素を丸で示し、第２の視差画像の画素を四角で示す。 Next, a detailed example of the loop filter 113 will be described. FIG. 9 is a diagram illustrating an example of deblocking filter processing in the loop filter 113 for a pixel interleaved stereo image. In FIG. 9, pixels of the first parallax image are indicated by circles, and pixels of the second parallax image are indicated by squares.

ループフィルタのうち、デブロッキングフィルタは、符号化単位間の境界に発生するブロック歪を軽減するためのフィルタである。視差画像が画素インタリーブされた状態でデブロッキングフィルタを施すと、一のフィルタ処理に用いられる画素に複数の視差が存在する場合に、ブロック歪を除去する効果が得られないことがある。そのため、垂直および水平方向の境界エッジに対して、１画素おきの画素を用いてフィルタ処理を行うことにより、視差間の影響をなくしてブロック歪を除去する効果が得られる。なお、デブロッキングフィルタには、例えば低域通過フィルタを用いるとよい。 Among the loop filters, the deblocking filter is a filter for reducing block distortion that occurs at the boundary between coding units. When the deblocking filter is applied in a state where the parallax images are interleaved with pixels, the effect of removing block distortion may not be obtained when a plurality of parallaxes exist in a pixel used for one filter process. Therefore, an effect of removing block distortion without an influence between parallaxes can be obtained by performing filter processing using every other pixel on the boundary edges in the vertical and horizontal directions. For example, a low-pass filter may be used as the deblocking filter.
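The every-other-pixel filtering described above might be sketched for one row crossing a vertical block edge as follows. The description only says that a low-pass filter may be used; the [1, 2, 1]/4 kernel, the choice of filtering two pixels on each side of the edge, and the function name are all assumptions made for this example.

```python
def deblock_row_same_view(row, edge, stride=2):
    """Smooth pixels next to a block edge using same-view neighbours only.

    `edge` is the index of the first pixel to the right of the boundary.
    With stride=2, each filtered pixel is combined only with the pixels
    two positions away, which belong to the same parallax image in a
    pixel-interleaved row.
    """
    out = list(row)
    for p in range(edge - stride, edge + stride):
        lo, hi = p - stride, p + stride
        if lo < 0 or hi >= len(row):
            continue  # not enough same-view neighbours; leave unfiltered
        out[p] = (row[lo] + 2 * row[p] + row[hi] + 2) >> 2
    return out


if __name__ == "__main__":
    # Even positions: one view with a step at the edge; odd positions: other view.
    print(deblock_row_same_view([10, 50, 10, 50, 20, 50, 20, 50], edge=4))
```

In the example input, the step between 10 and 20 in one view is softened while the constant 50 values of the other view are untouched, which is the point of filtering with a stride of two.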

図１０は、水平方向にスクイーズされたサイドバイサイド方式のステレオ画像に対するループフィルタ１１３におけるデブロッキングフィルタ処理の別の一例を示す図である。水平方向には同じ視差の画素が並んでいるため、そのままの画素列を用いてフィルタを施すが、垂直方向には、一行ごとに位相が１画素ずれた画素が並んでいるため、１画素おきの画素列を用いてフィルタ処理を行う。 FIG. 10 is a diagram illustrating another example of deblocking filter processing in the loop filter 113, for a side-by-side stereo image squeezed in the horizontal direction. Since pixels of the same parallax are lined up in the horizontal direction, the filter is applied to the pixel row as it is. In the vertical direction, however, the pixels are shifted in phase by one pixel from row to row, so the filter processing uses every other pixel.

また、垂直方向にスクイーズされている場合には、水平方向は１画素とばしの画素列を用いてフィルタ処理を行い、垂直方向はそのままの画素列を用いてフィルタ処理を行えばよい。なお、サイドバイサイド方式の場合、視差間の境界となるラインに対するフィルタ処理は行わないようにしてもよい。 In addition, when squeezed in the vertical direction, the filtering process may be performed using a pixel column with one pixel skipped in the horizontal direction and the pixel column as it is in the vertical direction. In the case of the side-by-side method, the filtering process may not be performed on the line that is a boundary between parallaxes.

ループフィルタの例として、符号化単位の境界に対するデブロッキングフィルタについて説明したが、別のフィルタ処理（例えば、入力画像に近づけるウィナーフィルタの処理）を行ってもよい。この場合、画素インタリーブ方式のステレオ画像については、視差ごとに分離した画素列あるいは画素ブロックに対してフィルタ処理を行い、サイドバイサイド方式のステレオ画像に対しては、スクイーズを解除した画素列あるいは画素ブロックに対してフィルタ処理を施すのが望ましい。何れのループフィルタについても、視差が同一、かつ、画素位置の縦又は横の位相が同一の画素からなる複数の画素によるフィルタリング処理を行うとよい。 The deblocking filter for coding-unit boundaries has been described as an example of the loop filter, but other filter processing (for example, a Wiener filter that brings the image closer to the input image) may be performed. In this case, for a pixel-interleaved stereo image, it is desirable to filter the pixel rows or pixel blocks separated for each parallax, and for a side-by-side stereo image, to filter the pixel rows or pixel blocks after the squeeze has been released. For any loop filter, the filtering should use a set of pixels that have the same parallax and the same vertical or horizontal pixel-position phase.

なお、ループフィルタ１１３は、デブロッキングフィルタに限らず、参照画像となる復号画像に対してフィルタ処理を行うフィルタであるとよい。ループフィルタ１１３によるフィルタ処理は、ブロック境界に限らず、復号化されたステレオ画像が有する画素に対して行われるとよい。 Note that the loop filter 113 is not limited to a deblocking filter, and may be a filter that performs a filter process on a decoded image serving as a reference image. The filter processing by the loop filter 113 is not limited to the block boundary, and may be performed on the pixels included in the decoded stereo image.

本実施の形態では、参照画像となる復号画像に対するフィルタ処理を行う、Ｉｎ−Ｌｏｏｐフィルタについて説明するが、参照画像とは別に、出力される復号画像に対するフィルタ処理を行う、ポストフィルタについても、ループフィルタ１１３と同様に、視差画像毎に属する画素に対して、画素位置の縦又は横の位相が同一の画素からなる複数の画素によるフィルタリング処理を行うとよい。 In this embodiment, an In-Loop filter that performs filter processing on a decoded image serving as a reference image will be described. However, a post filter that performs filter processing on an output decoded image separately from the reference image is also a loop. Similar to the filter 113, it is preferable to perform a filtering process on a pixel belonging to each parallax image by a plurality of pixels including pixels having the same vertical or horizontal phase of the pixel position.

次に、直交変換器１０４における直交変換方法の例を示す。視差画像が画素インタリーブされている場合には、複数の視差の画素が一の直交変換の単位に含まれることにより直交変換が効率的に行われない。また、サイドバイサイド形式の場合は、行あるいは列ごとに１画素位相がずれた状態になっているため、やはり直交変換が効率的に行われない。 Next, an example of the orthogonal transformation method in the orthogonal transformer 104 is shown. When parallax images are interleaved with pixels, orthogonal transformation is not efficiently performed because a plurality of parallax pixels are included in one orthogonal transformation unit. Further, in the case of the side-by-side format, since the phase of one pixel is shifted for each row or column, the orthogonal transformation is not efficiently performed.

そこで、直交変換を効率的に行う例を２つ示す。 Therefore, two examples of performing the orthogonal transform efficiently are shown.
１つめの例は、残差信号を視差画像ごとに分離、あるいはスクイーズを解除し、サンプリングされなかった位置を周辺画素位置から補間した信号に対して、直交変換を施す。この場合、変換係数が２倍に増え、情報量が増加するため、生成される変換係数のうち、例えば、低域側の半分の個数の変換係数のみをエントロピー符号化器１０８にて符号化するとよい。なお、低域側の係数は、量子化単位の左斜め上の領域に位置する係数である。 In the first example, the residual signal is separated for each parallax image, or the squeeze is released, and the orthogonal transform is applied to a signal in which the unsampled positions have been interpolated from the surrounding pixel positions. In this case the number of transform coefficients doubles and the amount of information increases, so the entropy encoder 108 may, for example, encode only the low-frequency half of the generated transform coefficients. The low-frequency coefficients are those located in the upper-left region of the quantization unit.

２つめの例は、画素インタリーブ方式のステレオ画像の場合に、残差信号を視差画像ごとに分離し、さらに偶数ラインおよび奇数ラインのブロックに分割してそれぞれのブロックに対して直交変換を施す。またサイドバイサイド形式の場合は、水平にスクイーズされているときは偶数ラインと奇数ラインのブロックに分割し、垂直にスクイーズされているときは偶数列と奇数列のブロックに分割し、それぞれに対して直交変換を施す。２つ目の例の場合は、変換係数の数は増えないので、出力される変換係数を全てエントロピー符号化器１０８にて符号化してもよい。 In the second example, in the case of a pixel interleaved stereo image, the residual signal is separated for each parallax image, further divided into even-numbered lines and odd-numbered lines, and orthogonal transformation is performed on each block. In the case of the side-by-side format, when horizontally squeezed, it is divided into even-numbered and odd-numbered blocks, and when vertically squeezed, it is divided into even-numbered and odd-numbered blocks and orthogonal to each other. Apply conversion. In the case of the second example, since the number of transform coefficients does not increase, all of the output transform coefficients may be encoded by the entropy encoder 108.
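For the pixel-interleaved case, the splitting in the second example can be illustrated as follows. The sketch only performs the split by view and line parity; the orthogonal transform itself would then be applied to each sub-block separately. The function name, the dictionary keys, and the `left_phase` parameter are assumptions for this example.

```python
def split_for_transform(residual, left_phase=0):
    """Split an interleaved residual block by view and by line parity.

    Returns a dict keyed by (view, line_parity); each value is a list of
    rows. Within each sub-block all samples share the same view and the
    same horizontal phase, and the total sample count is unchanged.
    """
    blocks = {(view, parity): []
              for view in ("left", "right") for parity in (0, 1)}
    for y, row in enumerate(residual):
        left_start = (left_phase + y) % 2      # column of first left sample
        right_start = 1 - left_start           # complementary phase
        blocks[("left", y % 2)].append(list(row[left_start::2]))
        blocks[("right", y % 2)].append(list(row[right_start::2]))
    return blocks


if __name__ == "__main__":
    residual = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    for key, sub in split_for_transform(residual).items():
        print(key, sub)
```

A 4x4 block yields four 2x2 sub-blocks, so the number of coefficients after transforming each sub-block equals the number of input samples, consistent with the statement that the coefficient count does not increase.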

予測画像作成器１１５から出力される、動きベクトル及び予測モード情報、すなわち動きベクトル検出器１１５３から出力される動きベクトルとモード判定器１１５０によって選択された予測モードを示す予測モード情報は、エントロピー符号化器１０８に送られてエントロピー符号化され、符号化データ１１７に多重化される。 The motion vector and prediction mode information output from the predicted image generator 115, that is, the motion vector output from the motion vector detector 1153 and the prediction mode information indicating the prediction mode selected by the mode determiner 1150 are entropy encoded. Sent to the device 108, entropy-coded, and multiplexed into the encoded data 117.

図１１から図１６は、ステレオ画像の形式および、動きベクトル情報および予測モード情報を多重化するデータ構造の例を示す図である。 FIGS. 11 to 16 are diagrams illustrating examples of the format of a stereo image and a data structure in which motion vector information and prediction mode information are multiplexed.

図１１は、ステレオ画像がどのような形式で空間方向にサブサンプリングされ、マージされているかを示す情報を多重化するデータ構造の例を示している。この情報は、シーケンス単位あるいはピクチャ単位に多重化される。 FIG. 11 shows an example of a data structure in which information indicating how stereo images are subsampled and merged in the spatial direction is multiplexed. This information is multiplexed in sequence units or picture units.

図１１において、ｓｐａｔｉａｌ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｄｉｒｅｃｔｉｏｎ＿ｉｎｆｏは、サンプリングの方向を示す情報であり、例えばこの値が０１の場合は水平方向、１０の場合は垂直方向、１１の場合は格子状にサブサンプリングされていることを示す。ｌｅｆｔ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｐｏｓｉｔｉｏｎ＿ｆｌａｇ，ｒｉｇｈｔ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｐｏｓｉｔｉｏｎ＿ｆｌａｇは、左右それぞれの視差画像がどの位相でサブサンプリングされているかを示している。例えば、水平方向にサブサンプリングされている場合、０ならばトップフィールド、１ならばボトムフィールドが用いられる。垂直方向にサブサンプリングされている場合、０ならば偶数列、１ならば奇数列が用いられる。格子状にサブサンプリングされている場合、０ならば左上の画素から順にサブサンプリングされ、１ならば左上の１画素目の画素からサブサンプリングされていることを示す。 In FIG. 11, spatial_subsampling_direction_info is information indicating the sampling direction; for example, a value of 01 indicates subsampling in the horizontal direction, 10 indicates subsampling in the vertical direction, and 11 indicates subsampling in a grid pattern. left_subsampling_position_flag and right_subsampling_position_flag indicate the phase at which the left and right parallax images, respectively, are subsampled. For example, with horizontal subsampling, 0 means the top field is used and 1 means the bottom field is used. With vertical subsampling, 0 means the even columns are used and 1 means the odd columns are used. With grid-pattern subsampling, 0 indicates that subsampling starts from the upper-left pixel, and 1 indicates that it starts from pixel 1 at the upper left (counting the upper-left pixel as pixel 0).

ｓｑｕｅｅｚｉｎｇ＿ｆｌａｇは、サンプリングされた視差画像がスクイーズされているか否かを示す。例えば０がスクイーズなしでマージされ、１がスクイーズしてマージされていることを示す。 squeezing_flag indicates whether or not the sampled parallax image is squeezed. For example, 0 indicates merging without squeezing, and 1 indicates squeezing and merging.

ｓｑｕｅｅｚｉｎｇ＿ｆｌａｇが０の場合、すなわちスクイーズされていない場合に、ｌｅｆｔ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｐｏｓｉｔｉｏｎ＿ｆｌａｇとｒｉｇｈｔ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｐｏｓｉｔｉｏｎ＿ｆｌａｇが同じ値のときは、どちらかの視差画像の位相をずらさないとサンプリングしなかった位置に画素を補填することができない。このため、どちらの視差画像をずらして補填するかを示すｓｈｉｆｔ＿ｖｉｅｗ＿ｆｌａｇの値を符号化データに含ませる。この値が０の場合は例えば右目画像の位置をずらして補填していることを示し、１の場合には例えば左目画像の位置をずらして補填していることを示す。 When squeezing_flag is 0, that is, when the image is not squeezed, and left_subsampling_position_flag and right_subsampling_position_flag have the same value, the positions that were not sampled cannot be filled with pixels unless the phase of one of the parallax images is shifted. For this reason, the value of shift_view_flag, which indicates which parallax image is shifted for filling, is included in the encoded data. For example, a value of 0 indicates that the right-eye image is shifted for filling, and a value of 1 indicates that the left-eye image is shifted for filling.

一方、ｓｑｕｅｅｚｉｎｇ＿ｆｌａｇが１の場合、すなわちスクイーズされている場合には、どの方向にスクイーズされているかを示すｓｑｕｅｅｚｉｎｇ＿ｄｉｒｅｃｔｉｏｎ＿ｆｌａｇがさらに付け加えられる。例えば、この値が０の場合は水平方向に、１の場合は垂直方向にスクイーズされていることを示す。 On the other hand, when squeezing_flag is 1, that is, when squeezed, squeezing_direction_flag indicating which direction is squeezed is further added. For example, when this value is 0, it indicates that it is squeezed in the horizontal direction, and when it is 1, it indicates that it is squeezed in the vertical direction.

さらに、ｌｅｆｔ＿ｉｍａｇｅ＿ｔｏｐｌｅｆｔ＿ｆｌａｇはどの方向に連結されているかを示すフラグであり、例えばこの値が０の場合は左目画像が左（水平方向にスクイーズされた場合）又は上（垂直方向にスクイーズされた場合）になるように連結され、１の場合は右目画像が左（水平方向にスクイーズされた場合）又は上（垂直方向にスクイーズされた場合）になるように連結されていることを示す。 Further, left_image_topleft_flag is a flag indicating in which direction it is connected. For example, when this value is 0, the left-eye image is left (when squeezed in the horizontal direction) or upward (when squeezed in the vertical direction). 1 indicates that the right-eye images are connected so as to be left (when squeezed in the horizontal direction) or above (when squeezed in the vertical direction).
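A rough sketch of how a decoder might read the stereo format fields described above is given here. FIG. 11 is not reproduced in this text, so the field order, the bit widths (2 bits for spatial_subsampling_direction_info, 1 bit for each flag), and the assumption that left_image_topleft_flag is present only when squeezing_flag is 1 are guesses made for illustration; `parse_stereo_format` itself is a hypothetical helper that takes a string of '0' and '1' characters.

```python
def parse_stereo_format(bits):
    """Parse the stereo format fields from a string of '0'/'1' characters.

    Field order and widths are ASSUMED for this sketch; they are not
    taken from FIG. 11.
    """
    pos = 0

    def read(n):
        nonlocal pos
        if pos + n > len(bits):
            raise ValueError("bitstream too short")
        value = int(bits[pos:pos + n], 2)
        pos += n
        return value

    info = {
        "spatial_subsampling_direction_info": read(2),
        "left_subsampling_position_flag": read(1),
        "right_subsampling_position_flag": read(1),
        "squeezing_flag": read(1),
    }
    if info["squeezing_flag"] == 0:
        # Same phase for both views: one view must be shifted to fill gaps.
        if (info["left_subsampling_position_flag"]
                == info["right_subsampling_position_flag"]):
            info["shift_view_flag"] = read(1)
    else:
        info["squeezing_direction_flag"] = read(1)
        info["left_image_topleft_flag"] = read(1)  # presence rule assumed
    return info


if __name__ == "__main__":
    print(parse_stereo_format("110001"))
    print(parse_stereo_format("0101101"))
```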

以上のような情報がビットストリームに多重化され、これらの情報からサンプリング方法が格子状である場合、サンプリング位相やスクイーズの有無などを用いて、前述の予測画像作成器１１５やループフィルタ１１３、直交変換器１０４における視差画像分離やスクイーズの解除処理、サンプリングされていない画素位置の補間処理などが行われる。 Information such as the above is multiplexed into the bit stream. When this information indicates that the sampling method is a grid pattern, the sampling phase, the presence or absence of squeezing, and so on are used to perform the parallax image separation, the squeeze release processing, the interpolation of unsampled pixel positions, and the like in the predicted image generator 115, the loop filter 113, and the orthogonal transformer 104 described above.

次に、動きベクトル情報および予測モード情報の多重化例について説明する。ここでは、例として、Ｈ．２６４と同様のフレーム内予測およびフレーム間予測を行った場合の動きベクトル情報と予測モード情報を多重化する際のデータ構造の例を示す。 Next, an example of multiplexing motion vector information and prediction mode information will be described. Here, as an example, a data structure is shown for multiplexing the motion vector information and prediction mode information when intra-frame prediction and inter-frame prediction similar to those of H.264 are performed.

図１２は符号化装置が出力する符号化データであるビットストリームのシンタックス構成を示す図である。このシンタクス構造例において、アクセスユニット（３０１）は、復号化処理の際に読み込まれる単位であり、この単位ごとに復号化処理が行われる。アクセスユニット（３０１）の内部には、処理の内容や符号化構造に応じて、ハイレベルシンタクス（３０２）、スライスレイヤシンタクス（３０５）などが詰め込まれている。ハイレベルシンタクス（３０２）には、スライス以上の上位レイヤのシンタクス情報が詰め込まれている。例えばこのレイヤにステレオ画像形式に関する情報が格納される。ハイレベルシンタクスは、シーケンスパラメータセットシンタクス（３０３）とピクチャパラメータセットシンタクス（３０４）を有する。 FIG. 12 is a diagram illustrating a syntax configuration of a bit stream that is encoded data output from the encoding apparatus. In this syntax structure example, the access unit (301) is a unit that is read in the decoding process, and the decoding process is performed for each unit. In the access unit (301), high-level syntax (302), slice layer syntax (305), and the like are packed according to the processing content and the coding structure. The high level syntax (302) is packed with syntax information of higher layers above the slice. For example, information regarding the stereo image format is stored in this layer. The high-level syntax has a sequence parameter set syntax (303) and a picture parameter set syntax (304).

スライスレイヤシンタクス（３０５）は、スライス毎に必要な情報が明記されている。スライスレイヤシンタクス（３０５）は、スライスヘッダシンタクス（３０６）とスライスデータシンタクス（３０７）から構成される。スライスデータシンタクス（３０７）は、スライス内に含まれるマクロブロックレイヤの復号に必要な情報が明記されたマクロブロックレイヤシンタクス（３０８）および格子状にサブサンプリングされたステレオ画像用のマルチブロックレイヤ（３０９）が含まれており、マクロブロック毎に必要とされる量子化パラメータの変更値やモード情報などが明記されている。 The slice layer syntax (305) specifies information necessary for each slice. The slice layer syntax (305) includes a slice header syntax (306) and a slice data syntax (307). The slice data syntax (307) includes a macroblock layer syntax (308) in which information necessary for decoding a macroblock layer included in the slice is specified, and a multi-block layer (309) for a stereo image subsampled in a grid pattern. ), And the change value of the quantization parameter and the mode information required for each macroblock are specified.

上述したシンタクスは復号化時に必要不可欠な構成要素であり、これらのシンタクス情報が欠けると復号化時に正しくデータを復元できなくなることがある。 The above-described syntax is an indispensable component at the time of decoding. If the syntax information is missing, there is a case where data cannot be correctly restored at the time of decoding.

図１３から図１６は、動きベクトル情報および予測モード情報を多重化するデータ構造の例を示す図である。 13 to 16 are diagrams illustrating examples of data structures for multiplexing motion vector information and prediction mode information.

図１３は、スライスデータのシンタックスの一例を示す図である。ｍｂ＿ｓｋｉｐ＿ｆｌａｇ、ｍａｃｒｏｂｌｏｃｋ＿ｌａｙｅｒ（）、及び、ｍａｃｒｏｂｌｏｃｋ＿ｌａｙｅｒ＿ｆｏｒ＿ｃｈｅｃｋｅｒｂｏａｒｄ（）が、あわせて、スライス内のマクロブロックの数と同数だけ、符号化データに含まれる。 FIG. 13 is a diagram illustrating an example of the syntax of slice data. The mb_skip_flag, macroblock_layer (), and macroblock_layer_for_checkerboard () are included in the encoded data in the same number as the number of macroblocks in the slice.

ｍｂ＿ｓｋｉｐ＿ｆｌａｇは、マクロブロックの復号化の際に必要な情報を一切明記せずともそれまでの符号化および復号化の情報から復号可能かどうかを示すフラグであり、ＴＲＵＥの場合はマクロブロックレイヤシンタクス以下の情報が多重されなくてもよい。ＦＡＬＳＥの場合には、まず対象画像のサンプリング方法を示すｓｐａｔｉａｌ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｄｉｒｅｃｔｉｏｎ＿ｉｎｆｏの値の判定を行う。 mb_skip_flag is a flag indicating whether or not decoding is possible from the previous encoding and decoding information without specifying any information necessary for decoding the macroblock. In the case of TRUE, the mb_skip_flag is below the macroblock layer syntax. The information may not be multiplexed. In the case of FALSE, first, the value of spatial_subsampling_direction_info indicating the sampling method of the target image is determined.

ｓｐａｔｉａｌ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｄｉｒｅｃｔｉｏｎ＿ｉｎｆｏの値が格子状にサブサンプリングされていることを示す値、例えば１１であった場合には、スライス内に含まれるマクロブロックの復号に必要な情報を明記したマクロブロックレイヤシンタクスｍａｃｒｏｂｌｏｃｋ＿ｌａｙｅｒ＿ｆｏｒ＿ｃｈｅｃｋｅｒｂｏａｒｄ（）がｅｎｄ＿ｏｆ＿ｓｌｉｃｅ＿ｆｌａｇが１となるまで、順次送信される。 When the value of spatial_subsampling_direction_info indicates grid-pattern subsampling, for example 11, the macroblock layer syntax macroblock_layer_for_checkerboard(), which specifies the information needed to decode the macroblocks in the slice, is transmitted sequentially until end_of_slice_flag becomes 1.

一方、ｓｐａｔｉａｌ＿ｓｕｂｓａｍｐｌｉｎｇ＿ｄｉｒｅｃｔｉｏｎ＿ｉｎｆｏが格子状にサブサンプリングされていることを示す値ではない場合には、例えばＨ．２６４で用いられているのと同じｍａｃｒｏｂｌｏｃｋ＿ｌａｙｅｒ（）がｅｎｄ＿ｏｆ＿ｓｌｉｃｅ＿ｆｌａｇが１となるまで、順次送信される。 On the other hand, when spatial_subsampling_direction_info does not indicate grid-pattern subsampling, macroblock_layer(), for example the same one used in H.264, is transmitted sequentially until end_of_slice_flag becomes 1.

ｅｎｄ＿ｏｆ＿ｓｌｉｃｅ＿ｆｌａｇは、スライス内に含まれるマクロブロックのシンタクスが全て送信されたかどうかを示すフラグを示しており、０の場合はまだ送信されていないマクロブロックシンタクスが存在することを示す。一方、１の場合は、スライス内のマクロブロックシンタクスが全て送信されたことを示す。 The end_of_slice_flag indicates a flag indicating whether or not all macroblock syntax included in the slice has been transmitted. When 0, the end_of_slice_flag indicates that there is a macroblock syntax that has not yet been transmitted. On the other hand, a case of 1 indicates that all macroblock syntaxes in the slice have been transmitted.

図１４は、格子状にサブサンプリングされたステレオ画像用のマクロブロックレイヤのシンタクスの一例を示す図である。まず、ｓｑｕｅｅｚｉｎｇ＿ｆｌａｇの値を判定し、スクイーズされていない場合はマクロブロックの内部に視差画像が２つ含まれているため、マクロブロックのタイプ（フレーム内予測か、フレーム間予測かなどを示す情報）を示すｍｂ＿ｔｙｐｅが２つ多重され、ｖｉｅｗＣｎｔが２に設定される。 FIG. 14 is a diagram illustrating an example of the syntax of the macroblock layer for a stereo image subsampled in a grid pattern. First, the value of squeezing_flag is checked. If the image is not squeezed, the macroblock contains two parallax images, so two mb_type values, each indicating the macroblock type (information such as whether intra-frame or inter-frame prediction is used), are multiplexed and viewCnt is set to 2.

一方、スクイーズされている場合には、視差画像は左か右かのどちらかしか含まれていないので、ｍｂ＿ｔｙｐｅが１つ多重され、ｖｉｅｗＣｎｔが１に設定される。 On the other hand, when squeezed, the parallax image includes only either the left or the right, so one mb_type is multiplexed and viewCnt is set to 1.

次に、ｖｉｅｗＣｎｔの個数分だけ、マクロブロック内の予測モードの情報が多重される。すなわち、マクロブロックがフレーム間予測符号化されていて、かつマクロブロック内がさらに４つのサブマクロブロックに分割して符号化されている場合（ｍｂ＿ｔｙｐｅ［ｖｉｅｗＩｄｘ］！＝ＩＮＴＲＡ＆＆ＮｕｍＭｂＰａｒｔ＝＝４）には、サブマクロブロックの数だけｓｕｂ＿ｍｂ＿ｐｒｅｄ（ｍｂ＿ｔｙｐｅ［ｖｉｅｗＩｄｘ］）が多重される。マクロブロックがフレーム内予測符号化されている場合や、サブマクロブロックの個数が４つでない場合には、ｍｂ＿ｐｒｅｄ（ｍｂ＿ｔｙｐｅ［ｖｉｅｗＩｄｘ］）が多重される。 Next, the prediction mode information for the macroblock is multiplexed viewCnt times. That is, when the macroblock is inter-frame prediction coded and is further divided into four sub-macroblocks for coding (mb_type[viewIdx] != INTRA && NumMbPart == 4), sub_mb_pred(mb_type[viewIdx]) is multiplexed once per sub-macroblock. When the macroblock is intra-frame prediction coded, or when the number of sub-macroblocks is not four, mb_pred(mb_type[viewIdx]) is multiplexed.
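The branching just described (viewCnt from squeezing_flag, then sub_mb_pred or mb_pred per view) can be summarised in a small sketch. It does not parse a real bitstream: the function name, the representation of each view as an `(mb_type, num_mb_part)` tuple, and the string labels such as "INTRA" and "P_8x8" are hypothetical and only illustrate the decision logic.

```python
def macroblock_pred_plan(squeezing_flag, views):
    """Return which prediction syntax is multiplexed for each view.

    `views` is a list of (mb_type, num_mb_part) tuples, one per view
    (hypothetical representation). Without squeezing a macroblock holds
    two views; with squeezing it holds one.
    """
    view_cnt = 1 if squeezing_flag else 2
    if len(views) < view_cnt:
        raise ValueError("need one (mb_type, num_mb_part) entry per view")
    plan = []
    for view_idx in range(view_cnt):
        mb_type, num_mb_part = views[view_idx]
        if mb_type != "INTRA" and num_mb_part == 4:
            plan.append(("sub_mb_pred", view_idx))
        else:
            plan.append(("mb_pred", view_idx))
    return plan


if __name__ == "__main__":
    print(macroblock_pred_plan(0, [("P_8x8", 4), ("INTRA", 1)]))
    print(macroblock_pred_plan(1, [("P_16x16", 1)]))
```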

図１５は、マクロブロックの予測モードに関する情報を示すシンタクスの一例を示す図である。このシンタクスの内部では、予測モードの予測を同じ視差画像の符号化済みのブロックから予測するかどうかを示すｕｓｅ＿ｓａｍｅｖｉｅｗ＿ｐｒｅｄｍｏｄｅ＿ｆｌａｇが多重される。この値が０の場合は、同じ視差画像の画素に対して用いた周辺ブロックの予測モードや動きベクトルから対象とするブロックの予測モードや動きベクトルを予測符号化することを示す。一方、この値が１の場合には、符号化済みの例えばサブサンプル前に同じ位置の異なる視差画像のブロックや周辺ブロックに対する予測モードや動きベクトルから対象とするブロックの予測モードや動きベクトルを予測符号化することを示す。 FIG. 15 is a diagram illustrating an example of syntax carrying information on the prediction mode of a macroblock. Within this syntax, use_sameview_predmode_flag, which indicates whether the prediction mode is predicted from an already-coded block of the same parallax image, is multiplexed. A value of 0 indicates that the prediction mode and motion vector of the target block are predictively coded from the prediction modes and motion vectors of neighboring blocks used for pixels of the same parallax image. A value of 1 indicates that they are predictively coded from the prediction modes and motion vectors of already-coded blocks of the other parallax image, for example the block at the same pre-subsampling position or its neighboring blocks.

予測モードの予測や、動きベクトルの予測方法については、例えばＨ．２６４と同様の手法を用いるとよい。例えば、ｍｂ＿ｔｙｐｅ［ｖｉｅｗＩｄｘ］がフレーム内予測を示すＩＮＴＲＡ＿ＮｘＭであった場合には、内部のイントラ予測の単位ブロックの個数分だけ、ｐｒｅｖ＿ｉｎｔｒａＮｘＭ＿ｐｒｅｄ＿ｍｏｄｅ＿ｆｌａｇ［ｌｕｍａＢｌｋＩｄｘ］が多重される。 For predicting the prediction mode and the motion vector, a method similar to that of H.264 may be used, for example. For example, when mb_type[viewIdx] is INTRA_NxM, which indicates intra-frame prediction, prev_intraNxM_pred_mode_flag[lumaBlkIdx] is multiplexed once for each intra prediction unit block in the macroblock.

ｐｒｅｖ＿ｉｎｔｒａＮｘＭ＿ｐｒｅｄ＿ｍｏｄｅ＿ｆｌａｇ［ｌｕｍａＢｌｋＩｄｘ］は、符号化済みの同じ視差画像に属する左の画素ブロック（以下、「左ブロック」という。）の予測モード、又は、異なる視差画像の同じ位置の画素ブロックの予測モードと同じ予測モードを使うかどうかを示すフラグである。 prev_intraNxM_pred_mode_flag [lumaBlkIdx] is the same prediction mode as the prediction mode of the left pixel block (hereinafter referred to as “left block”) belonging to the same encoded parallax image or the pixel block at the same position of a different parallax image. A flag that indicates whether to use the mode.

このフラグが１、すなわち同じ予測モードを使わない場合には、左ブロックの予測モードと符号化単位の予測モードが異なるときに、その差分値がｒｅｍ＿ｉｎｔｒａＮｘＭ＿ｐｒｅｄ＿ｍｏｄｅとして多重化される。また、入力画像がカラー信号の場合、例えばＹＵＶ４２０などであった場合には、色差信号に対するフレーム内予測モードを示すｉｎｔｒａ＿ｃｈｒｏｍａ＿ｐｒｅｄが別途送信される。 When this flag is 1, that is, when the same prediction mode is not used, when the prediction mode of the left block is different from the prediction mode of the coding unit, the difference value is multiplexed as rem_intraNxM_pred_mode. Further, when the input image is a color signal, for example, YUV420, intra_chroma_pred indicating an intra-frame prediction mode for the color difference signal is separately transmitted.

なお、同じ視差画像に属する左の画素ブロックが存在しない場合は、上のブロックの予測モードを用い、左も上も存在しない場合はあらかじめ決められた予測モードを用いるとよい。 When there is no left pixel block belonging to the same parallax image, the prediction mode of the upper block is used, and when neither left nor upper is present, a predetermined prediction mode may be used.

一方、ｍｂ＿ｔｙｐｅがフレーム間予測を示している場合には、内部のサブブロックの個数分だけ対応するサブブロックの動きベクトルの予測動きベクトルからの差分が多重される。ただし、動きベクトルの差分情報が０である場合は、サブマクロブロックの０番目のマクロブロックタイプをＤＩＲＥＣＴとして以下の動きベクトルの差分に関する情報は送信されない（ＭｂＰａｒｔＰｒｅｄＭｏｄｅ（ｍｂ＿ｔｙｐｅ，０）＝＝Ｄｉｒｅｃｔの場合に相当する）。 On the other hand, when mb_type indicates inter-frame prediction, the difference between the motion vector of each sub-block and its predicted motion vector is multiplexed once for each sub-block in the macroblock. However, when the motion vector difference information is 0, the macroblock type of sub-macroblock 0 is set to DIRECT and the motion vector difference information that would follow is not transmitted (this corresponds to the case MbPartPredMode(mb_type, 0) == Direct).

サブマクロブロックがＤＩＲＥＣＴでない場合には動きベクトルの差分情報がサブマクロブロックごとに送信される。例えば、Ｈ．２６４では、フレーム間予測は２つの参照画像（Ｌ０、Ｌ１）のどちらか又は双方からブロック参照することができる。しかし、サブマクロブロックのタイプがＰｒｅｄ＿Ｌ１すなわち参照画像Ｌ１からのみの予測でない場合は、参照画像Ｌ０に対する動きベクトルの、周辺ブロックから予測された動きベクトルとの差分が水平、垂直方向それぞれで多重される（ｍｖｄ＿ｌ０［ｍｂＰａｒｔＩｄｘ］［０］［ｃｏｍｐＩｄｘ］）。 When the sub-macroblock is not DIRECT, motion vector difference information is transmitted for each sub-macroblock. For example, in H.264, inter-frame prediction can reference blocks from either or both of two reference images (L0, L1). When the sub-macroblock type is not Pred_L1, that is, not prediction from reference image L1 only, the difference between the motion vector for reference image L0 and the motion vector predicted from neighboring blocks is multiplexed for each of the horizontal and vertical components (mvd_l0[mbPartIdx][0][compIdx]).

次に、例えばサブマクロブロックのタイプがＰｒｅｄ＿Ｌ０すなわち参照画像Ｌ０からのみの予測でない場合は、参照画像Ｌ１に対する動きベクトルの、周辺ブロックから予測された動きベクトルとの差分が水平、垂直方向それぞれで多重される（ｍｖｄ＿ｌ１［ｍｂＰａｒｔＩｄｘ］［０］［ｃｏｍｐＩｄｘ］）。 Next, for example, when the sub-macroblock type is not Pred_L0, that is, not prediction from reference image L0 only, the difference between the motion vector for reference image L1 and the motion vector predicted from neighboring blocks is multiplexed for each of the horizontal and vertical components (mvd_l1[mbPartIdx][0][compIdx]).

図１６は、サブマクロブロックの予測モードおよび動きベクトルに対するシンタクスの一例を示す図である。まず、４つのサブブロックのマクロブロックタイプが順に送信され、続いて４つのサブブロックそれぞれに対して、予測モードの予測を同じ視差画像の符号化済みのブロックから予測するかどうかを示すｕｓｅ＿ｓａｍｅｖｉｅｗ＿ｐｒｅｄｍｏｄｅ＿ｆｌａｇが多重される。次に、サブブロック内のサブパーティションに対する動きベクトル情報を示すシンタクスがサブパーティションごとにそれぞれ多重されるが、多重化に、例えば図１５の動きベクトル情報を多重化する際のデータ構造と同様のデータ構造を用いることで実現可能である。 FIG. 16 is a diagram illustrating an example of syntax for the prediction modes and motion vectors of sub-macroblocks. First, the macroblock types of the four sub-blocks are transmitted in order. Then, for each of the four sub-blocks, use_sameview_predmode_flag, which indicates whether the prediction mode is predicted from an already-coded block of the same parallax image, is multiplexed. Next, syntax indicating the motion vector information for the sub-partitions within each sub-block is multiplexed for each sub-partition; this can be realized, for example, with the same data structure used to multiplex the motion vector information in FIG. 15.

FIGS. 17 and 18 are flowcharts illustrating the image encoding method according to the present embodiment. The process of FIG. 17 is executed by, for example, the image encoding device 20.

In step S101 of FIG. 17, the parallax image sampling position interpolator 1154 generates interpolated pixels for prediction. These interpolated pixels serve as the prediction signal. Interpolated pixels are generated separately for each parallax image: those of the left-eye image are generated from pixels of the left-eye image, and those of the right-eye image are generated from pixels of the right-eye image.
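The per-view interpolation of step S101 can be illustrated with the following Python sketch. It assumes a checkerboard-merged picture in which one view owns the positions where (x + y) % 2 equals a given phase, and it fills each missing position with the rounded average of its available horizontal and vertical neighbors, all of which belong to the same view. The averaging filter and the function name are assumptions; the text does not specify the interpolation filter.

```python
def interpolate_view(picture, phase):
    """Build a full-resolution view from one view's checkerboard samples.

    picture: list of rows of integer pixel values (merged checkerboard picture)
    phase:   0 or 1; the view owns the pixels where (x + y) % 2 == phase
    """
    height, width = len(picture), len(picture[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (x + y) % 2 == phase:
                # Sampled position: copy the pixel of this view.
                out[y][x] = picture[y][x]
                continue
            # Unsampled position: its 4-neighbors all belong to this view.
            neighbors = [
                picture[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            if neighbors:
                out[y][x] = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return out
```

Because the neighbors of an unsampled position always have the opposite parity, no pixel of the other view contributes to the interpolated value.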

In step S102, the intra-frame predictor 1151 or the inter-frame predictor 1152 generates a prediction residual, which is the difference between the prediction signal and the pixels being processed. In the inter-frame predictor 1152, the prediction residual is generated after motion vector detection for motion compensation has been performed.

In step S103, the orthogonal transformer 104 orthogonally transforms the prediction residual to generate transform coefficients, the quantizer 106 quantizes the transform coefficients, and the entropy encoder 108 entropy-encodes the quantized transform coefficients. In step S103, the entropy encoder 108 also encodes the accompanying information, which includes prediction mode information and stereo image format information. When inter-frame prediction has been performed, motion vector information is encoded as well.

In step S104, the inverse quantizer 109 inversely quantizes the quantized transform coefficients, and the inverse orthogonal transformer 110 performs an inverse orthogonal transform on the inversely quantized transform coefficients. A decoded prediction residual signal is thereby obtained.

In step S105, the local decoded image signal generator 111 generates a locally decoded image from the decoded prediction residual signal and pixels that have already been decoded. The locally decoded image takes the stereo image format specified by the stereo image format information. In step S105, the loop filter 113 may additionally filter the locally decoded image.

In step S106, the frame memory 114 stores the locally decoded image.
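Steps S102 to S105 can be illustrated for a single four-sample block with the following Python sketch. It is a simplification under stated assumptions: a 4-point Hadamard transform stands in for the orthogonal transform, a uniform round-to-nearest quantizer stands in for the quantizer 106, pixel values are clipped to 8 bits, and entropy coding and the loop filter are omitted. All function names are hypothetical.

```python
def hadamard4(v):
    """4-point Hadamard transform; applying it twice scales the input by 4."""
    a, b, c, d = v
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]


def quantize(coeffs, qstep):
    # Uniform round-to-nearest quantizer (assumed; the text does not fix one).
    return [(abs(c) + qstep // 2) // qstep * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels, qstep):
    return [level * qstep for level in levels]


def encode_block(pixels, prediction, qstep):
    """Return (quantized levels, locally decoded pixels) for one block."""
    # S102: prediction residual.
    residual = [p - q for p, q in zip(pixels, prediction)]
    # S103: orthogonal transform and quantization (entropy coding omitted).
    levels = quantize(hadamard4(residual), qstep)
    # S104: inverse quantization and inverse transform (divide by 4, rounded).
    decoded_residual = [(v + 2) // 4 for v in hadamard4(dequantize(levels, qstep))]
    # S105: local decoded image = prediction + decoded residual, clipped to 8 bits.
    local_decoded = [min(max(q + r, 0), 255) for q, r in zip(prediction, decoded_residual)]
    return levels, local_decoded
```

The point of the sketch is that the encoder reconstructs the block from the quantized data, not from the original pixels, so that the frame memory holds the same reference the decoder will have.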

FIG. 18 is a flowchart showing part of the processing performed in step S103, namely the processing related to the prediction residual. In step S131 of FIG. 18, the orthogonal transformer 104 orthogonally transforms the prediction residual for each predetermined range in the picture. The predetermined range is, for example, a rectangle such as a block or a macroblock. The orthogonal transform coefficient information 105 is thereby generated.

In step S132, the entropy encoder 108 converts the orthogonal transform coefficient information 105 generated in step S131 into entropy codes and multiplexes them into the encoded data. Prior to entropy encoding, the orthogonal transform coefficient information 105 may be quantized by the quantizer 106. The motion vector information, prediction mode information, and stereo image format information may also be multiplexed into the encoded data.

(Decoder)
FIG. 19 is a diagram showing the image decoding device according to the present embodiment. The image decoding device 30 of FIG. 19 decodes input encoded data and outputs a decoded image. The encoded data processed by the image decoding device 30 is, for example, encoded data generated by the image encoding device 20.

The image decoding device 30 includes an entropy decoder 200, an inverse quantizer 109, an inverse orthogonal transformer 110, a local decoded image signal generator 202, a loop filter 113, a frame memory 114, and a predicted image generator 204.

The entropy decoder 200 decodes the encoded data 117 by the reverse of the entropy encoding procedure. This yields the quantized orthogonal transform coefficient information 107, the information 116 (which includes motion vector information, prediction mode information, and stereo image format information), and orthogonal transform information. Based on the orthogonal transform information, the inverse quantizer 109 and the inverse orthogonal transformer 110 sequentially apply to the quantized orthogonal transform coefficient information 107 the inverse of the processing of the quantizer 106 and the orthogonal transformer 104, and generate the residual signal 201.

The predicted image generator 204 receives the information 116, which includes motion vector information, prediction mode information, and stereo image format information. It generates the predicted image signal 102 based on the decoded image signal 203 and the information 116.

The local decoded image signal generator 202 adds the residual signal 201 and the predicted image signal 102. The sum is output as the decoded image signal 203 and stored in the frame memory 114. The sum may be filtered by the loop filter 113 in the same manner as by the loop filter 113 in the encoder. The decoded image signal 203 stored in the frame memory 114 is rearranged into display order and output.

FIG. 20 is a diagram illustrating an example of the detailed functional configuration of the predicted image generator 204. The predicted image generator 204 generates the predicted image signal 102 for each parallax image from the input decoded image signal 203. The predicted image generator 204 includes a parallax image separator 2034, a parallax image sampling position interpolator 2033, a switcher 2032, an intra-frame predictor 2030, and an inter-frame predictor 2031.

The parallax image separator 2034 separates the local decoded image signal 112 into two parallax images. The separation is based on, for example, the stereo image format information multiplexed into the encoded data 117 with the syntax shown in FIG. 11. The parallax image separator 2034 also undoes the squeeze of any image that has been squeezed.
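The separation can be illustrated with the following Python sketch for the checkerboard case. It assumes that the first view owns the positions where (x + y) % 2 == 0, and it treats "squeezed" as meaning that each view's samples are packed into half-width rows; both points, along with the function names, are assumptions made for illustration.

```python
def separate_views(picture):
    """Split a checkerboard-merged picture into two half-width (packed) views."""
    first, second = [], []
    for y, row in enumerate(picture):
        first.append([v for x, v in enumerate(row) if (x + y) % 2 == 0])
        second.append([v for x, v in enumerate(row) if (x + y) % 2 == 1])
    return first, second


def unsqueeze(view, phase, width, fill=None):
    """Place packed samples back at their checkerboard positions.

    Positions that the view does not own are set to `fill`; they are the
    unsampled positions that a later interpolation stage would fill in.
    """
    out = []
    for y, row in enumerate(view):
        samples = iter(row)
        out.append([
            next(samples) if (x + y) % 2 == phase else fill
            for x in range(width)
        ])
    return out
```

In practice the arrangement would be read from the stereo image format information rather than hard-coded as it is here.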

The parallax image sampling position interpolator 2033 interpolates, for each parallax image, the pixels at unsampled pixel positions. The interpolation is based on the stereo image format information multiplexed into the encoded data 117 with the syntax shown in FIG. 11. The interpolated parallax image is routed by the switcher 2032 to the intra-frame predictor 2030 or the inter-frame predictor 2031.

The switcher 2032 selects either the intra-frame predictor 2030 or the inter-frame predictor 2031 based on, for example, the prediction mode information multiplexed into the encoded data 117 with the syntax shown in FIGS. 13 to 16 and the stereo image format information.

The intra-frame predictor 2030 performs intra-frame prediction using as the reference image the local decoded image signal 112 that has been separated into parallax images, or has had its squeeze undone, and in which the pixels at unsampled positions have been interpolated. The intra-frame prediction performed by the intra-frame predictor 2030 is the same as that of the intra-frame predictor 1151 of the image encoding device 20. The predicted image signal 102 based on intra-frame prediction is thereby generated.

The inter-frame predictor 2031 applies motion compensation to the reproduced image signal in the frame memory using, for example, the motion vectors multiplexed into the encoded data 117 with the syntax of FIG. 14 or FIG. 15, and generates a predicted image signal based on inter-frame prediction. The motion-compensated inter-frame prediction of the inter-frame predictor 2031 is the same as that of the inter-frame predictor 1152 of the image encoding device 20.

FIGS. 21 and 22 are flowcharts illustrating the image decoding method according to the present embodiment. The process of FIG. 21 is executed by, for example, the image decoding device 30.

In step S201 of FIG. 21, the entropy decoder 200 entropy-decodes the encoded data 117 to obtain the quantized orthogonal transform coefficient information 107, the information 116 (which includes motion vector information, prediction mode information, and stereo image format information), and orthogonal transform information.

Also in step S201, the inverse quantizer 109 inversely quantizes the quantized orthogonal transform coefficient information 107 to generate orthogonal transform coefficient information, and the inverse orthogonal transformer 110 performs an inverse orthogonal transform on the orthogonal transform coefficient information and outputs the residual signal 201. In step S202, the predicted image generator 204 generates the prediction signal 102 from the decoded image signal 203 in accordance with the prediction mode information.

In step S203, the local decoded image signal generator 202 generates the decoded image signal 203 from the prediction signal 102 and the residual signal 201. In step S203, the loop filter 113 may additionally filter the decoded image signal 203.

In step S204, the decoded image signal 203 is stored in the frame memory 114. The decoded image signal 203 stored in the frame memory 114 is output frame by frame in display order.
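Steps S203 and S204 can be illustrated with the following Python sketch. The 8-bit clipping, the class FrameMemory, and the use of an integer display index to order frames are assumptions made for illustration; the text states only that the prediction signal and residual are combined and that stored frames are output in display order.

```python
def reconstruct(prediction, residual, max_value=255):
    """S203: decoded samples = prediction + residual, clipped to the valid range."""
    return [min(max(p + r, 0), max_value) for p, r in zip(prediction, residual)]


class FrameMemory:
    """S204: holds decoded frames and returns them in display order."""

    def __init__(self):
        self._frames = {}

    def store(self, display_index, frame):
        # Frames may arrive in decoding order, which can differ from display order.
        self._frames[display_index] = frame

    def output_in_display_order(self):
        return [self._frames[index] for index in sorted(self._frames)]
```

A decoder built this way would call reconstruct for each block, store the completed frame under its display index, and read the frames back with output_in_display_order.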

FIG. 22 is a flowchart showing part of the processing performed in step S201. In step S211 of FIG. 22, the entropy decoder 200 decodes the encoded data 117 to obtain the transform coefficients of the prediction residual. If the transform coefficients are quantized, the inverse quantizer 109 also performs inverse quantization in step S211. The orthogonal transform coefficient information is thereby output.

In step S212, the inverse orthogonal transformer 110 performs an inverse orthogonal transform on the orthogonal transform coefficient information and outputs the residual signal 201.

According to the embodiments described above, prediction, filtering, and orthogonal transformation are performed separately for each parallax image of a stereo image that has been subsampled and merged in a lattice pattern. This makes the prediction, filtering, and orthogonal transformation effective, because the influence of parallax and the phase shift caused by squeezing are excluded, and encoding efficiency can therefore be improved.
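The benefit of filtering each parallax image separately can be illustrated with the following Python sketch. In a checkerboard-merged row, pixels of the same view are two positions apart, so the filter taps are taken at x - 2, x, and x + 2. The [1, 2, 1] / 4 smoothing taps, the edge handling, and the function name are assumptions; the text does not specify the filter.

```python
def filter_row_same_view(row):
    """Smooth a checkerboard-merged row using only pixels of the same view.

    Taps are two positions apart so that left-eye pixels are filtered only
    with left-eye pixels and right-eye pixels only with right-eye pixels.
    At the row edges the missing tap is replaced by the center pixel.
    """
    n = len(row)
    out = []
    for x in range(n):
        left = row[x - 2] if x - 2 >= 0 else row[x]
        right = row[x + 2] if x + 2 < n else row[x]
        out.append((left + 2 * row[x] + right + 2) // 4)
    return out
```

For a row such as [10, 200, 10, 200, 10, 200], in which the two views have very different values, this filter leaves every pixel unchanged, whereas a filter over adjacent pixels would mix the two views.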

(Implementation by a computer or the like)
The stereo image encoding device and the stereo image decoding device according to the present embodiment may be implemented by, for example, a personal computer (PC). In that case, the processing of the stereo image encoding device and the stereo image decoding device according to the embodiment of the present invention is executed by, for example, a CPU operating in accordance with a program stored in a ROM, a hard disk device, or the like, using a main memory such as a RAM as a work area.

The present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiments. For example, some constituent elements may be removed from the full set shown in an embodiment. Constituent elements from different embodiments may also be combined as appropriate.

As described above, the stereo image encoding device according to the present invention is useful for transmitting stereo images over existing transmission channels.

20 Image encoding device
30 Image decoding device
100 Input image signal
101 Difference signal generator
102 Predicted image signal
103 Prediction error signal
104 Orthogonal transformer
105 Orthogonal transform coefficient information
106 Quantizer
107 Quantized orthogonal transform coefficient information
108 Entropy encoder
109 Inverse quantizer
110 Inverse orthogonal transformer
111 Local decoded image signal generator
112 Local decoded image signal
113 Loop filter
114 Frame memory
115 Predicted image generator
116 Information
117 Encoded data
118 Input frame buffer
200 Entropy decoder
201 Residual signal
202 Local decoded image signal generator
203 Decoded image signal
204 Predicted image generator
1150 Mode determiner
1151, 2030 Intra-frame predictor
1152, 2031 Inter-frame predictor
1153 Motion vector detector
1154, 2033 Parallax image sampling position interpolator
1155, 2034 Parallax image separator
2032 Switcher

Claims

1. A stereo image decoding method comprising:
a decoding step of decoding, for a stereo image signal having a first parallax image with pixels at a first phase position of a checkerboard pattern and a second parallax image with pixels at a second phase position of the checkerboard pattern, (A) a prediction residual for each of the parallax images, (B) prediction mode information for each of the parallax images, and (C) combination information indicating the arrangement of the first parallax image and the second parallax image on the picture;
a prediction signal generation step of generating a prediction signal for each of the parallax images according to the prediction mode information, with reference to pixel values on the already decoded parallax image and pixel values interpolated from surrounding pixels at positions where no pixel exists on the already decoded parallax image; and
a decoded image generation step of generating a decoded image signal by adding the prediction signal and the prediction residual according to the combination information.

2. The stereo image decoding method according to claim 1, wherein
in the prediction signal generation step, the prediction signal is generated for each predetermined range in the picture, and
in the decoding step, for each predetermined range, an index calculated based on the prediction mode information of an already decoded predetermined range is decoded.

3. The stereo image decoding method according to claim 2, wherein
in the decoding step, information on a motion vector for a reference image of the same parallax in another picture is further decoded for each of the parallax images,
the information on the motion vector is information on the difference between the motion vector and a motion vector of an already decoded predetermined range, and
in the prediction signal generation step, the prediction signal is generated from the reference image by motion compensation using the motion vector.

4. The stereo image decoding method according to claim 3, wherein,
when the encoded data includes codes of transform coefficients obtained by orthogonally transforming, for each predetermined range, the prediction residual of an image in which, for each parallax image, pixels at unextracted pixel positions are interpolated from the pixels of that parallax image,
the decoding step includes an inverse orthogonal transform step of obtaining, for each parallax image, a prediction residual by inverse orthogonal transform of the transform coefficients for each predetermined range, and of sampling and outputting, from that prediction residual, the prediction residual at the pixel positions of the parallax image.

5. The stereo image decoding method according to claim 4, wherein the decoding step includes:
a transform coefficient decoding step of decoding, for each of the parallax images, the transform coefficients of the prediction residual of each of a predetermined range composed of pixels belonging to even lines and a predetermined range composed of pixels belonging to odd lines; and
an inverse orthogonal transform step of generating a prediction residual by performing an inverse orthogonal transform on each of the predetermined range composed of pixels belonging to the even lines and the predetermined range composed of pixels belonging to the odd lines.

6. The stereo image decoding method according to claim 5, wherein,
in the decoded image signal, the pixels included in the two parallax images are arranged in a predetermined arrangement order for each picture based on the combination information,
the method further comprising a filter processing step of filtering the pixels included in the decoded image signal separately for each set of pixels having the same pixel-position phase.

7. The stereo image decoding method according to claim 6, wherein,
when the first parallax image and the second parallax image are arranged in a complementary lattice pattern in the decoded image signal,
the filter processing step performs filtering using every other pixel of the decoded image signal.

8. A stereo image encoding method comprising:
a prediction signal generation step of generating, according to a prediction mode selected from a plurality of prediction modes, a prediction signal for each parallax image of a picture in which a first parallax image and a second parallax image of a stereo image signal are combined, the first parallax image having pixels at a first phase position of a checkerboard pattern and the second parallax image having pixels at a second phase position of the checkerboard pattern, by referring to pixel values interpolated from neighboring pixels at positions where no pixel exists on the already encoded parallax image;
a residual generation step of generating a prediction residual between each parallax image and the prediction signal;
an encoding step of encoding (A) the prediction residual for each parallax image, (B) information on the selected prediction mode for each parallax image, and (C) combination information indicating the arrangement of the first parallax image and the second parallax image on the picture;
a decoding step of decoding the prediction residual to generate a decoded prediction residual; and
a decoded image generation step of generating a decoded image signal by adding the decoded prediction residual to the prediction signal based on the information on the prediction mode and the combination information.