JP2009100071A - Method, device and program for decoding multi-view image

Info

Publication number: JP2009100071A
Application number: JP2007267437A
Authority: JP
Inventors: Hiroya Nakamura; 博哉中村
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2007-10-15
Filing date: 2007-10-15
Publication date: 2009-05-07

Abstract

PROBLEM TO BE SOLVED: To reduce redundant processing when decoding a coded bit stream that was encoded without inter-view prediction.

SOLUTION: An inter-view prediction information decoding unit 404 decodes 1-bit inter-view prediction information indicating whether decoding uses inter-view prediction. The decoded value determines whether a view dependency information decoding unit 405 decodes the view dependency information. When the value of the inter-view prediction information is "0", decoding does not use inter-view prediction and the view dependency information is not decoded; in this case a switch 406 does not switch to the view dependency information decoding unit 405, and in the subsequent decoding process the decoder determines that every viewpoint can be decoded without reference to other viewpoints and decodes them accordingly. Thus, when the stream was encoded without inter-view prediction, each viewpoint can be decoded independently, which reduces the amount of processing.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a multi-view image decoding method, a multi-view image decoding device, and a multi-view image decoding program, and more particularly to a method, device, and program for decoding multi-view image coded data obtained by encoding multi-view images captured from different viewpoints.

<Video coding systems>
Devices and systems that comply with coding schemes such as MPEG (Moving Picture Experts Group) are now in widespread use. These schemes handle moving pictures that are continuous along the time axis as digital signal information and, for efficient broadcasting, transmission, or storage of that information, compress it by using motion-compensated prediction to exploit temporal redundancy and orthogonal transforms such as the discrete cosine transform to exploit spatial redundancy.

The MPEG-2 Video (ISO/IEC 13818-2) coding scheme, established in 1995, is defined as a general-purpose video compression coding scheme. It supports interlaced as well as progressive scanned images and covers HDTV (high-definition images) as well as SDTV (standard-definition images). It is widely used in storage media such as DVD (Digital Versatile Disk) optical discs and magnetic tape for digital VTRs conforming to the D-VHS (registered trademark) standard, and in applications such as digital broadcasting.

Standardization of the MPEG-4 Visual (ISO/IEC 14496-2) coding scheme, which targets higher coding efficiency for applications such as network transmission and mobile terminals, was also carried out, and the scheme was established as an international standard in 1998.

Furthermore, the joint technical committee (ISO/IEC) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), together with the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), organized the JVT (Joint Video Team). Their collaborative work led in 2003 to the establishment, as an international standard, of the coding scheme called MPEG-4 AVC/H.264 (standard number 14496-10 in ISO/IEC and H.264 in ITU-T; hereinafter referred to as the AVC/H.264 coding scheme). The AVC/H.264 coding scheme achieves higher coding efficiency than earlier schemes such as MPEG-2 Video and MPEG-4 Visual.

In P pictures (forward predictive coded pictures) of coding schemes such as MPEG-2 Video and MPEG-4 Visual, motion-compensated prediction was performed only from the I picture or P picture immediately preceding in display order. In the AVC/H.264 coding scheme, by contrast, P pictures and B pictures can use multiple pictures as reference pictures, and motion compensation can select the best of them for each block. In addition to pictures that precede in display order, pictures that follow in display order can also be referenced, provided they have already been coded. Moreover, B pictures in schemes such as MPEG-2 Video and MPEG-4 Visual referenced one preceding reference picture in display order, one following reference picture, or both at once; in the last case the average of the two pictures was used as the prediction picture, and the difference data between the target picture and the prediction picture was coded.

In the AVC/H.264 coding scheme, on the other hand, B pictures are no longer bound by the constraint of one preceding and one following picture in display order and can reference any reference picture for prediction, whether preceding or following. B pictures can also themselves be used as reference pictures. For temporal inter prediction (motion-compensated prediction) of P and B pictures, reference picture lists are defined in order to specify which of the candidate reference pictures is actually referenced. Reference pictures are registered in a reference picture list and identified by an index, called the reference index. Two lists are defined, reference picture list 0 and reference picture list 1: a P slice can perform inter prediction by referring only to reference pictures registered in reference picture list 0, whereas a B slice can perform inter prediction by referring to reference pictures registered in both lists.

Furthermore, whereas the coding mode was determined per picture in MPEG-2 Video and per video object plane (VOP) in MPEG-4, the AVC/H.264 coding scheme uses the slice as the unit of coding, so a single picture can contain a mixture of different slice types, such as I slices, P slices, and B slices.

The AVC/H.264 coding scheme further defines a VCL (Video Coding Layer), which encodes and decodes the video pixel signal (coding mode, motion vectors, DCT coefficients, and so on), and a NAL (Network Abstraction Layer).

A coded bit stream produced by the AVC/H.264 coding scheme is organized in units of NAL units, each being one delimited segment of the NAL. NAL units are divided into VCL NAL units, which contain data coded in the VCL (coding mode, motion vectors, DCT coefficients, and so on), and non-VCL NAL units, which contain no data generated in the VCL. Non-VCL NAL units include the SPS (sequence parameter set), which carries parameter information relating to the coding of the entire sequence; the PPS (picture parameter set), which carries parameter information relating to the coding of a picture; and SEI (supplemental enhancement information), which is not essential for decoding the data coded in the VCL.

The header (leading part) of each NAL unit contains a flag that always has the value "0" (forbidden_zero_bit), an identifier indicating whether the unit contains an SPS, a PPS, or a slice that serves as a reference picture (nal_ref_idc), and an identifier indicating the type of the NAL unit (nal_unit_type). nal_unit_type is specified to take a value from "1" to "5" for VCL NAL units; for non-VCL NAL units, for example, SEI has the value "6", SPS "7", and PPS "8". The decoder can therefore identify the type of a NAL unit from the nal_unit_type in its header.
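As a minimal illustration of the header fields just described, the following Python sketch splits the first byte of a NAL unit into its three fields and classifies the unit type using only the values named in the text. The bit widths (1, 2, and 5 bits) come from the AVC/H.264 specification rather than from this description, and the names `NalHeader`, `parse_nal_header`, and `describe_nal_unit_type` are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte AVC/H.264 NAL unit header into its fields.

    Assumed layout (from the AVC/H.264 specification):
    1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type.
    """
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x1,
        nal_ref_idc=(first_byte >> 5) & 0x3,
        nal_unit_type=first_byte & 0x1F,
    )


def describe_nal_unit_type(nal_unit_type: int) -> str:
    """Classify a NAL unit using only the type values named in the text."""
    if 1 <= nal_unit_type <= 5:
        return "VCL"
    return {6: "SEI", 7: "SPS", 8: "PPS"}.get(nal_unit_type, "other")
```

For example, a header byte of 0x67 yields nal_unit_type 7, which the sketch reports as an SPS.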

The basic unit of coding in the AVC/H.264 coding scheme is the slice, obtained by dividing a picture, and a VCL NAL unit corresponds to one slice. A unit called the access unit, which groups several NAL units, is therefore defined, and one access unit contains one coded picture.

<Multi-view image coding systems>
In two-lens stereoscopic television, meanwhile, a left-eye image and a right-eye image captured from two different directions by two cameras are generated and displayed on the same screen to present a stereoscopic image. In this case the left-eye image and the right-eye image are transmitted or recorded separately as independent images. This, however, requires about twice the amount of information of a single two-dimensional image.

A technique has therefore been proposed in which one of the left and right images is treated as the main image and the information of the other image (the sub-image) is compressed by a general compression coding method to reduce the amount of information (see, for example, Patent Document 1). In the stereoscopic television image transmission system described in Patent Document 1, the relative position with high correlation in the other image is found for each small region, and the positional shift (disparity vector) and the difference signal (prediction residual signal) are transmitted. The difference signal is also transmitted and recorded because, although an image close to the sub-image can be reconstructed from the main image and the disparity information (the amount of displacement or positional shift), information in the sub-image that the main image lacks, such as areas hidden behind objects, cannot be reconstructed.

In 1996, a stereo image coding scheme called the Multi-view Profile was added (ISO/IEC 13818-2/AMD3) to the MPEG-2 Video (ISO/IEC 13818-2) coding scheme, the international standard for single-view image coding. The MPEG-2 Video Multi-view Profile is a two-layer coding scheme that codes the left-eye image in the base layer and the right-eye image in the enhancement layer. In addition to motion-compensated prediction exploiting temporal redundancy and the discrete cosine transform exploiting spatial redundancy, it compresses using disparity-compensated prediction, which exploits redundancy between viewpoints.

A technique has also been proposed that reduces the amount of information for multi-view images captured by three or more cameras by using motion-compensated prediction and disparity-compensated prediction (see, for example, Patent Document 2). The high-efficiency image coding system described in Patent Document 2 improves coding efficiency by performing pattern matching against reference pictures of multiple viewpoints and selecting the motion-compensated or disparity-compensated prediction image with the smallest error.

The JVT is also proceeding with the standardization of multi-view video coding (MVC: Multiview Video Coding; hereinafter the MVC scheme), which extends the AVC/H.264 coding scheme to multi-view images. At present the latest published version is JD4.0 (Joint Draft 4.0), a draft of the standard (see, for example, Non-Patent Document 1). Like the MPEG-2 Video Multi-view Profile described above, the MVC scheme improves coding efficiency by incorporating prediction between viewpoints.

The reference dependencies between viewpoints, and between the coding target pictures that make up the viewpoint images, when the image of each viewpoint of a multi-view image is encoded in the MVC scheme and the resulting coded bit stream is decoded, are described here using the case of eight viewpoints as an example. FIG. 27 shows an example of the reference dependencies between pictures when a multi-view image consisting of eight viewpoints is encoded; the horizontal axis indicates time in capture (display) order. P(v, t) (viewpoint v = 0, 1, 2, ...; time t = 0, 1, 2, ...) is the picture of viewpoint v at time t. The picture at the head of each arrow is the picture to be encoded or decoded, and the picture at the tail of the arrow is the reference picture referred to by temporal inter prediction or inter-view prediction when that picture is encoded or decoded. More specifically, the reference picture for temporal inter prediction is the picture at the tail of a horizontal arrow, and the reference picture for inter-view prediction is the picture at the tail of a vertical arrow.

The pictures P(0, t) of viewpoint 0 are all encoded and decoded in the same way as ordinary AVC/H.264, using temporal inter prediction (motion-compensated prediction) without referring to pictures of other viewpoints. The viewpoints other than viewpoint 0 (viewpoints 1 to 7) use inter-view prediction (disparity-compensated prediction), which predicts from decoded pictures of other viewpoints. For example, picture P(2, 0) of viewpoint 2 is encoded and decoded using inter-view prediction with the decoded picture of P(0, 0) of viewpoint 0, another viewpoint, as the reference picture. Picture P(1, 0) of viewpoint 1 is encoded and decoded using inter-view prediction with the decoded pictures of P(0, 0) of viewpoint 0 and P(2, 0) of viewpoint 2 as reference pictures.

Prediction between viewpoints is accommodated by extending the reference picture lists already defined in the AVC/H.264 scheme so that reference pictures used for inter-view prediction can be registered in addition to those used for temporal inter prediction (motion-compensated prediction).

The MVC scheme also has a mechanism for coding, for the sequence as a whole, the number of viewpoints of the multi-view image being coded, the encoding/decoding order in the view direction, and the reference dependencies between viewpoints that result from inter-view prediction. This is done by extending the SPS (sequence parameter set), the parameter set for sequence information. The syntax structure of the MVC extension of the SPS is described with reference to FIG. 28. The syntax structure shown in FIG. 28 is the one defined in JD4.0, and "seq_parameter_set_mvc_extension" is the extension for MVC contained in the SPS.

In FIG. 28, "num_views_minus1" is a parameter for coding the number of viewpoints of the multi-view image and equals the number of viewpoints minus 1. "view_id[i]" indicates the view ID of the viewpoint at position i in the coding order in the view direction; that is, it is the view ID of the viewpoint that is i-th in the encoding/decoding order in the view direction. The syntax elements that follow are the view dependency information, which indicates the dependencies between viewpoints. "num_anchor_refs_l0[i]" is the number of viewpoints that can be referenced by inter-view prediction for reference picture list 0 for anchor pictures of the viewpoint whose view ID equals view_id[i], that is, the i-th viewpoint in the encoding/decoding order in the view direction.

Here, an anchor picture is a picture that can be decoded without referring to a picture at a different display time as a reference picture. Only anchor pictures of other viewpoints at the same time can be used as reference pictures when an anchor picture is decoded, so an anchor picture cannot use temporal inter prediction. When coding with the reference dependencies shown in FIG. 27, for example, P(0, 0), P(1, 0), P(2, 0), P(0, 4), P(1, 4), and P(2, 4) are among the anchor pictures.

"anchor_ref_l0[i][j]" in FIG. 28 indicates the view ID of the viewpoint used as the j-th inter-view prediction reference in the initialized reference picture list 0 for anchor pictures of the viewpoint whose view ID equals view_id[i], that is, the i-th viewpoint in the encoding/decoding order in the view direction. "num_anchor_refs_l1[i]" is the number of viewpoints that can be referenced by inter-view prediction for reference picture list 1 for anchor pictures of that viewpoint. "anchor_ref_l1[i][j]" indicates the view ID of the viewpoint used as the j-th inter-view prediction reference in the initialized reference picture list 1 for anchor pictures of that viewpoint.

"num_non_anchor_refs_l0[i]" is the number of viewpoints that can be referenced by inter-view prediction for reference picture list 0 for non-anchor pictures of the viewpoint whose view ID equals view_id[i], that is, the i-th viewpoint in the encoding/decoding order in the view direction. Here, a non-anchor picture is any picture other than an anchor picture. When a non-anchor picture is decoded, pictures at different display times can also be referenced as reference pictures, so temporal inter prediction can be used as well. In FIG. 27, for example, P(0, 1), P(1, 1), P(2, 1), P(0, 2), P(1, 2), and P(2, 2) are among the non-anchor pictures.

"non_anchor_ref_l0[i][j]" in FIG. 28 indicates the view ID of the viewpoint used as the j-th inter-view prediction reference in the initialized reference picture list 0 for non-anchor pictures of the viewpoint whose view ID equals view_id[i], that is, the i-th viewpoint in the encoding/decoding order in the view direction. "num_non_anchor_refs_l1[i]" is the number of viewpoints that can be referenced by inter-view prediction for reference picture list 1 for non-anchor pictures of that viewpoint. "non_anchor_ref_l1[i][j]" indicates the view ID of the viewpoint used as the j-th inter-view prediction reference in the initialized reference picture list 1 for non-anchor pictures of that viewpoint. Each of these syntax elements is coded as an unsigned value using a technique called exponential Golomb coding.
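The syntax elements described above can be read back with a short parser. The Python sketch below is illustrative only: FIG. 28 is not reproduced in this text, so the loop structure (all view IDs first, then the anchor reference lists for every viewpoint, then the non-anchor reference lists) is an assumption based on the order in which the elements are introduced. `BitReader`, `read_ue`, and `parse_sps_mvc_extension` are hypothetical names, the bit stream is modelled as a string of "0" and "1" characters, and there is no error handling for truncated input.

```python
from typing import Dict, List, Tuple


class BitReader:
    """Reads unsigned exponential Golomb values from a string of '0'/'1'."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read_ue(self) -> int:
        n = 0
        while self.bits[self.pos] == "0":  # count the prefix zeros
            n += 1
            self.pos += 1
        self.pos += 1  # skip the separating "1"
        s = int(self.bits[self.pos:self.pos + n], 2) if n else 0
        self.pos += n
        return (1 << n) - 1 + s


def _read_ref_lists(reader: BitReader, num_views: int) -> Tuple[List[List[int]], List[List[int]]]:
    """For each viewpoint, read a count and that many view IDs, for list 0 then list 1."""
    refs_l0: List[List[int]] = []
    refs_l1: List[List[int]] = []
    for _ in range(num_views):
        count_l0 = reader.read_ue()
        refs_l0.append([reader.read_ue() for _ in range(count_l0)])
        count_l1 = reader.read_ue()
        refs_l1.append([reader.read_ue() for _ in range(count_l1)])
    return refs_l0, refs_l1


def parse_sps_mvc_extension(bits: str) -> Dict[str, list]:
    """Parse the MVC extension fields in the (assumed) order given in the text."""
    reader = BitReader(bits)
    num_views = reader.read_ue() + 1  # num_views_minus1 + 1
    view_id = [reader.read_ue() for _ in range(num_views)]
    anchor_l0, anchor_l1 = _read_ref_lists(reader, num_views)
    non_anchor_l0, non_anchor_l1 = _read_ref_lists(reader, num_views)
    return {
        "view_id": view_id,
        "anchor_ref_l0": anchor_l0,
        "anchor_ref_l1": anchor_l1,
        "non_anchor_ref_l0": non_anchor_l0,
        "non_anchor_ref_l1": non_anchor_l1,
    }
```

The counts (num_anchor_refs_l0[i] and so on) are not stored separately because they equal the lengths of the returned lists.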

The exponential Golomb coding used here is a kind of universal coding that performs variable-length coding without a conversion table. An exponential Golomb codeword consists of a prefix, which is a run of consecutive "0" bits; followed by a single "1" bit; followed by a suffix, which is a bit string of "0"s and "1"s with the same number of bits as the prefix. Letting n be the number of prefix bits and s the value of the suffix, the value ν of a bit string coded with the unsigned exponential Golomb code is given by the following equation.

$$\nu = 2^{n} - 1 + s \tag{1}$$

FIG. 29 shows the relationship between bit strings coded with the unsigned exponential Golomb code and their code numbers. For example, if the bit string to be decoded is "0001010", it begins with three consecutive "0"s, so the number of prefix bits n is 3. Skipping the "1" that follows, the suffix is the next 3 bits (the same number as the prefix), "010", so the suffix value s is 2 in decimal. From equation (1), the code number ν of this bit string is therefore 9 (= 2^3 − 1 + 2).
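The decoding rule of equation (1) can be sketched in a few lines of Python. This is an illustration rather than part of the patent: the function name `decode_ue` and the representation of the bit stream as a string of "0" and "1" characters are assumptions made for readability.

```python
from typing import Tuple


def decode_ue(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned exponential Golomb codeword starting at bits[pos].

    Returns (value, position of the first bit after the codeword).
    """
    n = 0
    while pos + n < len(bits) and bits[pos + n] == "0":
        n += 1  # n = number of prefix zeros
    suffix_start = pos + n + 1  # skip the separating "1"
    suffix_end = suffix_start + n
    if suffix_end > len(bits):
        raise ValueError("truncated exponential Golomb codeword")
    s = int(bits[suffix_start:suffix_end], 2) if n else 0
    return (1 << n) - 1 + s, suffix_end  # equation (1): 2^n - 1 + s
```

Calling `decode_ue("0001010")` reproduces the worked example: the prefix length is 3, the suffix "010" has value 2, and the decoded value is 9.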

FIG. 30 shows an example of the syntax elements of the MVC extension of the SPS and their values when a multi-view image consisting of eight viewpoints is coded with the reference dependencies shown in FIG. 27, following the syntax structure of FIG. 28 defined in the MVC scheme. First, since the multi-view image of FIG. 27 has eight viewpoints, "num_views_minus1" is coded as 7 with the unsigned exponential Golomb code; the resulting bit string is "0001000", which is 7 bits long. Next, the viewpoints at the same time instant are coded in the order viewpoint 0, 2, 1, 4, 3, 6, 5, 7. The value of "view_id[0]" is therefore 0, the view ID of viewpoint 0, coded with the unsigned exponential Golomb code as the 1-bit string "1". Likewise, "view_id[1]" is 2, the view ID of viewpoint 2, coded as "011", and "view_id[2]" is 1, the view ID of viewpoint 1, coded as "010". "view_id[3]" through "view_id[7]" are coded in the same way.
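The bit strings quoted in this example can be checked with a small encoder. The Python sketch below is illustrative; `encode_ue` is a hypothetical helper name, and it relies on the standard construction of an unsigned exponential Golomb codeword as the binary form of (value + 1) preceded by one "0" for each bit after the leading "1".

```python
def encode_ue(value: int) -> str:
    """Return the unsigned exponential Golomb codeword for a non-negative integer."""
    if value < 0:
        raise ValueError("unsigned exponential Golomb codes need value >= 0")
    binary = bin(value + 1)[2:]  # binary form of value + 1, leading "1" included
    return "0" * (len(binary) - 1) + binary  # prefix zeros, then "1" and suffix


# Values from the example: num_views_minus1 = 7, then the view IDs in
# coding order 0, 2, 1, 4, 3, 6, 5, 7.
num_views_minus1_bits = encode_ue(7)
view_id_bits = [encode_ue(v) for v in (0, 2, 1, 4, 3, 6, 5, 7)]
```

Here `num_views_minus1_bits` is "0001000", and the first three entries of `view_id_bits` are "1", "011", and "010", matching the values in the text.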

Next, the syntax elements of the view dependency information are coded. Since viewpoint 0 does not reference any other viewpoint, "num_anchor_refs_l0[0]" and "num_anchor_refs_l1[0]" are coded with the value 0. Anchor pictures of viewpoint 2, which is coded after viewpoint 0, reference viewpoint 0, so one viewpoint is referenced by inter-view prediction: "num_anchor_refs_l0[1]" is coded with the value 1, and "anchor_ref_l0[1][0]" is coded with the value 0, the view ID of the referenced viewpoint 0. The remaining syntax elements are coded in the same way.

Because the encoder codes these parameters (the number of viewpoints and the view dependency information of each viewpoint) for the sequence as a whole, the decoder can determine the reference dependencies of each viewpoint for the sequence as a whole. The reference dependency information of each viewpoint is used in decoding processes such as the initialization of the reference picture lists for inter-view predicted pictures.

Patent Document 1: JP-A-61-144191
Patent Document 2: JP-A-6-98312
Non-Patent Document 1: Joint Draft 4.0 on Multiview Video Coding, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, JVT-X209, July 2007

In the MVC scheme, when a multi-view image with many viewpoints is coded, coding efficiency can be improved further by compressing with inter-view prediction (disparity-compensated prediction), which exploits redundancy between viewpoints, in addition to temporal inter prediction (motion-compensated prediction), which exploits temporal redundancy, and orthogonal transforms, which exploit spatial redundancy.

On the other hand, it is not always necessary to decode the images of all viewpoints from a coded bit stream of a multi-view image signal, and for some applications easy access to individual viewpoints, such as decoding only the viewpoints that are needed, is important. However, to obtain the image of a desired viewpoint from a coded bit stream compressed using inter-view prediction, the images of the viewpoints that serve as inter-view prediction reference pictures must be decoded before the image of the desired viewpoint can be decoded. For applications that prioritize viewpoint access, the multi-view image is therefore sometimes coded without inter-view prediction.
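The cost of accessing one viewpoint can be made concrete with a small dependency walk. The Python sketch below is an illustration under stated assumptions: `views_required` is a hypothetical helper, and the dependency table contains only the relationships spelled out earlier in the text (viewpoint 0 references nothing, viewpoint 2 references viewpoint 0, and viewpoint 1 references viewpoints 0 and 2).

```python
from typing import Dict, List, Set


def views_required(target: int, refs: Dict[int, List[int]]) -> List[int]:
    """List the viewpoints that must be decoded to obtain `target`.

    `refs` maps a viewpoint to the viewpoints it references by inter-view
    prediction. The result is in a valid decoding order, target last.
    """
    order: List[int] = []
    seen: Set[int] = set()

    def visit(view: int) -> None:
        if view in seen:
            return
        seen.add(view)
        for ref in refs.get(view, []):
            visit(ref)  # reference viewpoints are decoded first
        order.append(view)

    visit(target)
    return order


# Dependencies stated in the text for viewpoints 0, 1, and 2.
with_inter_view = {0: [], 2: [0], 1: [0, 2]}
# The same viewpoints coded without inter-view prediction.
without_inter_view = {0: [], 1: [], 2: []}
```

With `with_inter_view`, obtaining viewpoint 1 requires decoding viewpoints 0, 2, and 1 in that order; with `without_inter_view`, only viewpoint 1 itself is decoded.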

The conventional MVC scheme presupposes encoding with inter-view prediction. Even when encoding is performed without inter-view prediction, the number of viewpoints used for the inter-view prediction of each viewpoint is encoded as 0 in the view dependency information, which is redundant. Consequently, the decoding side must also decode the view dependency information even for an encoded bit string that was encoded without inter-view prediction, and the decoding process is redundant.

The present invention has been made in view of the above points. Its object is to provide a multi-view image decoding method, a multi-view image decoding device, and a multi-view image decoding program that reduce redundant processing when decoding an encoded bit string that was encoded without inter-view prediction.

To achieve the above object, the first invention is a multi-view image decoding method for decoding encoded data to be decoded, the encoded data being obtained by encoding a multi-view image signal that includes the image signals of respective viewpoints obtained at a plurality of set viewpoints, the image signal of one viewpoint being either an image signal obtained by actually capturing from that viewpoint or an image signal generated as if captured virtually from that viewpoint.
The encoded data to be decoded includes: first encoded data obtained by encoding inter-view prediction information indicating whether, in the encoding of the image signal of each viewpoint, there is an image that is encoded with reference to a decoded image signal of another viewpoint; second encoded data obtained by encoding view dependency information indicating the dependency relationships between viewpoints, only when there is an image that is encoded with reference to a decoded image signal of another viewpoint; and third encoded data obtained by encoding the image signal of each viewpoint to be encoded, in accordance with the value of the view dependency information when there is an image that is encoded with reference to a decoded image signal of another viewpoint, and without reference to the decoded image signals of other viewpoints when there is no such image.
The method includes: a first step of decoding the first encoded data to obtain the inter-view prediction information; a second step of decoding the second encoded data to obtain the view dependency information, only when it is determined from the value of the inter-view prediction information obtained in the first step that there is an image to be decoded with reference to a decoded image signal of another viewpoint; and a third step of obtaining the image signal of each viewpoint by decoding the third encoded data using the view dependency information when both the inter-view prediction information and the view dependency information have been decoded, and by decoding the third encoded data without reference to the decoded image signals of other viewpoints when the view dependency information has not been decoded.
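The first and second steps can be illustrated with a minimal parsing sketch. It is an assumption-laden illustration, not the patented implementation: the bit stream is modeled as a string of '0'/'1' characters, the names `BitReader`, `read_ref_lists`, and `parse_mvc_extension` are hypothetical, the element order (view count, view IDs, flag, anchor lists, non-anchor lists) follows the order described in this specification, and the reference view IDs are assumed to be coded with unsigned Exp-Golomb codes like the counts.

```python
from typing import Dict, List


class BitReader:
    """Reads bits from a string of '0'/'1' characters (illustrative only)."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def u1(self) -> int:
        bit = int(self.bits[self.pos])
        self.pos += 1
        return bit

    def ue(self) -> int:
        """Unsigned Exp-Golomb: count leading zeros, then read that many suffix bits."""
        zeros = 0
        while self.u1() == 0:
            zeros += 1
        value = 1
        for _ in range(zeros):
            value = (value << 1) | self.u1()
        return value - 1


def read_ref_lists(r: BitReader, num_views: int) -> List[Dict[str, List[int]]]:
    """Read the list 0 and list 1 reference views of every view."""
    result = []
    for _ in range(num_views):
        entry = {}
        for name in ("l0", "l1"):
            count = r.ue()                      # num_*_refs_lX[i]
            entry[name] = [r.ue() for _ in range(count)]  # *_ref_lX[i][j]
        result.append(entry)
    return result


def parse_mvc_extension(bits: str) -> dict:
    r = BitReader(bits)
    num_views = r.ue() + 1                      # num_views_minus1
    view_ids = [r.ue() for _ in range(num_views)]
    flag = r.u1()                               # first step: inter_view_pred_flag
    if flag == 1:                               # second step: only when prediction is used
        anchor = read_ref_lists(r, num_views)
        non_anchor = read_ref_lists(r, num_views)
    else:                                       # all reference counts are regarded as 0
        anchor = [{"l0": [], "l1": []} for _ in range(num_views)]
        non_anchor = [{"l0": [], "l1": []} for _ in range(num_views)]
    return {
        "view_ids": view_ids,
        "inter_view_pred_flag": flag,
        "anchor": anchor,
        "non_anchor": non_anchor,
        "independent_views": flag == 0,         # input to the third step
        "bits_read": r.pos,
    }
```

When `independent_views` is true, a decoder built along these lines could decode any single viewpoint without first decoding another one; the picture decoding of the third step is outside the scope of this sketch.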

To achieve the above object, the second invention is a multi-view image decoding device for decoding encoded data to be decoded, the encoded data being obtained by encoding a multi-view image signal that includes the image signals of respective viewpoints obtained at a plurality of set viewpoints, the image signal of one viewpoint being either an image signal obtained by actually capturing from that viewpoint or an image signal generated as if captured virtually from that viewpoint.
The encoded data to be decoded includes: first encoded data obtained by encoding inter-view prediction information indicating whether, in the encoding of the image signal of each viewpoint, there is an image that is encoded with reference to a decoded image signal of another viewpoint; second encoded data obtained by encoding view dependency information indicating the dependency relationships between viewpoints, only when there is an image that is encoded with reference to a decoded image signal of another viewpoint; and third encoded data obtained by encoding the image signal of each viewpoint to be encoded, in accordance with the value of the view dependency information when there is an image that is encoded with reference to a decoded image signal of another viewpoint, and without reference to the decoded image signals of other viewpoints when there is no such image.
The device has: first decoding means for decoding the first encoded data to obtain the inter-view prediction information; second decoding means for decoding the second encoded data to obtain the view dependency information, only when it is determined from the value of the inter-view prediction information obtained by the first decoding means that there is an image to be decoded with reference to a decoded image signal of another viewpoint; and third decoding means for obtaining the image signal of each viewpoint by decoding the third encoded data using the view dependency information when both the inter-view prediction information and the view dependency information have been decoded, and by decoding the third encoded data without reference to the decoded image signals of other viewpoints when the view dependency information has not been decoded.

Furthermore, to achieve the above object, the third invention is a multi-view image decoding program that causes a computer to execute each step of the first invention.

In these inventions, when a multi-view image is decoded, whether the input encoded data to be decoded was encoded using inter-view prediction can be determined, without decoding the view dependency information, from the value obtained by decoding the inter-view prediction information, which indicates whether encoding uses inter-view prediction.

According to the present invention, whether the encoded data to be decoded was encoded using inter-view prediction can be determined from the decoded value of the inter-view prediction information alone. When inter-view prediction was not used, the encoded data can be decoded independently for each viewpoint without decoding the view dependency information, so the amount of processing can be reduced.

Embodiments of the present invention are described below with reference to the drawings.

(Encoding device and encoding method)
First, the multi-view image encoding device and multi-view image encoding method that generate the encoded bit string to be decoded by the multi-view image decoding method, multi-view image decoding device, and multi-view image decoding program of the present invention are described.

FIG. 1 is a block diagram of an example of a multi-view image encoding device. As shown in the figure, this multi-view image encoding device comprises an encoding management unit 101, a sequence information encoding unit 102, a picture information encoding unit 103, an image signal encoding unit 104, and a multiplexing unit 105, and encodes an input multi-view image signal to output encoded data (an encoded bit string). Here, the multi-view image signal includes the image signals of respective viewpoints obtained at a plurality of set viewpoints, and the image signal of one viewpoint is either an image signal obtained by actually capturing from that viewpoint or an image signal generated as if captured virtually from that viewpoint.

The multi-view image encoding device of FIG. 1, which generates the encoded bit string to be decoded by the multi-view image decoding method, multi-view image decoding device, and multi-view image decoding program of the present invention, is described here as a multi-view image encoding device based on the MVC scheme, which extends the AVC/H.264 coding scheme to multi-view images.

The MVC scheme improves coding efficiency by incorporating prediction between viewpoints. On the other hand, it is not always necessary to decode the images of all viewpoints from an encoded bit string in which a multi-view image signal is encoded, and there are applications in which easy access to viewpoints is important, for example decoding only the required viewpoints. However, to obtain the image of a desired viewpoint from an encoded bit string compressed using inter-view prediction, the images of the other viewpoints that serve as reference pictures for inter-view prediction must be decoded before the desired image can be decoded. For applications that prioritize viewpoint access, encoding is therefore often performed without inter-view prediction.

Taking the case of eight viewpoints as an example, the following describes the reference dependencies between images when the image of each viewpoint of a multi-view image is encoded by the MVC scheme without inter-view prediction, and when the resulting encoded bit string is decoded. FIG. 10 shows an example of the reference dependencies between images when a multi-view image consisting of eight viewpoints is encoded without inter-view prediction. As in FIG. 27, the horizontal axis indicates time in capture (display) order. P(v, t) (viewpoint v = 0, 1, 2, ...; time t = 0, 1, 2, ...) is the image of viewpoint v at time t. The image at the head of an arrow is the image to be encoded or decoded, and the image at the tail of the arrow is the reference picture referred to by temporal inter prediction when that image is encoded or decoded.

FIG. 11 shows an example of the syntax elements of the MVC extension of the SPS and their values when an eight-viewpoint multi-view image is encoded without inter-view prediction, as in the reference dependencies of FIG. 10, in accordance with the syntax structure of FIG. 28 defined by the MVC scheme.

First, since the multi-view image shown in FIG. 10 has eight viewpoints, "num_views_minus1" is encoded with the value "7" using an unsigned Exp-Golomb code, as shown in FIG. 11. The resulting bit string is "0001000", which is 7 bits. Next, the viewpoints at the same time are assumed to be encoded and decoded in the order viewpoint 0, viewpoint 1, viewpoint 2, viewpoint 3, viewpoint 4, viewpoint 5, viewpoint 6, viewpoint 7. The value of "view_id[0]" is "0", the view ID of viewpoint 0, encoded with an unsigned Exp-Golomb code; the resulting bit string is "1", which is 1 bit. The value of "view_id[1]" is then "1", the view ID of viewpoint 1, giving the bit string "010", and the value of "view_id[2]" is "2", the view ID of viewpoint 2, giving the bit string "011". "view_id[3]" through "view_id[7]" are encoded in the same way.
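The unsigned Exp-Golomb codes quoted above can be reproduced with a short sketch. The function name `ue` is a hypothetical helper chosen here for illustration, and the code is represented as a string of '0'/'1' characters rather than packed bits.

```python
def ue(value: int) -> str:
    """Return the unsigned Exp-Golomb code of a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("ue(v) is defined for non-negative integers only")
    code = bin(value + 1)[2:]              # binary of value + 1, e.g. 7 -> "1000"
    return "0" * (len(code) - 1) + code    # prefix of (length - 1) zeros, then the binary


if __name__ == "__main__":
    for name, value in [("num_views_minus1", 7), ("view_id[0]", 0),
                        ("view_id[1]", 1), ("view_id[2]", 2)]:
        print(name, value, ue(value), len(ue(value)), "bits")
```

The value 7 maps to a 7-bit code because 7 + 1 = 8 needs four binary digits, which are preceded by three zeros.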

Next, the syntax elements of the view dependency information are encoded. Because the multi-view image here is encoded without inter-view prediction, no viewpoint refers to another viewpoint, whether for anchor pictures or non-anchor pictures and whether for reference picture list 0 or reference picture list 1. The values of "num_anchor_refs_l0[i]", "num_anchor_refs_l1[i]", "num_non_anchor_refs_l0[i]", and "num_non_anchor_refs_l1[i]" are therefore encoded as "0" for all viewpoints. Accordingly, when an eight-viewpoint multi-view image is encoded without inter-view prediction in accordance with the syntax structure of FIG. 28 defined by the MVC scheme, the encoded syntax elements of the view dependency information form a bit string of 32 consecutive "1" bits, as shown in FIG. 11.

In other words, as described above, the MVC scheme has no special mechanism for encoding without inter-view prediction, so even in that case the number of viewpoints used for inter-view prediction must be encoded for each viewpoint as view dependency information. As shown in FIG. 11, this yields a redundant bit string of consecutive "1" bits whose length is four times the number of viewpoints (32 here). The present invention therefore introduces into the MVC scheme a mechanism that reduces this redundancy when encoding and decoding without inter-view prediction.

Next, the syntax structure of the encoded bit string generated by the multi-view image encoding device of FIG. 1 is described. FIG. 12 shows the syntax structure of the MVC extension of the SPS in the encoded bit string produced by the multi-view image encoding device of FIG. 1. Compared with the conventional syntax structure of FIG. 28, it differs in that a 1-bit syntax element "inter_view_pred_flag" is added, and in that the value of "inter_view_pred_flag" determines whether the view dependency syntax elements "num_anchor_refs_l0[i]", "anchor_ref_l0[i][j]", "num_anchor_refs_l1[i]", "anchor_ref_l1[i][j]", "num_non_anchor_refs_l0[i]", "non_anchor_ref_l0[i][j]", "num_non_anchor_refs_l1[i]", and "non_anchor_ref_l1[i][j]" are encoded.

The syntax element "inter_view_pred_flag" is information indicating whether, when a multi-view image is encoded, there is an image that is encoded with reference to a decoded image signal of another viewpoint in the encoding of the image signal of each viewpoint. It is a 1-bit binary flag indicating whether inter-view prediction is used. When the value of "inter_view_pred_flag" is "1", the data is encoded using inter-view prediction.

In this case, the syntax elements of the view dependency information are encoded as in the conventional scheme. That is, "num_anchor_refs_l0[i]", "num_anchor_refs_l1[i]", "num_non_anchor_refs_l0[i]", and "num_non_anchor_refs_l1[i]" are each encoded, and where a value is "1" or more, the corresponding "anchor_ref_l0[i][j]", "anchor_ref_l1[i][j]", "non_anchor_ref_l0[i][j]", or "non_anchor_ref_l1[i][j]" is also encoded.

On the other hand, when the value of "inter_view_pred_flag" is "0", the data is encoded without inter-view prediction. In that case, the syntax elements of the view dependency information are not encoded, and the values of "num_anchor_refs_l0[i]", "num_anchor_refs_l1[i]", "num_non_anchor_refs_l0[i]", and "num_non_anchor_refs_l1[i]" are regarded as "0" for all viewpoints.
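The conditional structure described for "inter_view_pred_flag" can be sketched as a small bit-string writer. This is only an illustration under stated assumptions: FIG. 12 is not reproduced here, so the element order (view count, view IDs, flag, anchor lists, non-anchor lists) is taken from the encoding procedure described in this specification, the reference view IDs are assumed to use unsigned Exp-Golomb codes, and the names `ue`, `write_ref_lists`, and `write_mvc_extension` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

RefLists = Sequence[Tuple[Sequence[int], Sequence[int]]]  # per view: (list 0 refs, list 1 refs)


def ue(value: int) -> str:
    """Unsigned Exp-Golomb code as a string of '0'/'1' characters."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def write_ref_lists(ref_lists: RefLists) -> List[str]:
    out = []
    for l0, l1 in ref_lists:                   # one entry per view, in coding order
        for refs in (l0, l1):
            out.append(ue(len(refs)))          # num_*_refs_lX[i]
            out.extend(ue(v) for v in refs)    # *_ref_lX[i][j], only if the count is >= 1
    return out


def write_mvc_extension(view_ids: Sequence[int], inter_view_pred_flag: bool,
                        anchor: Optional[RefLists] = None,
                        non_anchor: Optional[RefLists] = None) -> str:
    bits = [ue(len(view_ids) - 1)]             # num_views_minus1
    bits.extend(ue(v) for v in view_ids)       # view_id[i]
    bits.append("1" if inter_view_pred_flag else "0")
    if inter_view_pred_flag:                   # dependency info only when the flag is 1
        empty = [([], [])] * len(view_ids)
        bits.extend(write_ref_lists(anchor or empty))
        bits.extend(write_ref_lists(non_anchor or empty))
    return "".join(bits)
```

For eight viewpoints, `write_mvc_extension(range(8), False)` ends with the single flag bit, whereas passing `True` with empty reference lists appends 32 further "1" bits after the flag.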

FIG. 13 shows an example of the syntax elements of the MVC extension of the SPS and their values when an eight-viewpoint multi-view image is encoded without inter-view prediction, as in the reference dependencies of FIG. 10, in accordance with the syntax structure of FIG. 12. When a multi-view image is encoded without inter-view prediction in accordance with the syntax structure of FIG. 28 defined by the MVC scheme, the number of viewpoints used for the inter-view prediction of each viewpoint must be encoded for each viewpoint as view dependency information. As shown in FIG. 11, this yields a redundant bit string of consecutive "1" bits whose length is four times the number of viewpoints.

In contrast, when encoding is performed without inter-view prediction in accordance with the syntax structure of FIG. 12, the view dependency information can be replaced by the 1-bit flag "inter_view_pred_flag" alone, as shown in FIG. 13. With the syntax structure of FIG. 12, the code amount of the generated encoded bit string is therefore reduced substantially (in the eight-viewpoint example, the 32 bits of view dependency information are replaced by the 1-bit flag), the view dependency information need not be encoded or decoded, and the syntax is more streamlined.
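The saving can be checked with a small calculation. The sketch below assumes, as described in this specification, that each of the four reference counts per viewpoint is an unsigned Exp-Golomb code of the value 0 (1 bit each); the function names are hypothetical.

```python
def ue_bit_length(value: int) -> int:
    """Number of bits in the unsigned Exp-Golomb code of a non-negative integer."""
    return 2 * (value + 1).bit_length() - 1


def dependency_bits_without_flag(num_views: int) -> int:
    """Conventional syntax, no inter-view prediction: four zero counts per viewpoint."""
    return 4 * num_views * ue_bit_length(0)


def dependency_bits_with_flag(num_views: int) -> int:
    """Syntax with inter_view_pred_flag = 0: only the 1-bit flag is written."""
    return 1


if __name__ == "__main__":
    for n in (2, 8, 16):
        print(n, dependency_bits_without_flag(n), dependency_bits_with_flag(n))
```

The conventional cost grows linearly with the number of viewpoints while the flag stays at one bit; the trade-off is one additional bit in streams that do use inter-view prediction.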

Next, the operation of the multi-view image encoding device of FIG. 1 is described. In FIG. 1, the encoding management unit 101 first calculates new parameters as necessary from the encoding parameters set externally, and manages the encoding, including the parameter information related to the entire sequence (SPS), the parameter information related to pictures (PPS), and the header information related to picture slices (slice headers). The encoding management unit 101 also manages the reference dependencies and the encoding/decoding order of the encoding target images that make up the viewpoint images M(0), M(1), M(2), ..., which are input in capture/display time order.

For the reference dependencies, the encoding management unit manages, for each viewpoint, whether decoded images of other viewpoints are referred to. For each picture or slice, it also manages whether inter-view prediction (disparity-compensated prediction), which uses a decoded image of another viewpoint as a reference image, is performed when the encoding target image is encoded; whether the decoded image obtained by decoding the encoding target image after encoding is used as a reference image when an encoding target image of another viewpoint is encoded; and which of several candidate reference images is referred to. The encoding/decoding order is managed, based on these reference dependencies, so that on the decoding side the decoding of an image can begin only after the reference images it refers to have been decoded.
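One way to picture this ordering constraint is a topological sort over the view dependencies. The sketch below is not taken from the specification: it assumes the dependencies are given as a hypothetical mapping from each view ID to the set of view IDs it refers to, and it uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def view_coding_order(refs: Dict[int, Set[int]]) -> List[int]:
    """Return a view order in which every view follows all views it refers to.

    refs maps a view ID to the set of view IDs used as its inter-view references.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(refs).static_order())


if __name__ == "__main__":
    # Hypothetical three-view example: view 2 refers to view 0,
    # and view 1 refers to views 0 and 2.
    print(view_coding_order({0: set(), 2: {0}, 1: {0, 2}}))
```

When no view refers to another, every ordering satisfies the constraint, which is why each viewpoint can then be decoded on its own.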

Next, the sequence information encoding unit 102 encodes the parameter information related to the entire sequence (SPS) managed by the encoding management unit 101. Here, the MVC extension of the SPS is also encoded, in accordance with the syntax structure shown in FIG. 12.

FIG. 2 is a block diagram of an example of the sequence information encoding unit 102. As shown in FIG. 2, the sequence information encoding unit 102 comprises a sequence information encoding unit 201, a view count information encoding unit 202, an encoding order information encoding unit 203, an inter-view prediction information encoding unit 204, and a view dependency information encoding unit 205. The sequence information encoding unit 201 encodes the sequence information other than the MVC extension, that is, the SPS (sequence parameter set) of the AVC/H.264 scheme.

The view count information encoding unit 202, the encoding order information encoding unit 203, the inter-view prediction information encoding unit 204, and the view dependency information encoding unit 205, on the other hand, encode the MVC extension of the information related to the entire sequence (SPS) in accordance with the syntax structure shown in FIG. 12. First, the view count information encoding unit 202 encodes the syntax element "num_views_minus1" as the information on the number of viewpoints. Next, the encoding order information encoding unit 203 encodes the syntax element "view_id[i]", in the encoding/decoding order in the view direction, as the information on that order.

Next, the inter-view prediction information encoding unit 204 encodes the 1-bit syntax element "inter_view_pred_flag" as the inter-view prediction information indicating whether encoding uses inter-view prediction. When encoding uses inter-view prediction, the view dependency information encoding unit 205 further encodes, as the view dependency information, the syntax elements described above: "num_anchor_refs_l0[i]", "anchor_ref_l0[i][j]", "num_anchor_refs_l1[i]", "anchor_ref_l1[i][j]", "num_non_anchor_refs_l0[i]", "non_anchor_ref_l0[i][j]", "num_non_anchor_refs_l1[i]", and "non_anchor_ref_l1[i][j]".

Returning to FIG. 1, the picture information encoding unit 103 encodes the information related to pictures (PPS) managed by the encoding management unit 101. The image signal encoding unit 104 encodes, in units of slices, the information related to slices (slice headers) managed by the encoding management unit 101 and the supplied image signal to be encoded. Inter-view prediction may be used when the image signal is encoded; in that case, the reference image for inter-view prediction is selected based on the view dependency information.

The multiplexing unit 105 multiplexes the encoded bit string of the sequence information obtained by the sequence information encoding unit 102, the encoded bit string of the picture information obtained by the picture information encoding unit 103, and the encoded bit string of the slice information and image signal obtained by the image signal encoding unit 104, to produce the encoded bit string of the multi-view image.

Next, the multi-view image encoding procedure of the multi-view image encoding device shown in FIG. 1 is described with reference to the flowchart of FIG. 3. The processing in each step is the same as that described with the block diagrams of FIG. 1 and FIG. 2, so only the procedure is described here, with each step related to FIG. 1 and FIG. 2.

First, the parameter information related to the encoding of the entire sequence is encoded to generate its encoded bit string (step S101). In the multi-view image encoding device of FIG. 1, step S101 corresponds to the encoding operation of the sequence information encoding unit 102.

An example of the sequence information encoding procedure of step S101 is described in more detail with reference to the flowchart of FIG. 4. First, the sequence information encoding unit 102 encodes the sequence information other than the MVC extension (step S111). In the sequence information encoding unit 102 of FIG. 2, step S111 corresponds to the encoding operation of the sequence information encoding unit 201, which handles the portion other than the MVC extension.

Next, the information on the number of viewpoints is encoded (step S112). In the sequence information encoding unit 102 of FIG. 2, step S112 corresponds to the encoding operation of the view count information encoding unit 202. The view ID information of each viewpoint is then encoded in the encoding/decoding order in the view direction (step S113). In the sequence information encoding unit 102 of FIG. 2, step S113 corresponds to the encoding operation of the encoding order information encoding unit 203.

An example of the procedure of step S113 for encoding the view IDs in the encoding/decoding order in the view direction is described in more detail with reference to the flowchart of FIG. 5. First, the variable i is set to 0 (step S121). Next, it is determined whether the value of i is less than or equal to (number of viewpoints - 1) (step S122). If i exceeds (number of viewpoints - 1), the encoding process ends. Otherwise, the process proceeds to step S123, and steps S123 and S124 are repeated until i exceeds (number of viewpoints - 1). In step S123, the syntax element "view_id[i]" of the i-th viewpoint in the encoding/decoding order in the view direction is encoded. In step S124, "1" is added to i, and the process returns to step S122.
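The loop of steps S121 to S124 can be written out as follows. This is a minimal sketch: the function names are hypothetical, the view IDs are assumed to be supplied already in coding order, and each "view_id[i]" is emitted as an unsigned Exp-Golomb bit string as described earlier in this specification.

```python
from typing import List, Sequence


def ue(value: int) -> str:
    """Unsigned Exp-Golomb code as a string of '0'/'1' characters."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def encode_view_ids(view_ids_in_coding_order: Sequence[int]) -> List[str]:
    num_views = len(view_ids_in_coding_order)
    codes = []
    i = 0                                                # step S121
    while i <= num_views - 1:                            # step S122
        codes.append(ue(view_ids_in_coding_order[i]))    # step S123: view_id[i]
        i += 1                                           # step S124
    return codes                                         # loop exit: process ends
```

For example, `encode_view_ids([0, 1, 2])` yields the codes "1", "010", and "011".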

The description now returns to the flowchart of FIG. 4. Following step S113, step S114 encodes information indicating whether encoding uses inter-view prediction. Step S114 corresponds to the encoding operation of the inter-view prediction information encoding unit 204 within the sequence information encoding unit 102 of FIG. 2. Next, it is determined whether encoding uses inter-view prediction (step S115). If it does, the view dependency information is encoded in step S116. If it does not, the sequence information encoding process ends without encoding the view dependency information. Step S116 corresponds to the encoding operation of the view dependency information encoding unit 205 within the sequence information encoding unit 102 of FIG. 2.
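The branch at step S115 can be illustrated with the following sketch. The element names "num_views_minus1" and "inter_view_pred_flag" are the ones used in the decoder description of this document; the `emit` and `encode_dependency` callbacks and the `demo` helper are hypothetical, and entropy coding is not modeled.

```python
def encode_mvc_sequence_info(view_ids, use_inter_view_pred, encode_dependency, emit):
    """Emit the MVC part of the sequence information (steps S112 to S116)."""
    emit("num_views_minus1", len(view_ids) - 1)                     # step S112
    for i, view_id in enumerate(view_ids):                          # step S113
        emit(f"view_id[{i}]", view_id)
    emit("inter_view_pred_flag", 1 if use_inter_view_pred else 0)   # step S114
    if use_inter_view_pred:                                         # step S115
        encode_dependency(emit)                                     # step S116
    # Otherwise no view dependency information is written at all.


def demo(use_inter_view_pred):
    out = []
    emit = lambda name, value: out.append((name, value))
    # Hypothetical stand-in for the dependency encoder of step S116.
    dependency = lambda e: e("num_anchor_refs_l0[0]", 0)
    encode_mvc_sequence_info([0, 1], use_inter_view_pred, dependency, emit)
    return out
```

With `use_inter_view_pred` set to false, the output ends at the flag, which is the case the decoder can later exploit.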

An example of the procedure of step S116 for encoding the view dependency information is described in more detail with reference to the flowchart of FIG. 6. In step S116, the view dependency information of anchor pictures is encoded first (step S131), and then the view dependency information of non-anchor pictures is encoded (step S132). When step S132 is complete, the view dependency information encoding process of FIG. 6 ends.

An example of the procedure of step S131 for encoding the view dependency information of anchor pictures is described in more detail with reference to the flowchart of FIG. 7. First, the variable i is set to 0 (step S141), and it is then determined whether the value of i is less than or equal to (number of viewpoints - 1) (step S142). If it is not, the encoding of the anchor-picture view dependency information ends. If it is, the process proceeds to step S143, and steps S143 to S153 are repeated until the value of i is no longer less than or equal to (number of viewpoints - 1).

In step S143, the syntax element "num_anchor_refs_l0[i]" of the i-th viewpoint in the viewpoint-direction encoding/decoding order is encoded. In step S144, the variable j is set to 0. In step S145, it is determined whether the value of j is smaller than "num_anchor_refs_l0[i]". If j is greater than or equal to "num_anchor_refs_l0[i]", the process proceeds to step S148. If j is smaller, steps S145 to S147 are repeated until j becomes greater than or equal to "num_anchor_refs_l0[i]". In step S146, the syntax element "anchor_ref_l0[i][j]", which corresponds to index j of reference picture list 0 of the i-th viewpoint in the viewpoint-direction encoding/decoding order, is encoded, and the process proceeds to step S147. In step S147, 1 is added to j and the process returns to step S145.

In step S148, the syntax element "num_anchor_refs_l1[i]" of the i-th viewpoint in the viewpoint-direction encoding/decoding order is encoded. In step S149, the variable j is set to 0. In step S150, it is determined whether the value of j is smaller than "num_anchor_refs_l1[i]". If j is greater than or equal to "num_anchor_refs_l1[i]", the process proceeds to step S153, where 1 is added to i, and then returns to step S142. If j is smaller, steps S150 to S152 are repeated until j becomes greater than or equal to "num_anchor_refs_l1[i]". In step S151, the syntax element "anchor_ref_l1[i][j]", which corresponds to index j of reference picture list 1 of the i-th viewpoint in the viewpoint-direction encoding/decoding order, is encoded, and the process proceeds to step S152. In step S152, 1 is added to j and the process returns to step S150.
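The nested loops of FIG. 7 (steps S141 to S153) can be sketched as follows. The sketch makes two assumptions that are not in the text: the encoder holds the inter-view references of each viewpoint as two Python lists, one per reference picture list, and the `emit` callback stands in for the actual entropy coding.

```python
def encode_anchor_dependency(refs_l0, refs_l1, emit):
    """Emit anchor-picture dependency elements for every viewpoint.

    refs_l0[i] and refs_l1[i] hold the values to be written as
    anchor_ref_l0[i][j] and anchor_ref_l1[i][j] (assumed representation).
    """
    num_views = len(refs_l0)
    for i in range(num_views):                               # S141, S142, S153
        emit(f"num_anchor_refs_l0[{i}]", len(refs_l0[i]))    # S143
        for j, ref in enumerate(refs_l0[i]):                 # S144, S145, S147
            emit(f"anchor_ref_l0[{i}][{j}]", ref)            # S146
        emit(f"num_anchor_refs_l1[{i}]", len(refs_l1[i]))    # S148
        for j, ref in enumerate(refs_l1[i]):                 # S149, S150, S152
            emit(f"anchor_ref_l1[{i}][{j}]", ref)            # S151


# Hypothetical usage: viewpoint 1 references one picture in list 0.
anchor_symbols = []
encode_anchor_dependency([[], [0]], [[], []],
                         lambda name, value: anchor_symbols.append((name, value)))
```

A viewpoint with no inter-view references still costs two count elements, which is the overhead the inter-view prediction flag avoids when no viewpoint uses inter-view prediction.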

Next, an example of the procedure of step S132 of FIG. 6 for encoding the view dependency information of non-anchor pictures is described in more detail with reference to the flowchart of FIG. 8. First, the variable i is set to 0 (step S154), and it is then determined whether the value of i is less than or equal to (number of viewpoints - 1) (step S155). If it is not, the encoding of the non-anchor-picture view dependency information ends. If it is, the process proceeds to step S156, and steps S155 to S166 are repeated until the value of i is no longer less than or equal to (number of viewpoints - 1).

In step S156, the syntax element "num_non_anchor_refs_l0[i]" of the i-th viewpoint in the viewpoint-direction encoding/decoding order is encoded. In step S157, the variable j is set to 0, and in step S158 it is determined whether the value of j is smaller than "num_non_anchor_refs_l0[i]" of the i-th viewpoint. If j is greater than or equal to "num_non_anchor_refs_l0[i]", the process proceeds to step S161. If j is smaller, the process proceeds to step S159, and steps S158 to S160 are repeated until j becomes greater than or equal to "num_non_anchor_refs_l0[i]".

In step S159, the syntax element "non_anchor_ref_l0[i][j]", which corresponds to index j of reference picture list 0 of the i-th viewpoint in the viewpoint-direction encoding/decoding order, is encoded, and the process proceeds to step S160. In step S160, 1 is added to j and the process returns to step S158.

In step S161, the syntax element "num_non_anchor_refs_l1[i]" of the i-th viewpoint in the viewpoint-direction encoding/decoding order is encoded. In step S162, the variable j is set to 0, and in step S163 it is determined whether the value of j is smaller than "num_non_anchor_refs_l1[i]" of the i-th viewpoint. If j is greater than or equal to "num_non_anchor_refs_l1[i]", the process proceeds to step S166, where 1 is added to i, and then returns to step S155.

If, on the other hand, the value of j is smaller than "num_non_anchor_refs_l1[i]", the process proceeds to step S164, and steps S163 to S165 are repeated until j becomes greater than or equal to "num_non_anchor_refs_l1[i]". In step S164, the syntax element "non_anchor_ref_l1[i][j]", which corresponds to index j of reference picture list 1 of the i-th viewpoint in the viewpoint-direction encoding/decoding order, is encoded, and the process proceeds to step S165. In step S165, 1 is added to j and the process returns to step S163.
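The non-anchor procedure of FIG. 8 has the same shape as the anchor procedure of FIG. 7, differing only in the element names. One possible way to express that, as a sketch only, is a single helper parameterized by the name prefix; the helper name `encode_dependency`, the list-based input, and the `emit` callback are assumptions made for illustration.

```python
def encode_dependency(kind, refs_l0, refs_l1, emit):
    """Emit num_<kind>_refs_lX[i] and <kind>_ref_lX[i][j] for every viewpoint.

    kind is "anchor" or "non_anchor"; refs_l0[i] and refs_l1[i] are the
    reference entries of viewpoint i for lists 0 and 1 (assumed representation).
    """
    for i in range(len(refs_l0)):
        for list_name, refs in (("l0", refs_l0[i]), ("l1", refs_l1[i])):
            emit(f"num_{kind}_refs_{list_name}[{i}]", len(refs))
            for j, ref in enumerate(refs):
                emit(f"{kind}_ref_{list_name}[{i}][{j}]", ref)


# Hypothetical usage for the non-anchor pictures of two viewpoints.
non_anchor_symbols = []
encode_dependency("non_anchor", [[], [0]], [[], []],
                  lambda name, value: non_anchor_symbols.append((name, value)))
```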

The description now returns to the flowchart of FIG. 4. When step S116 is complete, the sequence information encoding process of FIG. 4 ends.

The description now returns to the flowchart of FIG. 3. When step S101, described above with reference to the flowcharts of FIGS. 4 to 8, is complete, the process proceeds to step S102. In step S102, the encoded bit string of the parameter information related to the encoding of the entire sequence is multiplexed to obtain a multiplexed encoded bit string. Step S102 corresponds to the multiplexing operation of the multiplexing unit 105 in the multi-view image encoding apparatus of FIG. 1.

In the next step S103, parameter information and the like related to picture encoding is encoded, generating an encoded bit string of the parameter information related to picture encoding. Step S103 corresponds to the encoding operation of the picture information encoding unit 103 in the multi-view image encoding apparatus of FIG. 1.

Next, in step S104, the encoded bit string of the parameter information related to picture encoding, generated in step S103, is multiplexed to obtain a multiplexed encoded bit string. Step S104 corresponds to the multiplexing operation of the multiplexing unit 105 in the multi-view image encoding apparatus of FIG. 1.

Next, in step S105, the slice information and the image signal are encoded. Step S105 corresponds to the processing operation of the image signal encoding unit 104 in the multi-view image encoding apparatus of FIG. 1.

Next, in step S106, following the bit strings multiplexed in steps S102 and S104, the encoded bit strings of the decoded image output order number o, the encoding mode, the motion vector or disparity vector, the encoded residual signal, and so on are multiplexed as needed into one encoded bit string or into a plurality of encoded bit strings. Step S106 corresponds to the multiplexing operation of the multiplexing unit 105 in the multi-view image encoding apparatus of FIG. 1.

Next, the multiplexing and transmission procedure of the multiplexing unit 105 for transmission over a network is described with reference to the flowchart of FIG. 9. In FIG. 9, the multiplexing unit 105 takes the data in which the encoded bit string of the sequence information, the encoded bit string of the picture information, and the encoded bit string of the slice information and image signal have each been multiplexed, and packetizes it as needed in accordance with a standard such as MPEG-2 Systems, the MP4 file format, or RTP (step S171). The multiplexing unit 105 then adds a packet header to each packet as needed in accordance with a standard such as MPEG-2 Systems, the MP4 file format, or RTP (step S172), and transmits the packets over the network (step S173).

The description now returns to FIG. 3. In step S107, it is determined whether encoding has been completed for all images of the multi-view image to be encoded. If it has, the multi-view image encoding procedure ends. If it has not, the process returns to step S105, and steps S105 and S106 are repeated until encoding has been completed for all images of the multi-view image to be encoded.

(Decoding device and decoding method)
Next, the multi-view image decoding method and multi-view decoding apparatus according to the present invention are described with reference to the drawings.

FIG. 14 is a block diagram of an embodiment of the multi-view image decoding apparatus according to the present invention. As shown in FIG. 14, the multi-view image decoding apparatus of this embodiment comprises a separation unit 301, a decoding management unit 302, a sequence information decoding unit 303, a picture information decoding unit 304, and an image signal decoding unit 305. It receives an encoded bit string obtained by encoding a multi-view image signal, decodes it, and outputs the multi-view image signal.

Next, the operation of the multi-view image decoding apparatus shown in FIG. 14 is described in relation to the AVC/H.264 coding scheme. First, the separation unit 301 receives the encoded bit string that was encoded by the multi-view image encoding apparatus shown in FIG. 1 and transmitted over the network. The encoded bit string need not be supplied by network reception alone: it may also be read from a storage medium such as a DVD, or received from a broadcast such as BS or terrestrial broadcasting.

The separation unit 301 also removes the packet headers from the supplied encoded bit string and separates it into NAL units. It then evaluates the identifier (nal_unit_type) in the header of each separated NAL unit, which distinguishes the type of the NAL unit. If the NAL unit is an encoded bit string carrying parameter information related to the encoding of the entire sequence, it is supplied to the sequence information decoding unit 303. If it is an encoded bit string carrying parameter information and the like related to picture encoding, it is supplied to the picture information decoding unit 304. If it is a VCL NAL unit, that is, an encoded bit string carrying the encoding mode, motion or disparity vectors, the encoded residual signal, and so on, it is supplied to the image signal decoding unit 305.
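The routing performed by the separation unit 301 can be sketched as follows. The sketch relies on base AVC/H.264 conventions: nal_unit_type is the low five bits of the first NAL header byte, 7 denotes an SPS, 8 a PPS, and 1 to 5 are VCL NAL units. Any MVC-specific NAL unit types are deliberately left out, since this passage does not give their values, and the returned destination labels are hypothetical names for units 303 to 305.

```python
SPS_TYPE = 7                    # sequence parameter set (base AVC/H.264)
PPS_TYPE = 8                    # picture parameter set (base AVC/H.264)
VCL_TYPES = {1, 2, 3, 4, 5}     # coded slice data (base AVC/H.264)


def nal_unit_type(nal: bytes) -> int:
    """Return nal_unit_type, the low 5 bits of the first NAL header byte."""
    return nal[0] & 0x1F


def route(nal: bytes) -> str:
    """Pick the decoding unit that should receive this NAL unit."""
    unit_type = nal_unit_type(nal)
    if unit_type == SPS_TYPE:
        return "sequence_information_decoding_unit_303"
    if unit_type == PPS_TYPE:
        return "picture_information_decoding_unit_304"
    if unit_type in VCL_TYPES:
        return "image_signal_decoding_unit_305"
    return "not_routed"         # e.g. SEI; not covered by this passage
```

For example, a NAL unit whose first byte is 0x67 has nal_unit_type 7 and would go to the sequence information decoding unit.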

The sequence information decoding unit 303 decodes the encoded bit string, separated by the separation unit 301, in which the parameter information related to the encoding of the entire sequence (SPS) is encoded. Here, the MVC extension part of the SPS is also decoded in accordance with the syntax structure shown in FIG. 12.

FIG. 15 is a block diagram showing the configuration of an embodiment of the sequence information decoding unit 303. As shown in FIG. 15, the sequence information decoding unit 303 comprises a switch 406, a sequence information decoding unit 401, a viewpoint number information decoding unit 402, a decoding order information decoding unit 403, an inter-view prediction information decoding unit 404, and a view dependency information decoding unit 405. The switch 406 switches in accordance with the syntax structure shown in FIG. 12 and supplies the encoded bit string to the decoding units 401 to 404 in turn. In addition, when the value of the inter-view prediction information decoded by the inter-view prediction information decoding unit 404 is 1, the switch 406 supplies the encoded bit string to the view dependency information decoding unit 405; when the value is 0, it does not.

The sequence information decoding unit 401 decodes the sequence information other than the MVC extension part, that is, the SPS (sequence parameter set) of the AVC/H.264 scheme. The viewpoint number information decoding unit 402, the decoding order information decoding unit 403, the inter-view prediction information decoding unit 404, and the view dependency information decoding unit 405 decode the MVC extension part of the information related to the entire sequence (SPS) in accordance with the syntax structure shown in FIG. 12. First, the viewpoint number information decoding unit 402 decodes the syntax element "num_views_minus1" as the information on the number of viewpoints. Next, the decoding order information decoding unit 403 decodes the syntax elements "view_id[i]" one after another. Because the viewpoint IDs in "view_id[i]" are encoded in the encoding/decoding order, the decoder can tell in what decoding order the viewpoints were encoded.

Next, the inter-view prediction information decoding unit 404 decodes the syntax element "inter_view_pred_flag" as the information indicating whether the stream was encoded using inter-view prediction, that is, the inter-view prediction information indicating whether decoding uses inter-view prediction. The value of "inter_view_pred_flag" determines whether the view dependency information decoding unit 405 then decodes view dependency information. When the value of "inter_view_pred_flag" is 0, decoding does not use inter-view prediction and no view dependency information has been encoded. In this case, the switch 406 never switches to the view dependency information decoding unit 405. Also in this case, in the subsequent decoding process the decoding apparatus treats every viewpoint as one that can be decoded without reference to other viewpoints. Specifically, the values of "num_anchor_refs_l0[i]", "num_anchor_refs_l1[i]", "num_non_anchor_refs_l0[i]", and "num_non_anchor_refs_l1[i]" are set to 0 for all viewpoints.

When, on the other hand, the value of "inter_view_pred_flag" is 1, decoding uses inter-view prediction, so the view dependency information decoding unit 405 decodes the following syntax elements as view dependency information: "num_anchor_refs_l0[i]", "anchor_ref_l0[i][j]", "num_anchor_refs_l1[i]", "anchor_ref_l1[i][j]", "num_non_anchor_refs_l0[i]", "non_anchor_ref_l0[i][j]", "num_non_anchor_refs_l1[i]", and "non_anchor_ref_l1[i][j]".
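The two cases of "inter_view_pred_flag" can be sketched as a small parser. This is an illustration under stated assumptions: `read` is a hypothetical callable that returns the next already entropy-decoded syntax element value, the element order follows the encoding flowcharts (anchor information for all viewpoints, then non-anchor information), and empty lists represent the counts that are set to 0.

```python
KINDS = ("anchor", "non_anchor")
LISTS = ("l0", "l1")


def decode_mvc_extension(read):
    """Parse the MVC part of the sequence information from decoded symbols."""
    num_views = read() + 1                              # num_views_minus1
    view_id = [read() for _ in range(num_views)]        # view_id[i]
    # Default: every num_*_refs_lX[i] is 0, i.e. no inter-view references.
    refs = {f"{kind}_ref_{lst}": [[] for _ in range(num_views)]
            for kind in KINDS for lst in LISTS}
    inter_view_pred_flag = read()
    if inter_view_pred_flag == 1:                       # switch 406 feeds unit 405
        for kind in KINDS:
            for i in range(num_views):
                for lst in LISTS:
                    count = read()                      # num_<kind>_refs_<lst>[i]
                    refs[f"{kind}_ref_{lst}"][i] = [read() for _ in range(count)]
    return view_id, inter_view_pred_flag, refs


# Hypothetical symbol stream: two viewpoints, no inter-view prediction.
view_id, flag, refs = decode_mvc_extension(iter([1, 0, 1, 0]).__next__)
```

In the flag-0 case the parser stops after four symbols, and a decoder that never supports inter-view prediction could omit the inner loops entirely.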

The description now returns to FIG. 14. The management information for the entire sequence decoded by the sequence information decoding unit 303 is supplied to the decoding management unit 302 and used to manage decoding. The picture information decoding unit 304 decodes the encoded bit string, separated by the separation unit 301, in which the parameter information related to picture encoding (PPS) is encoded, and supplies the decoded parameter information (PPS) to the decoding management unit 302 as picture management information for use in managing decoding.

The image signal decoding unit 305 decodes the encoded bit string (encoded data) to be decoded, supplied from the separation unit 301, on the basis of the decoded sequence information supplied from the decoding management unit 302, such as the viewpoint number information, decoding order information, inter-view prediction information, and view dependency information, and thereby obtains the image signal. An image signal may be decoded using inter-view prediction, in which case the view dependency information is also used to determine the reference image for inter-view prediction.

Thus, with the multi-view image decoding apparatus of this embodiment, whether the encoded bit string (encoded data) to be decoded was encoded using inter-view prediction can be determined from the decoded value of the 1-bit inter-view prediction information "inter_view_pred_flag" alone, without decoding the view dependency information. When inter-view prediction was not used, each viewpoint can be decoded independently, which reduces the amount of processing. Furthermore, a multi-view image decoding apparatus that does not support inter-view prediction no longer needs to implement the processing for decoding view dependency information.

Next, the multi-view image decoding procedure of the multi-view image decoding apparatus shown in FIG. 14 is described with reference to the flowchart of FIG. 16. The processing performed in each step is the same as that described with the block diagrams of FIGS. 14 and 15, so only the procedure is described here, with each step related to FIGS. 14 and 15.

First, the encoded bit string is separated into NAL units (step S201). The reception and separation procedure of step S201 for an encoded bit string transmitted over a network is described in detail with reference to the flowchart of FIG. 22. In the separation process of step S201, the encoded bit string is first received over the network (step S271). Next, the packet headers that were added in accordance with the standard used for the received encoded bit string, such as MPEG-2 Systems, the MP4 file format, or RTP, are decoded and removed (step S272). The encoded bit string is then separated into NAL units (step S273).

The description now returns to FIG. 16. The identifier (nal_unit_type) in the header of each NAL unit separated in step S201 of FIG. 16, which distinguishes the type of the NAL unit, is evaluated to determine whether the NAL unit carries the parameter information related to the encoding of the entire sequence (SPS), that is, sequence information (step S202). If it is sequence information, the process proceeds to step S205. If it is not sequence information but is determined to be picture information (PPS) (step S203), the process proceeds to step S206.

If the NAL unit is neither sequence information nor picture information, the process proceeds to step S204. In step S204, it is determined whether the NAL unit is a VCL NAL unit, that is, an encoded bit string carrying the encoding mode, a motion vector or disparity vector, the encoded residual signal, and so on. If it is a VCL NAL unit, the process proceeds to step S207. Steps S201, S202, S203, and S204 correspond to the processing operation of the separation unit 301 in the multi-view image decoding apparatus of FIG. 14.

Next, in step S205, the encoded bit string in which the parameter information related to the encoding of the entire sequence is encoded is decoded to obtain that parameter information. Step S205 corresponds to the decoding operation of the sequence information decoding unit 303 in the multi-view image decoding apparatus of FIG. 14.

An example of the sequence information decoding procedure of step S205 is described in more detail with reference to the flowchart of FIG. 17. In the sequence information decoding process, the sequence information other than the MVC extension part is decoded first (step S211). Step S211 corresponds to the decoding operation of the sequence information decoding unit 401, which handles the sequence information other than the MVC extension part, within the sequence information decoding unit 303 of FIG. 15.

Following step S211, the information on the number of viewpoints is decoded (step S212). Step S212 corresponds to the decoding operation of the viewpoint number information decoding unit 402 within the sequence information decoding unit 303 of FIG. 15. Following step S212, the viewpoint ID of each viewpoint, encoded in the decoding order of the viewpoint direction, is decoded (step S213). Step S213 corresponds to the decoding operation of the decoding order information decoding unit 403 within the sequence information decoding unit 303 of FIG. 15.

An example of the procedure of step S213 for decoding the viewpoint ID of each viewpoint, encoded in the decoding order of the viewpoint direction, is now described in more detail with reference to the flowchart of FIG. 18. In step S213, the variable i is first set to 0 (step S221), and it is then determined whether the value of i is less than or equal to (number of viewpoints - 1) (step S222). If it is not, the decoding process of step S213 ends. If it is, steps S223 and S224 are repeated until the value of i is no longer less than or equal to (number of viewpoints - 1).

In step S223, the syntax element “view_id[i]” of the i-th viewpoint in the encoding/decoding order in the viewpoint direction is decoded. In step S224, 1 is added to the variable i, and the process returns to step S222.
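The loop of steps S221 to S224 can be sketched in Python as follows. This is only an illustration under stated assumptions: `decode_view_ids` and the `read_ue` callback are hypothetical names, and `read_ue` stands in for whatever entropy decoder returns the next decoded syntax element value.

```python
from typing import Callable, List


def decode_view_ids(num_views: int, read_ue: Callable[[], int]) -> List[int]:
    """Mirror steps S221 to S224: decode view_id[i] for every viewpoint."""
    view_id: List[int] = []
    i = 0                          # step S221
    while i <= num_views - 1:      # step S222
        view_id.append(read_ue())  # step S223: decode view_id[i]
        i += 1                     # step S224
    return view_id


# Hypothetical, already entropy-decoded values standing in for the bit string.
symbols = iter([0, 2, 1, 3])
ids = decode_view_ids(4, lambda: next(symbols))
```

The returned list is indexed by position in the decoding order, so `ids[i]` is the viewpoint ID of the i-th viewpoint to be decoded.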

The description now returns to the flowchart of FIG. 17. Following the decoding of step S213 described above with reference to FIG. 18, the inter-view prediction information “inter_view_pred_flag”, which indicates whether the data was encoded using inter-view prediction, is decoded (step S214). The processing of step S214 corresponds to the decoding operation of the inter-view prediction information decoding unit 404 within the sequence information decoding unit 303 of FIG. 15. Next, whether the data was encoded using inter-view prediction is determined from the value of “inter_view_pred_flag” (step S215). If inter-view prediction was used (the value of “inter_view_pred_flag” is 1), the viewpoint dependency information is decoded in step S216. If inter-view prediction was not used (the value is 0), the sequence information decoding process ends. The processing of steps S215 and S216 corresponds to the decoding operation of the viewpoint dependency information decoding unit 405 and the switching operation of the switch 406 within the sequence information decoding unit 303 of FIG. 15. That is, in FIG. 15, the switch 406 supplies the input encoded bit string to the viewpoint dependency information decoding unit 405 only when the decoded value of “inter_view_pred_flag” is 1, and does not supply it when the decoded value is 0.
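The flow of steps S212 to S216, including the role of the switch 406, can be sketched as follows. This is a minimal sketch, not the embodiment itself. It assumes that the number of viewpoints and the viewpoint IDs are unsigned Exp-Golomb codes, that the number of viewpoints is coded as its value minus one, and that the flag is a single bit. `BitReader`, `decode_mvc_extension` and the `decode_dependency` callback (standing in for unit 405) are hypothetical names.

```python
from typing import Callable, Dict


class BitReader:
    """Minimal MSB-first bit reader (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self.bits = "".join(f"{byte:08b}" for byte in data)
        self.pos = 0

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer (0 when n is 0)."""
        value = int(self.bits[self.pos:self.pos + n], 2) if n else 0
        self.pos += n
        return value

    def ue(self) -> int:
        """Read one unsigned Exp-Golomb code."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def decode_mvc_extension(r: BitReader, decode_dependency: Callable) -> Dict:
    num_views = r.ue() + 1                        # S212 (assumed minus-one coding)
    view_id = [r.ue() for _ in range(num_views)]  # S213
    flag = r.u(1)                                 # S214
    # S215/S216: like the switch 406, hand the bit string to the dependency
    # decoder only when the flag is 1; otherwise read nothing further.
    dependency = decode_dependency(r, num_views) if flag == 1 else None
    return {"num_views": num_views, "view_id": view_id,
            "inter_view_pred_flag": flag, "dependency": dependency}


# 0x54 = 010 1 010 0: two viewpoints, view_id 0 and 1, flag 0.
result = decode_mvc_extension(BitReader(bytes([0x54])), lambda r, n: "unused")
```

With the flag at 0 the callback is never invoked, which is the processing saving the embodiment is aiming for.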

Next, an example of the procedure of step S216 of FIG. 17 for decoding the viewpoint dependency information is described with reference to the flowchart of FIG. 19. In step S216, the viewpoint dependency information of anchor pictures is decoded first (step S231), then the viewpoint dependency information of non-anchor pictures is decoded (step S232), and the decoding process ends.

Next, an example of the procedure of step S231 of FIG. 19 for decoding the viewpoint dependency information of anchor pictures is described in more detail with reference to the flowchart of FIG. 20. In step S231, the variable i is first set to 0 (step S241), and it is then determined whether the value of i is less than or equal to (number of viewpoints − 1) (step S242). If it is not, the decoding of the anchor picture viewpoint dependency information ends. If it is, the process proceeds to step S243, and steps S242 through S253 are repeated until the value of i is no longer less than or equal to (number of viewpoints − 1).

In step S243, the syntax element “num_anchor_refs_l0[i]” of the i-th viewpoint in the encoding/decoding order in the viewpoint direction is decoded. The variable j is then set to 0 (step S244), and it is determined whether the value of j is smaller than “num_anchor_refs_l0[i]” (step S245). If the value of j is greater than or equal to “num_anchor_refs_l0[i]”, the process proceeds to step S248. If it is smaller, the process proceeds to step S246, and steps S245 through S247 are repeated until the value of j is greater than or equal to “num_anchor_refs_l0[i]”. In step S246, the syntax element “anchor_ref_l0[i][j]”, the entry with index j in reference image list 0 of the i-th viewpoint in the encoding/decoding order in the viewpoint direction, is decoded, and the process proceeds to step S247. In step S247, 1 is added to the variable j, and the process returns to step S245.

In step S248, the syntax element “num_anchor_refs_l1[i]” of the i-th viewpoint in the encoding/decoding order in the viewpoint direction is decoded. The variable j is then set to 0 (step S249), and it is determined whether the value of j is smaller than the decoded value of “num_anchor_refs_l1[i]” (step S250). If the value of j is greater than or equal to the value of “num_anchor_refs_l1[i]”, 1 is added to the variable i (step S253), and the process returns to step S242.

If, on the other hand, the value of j is smaller than the value of “num_anchor_refs_l1[i]”, steps S250 through S252 are repeated until the value of j is greater than or equal to that value. That is, in step S251, which follows step S250, the syntax element “anchor_ref_l1[i][j]”, the entry with index j in reference image list 1 of the i-th viewpoint in the encoding/decoding order in the viewpoint direction, is decoded, and the process proceeds to step S252. In step S252, 1 is added to the variable j, and the process returns to step S250.

Next, an example of the procedure of step S232 of FIG. 19 for decoding the viewpoint dependency information of non-anchor pictures is described in more detail with reference to the flowchart of FIG. 21. In step S232, the variable i is first set to 0 (step S254), and it is then determined whether the value of i is less than or equal to (number of viewpoints − 1) (step S255). If it is not, the decoding of the non-anchor picture viewpoint dependency information ends. If it is, the process proceeds to step S256, and steps S255 through S266 are repeated until the value of i is no longer less than or equal to (number of viewpoints − 1).

In step S256, the syntax element “num_non_anchor_refs_l0[i]” of the i-th viewpoint in the encoding/decoding order in the viewpoint direction is decoded. The variable j is then set to 0 (step S257), and it is determined whether the value of j is smaller than the decoded value of “num_non_anchor_refs_l0[i]” (step S258). If the value of j is greater than or equal to that value, the process proceeds to step S261. If it is smaller, steps S258 through S260 are repeated until the value of j is no longer smaller than the value of “num_non_anchor_refs_l0[i]”. In step S259, which follows step S258, the syntax element “non_anchor_ref_l0[i][j]”, the entry with index j in reference image list 0 of the i-th viewpoint in the encoding/decoding order in the viewpoint direction, is decoded, and the process proceeds to step S260. In step S260, 1 is added to the variable j, and the process returns to step S258.

In step S261, the syntax element “num_non_anchor_refs_l1[i]” of the i-th viewpoint in the encoding/decoding order in the viewpoint direction is decoded. The variable j is then set to 0 (step S262), and it is determined whether the value of j is smaller than the decoded value of “num_non_anchor_refs_l1[i]” (step S263). If the value of j is greater than or equal to the value of “num_non_anchor_refs_l1[i]”, 1 is added to the variable i (step S266), and the process returns to step S255.

If, on the other hand, the value of j is smaller than the value of “num_non_anchor_refs_l1[i]”, steps S263 through S265 are repeated until the value of j is greater than or equal to that value. That is, in step S264, which follows step S263, the syntax element “non_anchor_ref_l1[i][j]”, the entry with index j in reference image list 1 of the i-th viewpoint in the encoding/decoding order in the viewpoint direction, is decoded, and the process proceeds to step S265. In step S265, 1 is added to the variable j, and the process returns to step S263.
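The anchor pass (step S231) and the non-anchor pass (step S232) have the same shape: for each viewpoint, a count followed by that many references for list 0, then the same for list 1. A sketch of step S216 that shares one helper between the two passes might look like this. `decode_ref_lists`, `decode_view_dependency` and the `read_ue` callback are hypothetical names, and the callback stands in for the entropy decoder.

```python
from typing import Callable, Dict, List, Tuple

RefLists = List[List[int]]


def decode_ref_lists(num_views: int,
                     read_ue: Callable[[], int]) -> Tuple[RefLists, RefLists]:
    """One pass over all viewpoints (outer loop on i, inner loops on j)."""
    refs_l0: RefLists = []
    refs_l1: RefLists = []
    for _ in range(num_views):
        num_l0 = read_ue()                                  # num_*_refs_l0[i]
        refs_l0.append([read_ue() for _ in range(num_l0)])  # *_ref_l0[i][j]
        num_l1 = read_ue()                                  # num_*_refs_l1[i]
        refs_l1.append([read_ue() for _ in range(num_l1)])  # *_ref_l1[i][j]
    return refs_l0, refs_l1


def decode_view_dependency(num_views: int,
                           read_ue: Callable[[], int]) -> Dict[str, RefLists]:
    anchor_l0, anchor_l1 = decode_ref_lists(num_views, read_ue)          # S231
    non_anchor_l0, non_anchor_l1 = decode_ref_lists(num_views, read_ue)  # S232
    return {"anchor_ref_l0": anchor_l0, "anchor_ref_l1": anchor_l1,
            "non_anchor_ref_l0": non_anchor_l0,
            "non_anchor_ref_l1": non_anchor_l1}


# Hypothetical decoded values for two viewpoints: only the anchor picture of
# the second viewpoint has one list-0 reference (the value 0).
symbols = iter([0, 0, 1, 0, 0,   # anchor pass
                0, 0, 0, 0])     # non-anchor pass
dependency = decode_view_dependency(2, lambda: next(symbols))
```

Even when no viewpoint references another, the two passes still read four counts per viewpoint, which is the overhead that the flag of step S214 lets the decoder skip.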

The description now returns to FIG. 17. When the processing of step S216 is complete, the sequence information decoding process of FIG. 17 ends.

The description now returns to the flowchart of FIG. 16. When the processing of step S205 described above with reference to FIGS. 17 to 21 is complete, the process proceeds to step S208. In step S206, parameter information relating to the encoding of pictures is decoded. The processing of step S206 corresponds to the decoding operation of the picture information decoding unit 304 of the multi-view image decoding apparatus of FIG. 14. When step S206 is complete, the process proceeds to step S208. In step S207, the slice information and the image signal are decoded. The processing of step S207 corresponds to the decoding operation of the image signal decoding unit 305 of the multi-view image decoding apparatus of the present embodiment in FIG. 14. When step S207 is complete, the process proceeds to step S208.

In step S208, it is determined whether decoding of the entire encoded bit string to be decoded is complete. If it is, the multi-view image decoding procedure ends. If it is not, the process returns to the first step S201, and steps S201 through S208 are repeated until the entire encoded bit string has been decoded. This provides the same advantages as the multi-view image decoding apparatus of the present invention described with reference to FIGS. 14 and 15.
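The outer loop that ends at step S208 can be pictured as a simple dispatch. The sketch below assumes the bit string has already been separated into units and each unit labelled with its kind, which is the part of the flow before steps S205 to S207. The kind labels, `decode_units` and the handler table are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def decode_units(units: Iterable[Tuple[str, bytes]],
                 handlers: Dict[str, Callable[[bytes], None]]) -> None:
    """Run the matching handler for each unit until none remain (step S208)."""
    for kind, payload in units:
        handlers[kind](payload)


log: List[Tuple[str, bytes]] = []
handlers = {
    "sequence": lambda p: log.append(("S205", p)),  # sequence information
    "picture": lambda p: log.append(("S206", p)),   # picture parameter information
    "slice": lambda p: log.append(("S207", p)),     # slice information and image signal
}
decode_units([("sequence", b"sps"), ("picture", b"pps"), ("slice", b"s0")],
             handlers)
```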

In the above description, information indicating whether encoding uses inter-view prediction is encoded, and whether the viewpoint dependency information is encoded is determined from that information. However, the information indicating whether encoding uses inter-view prediction may instead be prepared, encoded, and decoded separately for anchor pictures and for non-anchor pictures, and this is also included in the present invention.

FIG. 23 shows an example of the syntax structure of the MVC extension part of the SPS when the information indicating whether encoding uses inter-view prediction is prepared separately for anchor pictures and non-anchor pictures. Compared with the syntax structure of FIG. 12, FIG. 23 differs in that it provides a syntax element “anchor_inter_view_pred_flag” for anchor pictures and a syntax element “non_anchor_inter_view_pred_flag” for non-anchor pictures. The value of “anchor_inter_view_pred_flag” determines whether the syntax elements “num_anchor_refs_l0[i]”, “anchor_ref_l0[i][j]”, “num_anchor_refs_l1[i]”, and “anchor_ref_l1[i][j]”, which are the viewpoint dependency information for anchor pictures, are encoded/decoded. The value of “non_anchor_inter_view_pred_flag” determines whether the syntax elements “num_non_anchor_refs_l0[i]”, “non_anchor_ref_l0[i][j]”, “num_non_anchor_refs_l1[i]”, and “non_anchor_ref_l1[i][j]”, which are the viewpoint dependency information for non-anchor pictures, are encoded/decoded.
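A decoder for the two-flag variant of FIG. 23 might be sketched as below. The exact position of each flag within the syntax is not reproduced in this text, so the sketch assumes each flag immediately precedes the block it gates. `decode_ref_lists`, `decode_dependency_fig23` and the `read_flag` / `read_ue` callbacks are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

RefLists = List[List[int]]


def decode_ref_lists(num_views: int,
                     read_ue: Callable[[], int]) -> Tuple[RefLists, RefLists]:
    """Read list-0 and list-1 references for every viewpoint."""
    l0: RefLists = []
    l1: RefLists = []
    for _ in range(num_views):
        l0.append([read_ue() for _ in range(read_ue())])
        l1.append([read_ue() for _ in range(read_ue())])
    return l0, l1


def decode_dependency_fig23(num_views: int,
                            read_flag: Callable[[], int],
                            read_ue: Callable[[], int]
                            ) -> Dict[str, Optional[Tuple[RefLists, RefLists]]]:
    result: Dict[str, Optional[Tuple[RefLists, RefLists]]] = {
        "anchor": None, "non_anchor": None}
    if read_flag() == 1:   # anchor_inter_view_pred_flag (assumed position)
        result["anchor"] = decode_ref_lists(num_views, read_ue)
    if read_flag() == 1:   # non_anchor_inter_view_pred_flag (assumed position)
        result["non_anchor"] = decode_ref_lists(num_views, read_ue)
    return result


# Hypothetical decoded values: anchor flag 1, anchor lists for two viewpoints,
# then non-anchor flag 0.
stream = iter([1, 0, 0, 1, 0, 0, 0])
read = lambda: next(stream)
result = decode_dependency_fig23(2, read, read)
```

This variant suits streams in which only anchor pictures use inter-view prediction: the non-anchor lists are then skipped entirely.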

The information indicating whether encoding uses inter-view prediction may also be prepared, encoded, and decoded separately for reference picture list 0 and reference picture list 1, and this is also included in the present invention. FIG. 24 shows an example of the syntax structure of the MVC extension part of the SPS in that case. Compared with the syntax structure of FIG. 12, the syntax structure of FIG. 24 differs in that it provides a syntax element “inter_view_pred_l0_flag” for reference picture list 0 and a syntax element “inter_view_pred_l1_flag” for reference picture list 1. The value of “inter_view_pred_l0_flag” determines whether the syntax elements “num_anchor_refs_l0[i]”, “anchor_ref_l0[i][j]”, “num_non_anchor_refs_l0[i]”, and “non_anchor_ref_l0[i][j]”, which are the viewpoint dependency information for reference picture list 0, are encoded/decoded. The value of “inter_view_pred_l1_flag” determines whether the syntax elements “num_anchor_refs_l1[i]”, “anchor_ref_l1[i][j]”, “num_non_anchor_refs_l1[i]”, and “non_anchor_ref_l1[i][j]”, which are the viewpoint dependency information for reference picture list 1, are encoded/decoded.

The information indicating whether encoding uses inter-view prediction may also be prepared, encoded, and decoded separately for reference picture list 0 and reference picture list 1 of anchor pictures and for reference picture list 0 and reference picture list 1 of non-anchor pictures, and this is also included in the present invention. FIG. 25 shows an example of the syntax structure of the MVC extension part of the SPS in that case.

Compared with the syntax structure of FIG. 12, the syntax structure of FIG. 25 differs in that it provides the syntax elements “anchor_inter_view_pred_l0_flag” and “anchor_inter_view_pred_l1_flag” for reference picture lists 0 and 1 of anchor pictures, and the syntax elements “non_anchor_inter_view_pred_l0_flag” and “non_anchor_inter_view_pred_l1_flag” for reference picture lists 0 and 1 of non-anchor pictures. The value of “anchor_inter_view_pred_l0_flag” determines whether the syntax elements “num_anchor_refs_l0[i]” and “anchor_ref_l0[i][j]”, which are the viewpoint dependency information for reference picture list 0 of anchor pictures, are encoded/decoded. The value of “anchor_inter_view_pred_l1_flag” determines whether the syntax elements “num_anchor_refs_l1[i]” and “anchor_ref_l1[i][j]”, which are the viewpoint dependency information for reference picture list 1 of anchor pictures, are encoded/decoded. The value of “non_anchor_inter_view_pred_l0_flag” determines whether the syntax elements “num_non_anchor_refs_l0[i]” and “non_anchor_ref_l0[i][j]”, which are the viewpoint dependency information for reference picture list 0 of non-anchor pictures, are encoded/decoded. The value of “non_anchor_inter_view_pred_l1_flag” determines whether the syntax elements “num_non_anchor_refs_l1[i]” and “non_anchor_ref_l1[i][j]”, which are the viewpoint dependency information for reference picture list 1 of non-anchor pictures, are encoded/decoded.
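The four-flag variant of FIG. 25 gates each of the four groups of syntax elements independently. The sketch below is one possible reading, with two labelled assumptions: all four flags are read before any list, and a list whose flag is 0 is represented as empty for every viewpoint. The function names and the `read_flag` / `read_ue` callbacks are hypothetical.

```python
from typing import Callable, Dict, List

RefLists = List[List[int]]

FLAG_NAMES = [
    "anchor_inter_view_pred_l0_flag",
    "anchor_inter_view_pred_l1_flag",
    "non_anchor_inter_view_pred_l0_flag",
    "non_anchor_inter_view_pred_l1_flag",
]


def decode_pass(num_views: int, use_l0: bool, use_l1: bool,
                read_ue: Callable[[], int]) -> Dict[str, RefLists]:
    """One pass over the viewpoints, reading only the lists whose flag is set."""
    l0: RefLists = []
    l1: RefLists = []
    for _ in range(num_views):
        l0.append([read_ue() for _ in range(read_ue())] if use_l0 else [])
        l1.append([read_ue() for _ in range(read_ue())] if use_l1 else [])
    return {"l0": l0, "l1": l1}


def decode_dependency_fig25(num_views: int, read_flag: Callable[[], int],
                            read_ue: Callable[[], int]) -> Dict[str, RefLists]:
    flags = {name: read_flag() for name in FLAG_NAMES}  # assumed: flags first
    anchor = decode_pass(num_views, flags[FLAG_NAMES[0]] == 1,
                         flags[FLAG_NAMES[1]] == 1, read_ue)
    non_anchor = decode_pass(num_views, flags[FLAG_NAMES[2]] == 1,
                             flags[FLAG_NAMES[3]] == 1, read_ue)
    return {"anchor_ref_l0": anchor["l0"], "anchor_ref_l1": anchor["l1"],
            "non_anchor_ref_l0": non_anchor["l0"],
            "non_anchor_ref_l1": non_anchor["l1"]}


# Hypothetical decoded values: flags 1, 0, 0, 1, then the two coded lists.
stream = iter([1, 0, 0, 1,   # four flags
               0, 1, 0,      # anchor list 0 for two viewpoints
               0, 1, 0])     # non-anchor list 1 for two viewpoints
read = lambda: next(stream)
deps = decode_dependency_fig25(2, read, read)
```

Passing the same list flag to both passes would give the two-flag, per-list behaviour described for FIG. 24.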

In the description of the encoding/decoding scheme that follows the syntax structure of FIG. 12, information indicating whether encoding/decoding uses inter-view prediction is encoded, and whether the viewpoint dependency information is encoded/decoded is determined from that information. In addition, whether the encoding/decoding order information in the viewpoint direction is encoded may also be switched based on that information, and this is also included in the present invention.

FIG. 26 shows an example of the syntax structure of the MVC extension part of the SPS when the information indicating whether encoding uses inter-view prediction switches not only the viewpoint dependency information but also whether the encoding/decoding order information is encoded. Compared with FIG. 12, the syntax structure of FIG. 26 differs in that the value of the syntax element “inter_view_pred_flag” also determines whether the syntax element “view_id[i]”, which is the encoding/decoding order information, is encoded/decoded. When the value of “inter_view_pred_flag” is 0 and “view_id[i]” is not encoded, the values of “view_id[i]”, the encoding/decoding order information in the viewpoint direction, may be defined as ascending from 0, or may be defined as undetermined.
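A sketch of the FIG. 26 behaviour follows. It assumes the flag is read before the viewpoint IDs, since the flag has to be known to decide whether they are present, and it picks the "ascending from 0" option of the two defaults mentioned above. `decode_order_info` and the callbacks are hypothetical names.

```python
from typing import Callable, List, Tuple


def decode_order_info(num_views: int, read_flag: Callable[[], int],
                      read_ue: Callable[[], int]) -> Tuple[int, List[int]]:
    inter_view_pred_flag = read_flag()
    if inter_view_pred_flag == 1:
        # view_id[i] is present in the bit string.
        view_id = [read_ue() for _ in range(num_views)]
    else:
        # view_id[i] is absent; infer ascending order starting at 0.
        view_id = list(range(num_views))
    return inter_view_pred_flag, view_id


coded = iter([1, 2, 0, 1])   # flag 1, then view_id 2, 0, 1
read = lambda: next(coded)
with_ids = decode_order_info(3, read, read)

absent = iter([0])           # flag 0, nothing else to read
read_absent = lambda: next(absent)
without_ids = decode_order_info(3, read_absent, read_absent)
```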

In the above description, information indicating whether encoding uses inter-view prediction is encoded, and whether the viewpoint dependency information is encoded is determined from that information. However, when implicit information exists from which the use of inter-view prediction can be determined, that information need not be encoded explicitly, and whether the viewpoint dependency information is encoded may be determined from the implicit information; this is also included in the present invention. For example, a profile that encodes multi-view images without inter-view prediction and a profile that encodes multi-view images with inter-view prediction are defined. It is further specified that the viewpoint dependency information is not encoded under the profile without inter-view prediction. Information identifying the profile is then encoded, and the decoding side decodes it to identify the profile, so that whether inter-view prediction was used in encoding can be determined from the profile value, which serves as the implicit information.
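The profile-based variant replaces the explicit flag with a lookup on the decoded profile. The sketch below is purely illustrative: the two profile names and `has_view_dependency_info` are invented here and do not correspond to any defined profile.

```python
from enum import Enum


class Profile(Enum):
    """Hypothetical profiles; the names are placeholders."""
    MULTIVIEW_WITHOUT_INTER_VIEW_PRED = "without inter-view prediction"
    MULTIVIEW_WITH_INTER_VIEW_PRED = "with inter-view prediction"


def has_view_dependency_info(profile: Profile) -> bool:
    """Viewpoint dependency information is coded only under the profile
    that allows inter-view prediction."""
    return profile is Profile.MULTIVIEW_WITH_INTER_VIEW_PRED


skip = not has_view_dependency_info(Profile.MULTIVIEW_WITHOUT_INTER_VIEW_PRED)
```

A decoder built this way would branch on the profile at the point where the explicit-flag variant branches on “inter_view_pred_flag”.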

In the above description, the multi-view images used for encoding and decoding may be multi-view images actually captured from different viewpoints. However, viewpoint images that have been converted or generated, for example by interpolating the position of a virtual viewpoint that was not actually captured from the surrounding viewpoints, can also be encoded and decoded, and this is included in the present invention.

For example, a multi-view image signal comprising image signals of four viewpoints A, B, C, and D can take three forms: (1) the image signals of all four viewpoints are obtained by actual photographing at each viewpoint; (2) the image signals of all four viewpoints are generated as if virtually photographed at each viewpoint; (3) actually photographed and virtually generated image signals are mixed, for example the image signals of viewpoints A and B are obtained by actual photographing at each viewpoint and those of viewpoints C and D are generated as if virtually photographed at each viewpoint.

Multi-view images such as computer graphics can also be encoded and decoded, and this is included in the present invention. Furthermore, the multi-view image encoding and decoding processes described above can of course be implemented as transmission, storage, and reception apparatuses using hardware, and can also be implemented by firmware stored in a ROM (read-only memory), flash memory, or the like, or by software running on a computer or the like. The firmware program or software program may be provided recorded on a recording medium readable by a computer or the like, provided from a server over a wired or wireless network, or provided as a data broadcast of terrestrial or satellite digital broadcasting.

FIG. 1 is a block diagram of an example of a multi-view image encoding apparatus that generates the encoded bit string decoded by the present invention.
FIG. 2 is a block diagram of an example of the sequence information encoding unit 102 included in the multi-view image encoding apparatus of FIG. 1.
FIG. 3 is a flowchart explaining the multi-view image encoding process that produces the encoded bit string decoded by the present invention.
FIG. 4 is a flowchart explaining the sequence information encoding process of step S101 in FIG. 3.
FIG. 5 is a flowchart explaining the encoding of viewpoint IDs in the encoding/decoding order of step S113 in FIG. 4.
FIG. 6 is a flowchart explaining the encoding of viewpoint dependency information of step S116 in FIG. 4.
FIG. 7 is a flowchart explaining the encoding of anchor picture viewpoint dependency information of step S131 in FIG. 6.
FIG. 8 is a flowchart explaining the encoding of non-anchor picture viewpoint dependency information of step S132 in FIG. 6.
FIG. 9 is a flowchart explaining packetization and transmission processing for transmission over a network.
FIG. 10 is a diagram showing an example of the reference dependency between images when a multi-view image of eight viewpoints is encoded without inter-view prediction.
FIG. 11 shows an example of the syntax elements of the MVC extension part of the SPS and their values when encoding with the prediction reference dependency of FIG. 10 based on the syntax structure of FIG. 28.
FIG. 12 shows an example of the syntax structure of the MVC extension part of the SPS of the encoded bit string decoded by the present invention.
FIG. 13 shows an example of the syntax elements of the MVC extension part of the SPS and their values when encoding with the prediction reference dependency of FIG. 10 based on the syntax structure of FIG. 12.
FIG. 14 is a block diagram of an embodiment of the multi-view image decoding apparatus of the present invention.
FIG. 15 is a block diagram of an embodiment of the sequence information decoding unit 303 included in the multi-view image decoding apparatus of FIG. 14.
FIG. 16 is a flowchart explaining the multi-view image decoding process of the present invention.
FIG. 17 is a flowchart explaining the sequence information decoding process of step S205 in FIG. 16.
FIG. 18 is a flowchart explaining the decoding of viewpoint IDs encoded in the encoding/decoding order of step S213 in FIG. 17.
FIG. 19 is a flowchart explaining the decoding of viewpoint dependency information of step S216 in FIG. 17.
FIG. 20 is a flowchart explaining the decoding of anchor picture viewpoint dependency information of step S231 in FIG. 19.
FIG. 21 is a flowchart explaining the decoding of non-anchor picture viewpoint dependency information of step S232 in FIG. 19.
FIG. 22 is a flowchart explaining reception processing for reception over a network.
FIG. 23 shows an example of the syntax structure of the MVC extension part of the SPS of the present invention.
FIG. 24 shows an example of the syntax structure of the MVC extension part of the SPS of the present invention.
FIG. 25 shows an example of the syntax structure of the MVC extension part of the SPS of the present invention.
FIG. 26 shows an example of the syntax structure of the MVC extension part of the SPS of the present invention.
FIG. 27 is a diagram showing an example of the prediction reference dependency when a multi-view image of eight viewpoints is encoded using inter-view prediction.
FIG. 28 shows an example of the syntax structure of the MVC extension part of the SPS of a conventional example.
FIG. 29 shows an example of the relationship between bit strings encoded with the unsigned Exp-Golomb code and their code numbers.
FIG. 30 shows an example of the syntax elements of the MVC extension part of the SPS and their values when encoding with the prediction reference dependency of FIG. 27 based on the syntax structure of FIG. 28.

Explanation of symbols

101 Encoding management part
102 Sequence information encoding part
103 Picture information encoding part
104 Image signal encoding part
105 Multiplexing part
201 Sequence information encoding part for information other than the MVC extension part
202 Number-of-viewpoints information encoding part
203 Encoding order information encoding part
204 Inter-view prediction information encoding part
205 Viewpoint dependency information encoding part
301 Separating part
302 Decoding management part
303 Sequence information decoding part
304 Picture information decoding part
305 Image signal decoding part
401 Sequence information decoding part for information other than the MVC extension part
402 Number-of-viewpoints information decoding part
403 Decoding order information decoding part
404 Inter-view prediction information decoding part
405 Viewpoint dependency information decoding part

Claims

A multi-view image decoding method for decoding encoded data to be decoded, the encoded data being obtained by encoding a multi-view image signal that includes an image signal of each of a plurality of set viewpoints, the image signal of one viewpoint being either an image signal obtained by actually photographing from that viewpoint or an image signal generated as if virtually photographed from that viewpoint, wherein
the encoded data to be decoded comprises:
first encoded data obtained by encoding inter-view prediction information indicating whether, in the encoding of the image signal of each viewpoint, there is an image encoded with reference to a decoded image signal of another viewpoint;
second encoded data obtained by encoding, only when there is an image encoded with reference to a decoded image signal of another viewpoint, viewpoint dependency information indicating the dependency relationships between the viewpoints; and
third encoded data obtained by encoding the image signal of each viewpoint according to the value of the viewpoint dependency information when there is an image encoded with reference to a decoded image signal of another viewpoint, and by encoding it without referring to a decoded image signal of another viewpoint when there is no such image,
the method comprising:
a first step of decoding the first encoded data to obtain the inter-view prediction information;
a second step of decoding the second encoded data to obtain the viewpoint dependency information only when it is determined, based on the value of the inter-view prediction information obtained in the first step, that there is an image to be decoded with reference to a decoded image signal of another viewpoint; and
a third step of obtaining the image signal of each viewpoint by decoding the third encoded data using the viewpoint dependency information when both the inter-view prediction information and the viewpoint dependency information have been decoded, and by decoding the third encoded data without referring to a decoded image signal of another viewpoint when the viewpoint dependency information has not been decoded.
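The first and second steps of the method claim can be sketched in Python as follows. This is an illustration under stated assumptions, not the syntax defined in this publication: the field layout (a 1-bit flag followed, only when the flag is 1, by a reference count and reference view indices per viewpoint, each as an unsigned Exp-Golomb code) and the names `BitReader`, `decode_view_dependencies`, and `independently_decodable` are hypothetical.

```python
from typing import Dict, List


class BitReader:
    """Reads bits from a string of '0'/'1' characters (illustrative only)."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read_bit(self) -> int:
        bit = int(self.bits[self.pos])
        self.pos += 1
        return bit

    def read_ue(self) -> int:
        """Read one unsigned Exp-Golomb code number."""
        zeros = 0
        while self.read_bit() == 0:   # the terminating 1 bit is consumed here
            zeros += 1
        value = 1
        for _ in range(zeros):
            value = (value << 1) | self.read_bit()
        return value - 1


def decode_view_dependencies(reader: BitReader, num_views: int) -> Dict[int, List[int]]:
    """Read the inter-view prediction flag, then dependency info only if the flag is 1."""
    # First step: decode the 1-bit inter-view prediction information.
    inter_view_flag = reader.read_bit()
    deps: Dict[int, List[int]] = {view: [] for view in range(num_views)}
    # Second step: decode viewpoint dependency information only when it was encoded.
    if inter_view_flag == 1:
        for view in range(num_views):
            num_refs = reader.read_ue()   # assumed layout: count, then indices
            deps[view] = [reader.read_ue() for _ in range(num_refs)]
    return deps


def independently_decodable(deps: Dict[int, List[int]]) -> List[int]:
    """Viewpoints that the third step can decode without any other viewpoint."""
    return [view for view, refs in deps.items() if not refs]


if __name__ == "__main__":
    # Flag 0: nothing else is read and every viewpoint is independent.
    print(independently_decodable(decode_view_dependencies(BitReader("0"), 3)))
    # Flag 1: view 0 has no references, view 1 references 0, view 2 references 0 and 1.
    print(decode_view_dependencies(BitReader("1101010111010"), 3))
```

When the flag is 0 the reader consumes a single bit, so the decoder learns that every viewpoint is independent without parsing any dependency fields.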

A multi-view image decoding apparatus that decodes encoded data to be decoded, the encoded data being obtained by encoding a multi-view image signal that includes an image signal of each of a plurality of set viewpoints, the image signal of one viewpoint being either an image signal obtained by actually photographing from that viewpoint or an image signal generated as if virtually photographed from that viewpoint, wherein
the encoded data to be decoded comprises:
first encoded data obtained by encoding inter-view prediction information indicating whether, in the encoding of the image signal of each viewpoint, there is an image encoded with reference to a decoded image signal of another viewpoint;
second encoded data obtained by encoding, only when there is an image encoded with reference to a decoded image signal of another viewpoint, viewpoint dependency information indicating the dependency relationships between the viewpoints; and
third encoded data obtained by encoding the image signal of each viewpoint according to the value of the viewpoint dependency information when there is an image encoded with reference to a decoded image signal of another viewpoint, and by encoding it without referring to a decoded image signal of another viewpoint when there is no such image,
the apparatus comprising:
first decoding means for decoding the first encoded data to obtain the inter-view prediction information;
second decoding means for decoding the second encoded data to obtain the viewpoint dependency information only when it is determined, based on the value of the inter-view prediction information obtained by the first decoding means, that there is an image to be decoded with reference to a decoded image signal of another viewpoint; and
third decoding means for obtaining the image signal of each viewpoint by decoding the third encoded data using the viewpoint dependency information when both the inter-view prediction information and the viewpoint dependency information have been decoded, and by decoding the third encoded data without referring to a decoded image signal of another viewpoint when the viewpoint dependency information has not been decoded.

A multi-view image decoding program for decoding encoded data to be decoded, the encoded data being obtained by encoding a multi-view image signal that includes an image signal of each of a plurality of set viewpoints, the image signal of one viewpoint being either an image signal obtained by actually photographing from that viewpoint or an image signal generated as if virtually photographed from that viewpoint, wherein
the encoded data to be decoded comprises:
first encoded data obtained by encoding inter-view prediction information indicating whether, in the encoding of the image signal of each viewpoint, there is an image encoded with reference to a decoded image signal of another viewpoint;
second encoded data obtained by encoding, only when there is an image encoded with reference to a decoded image signal of another viewpoint, viewpoint dependency information indicating the dependency relationships between the viewpoints; and
third encoded data obtained by encoding the image signal of each viewpoint according to the value of the viewpoint dependency information when there is an image encoded with reference to a decoded image signal of another viewpoint, and by encoding it without referring to a decoded image signal of another viewpoint when there is no such image,
the program causing a computer to execute:
a first step of decoding the first encoded data to obtain the inter-view prediction information;
a second step of decoding the second encoded data to obtain the viewpoint dependency information only when it is determined, based on the value of the inter-view prediction information obtained in the first step, that there is an image to be decoded with reference to a decoded image signal of another viewpoint; and
a third step of obtaining the image signal of each viewpoint by decoding the third encoded data using the viewpoint dependency information when both the inter-view prediction information and the viewpoint dependency information have been decoded, and by decoding the third encoded data without referring to a decoded image signal of another viewpoint when the viewpoint dependency information has not been decoded.