JP2010157824A - Image encoder, image encoding method, and program of the same

Info

Publication number: JP2010157824A
Application number: JP2008333857A
Authority: JP
Inventors: Hiroya Nakamura; 博哉中村; Motoharu Ueda; 基晴上田
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2008-12-26
Filing date: 2008-12-26
Publication date: 2010-07-15

Abstract

PROBLEM TO BE SOLVED: To transmit or store multi-viewpoint images efficiently.

SOLUTION: An image signal encoder 107 encodes a plurality of images, each from a different viewpoint, to generate image encoded data. A depth information encoder (for example, a depth signal encoder 108) encodes depth information indicating the depth of a specific space from at least one viewpoint to generate depth information encoded data. A parameter information encoder 110 encodes parameter information, including viewpoint information that specifies the plurality of viewpoints on which the images and the depth information are based, to generate parameter information encoded data. A unitization portion 109 generates an encoded stream containing the image encoded data generated by the image signal encoder 107, the depth information encoded data generated by the depth information encoder, and the parameter information encoded data generated by the parameter information encoder 110.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an image encoding device, an image encoding method, and a program thereof that encode images captured from a plurality of different viewpoints.

In recent years, applications using images from multiple viewpoints have become widespread. One such application is binocular stereoscopic television. In binocular stereoscopic television, a left-eye image and a right-eye image, captured from two different directions by two cameras, are generated and displayed on the same screen to present a stereoscopic image. The left-eye and right-eye images are transmitted or recorded separately as independent images, which requires about twice the amount of information of a single two-dimensional image.

To address this, a method has been proposed in which one of the left and right images is treated as the main image and the other as the sub-image, and the sub-image is compressed by a general compression coding method to reduce the amount of information (see, for example, Patent Document 1). In this proposed stereoscopic television image transmission method, for each small region of the sub-image, the relative position with high correlation to the main image is found, and the positional displacement (hereinafter, the disparity vector) and the difference signal (hereinafter, the prediction residual signal) are transmitted or recorded. The main image and the disparity vectors alone can restore an image close to the sub-image. The prediction residual signal is also transmitted or recorded because information in the sub-image that the main image lacks, such as regions hidden behind an object, cannot otherwise be restored.
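The disparity vector and prediction residual described above can be illustrated with a minimal one-dimensional sketch. It is not taken from Patent Document 1: the function names, the sum-of-absolute-differences (SAD) matching cost, and the sample pixel rows are all assumptions chosen for illustration.

```python
def best_disparity(main_row, sub_row, start, size, max_shift):
    """Find the shift into main_row that best predicts a block of sub_row.

    Returns the disparity (shift) and the prediction residual for the block.
    The SAD matching cost is an illustrative assumption.
    """
    block = sub_row[start:start + size]
    best = None
    for d in range(-max_shift, max_shift + 1):
        lo = start + d
        if lo < 0 or lo + size > len(main_row):
            continue  # candidate block would fall outside the main image
        cand = main_row[lo:lo + size]
        sad = sum(abs(a - b) for a, b in zip(block, cand))
        if best is None or sad < best[1]:
            best = (d, sad, cand)
    d, _, pred = best
    residual = [a - b for a, b in zip(block, pred)]
    return d, residual


def reconstruct(main_row, start, size, d, residual):
    """Rebuild the sub-image block from the main image, disparity, and residual."""
    pred = main_row[start + d:start + d + size]
    return [p + r for p, r in zip(pred, residual)]


# Hypothetical pixel rows: the sub-image shows the same object shifted by two pixels.
main_row = [10, 10, 50, 60, 70, 10, 10, 10]
sub_row = [10, 10, 10, 10, 50, 60, 71, 10]

d, residual = best_disparity(main_row, sub_row, start=4, size=3, max_shift=4)
restored = reconstruct(main_row, 4, 3, d, residual)
```

In this sketch only the disparity and the small residual need to be sent for the block; the decoder combines them with the main image it already has.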

In 1996, a stereo image coding scheme called the Multi-view Profile was added (ISO/IEC 13818-2/AMD3) to MPEG-2 Video (ISO/IEC 13818-2), the international standard for single-view image coding. The MPEG-2 Video Multi-view Profile is a two-layer scheme that encodes the left-eye image in the base layer and the right-eye image in the enhancement layer. In addition to motion-compensated prediction, which exploits temporal redundancy, and the discrete cosine transform, which exploits spatial redundancy, it performs compression coding using disparity-compensated prediction, which exploits redundancy between viewpoints.

A technique has also been proposed that reduces the amount of information for multi-view images captured by three or more cameras by using motion-compensated prediction and disparity-compensated prediction (see, for example, Patent Document 2). This proposed high-efficiency image coding scheme performs pattern matching against reference pictures from multiple viewpoints and selects the motion-compensated or disparity-compensated prediction image with the smallest error, thereby improving coding efficiency.

The JVT (Joint Video Team) is also standardizing Multiview Video Coding (MVC; hereinafter, the MVC scheme), which extends the AVC/H.264 coding scheme (see Non-Patent Document 1) to multi-view images (see Non-Patent Document 2). Like the MPEG-2 Video Multi-view Profile described above, the MVC scheme improves coding efficiency by incorporating inter-view prediction.

Patent Document 1: JP-A 61-144191
Patent Document 2: JP-A 6-98312
Non-Patent Document 1: ITU-T Recommendation H.264 (11/2007)
Non-Patent Document 2: Joint Draft 6.0 on Multiview Video Coding, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, JVT-Z209, January 2008

Multi-view images from a plurality of viewpoints can be encoded using the various schemes described above. However, these schemes encode the images of every required viewpoint, so given limited transmission rates and storage capacity, it is often difficult to transmit or store multi-view images efficiently. For example, when many viewpoints are required, transmitting or storing the images of all of them produces a very large amount of data, and the decoding side must receive or read all of that data. It is also often difficult for the decoding side to generate, with high accuracy, a free-viewpoint image in response to a user instruction.

The present invention has been made in view of these circumstances, and its object is to provide an image encoding device, an image encoding method, and a program thereof that can efficiently transmit or store multi-view images.

An image encoding device according to one aspect of the present invention includes: a first encoding unit that encodes a plurality of images, each from a different viewpoint, to generate image encoded data; a second encoding unit that encodes depth information indicating the depth of a specific space from at least one viewpoint to generate depth information encoded data; a third encoding unit that encodes parameter information, including viewpoint information for specifying the plurality of viewpoints on which the plurality of images and the depth information are based, to generate parameter information encoded data; and a stream generation unit that generates an encoded stream including the image encoded data, the depth information encoded data, and the parameter information encoded data generated by the first, second, and third encoding units, respectively.

Any combination of the above constituent elements, and any conversion of the expression of the present invention between a method, a device, a system, a recording medium, a computer program, and the like, is also valid as an aspect of the present invention.

According to the present invention, multi-view images can be transmitted or stored efficiently.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments describe an example in which multi-view images are encoded with a scheme that further extends the MVC scheme, which itself extends the AVC/H.264 coding scheme to multi-view images.

First, the AVC/H.264 coding scheme is described briefly. It achieves higher coding efficiency than earlier coding schemes such as MPEG-2 Video (ISO/IEC 13818-2) and MPEG-4 Visual (ISO/IEC 14496-2).

In P pictures (that is, forward predictive coded pictures) of coding schemes such as MPEG-2 Video and MPEG-4 Visual, motion-compensated prediction was performed only from the immediately preceding I picture or P picture in display order. In the AVC/H.264 coding scheme, by contrast, P pictures and B pictures can use multiple pictures as reference pictures, and motion compensation can be performed by selecting the most suitable one for each block. In addition to pictures that precede in display order, already-encoded pictures that follow in display order can also be referenced.

B pictures in coding schemes such as MPEG-2 Video and MPEG-4 Visual referred to one preceding reference picture in display order, one following reference picture, or both at once, in which case the average of the two pictures served as the reference for prediction; the difference data between the target picture and the reference was then encoded. In the AVC/H.264 coding scheme, by contrast, a B picture is not bound by the constraint of one preceding and one following picture in display order and can refer to any reference picture for prediction, regardless of direction. A B picture can also refer to other B pictures as reference pictures.
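The averaged bidirectional prediction described above can be sketched as follows. This is a simplified illustration, not the normative MPEG-2 process: it omits motion compensation, the round-half-up integer average is an assumption, and the function names and sample values are hypothetical.

```python
def bipredict(forward_ref, backward_ref):
    """Average two reference blocks sample by sample (round half up, an assumption)."""
    return [(f + b + 1) // 2 for f, b in zip(forward_ref, backward_ref)]


def difference(target, prediction):
    """Difference data that would be encoded for the target block."""
    return [t - p for t, p in zip(target, prediction)]


# Hypothetical co-located samples from the preceding and following reference pictures.
forward_ref = [100, 102]
backward_ref = [104, 105]
target = [103, 104]

prediction = bipredict(forward_ref, backward_ref)
diff = difference(target, prediction)
```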

Furthermore, in MPEG-2 Video the coding mode was determined per picture, and in MPEG-4 per video object plane (VOP). The AVC/H.264 coding scheme instead uses the slice as the unit of coding, so different slice types such as I slices, P slices, and B slices can be mixed within a single picture.

The AVC/H.264 coding scheme also defines the VCL (Video Coding Layer), which encodes or decodes the video pixel signal (that is, coding modes, motion vectors, DCT coefficients, and so on), and the NAL (Network Abstraction Layer).

An encoded stream produced by the AVC/H.264 coding scheme is organized in units of NAL units, each of which is one segment of the NAL. NAL units are either VCL NAL units, which contain data encoded by the VCL (that is, coding modes, motion vectors, DCT coefficients, and so on), or non-VCL NAL units, which contain no VCL-generated data. Non-VCL NAL units include the SPS (Sequence Parameter Set), which holds parameter information for encoding the entire sequence; the PPS (Picture Parameter Set), which holds parameter information for encoding pictures; and SEI (Supplemental Enhancement Information), which is not required for decoding VCL-encoded data.
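As a minimal sketch of the VCL / non-VCL distinction, the one-byte NAL unit header of base AVC/H.264 can be split into its three fields and classified by `nal_unit_type`. The function names and the table of type names are assumptions for illustration; the sketch covers only the base AVC types (1 to 5 for coded slices, 6 for SEI, 7 for SPS, 8 for PPS) and ignores the additional unit types that MVC introduces.

```python
# Subset of base AVC/H.264 nal_unit_type values (illustrative, not exhaustive).
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
}


def parse_nal_header(first_byte):
    """Split the first NAL unit byte into its three header fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01  # 1 bit
    nal_ref_idc = (first_byte >> 5) & 0x03         # 2 bits
    nal_unit_type = first_byte & 0x1F              # 5 bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


def is_vcl(nal_unit_type):
    """Types 1 to 5 carry coded slice data in base AVC/H.264."""
    return 1 <= nal_unit_type <= 5


# Example header bytes as they commonly appear at the start of a NAL unit.
sps = parse_nal_header(0x67)
pps = parse_nal_header(0x68)
idr = parse_nal_header(0x65)
sei = parse_nal_header(0x06)
```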

The basic unit of coding in the AVC/H.264 coding scheme is the slice, into which a picture is divided, and VCL NAL units are slice-based. A unit called the access unit, which groups several NAL units, is therefore defined, and each access unit contains one coded picture.

Next, the MVC scheme is described briefly. Using a five-viewpoint example, this section describes the relationships between viewpoints when the images of a multi-view image are encoded and the resulting encoded stream is decoded, and the reference dependencies among the pictures to be encoded that make up the multi-view image.

FIG. 2 shows an example of the reference dependencies between images when a multi-view image consisting of five viewpoints is encoded with the MVC scheme. The vertical axis indicates the spatial direction across the viewpoints (referred to in this specification as the view direction), and the horizontal axis indicates the time direction in capture or display order. P(v, t) (viewpoint v = 0, 1, 2, ...; time t = 0, 1, 2, ...) is the image of viewpoint v at time t.

The image at the head of an arrow is the target picture to be encoded or decoded. The image at the tail of an arrow is a reference picture that is referred to when that target picture is encoded or decoded, that is, a reference picture used in temporal inter prediction (for example, motion-compensated prediction) or inter-view prediction (for example, disparity-compensated prediction). More specifically, the image at the tail of a horizontal arrow is a reference picture used for temporal inter prediction when the target picture is encoded or decoded, and the image at the tail of a vertical arrow is a reference picture used for inter-view prediction.

Here, temporal inter prediction is a prediction method that refers to images at other times, and inter-view prediction is a prediction method that refers to images of other viewpoints. Only images that precede in coding or decoding order in the time direction are used as reference pictures for temporal inter prediction, and only images that precede in coding or decoding order in the view direction are used as reference pictures for inter-view prediction. For example, with the reference dependencies shown in FIG. 2, the coding or decoding order in the view direction may be viewpoint 0, viewpoint 2, viewpoint 1, viewpoint 4, viewpoint 3, and the coding or decoding order in the time direction may be t = 0, 4, 2, 1, 3, 8, 6, 5, 7, .... First, the images of all viewpoints at the same time t = 0 are encoded or decoded in the view-direction order above: P(0, 0), P(2, 0), P(1, 0), P(4, 0), P(3, 0). Next, the images of all viewpoints at t = 4 are encoded or decoded in the same view-direction order: P(0, 4), P(2, 4), P(1, 4), P(4, 4), P(3, 4). The images at t = 2 and later are processed in the same way.
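The ordering rule above can be sketched in a few lines: pictures are listed time-first, and within each time instant the viewpoints follow the view-direction order. The function names are assumptions, and the dependency table is hypothetical because the arrows of FIG. 2 are not reproduced here.

```python
def coding_order(view_order, time_order):
    """List (v, t) pairs so that every viewpoint at one time precedes the next time."""
    return [(v, t) for t in time_order for v in view_order]


def references_precede(order, deps):
    """Check that every reference picture appears before the picture that uses it."""
    position = {pic: i for i, pic in enumerate(order)}
    return all(
        position[ref] < position[pic]
        for pic, refs in deps.items()
        for ref in refs
    )


VIEW_ORDER = [0, 2, 1, 4, 3]   # view-direction order from the text
TIME_ORDER = [0, 4, 2, 1, 3]   # first five entries of the time-direction order

order = coding_order(VIEW_ORDER, TIME_ORDER)

# Hypothetical dependencies, keyed by target picture (v, t).
deps = {
    (2, 0): [(0, 0)],           # inter-view prediction from viewpoint 0
    (1, 0): [(0, 0), (2, 0)],   # inter-view prediction from viewpoints 0 and 2
    (0, 2): [(0, 0), (0, 4)],   # temporal prediction from t = 0 and t = 4
}
valid = references_precede(order, deps)
```

A dependency that points at a picture later in the order, such as P(0, 0) referring to P(0, 4), would make `references_precede` return `False`.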

Viewpoint 0 is the base viewpoint. In the MVC scheme, the base viewpoint is a viewpoint that can be encoded or decoded without depending on any other viewpoint. Only one viewpoint is the base viewpoint for the entire multi-view image sequence. That is, the base viewpoint can be encoded or decoded on its own, without using images of other viewpoints as reference images for inter-view prediction. Non-base viewpoints (that is, viewpoints other than the base viewpoint) can use images of other viewpoints as reference images for inter-view prediction.

Furthermore, the MVC scheme has a mechanism for encoding, for the sequence as a whole, the number of viewpoints of the multi-view image to be encoded, the coding or decoding order in the view direction, and the reference dependencies between viewpoints under inter-view prediction. These are encoded by extending the SPS, the parameter set for sequence information.

By encoding these parameters, namely the number of viewpoints and the view dependency information of each viewpoint, for the sequence as a whole on the encoding side, the decoding side can determine the reference dependencies of each viewpoint for the entire sequence. The reference dependency information of each viewpoint is used in decoding processes such as initializing the reference picture list for inter-view predicted pictures.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of an image encoding device 100 according to Embodiment 1. The image encoding device 100 according to Embodiment 1 includes an encoding management unit 101, a parameter information encoding unit 110, an image signal encoding unit 107, and a depth information encoding unit (more specifically, a depth signal encoding unit 108). The parameter information encoding unit 110 includes an image signal sequence information encoding unit 102, a depth signal sequence information encoding unit 103, an image signal picture information encoding unit 104, a depth signal picture information encoding unit 105, and a camera parameter information encoding unit 106.

In hardware terms, these components can be realized by the CPU, memory, and other LSIs of any computer; in software terms, they are realized by programs loaded into memory and the like. The figure depicts functional blocks realized by the cooperation of the two. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware alone, software alone, or a combination of both.

The image signal encoding unit 107 encodes a plurality of images, each from a different viewpoint, to generate image encoded data. The images may be images actually captured by cameras or images generated by computer graphics. When one of the viewpoints is set as the reference viewpoint, the image signal encoding unit 107 can encode the image from the reference viewpoint to generate first image encoded data, and encode the other images to generate second image encoded data.

In doing so, the image signal encoding unit 107 may apply intra-frame predictive coding to the image from the reference viewpoint and inter-frame predictive coding between the images to the other images. In the MVC scheme, the reference viewpoint is the base viewpoint described above, and this inter-frame predictive coding is the inter-view predictive coding described above.

When the images are moving images, the image signal encoding unit 107 can also apply inter-frame predictive coding in the time direction to the moving image from each viewpoint. Inter-frame predictive coding in the view direction and in the time direction can of course be used together.

The depth information encoding unit encodes depth information indicating the depth of a specific space from at least one viewpoint to generate depth information encoded data. The depth information may be represented as a monochrome-format image from a given viewpoint (hereinafter, a monochrome image where appropriate). In that case, the depth information encoding unit encodes the monochrome image to generate the depth information encoded data.

Among a plurality of monochrome images, each from a different viewpoint, the depth information encoding unit may apply intra-frame predictive coding to the monochrome image from the reference viewpoint and inter-frame predictive coding between the monochrome images to the others. The reference viewpoint for the monochrome images may be the same as the reference viewpoint for the images to be encoded by the image signal encoding unit 107, or it may be different.

When the monochrome images are moving images, the depth information encoding unit can also apply inter-frame predictive coding in the time direction to the monochrome-format moving image from each viewpoint. Inter-frame predictive coding in the view direction and in the time direction can of course be used together.

The number of viewpoints on which the depth information is based may be set smaller than the number of viewpoints on which the images to be encoded by the first encoding unit are based, or the two numbers may be set equal. The position of each viewpoint on which the depth information is based may be set to coincide with one of the viewpoint positions of the images to be encoded by the image signal encoding unit 107, or may be set to coincide with none of them.

The parameter information encoding unit 110 encodes parameter information, including viewpoint information for specifying the plurality of viewpoints on which the images and the depth information are based, to generate parameter information encoded data. When one reference viewpoint is set as described above, the parameter information encoding unit 110 separately encodes first parameter information for the image from the reference viewpoint, second parameter information for the other images, and third parameter information for the depth information, generating first, second, and third parameter information encoded data.

The third parameter information is written in a syntax structure corresponding to that of the second parameter information. For example, the second and third parameter information can be written in conformance with the Multiview High Profile of the AVC/H.264 coding scheme. Viewpoint identification information is written in the second and third parameter information. When the position of a viewpoint on which an image to be encoded by the image signal encoding unit 107 is based coincides with the position of a viewpoint on which the depth information is based, the two are given the same identification information. That is, viewpoint identification information is managed uniformly across the images and the depth information.
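The unified management of viewpoint identifiers can be sketched as follows. This is an illustration under assumptions, not the patent's procedure: the function name is hypothetical, viewpoint positions are modeled as plain numbers, and identifiers are simply handed out in order of first appearance.

```python
def assign_view_ids(image_positions, depth_positions):
    """Give each distinct viewpoint position one ID shared by images and depth."""
    ids = {}
    for pos in list(image_positions) + list(depth_positions):
        if pos not in ids:
            ids[pos] = len(ids)  # next unused identifier (an assumption)
    image_ids = [ids[p] for p in image_positions]
    depth_ids = [ids[p] for p in depth_positions]
    return image_ids, depth_ids


# Hypothetical camera positions along a line; depth is available for fewer
# image viewpoints plus one position that has no image.
image_ids, depth_ids = assign_view_ids([0.0, 1.0, 2.0], [0.0, 2.0, 3.5])
```

Here the depth signals at positions 0.0 and 2.0 reuse the identifiers of the images at those positions, while the depth-only position 3.5 receives a new one.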

The unitization unit 109 generates an encoded stream containing the image encoded data and the depth information encoded data generated by the image signal encoding unit 107 and the depth information encoding unit, respectively. The unitization unit 109 can also generate an encoded stream that additionally contains the parameter information encoded data generated by the parameter information encoding unit 110.

When one reference viewpoint is set among the viewpoints on which the images to be encoded by the image signal encoding unit 107 are based, the unitization unit 109 generates an encoded stream containing the first image encoded data, the second image encoded data, the depth information encoded data, the first parameter information encoded data, the second parameter information encoded data, and the third parameter information encoded data generated by the image signal encoding unit 107, the depth information encoding unit, and the parameter information encoding unit 110.
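The six kinds of encoded data gathered into one stream can be pictured with a minimal sketch. It is not the NAL unit syntax of the actual device: the function name, the tag strings, and the ordering of units within the stream are all assumptions made for illustration.

```python
from typing import List, Tuple


def unitize(first_params: bytes, second_params: bytes, third_params: bytes,
            first_image: bytes, second_image: bytes,
            depth: bytes) -> List[Tuple[str, bytes]]:
    """Collect the six kinds of encoded data into one tagged stream.

    Parameter units are placed before the data they describe (an assumption).
    """
    return [
        ("param_1", first_params),    # reference-viewpoint image parameters
        ("param_2", second_params),   # parameters for the other images
        ("param_3", third_params),    # depth information parameters
        ("image_1", first_image),     # first image encoded data
        ("image_2", second_image),    # second image encoded data
        ("depth", depth),             # depth information encoded data
    ]


# Placeholder payloads standing in for real encoder output.
stream = unitize(b"p1", b"p2", b"p3", b"i1", b"i2", b"d")
tags = [tag for tag, _ in stream]
```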

FIG. 3 is a block diagram showing the configuration of an image encoding device 100a according to a modification of Embodiment 1. The image encoding device 100a adds a depth information generation unit (more specifically, a depth signal generation unit 111) to the image encoding device 100 shown in FIG. 1.

In this modification, the depth information generation unit generates depth information indicating the depth of a specific space from at least one viewpoint, using the plurality of images to be encoded by the image signal encoding unit 107. The depth information generation unit can generate this depth information with an existing algorithm. The depth information encoding unit encodes the depth information generated by the depth information generation unit to generate depth information encoded data. The remaining processing is the same as described for the image encoding device 100 of the basic example of Embodiment 1 shown in FIG. 1, so its description is omitted.

ここで、実施の形態１に係る画像符号化装置１００で符号化されるべき画像、および奥行き情報について説明する。当該画像は被写体が各視点に対応する２次元平面にカメラ等の撮像装置により投影されることによってできる絵である。また、画像信号は２次元情報である画像を1次元の信号の流れに変換したものである。なお、デジタルで表現される画像、及び画像信号の最小単位は画素である。当該画像符号化装置１００に入力される多視点の画像信号は、設定された２以上の複数の視点でそれぞれ得られる画像信号を含む多視点画像信号である。ある視点の画像信号は、その視点から実際に撮影されて得られた画像信号であってもよいし、その視点から仮想的に撮影されたものとして、コンピュータグラフィックス等により生成された画像信号であってもよい。また、実際の撮影により得られた画像信号に対して、その撮影に用いられる各カメラのばらつきを補正するために、位置補正、輝度・色レベル補正を施す場合もある。 Here, the images to be encoded by the image encoding device 100 according to Embodiment 1, and the depth information, will be described. An image is a picture formed when a subject is projected by an imaging device such as a camera onto a two-dimensional plane corresponding to each viewpoint. An image signal is an image, which is two-dimensional information, converted into a one-dimensional signal stream. Note that the minimum unit of a digitally represented image and image signal is the pixel. The multi-view image signal input to the image encoding device 100 includes image signals respectively obtained at two or more set viewpoints. The image signal of a given viewpoint may be an image signal obtained by actually capturing from that viewpoint, or an image signal generated by computer graphics or the like as if virtually captured from that viewpoint. In addition, position correction and luminance/color level correction may be applied to an image signal obtained by actual capture in order to correct variations among the cameras used for the capture.

当該奥行き情報は特定空間の奥行きを示す情報である。例えば、当該奥行き情報は画像内の被写体（すなわち、オブジェクト）の画像平面に対する、奥行き情報として表される。より具体的には、当該奥行き情報は２次元平面に投影された画像の奥行きを示す情報である。２次元平面に投影された画像の各画素に対応する奥行き情報が画素単位あるいは複数画素単位でマッピングされた画像をデプスマップと呼ぶ。デプス信号は２次元情報であるデプスマップが1次元の信号の流れに変換されたものである。なお、画像や画像信号と同様に、デジタルで表現されるデプスマップ、及びデプス信号の最小単位も画素である。上記デプス信号は、設定された２以上の複数の視点でそれぞれ得られるデプス信号を含む多視点のデプス信号であってもよい。ある視点のデプス信号は、赤外線カメラ等によりその視点から実際に撮影されて得られたデプス信号であってもよいし、その視点から仮想的に撮影されたものとして、上記多視点の画像信号をもとに演算により生成されたデプス信号であってもよい。 The depth information is information indicating the depth of a specific space. For example, the depth information is expressed as the depth of a subject (that is, an object) in an image relative to the image plane. More specifically, the depth information indicates the depth of an image projected onto a two-dimensional plane. An image in which the depth information corresponding to each pixel of the image projected onto the two-dimensional plane is mapped in units of pixels or in units of multiple pixels is called a depth map. A depth signal is a depth map, which is two-dimensional information, converted into a one-dimensional signal stream. As with images and image signals, the minimum unit of a digitally represented depth map and depth signal is also the pixel. The depth signal may be a multi-view depth signal including depth signals respectively obtained at two or more set viewpoints. The depth signal of a given viewpoint may be a depth signal obtained by actually capturing from that viewpoint with an infrared camera or the like, or a depth signal generated by computation from the multi-view image signal as if virtually captured from that viewpoint.

デプス信号の画素値も画像信号と同様に８ビットで表現されることが多いが、奥行き方向の再現性を高めるために９〜１４ビット程度で表現されてもよい。デプスマップはモノクローム・フォーマットの画像として表される。なお、上記画像の各画素との対応が取れる限りにおいては当該デプスマップの解像度は上記画像の解像度より低く設定されてもよい。 The pixel value of the depth signal is often expressed by 8 bits like the image signal, but may be expressed by about 9 to 14 bits in order to improve the reproducibility in the depth direction. The depth map is represented as an image in monochrome format. Note that the resolution of the depth map may be set lower than the resolution of the image as long as the correspondence with each pixel of the image can be obtained.

当該デプス信号は、主に、実在する視点の画像信号から、存在しない所望の仮想視点の画像信号を生成するために用いる。ユーザの指示に応じて表示されるべき画像の視点があらかじめ特定できない自由視点画像を復号側で表示する場合や、視点の数が多く、それらの視点のすべての画像をすべて撮影、伝送または蓄積することが困難な場合には仮想視点の画像信号を生成するのが有効である。 The depth signal is mainly used to generate, from image signals of existing viewpoints, an image signal of a desired virtual viewpoint that does not exist. Generating virtual viewpoint image signals is effective when the decoding side displays a free-viewpoint image, for which the viewpoint to be displayed in response to a user instruction cannot be specified in advance, or when the number of viewpoints is so large that capturing, transmitting, or storing the images of all those viewpoints is difficult.

既存の視点の画像信号から、存在しない仮想視点の画像を生成するための手法の１つに特開平９−８１７４６号公報に開示されたものがある。この手法では、存在しない仮想視点の画像を生成する際、既存の視点の画像信号から奥行き情報を計算し、その奥行き情報に従って、所望の仮想視点の画像を生成する。 One technique for generating a virtual viewpoint image that does not exist from an existing viewpoint image signal is disclosed in Japanese Patent Laid-Open No. 9-81746. In this method, when generating an image of a virtual viewpoint that does not exist, depth information is calculated from an image signal of an existing viewpoint, and a desired virtual viewpoint image is generated according to the depth information.

多視点の画像信号が符号化されて得られた符号化ストリームを伝送または蓄積し、その符号化ストリームを復号して得られる画像信号からデプス信号を求めて、所望の仮想視点の画像信号を生成する手法は、復号側でデプス信号を算出する処理の負担が大きい。また一般的に、復号側で生成されるデプス信号の品質は、符号側で生成されるデプス信号の品質より低くなる。一般的な符号化方式では、符号化する際に原画像信号の高周波成分が省略されるためである。 In a technique that transmits or stores an encoded stream obtained by encoding a multi-view image signal, derives a depth signal from the image signal obtained by decoding that encoded stream, and then generates an image signal of a desired virtual viewpoint, the processing load of calculating the depth signal on the decoding side is heavy. In addition, the quality of a depth signal generated on the decoding side is generally lower than that of a depth signal generated on the encoding side. This is because typical encoding schemes discard high-frequency components of the original image signal during encoding.

そこで、本実施の形態では符号化側で多視点の画像信号からデプス信号を生成し、複数の視点の画像信号と、複数の視点のデプス信号を符号化の対象とする。復号側は符号化ストリームを復号することにより、画像信号に加えてデプス信号も得ることができる。これにより、復号後にデプス信号を生成する必要がなく、復号側は符号化ストリームを復号して得られる、画像信号とデプス信号から所望の仮想視点の画像信号を生成することができる。 Therefore, in the present embodiment, a depth signal is generated from a multi-viewpoint image signal on the encoding side, and a plurality of viewpoint image signals and a plurality of viewpoint depth signals are to be encoded. The decoding side can obtain a depth signal in addition to the image signal by decoding the encoded stream. Thereby, there is no need to generate a depth signal after decoding, and the decoding side can generate an image signal of a desired virtual viewpoint from the image signal and the depth signal obtained by decoding the encoded stream.

なお、仮想視点の画像信号を生成する場合、１つの視点の、画像信号およびデプス信号から画像を生成するよりも、複数の視点の、画像信号およびデプス信号から画像を生成する方がより良好な仮想視点の画像を得ることができる。以下、この知見について図４、図５を参照しながら、より詳細に説明する。 When generating an image signal of a virtual viewpoint, generating the image from the image signals and depth signals of a plurality of viewpoints yields a better virtual viewpoint image than generating it from the image signal and depth signal of a single viewpoint. This finding is described in more detail below with reference to FIGS. 4 and 5.

図４は、第２視点ＶＢおよび第３視点ＶＣから第１対象物ＯＡおよび第２対象物ＯＢが存在するシーンを撮影し、仮想視点である第１視点ＶＡ（以下、第１仮想視点ＶＡと表記する）の画像を生成する例を示す図である。 FIG. 4 is a diagram showing an example in which a scene containing a first object OA and a second object OB is captured from a second viewpoint VB and a third viewpoint VC, and an image of a first viewpoint VA, which is a virtual viewpoint (hereinafter referred to as the first virtual viewpoint VA), is generated.
図５は、図４の例において、撮影された画像、それに対応するデプスマップ、および生成される画像を示す図である。図５において、第２画像ＩＢは図４の第２視点ＶＢから撮影された画像を示し、第３画像ＩＣは図４の第３視点ＶＣから撮影された画像を示す。第２デプスマップＤＢは第２画像ＩＢに対応するデプスマップを示し、第３デプスマップＤＣは第３画像ＩＣに対応するデプスマップを示す。 FIG. 5 is a diagram showing the captured images, the corresponding depth maps, and the generated images in the example of FIG. 4. In FIG. 5, the second image IB is the image captured from the second viewpoint VB in FIG. 4, and the third image IC is the image captured from the third viewpoint VC in FIG. 4. The second depth map DB is the depth map corresponding to the second image IB, and the third depth map DC is the depth map corresponding to the third image IC.

以下の説明においては、最も後方の対象、すなわちカメラから最も遠い対象に対応するデプス信号の画素値を最小値である０とし、対象が前方にくればくるほど、すなわちカメラに近ければ近いほど、デプス信号の画素値を大きな値とする。また、第１画像ＩＡは第１仮想視点ＶＡから撮影したと仮定した場合に得られる画像（以下、予測画像という）を示し、実際に撮影されるものではなく、生成されるべき画像である。 In the following description, the pixel value of the depth signal corresponding to the rearmost object, that is, the object farthest from the camera, is set to the minimum value 0, and the pixel value of the depth signal is made larger the further forward the object is, that is, the closer it is to the camera. The first image IA is the image that would be obtained if the scene were captured from the first virtual viewpoint VA (hereinafter referred to as a predicted image); it is not actually captured but is the image to be generated.

また、第１−２画像ＩＡＢは、第２視点ＶＢから撮影された第２画像ＩＢとそれに対応する第２デプスマップＤＢから生成された第１仮想視点ＶＡの予測画像である。第２視点ＶＢから撮影された第２画像ＩＢとそれに対応する第２デプスマップＤＢから第１仮想視点ＶＡの予測画像を生成する場合、第２視点ＶＢから撮影した際に前方の第１対象物ＯＡにより隠蔽されていた部分が不明であり、欠落部分が発生する。第１−２画像ＩＡＢの黒塗りの部分が、当該第１仮想視点ＶＡの予測画像内で発生する第２欠落部分ＬＰＢである。 The 1-2 image IAB is a predicted image of the first virtual viewpoint VA generated from the second image IB captured from the second viewpoint VB and the corresponding second depth map DB. When the predicted image of the first virtual viewpoint VA is generated from the second image IB and the second depth map DB, the portion that was hidden by the first object OA in front when capturing from the second viewpoint VB is unknown, so a missing portion occurs. The blacked-out portion of the 1-2 image IAB is the second missing portion LPB that occurs in the predicted image of the first virtual viewpoint VA.

また、第１−３画像ＩＡＣは、第３視点ＶＣから撮影された第３画像ＩＣとそれに対応する第３デプスマップＤＣから生成された第１仮想視点ＶＡの予測画像である。第１−３画像ＩＡＣにも欠落部分が発生する。第１−３画像ＩＡＣの黒塗りの部分が、当該第１仮想視点ＶＡの予測画像内で発生する第３欠落部分ＬＰＣである。第１−３画像ＩＡＣの第３欠落部分ＬＰＣは、第１−２画像ＩＡＢの第２欠落部分ＬＰＢとは異なる位置に発生する。 The 1-3 image IAC is a predicted image of the first virtual viewpoint VA generated from the third image IC captured from the third viewpoint VC and the corresponding third depth map DC. A missing portion also occurs in the 1-3 image IAC. The blacked-out portion of the 1-3 image IAC is the third missing portion LPC that occurs in the predicted image of the first virtual viewpoint VA. The third missing portion LPC of the 1-3 image IAC occurs at a position different from the second missing portion LPB of the 1-2 image IAB.
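The way a missing portion arises can be illustrated on a single scanline. The following is only a minimal sketch, not the method of this publication: it assumes, purely for illustration, that each pixel shifts horizontally in proportion to its depth pixel value (larger value = nearer, as in the convention above), and the function name and the `shift_per_depth` scale factor are hypothetical.

```python
def forward_warp_scanline(pixels, depth, shift_per_depth):
    """Warp one scanline toward a virtual viewpoint.

    pixels: list of pixel values of the captured scanline.
    depth:  list of depth pixel values (larger = nearer to the camera).
    shift_per_depth: hypothetical scale from depth value to horizontal shift.
    Returns a list in which None marks a missing (hole) pixel.
    """
    width = len(pixels)
    out = [None] * width          # None = nothing projected here yet
    out_depth = [-1] * width      # depth of the pixel currently stored in out
    for x in range(width):
        new_x = x + round(depth[x] * shift_per_depth)
        # Keep the nearer pixel when two source pixels land on the same target.
        if 0 <= new_x < width and depth[x] > out_depth[new_x]:
            out[new_x] = pixels[x]
            out_depth[new_x] = depth[x]
    return out


# A foreground object "F" (depth 4) in front of a background "b" (depth 0).
captured = ["b", "b", "F", "F", "b", "b", "b", "b"]
depth_map = [0, 0, 4, 4, 0, 0, 0, 0]
predicted = forward_warp_scanline(captured, depth_map, 0.5)
```

In this toy case the foreground moves two pixels to the right, and the two positions it used to cover are left as `None`, because the background behind the object was never captured from this viewpoint.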

そこで、第１−２画像ＩＡＢの第２欠落部分ＬＰＢを第１−３画像ＩＡＣの画像信号から補うことにより、欠落部分の少ない第１仮想視点ＶＡの第１画像ＩＡを生成することができる。なお、実際には対象物に立体感や影があり、撮影する視点の位置および方向と、光源との相対関係により、撮影して得られる画像に明るさや色の差が生じるが、図４、図５においてはその点を考慮せずに描いている。 Therefore, by supplementing the second missing portion LPB of the 1-2 image IAB with the image signal of the 1-3 image IAC, the first image IA of the first virtual viewpoint VA with few missing portions can be generated. Actually, the object has a three-dimensional effect or shadow, and brightness or color difference occurs in the image obtained by photographing depending on the relative relationship between the position and direction of the viewpoint to be photographed and the light source. FIG. 5 is drawn without considering that point.

それらの視点毎に生じる輝度差等を考慮したり、ノイズを低減するために、第１−２画像ＩＡＢと第１−３画像ＩＡＣの両方に存在する画素は平均値を用い、片方の画像に欠落部分が生じる画素についてのみもう一方の画像の画素だけを用いる方法もある。このように１つの視点の、画像信号およびデプス信号から生成された仮想視点の画像（図５では、第１−２画像ＩＡＢまたは第１−３画像ＩＡＣ）よりも、２つの視点の、画像信号およびデプス信号から生成された画像のほうが、欠落部分の少ない良好な画像を得ることができる。 To account for the luminance differences and the like that arise between viewpoints, and to reduce noise, there is also a method that uses the average value for pixels present in both the 1-2 image IAB and the 1-3 image IAC, and uses the pixel of the other image only for pixels that are missing in one of the images. In this way, an image generated from the image signals and depth signals of two viewpoints has fewer missing portions and is better than a virtual viewpoint image generated from the image signal and depth signal of a single viewpoint (the 1-2 image IAB or the 1-3 image IAC in FIG. 5).
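The averaging-and-filling rule described above can be sketched as follows. This is an illustrative sketch only; the function name is hypothetical, the predicted images are represented as flat lists of luminance values, and `None` is an assumed marker for a missing pixel.

```python
def merge_predictions(pred_b, pred_c):
    """Merge two predicted images of the same virtual viewpoint.

    pred_b, pred_c: equal-length lists of pixel values, None where missing.
    Pixels present in both are averaged; a pixel missing in one image is
    taken from the other; a pixel missing in both stays None.
    """
    merged = []
    for b, c in zip(pred_b, pred_c):
        if b is not None and c is not None:
            merged.append((b + c) / 2)   # present in both: use the average
        elif b is not None:
            merged.append(b)             # missing only in pred_c
        else:
            merged.append(c)             # missing in pred_b (c may be None)
    return merged


merged = merge_predictions([10, None, 30, None], [20, 40, None, None])
```

Only the last pixel, which is missing in both inputs, remains missing after the merge, which is why missing portions at different positions in the two predicted images largely cancel out.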

また、２つの視点の、画像信号とデプス信号から仮想視点の画像信号を生成するよりも、それ以上の視点の、画像信号とデプス信号を用いた方が、より欠落部分の少ない良好な画像を得ることができる。このように、仮想視点の画像を生成する場合、１つの視点の、画像信号およびデプス信号から画像を生成するよりも、複数の視点の、画像信号およびデプス信号から画像を生成する方がより良好な仮想視点の画像を得ることができる。 Furthermore, using the image signals and depth signals of more than two viewpoints yields a better image with even fewer missing portions than generating the virtual viewpoint image signal from those of two viewpoints. Thus, when generating a virtual viewpoint image, generating it from the image signals and depth signals of a plurality of viewpoints yields a better virtual viewpoint image than generating it from those of a single viewpoint.

また、２つの視点の、画像信号およびデプス信号から仮想視点の画像信号を生成する場合、視点間の距離が短い２つの視点の、画像信号およびデプス信号から生成する方が、視点間の距離が長い２つの視点の、画像信号およびデプス信号から生成するより良好な仮想視点の画像信号を得ることができる。以下、この知見について図６、図７を参照しながら、より詳細に説明する。 When generating a virtual viewpoint image signal from the image signals and depth signals of two viewpoints, a better virtual viewpoint image signal is obtained from two viewpoints separated by a short distance than from two viewpoints separated by a long distance. This finding is described in more detail below with reference to FIGS. 6 and 7.

図６は、第５視点ＶＥおよび第６視点ＶＦから第３対象物ＯＣおよび第４対象物ＯＤが存在するシーンを撮影し、仮想視点である第４視点ＶＤ（以下、第４仮想視点ＶＤと表記する）の画像を生成する例を示す図である。 FIG. 6 is a diagram showing an example in which a scene containing a third object OC and a fourth object OD is captured from a fifth viewpoint VE and a sixth viewpoint VF, and an image of a fourth viewpoint VD, which is a virtual viewpoint (hereinafter referred to as the fourth virtual viewpoint VD), is generated.
図７は、図６の例において、撮影された画像、それに対応するデプスマップ、および生成される画像を示す図である。図７において、第５画像ＩＥは図６の第５視点ＶＥから撮影された画像を示し、第６画像ＩＦは図６の第６視点ＶＦから撮影された画像を示す。第５デプスマップＤＥは第５画像ＩＥに対応するデプスマップを示し、第６デプスマップＤＦは第６画像ＩＦに対応するデプスマップを示す。また、第４画像ＩＤは第４仮想視点ＶＤから撮影したと仮定した場合に得られる予測画像を示し、実際に撮影されるものではなく、生成されるべき画像である。 FIG. 7 is a diagram showing the captured images, the corresponding depth maps, and the generated images in the example of FIG. 6. In FIG. 7, the fifth image IE is the image captured from the fifth viewpoint VE in FIG. 6, and the sixth image IF is the image captured from the sixth viewpoint VF in FIG. 6. The fifth depth map DE is the depth map corresponding to the fifth image IE, and the sixth depth map DF is the depth map corresponding to the sixth image IF. The fourth image ID is the predicted image that would be obtained if the scene were captured from the fourth virtual viewpoint VD; it is not actually captured but is the image to be generated.

また、第４−５画像ＩＤＥは、第５視点ＶＥから撮影された第５画像ＩＥとそれに対応する第５デプスマップＤＥから生成された第４仮想視点ＶＤの予測画像である。第５視点ＶＥから撮影された第５画像ＩＥとそれに対応する第５デプスマップＤＥから第４仮想視点ＶＤの予測画像を生成する場合、第５視点ＶＥから撮影した際に前方の第３対象物ＯＣにより隠蔽されていた部分が不明であり、欠落部分が発生する。第４−５画像ＩＤＥの黒塗りの部分が、当該第４仮想視点ＶＤの予測画像内で発生する第５欠落部分ＬＰＥである。 The 4-5 image IDE is a predicted image of the fourth virtual viewpoint VD generated from the fifth image IE captured from the fifth viewpoint VE and the corresponding fifth depth map DE. When the predicted image of the fourth virtual viewpoint VD is generated from the fifth image IE and the fifth depth map DE, the portion that was hidden by the third object OC in front when capturing from the fifth viewpoint VE is unknown, so a missing portion occurs. The blacked-out portion of the 4-5 image IDE is the fifth missing portion LPE that occurs in the predicted image of the fourth virtual viewpoint VD.

また、第４−６画像ＩＤＦは、第６視点ＶＦから撮影された第６画像ＩＦとそれに対応する第６デプスマップＤＦから生成された第４仮想視点ＶＤの予測画像である。第４−６画像ＩＤＦにも欠落部分が発生する。第４−６画像ＩＤＦの黒塗りの部分が、当該第４仮想視点ＶＤの予測画像内で発生する第６欠落部分ＬＰＦである。 The 4-6 image IDF is a predicted image of the fourth virtual viewpoint VD generated from the sixth image IF captured from the sixth viewpoint VF and the corresponding sixth depth map DF. A missing portion also occurs in the 4-6 image IDF. The blacked-out portion of the 4-6 image IDF is the sixth missing portion LPF that occurs in the predicted image of the fourth virtual viewpoint VD.

第５視点ＶＥと第６視点ＶＦとを比較すると、第６視点ＶＦの方が第４仮想視点から離れているため、第６画像ＩＦの方が第４画像ＩＤからのずれ量が大きくなり、第４−６画像ＩＤＦの第６欠落部分ＬＰＦの面積の方が第４−５画像ＩＤＥの第５欠落部分ＬＰＥの面積より大きくなる。このように、視点間の距離が小さければ小さいほど、画像に写る被写体の視点間のずれ量、変形、明るさや色の差は小さくなり、良好な画像を得ることができる。したがって、仮想視点の画像信号を生成する場合、視点間の距離が短い複数の視点の、画像信号およびデプス信号から生成する方が、視点間の距離が長い複数の視点の、画像信号およびデプス信号から生成するより良好な仮想視点の画像を得ることができる。 Comparing the fifth viewpoint VE and the sixth viewpoint VF, the sixth viewpoint VF is farther from the fourth virtual viewpoint, so the sixth image IF deviates more from the fourth image ID, and the area of the sixth missing portion LPF of the 4-6 image IDF is larger than the area of the fifth missing portion LPE of the 4-5 image IDE. Thus, the shorter the distance between viewpoints, the smaller the displacement, deformation, and brightness and color differences of the subject between the viewpoints, and the better the resulting image. Therefore, when generating a virtual viewpoint image signal, generating it from the image signals and depth signals of a plurality of viewpoints separated by short distances yields a better virtual viewpoint image than generating it from those of viewpoints separated by long distances.

また、コンテンツの奥行きの状態によっても仮想視点の画像信号の生成しやすさは異なる。重なり合う被写体同士の奥行きの差が小さければ小さいほど、より良好な仮想視点の画像信号を得ることができる。以下、この知見について図８、図９を参照しながら、より詳細に説明する。 How easily a virtual viewpoint image signal can be generated also depends on the depth structure of the content. The smaller the depth difference between overlapping subjects, the better the virtual viewpoint image signal that can be obtained. This finding is described in more detail below with reference to FIGS. 8 and 9.

図８は、第８視点ＶＨから第５対象物ＯＥまたは第６対象物ＯＦのいずれか一方と、第７対象物ＯＧが存在する２つのシーンを撮影し、仮想視点である第７視点ＶＧ（以下、第７仮想視点ＶＧと表記する）の画像を生成する例を示す図である。それぞれのシーンの撮影時には第５対象物ＯＥと第６対象物ＯＦは同時に存在しない。ここで、第５対象物ＯＥおよび第７対象物ＯＧが存在するシーンを第１シーンＨ１、第６対象物ＯＦおよび第７対象物ＯＧが存在するシーンを第２シーンＨ２とする。 FIG. 8 is a diagram showing an example in which two scenes, each containing either the fifth object OE or the sixth object OF together with the seventh object OG, are captured from an eighth viewpoint VH, and an image of a seventh viewpoint VG, which is a virtual viewpoint (hereinafter referred to as the seventh virtual viewpoint VG), is generated. The fifth object OE and the sixth object OF are never present at the same time when each scene is captured. Here, the scene containing the fifth object OE and the seventh object OG is called the first scene H1, and the scene containing the sixth object OF and the seventh object OG is called the second scene H2.

図９は、図８の例において、撮影された画像、それに対応するデプスマップ、および生成される画像を示す図である。図９において、第８−１画像ＩＨ１は図８の第８視点ＶＨから撮影された第１シーンＨ１の画像を示し、第８−２画像ＩＨ２は同様に図８の第８視点ＶＨから撮影された第２シーンＨ２の画像を示す。第８−１デプスマップＤＨ１は第８−１画像ＩＨ１に対応するデプスマップを示し、第８−２デプスマップＤＨ２は第８−２画像ＩＨ２に対応するデプスマップを示す。 FIG. 9 is a diagram showing the captured images, the corresponding depth maps, and the generated images in the example of FIG. 8. In FIG. 9, the 8-1 image IH1 is the image of the first scene H1 captured from the eighth viewpoint VH in FIG. 8, and the 8-2 image IH2 is likewise the image of the second scene H2 captured from the eighth viewpoint VH in FIG. 8. The 8-1 depth map DH1 is the depth map corresponding to the 8-1 image IH1, and the 8-2 depth map DH2 is the depth map corresponding to the 8-2 image IH2.

第７−１画像ＩＧ１は第７仮想視点ＶＧから第１シーンＨ１を撮影したと仮定した場合に得られる予測画像を示し、実際に撮影されるものではなく、生成されるべき画像である。また、第７−２画像ＩＧ２は第７仮想視点ＶＧから第２シーンＨ２を撮影したと仮定した場合に得られる予測画像を示し、実際に撮影されるものではなく、生成されるべき画像である。 The 7-1 image IG1 is the predicted image that would be obtained if the first scene H1 were captured from the seventh virtual viewpoint VG; it is not actually captured but is the image to be generated. Likewise, the 7-2 image IG2 is the predicted image that would be obtained if the second scene H2 were captured from the seventh virtual viewpoint VG; it is not actually captured but is the image to be generated.

また、第７−８−１画像ＩＧＨ１は、第８視点ＶＨから撮影された第１シーンＨ１の第８−１画像ＩＨ１とそれに対応する第８−１デプスマップＤＨ１から生成された第７仮想視点ＶＧの第１シーンＨ１の予測画像である。第８視点ＶＨから撮影された第１シーンＨ１の第８−１画像ＩＨ１とそれに対応する第８−１デプスマップＤＨ１から第７仮想視点ＶＧの第１シーンＨ１の予測画像を生成する場合、第８視点ＶＨから撮影した際に前方の第５対象物ＯＥにより隠蔽されていた部分が不明であり、欠落部分が発生する。第７−８−１画像ＩＧＨ１の黒塗りの部分が、当該第７仮想視点ＶＧの第１シーンＨ１の予測画像内で発生する第８−１欠落部分ＬＰＨ１である。 The 7-8-1 image IGH1 is a predicted image of the first scene H1 at the seventh virtual viewpoint VG, generated from the 8-1 image IH1 of the first scene H1 captured from the eighth viewpoint VH and the corresponding 8-1 depth map DH1. When the predicted image of the first scene H1 at the seventh virtual viewpoint VG is generated from the 8-1 image IH1 and the 8-1 depth map DH1, the portion that was hidden by the fifth object OE in front when capturing from the eighth viewpoint VH is unknown, so a missing portion occurs. The blacked-out portion of the 7-8-1 image IGH1 is the 8-1 missing portion LPH1 that occurs in the predicted image of the first scene H1 at the seventh virtual viewpoint VG.

また、第７−８−２画像ＩＧＨ２は、第８視点ＶＨから撮影された第２シーンＨ２の第８−２画像ＩＨ２とそれに対応する第８−２デプスマップＤＨ２から生成された第７仮想視点ＶＧの第２シーンＨ２の予測画像である。第７−８−２画像ＩＧＨ２にも欠落部分が発生する。第７−８−２画像ＩＧＨ２の黒塗りの部分が、当該第７仮想視点ＶＧの第２シーンＨ２の予測画像内で発生する第８−２欠落部分ＬＰＨ２である。 The 7-8-2 image IGH2 is a predicted image of the second scene H2 at the seventh virtual viewpoint VG, generated from the 8-2 image IH2 of the second scene H2 captured from the eighth viewpoint VH and the corresponding 8-2 depth map DH2. A missing portion also occurs in the 7-8-2 image IGH2. The blacked-out portion of the 7-8-2 image IGH2 is the 8-2 missing portion LPH2 that occurs in the predicted image of the second scene H2 at the seventh virtual viewpoint VG.

第８−１画像ＩＨ１と第７−１画像ＩＧ１とのずれ量と、第８−２画像ＩＨ２と第７−２画像ＩＧ２とのずれ量とを比較すると、後者の方が大きくなる。したがって、第７−８−２画像ＩＧＨ２の第８−２欠落部分ＬＰＨ２の面積の方が、第７−８−１画像ＩＧＨ１の第８−１欠落部分ＬＰＨ１の面積より大きくなる。このように、コンテンツの奥行きの状態によっても仮想視点の画像信号の生成しやすさが異なってくる。すなわち、互いに重なり合う被写体同士の奥行きの差が小さければ小さいほど、重なり合う被写体同士の、画像内における相対的なずれ量が小さくなり、生成される画像の欠落部分は小さくなり、良好な画像を得ることができる。 Comparing the displacement between the 8-1 image IH1 and the 7-1 image IG1 with the displacement between the 8-2 image IH2 and the 7-2 image IG2, the latter is larger. Therefore, the area of the 8-2 missing portion LPH2 of the 7-8-2 image IGH2 is larger than the area of the 8-1 missing portion LPH1 of the 7-8-1 image IGH1. Thus, how easily a virtual viewpoint image signal can be generated also depends on the depth structure of the content. That is, the smaller the depth difference between mutually overlapping subjects, the smaller their relative displacement within the image, the smaller the missing portion of the generated image, and the better the resulting image.
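Both findings (shorter viewpoint distance and smaller depth difference give smaller missing portions) can be checked numerically with the standard parallel-camera relation, in which the image shift of a point is focal length times viewpoint distance divided by the distance to the point. This model is an assumption introduced here for illustration and is not stated in the publication; note also that `z_near` and `z_far` are metric distances from the camera, not depth signal pixel values, and all names are hypothetical.

```python
def disparity(focal_px, baseline, distance):
    """Image shift (in pixels) of a point at the given distance, assuming
    parallel pinhole cameras separated by `baseline` (illustrative model)."""
    return focal_px * baseline / distance


def relative_shift(focal_px, baseline, z_near, z_far):
    """Relative shift between a near subject and the far subject it overlaps.
    Under the model above, this is roughly the width of the uncovered
    (missing) region next to the near subject."""
    return disparity(focal_px, baseline, z_near) - disparity(focal_px, baseline, z_far)


# Same scene, two viewpoint distances: the shorter one gives the smaller gap.
gap_short_baseline = relative_shift(100, 1, 4, 8)
gap_long_baseline = relative_shift(100, 2, 4, 8)

# Same viewpoint distance, two depth differences: the smaller one gives the smaller gap.
gap_small_depth_diff = relative_shift(100, 2, 4, 5)
gap_large_depth_diff = relative_shift(100, 2, 4, 8)
```

With these arbitrary example numbers, halving the viewpoint distance halves the gap, and bringing the far subject closer to the near one shrinks the gap as well.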

なお、重なり合う被写体同士の奥行きの差は、デプス信号から算出することができる。デプス信号（図９では、第８−１デプスマップＤＨ１および第８−２デプスマップＤＨ２）のエッジ（すなわち、濃度が急峻に変化する点）を抽出し、エッジ部分の境界を挟んだ画素値の差を算出し、その差が小さければ小さいほど、重なりあう被写体同士の奥行きの差を小さいと判定する。 Note that the difference in depth between overlapping subjects can be calculated from the depth signal. The edge of the depth signal (the 8-1 depth map DH1 and the 8-2 depth map DH2 in FIG. 9) (that is, the point where the density changes sharply) is extracted, and the pixel value across the boundary of the edge portion is extracted. The difference is calculated, and the smaller the difference is, the smaller the difference in depth between the overlapping subjects is determined.
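The edge-based check described above might look like the following sketch. It is not the algorithm of this publication: it treats any step between horizontally or vertically adjacent depth pixels that reaches a hypothetical `edge_threshold` as an edge and reports the largest such step; the function name and the toy depth maps are likewise illustrative.

```python
def max_edge_step(depth_map, edge_threshold):
    """Return the largest depth step across any detected edge (0 if none).

    depth_map: 2-D list of depth pixel values (rows of equal length).
    edge_threshold: hypothetical minimum step that counts as an edge.
    """
    rows, cols = len(depth_map), len(depth_map[0])
    largest = 0
    for y in range(rows):
        for x in range(cols):
            # Compare with the right-hand and lower neighbours only,
            # so each adjacent pair is examined exactly once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < rows and nx < cols:
                    step = abs(depth_map[y][x] - depth_map[ny][nx])
                    if step >= edge_threshold and step > largest:
                        largest = step
    return largest


# Toy depth maps: a subject slightly in front of, or far in front of, the background.
dh1 = [[0, 0, 0], [0, 50, 0], [0, 0, 0]]
dh2 = [[0, 0, 0], [0, 200, 0], [0, 0, 0]]
step1 = max_edge_step(dh1, 10)
step2 = max_edge_step(dh2, 10)
```

A smaller result indicates a smaller depth difference between overlapping subjects, which by the reasoning above suggests that virtual viewpoint images are easier to generate for that content.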

このように、複数の視点の画像信号を含む多視点画像信号に加えて、複数の視点のデプス信号を含む多視点デプス信号を用いると、復号側で高精度な仮想視点の画像信号を生成することができる。また、視点間の間隔が密な多視点画像信号と、その各視点画像信号のそれぞれに対応した多視点デプス信号を用いると、復号側でさらに高精度な仮想視点の画像信号を生成することができる。 Thus, when a multi-view depth signal including depth signals of a plurality of viewpoints is used in addition to a multi-view image signal including image signals of a plurality of viewpoints, a highly accurate virtual viewpoint image signal can be generated on the decoding side. Moreover, when a multi-view image signal with closely spaced viewpoints is used together with a multi-view depth signal corresponding to each of its viewpoint image signals, an even more accurate virtual viewpoint image signal can be generated on the decoding side.

ただし、視点の数を多く設定しすぎると、ビットレートが高くなり、伝送効率または蓄積効率が低下する。従って、対象となるアプリケーションの伝送レートまたは蓄積媒体の容量を考慮して、多視点画像信号および多視点デプス信号のそれぞれにおいて符号化すべき視点を適切に決定する必要がある。 However, if the number of viewpoints is set too large, the bit rate increases and transmission efficiency or storage efficiency decreases. Therefore, it is necessary to appropriately determine the viewpoint to be encoded in each of the multi-view image signal and the multi-view depth signal in consideration of the transmission rate of the target application or the capacity of the storage medium.

この際、必ずしも符号化される、多視点画像信号とデプス信号のそれぞれの視点が１対１に対応している必要はなく、多視点画像信号と多視点デプス信号とで異なる視点の信号が符号化されてもよい。この場合、より柔軟に符号化することができる。例えば、実際に撮影して得られた画像信号をすべて符号化し、伝送または蓄積する必要がある場合でも、仮想視点の画像信号の生成が容易な場合、符号化するデプス信号の視点を少なく設定してもよい。この場合、より効率的な符号化ストリームを生成することができる。ここで、仮想視点の画像信号の生成が容易な場合とは、符号化される多視点画像信号の視点間の間隔が十分に密である場合や、コンテンツに含まれる被写体同士の奥行きの差があまりない場合等である。 Here, the viewpoints of the encoded multi-view image signal and depth signal do not necessarily have to correspond one to one, and signals of different viewpoints may be encoded for the multi-view image signal and the multi-view depth signal. This allows more flexible encoding. For example, even when all image signals obtained by actual capture must be encoded and transmitted or stored, the number of viewpoints of the depth signal to be encoded may be reduced if virtual viewpoint image signals are easy to generate. This allows a more efficient encoded stream to be generated. Virtual viewpoint image signals are easy to generate when, for example, the viewpoints of the encoded multi-view image signal are sufficiently closely spaced, or when there is little depth difference between the subjects contained in the content.

次に、実施の形態１に係る画像符号化装置１００で符号化されることにより生成される符号化ストリームについて説明する。 Next, the encoded stream generated through encoding by the image encoding device 100 according to Embodiment 1 will be described.
図１０は、符号化すべき、５視点（視点０、視点１、視点２、視点３および視点４）からの画像ＩＳを含む多視点画像、および３視点（視点０、視点２および視点４）からのデプスＤＳを含む多視点デプスマップを示す図である。縦軸は視点方向を示し、横軸は時間方向を示している。また、視点０を上記基底視点とする。ＭＶＣ符号化方式において、基底視点は他の視点に依存せずに符号化または復号することができる視点である。多視点画像を含む１つのシーケンス全体で１つの視点のみが基底視点に設定される。すなわち、基底視点の画像は、他の視点の画像を視点間予測の参照画像として用いることなく、単独で符号化または復号されることが可能である。また、非基底視点（すなわち、基底視点以外の視点）の画像は、他の視点の画像を視点間予測の参照画像として用いて符号化または復号されることが可能である。以下の説明では、図１０に示す多視点画像および多視点デプスマップを符号化する場合について述べる。 FIG. 10 is a diagram showing a multi-view image including images IS from five viewpoints (viewpoint 0, viewpoint 1, viewpoint 2, viewpoint 3, and viewpoint 4) and a multi-view depth map including depths DS from three viewpoints (viewpoint 0, viewpoint 2, and viewpoint 4), all of which are to be encoded. The vertical axis indicates the viewpoint direction, and the horizontal axis indicates the time direction. Viewpoint 0 is the base viewpoint. In the MVC encoding scheme, the base viewpoint is a viewpoint that can be encoded or decoded without depending on other viewpoints. Only one viewpoint is set as the base viewpoint for an entire sequence including multi-view images. That is, an image of the base viewpoint can be encoded or decoded on its own, without using images of other viewpoints as reference images for inter-view prediction. An image of a non-base viewpoint (that is, a viewpoint other than the base viewpoint) can be encoded or decoded using images of other viewpoints as reference images for inter-view prediction. The following description covers the case of encoding the multi-view image and multi-view depth map shown in FIG. 10.

図１１は、実施の形態１に係る画像符号化装置１００で生成される符号化ストリームをＮＡＬユニット単位で表現した例を示す図である。１つの四角形のブロックが１つのＮＡＬユニットに相当する。ＮＡＬユニットはヘッダ部（すなわち、先頭部）であるＮＡＬユニットヘッダと、そのＮＡＬユニットヘッダを除いた生のデータであるＲＢＳＰ（Raw Byte Sequence Payload）を含む。それぞれのＮＡＬユニットのヘッダ部には常に“０”の値を持つフラグ（すなわち、"forbidden_zero_bit"）と、ＳＰＳ、ＰＰＳまたは参照ピクチャとなるスライスが含まれているかどうかを見分ける識別子（すなわち、"nal_ref_idc"）と、ＮＡＬユニットの種類を見分ける識別子（すなわち、"nal_unit_type"）が含まれる。 FIG. 11 is a diagram illustrating an example in which an encoded stream generated by the image encoding device 100 according to Embodiment 1 is expressed in units of NAL units. One square block corresponds to one NAL unit. The NAL unit includes a NAL unit header which is a header part (that is, a head part) and an RBSP (Raw Byte Sequence Payload) which is raw data excluding the NAL unit header. The header part of each NAL unit always has a flag having a value of “0” (ie, “forbidden_zero_bit”) and an identifier for distinguishing whether a slice serving as an SPS, PPS or reference picture is contained (ie, “nal_ref_idc”). ") And an identifier for identifying the type of the NAL unit (ie," nal_unit_type ").
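The three header fields named above occupy the first byte of a NAL unit in AVC/H.264: 1 bit for "forbidden_zero_bit", 2 bits for "nal_ref_idc", and 5 bits for "nal_unit_type". The following is a minimal sketch of reading them; the function name is hypothetical, and any additional header bytes that some NAL unit types carry are not handled.

```python
def parse_nal_unit_header(first_byte):
    """Split the first byte of an AVC/H.264 NAL unit into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # top bit, always 0
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # next 2 bits
        "nal_unit_type": first_byte & 0x1F,              # low 5 bits
    }


sps_header = parse_nal_unit_header(0x67)         # a typical first byte of an SPS
subset_sps_header = parse_nal_unit_header(0x6F)  # first byte with nal_unit_type 15
```

For 0x67 the fields are 0, 3, and 7, so the decoding side would recognize the unit as an SPS from "nal_unit_type" alone.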

図１２は、ＡＶＣ／Ｈ．２６４符号化方式で規定されているＮＡＬユニットの種類を示す図である。復号側ではＮＡＬユニットの種類を、ＮＡＬユニットのヘッダ部に含まれるＮＡＬユニットの種類を見分ける識別子である"nal_unit_type"を参照することにより、識別することができる。 FIG. 12 is a diagram showing the types of NAL units defined in the AVC/H.264 encoding scheme. On the decoding side, the type of a NAL unit can be identified by referring to "nal_unit_type", the identifier in the NAL unit header that distinguishes the NAL unit type.

（ＳＰＳ＃Ａ） (SPS#A)
図１１に示す符号化ストリームでは、まず、ＳＰＳ＃ＡのＮＡＬユニットが生成される。ＳＰＳ＃Ａには基底視点の画像信号（図１０では、視点０の画像の信号）の、シーケンス全体の符号化に関わる情報が設定される。ＳＰＳ＃ＡのＮＡＬユニットヘッダに含まれるＮＡＬユニットの種類を示す"nal_unit_type"の値には、ＳＰＳであることを示す“７”が設定される（図１２参照）。 In the encoded stream shown in FIG. 11, the NAL unit of SPS#A is generated first. SPS#A carries information related to the encoding of the entire sequence of the base viewpoint image signal (in FIG. 10, the image signal of viewpoint 0). The value of "nal_unit_type", which indicates the type of NAL unit and is contained in the NAL unit header of SPS#A, is set to “7”, indicating an SPS (see FIG. 12).

図１３は、ＳＰＳのＮＡＬユニットの構成を示す図である。ＳＰＳのＲＢＳＰである"seq_parameter_set_rbsp"は、シーケンス全体の符号化に関わる情報が含まれる"seq_parameter_set_data"と、ＲＢＳＰの最後に付加する調整のためのビットである"rbsp_trailing_bits"を含む。"seq_parameter_set_data"にはプロファイルを識別するための"profile_idc"が含まれる。ここでのプロファイルとはＡＶＣ／Ｈ．２６４符号化方式のシンタックスのサブセットを示す。 FIG. 13 is a diagram showing the structure of the SPS NAL unit. "seq_parameter_set_rbsp", the RBSP of the SPS, contains "seq_parameter_set_data", which carries information related to the encoding of the entire sequence, and "rbsp_trailing_bits", which are adjustment bits appended to the end of the RBSP. "seq_parameter_set_data" contains "profile_idc" for identifying the profile. A profile here means a subset of the syntax of the AVC/H.264 encoding scheme.

例えば、ＳＰＳ＃Ａの"profile_idc"の値を“１００”に設定することにより、符号化ストリームがＡＶＣ／Ｈ．２６４符号化方式のハイ・プロファイル（High Profile）に準拠していることを示すことができる。その場合、ＳＰＳ＃Ａを参照すべき後述のＮＡＬユニットは、ハイ・プロファイルに準拠した制限に基づいて生成される。さらに、"seq_parameter_set_data"にはＳＰＳを識別するための、ＳＰＳを特定する一意の番号である"seq_parameter_set_id"が含まれており、ＳＰＳ＃Ａの"seq_parameter_set_id"には、後述するＳＰＳ＃ＢおよびＳＰＳ＃Ｃの、"seq_parameter_set_id"と異なる任意の値が設定される。この基底視点の画像信号のＳＰＳには、後述するシーケンス全体の符号化にかかわるＭＶＣ拡張の情報が含まれる"seq_parameter_set_mvc_extension"は含まれない。 For example, setting the value of "profile_idc" in SPS#A to “100” indicates that the encoded stream conforms to the High Profile of the AVC/H.264 encoding scheme. In that case, the NAL units described later that refer to SPS#A are generated under the restrictions of the High Profile. Furthermore, "seq_parameter_set_data" contains "seq_parameter_set_id", a unique number that identifies the SPS, and the "seq_parameter_set_id" of SPS#A is set to an arbitrary value different from the "seq_parameter_set_id" of SPS#B and SPS#C described later. The SPS for the base viewpoint image signal does not contain "seq_parameter_set_mvc_extension", which carries the MVC extension information related to the encoding of the entire sequence and is described later.

(SPS # B)
Subsequently, an SPS # B NAL unit is generated. SPS # B carries information related to the encoding of the entire sequence of the image signals of the viewpoints other than the base viewpoint (in FIG. 10, the image signals of viewpoint 1, viewpoint 2, viewpoint 3, and viewpoint 4). The “nal_unit_type” in the NAL unit header of SPS # B, which indicates the type of the NAL unit, is set to “15”, indicating a subset SPS, that is, the SPS of the MVC extension.

FIG. 14 is a diagram showing the configuration of the NAL unit of the subset SPS. “subset_seq_parameter_set_rbsp”, which is the RBSP of the subset SPS, includes “seq_parameter_set_mvc_extension_rbsp”, which contains the MVC extension information related to the encoding of the entire sequence, in addition to “seq_parameter_set_data”, which contains information related to the encoding of the entire sequence. The “profile_idc” of SPS # B is set to “118”, which indicates the Multiview High Profile of the AVC/H.264 coding scheme. In this specification, the coding scheme corresponding to the Multiview High Profile of the AVC/H.264 coding scheme is referred to as the MVC coding scheme.

The NAL units described later that refer to SPS # B are generated under the restrictions of the Multiview High Profile. Further, the “seq_parameter_set_id” of SPS # B is set to an arbitrary value different from the “seq_parameter_set_id” of SPS # A described above and of SPS # C described later. “seq_parameter_set_mvc_extension_rbsp” includes the number of viewpoints of the image signals to be encoded, the encoding or decoding order in the viewpoint direction, and information on the dependencies between viewpoints, which identifies the viewpoints to be referenced in inter-view prediction when the image signals are encoded or decoded.

In FIG. 14, “num_views_minus1” is a parameter for setting the number of viewpoints in the encoded bit string, and its value is the number of viewpoints minus “1”. In the example of FIG. 10, a multi-viewpoint image signal containing the image signals of five viewpoints (viewpoint 0, viewpoint 1, viewpoint 2, viewpoint 3, and viewpoint 4) is encoded, so “num_views_minus1” is set to “4”.

Next, “view_id[i]” is set repeatedly, once for each viewpoint, in the encoding or decoding order in the viewpoint direction. “view_id[i]” indicates the identification information of the viewpoint (hereinafter referred to as the viewpoint ID) whose position in the encoding or decoding order in the viewpoint direction is given by the index i. That is, “view_id[i]” indicates the i-th viewpoint ID in the encoding or decoding order in the viewpoint direction. In this specification, array indices (that is, subscripts) start from 0. For example, the first element of the array “view_id[i]” is “view_id[0]”, and the next is “view_id[1]”. Ordinals are likewise counted from 0: the viewpoint that is encoded or decoded first in the viewpoint direction is the 0th, and the one encoded or decoded next is the 1st. For example, when encoding is performed in the order of viewpoint 0, viewpoint 2, viewpoint 1, viewpoint 4, and viewpoint 3, the viewpoint ID of viewpoint 0 is set in “view_id[0]”, that of viewpoint 2 in “view_id[1]”, that of viewpoint 1 in “view_id[2]”, that of viewpoint 4 in “view_id[3]”, and that of viewpoint 3 in “view_id[4]”.
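
The relationship between the coding order and these two parameters can be sketched as follows. This is only an illustration: the function name `mvc_view_order_params` is hypothetical, and the sketch assumes that the viewpoint ID of viewpoint n is simply the number n, which the description does not require.

```python
def mvc_view_order_params(coding_order):
    """Derive num_views_minus1 and view_id[i] from a coding order of viewpoint IDs."""
    if not coding_order:
        raise ValueError("at least one viewpoint is required")
    if len(set(coding_order)) != len(coding_order):
        raise ValueError("each viewpoint ID must appear exactly once")
    return {
        "num_views_minus1": len(coding_order) - 1,
        "view_id": list(coding_order),  # view_id[i] is the i-th viewpoint in coding order
    }


# Coding order of the example: viewpoint 0, 2, 1, 4, 3.
params = mvc_view_order_params([0, 2, 1, 4, 3])
print(params["num_views_minus1"], params["view_id"])
```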

(SPS # C)
Subsequently, an SPS # C NAL unit is generated. SPS # C carries information related to the encoding of the entire sequence of the depth signals of the respective viewpoints. As with SPS # B, the “nal_unit_type” in the NAL unit header of SPS # C, which indicates the type of the NAL unit, is set to “15”, indicating a subset SPS, that is, the SPS of the MVC extension. In the present embodiment, the value of “profile_idc” indicating a profile that can also decode multi-viewpoint depth signals is defined as “120”. Accordingly, the “profile_idc” of SPS # C is set to “120”. Further, the “seq_parameter_set_id” of SPS # C is set to an arbitrary value different from the “seq_parameter_set_id” of SPS # A and SPS # B described above. “seq_parameter_set_mvc_extension_rbsp” includes the number of viewpoints of the depth signals to be encoded, the encoding or decoding order in the viewpoint direction, and the dependencies between viewpoints, which identify the viewpoints to be referenced in inter-view prediction when the depth signals are encoded or decoded.

The parameters in “seq_parameter_set_mvc_extension_rbsp” are set in the same manner as in SPS # B, which carries the information related to the encoding of the entire sequence of the image signals of the viewpoints other than the base viewpoint. As described above, when a multi-viewpoint depth signal containing the image signals of three viewpoints (viewpoint 0, viewpoint 2, and viewpoint 4) is encoded in the order of viewpoint 0, viewpoint 2, and viewpoint 4, the parameters are set as follows. First, “num_views_minus1” is set to “2”. Next, the viewpoint ID of viewpoint 0 is set in “view_id[0]”, that of viewpoint 2 in “view_id[1]”, and that of viewpoint 4 in “view_id[2]”. Using a common viewpoint ID for the image signal and the depth signal of the same viewpoint lets the decoding side unambiguously identify the correspondence between the viewpoints of the image signals and the viewpoints of the depth signals.

In the present embodiment, the depth signal is encoded in the same manner as a monochrome-format image. Therefore, the chroma format “chroma_format_idc” included in “seq_parameter_set_data”, which represents the ratio between the luminance component and the color difference components, is set to “0”, indicating monochrome. The example so far defines the value of “profile_idc” indicating a profile that can decode multi-viewpoint depth signals as “120”, but any value other than the existing “profile_idc” values may be used.
Alternatively, a flag indicating whether the information is sequence information of a depth signal may be provided in the RBSP of the NAL unit of the subset SPS, and the “profile_idc” of SPS # C may be set to “118”, indicating the Multiview High Profile.

(PPS # A)
Subsequently, a PPS # A NAL unit is generated. PPS # A carries information related to the encoding of the entire picture of the base viewpoint image signal (in the example of FIG. 10, the image signal of viewpoint 0). The “nal_unit_type” in the NAL unit header of PPS # A, which indicates the type of the NAL unit, is set to “8”, indicating a PPS (see FIG. 12).

FIG. 15 is a diagram showing the configuration of the PPS NAL unit. “pic_parameter_set_rbsp”, which is the RBSP of the PPS, includes “pic_parameter_set_id”, a unique number that identifies the PPS. The “pic_parameter_set_id” of PPS # A is set to an arbitrary value different from the “pic_parameter_set_id” of PPS # B and PPS # C described later. “pic_parameter_set_rbsp” also includes “seq_parameter_set_id”, a number that identifies the SPS to be referenced, and the “seq_parameter_set_id” of PPS # A is set to the value of the “seq_parameter_set_id” of SPS # A, which PPS # A is to reference.

(PPS # B)
Subsequently, a PPS # B NAL unit is generated. PPS # B carries information related to the encoding of the entire picture of the image signals of the viewpoints other than the base viewpoint (here, the image signals of viewpoint 1 and viewpoint 2 in FIG. 10). As with PPS # A, the “nal_unit_type” in the NAL unit header of PPS # B, which indicates the type of the NAL unit, is set to “8”, indicating a PPS.

The “pic_parameter_set_id” of PPS # B is set to an arbitrary value different from the “pic_parameter_set_id” of PPS # A described above and of PPS # C described later. Further, the “seq_parameter_set_id” of PPS # B is set to the value of the “seq_parameter_set_id” of SPS # B, which PPS # B is to reference.

(PPS # C)
Subsequently, a PPS # C NAL unit is generated. PPS # C carries the picture information of the depth signals of the respective viewpoints. As with PPS # A and PPS # B, the “nal_unit_type” in the NAL unit header of PPS # C, which indicates the type of the NAL unit, is set to “8”, indicating a PPS. The “pic_parameter_set_id” of PPS # C is set to an arbitrary value different from the “pic_parameter_set_id” of PPS # A and PPS # B described above. Further, the “seq_parameter_set_id” of PPS # C is set to the value of the “seq_parameter_set_id” of SPS # C, which PPS # C is to reference.

(Camera parameter information)
Subsequently, a NAL unit of camera parameter information # 0 is generated. The camera parameter information includes internal parameter information and external parameter information. The internal parameter information is specific to the camera of each viewpoint and includes coefficients of the camera used for shooting from that viewpoint, such as the focal length, the principal point, and the radial distortion (that is, lens distortion in the radial direction from the principal point). The external parameter information includes the arrangement information of the camera of each viewpoint. This arrangement information can be expressed as a position in three-dimensional space (x, y, z coordinates) or as rotation angles about three axes (x, y, z axes), namely roll, pitch, and yaw.
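
To make the roles of the internal and external parameters concrete, the following sketch projects a point in three-dimensional space onto the image plane of one viewpoint with a simple pinhole model. It is an illustration under stated assumptions, not the method of the embodiment: the class `CameraParams`, the functions `rotation_matrix` and `project`, the focal length in pixel units, the rotation order Rz(yaw)·Ry(pitch)·Rx(roll), and the camera looking along its +z axis are all assumed, and radial distortion is ignored.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CameraParams:
    focal_length: float                   # internal: focal length in pixels (assumed unit)
    principal_point: Tuple[float, float]  # internal: (cx, cy) in pixels
    position: Tuple[float, float, float]  # external: camera position (x, y, z)
    roll: float = 0.0                     # external: rotation about the x axis, radians
    pitch: float = 0.0                    # external: rotation about the y axis, radians
    yaw: float = 0.0                      # external: rotation about the z axis, radians


def rotation_matrix(cam: CameraParams):
    """Camera-to-world rotation R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
    cr, sr = math.cos(cam.roll), math.sin(cam.roll)
    cp, sp = math.cos(cam.pitch), math.sin(cam.pitch)
    cy, sy = math.cos(cam.yaw), math.sin(cam.yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def project(cam: CameraParams, point):
    """Project a world point to pixel coordinates, ignoring radial distortion."""
    r = rotation_matrix(cam)
    d = [p - c for p, c in zip(point, cam.position)]
    # World to camera coordinates: multiply by the transpose of R.
    x, y, z = (sum(r[j][i] * d[j] for j in range(3)) for i in range(3))
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    cx, cy = cam.principal_point
    return (cam.focal_length * x / z + cx, cam.focal_length * y / z + cy)


cam0 = CameraParams(1000.0, (960.0, 540.0), (0.0, 0.0, 0.0))
print(project(cam0, (1.0, 0.5, 2.0)))
```

A projection of this kind is what makes the camera parameter information useful when a virtual viewpoint is generated from a decoded image and its depth signal.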

The camera parameter information is encoded for each time. For example, camera parameter information # 0 is the camera parameter information used for capturing the images of slice # A00 through slice # B30 described later. This camera parameter information is encoded as “Multiview acquisition information SEI”, which is a type of supplemental enhancement information. The “nal_unit_type” in the NAL unit header of camera parameter information # 0, which indicates the type of the NAL unit, is set to “6”, indicating SEI (see FIG. 12). The camera parameter information is not directly required for decoding the data encoded in the VCL, but is used when generating or displaying a virtual viewpoint after decoding.

(Prefix NAL unit # A00)
Subsequently, prefix NAL unit # A00 is generated. A prefix NAL unit is a NAL unit for encoding the viewpoint information of the slice NAL unit that follows it. The “nal_unit_type” in the NAL unit header of prefix NAL unit # A00, which indicates the type of the NAL unit, is set to “14”, indicating a prefix NAL unit (see FIG. 12).

FIG. 16 is a diagram showing the configuration of the prefix NAL unit. The NAL unit header, which is the header part of the prefix NAL unit, includes “nal_unit_header_svc_mvc_extension” in addition to “forbidden_zero_bit”, “nal_ref_idc”, and “nal_unit_type”. The viewpoint information of the slice NAL unit that follows the prefix NAL unit is set in this “nal_unit_header_svc_mvc_extension”. In the “nal_unit_header_svc_mvc_extension” of prefix NAL unit # A00 in FIG. 11, the viewpoint information of the following slice NAL unit # A00 is set.

The “nal_unit_header_svc_mvc_extension” of the prefix NAL unit includes, as one item of viewpoint information, “view_id”, a unique number that identifies the viewpoint of the following slice NAL unit. The “view_id” of prefix NAL unit # A00 is set to a value indicating viewpoint 0. The “view_id” of viewpoint 0 is defined as a value different from the “view_id” of the other viewpoints, namely viewpoint 1, viewpoint 2, viewpoint 3, and viewpoint 4. The “view_id” of prefix NAL unit # A00 is used as the “view_id” of the following slice NAL unit # A00 of viewpoint 0. In the MVC scheme, no data is defined in “prefix_nal_unit_rbsp”, which is the RBSP of the prefix NAL unit, and it is empty. That is, in the MVC scheme, no data is set in the RBSP of the prefix NAL unit.

(Slice NAL unit # A00)
Subsequently, slice NAL unit # A00 is generated. The image signal of viewpoint 0, which is the base viewpoint, is set in slice NAL unit # A00 in units of slices. A slice of the base viewpoint is generated as a VCL NAL unit whose “nal_unit_type”, which indicates the type of the NAL unit, is “1” or “5” (see FIG. 12). The first picture in the sequence of the base viewpoint image signal is encoded as an IDR picture, and the pictures that follow are encoded as non-IDR pictures.

Since slice NAL unit # A00 is the first slice in the sequence, the “nal_unit_type” in the NAL unit header of slice NAL unit # A00, which indicates the type of the NAL unit, is set to “5”, indicating a coded slice of an IDR picture (see FIG. 12). In the example of FIG. 11, one picture is encoded as one slice, but one picture may also be divided into a plurality of slices and encoded.

FIG. 17 is a diagram showing the configuration of a slice NAL unit whose “nal_unit_type” value is “1” or “5”. The NAL unit header of such a slice NAL unit does not include “nal_unit_header_svc_mvc_extension”, so no viewpoint information is set in it. Therefore, the viewpoint information set in the “nal_unit_header_svc_mvc_extension” of the previously encoded prefix NAL unit is used. That is, the viewpoint information set in the “nal_unit_header_svc_mvc_extension” of prefix NAL unit # A00 is used as the viewpoint information of slice NAL unit # A00.
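
The way a base viewpoint slice takes over the viewpoint information of the preceding prefix NAL unit can be sketched as below. The representation of a NAL unit as a `(nal_unit_type, view_id)` pair and the function name `assign_view_ids` are hypothetical simplifications, and the sketch assumes that each prefix NAL unit applies only to the single slice NAL unit that immediately follows it.

```python
def assign_view_ids(nal_units):
    """Return (nal_unit_type, view_id) for each slice, resolving base view slices.

    nal_units: iterable of (nal_unit_type, view_id_or_None) in stream order.
    """
    pending_prefix_view = None
    slices = []
    for nal_unit_type, view_id in nal_units:
        if nal_unit_type == 14:            # prefix NAL unit carries the view_id
            pending_prefix_view = view_id
        elif nal_unit_type in (1, 5):      # base view slice has no view_id of its own
            if pending_prefix_view is None:
                raise ValueError("base view slice without a preceding prefix NAL unit")
            slices.append((nal_unit_type, pending_prefix_view))
            pending_prefix_view = None
        elif nal_unit_type in (20, 21):    # these slices carry view_id in their header
            slices.append((nal_unit_type, view_id))
    return slices


# Prefix # A00, slice # A00, then a non-base view slice of viewpoint 2.
print(assign_view_ids([(14, 0), (5, None), (20, 2)]))
```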

Further, “slice_layer_without_partitioning_rbsp”, which is the RBSP of a slice NAL unit whose “nal_unit_type” value is “1” or “5”, includes “slice_header”, “slice_data”, and “rbsp_slice_trailing_bits”. “slice_header” includes information related to the encoding of the slice. “slice_data” includes the encoded data obtained by encoding the image signal in the slice, such as the encoding mode, motion vectors, and the encoded residual signal. “rbsp_slice_trailing_bits” are adjustment bits.

“slice_header” includes “pic_parameter_set_id”, a number that identifies the PPS to be referenced. The “pic_parameter_set_id” of slice NAL unit # A00 is set to the value of the “pic_parameter_set_id” of PPS # A, which slice NAL unit # A00 is to reference. Moreover, since the “seq_parameter_set_id” of PPS # A is set to the value of the “seq_parameter_set_id” of SPS # A, which PPS # A is to reference, the sequence information to be referenced by slice NAL unit # A00 can be unambiguously identified as SPS # A.

(Slice NAL unit # B20)
Subsequently, slice NAL unit # B20 is generated. In slice NAL unit # B20, the image signal of viewpoint 2, which is a non-base viewpoint, is encoded in units of slices. What is encoded here is the slice of the image signal of viewpoint 2 that has the same display time as the preceding slice # A00 of viewpoint 0. The “nal_unit_type” in the NAL unit header of slice NAL unit # B20, which indicates the type of the NAL unit, is set to “20”, indicating a coded slice of a viewpoint other than the base viewpoint (see FIG. 12).

FIG. 18 is a diagram showing the configuration of a slice NAL unit whose “nal_unit_type” value is “20”. The NAL unit header, which is the header part of such a slice NAL unit, includes “nal_unit_header_svc_mvc_extension” in addition to “forbidden_zero_bit”, “nal_ref_idc”, and “nal_unit_type”. The viewpoint information of the slice NAL unit is set in this “nal_unit_header_svc_mvc_extension”. The “nal_unit_header_svc_mvc_extension” of a slice NAL unit whose “nal_unit_type” value is “20” includes, as one item of viewpoint information, “view_id”, a unique number that identifies the viewpoint of this slice NAL unit. The “view_id” of slice NAL unit # B20 is set to a value indicating viewpoint 2. The “view_id” of viewpoint 2 is a value different from the “view_id” of the other viewpoints, namely viewpoint 0, viewpoint 1, viewpoint 3, and viewpoint 4.

Further, “slice_layer_in_scalable_extension_rbsp”, which is the RBSP of a slice NAL unit whose “nal_unit_type” value is “20”, includes “slice_header”, “slice_data”, and “rbsp_slice_trailing_bits”. “slice_header” includes information related to the encoding of the slice. “slice_data” includes the encoded data obtained by encoding the image signal in the slice, such as the encoding mode, motion vectors or disparity vectors, and the encoded residual signal. “rbsp_slice_trailing_bits” are adjustment bits. “slice_header” includes “pic_parameter_set_id”, a number that identifies the PPS to be referenced. The “pic_parameter_set_id” of slice NAL unit # B20 is set to the value of the “pic_parameter_set_id” of PPS # B, which slice NAL unit # B20 is to reference. Moreover, since the “seq_parameter_set_id” of PPS # B is set to the value of the “seq_parameter_set_id” of SPS # B, which PPS # B is to reference, it can easily be determined that the sequence information to be referenced by slice NAL unit # B20 is SPS # B.
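
The two-step reference from a slice to its PPS and from that PPS to its SPS can be sketched as a pair of table lookups. The numeric ID values below are hypothetical, since the description only requires the IDs to differ from one another, and the names `SPS_BY_ID`, `PPS_BY_ID`, and `resolve_parameter_sets` are likewise assumptions made for illustration.

```python
# Hypothetical ID assignment; the description only requires mutually distinct values.
SPS_BY_ID = {
    0: {"name": "SPS#A", "nal_unit_type": 7, "profile_idc": 100},
    1: {"name": "SPS#B", "nal_unit_type": 15, "profile_idc": 118},
    2: {"name": "SPS#C", "nal_unit_type": 15, "profile_idc": 120},
}
PPS_BY_ID = {
    0: {"name": "PPS#A", "seq_parameter_set_id": 0},
    1: {"name": "PPS#B", "seq_parameter_set_id": 1},
    2: {"name": "PPS#C", "seq_parameter_set_id": 2},
}


def resolve_parameter_sets(pic_parameter_set_id):
    """Follow slice_header.pic_parameter_set_id to the PPS, then to the SPS."""
    pps = PPS_BY_ID[pic_parameter_set_id]
    sps = SPS_BY_ID[pps["seq_parameter_set_id"]]
    return pps["name"], sps["name"], sps["profile_idc"]


# A slice such as # B20 whose slice_header carries the ID of PPS # B.
print(resolve_parameter_sets(1))
```

Because every ID is distinct, a decoder that reads only the “pic_parameter_set_id” of a slice can tell whether the slice belongs to the base viewpoint, to a non-base viewpoint, or to a depth signal.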

(Slice NAL unit # B10)
Subsequently, slice NAL units # B10, # B40, and # B30 are generated in sequence in the same manner as slice NAL unit # B20. The image signal of viewpoint 1, a non-base viewpoint, is set in slice NAL unit # B10 in units of slices; the image signal of viewpoint 4, a non-base viewpoint, is set in slice NAL unit # B40 in units of slices; and the image signal of viewpoint 3, a non-base viewpoint, is set in slice NAL unit # B30 in units of slices.

What is encoded here are the slice-unit image signals of viewpoint 1, viewpoint 4, and viewpoint 3 that have the same display time as the preceding slice # A00 of viewpoint 0 and slice # B20 of viewpoint 2. As with slice NAL unit # B20, the “nal_unit_type” in the NAL unit headers of slice NAL units # B10, # B40, and # B30, which indicates the type of the NAL unit, is set to “20”, indicating a coded slice of a viewpoint other than the base viewpoint (see FIG. 12). The “view_id” of slice NAL unit # B10 is set to a value indicating viewpoint 1, that of slice NAL unit # B40 to a value indicating viewpoint 4, and that of slice NAL unit # B30 to a value indicating viewpoint 3. The “view_id” of each viewpoint is set to a value different from the “view_id” of the other viewpoints.

The “pic_parameter_set_id” of each of slice NAL units # B10, # B40, and # B30 is set to the value of the “pic_parameter_set_id” of PPS # B, which these slice NAL units are to reference. Moreover, since the “seq_parameter_set_id” of PPS # B is set to the value of the “seq_parameter_set_id” of SPS # B, which PPS # B is to reference, the sequence information to be referenced by slice NAL units # B10, # B40, and # B30 can be unambiguously identified as SPS # B.

(Slice NAL unit # C00)
Subsequently, slice NAL unit # C00 is generated. The depth signal corresponding to slice NAL unit # A00 of the image signal of viewpoint 0 is set in slice NAL unit # C00 in units of slices. In the present embodiment, the value of “nal_unit_type” indicating a slice NAL unit in which a depth signal is set is defined as “21”. Accordingly, the “nal_unit_type” in the NAL unit header of slice NAL unit # C00, which indicates the type of the NAL unit, is set to “21”.

Setting the “nal_unit_type” of a slice NAL unit in which a depth signal is set to “21”, rather than to an existing “nal_unit_type” value, maintains compatibility with the conventional MVC scheme, which does not decode depth signals. This is because, when the encoded bit string is decoded by a conventional MVC decoder that does not decode depth signals, the decoder can ignore the NAL units whose “nal_unit_type” value is “21” and thereby decode the image signals alone correctly. Although the value of “nal_unit_type” indicating a slice in which a depth signal is encoded is defined here as “21”, another value reserved for future extension, such as “16”, “17”, “18”, “22”, or “23”, may be used.
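
The compatibility argument can be illustrated with a short sketch in which a conventional decoder simply skips every NAL unit whose type it does not recognize. The set `LEGACY_MVC_NAL_TYPES` lists only the type values named in this description, and the list `FIG11_TYPES` is a simplified, assumed rendering of the order of NAL units in FIG. 11; neither is a complete model of a real decoder.

```python
# NAL unit types named in this description that a conventional MVC decoder handles.
LEGACY_MVC_NAL_TYPES = {1, 5, 6, 7, 8, 14, 15, 20}
DEPTH_SLICE_NAL_TYPE = 21  # value defined in the present embodiment

# Simplified order of FIG. 11: SPS#A, SPS#B, SPS#C, PPS#A to #C, SEI, prefix,
# slice #A00, slices #B20, #B10, #B40, #B30, then depth slices #C00, #C20, #C40.
FIG11_TYPES = [7, 15, 15, 8, 8, 8, 6, 14, 5, 20, 20, 20, 20, 21, 21, 21]


def units_seen_by_legacy_decoder(nal_unit_types):
    """Drop NAL units of unknown type, as a conventional decoder would ignore them."""
    return [t for t in nal_unit_types if t in LEGACY_MVC_NAL_TYPES]


kept = units_seen_by_legacy_decoder(FIG11_TYPES)
print(len(FIG11_TYPES) - len(kept), "depth slice NAL units ignored")
```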

Further, the configuration of a slice NAL unit whose “nal_unit_type” value is “21” is defined in the same manner as the configuration shown in FIG. 18. That is, the NAL unit header, which is the header part of a slice NAL unit whose “nal_unit_type” value is “21”, includes “nal_unit_header_svc_mvc_extension” in addition to “forbidden_zero_bit”, “nal_ref_idc”, and “nal_unit_type”.

The “view_id” of slice NAL unit # C00 is set to a value indicating viewpoint 0. This value is equal to the “view_id” of prefix NAL unit # A00, in which the viewpoint information of slice NAL unit # A00 corresponding to slice NAL unit # C00 is set.

Further, “slice_layer_in_scalable_extension_rbsp”, which is the RBSP of a slice NAL unit whose “nal_unit_type” value is “21”, includes “slice_header”, “slice_data”, and “rbsp_slice_trailing_bits”. “slice_header” includes information related to the encoding of the slice. “slice_data” includes the encoded data obtained by encoding the depth signal in the slice, such as the encoding mode, motion vectors or disparity vectors, and the encoded residual signal. “rbsp_slice_trailing_bits” are adjustment bits.

“slice_header” includes “pic_parameter_set_id”, a number that identifies the PPS to be referenced. The “pic_parameter_set_id” of slice NAL unit # C00 is set to the value of the “pic_parameter_set_id” of PPS # C, which slice NAL unit # C00 is to reference. Moreover, since the “seq_parameter_set_id” of PPS # C is set to the value of the “seq_parameter_set_id” of SPS # C, which PPS # C is to reference, the sequence information to be referenced by slice NAL unit # C00 can be unambiguously identified as SPS # C.

(Slice NAL unit #C20)
Subsequently, slice NAL units #C20 and #C40 are generated in turn in the same manner as slice NAL unit #C00. The depth signal of viewpoint 2, which corresponds to the image signal of viewpoint 2, is set in slice NAL unit #C20 in units of slices, and the depth signal of viewpoint 4, which corresponds to the image signal of viewpoint 4, is set in slice NAL unit #C40 in units of slices. As with slice NAL unit #C00, “21” is set as the value of “nal_unit_type”, which indicates the type of NAL unit, in the NAL unit headers of slice NAL units #C20 and #C40.

A value indicating viewpoint 2 is set in the view_id of slice NAL unit #C20, and a value indicating viewpoint 4 is set in the view_id of slice NAL unit #C40. The view_id value of slice NAL unit #C20 is equal to the view_id value of the corresponding slice unit #B20, and the view_id value of slice NAL unit #C40 is equal to the view_id value of the corresponding slice unit #B40.

The “pic_parameter_set_id” of slice NAL units #C20 and #C40 is set to the value of the “pic_parameter_set_id” of PPS #C, which slice NAL units #C20 and #C40 should refer to. In addition, the “seq_parameter_set_id” of PPS #C is set to the value of the “seq_parameter_set_id” of SPS #C, which PPS #C should refer to, so the sequence information that slice NAL units #C20 and #C40 should refer to can be unambiguously identified as SPS #C.
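The reference chain described above (slice header to PPS to SPS) can be illustrated with a minimal sketch. The class names, the `resolve_sps` helper, the `label` field, and the numeric ID values are all assumptions made for illustration; the description only states that the IDs of the referring and referred-to units must match.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Sps:
    seq_parameter_set_id: int
    label: str  # hypothetical display name, not a syntax element


@dataclass(frozen=True)
class Pps:
    pic_parameter_set_id: int
    seq_parameter_set_id: int  # ID of the SPS this PPS refers to


def resolve_sps(slice_pps_id: int,
                pps_table: Dict[int, Pps],
                sps_table: Dict[int, Sps]) -> Sps:
    """Follow slice_header.pic_parameter_set_id to the PPS, then to its SPS."""
    pps = pps_table[slice_pps_id]
    return sps_table[pps.seq_parameter_set_id]


# Assumed ID assignment: 0 for #A, 1 for #B, 2 for #C.
sps_table = {0: Sps(0, "SPS#A"), 1: Sps(1, "SPS#B"), 2: Sps(2, "SPS#C")}
pps_table = {0: Pps(0, 0), 1: Pps(1, 1), 2: Pps(2, 2)}

# Depth slices #C00, #C20 and #C40 all carry pic_parameter_set_id = 2 here.
print(resolve_sps(2, pps_table, sps_table).label)
```

Because every depth slice carries the same “pic_parameter_set_id”, one lookup table per parameter-set type is enough to resolve the sequence information for all of them.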

The NAL units from the camera parameter information NAL unit #A1 onward, which follow slice NAL unit #C40, are generated in the same manner as those from camera parameter information #0 through slice NAL unit #C40. In prefix NAL unit #A01, the viewpoint information of the slice #A01 that follows is set in the same manner as in prefix NAL unit #A00.

In slice NAL unit #A01, the image signal that comes next, in encoding or decoding order, after the image signal set in slice NAL unit #A00 is set in units of slices in the same manner as in slice NAL unit #A00. The value of “nal_unit_type”, which indicates the type of NAL unit, in the NAL unit header of slice NAL unit #A01 is set to “1”, indicating a coded slice of a non-IDR picture (see FIG. 12).

In slice NAL units #B21, #B11, #B41, and #B31, the image signals that come next, in encoding or decoding order at each viewpoint, after the image signals set in slice NAL units #B20, #B10, #B40, and #B30 are encoded in units of slices in the same manner as in slice NAL units #B20, #B10, and so on. In slice NAL units #C01, #C21, and #C41, the depth signals that come next, in encoding or decoding order at each viewpoint, after the depth signals set in slice NAL units #C00, #C20, and #C40 are encoded in units of slices in the same manner as in slice NAL units #C00, #C20, and #C40.
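As a rough illustration of the ordering described above, the following sketch lists the NAL units for one time index. The function name, the string labels, and the default view orders are assumptions read off the slice names in this description (image views in the order 0, 2, 1, 4, 3 and depth views 0, 2, 4); the sketch also assumes one camera parameter NAL unit per time index. FIG. 11 remains the authoritative layout.

```python
from typing import List, Sequence


def access_unit_order(t: int,
                      image_views: Sequence[int] = (0, 2, 1, 4, 3),
                      depth_views: Sequence[int] = (0, 2, 4),
                      base_view: int = 0) -> List[str]:
    """Return illustrative NAL unit labels for time index t, in stream order."""
    units = [f"camera_parameter#{t}"]
    for v in image_views:
        if v == base_view:
            # The base-view slice is preceded by a prefix NAL unit.
            units.append(f"prefix#A{v}{t}")
            units.append(f"slice#A{v}{t}")
        else:
            units.append(f"slice#B{v}{t}")
    # Depth slices follow the image slices of the same time index.
    units.extend(f"slice#C{v}{t}" for v in depth_views)
    return units


for label in access_unit_order(1):
    print(label)
```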

図１、図３に戻り、実施の形態１に係る画像符号化装置１００、１００ａの構成について、より具体的に説明する。符号化管理部１０１には、外部または図示しない符号化管理情報保持部から符号化管理情報が供給される。符号化管理部１０１は必要に応じて新たにパラメータを計算する。 Returning to FIG. 1 and FIG. 3, the configuration of the image coding apparatuses 100 and 100a according to Embodiment 1 will be described more specifically. Encoding management information is supplied to the encoding management unit 101 from the outside or from an encoding management information holding unit (not shown). The encoding management unit 101 newly calculates parameters as necessary.

The encoding management unit 101 manages information related to encoding, including:
(a) parameter information related to the entire sequence of the image signal (that is, the SPS of the image signal),
(b) parameter information related to the entire sequence of the depth signal (that is, the SPS of the depth signal),
(c) parameter information related to the pictures of the image signal (that is, the PPS of the image signal),
(d) parameter information related to the pictures of the depth signal (that is, the PPS of the depth signal),
(e) header information related to the slices of the pictures of the image signal (that is, the slice header of the image signal), and
(f) header information related to the slices of the pictures of the depth signal (that is, the slice header of the depth signal).

さらに、符号化管理部１０１は多視点画像信号および多視点デプス信号の視点情報、符号化対象画像の参照依存関係、並びに符号化または復号順序を管理する。符号化管理部１０１は上記視点情報として、各視点における画像信号およびデプス信号の対応関係を視点ＩＤにより管理する。 Further, the encoding management unit 101 manages the viewpoint information of the multi-view image signal and the multi-view depth signal, the reference dependency relationship of the encoding target image, and the encoding or decoding order. As the viewpoint information, the encoding management unit 101 manages the correspondence between the image signal and the depth signal at each viewpoint using the viewpoint ID.

符号化管理部１０１は上記参照依存関係として、視点単位で他の視点の画像信号またはデプス信号を参照するか否かを管理する。また、符号化管理部１０１は上記参照依存関係として、ピクチャまたはスライス単位で、符号化対象画像信号または符号化対象デプス信号を符号化する際に他の視点の画像信号またはデプス信号を参照画像として用いる視点間予測（例えば、視差補償予測）を行うか否かを管理する。また、符号化管理部１０１は上記参照依存関係として、符号化対象画像信号または符号化対象デプス信号が符号化された後に、符号化側で復号して得られる復号画像信号または復号デプス信号が、他の視点の符号化対象画像信号または符号化対象デプス信号を符号化する際の参照画像として用いられるか否かを管理する。さらに、符号化管理部１０１は上記参照依存関係として、複数ある参照画像の候補の中からどの参照画像を参照すべきかについて管理する。 The encoding management unit 101 manages whether or not to refer to an image signal or a depth signal of another viewpoint for each viewpoint, as the reference dependency relationship. In addition, as the reference dependency, the encoding management unit 101 uses the image signal or depth signal of another viewpoint as a reference image when encoding the encoding target image signal or the encoding target depth signal in units of pictures or slices. Whether to perform inter-view prediction (for example, parallax compensation prediction) to be used is managed. Further, the encoding management unit 101, as the reference dependency relationship, a decoded image signal or a decoded depth signal obtained by decoding on the encoding side after the encoding target image signal or the encoding target depth signal is encoded, It is managed whether or not it is used as a reference image when encoding an encoding target image signal or an encoding target depth signal of another viewpoint. Furthermore, the encoding management unit 101 manages which reference image should be referred to from among a plurality of reference image candidates as the reference dependency relationship.

In addition, as the encoding or decoding order, the encoding management unit 101 ensures that, on the decoding side, each image signal to be decoded under the reference dependency relationship is decoded after the reference images it refers to. The encoding management unit 101 also manages the encoding or decoding order so that the image signals and depth signals are encoded in an order suitable for outputting the image signal and depth signal of each viewpoint at the same time instant simultaneously after decoding.
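The ordering constraint above (a picture is decoded only after every picture it references) amounts to a topological ordering of the reference dependency graph. The sketch below uses Python's standard `graphlib` for this. The dependency table is a hypothetical example, not the dependency structure of this embodiment, and `decoding_order` is an illustrative helper name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence


def decoding_order(references: Dict[int, Sequence[int]]) -> List[int]:
    """Order views so that each view comes after all views it references.

    `references` maps a view ID to the view IDs it uses as reference images.
    """
    return list(TopologicalSorter(references).static_order())


# Hypothetical inter-view dependencies: view 0 is the base view.
refs = {
    0: [],
    2: [0],
    1: [0, 2],
    4: [2],
    3: [2, 4],
}

order = decoding_order(refs)
print(order)
```

More than one valid order may exist for a given graph, so the sketch only guarantees that references precede the views that use them. A circular dependency makes `static_order` raise `graphlib.CycleError`, which corresponds to a dependency table that no decoding order can satisfy.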

The image signal sequence information encoding unit 102 encodes the parameter information related to the entire sequence of the base viewpoint image signal managed by the encoding management unit 101 (that is, the SPS of the base viewpoint image signal) and generates an encoded bit string. This encoded bit string corresponds to the RBSP part of SPS #A in the entire encoded bit string shown in FIG. 11. As described above, the SPS of the base viewpoint image signal is encoded according to the syntax structure of “seq_parameter_set_rbsp”, which is the RBSP shown in FIG. 13.

Furthermore, the image signal sequence information encoding unit 102 encodes the parameter information related to the entire sequence of the non-base viewpoint image signals managed by the encoding management unit 101 (that is, the SPS of the non-base viewpoint image signals) and generates an encoded bit string. This encoded bit string corresponds to the RBSP part of SPS #B in the entire encoded bit string shown in FIG. 11. As described above, the SPS for the non-base viewpoint image signals is encoded according to the syntax structure of “subset_seq_parameter_set_rbsp”, which is the RBSP shown in FIG. 14. Here, the MVC extension information of the SPS is also encoded according to the syntax structure shown in FIG. 14.

デプス信号用シーケンス情報符号化部１０３は、符号化管理部１０１で管理されるデプス信号の、シーケンス全体に関連するパラメータ情報（すなわち、デプス信号のＳＰＳ）を符号化し、符号化ビット列を生成する。この符号化ビット列は、図１１に示した符号化ビット列全体の、ＳＰＳ＃ＣのＲＢＳＰ部に相当する。上述したように、デプス信号のＳＰＳは図１４に示したＲＢＳＰである"subset_seq_parameter_set_rbsp"のシンタックス構造に従って符号化される。ここでは、図１４に示したシンタックス構造に従ってＳＰＳのＭＶＣ拡張情報も符号化される。 The depth signal sequence information encoding unit 103 encodes parameter information related to the entire sequence of the depth signal managed by the encoding management unit 101 (that is, the SPS of the depth signal), and generates an encoded bit string. This encoded bit string corresponds to the RBSP portion of SPS # C of the entire encoded bit string shown in FIG. As described above, the SPS of the depth signal is encoded according to the syntax structure of “subset_seq_parameter_set_rbsp” which is the RBSP shown in FIG. Here, SPS MVC extension information is also encoded in accordance with the syntax structure shown in FIG.

画像信号用ピクチャ情報符号化部１０４は、符号化管理部１０１で管理される画像信号のピクチャに関連する情報（すなわち、画像信号のＰＰＳ）を符号化し、符号化ビット列を生成する。この符号化ビット列は、図１１に示した符号化ビット列全体の、ＰＰＳ＃ＡおよびＰＰＳ＃Ｂの、ＲＢＳＰ部に相当する。上述したように、基底視点の画像信号のＰＰＳおよび非基底視点の画像信号のＰＰＳは、図１５に示したＲＢＳＰである"pic_parameter_set_rbsp"のシンタックス構造に従ってそれぞれ符号化される。 The picture signal picture information encoding unit 104 encodes information related to the picture of the image signal managed by the encoding management unit 101 (that is, the PPS of the image signal), and generates an encoded bit string. This encoded bit string corresponds to the RBSP part of PPS # A and PPS # B of the entire encoded bit string shown in FIG. As described above, the PPS of the base viewpoint image signal and the PPS of the non-base viewpoint image signal are encoded according to the syntax structure of “pic_parameter_set_rbsp” which is the RBSP shown in FIG.

デプス信号用ピクチャ情報符号化部１０５は、符号化管理部１０１で管理されるデプス信号のピクチャに関連する情報（すなわち、デプス信号のＰＰＳ）を符号化し、符号化ビット列を生成する。この符号化ビット列は、図１１に示した符号化ビット列全体の、ＰＰＳ＃ＣのＲＢＳＰ部に相当する。上述したように、デプス信号のＰＰＳは、図１５に示したＲＢＳＰである"pic_parameter_set_rbsp"のシンタックス構造に従って符号化される。 The depth signal picture information encoding unit 105 encodes information related to the picture of the depth signal managed by the encoding management unit 101 (that is, the PPS of the depth signal), and generates an encoded bit string. This encoded bit string corresponds to the RBSP part of PPS # C of the entire encoded bit string shown in FIG. As described above, the PPS of the depth signal is encoded according to the syntax structure of “pic_parameter_set_rbsp” that is the RBSP shown in FIG.

The camera parameter information encoding unit 106 encodes the parameter information of the cameras used to capture each viewpoint as SEI and generates an encoded bit string. The camera parameter information includes internal parameter information and external parameter information. The internal parameter information is specific to the camera at each viewpoint and includes coefficients such as the focal length, the principal point, and the radial distortion (that is, lens distortion in the radial direction from the principal point) of the camera used to capture from that viewpoint. The external parameter information includes the arrangement information of the camera at each viewpoint. This arrangement information can be expressed as a position in three-dimensional space (x, y, z coordinates) or as rotation angles (roll, pitch, yaw) about three axes (x, y, z).
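One way to picture the camera parameter information is as a small record per viewpoint. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, both position and rotation are stored together for convenience, and the 3x3 matrix is the standard pinhole-camera intrinsic matrix (without skew or distortion), which this description does not itself specify. The SEI syntax used to carry these values is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IntrinsicParameters:
    focal_length: float                     # in pixels (assumed unit)
    principal_point: Tuple[float, float]    # (cx, cy)
    radial_distortion: Tuple[float, ...]    # distortion coefficients


@dataclass(frozen=True)
class ExtrinsicParameters:
    position: Tuple[float, float, float]    # x, y, z coordinates
    rotation: Tuple[float, float, float]    # roll, pitch, yaw


@dataclass(frozen=True)
class CameraParameters:
    view_id: int
    intrinsic: IntrinsicParameters
    extrinsic: ExtrinsicParameters


def intrinsic_matrix(p: IntrinsicParameters) -> List[List[float]]:
    """Standard pinhole intrinsic matrix K, ignoring skew and distortion."""
    cx, cy = p.principal_point
    f = p.focal_length
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]


# Illustrative values only.
cam0 = CameraParameters(
    view_id=0,
    intrinsic=IntrinsicParameters(1000.0, (960.0, 540.0), (0.0,)),
    extrinsic=ExtrinsicParameters((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
)
print(intrinsic_matrix(cam0.intrinsic))
```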

The image signal of each viewpoint is supplied to the image signal encoding unit 107. In the example of FIG. 10, the image signals supplied to the image signal encoding unit 107 are those of the images at viewpoint 0, viewpoint 1, viewpoint 2, viewpoint 3, and viewpoint 4. The image signal encoding unit 107 encodes, in units of slices, the information related to the slices of the image signal managed by the encoding management unit 101 (that is, the slice header of the image signal) and the supplied image signal to be encoded, and generates an encoded stream.

This encoded stream corresponds to the RBSP parts of slices #A00, #B20, #B10, #B40, #B30, #A01, #B21, #B11, #B41, and #B31 in the entire encoded stream shown in FIG. 11. As described above, the slice header of the base viewpoint image signal and the supplied base viewpoint image signal to be encoded, in units of slices, are each encoded according to the syntax structure of “slice_layer_without_partitioning_rbsp”, which is the RBSP shown in FIG. 17. More specifically, the base viewpoint image signal in units of slices is encoded through processing such as intra prediction coding, inter prediction coding, orthogonal transform, quantization, and entropy coding.

The slice header of a non-base viewpoint image signal and the supplied non-base viewpoint image signal to be encoded, in units of slices, are each encoded according to the syntax structure of “slice_layer_in_scalable_extension_rbsp”, which is the RBSP shown in FIG. 18. Inter-view prediction or motion-compensated prediction may be used when encoding an image signal; in that case, an image signal locally decoded from a picture of an already encoded image signal can be used as a reference image.

The depth signal of each viewpoint is supplied to the depth signal encoding unit 108. In the example of FIG. 10, the depth signals supplied to the depth signal encoding unit 108 are those of the depth maps at viewpoint 0, viewpoint 2, and viewpoint 4. The depth signal encoding unit 108 encodes, in units of slices, the information related to the slices of the depth signal managed by the encoding management unit 101 (that is, the slice header of the depth signal) and the supplied depth signal to be encoded, and generates an encoded stream.

This encoded bit string corresponds to the RBSP parts of slices #C00, #C20, #C40, #C01, #C21, and #C41 in the entire encoded bit string shown in FIG. 11. As described above, the slice header of the depth signal and the supplied depth signal to be encoded, in units of slices, are each encoded according to the syntax structure of “slice_layer_in_scalable_extension_rbsp”, which is the RBSP shown in FIG. 18. Inter-view prediction or motion-compensated prediction may also be used when encoding a depth signal; in that case, a depth signal locally decoded from a picture of an already encoded depth signal can be used as a reference image. The depth signal can be encoded by the same method as a grayscale image signal.

The unitization unit 109 converts each of the following encoded bit strings into a NAL unit by adding to it a NAL unit header, which is header information for handling each encoded bit string in units of NAL units:
(a) the encoded bit string of the sequence information of the base viewpoint image signal, generated by the image signal sequence information encoding unit 102;
(b) the encoded bit string of the sequence information of the non-base viewpoint image signals, generated by the image signal sequence information encoding unit 102;
(c) the encoded bit string of the sequence information of the depth signal, generated by the depth signal sequence information encoding unit 103;
(d) the encoded bit string of the picture information of the base viewpoint image signal, generated by the image signal picture information encoding unit 104;
(e) the encoded bit string of the picture information of the non-base viewpoint image signals, generated by the image signal picture information encoding unit 104;
(f) the encoded bit string of the picture information of the depth signal, generated by the depth signal picture information encoding unit 105;
(g) the encoded bit string of the camera parameter information, generated by the camera parameter information encoding unit 106;
(h) the encoded bit string, generated by the image signal encoding unit 107, of the information related to the slices of the base viewpoint image signal (that is, the slice header of the base viewpoint image signal) and of the base viewpoint image signal in units of slices;
(i) the encoded bit string, generated by the image signal encoding unit 107, of the information related to the slices of the non-base viewpoint image signals (that is, the slice header of the non-base viewpoint image signals) and of the non-base viewpoint image signals in units of slices; and
(j) the encoded bit string, generated by the depth signal encoding unit 108, of the information related to the slices of the depth signal (that is, the slice header of the depth signal) and of the depth signal in units of slices.

さらに、ユニット化部１０９は、必要に応じてＮＡＬユニット化した符号化ビット列同士を多重化し、図１１に示した多視点画像の符号化ビット列を生成する。さらに、ネットワークを介して当該符号化ビット列が伝送される場合、図示しないパケット化部は、ＭＰＥＧ−２システム方式、ＭＰ４ファイルフォーマット、ＲＴＰ等の規格に基づいてパケット化する。図示しない送信部はそのパケット化された符号化ビット列を送信する。 Further, the unitization unit 109 multiplexes the encoded bit sequences that are converted into NAL units as necessary, and generates the encoded bit sequence of the multi-view image shown in FIG. Further, when the encoded bit string is transmitted via a network, a packetizing unit (not shown) packetizes based on standards such as MPEG-2 system, MP4 file format, RTP, and the like. A transmission unit (not shown) transmits the packetized encoded bit string.

ここで、画像信号用シーケンス情報符号化部１０２から供給される、基底視点の画像信号の、シーケンス情報の符号化ビット列には、図１３に示したＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、ＳＰＳであることを示す“７”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列のＳＰＳ＃ＡのＮＡＬユニットに相当する。また、非基底視点の画像信号の、シーケンス情報の符号化ビット列には、図１４に示したＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、ＭＶＣ拡張のＳＰＳであるサブセットＳＰＳであることを示す“１５”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列全体の、ＳＰＳ＃ＢのＮＡＬユニットに相当する。 Here, the NAL unit header shown in FIG. 13 is added to the encoded bit string of the sequence information of the base viewpoint image signal supplied from the image signal sequence information encoding unit 102. Here, the value of “nal_unit_type” indicating the type of NAL unit is set to “7” indicating SPS. The encoded bit string to which the NAL unit header is added corresponds to the SPS # A NAL unit of the encoded bit string shown in FIG. Further, the NAL unit header shown in FIG. 14 is added to the encoded bit string of the sequence information of the image signal of the non-base viewpoint. Here, the value of “nal_unit_type” indicating the type of the NAL unit is set to “15” indicating the subset SPS that is the SPS of the MVC extension. The encoded bit string to which the NAL unit header is added corresponds to the SPS # B NAL unit of the entire encoded bit string shown in FIG.

デプス信号用シーケンス情報符号化部１０３から供給される、デプス信号のシーケンス情報の符号化ビット列には、図１４に示したＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、ＭＶＣ拡張のＳＰＳであるサブセットＳＰＳであることを示す“１５”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列全体の、ＳＰＳ＃ＣのＮＡＬユニットに相当する。 The NAL unit header shown in FIG. 14 is added to the encoded bit string of the depth signal sequence information supplied from the depth signal sequence information encoding unit 103. Here, the value of “nal_unit_type” indicating the type of the NAL unit is set to “15” indicating the subset SPS that is the SPS of the MVC extension. The encoded bit sequence to which the NAL unit header is added corresponds to the SPS # C NAL unit of the entire encoded bit sequence shown in FIG.

The NAL unit header shown in FIG. 15 is added to the encoded bit string of the picture information of the base viewpoint image signal supplied from the image signal picture information encoding unit 104. Here, the value of “nal_unit_type”, which indicates the type of NAL unit, is set to “8”, indicating a PPS. The encoded bit string to which this NAL unit header has been added corresponds to the NAL unit of PPS #A in the entire encoded bit string shown in FIG. 11. The NAL unit header shown in FIG. 15 is likewise added to the encoded bit string of the picture information of the non-base viewpoint image signals. Here too, the value of “nal_unit_type” is set to “8”, indicating a PPS. The encoded bit string to which this NAL unit header has been added corresponds to the NAL unit of PPS #B in the entire encoded bit string shown in FIG. 11.

デプス信号用ピクチャ情報符号化部１０５から供給される、デプス信号のピクチャ情報の符号化ビット列にも、図１５に示したＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、ＰＰＳであることを示す“８”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列全体の、ＰＰＳ＃ＣのＮＡＬユニットに相当する。 The NAL unit header shown in FIG. 15 is also added to the encoded bit string of the picture information of the depth signal supplied from the depth signal picture information encoding unit 105. Here, the value of “nal_unit_type” indicating the type of NAL unit is set to “8” indicating PPS. The encoded bit sequence to which the NAL unit header is added corresponds to the PPS # C NAL unit of the entire encoded bit sequence shown in FIG.

カメラパラメータ情報符号化部１０６から供給されるカメラパラメータ情報の符号化ビット列には、ＳＥＩ用のＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、ＳＥＩであることを示す“６”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列全体の、カメラパラメータ情報＃０、＃１のＮＡＬユニットに相当する。 An NAL unit header for SEI is added to the encoded bit string of the camera parameter information supplied from the camera parameter information encoding unit 106. Here, the value of “nal_unit_type” indicating the type of NAL unit is set to “6” indicating SEI. The encoded bit string to which the NAL unit header is added corresponds to the NAL units of the camera parameter information # 0 and # 1 in the entire encoded bit string shown in FIG.

画像信号符号化部１０７から供給される、符号化された基底視点の画像信号のスライスヘッダ情報および符号化された基底視点の画像信号を含む符号化ビット列には、図１７に示したＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、基底視点の画像信号のスライスであることを示す“１”または“５”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列全体の、スライス＃Ａ００、＃Ａ０１のＮＡＬユニットに相当する。 The encoded bit sequence including the slice header information of the encoded base viewpoint image signal and the encoded base viewpoint image signal supplied from the image signal encoding unit 107 includes the NAL unit header shown in FIG. Is added. Here, the value of “nal_unit_type” indicating the type of the NAL unit is set to “1” or “5” indicating the slice of the base viewpoint image signal. The encoded bit string to which the NAL unit header is added corresponds to the NAL units of slices # A00 and # A01 of the entire encoded bit string shown in FIG.

なお、上記基底視点の画像信号の、スライスＮＡＬユニットの前には、基底視点の画像信号の視点情報を符号化するためのプリフィックスＮＡＬユニットが設定される。プリフィックスＮＡＬユニットの構造は図１６に示した通りであるが、上述したように、ＭＶＣ方式ではＲＢＳＰが設定されないため、図１６に示したＮＡＬユニットヘッダのみが設定される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、プリフィックスＮＡＬユニットであることを示す“１４”が設定される。このＮＡＬユニットヘッダのみが符号化された符号化ビット列は、図１１に示した符号化ビット列全体の、プリフィックスＮＡＬユニット＃Ａ００、＃Ａ０１のＮＡＬユニットに相当する。 A prefix NAL unit for encoding the viewpoint information of the base viewpoint image signal is set before the slice NAL unit of the base viewpoint image signal. The structure of the prefix NAL unit is as shown in FIG. 16. However, as described above, since the RBSP is not set in the MVC method, only the NAL unit header shown in FIG. 16 is set. Here, “14” indicating a prefix NAL unit is set to the value of “nal_unit_type” indicating the type of NAL unit. The encoded bit string obtained by encoding only the NAL unit header corresponds to the NAL units of the prefix NAL units # A00 and # A01 of the entire encoded bit string shown in FIG.

また、符号化された非基底視点の画像信号のスライスヘッダおよび符号化された非基底視点のスライス単位の画像信号を含む符号化ビット列には、図１８に示したＮＡＬユニットヘッダが付加される。ここで、ＮＡＬユニットの種類を示す"nal_unit_type"の値には、非基底視点の画像信号のスライスであることを示す“２０”が設定される。このＮＡＬユニットヘッダが付加された符号化ビット列は、図１１に示した符号化ビット列全体の、スライス＃Ｂ２０、＃Ｂ１０、＃Ｂ４０、＃Ｂ３０、＃Ｂ２１、＃Ｂ１１、＃Ｂ４１、＃Ｂ３１のＮＡＬユニットに相当する。 Further, the NAL unit header shown in FIG. 18 is added to the encoded bit string including the slice header of the encoded non-base viewpoint image signal and the encoded non-base viewpoint slice unit image signal. Here, the value of “nal_unit_type” indicating the type of the NAL unit is set to “20” indicating that the slice is a non-basis viewpoint image signal. The encoded bit sequence to which the NAL unit header is added is the NAL of slices # B20, # B10, # B40, # B30, # B21, # B11, # B41, # B31 of the entire encoded bit sequence shown in FIG. Corresponds to a unit.

The NAL unit header shown in FIG. 18 is added to the encoded bit string, supplied from the depth signal encoding unit 108, that contains the slice header of the encoded depth signal and the encoded depth signal in units of slices. Here, the value of “nal_unit_type”, which indicates the type of NAL unit, is set to “21”, indicating a slice of a depth signal. The encoded bit string to which this NAL unit header has been added corresponds to the NAL units of slices #C00, #C20, #C40, #C01, #C21, and #C41 in the entire encoded bit string shown in FIG. 11.
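The “nal_unit_type” assignments listed above can be collected into one table, as in the following sketch of the unitization step. The dictionary keys and function names are hypothetical. The one-byte header layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) is that of the H.264/AVC NAL unit header; the additional header extension bytes that carry “view_id” for types 14, 20, and 21, and the emulation-prevention bytes, are deliberately omitted, and the default `nal_ref_idc` value is an assumption.

```python
# nal_unit_type values as stated in this description.
NAL_UNIT_TYPE = {
    "non_idr_base_slice": 1,   # base viewpoint slice, non-IDR picture
    "idr_base_slice": 5,       # base viewpoint slice, IDR picture
    "sei": 6,                  # camera parameter information
    "sps": 7,                  # SPS #A
    "pps": 8,                  # PPS #A, #B, #C
    "prefix": 14,              # prefix NAL unit
    "subset_sps": 15,          # SPS #B, #C
    "non_base_slice": 20,      # non-base viewpoint image slice
    "depth_slice": 21,         # depth signal slice
}


def nal_header_byte(kind: str, nal_ref_idc: int = 3) -> bytes:
    """Build the first NAL unit header byte (forbidden_zero_bit = 0)."""
    if not 0 <= nal_ref_idc <= 3:
        raise ValueError("nal_ref_idc must fit in 2 bits")
    return bytes([(nal_ref_idc << 5) | NAL_UNIT_TYPE[kind]])


def unitize(kind: str, rbsp: bytes, nal_ref_idc: int = 3) -> bytes:
    """Prepend the header byte to an already encoded RBSP bit string."""
    return nal_header_byte(kind, nal_ref_idc) + rbsp


nal = unitize("depth_slice", b"\x00\x01")
print(nal[0] & 0x1F)  # low 5 bits hold nal_unit_type
```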

Next, the procedure for encoding a multi-viewpoint image performed by the image encoding devices 100 and 100a according to Embodiment 1, shown in FIGS. 1 and 3, will be described.
FIG. 19 is a flowchart showing the procedure for encoding a multi-viewpoint image performed by the image encoding devices 100 and 100a according to Embodiment 1. First, the image signal sequence information encoding unit 102 encodes the parameter information related to the encoding of the entire sequence of the base viewpoint image signal and generates the encoded bit string of the sequence information of the base viewpoint image signal (that is, the SPS of the base viewpoint image signal) (S101).

続いて、ユニット化部１０９はステップＳ１０１の処理により得られた、基底視点の画像信号の、シーケンス情報の符号化ビット列に、ＮＡＬユニット単位で扱うためのヘッダ情報を付加することによりＮＡＬユニット化する（Ｓ１０２）。さらに、ユニット化部１０９は、必要に応じて他のＮＡＬユニットと多重化する。 Subsequently, the unitizing unit 109 performs NAL unitization by adding header information for handling in units of NAL units to the encoded bit string of the sequence information of the base viewpoint image signal obtained by the processing of step S101. (S102). Furthermore, the unitization unit 109 multiplexes with other NAL units as necessary.

Subsequently, the image signal sequence information encoding unit 102 encodes the parameter information related to the encoding of the entire sequence of the non-base viewpoint image signals and generates the encoded bit string of the sequence information for the non-base viewpoint image signals (that is, the SPS of the non-base viewpoint image signals) (S103).

Subsequently, the unitization unit 109 converts the encoded bit string of the sequence information of the non-base viewpoint image signals, obtained by the processing of step S103, into a NAL unit by adding header information for handling it in units of NAL units (S104). Furthermore, the unitization unit 109 multiplexes it with other NAL units as necessary.

Subsequently, the depth signal sequence information encoding unit 103 encodes the parameter information relating to the encoding of the entire sequence of the depth signals, and generates an encoded bit string of the sequence information of the depth signals (that is, the SPS of the depth signals) (S105).

Subsequently, the unitization unit 109 converts the encoded bit string of the sequence information of the depth signals, obtained in step S105, into a NAL unit by adding the header information needed to handle it as a NAL unit (S106). The unitization unit 109 also multiplexes it with other NAL units as necessary.

Subsequently, the image signal picture information encoding unit 104 encodes the parameter information relating to the encoding of entire pictures of the base viewpoint image signal, and generates an encoded bit string of the picture information of the base viewpoint image signal (that is, the PPS of the base viewpoint image signal) (S107).

Subsequently, the unitization unit 109 converts the encoded bit string of the picture information of the base viewpoint image signal, obtained in step S107, into a NAL unit by adding the header information needed to handle it as a NAL unit (S108). The unitization unit 109 also multiplexes it with other NAL units as necessary.

Subsequently, the image signal picture information encoding unit 104 encodes the parameter information relating to the encoding of entire pictures of the non-base viewpoint image signals, and generates an encoded bit string of the picture information of the non-base viewpoint image signals (that is, the PPS of the non-base viewpoint image signals) (S109).

Subsequently, the unitization unit 109 converts the encoded bit string of the picture information of the non-base viewpoint image signals, obtained in step S109, into a NAL unit by adding the header information needed to handle it as a NAL unit (S110). The unitization unit 109 also multiplexes it with other NAL units as necessary.

Subsequently, the depth signal picture information encoding unit 105 encodes the parameter information relating to the encoding of entire pictures of the depth signals, and generates an encoded bit string of the picture information of the depth signals (that is, the PPS of the depth signals) (S111).

Subsequently, the unitization unit 109 converts the encoded bit string of the picture information of the depth signals, obtained in step S111, into a NAL unit by adding the header information needed to handle it as a NAL unit (S112). The unitization unit 109 also multiplexes it with other NAL units as necessary.

Subsequently, the camera parameter information encoding unit 106 encodes, as SEI, the parameter information of the cameras used to capture each viewpoint, and generates an encoded bit string of the camera parameter information (S113).

Subsequently, the unitization unit 109 converts the encoded bit string of the camera parameter information, obtained in step S113, into a NAL unit by adding the header information needed to handle it as a NAL unit (S114). The unitization unit 109 also multiplexes it with other NAL units as necessary.

Subsequently, the unitization unit 109 encodes the header information needed for handling as a NAL unit, including the viewpoint information of the NAL unit that follows, and forms a prefix NAL unit (S115). Only the header is encoded because, as described above, the RBSP of the prefix NAL unit is not encoded in the MVC scheme. The unitization unit 109 also multiplexes it with other NAL units as necessary.

Subsequently, the image signal encoding unit 107 encodes the information relating to a slice of the base viewpoint image signal (that is, the slice header of the base viewpoint image signal) together with the slice-unit image signal of the base viewpoint to be encoded, and generates an encoded bit string of the image signal for that base viewpoint slice (S116).

Subsequently, the unitization unit 109 converts the encoded bit string of the image signal for the base viewpoint slice, obtained in step S116, into a NAL unit by adding the header information needed to handle it as a NAL unit (S117). The unitization unit 109 also multiplexes it with other NAL units as necessary. Although not shown in FIG. 19, when a picture is divided into a plurality of slices for encoding, steps S116 and S117 are repeated.

Subsequently, the image signal encoding unit 107 encodes the information relating to a slice of a non-base viewpoint image signal (that is, the slice header of the non-base viewpoint image signal) together with the slice-unit image signal of the non-base viewpoint to be encoded, and generates an encoded bit string of the image signal for that non-base viewpoint slice (S118).

Subsequently, the unitization unit 109 converts the encoded bit string of the image signal for the non-base viewpoint slice, obtained in step S118, into a NAL unit by adding the header information needed to handle it as a NAL unit (S119). The unitization unit 109 also multiplexes it with other NAL units as necessary. Although not shown in FIG. 19, when a picture is divided into a plurality of slices for encoding, steps S118 and S119 are repeated.

Subsequently, the encoding management unit 101 determines whether the encoding of the image signals of all viewpoints to be encoded has been completed for the display time concerned (S120). If the encoding of the image signals for that display time has been completed (Y in S120), the process proceeds to step S121; if not (N in S120), the encoding processing of steps S118 to S120 is repeated.

Subsequently, the depth signal encoding unit 108 encodes the information relating to a slice of a depth signal (that is, the slice header of the depth signal) together with the slice-unit depth signal to be encoded, and generates an encoded bit string of the depth signal slice (S121).

Subsequently, the unitization unit 109 converts the encoded bit string of the slice-unit depth signal, obtained in step S121, into a NAL unit by adding the header information needed to handle it as a NAL unit (S122). The unitization unit 109 also multiplexes it with other NAL units as necessary. Although not shown in FIG. 19, when a picture is divided into a plurality of slices for encoding, steps S121 and S122 are repeated.

Subsequently, the encoding management unit 101 determines whether the encoding of the depth signals of all viewpoints to be encoded has been completed for the display time concerned (S123). If the encoding of the depth signals for that display time has been completed (Y in S123), the process proceeds to step S124; if not (N in S123), the encoding processing of steps S121 to S123 is repeated.

Subsequently, the encoding management unit 101 determines whether the encoding of all image signals and depth signals to be encoded has been completed (S124). If the encoding of all image signals and depth signals has been completed (Y in S124), the encoding process ends; if not (N in S124), the encoding processing of steps S113 to S124 is repeated.
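The control flow of steps S101 to S124 can be summarized with a short sketch. It only lists the order in which NAL units are produced and performs no actual encoding; the function name, its parameters, and the assumption of one slice per picture are illustrative and do not come from the flowchart itself.

```python
from typing import Iterator, List, Tuple


def encoding_order(num_display_times: int,
                   non_base_views: List[int],
                   depth_views: List[int]) -> Iterator[Tuple[str, str]]:
    """Yield (steps, NAL unit) pairs in the order of the FIG. 19 flowchart."""
    # S101 to S112: parameter sets are encoded and NAL-unitized once, up front.
    yield ("S101-S102", "SPS (base viewpoint image)")
    yield ("S103-S104", "SPS (non-base viewpoint image)")
    yield ("S105-S106", "SPS (depth)")
    yield ("S107-S108", "PPS (base viewpoint image)")
    yield ("S109-S110", "PPS (non-base viewpoint image)")
    yield ("S111-S112", "PPS (depth)")
    # S113 to S124: repeated for every display time until S124 answers Y.
    for t in range(num_display_times):
        yield ("S113-S114", f"SEI camera parameters, t={t}")
        yield ("S115", f"prefix NAL unit, t={t}")
        yield ("S116-S117", f"slice (base viewpoint image), t={t}")
        for view in non_base_views:  # loop closed by the check in S120
            yield ("S118-S119", f"slice (non-base image, view {view}), t={t}")
        for view in depth_views:     # loop closed by the check in S123
            yield ("S121-S122", f"slice (depth, view {view}), t={t}")


# Example: two display times, two non-base image views, three depth views.
order = list(encoding_order(2, [1, 2], [0, 1, 2]))
```

Passing a shorter `depth_views` list than the number of image views models the case, mentioned later in the text, where image and depth viewpoints do not correspond one to one.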

Next, the transmission procedure used when an encoded bit string of multi-viewpoint images generated by the image encoding devices 100 and 100a according to Embodiment 1, shown in FIGS. 1 and 3, is transmitted over a network will be described.
FIG. 20 is a flowchart showing the transmission procedure used when an encoded bit string of multi-viewpoint images generated by the image encoding devices 100 and 100a according to Embodiment 1 is transmitted over a network. The entire process shown in the flowchart of FIG. 20 is executed, as necessary, after each of steps S102, S104, S106, S108, S110, S112, S114, S115, S117, S119, and S122 in the flowchart of FIG. 19.

In the flowchart of FIG. 20, a packetization unit (not shown) packetizes, as necessary, the encoded bit strings obtained in steps S102, S104, S106, S108, S110, S112, S114, S115, S117, S119, and S122 of the flowchart of FIG. 19, in accordance with a standard such as the MPEG-2 Systems scheme, the MP4 file format, or RTP (S201).

Subsequently, the packetization unit multiplexes the result with encoded bit strings of audio or the like as necessary (S202). A transmission unit (not shown) then transmits the packetized encoded bit string over a network or the like as it becomes available (S203).

Note that an encoded bit string produced by the image encoding devices 100 and 100a according to Embodiment 1 can also be decoded by a decoding device that supports the existing single-viewpoint AVC/H.264 coding scheme. In that case, only the base viewpoint image signal is obtained on the decoding side. For example, the encoded bit string shown in FIG. 11, produced by the image encoding devices 100 and 100a according to Embodiment 1, can be decoded by a decoding device that supports the High Profile of the AVC/H.264 coding scheme.

In that case, the decoding device decodes the following NAL units, which conform to the High Profile of the AVC/H.264 coding scheme:
(a) SPS NAL unit #A, a NAL unit whose "nal_unit_type" is "7",
(b) PPS NAL units #A, #B, and #C, NAL units whose "nal_unit_type" is "8",
(c) slice NAL unit #A00, a NAL unit whose "nal_unit_type" is "1", and
(d) slice NAL unit #A01, a NAL unit whose "nal_unit_type" is "5".

However, PPS NAL units #B and #C are not actually used, because the slice NAL units that refer to these PPSs are not decoded. SPS NAL units #B and #C, which are NAL units whose "nal_unit_type" is "15" and are therefore not supported by the High Profile of the AVC/H.264 coding scheme, are not decoded.

Similarly, the following NAL units are not decoded either:
(a) prefix NAL unit #A00, a NAL unit whose "nal_unit_type" is "14",
(b) slice NAL units #B10, #B20, #B11, and #B21, NAL units whose "nal_unit_type" is "20", and
(c) slice NAL units #C00, #C10, #C20, #C01, #C11, and #C21, NAL units whose "nal_unit_type" is "21".

Furthermore, an encoded bit string produced by the image encoding devices 100 and 100a according to Embodiment 1 can also be decoded by a decoding device that supports the existing MVC coding scheme. In that case, only the multi-viewpoint image signals are obtained on the decoding side. For example, the encoded bit string shown in FIG. 11, produced by the image encoding devices 100 and 100a according to Embodiment 1, can be decoded by a decoding device that supports the Multiview High Profile of the AVC/H.264 coding scheme.

In that case, the decoding device decodes the following NAL units, which conform to the Multiview High Profile of the AVC/H.264 coding scheme:
(a) SPS NAL unit #A, a NAL unit whose "nal_unit_type" is "7",
(b) SPS NAL units #B and #C, NAL units whose "nal_unit_type" is "15",
(c) PPS NAL units #A, #B, and #C, NAL units whose "nal_unit_type" is "8",
(d) prefix NAL unit #A00, a NAL unit whose "nal_unit_type" is "14",
(e) slice NAL unit #A00, a NAL unit whose "nal_unit_type" is "1",
(f) slice NAL unit #A01, a NAL unit whose "nal_unit_type" is "5", and
(g) slice NAL units #B10, #B20, #B11, and #B21, NAL units whose "nal_unit_type" is "20".

However, SPS NAL unit #C and PPS NAL unit #C are not actually used, because the slice NAL units that refer to this SPS and PPS are not decoded. Slice NAL units #C00, #C10, #C20, #C01, #C11, and #C21, which are NAL units whose "nal_unit_type" is "21" and are therefore not supported by the Multiview High Profile of the AVC/H.264 coding scheme, are not decoded.
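The backward compatibility described above amounts to each class of decoder keeping only the "nal_unit_type" values it understands. The sketch below illustrates this with the NAL units named in the text for FIG. 11; the ordering of the list, the constant names, and the helper `decodable` are assumptions made for this example.

```python
from typing import List, Set, Tuple

# (name, nal_unit_type) pairs named in the text; the order here is illustrative.
FIG11_UNITS: List[Tuple[str, int]] = [
    ("SPS#A", 7), ("SPS#B", 15), ("SPS#C", 15),
    ("PPS#A", 8), ("PPS#B", 8), ("PPS#C", 8),
    ("prefix#A00", 14), ("slice#A00", 1),
    ("slice#B10", 20), ("slice#B20", 20),
    ("slice#C00", 21), ("slice#C10", 21), ("slice#C20", 21),
    ("slice#A01", 5),
    ("slice#B11", 20), ("slice#B21", 20),
    ("slice#C01", 21), ("slice#C11", 21), ("slice#C21", 21),
]

# Types decoded by a single-viewpoint High Profile decoder, per the text.
HIGH_PROFILE_TYPES: Set[int] = {1, 5, 7, 8}
# A Multiview High Profile decoder additionally decodes types 14, 15 and 20.
MULTIVIEW_HIGH_TYPES: Set[int] = HIGH_PROFILE_TYPES | {14, 15, 20}


def decodable(units: List[Tuple[str, int]], supported: Set[int]) -> List[str]:
    """Return the names of the NAL units a decoder with `supported` types decodes."""
    return [name for name, nal_type in units if nal_type in supported]


single_view = decodable(FIG11_UNITS, HIGH_PROFILE_TYPES)
multi_view = decodable(FIG11_UNITS, MULTIVIEW_HIGH_TYPES)
```

In both cases the depth slices (type 21) are skipped, and parameter sets such as PPS#C are parsed but never referenced by a decoded slice.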

As described above, according to Embodiment 1, a multi-viewpoint image encoded bit string generated by encoding multi-viewpoint image signals, which contain image signals from a plurality of viewpoints, and a multi-viewpoint depth signal bit string generated by encoding, as auxiliary information, multi-viewpoint depth signals, which contain depth signals from a plurality of viewpoints, are unitized into the same encoded stream. This allows multi-viewpoint images to be transmitted or stored efficiently. In other words, the number of viewpoints of the image signals to be encoded can be greatly reduced, which improves coding efficiency and reproduction quality.

Furthermore, the data structure of the encoded bit string allows a conventional decoding device for single-viewpoint images to decode only the base viewpoint image signal, and a conventional decoding device for multi-viewpoint images to decode only the multi-viewpoint image signals. This realizes a scalable function and maintains compatibility with the conventional AVC/H.264 coding scheme, which targets single-viewpoint two-dimensional images, and with the MVC scheme, which targets multi-viewpoint image signals only.

Furthermore, it is of course possible to generate an encoded bit string in which the multi-viewpoint image signals and the multi-viewpoint depth signals are equal in number and correspond one to one, and it is also possible to generate an encoded bit string in which the numbers of viewpoints of the image signals and the depth signals differ and the two do not correspond one to one.

(Embodiment 2)
Next, an image decoding device 300 that decodes the encoded data produced by the image encoding devices 100 and 100a according to Embodiment 1 will be described.
FIG. 21 is a block diagram showing the configuration of the image decoding device 300 according to Embodiment 2 of the present invention. The image decoding device 300 according to Embodiment 2 includes a decomposition unit 301, a decoding management unit 302, a parameter information decoding unit 320, an image signal decoding unit 307, a depth information decoding unit (more specifically, a depth signal decoding unit 309), and a decoded image buffer 310. The parameter information decoding unit 320 includes a base viewpoint image signal sequence information decoding unit 303, a sequence information decoding unit 304 for sequence information including MVC extension information, a picture information decoding unit 305, and a supplemental enhancement information decoding unit 306.

The decomposition unit 301 decomposes an encoded stream that contains image encoded data in which a plurality of images from a plurality of different viewpoints are encoded, depth information encoded data in which depth information indicating the depth of a specific space from at least one viewpoint is encoded, and parameter information encoded data in which parameter information, including viewpoint information for identifying the plurality of viewpoints on which the plurality of images and the depth information are based, is encoded. Such encoded streams include those generated by the image encoding devices 100 and 100a according to Embodiment 1. The number of pieces of depth information encoded data contained in the encoded stream may be set smaller than the number of pieces of image encoded data.

The image signal decoding unit 307 decodes the image encoded data separated by the decomposition unit 301 and restores the plurality of images. When one of the plurality of viewpoints is set as the reference viewpoint, the image signal decoding unit 307 decodes first image encoded data, in which the image from the reference viewpoint is encoded, to restore that image, and decodes second image encoded data, in which the images other than the image from the reference viewpoint are encoded, to restore those images.

The depth information decoding unit decodes the depth information encoded data separated by the decomposition unit 301 and restores the depth information. Here, the depth information encoded data may be data in which depth information represented as a monochrome image from a certain viewpoint is encoded. In this case, the depth information decoding unit decodes the depth information encoded data and restores the monochrome image.

The parameter information decoding unit 320 decodes the parameter information encoded data separated by the decomposition unit 301 and restores the parameter information. When one of the plurality of viewpoints is set as the reference viewpoint, the parameter information decoding unit 320 decodes first parameter information encoded data, in which first parameter information of the image from the reference viewpoint is encoded, and restores the first parameter information. The parameter information decoding unit 320 also decodes second parameter information encoded data, in which second parameter information of the images other than the image from the reference viewpoint is encoded, and restores the second parameter information. In addition, the parameter information decoding unit 320 decodes third parameter information encoded data, in which third parameter information of the depth information is encoded, and restores the third parameter information.

The third parameter information may be described in a syntax structure corresponding to that of the second parameter information. For example, the second and third parameter information may be described in conformity with the Multiview High Profile of the AVC/H.264 coding scheme. Viewpoint identification information may also be described in the second and third parameter information. When the position of the viewpoint on which an image encoded as the image encoded data is based coincides with the position of the viewpoint on which depth information encoded as the depth information encoded data is based, common identification information may be assigned to those viewpoints.

FIG. 22 is a block diagram showing the configuration of an image decoding device 300a according to a modification of Embodiment 2. The image decoding device 300a according to the modification of Embodiment 2 has a configuration in which a virtual viewpoint image generation unit 330 is added to the image decoding device 300 shown in FIG. 21.

In this modification, the virtual viewpoint image generation unit 330 generates an image from another viewpoint, different from the viewpoint on which a decoded image is based, from the image decoded by the image signal decoding unit 307 and the depth information decoded by the depth information decoding unit. More specifically, the virtual viewpoint image generation unit 330 generates an image from a virtual viewpoint on the basis of the image decoded by the image signal decoding unit 307, the depth information decoded by the depth information decoding unit, and the parameter information, such as camera parameters, decoded by the parameter information decoding unit 320.

The virtual viewpoint image generation unit 330 can generate the image from the virtual viewpoint using an existing algorithm. The virtual viewpoint is specified to the virtual viewpoint image generation unit 330 by an instruction from an application or as a result of a user operation. The other processing is the same as described for the image decoding device 300 according to the basic example of Embodiment 2 shown in FIG. 21, so its description is omitted.
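The text leaves the view synthesis algorithm open. As one common possibility, and not necessarily what the virtual viewpoint image generation unit 330 does, the sketch below converts a decoded depth sample into a horizontal pixel shift for a parallel camera setup. It assumes an 8-bit depth value that is linear in inverse depth between hypothetical near and far planes `z_near` and `z_far`, and hypothetical camera parameters `focal_px` (focal length in pixels) and `baseline` (distance to the virtual camera).

```python
def depth_value_to_z(v: int, z_near: float, z_far: float) -> float:
    """Map an 8-bit depth sample (255 = nearest) to metric depth Z.

    Assumes the sample is quantized linearly in 1/Z, a common convention
    that is not stated in this document.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


def disparity_px(v: int, focal_px: float, baseline: float,
                 z_near: float, z_far: float) -> float:
    """Horizontal shift, in pixels, of a sample when moving to the virtual view."""
    return focal_px * baseline / depth_value_to_z(v, z_near, z_far)


# Example: nearest and farthest samples for a 5 cm baseline.
near_shift = disparity_px(255, focal_px=1000.0, baseline=0.05, z_near=1.0, z_far=10.0)
far_shift = disparity_px(0, focal_px=1000.0, baseline=0.05, z_near=1.0, z_far=10.0)
```

A full synthesis would then shift each decoded pixel by its disparity and fill the holes that open up; those steps are outside this sketch.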

The configuration of the image decoding devices 300 and 300a according to Embodiment 2 is described more specifically below. The decomposition unit 301 acquires an encoded bit string generated by the image encoding devices 100 and 100a according to Embodiment 1. The encoded bit string may be acquired by receiving it over a network, by reading it from a storage medium such as a DVD, or by receiving it from a broadcast such as BS or terrestrial broadcasting.

The decomposition unit 301 also separates the supplied encoded bit string into NAL units. At this time, a packet decomposition unit (not shown) removes packet headers of the MPEG-2 Systems scheme, the MP4 file format, RTP, or the like as necessary. The decomposition unit 301 decodes the NAL unit header, which is the header portion of each separated NAL unit, and supplies the decoded NAL unit header information to the decoding management unit 302. The decoding management unit 302 manages this NAL unit header information.
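The separation into NAL units and the decoding of the first header byte might look like the following sketch. It assumes the bit string is delivered in the AVC/H.264 Annex B byte stream format, with NAL units delimited by 00 00 01 start codes, which this document does not state; the function names are likewise hypothetical.

```python
from typing import Dict, List

START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream: bytes) -> List[bytes]:
    """Split an Annex B style byte stream into NAL units (start codes removed)."""
    units: List[bytes] = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Drop trailing zero bytes, including the extra zero of a 4-byte start code.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def parse_nal_unit_header(nal_unit: bytes) -> Dict[str, int]:
    """Decode the first header byte: 1 + 2 + 5 bits."""
    first = nal_unit[0]
    return {
        "forbidden_zero_bit": (first >> 7) & 0x1,
        "nal_ref_idc": (first >> 5) & 0x3,
        "nal_unit_type": first & 0x1F,
    }


# Example: an SPS followed by a PPS, using 4-byte and 3-byte start codes.
demo = b"\x00\x00\x00\x01\x67\xAA" + b"\x00\x00\x01\x68\xBB"
headers = [parse_nal_unit_header(u) for u in split_byte_stream(demo)]
```

Removal of emulation prevention bytes from the payload is omitted here for brevity.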

When the value of "nal_unit_type", the identifier in the NAL unit header that distinguishes the type of NAL unit, is "7", that is, when the NAL unit is an encoded bit string in which the parameter information relating to the encoding of the entire sequence of the base viewpoint image signal is encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the base viewpoint image signal sequence information decoding unit 303.

When the value of "nal_unit_type" is "15", that is, when the NAL unit is an encoded bit string in which the parameter information relating to the encoding of the entire sequence, including MVC extension information, is encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the sequence information decoding unit 304 for sequence information including MVC extension information.

When the value of "nal_unit_type" is "8", that is, when the NAL unit is an encoded bit string in which the parameter information relating to the encoding of pictures and the like is encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the picture information decoding unit 305.

When the value of "nal_unit_type" is "6", that is, when the NAL unit is an encoded bit string in which supplemental enhancement information is encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the supplemental enhancement information decoding unit 306.

When the value of "nal_unit_type" is "1" or "5", that is, when the NAL unit is an encoded bit string in which the coding mode, motion vectors or disparity vectors, coded residual signal, and the like of the base viewpoint image signal are encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the image signal decoding unit 307.

When the value of "nal_unit_type" is "20", that is, when the NAL unit is an encoded bit string in which the coding mode, motion vectors or disparity vectors, coded residual signal, and the like of a non-base viewpoint image signal are encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the image signal decoding unit 307.

When the value of "nal_unit_type" is "21", that is, when the NAL unit is an encoded bit string in which the coding mode, motion vectors or disparity vectors, coded residual signal, and the like of a depth signal are encoded, the decomposition unit 301 supplies the encoded bit string of the RBSP portion of the NAL unit to the depth signal decoding unit 309.

なお、"nal_unit_type"の値が“１４”、すなわち後に続くスライスＮＡＬユニットの視点情報等が符号化されているプリフィックスＮＡＬユニットの場合、当該ＮＡＬユニットのＲＢＳＰ部の符号化ビット列は空である。 In the case of a prefix NAL unit in which the value of “nal_unit_type” is “14”, that is, the viewpoint information of the subsequent slice NAL unit is encoded, the encoded bit string of the RBSP portion of the NAL unit is empty.

When the value of "nal_unit_type" is "14", "20", or "21", the decomposition unit 301 also decodes "nal_unit_header_svc_mvc_extension", the viewpoint information contained in the NAL unit header, and supplies the decoded viewpoint information to the decoding management unit 302. The viewpoint information decoded here includes the viewpoint ID and the like. The viewpoint information contained in a NAL unit header whose "nal_unit_type" is "14" applies to the NAL unit that follows it, whereas the viewpoint information contained in a NAL unit header whose "nal_unit_type" is "20" or "21" applies to that NAL unit itself. This viewpoint information is managed by the decoding management unit 302.
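The header handling described above can be sketched as follows. This is a minimal illustration, not the decoder of the embodiment: it assumes the one-byte NAL unit header of AVC/H.264 followed, for types 14, 20, and 21, by a three-byte extension laid out like the MVC header extension (a 10-bit viewpoint ID starting at bit 6 from the least significant end). The names `NalHeader`, `parse_nal_header`, and `ViewTracker` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# nal_unit_type values whose header carries viewpoint information (per the text).
EXTENDED_TYPES = {14, 20, 21}


@dataclass
class NalHeader:
    nal_ref_idc: int
    nal_unit_type: int
    view_id: Optional[int]  # None when the header has no extension


def parse_nal_header(nal: bytes) -> NalHeader:
    """Decode the NAL unit header, including the extension when present."""
    first = nal[0]
    nal_ref_idc = (first >> 5) & 0x3
    nal_unit_type = first & 0x1F
    view_id = None
    if nal_unit_type in EXTENDED_TYPES:
        if len(nal) < 4:
            raise ValueError("truncated NAL unit header extension")
        # Assumed layout: flag(1) non_idr(1) priority(6) view_id(10)
        # temporal_id(3) anchor(1) inter_view(1) reserved(1).
        ext = int.from_bytes(nal[1:4], "big")
        view_id = (ext >> 6) & 0x3FF
    return NalHeader(nal_ref_idc, nal_unit_type, view_id)


class ViewTracker:
    """Plays the viewpoint bookkeeping role of the decoding management unit."""

    def __init__(self) -> None:
        self._pending: Optional[int] = None

    def view_id_for(self, header: NalHeader) -> Optional[int]:
        if header.nal_unit_type == 14:
            # Prefix NAL unit: its viewpoint information belongs to the next unit.
            self._pending = header.view_id
            return None
        if header.nal_unit_type in (20, 21):
            # The header's own viewpoint information applies.
            return header.view_id
        if header.nal_unit_type in (1, 5):
            # Base-viewpoint slice: inherit from the preceding prefix NAL unit.
            view_id, self._pending = self._pending, None
            return view_id
        return None
```

Keeping the pending viewpoint ID in a separate tracker mirrors the split in the text, where the decomposition unit only decodes headers and the decoding management unit decides which unit the information belongs to.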

The sequence information decoding unit 303 for the image signal of the base viewpoint decodes the encoded bit string, supplied from the decomposition unit 301, that carries parameter information relating to the encoding of the entire sequence of the image signal of the base viewpoint. This encoded bit string corresponds to the RBSP part of SPS#A in the encoded bit string shown in FIG. 11. The supplied encoded bit string of the RBSP part is the "seq_parameter_set_rbsp" shown in FIG. 13. The sequence information decoding unit 303 decodes the encoded bit string in accordance with the syntax structure of "seq_parameter_set_rbsp" shown in FIG. 13 and obtains the parameter information relating to the encoding of the entire sequence of the image signal of the base viewpoint. It then supplies this decoded sequence information to the decoding management unit 302, which manages it.

The sequence information decoding unit 304 for sequence information including MVC extension information decodes the encoded bit string, supplied from the decomposition unit 301, that carries parameter information relating to the encoding of the entire sequence and including MVC extension information, that is, the sequence information of the image signal of a non-base viewpoint or the sequence information of a depth signal. This encoded bit string corresponds to the RBSP parts of SPS#B and SPS#C in the encoded bit string shown in FIG. 11. The supplied encoded bit string of the RBSP part is the "subset_seq_parameter_set_rbsp" shown in FIG. 14. The sequence information decoding unit 304 decodes the encoded bit string in accordance with the syntax structure of "subset_seq_parameter_set_rbsp" shown in FIG. 14 and obtains the parameter information relating to the encoding of the entire sequence of the image signal of the non-base viewpoint, or of the depth signal.

Whether the decoded information is sequence information for the image signal of a non-base viewpoint or for a depth signal can be determined from the value of "profile_idc". When "profile_idc" is "118", which indicates the Multiview High Profile of the AVC/H.264 coding scheme, it is sequence information for the image signal of a non-base viewpoint; when it is "120", which indicates a profile that can also decode multi-view depth signals, it is sequence information for a depth signal. "subset_seq_parameter_set_rbsp" contains MVC extension information, so the sequence information decoded by the sequence information decoding unit 304 also includes the MVC extension information. The sequence information decoding unit 304 supplies the decoded sequence information of the non-base viewpoint image signal or of the depth signal to the decoding management unit 302, which manages it.
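As a small illustration of this check, the sketch below classifies a subset sequence parameter set by its "profile_idc". The values 118 and 120 are the ones named in the text; the function name and the returned labels are hypothetical.

```python
MULTIVIEW_HIGH_PROFILE = 118   # Multiview High Profile (named in the text)
MULTIVIEW_DEPTH_PROFILE = 120  # profile that also decodes depth (value as given in the text)


def classify_subset_sps(profile_idc: int) -> str:
    """Return which kind of sequence information a subset SPS carries."""
    if profile_idc == MULTIVIEW_HIGH_PROFILE:
        return "non_base_view_image"
    if profile_idc == MULTIVIEW_DEPTH_PROFILE:
        return "depth"
    # Other profiles are outside the scope of this sketch.
    raise ValueError(f"unsupported profile_idc for a subset SPS: {profile_idc}")
```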

The picture information decoding unit 305 decodes the encoded bit string, supplied from the decomposition unit 301, that carries parameter information relating to the encoding of an entire picture. This encoded bit string corresponds to the RBSP parts of PPS#A, PPS#B, and PPS#C in the encoded bit string shown in FIG. 11. The supplied encoded bit string of the RBSP part is the "pic_parameter_set_rbsp" shown in FIG. 15. The picture information decoding unit 305 decodes the encoded bit string in accordance with the syntax structure of "pic_parameter_set_rbsp" shown in FIG. 15 and obtains the parameter information relating to the encoding of the entire picture of the base viewpoint image signal, the non-base viewpoint image signal, or the depth signal. It then supplies this decoded picture information to the decoding management unit 302, which manages it.

補足付加情報復号部３０６は、分解部３０１から供給される補足付加情報が符号化された符号化ビット列を復号し、補足付加情報を出力する。供給される符号化ビット列にカメラパラメータ情報が含まれている場合、復号後の仮想視点の画像信号の生成や表示の際に、このカメラパラメータ情報を用いることができる。 The supplementary additional information decoding unit 306 decodes the encoded bit string in which the supplementary additional information supplied from the decomposition unit 301 is encoded, and outputs the supplementary additional information. When camera parameter information is included in the supplied encoded bit string, this camera parameter information can be used when generating or displaying an image signal of a virtual viewpoint after decoding.

The image signal decoding unit 307 decodes the encoded bit string, supplied from the decomposition unit 301, that carries the slice header and the slice's encoding mode, motion vectors, encoded residual signal, and the like for the image signal of the base viewpoint. This encoded bit string corresponds to the RBSP parts of slices #A00 and #A01 in the encoded bit string shown in FIG. 11. The supplied encoded bit string of the RBSP part is the "slice_layer_without_partitioning_rbsp" shown in FIG. 17.

The image signal decoding unit 307 decodes the encoded bit string in accordance with the syntax structure of "slice_layer_without_partitioning_rbsp" shown in FIG. 17. First, it decodes the "slice_header" contained in "slice_layer_without_partitioning_rbsp" and obtains the information relating to the slice. It then supplies this decoded slice information to the decoding management unit 302.

As described above, the "slice_header" contained in "slice_layer_without_partitioning_rbsp" includes "pic_parameter_set_id", the number that identifies the PPS to be referenced. The "pic_parameter_set_id" of slices #A00 and #A01 shown in FIG. 11 is set to the "pic_parameter_set_id" value of PPS#A, which those slices should reference. Likewise, the "seq_parameter_set_id" of PPS#A is set to the "seq_parameter_set_id" value of SPS#A, which PPS#A should reference, so the sequence information that slices #A00 and #A01 should reference can be unambiguously identified as SPS#A. These associations are managed by the decoding management unit 302.
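The two-step reference from slice to PPS to SPS can be sketched with two lookup tables. The class and field names below are hypothetical, and the parameter sets are reduced to their identifiers plus a label; a real decoder would store the full decoded parameter information.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Sps:
    seq_parameter_set_id: int
    label: str  # e.g. "SPS#A"; illustrative only


@dataclass
class Pps:
    pic_parameter_set_id: int
    seq_parameter_set_id: int  # identifies the SPS this PPS references
    label: str


class ParameterSetStore:
    """Holds decoded parameter sets, as the decoding management unit would."""

    def __init__(self) -> None:
        self._sps: Dict[int, Sps] = {}
        self._pps: Dict[int, Pps] = {}

    def add_sps(self, sps: Sps) -> None:
        self._sps[sps.seq_parameter_set_id] = sps

    def add_pps(self, pps: Pps) -> None:
        self._pps[pps.pic_parameter_set_id] = pps

    def resolve(self, slice_pic_parameter_set_id: int) -> Tuple[Pps, Sps]:
        """Follow slice -> PPS -> SPS using the two identifiers."""
        pps = self._pps[slice_pic_parameter_set_id]
        sps = self._sps[pps.seq_parameter_set_id]
        return pps, sps
```

The same lookup serves the base viewpoint slices (PPS#A, SPS#A), the non-base viewpoint slices (PPS#B, SPS#B), and the depth slices (PPS#C, SPS#C), since only the identifier values differ.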

The image signal decoding unit 307 decodes the "slice_data" contained in "slice_layer_without_partitioning_rbsp" and obtains the decoded image signal of the base viewpoint. To do so, it uses the slice information decoded from the "slice_header" of slice #A00 or #A01, together with the sequence information decoded from SPS#A and the picture information decoded from PPS#A, which slices #A00 and #A01 should reference and which are supplied from the decoding management unit 302.

This decoded image signal of the base viewpoint is stored in the decoded image buffer 310. Decoding the encoded bit string of the base viewpoint image signal may involve inter prediction such as motion-compensated prediction; in that case, a decoded image signal of the base viewpoint that has already been decoded and stored in the decoded image buffer 310 is used as the reference image. The NAL unit header of a base viewpoint slice NAL unit contains no viewpoint information, so the viewpoint information in the NAL unit header of the prefix NAL unit encoded immediately before the base viewpoint slice NAL unit is used as the viewpoint information of that slice NAL unit.

Furthermore, the image signal decoding unit 307 decodes the encoded bit string, supplied from the decomposition unit 301, that carries the slice header and the slice's encoding mode, motion vectors or disparity vectors, encoded residual signal, and the like for the image signal of a non-base viewpoint. This encoded bit string corresponds to the RBSP parts of slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31 in the encoded bit string shown in FIG. 11.

The supplied encoded bit string of the RBSP part is the "slice_layer_in_scalable_extension_rbsp" shown in FIG. 18. The image signal decoding unit 307 decodes the encoded bit string in accordance with the syntax structure of "slice_layer_in_scalable_extension_rbsp" shown in FIG. 18. First, it decodes the "slice_header" contained in "slice_layer_in_scalable_extension_rbsp" and obtains the information relating to the slice. It then supplies this decoded slice information to the decoding management unit 302.

As described above, the "slice_header" contained in "slice_layer_in_scalable_extension_rbsp" includes "pic_parameter_set_id", the number that identifies the PPS to be referenced. The "pic_parameter_set_id" of slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31 shown in FIG. 11 is set to the "pic_parameter_set_id" value of PPS#B, which those slices should reference.

Likewise, the "seq_parameter_set_id" of PPS#B is set to the "seq_parameter_set_id" value of SPS#B, which PPS#B should reference, so the sequence information that slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31 should reference can be unambiguously identified as SPS#B. These associations are managed by the decoding management unit 302.

The image signal decoding unit 307 decodes the "slice_data" contained in "slice_layer_in_scalable_extension_rbsp" and obtains the decoded image signal of the non-base viewpoint, using the following:
(a) the slice information decoded from the "slice_header" of slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31;
(b) the viewpoint information, supplied from the decoding management unit 302, that was decoded from the "nal_unit_header_svc_mvc_extension" contained in the NAL unit headers of slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31;
(c) the sequence information decoded from SPS#B, which slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31 should reference; and
(d) the picture information decoded from PPS#B, which slices #B20, #B10, #B40, #B30, #B21, #B11, #B41, and #B31 should reference.

この非基底視点の復号画像信号は、復号画像バッファ３１０に格納される。非基底視点の画像信号の符号化ビット列を復号する際、視点間予測や動き補償予測等のインター予測を用いることもあるが、その際には既に復号され、復号画像バッファ３１０に格納された基底視点、または非基底視点の画像信号を参照画像として利用する。 The decoded image signal of the non-basis viewpoint is stored in the decoded image buffer 310. Inter-prediction such as inter-view prediction and motion compensation prediction may be used when decoding the encoded bit string of the image signal of the non-basis viewpoint, but in this case, the base already decoded and stored in the decoded image buffer 310 is used. An image signal of a viewpoint or a non-basis viewpoint is used as a reference image.

The depth signal decoding unit 309 decodes the encoded bit string, supplied from the decomposition unit 301, that carries the slice header and the slice's encoding mode, motion vectors or disparity vectors, encoded residual signal, and the like for the depth signal. This encoded bit string corresponds to the RBSP parts of slices #C00, #C20, #C40, #C01, #C21, and #C41 in the encoded bit string shown in FIG. 11.

The supplied encoded bit string of the RBSP part is the "slice_layer_in_scalable_extension_rbsp" shown in FIG. 18. The depth signal decoding unit 309 decodes the encoded bit string in accordance with the syntax structure of "slice_layer_in_scalable_extension_rbsp" shown in FIG. 18. First, it decodes the "slice_header" contained in "slice_layer_in_scalable_extension_rbsp" and obtains the information relating to the slice. It then supplies this decoded slice information to the decoding management unit 302.

As described above, the "slice_header" contained in "slice_layer_in_scalable_extension_rbsp" includes "pic_parameter_set_id", the number that identifies the PPS to be referenced. The "pic_parameter_set_id" of slices #C00, #C20, #C40, #C01, #C21, and #C41 shown in FIG. 11 is set to the "pic_parameter_set_id" value of PPS#C, which those slices should reference. Likewise, the "seq_parameter_set_id" of PPS#C is set to the "seq_parameter_set_id" value of SPS#C, which PPS#C should reference, so the sequence information that slices #C00, #C20, #C40, #C01, #C21, and #C41 should reference can be unambiguously identified as SPS#C. These associations are managed by the decoding management unit 302.

The depth signal decoding unit 309 decodes the "slice_data" contained in "slice_layer_in_scalable_extension_rbsp" and obtains the decoded depth signal, using the following:
(a) the slice information decoded from the "slice_header" of slices #C00, #C20, #C40, #C01, #C21, and #C41;
(b) the viewpoint information, supplied from the decoding management unit 302, that was decoded from the "nal_unit_header_svc_mvc_extension" contained in the NAL unit headers of slices #C00, #C20, #C40, #C01, #C21, and #C41;
(c) the sequence information decoded from SPS#C, which slices #C00, #C20, #C40, #C01, #C21, and #C41 should reference; and
(d) the picture information decoded from PPS#C, which slices #C00, #C20, #C40, #C01, #C21, and #C41 should reference.

この復号デプス信号は復号画像バッファ３１０に格納される。デプス信号の符号化ビット列を復号する際には視点間予測や動き補償予測等のインター予測を用いることもあるが、その際には既に復号され、復号画像バッファ３１０に格納された復号デプス信号を参照画像として利用する。なお、デプス信号の復号方法はモノクローム・フォーマットの画像信号の場合と同じ方法を利用することができる。 This decoded depth signal is stored in the decoded image buffer 310. Inter-prediction such as inter-view prediction and motion compensation prediction may be used when decoding the encoded bit string of the depth signal. In this case, the decoded depth signal already decoded and stored in the decoded image buffer 310 is used. Use as a reference image. Note that the decoding method of the depth signal can use the same method as the case of the image signal in the monochrome format.

復号管理部３０２は、復号画像バッファ３１０に格納された、復号画像信号および復号デプス信号の出力タイミングを管理し、復号画像バッファ３１０から同一時刻の、各視点の復号画像信号および復号デプス信号を同期して出力する。この際、各視点の復号画像信号および復号デプス信号に、それらの視点を特定する情報である視点ＩＤを関連付けて出力する。 The decoding management unit 302 manages the output timing of the decoded image signal and the decoded depth signal stored in the decoded image buffer 310, and synchronizes the decoded image signal and the decoded depth signal of each viewpoint at the same time from the decoded image buffer 310. And output. At this time, the viewpoint ID, which is information for identifying the viewpoint, is output in association with the decoded image signal and the decoded depth signal of each viewpoint.
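The synchronized output described above can be illustrated with a buffer keyed by display time. This is only a sketch under simple assumptions: each decoded signal is stored with a time value, a viewpoint ID, and a kind ("image" or "depth"), and everything sharing a time value is released together. The class and method names are hypothetical and do not model reference picture management.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple

Entry = Tuple[int, str, Any]  # (viewpoint ID, "image" or "depth", decoded signal)


class DecodedPictureBuffer:
    """Toy stand-in for the decoded image buffer 310 and its output control."""

    def __init__(self) -> None:
        self._by_time: DefaultDict[int, List[Entry]] = defaultdict(list)

    def store(self, time: int, view_id: int, kind: str, signal: Any) -> None:
        self._by_time[time].append((view_id, kind, signal))

    def output(self, time: int) -> List[Entry]:
        """Release every signal of the given time, each tagged with its viewpoint ID."""
        entries = self._by_time.pop(time, [])
        # Order by viewpoint ID, then kind, so the output order is deterministic.
        return sorted(entries, key=lambda e: (e[0], e[1]))
```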

The decoded image signal of each viewpoint output from the image decoding devices 300 and 300a may be displayed on a display device or the like. When the desired viewpoint is not among those output, an image signal of a virtual viewpoint is generated from the decoded image signals, the decoded depth signals, and supplementary additional information such as camera parameters output from the image decoding devices 300 and 300a, and the resulting virtual viewpoint image signal is displayed on the display device or the like. In the image decoding device 300a according to the modification, the virtual viewpoint image generation unit 330 may generate the virtual viewpoint image signal.

Next, the procedure by which the image decoding devices 300 and 300a according to Embodiment 2, shown in FIGS. 21 and 22, decode a multi-view image will be described.
FIG. 23 is a flowchart showing the multi-view image decoding procedure performed by the image decoding devices 300 and 300a according to Embodiment 2. In the flowchart of FIG. 23, the decomposition unit 301 separates the acquired encoded bit string into NAL units and decodes the NAL unit header (S301). The procedure in step S301 for receiving the encoded bit string via a network and separating it into NAL units is described more specifically below.

FIG. 24 is a flowchart showing the procedure for receiving an encoded bit string via a network and separating it into NAL units. In the flowchart of FIG. 24, a receiving unit (not shown) receives the encoded bit string via the network (S401). Next, a packet decomposition unit (not shown) removes the packet headers that were added to the received encoded bit string in accordance with the standard used, such as the MPEG-2 Systems format, the MP4 file format, or RTP, and thereby obtains the encoded bit string (S402). Next, the decomposition unit 301 separates the encoded bit string into NAL units (S402). Next, the decomposition unit 301 decodes the NAL unit header (S403).
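The separation into NAL units can be sketched as follows. The text does not say how unit boundaries are marked once the packet headers are removed; this sketch assumes the byte stream format of AVC/H.264, in which each NAL unit is preceded by a 0x000001 start code. The function name is hypothetical, and emulation prevention bytes inside the payload are left untouched.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_nal_units(stream: bytes) -> List[bytes]:
    """Split a start-code-delimited byte stream into NAL units (header + RBSP part)."""
    units: List[bytes] = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding or the leading
        # zero of a four-byte start code, not part of the NAL unit.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units
```

When the stream arrives in a container that carries length-prefixed NAL units instead of start codes, the splitting step would read the length fields rather than search for start codes.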

なお、分解部３０１は、"nal_unit_type"の値が“１４” 、“２０”または“２１”の場合、ＮＡＬユニットヘッダに含まれる視点情報である"nal_unit_header_svc_mvc_extension"も復号する。ここで復号される視点情報には視点ＩＤ等が含まれる。なお、"nal_unit_type"の値が“１４”のＮＡＬユニットヘッダに含まれる視点情報は、後に続くＮＡＬユニットの視点情報となり、"nal_unit_type"の値が “２０” または“２１” のＮＡＬユニットヘッダに含まれる視点情報は、当該ＮＡＬユニットの視点情報となる。 Note that when the value of “nal_unit_type” is “14”, “20”, or “21”, the decomposing unit 301 also decodes “nal_unit_header_svc_mvc_extension” that is the viewpoint information included in the NAL unit header. The viewpoint information decoded here includes a viewpoint ID and the like. The viewpoint information included in the NAL unit header whose “nal_unit_type” value is “14” is the viewpoint information of the subsequent NAL unit, and is included in the NAL unit header whose “nal_unit_type” value is “20” or “21”. The viewpoint information to be displayed is the viewpoint information of the NAL unit.

Returning to the flowchart of FIG. 23, the decomposition unit 301 evaluates "nal_unit_type", the identifier that distinguishes the type of NAL unit and is contained in the NAL unit header, that is, the header part of the NAL unit separated in step S301 (S302).
(a) When "nal_unit_type" is "7", that is, when the NAL unit is an encoded bit string carrying parameter information relating to the encoding of the entire sequence of the image signal of the base viewpoint (7 in S302), the process proceeds to step S303.
(b) When "nal_unit_type" is "15", that is, when the NAL unit is an encoded bit string carrying parameter information relating to the encoding of the entire sequence and including MVC extension information, namely the sequence information of a non-base viewpoint image signal or of a depth signal (15 in S302), the process proceeds to step S304.
(c) When "nal_unit_type" is "8", that is, when the NAL unit is an encoded bit string carrying parameter information relating to the encoding of the entire picture of a base viewpoint image signal, a non-base viewpoint image signal, or a depth signal (8 in S302), the process proceeds to step S305.
(d) When "nal_unit_type" is "6", that is, when the NAL unit is an encoded bit string carrying supplementary additional information (6 in S302), the process proceeds to step S306.
(e) When "nal_unit_type" is "14", that is, when the NAL unit is a prefix NAL unit (14 in S302), the process proceeds to step S307.
(f) When "nal_unit_type" is "1" or "5", that is, when the NAL unit is an encoded bit string carrying a slice of the image signal of the base viewpoint (1 or 5 in S302), the process proceeds to step S308.
(g) When "nal_unit_type" is "20", that is, when the NAL unit is an encoded bit string carrying a slice of the image signal of a non-base viewpoint (20 in S302), the process proceeds to step S309.
(h) When "nal_unit_type" is "21", that is, when the NAL unit is an encoded bit string carrying a slice of a depth signal (21 in S302), the process proceeds to step S310.
(i) "nal_unit_type" may also take other values (other in S302), but their description is omitted in this specification.
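The branch in step S302 amounts to a lookup from "nal_unit_type" to a processing step. The table below restates cases (a) to (h) as data; the type values and step labels are taken from the text, while the names `STEP_FOR_TYPE` and `route` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# nal_unit_type -> (flowchart step, unit that handles it), per cases (a) to (h).
STEP_FOR_TYPE: Dict[int, Tuple[str, str]] = {
    7: ("S303", "sequence information decoding unit 303"),
    15: ("S304", "sequence information decoding unit 304"),
    8: ("S305", "picture information decoding unit 305"),
    6: ("S306", "supplementary additional information decoding unit 306"),
    14: ("S307", "decomposition unit 301"),
    1: ("S308", "image signal decoding unit 307"),
    5: ("S308", "image signal decoding unit 307"),
    20: ("S309", "image signal decoding unit 307"),
    21: ("S310", "depth signal decoding unit 309"),
}


def route(nal_unit_type: int) -> Optional[Tuple[str, str]]:
    """Return the step and handling unit, or None for types not covered here."""
    return STEP_FOR_TYPE.get(nal_unit_type)
```

Returning None for unlisted values corresponds to case (i), whose handling the specification leaves undescribed.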

The sequence information decoding unit 303 for the image signal of the base viewpoint decodes the encoded bit string carrying parameter information relating to the encoding of the entire sequence of the image signal of the base viewpoint, and obtains that parameter information (S303).

The sequence information decoding unit 304 for sequence information including MVC extension information decodes the encoded bit string carrying parameter information relating to the encoding of the entire sequence and including MVC extension information, that is, the sequence information of a non-base viewpoint image signal or of a depth signal, and obtains the parameter information relating to the encoding of the entire sequence of the non-base viewpoint image signal or the depth signal (S304).

The picture information decoding unit 305 decodes the encoded bit string carrying parameter information relating to the encoding of an entire picture, and obtains the parameter information relating to the encoding of the entire picture of the base viewpoint image signal, the non-base viewpoint image signal, or the depth signal (S305).

補足付加情報復号部３０６は、補足付加情報が符号化された符号化ビット列を復号し、補足付加情報を得る（Ｓ３０６）。 The supplementary additional information decoding unit 306 decodes the encoded bit string in which the supplementary additional information is encoded, and obtains supplementary additional information (S306).

分解部３０１は、プリフィックスＮＡＬユニットのＲＢＳＰを復号する（Ｓ３０７）。ただし、ＭＶＣ方式ではプリフィックスＮＡＬユニットのＲＢＳＰは空であるため、事実上復号処理は行われない。 The decomposition unit 301 decodes the RBSP of the prefix NAL unit (S307). However, in the MVC method, since the RBSP of the prefix NAL unit is empty, the decoding process is practically not performed.

画像信号復号部３０７は、基底視点の画像信号のスライスヘッダ、並びに基底視点の画像信号のスライスの符号化モード、動きベクトル、符号化残差信号等が符号化された符号化ビット列を復号し、基底視点のスライス単位の画像信号を得る（Ｓ３０８）。 The image signal decoding unit 307 decodes the encoded bit string in which the slice header of the base viewpoint image signal, the encoding mode of the slice of the base viewpoint image signal, the motion vector, the encoded residual signal, and the like are encoded, An image signal for each slice of the base viewpoint is obtained (S308).

画像信号復号部３０７は、非基底視点の画像信号のスライスヘッダ、並びに非基底視点の画像信号のスライスの符号化モード、動きベクトル、符号化残差信号等が符号化された符号化ビット列を復号し、非基底視点のスライス単位の画像信号を得る（Ｓ３０９）。 The image signal decoding unit 307 decodes a coded bit string obtained by coding a slice header of a non-basis viewpoint image signal, a coding mode of a slice of the non-basis viewpoint image signal, a motion vector, a coded residual signal, and the like. Then, an image signal for each slice of the non-base viewpoint is obtained (S309).

The depth signal decoding unit 309 decodes the encoded bit string carrying the slice header of the depth signal and the encoding mode, motion vectors, encoded residual signal, and the like of the depth signal slice, and obtains the depth signal in units of slices (S310).

The decoding management unit 302 determines whether it is time to output the decoded image signals and depth signals (S311). If it is not time to output them (N in S311), the process proceeds to step S313. If it is time to output them (Y in S311), the decoded image signals and depth signals are output (S312), and the process proceeds to step S313. At this time, the decoded image signal and decoded depth signal of each viewpoint are output in association with the viewpoint ID, which is the information identifying that viewpoint.

It is then determined whether the decoding process has been completed for all NAL units (S313). If the decoding process has been completed for all NAL units (Y in S313), this decoding process ends; if it has not (N in S313), the processing from step S301 to step S313 is repeated.
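The per-NAL-unit loop in steps S305 to S313 can be pictured as a dispatch on `nal_unit_type`. The following is a minimal sketch, not the decoder described in the specification: the type values (1, 5, 6, 8, 14, 20) are those defined by AVC/H.264 and its MVC extension, while the `NalUnit` class, the `is_depth` flag used to tell depth slices from non-base image slices, and the assignment of step labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NalUnit:
    nal_unit_type: int
    is_depth: bool = False  # assumption: stands in for however the stream marks depth slices


def decoding_step(nal: NalUnit) -> str:
    """Return the flowchart step assumed to handle this NAL unit."""
    if nal.nal_unit_type == 8:
        return "S305"  # PPS: picture information decoding unit 305
    if nal.nal_unit_type == 6:
        return "S306"  # SEI: supplementary additional information decoding unit 306
    if nal.nal_unit_type == 14:
        return "S307"  # prefix NAL unit (RBSP is empty in MVC)
    if nal.nal_unit_type in (1, 5):
        return "S308"  # slice of the base-viewpoint image signal
    if nal.nal_unit_type == 20:
        # slice extension: depth slice or non-base-viewpoint image slice
        return "S310" if nal.is_depth else "S309"
    return "other"  # handled by steps outside this excerpt


def decode_stream(nal_units):
    """Process NAL units one by one until none remain (the S313 loop)."""
    return [decoding_step(nal) for nal in nal_units]
```

For instance, a stream consisting of a PPS, an SEI message, a prefix NAL unit, a base-viewpoint slice, a non-base-viewpoint slice, and a depth slice would visit steps S305 through S310 in that order.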

Note that the image decoding devices 300 and 300a according to Embodiment 2 can also decode an encoded bit string in which a single-viewpoint image signal is encoded by the existing AVC/H.264 scheme, and obtain the single-viewpoint image signal. Furthermore, the image decoding devices 300 and 300a according to Embodiment 2 can also decode an encoded bit string in which multi-view image signals not including a depth signal are encoded by the existing MVC scheme, and obtain the multi-view image signals.

The above description covered the case in which, as shown in FIG. 10, the multi-view image and the depth map differ in number of viewpoints and do not correspond one-to-one. Of course, encoding or decoding is also possible when the multi-view image signals and the multi-view depth signals are equal in number and correspond one-to-one.

As described above, according to Embodiment 2, in decoding multi-view images, it is possible to decode an encoded bit string in which a multi-view depth signal including depth signals from a plurality of viewpoints is encoded as auxiliary information together with a multi-view image signal including image signals from a plurality of viewpoints, and thereby obtain the multi-view image signal and the multi-view depth signal. In doing so, the encoded bit string can be received or read out efficiently.

In addition, the image decoding devices 300 and 300a according to Embodiment 2 can decode an encoded bit string in which only a conventional single-viewpoint image signal is encoded, and obtain the single-viewpoint image signal. Furthermore, they can also decode an encoded bit string in which only a multi-view image signal including image signals of a plurality of viewpoints is encoded, without a multi-view depth signal as auxiliary information, and obtain the multi-view image signal, so upward compatibility is maintained.

Moreover, in addition to decoding an encoded bit string in which the multi-view image signals and the multi-view depth signals are equal in number and correspond one-to-one, it is also possible to decode an encoded bit string in which the multi-view image signals and the depth signals differ in number of viewpoints and do not correspond one-to-one.

(Embodiment 3)
Next, an image encoding device according to Embodiment 3 of the present invention will be described. The image encoding device according to Embodiment 3 differs from the image encoding device according to Embodiment 1 in that it determines, according to the content or the scene, the viewpoints whose image signals and depth signals need to be encoded, and encodes only the image signals and depth signals of the viewpoints found necessary by that determination. In all other respects it is the same as the image encoding device according to Embodiment 1, so the description is omitted.

FIG. 25 is a block diagram showing the configuration of an image encoding device 400 according to Embodiment 3. In FIG. 25, constituent blocks identical to those in FIG. 2 are given the same reference numerals. The image encoding device 400 according to Embodiment 3 has the configuration of the image encoding device 100 according to Embodiment 1 with a determination unit 120 and switching units 121 and 122 added.

The determination unit 120 determines whether or not depth information from a given viewpoint is to be encoded. In this case, the unitization unit 109 generates an encoded stream including the image encoded data generated by the image signal encoding unit 107 and the depth information encoded data obtained when the depth signal encoding unit 108 encodes the depth information that the determination unit 120 has determined to be an encoding target.

The determination unit 120 also determines whether or not an image from a given viewpoint is to be encoded. In this case, the unitization unit 109 generates an encoded stream including the image encoded data obtained when the image signal encoding unit 107 encodes the image that the determination unit 120 has determined to be an encoding target, and the depth information encoded data generated by the depth signal encoding unit 108. The determination unit 120 can also make both determinations. In that case, the unitization unit 109 generates an encoded stream including the image encoded data obtained when the image signal encoding unit 107 encodes the image determined to be an encoding target, and the depth information encoded data obtained when the depth signal encoding unit 108 encodes the depth information determined to be an encoding target.

The processing of the determination unit 120 is described more specifically below. The determination unit 120 is supplied with encoding management information, camera parameter information, the image signal of each viewpoint, and the depth signal of each viewpoint. Based on these, the determination unit 120 determines the viewpoints of the image signals and the viewpoints of the depth signals to be encoded. The determination unit 120 creates new encoding management information from which the information on the image-signal viewpoints and depth-signal viewpoints determined not to be encoded has been omitted, and supplies it to the encoding management unit 101. Note that the encoding management information supplied to the encoding management unit 101 in FIG. 25 is the same kind of information as that supplied to the encoding management unit 101 in FIG. 1.

Specific examples of the determination method used by the determination unit 120 are described below.
As determination example 1, the determination unit 120 determines that the depth information under evaluation is not to be encoded when the distance between the viewpoint on which that depth information is based and the viewpoint on which another piece of depth information already selected for encoding is based is shorter than a predetermined first reference distance, and determines that it is to be encoded when the distance is longer than the first reference distance. The designer can set the first reference distance freely, based on findings obtained through experiments and simulations.

From the extrinsic camera parameter information included in the supplied camera parameter information, the determination unit 120 can identify the positions of the viewpoint of each image signal and the viewpoint of each depth signal. The extrinsic parameters include arrangement information for the camera at each viewpoint, and this arrangement information includes a position in three-dimensional space (x, y, z coordinates) or rotation angles about three axes (x, y, z axes; roll, pitch, yaw). When the spacing between the viewpoints of the plurality of depth signals supplied for the same time is sufficiently dense, the determination unit 120 excludes one of those depth signals from the encoding targets. In this way, when the determination unit 120 judges that the decoding side can readily generate the image signal of a desired viewpoint even if encoding of the depth signals from some viewpoints is omitted, it omits the depth signals of the viewpoints not needed for generating that image signal and adopts the depth signals of the viewpoints needed for it as encoding targets. Determination example 1 is based on the findings described with reference to FIGS. 6 and 7.
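Determination example 1 can be illustrated with a short sketch. It assumes that each depth viewpoint is represented only by its camera position taken from the extrinsic parameters, that candidates are examined in the order given, and that a distance exactly equal to the first reference distance counts as "encode"; none of these details is fixed by the text, and the function name `select_depth_viewpoints` is hypothetical.

```python
import math


def select_depth_viewpoints(positions, first_reference_distance):
    """Greedily pick the depth viewpoints to encode.

    positions: dict mapping viewpoint ID -> (x, y, z) camera position.
    A candidate is dropped when it lies closer than the first reference
    distance to any viewpoint already selected for encoding.
    """
    selected = []
    for view_id, pos in positions.items():
        too_close = any(
            math.dist(pos, positions[other]) < first_reference_distance
            for other in selected
        )
        if not too_close:
            selected.append(view_id)
    return selected


# Five cameras spaced one unit apart on a line, reference distance 1.5.
cameras = {v: (float(v), 0.0, 0.0) for v in range(5)}
print(select_depth_viewpoints(cameras, 1.5))  # [0, 2, 4]
```

With these assumed positions the sketch keeps viewpoints 0, 2, and 4, which happens to match the arrangement of depth viewpoints shown in FIG. 10.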

As determination example 2, when the distance between a first subject and a second subject in the same image is shorter than a predetermined second reference distance, the determination unit 120 omits some of the plurality of depth signals. The designer can also set the second reference distance freely, based on findings obtained through experiments and simulations. In doing so, the determination unit 120 may reduce the number of pieces of depth information to be determined as encoding targets as the distance between the first subject and the second subject becomes shorter.

The determination unit 120 can calculate the difference in depth between overlapping subjects from the supplied depth signals. As this difference in depth, it can extract the edges of a depth signal (for example, points where the gray level changes sharply) and use the difference between the pixel values on either side of the edge boundary. When the determination unit 120 judges that the difference in depth between overlapping subjects is sufficiently small and that the decoding side can readily generate the image signal of a desired viewpoint even if encoding of some viewpoints is omitted, it omits the depth signals of the viewpoints not needed for generating that image signal and adopts the depth signals of the viewpoints needed for it as encoding targets. Determination example 2 is based on the findings described with reference to FIGS. 8 and 9.
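The edge-based measure in determination example 2 might be sketched as follows. The depth map is assumed to be a two-dimensional list of pixel values, an "edge" is assumed to be any pair of horizontally or vertically adjacent pixels whose values differ by at least `edge_threshold`, and the largest such step is compared against the second reference distance. Both function names and the choice of the maximum step as the summary statistic are illustrative assumptions.

```python
def max_edge_step(depth_map, edge_threshold):
    """Largest pixel-value difference across any detected edge (0 if no edge)."""
    rows, cols = len(depth_map), len(depth_map[0])
    largest = 0
    for r in range(rows):
        for c in range(cols):
            # Compare with the right-hand and lower neighbours only,
            # so each adjacent pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    step = abs(depth_map[r][c] - depth_map[rr][cc])
                    if step >= edge_threshold and step > largest:
                        largest = step
    return largest


def may_omit_some_depth(depth_map, edge_threshold, second_reference_distance):
    """True when overlapping subjects are close enough in depth to drop some depth signals."""
    return max_edge_step(depth_map, edge_threshold) < second_reference_distance
```

A depth map whose subjects differ by only a few levels would return `True` here, while one with a foreground object far in front of the background would return `False`.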

In determination examples 1 and 2 above, for applications that presuppose generation of the image signal of a desired viewpoint on the decoding side, viewpoints of the image signals can also be omitted in the same way as viewpoints of the depth signals.

As determination example 3, when the image under evaluation is predictively generated from another image and depth information without using the image itself, and the quality of the generated image is higher than a predetermined reference value, the determination unit 120 determines that the image under evaluation is not to be encoded. The designer can also set this reference value freely, based on findings obtained through experiments and simulations.

The determination unit 120 omits the image signals of some viewpoints from the supplied image signals, and predictively generates the image signal of each omitted viewpoint from the image signals and depth signals of the remaining viewpoints. The determination unit 120 evaluates the amount of distortion between the original image signal of the omitted viewpoint and the predictively generated image signal of that viewpoint, pixel by pixel, using an index such as the squared error. The determination unit 120 judges that the image signal of a viewpoint whose amount of distortion is less than a predetermined reference value contributes little to the generation of virtual viewpoints, and omits the image signal of that viewpoint. Although the processing for omitting image signals has been described here, depth signals can also be omitted by the same processing.
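The distortion test in determination example 3 can be sketched as below. The predictively generated image is taken as an input, since the view synthesis itself is outside the scope of this sketch. Averaging the per-pixel squared errors into a single mean squared error is one assumed way of aggregating them, and the function names are hypothetical.

```python
def mean_squared_error(original, predicted):
    """Mean of the per-pixel squared errors between two equally sized images."""
    total = 0
    count = 0
    for row_o, row_p in zip(original, predicted):
        for o, p in zip(row_o, row_p):
            total += (o - p) ** 2
            count += 1
    return total / count


def omit_viewpoint(original, predicted, reference_value):
    """True when the predicted image is close enough that the viewpoint need not be encoded."""
    return mean_squared_error(original, predicted) < reference_value


original = [[100, 100], [100, 100]]
predicted = [[101, 99], [100, 102]]
print(mean_squared_error(original, predicted))  # 1.5
```

With a reference value of 2.0 this viewpoint would be omitted, and with a reference value of 1.0 it would be kept as an encoding target.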

The switching unit 121 supplies only the image signals of the viewpoints to be encoded to the image signal encoding unit 107, according to the determination result of the determination unit 120. The image signal supplied to the image signal encoding unit 107 is the same kind of signal as the image signal supplied to the image signal encoding unit 107 in FIG. 1. Similarly, the switching unit 122 supplies only the depth signals of the viewpoints to be encoded to the depth signal encoding unit 108, according to the determination result of the determination unit 120. The depth signal supplied to the depth signal encoding unit 108 is the same kind of signal as the depth signal supplied to the depth signal encoding unit 108 in FIG. 1.
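The role of the switching units 121 and 122 amounts to filtering the per-viewpoint signals by the determination result. A minimal sketch follows, assuming signals are held in dictionaries keyed by viewpoint ID; the function name `switch_signals` and the string placeholders for signals are illustrative only.

```python
def switch_signals(signals, adopted_views):
    """Pass through only the signals whose viewpoint was adopted for encoding."""
    return {view: sig for view, sig in signals.items() if view in adopted_views}


# Placeholder signals for five viewpoints.
image_signals = {v: f"image{v}" for v in range(5)}
depth_signals = {v: f"depth{v}" for v in range(5)}

# Assumed determination result: all images, but only three depth viewpoints.
to_image_encoder = switch_signals(image_signals, {0, 1, 2, 3, 4})
to_depth_encoder = switch_signals(depth_signals, {0, 2, 4})
print(sorted(to_depth_encoder))  # [0, 2, 4]
```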

Next, the multi-view image encoding procedure performed by the image encoding device 400 according to Embodiment 3 will be described.
FIG. 26 is a flowchart showing the multi-view image encoding procedure performed by the image encoding device 400 according to Embodiment 3. As described above, the image encoding device 400 according to Embodiment 3 determines, according to the content or the scene, the viewpoints whose image signals and depth signals need to be encoded. The image encoding procedure according to Embodiment 3 shown in FIG. 26 differs from the image encoding procedure according to Embodiment 1 shown in FIG. 19 in that a new sequence is started whenever the viewpoints of the image signals and depth signals that need to be encoded change. In FIG. 26, steps identical to those in FIG. 19 are given the same reference numerals, and only the differences from FIG. 19 are described.

In the flowchart of FIG. 26, the determination unit 120 evaluates the viewpoints of the image signals and the viewpoints of the depth signals to be encoded, and determines whether or not to adopt the signal of each viewpoint (S501). Only the adopted signals proceed to the processing from step S502 onward.

Subsequently, the encoding management unit 101 determines whether or not the viewpoints of the image signals and depth signals adopted in step S501 have changed (S502). If they have changed (Y in S502), or on the very first pass, the process proceeds to step S501; if they have not changed (N in S502), the process proceeds to step S113.

From step S101 onward, the image signals and depth signals are encoded in the same way as in the image encoding procedure according to Embodiment 1 shown in FIG. 19. However, if it is determined in step S124 that encoding of all image signals and depth signals has not been completed (N in S124), the encoding processing from step S501 to step S124 is repeated.
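The control flow that distinguishes FIG. 26 from FIG. 19, namely starting a new sequence whenever the set of adopted viewpoints changes, can be outlined as follows. This is only a sketch of that control flow under assumptions: `judge` is a hypothetical stand-in for the determination of step S501, the actual encoding steps are reduced to log entries, and what "starting a sequence" emits is not modelled.

```python
def encode_with_viewpoint_selection(time_units, judge):
    """Return a log of actions for each time unit.

    judge(unit) returns the adopted viewpoints, for example a tuple
    (image_views, depth_views); it plays the role of step S501.
    """
    log = []
    previous = None
    for index, unit in enumerate(time_units):
        adopted = judge(unit)                        # S501: choose viewpoints
        if previous is None or adopted != previous:  # S502: first pass or changed
            log.append(("start_sequence", index, adopted))
        log.append(("encode", index, adopted))       # remaining encoding steps
        previous = adopted
    return log


# Three time units; the adopted viewpoints change at the third one.
units = [((0, 1), (0,)), ((0, 1), (0,)), ((0,), (0,))]
for action in encode_with_viewpoint_selection(units, judge=lambda u: u):
    print(action[0], action[1])
```

In this run a sequence is started at time units 0 and 2 only, because the adopted viewpoints are unchanged at time unit 1.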

The image encoding processing and image decoding processing according to Embodiments 1 to 3 can of course be realized by a transmitting device, a storage device, and a receiving device equipped with hardware capable of executing that processing, and can also be realized by firmware stored in a ROM, flash memory, or the like, or by software running on a computer or the like. The firmware program or software program can be provided recorded on a recording medium readable by a computer or the like, provided from a server over a wired or wireless network, or provided as a data broadcast of terrestrial or satellite digital broadcasting.

The present invention has been described above on the basis of several embodiments. Those skilled in the art will understand that these embodiments are examples, that various modifications are possible in the combinations of their constituent elements and processes, and that such modifications also fall within the scope of the present invention.

FIG. 1 is a block diagram showing the configuration of the image encoding device according to Embodiment 1.
FIG. 2 is a diagram showing an example of the reference dependencies between images when a multi-view image consisting of five viewpoints is encoded by the MVC scheme.
FIG. 3 is a block diagram showing the configuration of an image encoding device according to a modification of Embodiment 1.
FIG. 4 is a diagram showing an example in which a scene containing a first object and a second object is captured from a second viewpoint and a third viewpoint, and an image of a first viewpoint, which is a virtual viewpoint, is generated.
FIG. 5 is a diagram showing the captured images, the corresponding depth maps, and the generated image in the example of FIG. 4.
FIG. 6 is a diagram showing an example in which a scene containing a third object and a fourth object is captured from a fifth viewpoint and a sixth viewpoint, and an image of a fourth viewpoint, which is a virtual viewpoint, is generated.
FIG. 7 is a diagram showing the captured images, the corresponding depth maps, and the generated image in the example of FIG. 6.
FIG. 8 is a diagram showing an example in which two scenes, each containing a seventh object and either a fifth object or a sixth object, are captured from an eighth viewpoint, and an image of a seventh viewpoint, which is a virtual viewpoint, is generated.
FIG. 9 is a diagram showing the captured images, the corresponding depth maps, and the generated images in the example of FIG. 8.
FIG. 10 is a diagram showing a multi-view image including images from five viewpoints (viewpoints 0, 1, 2, 3, and 4) and a multi-view depth map including depth DS from three viewpoints (viewpoints 0, 2, and 4), which are to be encoded.
FIG. 11 is a diagram showing an example in which an encoded stream generated by the image encoding device according to Embodiment 1 is expressed in units of NAL units.
FIG. 12 is a diagram showing the types of NAL units defined in the AVC/H.264 coding scheme.
FIG. 13 is a diagram showing the structure of an SPS NAL unit.
FIG. 14 is a diagram showing the structure of a subset SPS NAL unit.
FIG. 15 is a diagram showing the structure of a PPS NAL unit.
FIG. 16 is a diagram showing the structure of a prefix NAL unit.
FIG. 17 is a diagram showing the structure of a slice NAL unit whose "nal_unit_type" value is "1" or "5".
FIG. 18 is a diagram showing the structure of a slice NAL unit whose "nal_unit_type" value is "20".
FIG. 19 is a flowchart showing the multi-view image encoding procedure performed by the image encoding device according to Embodiment 1.
FIG. 20 is a flowchart showing the transmission procedure used when the encoded bit string of a multi-view image generated by the image encoding device according to Embodiment 1 is transmitted over a network.
FIG. 21 is a block diagram showing the configuration of the image decoding device according to Embodiment 2 of the present invention.
FIG. 22 is a block diagram showing the configuration of an image decoding device according to a modification of Embodiment 2.
FIG. 23 is a flowchart showing the multi-view image decoding procedure performed by the image decoding device according to Embodiment 2.
FIG. 24 is a flowchart showing the procedure for receiving an encoded bit string over a network and separating it into NAL units.
FIG. 25 is a block diagram showing the configuration of the image encoding device according to Embodiment 3.
FIG. 26 is a flowchart showing the multi-view image encoding procedure performed by the image encoding device according to Embodiment 3.

Explanation of symbols

100 image encoding device, 101 encoding management unit, 102 sequence information encoding unit for image signals, 103 sequence information encoding unit for depth signals, 104 picture information encoding unit for image signals, 105 picture information encoding unit for depth signals, 106 camera parameter information encoding unit, 107 image signal encoding unit, 108 depth signal encoding unit, 109 unitization unit, 110 parameter information encoding unit, 111 depth signal generation unit, 120 determination unit, 121, 122 switching units, 300, 301 decomposition unit, 302 decoding management unit, 303 sequence information decoding unit for the base-viewpoint image signal, 304 sequence information decoding unit for sequence information including MVC extension information, 305 picture information decoding unit, 306 supplementary additional information decoding unit, 307 image signal decoding unit, 309 depth signal decoding unit, 310 decoded image buffer, 320 parameter information decoding unit, 330 virtual viewpoint image generation unit.

Claims

An image encoding device comprising:
a first encoding unit that encodes a plurality of images from a plurality of different viewpoints to generate image encoded data;
a second encoding unit that encodes depth information indicating the depth of a specific space from at least one viewpoint to generate depth information encoded data;
a third encoding unit that encodes parameter information, including viewpoint information for specifying the plurality of viewpoints on which the plurality of images and the depth information are based, to generate parameter information encoded data; and
a stream generation unit that generates an encoded stream including the image encoded data, the depth information encoded data, and the parameter information encoded data generated by the first encoding unit, the second encoding unit, and the third encoding unit, respectively.

The image encoding device according to claim 1, wherein
the first encoding unit generates first image encoded data by encoding, among the plurality of images, an image from a viewpoint to be used as a reference, and generates second image encoded data by encoding the other images,
the third encoding unit encodes first parameter information of the image from the viewpoint to be used as the reference, second parameter information of the other images, and third parameter information of the depth information, to generate first parameter information encoded data, second parameter information encoded data, and third parameter information encoded data, respectively, and
the stream generation unit generates an encoded stream including the first image encoded data, the second image encoded data, the depth information encoded data, the first parameter information encoded data, the second parameter information encoded data, and the third parameter information encoded data generated by the first encoding unit, the second encoding unit, and the third encoding unit.

The image coding apparatus according to claim 2, wherein the third parameter information is described in a syntax structure corresponding to the syntax structure of the second parameter information.

In the second parameter information and the third parameter information, viewpoint identification information is described,
The common identification information is given to the viewpoints when the position of the viewpoint that is the basis of the image and the position of the viewpoint that is the basis of the depth information match. The image encoding device described in 1.

An image encoding method comprising:
a first encoding step of encoding a plurality of images from a plurality of different viewpoints to generate image encoded data;
a second encoding step of encoding depth information indicating the depth of a specific space from at least one viewpoint to generate depth information encoded data;
a third encoding step of encoding parameter information, including viewpoint information for specifying the plurality of viewpoints on which the plurality of images and the depth information are based, to generate parameter information encoded data; and
a stream generation step of generating an encoded stream including the image encoded data, the depth information encoded data, and the parameter information encoded data generated in the first encoding step, the second encoding step, and the third encoding step, respectively.

The image encoding method according to claim 5, wherein
the first encoding step generates first image encoded data by encoding, among the plurality of images, an image from a viewpoint to be used as a reference, and generates second image encoded data by encoding the other images,
the third encoding step encodes first parameter information of the image from the viewpoint to be used as the reference, second parameter information of the other images, and third parameter information of the depth information, to generate first parameter information encoded data, second parameter information encoded data, and third parameter information encoded data, respectively, and
the stream generation step generates an encoded stream including the first image encoded data, the second image encoded data, the depth information encoded data, the first parameter information encoded data, the second parameter information encoded data, and the third parameter information encoded data generated in the first encoding step, the second encoding step, and the third encoding step.

The image coding method according to claim 6, wherein the third parameter information is described in a syntax structure corresponding to the syntax structure of the second parameter information.

An image encoding program that causes a computer to execute:
a first encoding process of encoding a plurality of images from a plurality of different viewpoints to generate image encoded data;
a second encoding process of encoding depth information indicating the depth of a specific space from at least one viewpoint to generate depth information encoded data;
a third encoding process of encoding parameter information, including viewpoint information for specifying the plurality of viewpoints on which the plurality of images and the depth information are based, to generate parameter information encoded data; and
a stream generation process of generating an encoded stream including the image encoded data, the depth information encoded data, and the parameter information encoded data generated by the first encoding process, the second encoding process, and the third encoding process, respectively.

The image encoding program according to claim 8, wherein
the first encoding process generates first image encoded data by encoding, among the plurality of images, an image from a viewpoint to be used as a reference, and generates second image encoded data by encoding the other images,
the third encoding process encodes first parameter information of the image from the viewpoint to be used as the reference, second parameter information of the other images, and third parameter information of the depth information, to generate first parameter information encoded data, second parameter information encoded data, and third parameter information encoded data, respectively, and
the stream generation process generates an encoded stream including the first image encoded data, the second image encoded data, the depth information encoded data, the first parameter information encoded data, the second parameter information encoded data, and the third parameter information encoded data generated by the first encoding process, the second encoding process, and the third encoding process.

The image coding program according to claim 9, wherein the third parameter information is described in a syntax structure corresponding to the syntax structure of the second parameter information.