JP2009004941A

JP2009004941A - Multi-viewpoint image receiving method, multi-viewpoint image receiving device, and multi-viewpoint image receiving program

Info

Publication number: JP2009004941A
Application number: JP2007162123A
Authority: JP
Inventors: Hiroya Nakamura; 博哉中村; Motoharu Ueda; 基晴上田
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2007-06-20
Filing date: 2007-06-20
Publication date: 2009-01-08

Abstract

PROBLEM TO BE SOLVED: When an encoded multi-viewpoint image signal is received and decoded, a buffer is required to sort the decoded signals by viewpoint, and this sorting introduces delay.
SOLUTION: A separation unit 21 receives encoded data obtained by encoding a multi-viewpoint image signal via a network or the like. It separates the data into two parts. The first encoded data contains the supplemental enhancement information (SEI) and the number V of viewpoints of the multi-viewpoint image signal. The second encoded data contains a decoded image output order number o and the multi-viewpoint image signal. These encoded data are decoded individually. A decoded image management unit 211 identifies the viewpoint of each decoded image signal stored in a decoded image buffer 210 by a number v. It also manages the output order of the decoded images at each viewpoint by a number d. Decoded image signals of the respective viewpoints that share the same value of d are then output in synchronization with one another, either simultaneously or successively.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a multi-view image receiving method, a multi-view image receiving apparatus, and a multi-view image receiving program. More particularly, it relates to a method, apparatus, and program that receive multi-view image encoded data via a network or the like and decode it. The encoded data is obtained by encoding multi-view image signals captured from different viewpoints.

<Video coding schemes>
Devices and systems compliant with coding schemes such as MPEG (Moving Picture Experts Group) are now widespread. These schemes handle temporally continuous moving images as digital signal information. For efficient broadcasting, transmission, or storage, they compress the images in two ways. Motion-compensated prediction exploits redundancy in the temporal direction. Orthogonal transforms such as the discrete cosine transform exploit redundancy in the spatial direction.

The MPEG-2 Video (ISO/IEC 13818-2) coding scheme, established in 1995, is defined as a general-purpose video compression coding scheme. It supports interlaced as well as progressive scanned images, and high-definition images (HDTV) as well as standard-definition images (SDTV). It is widely used in storage media, such as optical DVDs (Digital Versatile Discs) and magnetic tape for D-VHS (registered trademark) digital VTRs, and in applications such as digital broadcasting.

In addition, the MPEG-4 Visual (ISO/IEC 14496-2) coding scheme was standardized to achieve higher coding efficiency in applications such as network transmission and portable terminals. It was established as an international standard in 1998.

Furthermore, in 2003 a coding scheme called MPEG-4 AVC/H.264 was established as an international standard. It was developed jointly by the Joint Technical Committee (ISO/IEC) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), together with the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). Its standard number is 14496-10 in ISO/IEC and H.264 in ITU-T; it is hereinafter referred to as the AVC/H.264 coding scheme. The AVC/H.264 coding scheme achieves higher coding efficiency than conventional schemes such as MPEG-2 Video and MPEG-4 Visual.

In coding schemes such as MPEG-2 Video and MPEG-4 Visual, a P picture (forward predictive coded picture) is motion-compensated only from the immediately preceding I picture (intra-coded picture) or P picture in display order. In the AVC/H.264 coding scheme, by contrast, P pictures and B pictures (bi-predictive coded pictures) can use a plurality of pictures as reference pictures. The optimum reference can be selected for each block when performing motion compensation. Besides pictures that precede in display order, already-coded pictures that follow in display order can also be referenced.

A B picture in MPEG-2 Video or MPEG-4 Visual refers to one of three things: one reference picture preceding it in display order, one reference picture following it, or both of these at once. When both are used, the average of the two pictures serves as the predicted picture. The difference data between the target picture and the predicted picture is then encoded.
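The averaging used when a B picture refers to both reference pictures can be illustrated with a small sketch. It forms the predicted block as a rounded average of two co-located reference blocks, computes the difference data that would be encoded, and reconstructs the block from the two. The following are assumptions for illustration only:
- The rounding rule (a + b + 1) >> 1.
- Integer samples held in plain lists.
- The helper names.

Motion displacement of the reference blocks is omitted.

```python
from typing import List

Block = List[List[int]]  # rows of integer samples


def bipred_average(ref0: Block, ref1: Block) -> Block:
    """Rounded average of two co-located reference blocks (hypothetical helper)."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(ref0, ref1)]


def residual(target: Block, pred: Block) -> Block:
    """Difference data between the target block and its prediction."""
    return [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, pred)]


def reconstruct(pred: Block, res: Block) -> Block:
    """Decoder side: prediction plus decoded difference data."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, res)]
```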

In the AVC/H.264 coding scheme, on the other hand, B pictures are no longer restricted to one forward and one backward reference in display order. Any reference picture can be used for prediction, whether it lies forward or backward. Furthermore, a B picture can itself serve as a reference picture.

In the AVC/H.264 coding scheme, decoded pictures must be rearranged from coding order into output order when they are output. Output order is the order desired for display on the output destination, such as a display device. This requires storing both reference and non-reference pictures in memory. A mechanism has therefore been introduced that manages reference and non-reference pictures uniformly in a single memory called the decoded picture buffer.

In the coded sequence (bitstream), each picture carries encoded information indicating its output order, called the picture order count, which numbers the pictures in decoded output order. Pictures that have been decoded and stored in the decoded picture buffer are output in ascending order of picture order count, starting with the smallest. The picture order count is reset at an IDR picture. An IDR picture is one after which subsequent pictures can be decoded correctly without using information from any picture that precedes it in coding order.
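The output rule just described can be sketched as a simplified model of the decoded picture buffer. Pictures arrive in decoding order and are kept in a min-heap keyed by picture order count. When the buffer exceeds a fixed capacity, the picture with the smallest count is output. At an IDR picture, where the count restarts, everything buffered is flushed first. This is an assumption-laden simplification: the actual H.264 output process has further rules, and the tuple format and `dpb_size` parameter are illustrative only.

```python
import heapq
from typing import Iterable, List, Tuple


def output_order(decoded: Iterable[Tuple[str, int, bool]], dpb_size: int) -> List[str]:
    """Simplified DPB output. Each item is (name, picture_order_count, is_idr) in decoding order."""
    dpb: List[Tuple[int, str]] = []  # min-heap keyed by picture order count
    out: List[str] = []
    for name, poc, is_idr in decoded:
        if is_idr:
            # Picture order count restarts at an IDR picture: flush earlier pictures first
            while dpb:
                out.append(heapq.heappop(dpb)[1])
        heapq.heappush(dpb, (poc, name))
        if len(dpb) > dpb_size:
            # Buffer over capacity: output the picture with the smallest count
            out.append(heapq.heappop(dpb)[1])
    while dpb:  # end of stream: drain in ascending picture order count
        out.append(heapq.heappop(dpb)[1])
    return out
```

For example, the decoding order I0, P3, B1, B2 with a capacity of two pictures comes out in display order I0, B1, B2, P3.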

Furthermore, MPEG-2 Video determines the coding mode for each picture, and MPEG-4 for each video object plane (VOP). The AVC/H.264 coding scheme instead uses the slice as the coding unit. Different slice types, such as I slices, P slices, and B slices, can therefore be mixed within one picture.

The AVC/H.264 coding scheme also defines two layers. The VCL (Video Coding Layer) encodes and decodes the video pixel signal (coding modes, motion vectors, DCT coefficients, etc.). The NAL (Network Abstraction Layer) is defined alongside it.

A bit string encoded with the AVC/H.264 coding scheme is composed of NAL units, the delimited units of the NAL. NAL units fall into two kinds:
- VCL NAL units, which contain data encoded by the VCL (coding modes, motion vectors, DCT coefficients, etc.).
- Non-VCL NAL units, which contain no data generated by the VCL.

Non-VCL NAL units include the following:
- The SPS (sequence parameter set), which contains parameter information for coding the entire sequence.
- The PPS (picture parameter set), which contains parameter information for coding a picture.
- SEI (supplemental enhancement information), which is not essential for decoding the VCL-encoded data.

The header (leading part) of each NAL unit contains three fields:
- A flag that always has the value 0 (forbidden_zero_bit).
- An identifier indicating whether the unit contains an SPS, a PPS, or a slice of a reference picture (nal_ref_idc).
- An identifier indicating the NAL unit type (nal_unit_type).

nal_unit_type is specified to take a value from 1 to 5 for VCL NAL units. For non-VCL NAL units it takes other values; for example, SEI is 6, SPS is 7, and PPS is 8. On the decoding side, the type of a NAL unit can therefore be identified from the nal_unit_type in its header.
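The header fields above occupy the single leading byte of an H.264 NAL unit: 1 bit of forbidden_zero_bit, 2 bits of nal_ref_idc, and 5 bits of nal_unit_type. The sketch below splits that byte and classifies the unit using only the type values named in the text. The function names are hypothetical. Any type outside the listed values is simply reported as "other".

```python
from typing import NamedTuple

# nal_unit_type values for non-VCL units named in the text
NON_VCL_NAMES = {6: "SEI", 7: "SPS", 8: "PPS"}


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte AVC/H.264 NAL unit header into its three fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,  # 1 bit, always 0
        nal_ref_idc=(first_byte >> 5) & 0x03,         # 2 bits
        nal_unit_type=first_byte & 0x1F,              # 5 bits
    )


def classify(nal_unit_type: int) -> str:
    """Types 1 to 5 are VCL NAL units; a few non-VCL types are named explicitly."""
    if 1 <= nal_unit_type <= 5:
        return "VCL"
    return NON_VCL_NAMES.get(nal_unit_type, "other")
```

For example, a header byte of 0x67 parses to nal_ref_idc 3 and nal_unit_type 7, that is, an SPS.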

The basic coding unit in the AVC/H.264 coding scheme is the slice, obtained by dividing a picture, and each VCL NAL unit carries one slice. A unit called the access unit, which groups several NAL units together, is therefore defined. One access unit contains one coded picture.

<Multi-view image coding schemes>
In twin-lens stereoscopic television, two cameras capture a left-eye image and a right-eye image from two different directions. The two images are displayed on the same screen to present a stereoscopic image. In this case, the left-eye and right-eye images are transmitted or recorded separately as independent images. This requires about twice the amount of information of a single two-dimensional image.

To address this, a method has conventionally been proposed in which one of the left and right images serves as the main image. The information of the other image (the sub-image) is compressed by a general compression coding method to reduce the amount of information (see, for example, Patent Document 1).

In the stereoscopic television image transmission method described in Patent Document 1, the relative position with the highest correlation in the other image is found for each small region. The resulting position shift amount (disparity vector) and difference signal (prediction residual signal) are transmitted. The difference signal must also be transmitted and recorded. The main image and the disparity information can restore an image close to the sub-image. However, they cannot restore sub-image information that the main image lacks, such as regions hidden behind objects.
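The per-region search described for Patent Document 1 can be illustrated with a toy block matcher. For one block of the sub-image, it scans horizontal shifts in the main image. It keeps the shift with the smallest sum of absolute differences as the disparity vector and returns the prediction residual for that block. The following are assumptions made for illustration and are not details given in Patent Document 1:
- The sum of absolute differences as the correlation measure.
- A horizontal-only search within plus or minus `max_disp`.
- Square blocks.
- All names.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of integer samples


def match_block(main: Image, sub: Image, y: int, x: int, size: int,
                max_disp: int) -> Tuple[int, Image]:
    """Return (disparity, residual) for the size x size block of `sub` at row y, column x."""
    width = len(main[0])
    best_d, best_sad = 0, None  # type: int, Optional[int]
    for d in range(-max_disp, max_disp + 1):
        if x + d < 0 or x + d + size > width:
            continue  # candidate block would fall outside the main image
        sad = sum(
            abs(sub[y + i][x + j] - main[y + i][x + d + j])
            for i in range(size) for j in range(size)
        )
        if best_sad is None or sad < best_sad:
            best_d, best_sad = d, sad
    # Prediction residual: the part of the block the main image cannot supply
    res = [[sub[y + i][x + j] - main[y + i][x + best_d + j] for j in range(size)]
           for i in range(size)]
    return best_d, res
```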

In 1996, a stereo image coding scheme called the Multi-view Profile was added to the MPEG-2 Video (ISO/IEC 13818-2) coding scheme, the international standard for single-view image coding (ISO/IEC 13818-2/AMD3). The MPEG-2 Video Multi-view Profile is a two-layer coding scheme. The left-eye image is encoded in the base layer and the right-eye image in the enhancement layer. The scheme uses motion-compensated prediction to exploit temporal redundancy and the discrete cosine transform to exploit spatial redundancy. In addition, it uses disparity-compensated prediction to exploit redundancy between viewpoints.

Furthermore, a method has been proposed that uses a single encoding path and a single decoding path to encode and decode multi-view images (see, for example, Patent Document 2). On the encoding (transmitting) side, as shown in FIG. 25, image signals of a plurality of channels are obtained by shooting from a plurality of directions. A serializing means 301 delays these signals sequentially by one frame or one field per channel. A control means 304 outputs a switch signal with a frame or field period. This signal controls a multiplexer 302 so that it extracts the signals of the plurality of channels sequentially from the serializing means 301.

The multiplexer 302 inserts the signals of the plurality of channels sequentially, channel by channel, one frame or one field at a time. This combines them in time series into a single image signal. An encoding means 303 encodes the single image signal output from the multiplexer 302 and outputs an encoded bit string to the transmission path.

On the decoding (receiving) side, as shown in FIG. 26, a decoding means 305 decodes the encoded bit string. A control means 308 derives a switch signal with a frame or field period from the decoded image signal and uses it to control a demultiplexer 306. The demultiplexer 306 then extracts the frames or fields of the decoded image signal sequentially, channel by channel, and supplies them to a synchronizing means 307. The synchronizing means 307 delays them sequentially by one frame or one field to bring them into synchronization. It outputs them as image signals of a plurality of channels, as at the time of shooting.
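The ordering performed by the multiplexer 302 and the demultiplexer 306 can be modeled with two short functions. Frames from each channel are interleaved one at a time, and the channels are later recovered purely from each frame's position in the stream. This sketch models only the ordering. It omits the delay elements, the switch signals, and the codec, and the function names are hypothetical. The equal-length check makes explicit the fixed alternation this approach relies on.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def multiplex(channels: Sequence[Sequence[T]]) -> List[T]:
    """Interleave channels frame by frame: ch0[0], ch1[0], ..., ch0[1], ch1[1], ..."""
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("fixed interleaving needs equal frame counts per channel")
    return [ch[t] for t in range(n) for ch in channels]


def demultiplex(stream: Sequence[T], num_channels: int) -> List[List[T]]:
    """Recover channels by position alone: item k belongs to channel k % num_channels."""
    return [list(stream[c::num_channels]) for c in range(num_channels)]
```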

Patent Document 1: JP-A 61-144191
Patent Document 2: JP-A 10-243419

However, the conventional stereoscopic image encoding/decoding method and apparatus have several drawbacks. In the left/right image signal, L-channel and R-channel images are arranged alternately, and it must be specified in advance which channel each image belongs to. Moreover, because the method assumes that the L-channel and R-channel images strictly alternate, it cannot handle viewpoints with different frame rates or missing images.

Further problems arise at encoding time in the conventional method and apparatus. The image signals of the plurality of channels that make up the multi-view image signal are first arranged sequentially, one frame or one field at a time, into a single image signal. This signal is input to the encoding means 303, which then rearranges it into coding order for encoding. A buffer is therefore needed for the frame-by-frame or field-by-field rearrangement, and this can introduce delay.

Likewise, at decoding time, the image signals decoded by the decoding means 305 are output arranged sequentially, one frame or one field at a time. They are then delayed sequentially by one frame or one field to synchronize them. A buffer is therefore needed to rearrange them by viewpoint, which can introduce delay. Consequently, consider a receiving apparatus that receives data encoded by the conventional stereoscopic image encoding method and apparatus and decodes it with the conventional decoding method and apparatus. Such a receiver also needs an image buffer for synchronization and cannot shorten the delay time.

The present invention has been made in view of the above points. Its object is to provide a multi-view image receiving method, a multi-view image receiving apparatus, and a multi-view image receiving program that reduce the buffer size in the receiving apparatus and shorten the delay time.

To achieve the above object, a first invention provides a multi-view image receiving method for receiving and decoding encoded data obtained by encoding a multi-view image signal. The multi-view image signal includes the image signals of a plurality of set viewpoints. The image signal of each viewpoint is either an image signal obtained by actually shooting from that viewpoint or an image signal generated as if virtually shot from that viewpoint. The method comprises:
a first step of receiving encoded data in which the following are each encoded: supplemental enhancement information indicating that a multi-view image signal is encoded; the number V of viewpoints of the multi-view image signal; a decoded image output order number o (o is an integer of 0 or more); and the multi-view image signal. The number o jointly indicates a number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint;
a second step of separating, from the encoded data received in the first step, first encoded data and second encoded data. The first encoded data contains the encoded supplemental enhancement information and the number V of viewpoints of the multi-view image signal. The second encoded data contains the encoded decoded image output order number o and the multi-view image signal;
a third step of decoding the first encoded data to generate the supplemental enhancement information, including the number V of viewpoints of the multi-view image signal;
a fourth step of decoding the second encoded data to generate the decoded image output order number o;
a fifth step of dividing the decoded image output order number o by the decoded number V of viewpoints using integer arithmetic. The quotient (an integer of 0 or more) is taken as the number d indicating the output order of the decoded images at each viewpoint. The remainder (an integer of 0 or more and less than V) is taken as the number v identifying each viewpoint;
a sixth step of decoding the second encoded data to generate a decoded multi-view image signal;
a seventh step of storing the decoded multi-view image signal in an image buffer; and
an eighth step of retrieving the decoded multi-view image signal from the image buffer according to the numbers d and v calculated in the fifth step, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with one another.
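The fifth step amounts to an integer quotient and remainder, which Python's built-in `divmod` computes directly. The sketch below is illustrative. The function name and the argument checks are assumptions, not part of the claimed method.

```python
from typing import Tuple


def split_output_order(o: int, num_views: int) -> Tuple[int, int]:
    """Fifth step: d = o // V (output order within a view), v = o % V (view number)."""
    if o < 0 or num_views <= 0:
        raise ValueError("o must be >= 0 and V must be positive")
    d, v = divmod(o, num_views)
    return d, v
```

With V = 3, the numbers o = 0, 1, 2 all map to d = 0, for views v = 0, 1, 2. The next three values of o map to d = 1 for the same three views, so pictures sharing a d value form one synchronized output set.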

To achieve the above object, a third invention provides a multi-view image receiving apparatus for receiving and decoding encoded data obtained by encoding a multi-view image signal. The multi-view image signal includes the image signals of a plurality of set viewpoints. The image signal of each viewpoint is either an image signal obtained by actually shooting from that viewpoint or an image signal generated as if virtually shot from that viewpoint. The apparatus comprises:
receiving means for receiving encoded data in which the following are each encoded: supplemental enhancement information indicating that a multi-view image is encoded; the number V of viewpoints of the multi-view image signal; a decoded image output order number o (o is an integer of 0 or more); and the multi-view image signal. The number o jointly indicates a number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint;
separating means for separating, from the encoded data received by the receiving means, first encoded data and second encoded data. The first encoded data contains the encoded supplemental enhancement information and the number V of viewpoints of the multi-view image signal. The second encoded data contains the encoded decoded image output order number o and the multi-view image signal;
first decoding means for decoding the separated first encoded data to generate the supplemental enhancement information, including the number V of viewpoints of the multi-view image signal;
second decoding means for decoding the separated second encoded data to generate the decoded image output order number o;
calculating means for dividing the decoded image output order number o by the decoded number V of viewpoints using integer arithmetic. The quotient (an integer) is taken as the number d indicating the output order of the decoded images at each viewpoint. The remainder (an integer of 0 or more and less than V) is taken as the number v identifying each viewpoint;
third decoding means for decoding the second encoded data to generate a decoded multi-view image signal;
storing means for storing the decoded multi-view image signal in an image buffer; and
output means for retrieving the decoded multi-view image signal from the image buffer according to the numbers d and v supplied from the calculating means, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with one another.
To achieve the above object, a multi-view image receiving program according to a fifth invention causes a computer to execute each step of the first invention.
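The output means could use d and v in one possible way, sketched below. Decoded pictures are held in the image buffer keyed by d and v. A complete set is released, in ascending d, as soon as all V viewpoints for that d have arrived. The following are assumptions made for illustration:
- The release rule.
- The class name.
- Representing pictures as strings.

The text does not specify how a missing picture is handled.

```python
from collections import defaultdict
from typing import Dict, List


class SyncOutput:
    """Hypothetical output unit: releases the pictures of all V views that share one d."""

    def __init__(self, num_views: int) -> None:
        self.num_views = num_views
        self.next_d = 0  # next per-view output order number to release
        self.pending: Dict[int, Dict[int, str]] = defaultdict(dict)  # d -> {v: picture}

    def store(self, o: int, picture: str) -> List[List[str]]:
        d, v = divmod(o, self.num_views)  # calculating means
        self.pending[d][v] = picture      # storing means: keep in the image buffer
        released: List[List[str]] = []
        # Output means: emit complete sets in ascending d, views ordered by v
        while len(self.pending.get(self.next_d, {})) == self.num_views:
            views = self.pending.pop(self.next_d)
            released.append([views[i] for i in range(self.num_views)])
            self.next_d += 1
        return released
```

For a stereo stream (V = 2), suppose the pictures arrive with o = 0, 1, 3, 2. The pair for d = 0 is released when o = 1 arrives, and the pair for d = 1 is released when o = 2 arrives.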

In the first, third, and fifth inventions described above, the received encoded data is separated into two parts. The first encoded data contains supplemental enhancement information indicating that a multi-view image signal is encoded, together with the number V of viewpoints of the multi-view image signal. The second encoded data contains the decoded image output order number o and the multi-view image signal. The number o jointly indicates the number v identifying each viewpoint and the number d indicating the output order of the decoded images at each viewpoint.

Decoding the first and second encoded data yields a decoded multi-view image signal, which is stored in the image buffer as appropriate. It is then retrieved from the image buffer according to the numbers d and v obtained by decoding and calculation. The decoded image signals of the respective viewpoints are output in synchronization with one another. This removes the need for the conventional processing and devices that output decoded images one frame or one field at a time and then synchronize them.

Furthermore, the first, third, and fifth inventions can receive and decode encoded data of the following form. Information indicating a multi-view image is encoded as SEI (supplemental enhancement information), together with the number V of viewpoints of the multi-view image signal. The decoded image output order number o is also encoded. It jointly indicates the viewpoint number v, which identifies the viewpoint of each image signal constituting the multi-view image signal, and the number d, which indicates the output order of the decoded images at each viewpoint.
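Because the receiver recovers d as the quotient and v as the remainder of o divided by V, the transmitting side would have to form o as o = d × V + v with 0 ≤ v < V. The sketch below assumes exactly that packing, since this section describes only the receiving side, and checks the round trip. Both function names are hypothetical.

```python
from typing import Tuple


def pack_output_order(d: int, v: int, num_views: int) -> int:
    """Hypothetical encoder-side packing consistent with d = o // V and v = o % V."""
    if d < 0 or not 0 <= v < num_views:
        raise ValueError("need d >= 0 and 0 <= v < V")
    return d * num_views + v


def unpack_output_order(o: int, num_views: int) -> Tuple[int, int]:
    """Receiver side: recover (d, v) from o."""
    d, v = divmod(o, num_views)
    return d, v
```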

また、上記の目的を達成するため、第２の発明は、設定された複数の視点でそれぞれ得られる各視点の画像信号を含む多視点画像信号であり、一の視点の画像信号は、一の視点から実際に撮影して得られた画像信号、又は一の視点から仮想的に撮影したものとして生成した画像信号である多視点画像信号を符号化した符号化データを受信し、符号化データを復号する多視点画像受信方法であって、
To achieve the above object, a second invention is a multi-view image receiving method for receiving and decoding encoded data obtained by encoding a multi-view image signal. The multi-view image signal includes the image signals of a plurality of set viewpoints. The image signal of any one viewpoint is either actually captured from that viewpoint or generated as if virtually captured from it. The method comprises:
a first step of receiving encoded data in which each of the following is encoded: supplemental enhancement information (SEI) indicating that a multi-view image is encoded; the number V of viewpoints of the multi-view image signal; the correspondence between the number v identifying each viewpoint of the multi-view image and the view ID identifying that viewpoint for decoded-image management; a decoded image output order number o (an integer of 0 or more) that jointly represents the number v identifying each viewpoint and the number d indicating the output order of decoded images at each viewpoint; and the multi-view image signal;
a second step of separating the encoded data received in the first step into first encoded data and second encoded data, where the first encoded data contains the SEI, the number V of viewpoints, and the correspondence between the numbers v and the view IDs, and the second encoded data contains the decoded image output order number o and the multi-view image signal;
a third step of decoding the first encoded data to generate the SEI, which includes the number V of viewpoints and the correspondence between the numbers v and the view IDs;
a fourth step of decoding the second encoded data to generate the decoded image output order number o;
a fifth step of dividing the decoded number o by the decoded number of viewpoints V using integer arithmetic, where the integer quotient is taken as the number d indicating the output order of decoded images at each viewpoint and the remainder (an integer from 0 to V − 1) is taken as the number v identifying each viewpoint;
a sixth step of decoding the second encoded data to generate a decoded multi-view image signal;
a seventh step of storing the decoded multi-view image signal in an image buffer; and
an eighth step of retrieving the decoded multi-view image signal from the image buffer according to the view ID and the number d calculated in the fifth step, and outputting the decoded image signals of the viewpoints that make up the decoded multi-view image signal in synchronization with one another.
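The fifth, seventh, and eighth steps can be illustrated with a minimal Python sketch. It assumes that the pictures and their o values have already been decoded and that the SEI has supplied V and a table mapping each v to its view ID. The names `synchronized_output` and `view_id_of` are hypothetical. For brevity, the sketch buffers every picture before emitting anything. A real receiver would presumably release each group of equal d as soon as that group is complete.

```python
from collections import defaultdict


def synchronized_output(decoded, num_views, view_id_of):
    """Group decoded pictures by output order number d (hypothetical helper).

    decoded:    iterable of (o, picture) pairs in decoding order
    num_views:  V, taken from the multi-view SEI
    view_id_of: dict mapping viewpoint number v to its view ID (from the SEI)
    Returns one list per d, in ascending d, of (view_id, picture) pairs
    ordered by v, i.e. the pictures that should be output together.
    """
    buffer = defaultdict(dict)  # stands in for the image buffer: d -> {v: picture}
    for o, picture in decoded:
        d, v = divmod(o, num_views)  # fifth step: quotient d, remainder v
        buffer[d][v] = picture        # seventh step: store in the buffer
    # eighth step: emit each time instant's pictures together, viewpoint by viewpoint
    return [
        [(view_id_of[v], buffer[d][v]) for v in sorted(buffer[d])]
        for d in sorted(buffer)
    ]


# Two viewpoints (V = 2), pictures arriving out of output order.
groups = synchronized_output(
    [(2, "L1"), (0, "L0"), (3, "R1"), (1, "R0")],
    num_views=2,
    view_id_of={0: 100, 1: 101},
)
for group in groups:
    print(group)
```

Keying the buffer on d rather than on o means that pictures of the same instant land in the same slot regardless of arrival order. A viewpoint with no picture at some d simply does not appear in that group.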

Also to achieve the above object, a fourth invention is a multi-view image receiving device for receiving and decoding encoded data obtained by encoding a multi-view image signal. The multi-view image signal includes the image signals of a plurality of set viewpoints. The image signal of any one viewpoint is either actually captured from that viewpoint or generated as if virtually captured from it. The device comprises:
receiving means for receiving encoded data in which each of the following is encoded: SEI indicating that a multi-view image is encoded; the number V of viewpoints of the multi-view image signal; the correspondence between the number v identifying each viewpoint of the multi-view image and the view ID identifying that viewpoint for decoded-image management; a decoded image output order number o (an integer of 0 or more) that jointly represents the number v and the number d indicating the output order of decoded images at each viewpoint; and the multi-view image signal;
separating means for separating the received encoded data into first encoded data and second encoded data, where the first encoded data contains the SEI, the number V of viewpoints, and the correspondence between the numbers v and the view IDs, and the second encoded data contains the decoded image output order number o and the multi-view image signal;
first decoding means for decoding the separated first encoded data to generate the SEI, which includes the number V of viewpoints and the correspondence between the number v identifying each viewpoint and the view ID used for decoded-image management;
second decoding means for decoding the separated second encoded data to generate the decoded image output order number o;
calculating means for dividing the decoded number o by the decoded number of viewpoints V using integer arithmetic, where the integer quotient is taken as the number d indicating the output order of decoded images at each viewpoint and the remainder (an integer from 0 to V − 1) is taken as the number v identifying each viewpoint;
third decoding means for decoding the second encoded data to generate a decoded multi-view image signal;
storage means for storing the decoded multi-view image signal in an image buffer; and
output means for retrieving the decoded multi-view image signal from the image buffer according to the view ID and the number d supplied from the calculating means, and outputting the decoded image signals of the viewpoints that make up the decoded multi-view image signal in synchronization with one another.
Further, to achieve the above object, a multi-view image receiving program according to a sixth invention causes a computer to execute each step of the second invention.

The second, fourth, and sixth inventions described above receive two kinds of encoded data. The first encoded data contains the SEI indicating that the multi-view image signal is encoded, the number V of viewpoints of that signal, and the correspondence between the viewpoint number v identifying each viewpoint and the view ID identifying that viewpoint for decoded-image management. The second encoded data contains the multi-view image signal and the decoded image output order number o, which jointly represents the viewpoint number v and the number d indicating the output order of decoded images at each viewpoint. A view ID for decoded-image management is thus provided in place of the viewpoint number v that identifies the viewpoint of each image. The viewpoints of the decoded images can then be managed on the basis of these view IDs.

According to the present invention, both the viewpoint number v (or the view ID) and the output order number d are derived from the decoded image output order number o. The decoded image signals of the viewpoints are output in synchronization with one another and in association with these parameters. The decoded image signal of each viewpoint can therefore be output correctly. In addition, the device that receives the output, such as a multi-view image display device, can determine which image belongs to which viewpoint.

Furthermore, according to the present invention, a conventional single-view image encoding/decoding scheme can be extended to multi-view encoding/decoding with only minor modifications. This is achieved by treating and decoding the decoded image output order number o as the conventional scheme's number indicating the output order of decoded images (for example, the picture order count in AVC/H.264).
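The reason o can stand in for the picture order count is that o is built as d·V + v with 0 ≤ v < V, which is the inverse of the quotient/remainder split in the fifth step. Sorting by o is therefore the same as sorting by (d, v). A decoder that simply outputs pictures in ascending o emits all viewpoints of one time instant, in viewpoint order, before it moves to the next instant. The short Python check below illustrates this. It assumes only that the decoder outputs pictures in ascending order of o. The helper name `output_order` is hypothetical.

```python
import random


def output_order(labels, num_views):
    """Sort (d, v) labels the way a decoder ordering by o = d*V + v would."""
    return sorted(labels, key=lambda dv: dv[0] * num_views + dv[1])


V = 3
labels = [(d, v) for d in range(4) for v in range(V)]
random.seed(0)
random.shuffle(labels)  # arbitrary decoding order

by_o = output_order(labels, V)
print(by_o[:4])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
assert by_o == sorted(labels)  # identical to time-major, viewpoint-minor order
```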

Furthermore, according to the present invention, the image signals of the viewpoints can be output over multiple channels without the resynchronization step of conventional stereoscopic image receiving methods and devices. Those conventional approaches arrange the decoded images sequentially, frame by frame or field by field, and then resynchronize them. Because the present invention avoids this step, no image buffer for resynchronization is needed, and the delay is reduced.

Embodiments of the present invention are described below with reference to the drawings. Before the multi-view image receiving method, multi-view image receiving device, and multi-view image receiving program of the present invention are described, a multi-view image encoding device is described first. This encoding device generates the encoded multi-view image signal that the present invention receives and decodes.

FIG. 1 is a block diagram of an example of a multi-view image encoding device that generates an encoded multi-view image signal to be received and decoded according to the present invention. As shown in FIG. 1, the device includes the following units: an encoding management unit 101, a parameter set encoding unit 102, a multi-view image SEI encoding unit 103, a decoded image output order number calculation unit 104, a rearrangement buffer 105, a motion/disparity-compensated prediction unit 106, an encoding mode determination unit 107, a residual signal calculation unit 108, a residual signal encoding unit 109, a residual signal decoding unit 110, a residual signal superimposing unit 111, a decoded image buffer 112, an encoded bit string generation unit 113, and a multiplexing unit 114. The device encodes the input multi-view image signal into encoded data (an encoded bit string). A transmission unit then converts this data into a predetermined signal format suited to the transmission medium and outputs it.

Here, the multi-view image signal contains the image signals of a plurality of set viewpoints. The image signal of any one viewpoint is either actually captured from that viewpoint or generated as if virtually captured from it. The transmission medium may be a network, a wired transmission path, a wireless transmission path, or the like. It also covers recording the data on a recording medium and delivering that medium.

Next, the operation of the multi-view image encoding device of FIG. 1 is described with reference to the AVC/H.264 coding scheme. First, the encoding management unit 101 of FIG. 1 is supplied with parameter information for coding the entire sequence and with parameter information for coding each picture. AVC/H.264 manages sequence information as an SPS (sequence parameter set) and picture information as a PPS (picture parameter set). The receiving side needs this information to decode the encoded bit string, so it is supplied to the parameter set encoding unit 102.

The parameter set encoding unit 102 encodes the supplied sequence information, picture information, and so on. An AVC/H.264 encoded bit string is built from NAL units, which are the units into which the NAL (Network Abstraction Layer) divides the data. The unit therefore first encodes, in the NAL unit header, an identifier that distinguishes the NAL unit type, and then encodes the sequence information, picture information, and so on. The encoding management unit 101 is also supplied with the number of viewpoints of the multi-view image input to the encoding device. The receiving side needs this information as well, in order to display the multi-view image.
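The Python sketch below illustrates how a receiver might use the NAL unit header. It splits an H.264 Annex B byte stream at its 0x000001 start codes. It then sorts the units by nal_unit_type, which is the low five bits of the first header byte (6 = SEI, 7 = SPS, 8 = PPS, and 1 or 5 = coded slices). SEI units go to one list and slice units to another, which mirrors the separation into first and second encoded data described earlier. The function names are hypothetical. The splitter also ignores corner cases such as trailing cabac_zero_words.

```python
SEI, SPS, PPS = 6, 7, 8
CODED_SLICE_TYPES = {1, 5}  # non-IDR and IDR slices


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units at 0x000001 start codes."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        starts.append(pos + 3)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    units = []
    for k, begin in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(stream)
        # Drop trailing zero bytes (e.g. the leading 0x00 of a 4-byte start code).
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def separate(stream: bytes):
    """Return (sei_units, slice_units, other_units)."""
    sei_units, slice_units, other_units = [], [], []
    for nal in split_annexb(stream):
        if not nal:
            continue  # empty unit between back-to-back start codes
        nal_type = nal[0] & 0x1F  # low 5 bits of the NAL unit header
        if nal_type == SEI:
            sei_units.append(nal)
        elif nal_type in CODED_SLICE_TYPES:
            slice_units.append(nal)
        else:
            other_units.append(nal)  # SPS, PPS, and anything else
    return sei_units, slice_units, other_units


stream = (b"\x00\x00\x00\x01\x67\x42"          # SPS (type 7), truncated
          b"\x00\x00\x01\x06\xc8\x01\x2c\x80"  # SEI (type 6)
          b"\x00\x00\x01\x65\x88\x84")         # IDR slice (type 5), truncated
sei, slices, other = separate(stream)
```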

The present invention defines a multi-view image SEI (supplemental enhancement information) indicating that the generated encoded bit string encodes a multi-view image. FIG. 3 shows an example of the coding syntax of the payload (data portion) of the multi-view image SEI. In FIG. 3, "num_views_minus1" is a parameter indicating the number of viewpoints of the multi-view image to be encoded. There is always at least one viewpoint, so the value encoded is the number of viewpoints minus 1. It is coded either with a variable-length code, such as the Exponential Golomb code provided by AVC/H.264, or with a fixed-length code of predetermined length.
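For reference, the unsigned Exp-Golomb code ue(v) of H.264 writes a value k as the binary form of k + 1, preceded by one fewer zero bits than that binary form has. A minimal Python sketch for num_views_minus1 follows. For readability, it works on bit strings rather than on a real bitstream reader. The function names are illustrative.

```python
def ue_encode(k: int) -> str:
    """Unsigned Exp-Golomb codeword ue(v) for k >= 0, as a string of bits."""
    if k < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(k + 1)[2:]                 # e.g. k = 4 -> '101'
    return "0" * (len(info) - 1) + info   # leading zeros announce the length


def ue_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Read one ue(v) codeword starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1


num_views = 5
codeword = ue_encode(num_views - 1)   # num_views_minus1 = 4
print(codeword)                        # 00101
print(ue_decode(codeword)[0] + 1)      # 5
```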

The multi-view image SEI encoding unit 103 of FIG. 1 encodes the multi-view image SEI, which indicates that a multi-view image is encoded. The unit first encodes, in the NAL unit header, an identifier distinguishing the NAL unit type. It then encodes an identifier distinguishing the SEI payload type, together with the payload size. Finally, it encodes "num_views_minus1" as the SEI payload according to the coding syntax of FIG. 3.
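The three encoding stages just described can be sketched in Python as follows. The sketch makes two assumptions. First, the payload type value 200 is a hypothetical placeholder, because the text says only that an undefined value is used. Second, num_views_minus1 is written with ue(v) rather than with a fixed-length code. Otherwise, the sketch follows the usual H.264 SEI layout:

- a one-byte NAL header with nal_unit_type 6;
- the payload type and the payload size, each coded as a run of 0xFF bytes plus a final byte;
- payload alignment bits;
- the rbsp trailing bits 0x80;
- emulation-prevention bytes.

```python
MULTIVIEW_SEI_TYPE = 200  # hypothetical placeholder, not taken from the standard


def ue_bits(k: int) -> str:
    """Unsigned Exp-Golomb codeword for k >= 0 as a bit string."""
    info = bin(k + 1)[2:]
    return "0" * (len(info) - 1) + info


def sei_varlen(value: int) -> bytes:
    """payloadType / payloadSize coding: 0xFF per 255, then the remainder."""
    out = bytearray()
    while value >= 255:
        out.append(0xFF)
        value -= 255
    out.append(value)
    return bytes(out)


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is <= 0x03."""
    out = bytearray()
    zeros = 0
    for byte in rbsp:
        if zeros >= 2 and byte <= 3:
            out.append(0x03)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def multiview_sei_nal(num_views: int) -> bytes:
    bits = ue_bits(num_views - 1)           # num_views_minus1
    if len(bits) % 8:                       # payload alignment: a 1, then 0s
        bits += "1"
        bits += "0" * (-len(bits) % 8)
    payload = int(bits, 2).to_bytes(len(bits) // 8, "big")
    rbsp = (sei_varlen(MULTIVIEW_SEI_TYPE) + sei_varlen(len(payload))
            + payload + b"\x80")            # 0x80 = rbsp_trailing_bits
    header = b"\x06"                        # forbidden 0, nal_ref_idc 0, type 6
    return header + add_emulation_prevention(rbsp)


print(multiview_sei_nal(5).hex())  # 06c8012c80
```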

The SEI payload type identifier is assigned a value not yet defined by AVC/H.264. Because the encoding side encodes the multi-view image SEI in this way, the decoding side can decode this SEI along with the encoded bit string. It thereby learns that the content of the bit string is a multi-view image and how many viewpoints that image has. A conventional AVC/H.264 decoder, for which the multi-view image SEI is not defined, can still decode the encoded bit string and output decoded images by ignoring this SEI. Compatibility with existing decoders is therefore maintained. Such a decoder, however, cannot decode the multi-view image SEI. It therefore cannot tell whether the content is a multi-view image, or, if it is, how many viewpoints it has.
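The compatibility argument relies on the SEI syntax carrying an explicit payload size. A decoder can therefore skip any message whose type it does not recognise. The Python sketch below parses the SEI messages in an RBSP and keeps only the types that a given decoder knows. It assumes that emulation-prevention bytes have already been removed. `MULTIVIEW_SEI_TYPE = 200` is again a hypothetical placeholder value. The type-1 message in the example is a dummy stand-in for any SEI the legacy decoder understands, and its payload bytes are placeholders. The function names are illustrative.

```python
MULTIVIEW_SEI_TYPE = 200  # hypothetical placeholder value


def read_sei_varlen(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a payloadType/payloadSize value: each 0xFF byte adds 255."""
    value = 0
    while buf[pos] == 0xFF:
        value += 255
        pos += 1
    return value + buf[pos], pos + 1


def parse_sei_rbsp(rbsp: bytes, known_types: set[int]) -> dict[int, bytes]:
    """Return {payload_type: payload} for known types; skip all others."""
    messages = {}
    pos = 0
    while pos < len(rbsp) and rbsp[pos:] != b"\x80":  # stop at rbsp trailing bits
        payload_type, pos = read_sei_varlen(rbsp, pos)
        size, pos = read_sei_varlen(rbsp, pos)
        if payload_type in known_types:
            messages[payload_type] = rbsp[pos:pos + size]
        pos += size  # unknown payloads are skipped using their size
    return messages


def num_views_from_payload(payload: bytes) -> int:
    """Decode num_views_minus1 (ue(v)) from the start of the payload."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    zeros = bits.index("1")  # number of leading zero bits
    num_views_minus1 = int(bits[zeros:2 * zeros + 1], 2) - 1
    return num_views_minus1 + 1


# Multi-view SEI (V = 5), then a dummy type-1 message with a 2-byte payload.
rbsp = bytes([200, 1, 0x2C, 1, 2, 0xAA, 0xBB, 0x80])
new = parse_sei_rbsp(rbsp, {MULTIVIEW_SEI_TYPE, 1})
legacy = parse_sei_rbsp(rbsp, {1})  # a decoder unaware of the multi-view SEI
print(num_views_from_payload(new[MULTIVIEW_SEI_TYPE]))  # 5
print(legacy)  # {1: b'\xaa\xbb'}
```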

Furthermore, for each image constituting the multi-view image, the encoding management unit 101 is supplied with information identifying its viewpoint and with information such as a time stamp. Based on this information, the input order of the images, and so on, the unit manages several things: which viewpoint each image belongs to, the output order of decoded images at each viewpoint, synchronization among the viewpoints, and so on. Here, the output order is the order in which the decoding side outputs the decoded images of each viewpoint after decoding the encoded bit string produced by this encoding device. Each image is assigned a viewpoint number v identifying its viewpoint and a number d indicating the output order of decoded images at that viewpoint.

Next, the assignment of the viewpoint number v is described. Each viewpoint is uniquely assigned a viewpoint number v that is 0 or a positive integer less than the number of viewpoints V. The decoding side also uses v as the order in which the viewpoints are output at each time instant. The values are therefore assigned in that output order, starting at 0 and increasing by 1. Each image belonging to a viewpoint is assigned the same value v as its viewpoint, so v identifies which viewpoint each image belongs to.

Next, the assignment of the number d, which indicates the output order of decoded images at each viewpoint, is described. Each d is an integer that increases with the output order of the decoded images. Decoded images of different viewpoints that share the same output time are given the same value of d. Under this rule, an image may exist at other viewpoints but not at a certain viewpoint at some time instant. In that case, the corresponding value of d is simply skipped for that viewpoint. Assigning d in this way has two benefits for the decoding side. First, it can output images in the intended order by outputting them in ascending order of d. Second, images of different viewpoints with equal d have the same output time, so it can output the viewpoints in synchronization. In addition, the encoding management unit 101 manages the encoding order of the images and the reference images used for motion-compensated and disparity-compensated prediction.
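The rule for d amounts to numbering the distinct output times shared by all viewpoints. The Python sketch below assumes that every picture carries a time stamp on a common clock. The text mentions time stamps but not their format. The function name is hypothetical. A viewpoint with no picture at some time simply never receives that value of d.

```python
def assign_output_numbers(pictures):
    """pictures: iterable of (v, timestamp) pairs.

    Returns {(v, timestamp): d}, where d counts the distinct output times
    of all viewpoints in ascending order, so pictures with the same output
    time share d and a viewpoint missing a time leaves a gap in its d values.
    """
    pictures = list(pictures)
    times = sorted({t for _, t in pictures})
    d_of_time = {t: d for d, t in enumerate(times)}
    return {(v, t): d_of_time[t] for v, t in pictures}


# Viewpoint 0 every 40 ms, viewpoint 1 every 80 ms (half the frame rate).
pics = [(0, 0), (0, 40), (0, 80), (1, 0), (1, 80)]
d = assign_output_numbers(pics)
print(sorted(d[p] for p in pics if p[0] == 1))  # [0, 2]: d = 1 is skipped
```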

FIG. 4 shows an example of the values of the viewpoint number v and the output order number d assigned to the images constituting a multi-view image encoded by the device of FIG. 1. In FIG. 4, the vertical axis represents the viewpoint direction and the horizontal axis represents the time direction. M(v) (v = 0, 1, 2, ..., V−1) denotes the viewpoint images constituting the multi-view image, where v is the viewpoint number. m(v, d) (v = 0, 1, 2, ..., V−1; d = 0, 1, 2, ...) denotes the images constituting viewpoint image M(v), where v is the viewpoint number and d is the output order number of the decoded image at that viewpoint. When two images m(v, d) are compared, equal values of v mean that the images belong to the same viewpoint, and different values mean that they belong to different viewpoints. Likewise, equal values of d mean that the images belong to the same time instant. Different values mean different time instants, and the smaller value is the earlier one.

In addition to the number of viewpoints V of the multi-view image signal, the viewpoint number v and the output order number d could be encoded separately. The multi-view encoding device of FIG. 1, however, encodes them together as a single decoded image output order number o. A conventional single-view image encoding/decoding scheme can then be extended to the multi-view scheme of this embodiment by treating o, which jointly represents v and d, as the conventional scheme's number indicating the output order of decoded images. This makes the multi-view scheme compatible with the conventional single-view scheme with only minor modifications. Specifically, in AVC/H.264, the decoded image output order number o is treated as the picture order count, which is the number AVC/H.264 uses to indicate the output order of decoded images.

The decoded image output order number calculation unit 104 of FIG. 1 calculates the decoded image output order number o from three values: the number of viewpoints V of the multi-view image to be encoded, the viewpoint number v identifying each viewpoint, and the output order number d of decoded images at each viewpoint. The calculation uses the following equation.

$o = d \cdot V + v$   (1)
FIG. 5 shows an example of a five-viewpoint (V = 5) multi-view image encoded by the device of FIG. 1. Each image is assigned the decoded image output order number o calculated by equation (1). FIG. 6 shows the corresponding example for a five-viewpoint (V = 5) multi-view image whose frame rate differs between viewpoints. In FIG. 6, viewpoint images M(1) and M(3) have a lower frame rate than M(0), M(2), and M(4). Viewpoint images M(1) and M(3) therefore have no images whose output order number d is 1, 3, or 5. Consequently, there are no images with decoded image output order numbers o of 6, 8, 16, 18, 26, or 28.
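Equation (1) and the gaps in FIG. 6 can be reproduced with a few lines of Python. The layout is an assumption chosen to match the description of FIG. 6: six output instants, with viewpoints 1 and 3 present only at even d. The helper names are hypothetical.

```python
def output_order_number(d: int, v: int, num_views: int) -> int:
    """Equation (1): o = d * V + v, with 0 <= v < V."""
    if not 0 <= v < num_views:
        raise ValueError("view number out of range")
    return d * num_views + v


def fig6_like_numbers(num_views=5, num_d=6, half_rate_views=(1, 3)):
    """o values present when the views in half_rate_views only have even d."""
    return sorted(
        output_order_number(d, v, num_views)
        for d in range(num_d)
        for v in range(num_views)
        if v not in half_rate_views or d % 2 == 0
    )


present = fig6_like_numbers()
missing = sorted(set(range(30)) - set(present))
print(missing)  # [6, 8, 16, 18, 26, 28]
```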

On the decoding side, the multi-view image decoding device described later decodes the bit string produced by the encoding device of FIG. 1 to obtain the number of viewpoints V and the decoded image output order number o. From these, it calculates the viewpoint number v (an integer with 0 ≤ v < V) and the output order number d (an integer) of each image that satisfy equation (1). Specifically, d is the quotient obtained by dividing o by V using integer arithmetic, and v is the remainder of that division. Alternatively, v may be calculated from d with the following equation.

$v = o - d \cdot V$   (2)
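A minimal Python sketch of the decoder-side split is shown below. The function name is hypothetical. Because o ≥ 0, Python's floor division and modulo give the same results as the truncating integer division used in C. The check inside the function confirms that the remainder agrees with equation (2).

```python
def split_output_order_number(o: int, num_views: int) -> tuple[int, int]:
    """Recover (d, v) from o = d * V + v, assuming o >= 0 and V >= 1."""
    if o < 0 or num_views < 1:
        raise ValueError("expected o >= 0 and V >= 1")
    d = o // num_views   # quotient: output order number at each viewpoint
    v = o % num_views    # remainder: viewpoint number, 0 <= v < V
    assert v == o - d * num_views  # equation (2) gives the same v
    return d, v


V = 5
for o in (0, 4, 13, 27):
    print(o, split_output_order_number(o, V))
```

Note that V must be known (from the multi-view SEI) before any o can be split. This is why the first encoded data is decoded before the output order of the images can be recovered.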
For each slice, the encoded bit string generation unit 113 of FIG. 1 encodes the decoded image output order number o, as calculated by the decoded image output order number calculation unit 104, into the bit string together with the other slice information. An AVC/H.264 encoded bit string is built from NAL units, which are the units into which the NAL (Network Abstraction Layer) divides the data. The unit therefore first encodes, in the NAL unit header, an identifier distinguishing the NAL unit type, and then encodes the slice information, including o.

The rearrangement buffer 105 stores the supplied multi-view images. The viewpoint images M(v) (v = 0, 1, 2, ..., V−1) constituting the multi-view image can be input in either of two ways. They can be input in parallel, over a separate channel for each viewpoint. Alternatively, they can be input serially over a single channel, as a signal in which the image signals of the viewpoints are interleaved. The image signals can be interleaved per pixel, per group of pixels, per horizontal line, per image, or per group of images.
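The one-dimensional interleaving options (per pixel, per group of pixels, and per line) differ only in how many consecutive samples of one viewpoint are sent before the stream switches to the next viewpoint. The Python sketch below models each viewpoint as a flat list of samples in raster order and takes that run length as a parameter. It is an illustration under these assumptions, not the device's actual input stage, and it does not cover two-dimensional block layouts. The function names are hypothetical.

```python
def interleave(views, unit):
    """Interleave equal-length viewpoint signals in runs of `unit` samples.

    views: list of sample lists, index = viewpoint number v
    unit:  1 for per-pixel, n for groups of n pixels, the line width for per-line
    """
    length = len(views[0])
    if any(len(samples) != length for samples in views) or length % unit:
        raise ValueError("views must have equal length, a multiple of unit")
    out = []
    for start in range(0, length, unit):
        for samples in views:          # v = 0, 1, ..., V-1
            out.extend(samples[start:start + unit])
    return out


def deinterleave(stream, num_views, unit):
    """Inverse of interleave(): run k belongs to viewpoint k mod V."""
    views = [[] for _ in range(num_views)]
    for k, start in enumerate(range(0, len(stream), unit)):
        views[k % num_views].extend(stream[start:start + unit])
    return views


# Two 2x2 images (V = 2), samples in raster order.
left = ["L00", "L01", "L10", "L11"]
right = ["R00", "R01", "R10", "R11"]
per_pixel = interleave([left, right], unit=1)
per_line = interleave([left, right], unit=2)   # line width is 2
print(per_pixel)
print(per_line)
```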

Examples of the interleaved structure of the input viewpoint images M(v) are described with reference to FIGS. 7 to 14. FIG. 7 shows the signals of the viewpoints interleaved pixel by pixel. In the figure, p(v, i) denotes a pixel of a viewpoint image, where v is the viewpoint number and i is the pixel index.

FIG. 8 shows the signals of the viewpoints interleaved in units of several pixels. In the figure, py(v, iy) denotes a pixel of the luminance signal of a viewpoint image, where v is the viewpoint number and iy is the luminance pixel index. pu(v, iu) denotes a pixel of the color-difference signal (U), where iu is its pixel index. pv(v, iv) denotes a pixel of the color-difference signal (V), where iv is its pixel index.

FIG. 9 shows the signals of the viewpoints interleaved in pixel blocks of either 16 × 16 pixels (horizontal × vertical) or 16 × 8 pixels. In the figure, b(v, ib) denotes a pixel block of a viewpoint image, where v is the viewpoint number and ib is the pixel block index. The pixel blocks are scanned along horizontal lines, so this can be regarded as a form of interleaving in units of several pixels.

FIG. 10 shows the signals of the viewpoints interleaved one horizontal line at a time. In the figure, l(v, j) denotes a line of a viewpoint image, where v is the viewpoint number and j is the line index. A line consists of several pixels, so this can also be regarded as a form of interleaving in units of several pixels.

FIG. 11 shows the signals of the viewpoints interleaved by first combining them into a single image. The combined image is scanned along horizontal lines, so this can be regarded as a form of line-by-line interleaving. FIG. 11(A) shows the single image that combines the signals of the viewpoints, and FIG. 11(B) shows that image interleaved line by line. In the figure, v is the viewpoint number, and d is the number indicating the capture order of images at each viewpoint (which is also the output order of the decoded images at that viewpoint). The j in l(v, j) is the line index.

FIG. 12 shows an example in which the signals of the viewpoints are interleaved in units of slices, each slice grouping several lines. In the figure, s(v, k) denotes the luminance-signal pixels of a slice of each viewpoint image, where v is the viewpoint number identifying the viewpoint and k is the slice index. Since a slice consists of multiple pixels, this can be regarded both as a form of interleaving in units of groups of pixels and as a form of interleaving in units of pixel blocks.

FIG. 13 shows an example in which the signals of the viewpoints are interleaved image by image. In the figure, m(v, d) denotes an image of a viewpoint image, where v is the viewpoint number identifying the viewpoint and d is the number indicating the capture order at each viewpoint (equivalently, the output order of the decoded images at that viewpoint). The images of all viewpoints that share the same number d form one group, and within each group the images are input one after another in ascending order of v. The groups are in turn input in ascending order of d, so that the viewpoints are input in synchronization with one another, image by image.

FIG. 14 shows an example in which the signals of the viewpoints are interleaved in units of several images. In the figure, m(v, d) denotes, as in FIG. 13, an image of a viewpoint image, where v is the viewpoint number identifying the viewpoint and d is the number indicating the capture order at each viewpoint (equivalently, the output order of the decoded images at that viewpoint).
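The interleaving schemes of FIGS. 9 to 14 differ mainly in the size of the unit taken from each viewpoint in turn. The sketch below assumes that each viewpoint contributes its units (lines, pixel blocks, slices, or whole images) round-robin in ascending order of v, `unit` consecutive items at a time. With whole images and `unit = 1` it reproduces the order described for FIG. 13; `unit > 1` is one plausible reading of FIG. 14, whose exact layout is not spelled out in the text. The function name `interleave_views` is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def interleave_views(views: Sequence[Sequence[T]], unit: int = 1) -> List[Tuple[int, T]]:
    """Round-robin interleave per-viewpoint sequences, `unit` items at a time.

    views[v] holds the units of viewpoint v in capture order. The result lists
    (v, item) pairs in transmission order: the first `unit` items of view 0,
    then of view 1, ..., then the next chunk of every view, and so on.
    """
    if unit < 1:
        raise ValueError("unit must be >= 1")
    length = len(views[0])
    if any(len(seq) != length for seq in views):
        raise ValueError("all viewpoints must contribute the same number of units")
    out: List[Tuple[int, T]] = []
    for start in range(0, length, unit):      # one chunk position at a time
        for v, seq in enumerate(views):       # ascending viewpoint number v
            for item in seq[start:start + unit]:
                out.append((v, item))
    return out
```

For example, with three viewpoints whose images are labelled m(v, d), `unit=1` yields m(0,0), m(1,0), m(2,0), m(0,1), ..., while `unit=2` yields m(0,0), m(0,1), m(1,0), m(1,1), m(2,0), m(2,1), m(0,2), ....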

In the multi-view image encoding device of FIG. 1, whichever of the input methods of FIGS. 7 to 14 is used, the image signals are written into the rearrangement buffer 105 and stored there without delay. The image signals stored in the rearrangement buffer 105 are then supplied, pixel block by pixel block, to the motion/disparity compensation prediction unit 106 and the residual signal calculation unit 108 in the encoding order controlled by the encoding management unit 101.

As described above, the multi-view image encoding device of FIG. 1 writes the viewpoint images into the rearrangement buffer 105 and stores them without delay, whether the viewpoint images are input in parallel over independent channels or serially over a single channel as an interleaved signal. Unlike the conventional stereoscopic image encoding method and apparatus, there is therefore no need to arrange the images sequentially frame by frame or field by field before encoding, so no image buffer for this serialization is required and the delay is reduced.

Next, the encoding order controlled by the encoding management unit 101 is described with reference to FIG. 15. FIG. 15 shows an example of the encoding order of the images m(v, d) that make up a five-viewpoint (V = 5) multi-view image, together with the motion-compensation and disparity-compensation reference relationships. In the figure, the viewpoint images M(0), M(2), and M(4) are encoded without disparity-compensated prediction, that is, without referring to images of other viewpoints. For example, the image m(0, 0) of viewpoint image M(0) is encoded as an intra picture, independently within the picture and without referring to any other image. The image m(0, 3) of M(0) is encoded by motion-compensated prediction, using as its reference the decoded image of m(0, 0), which precedes it in the display order of the same viewpoint. The image m(0, 1) of M(0) is encoded by motion-compensated prediction, using as references the decoded images of m(0, 0), which precedes it, and m(0, 3), which follows it, in the display order of the same viewpoint.

The viewpoint images M(1) and M(3), on the other hand, are encoded using, in addition to motion-compensated prediction, disparity-compensated prediction, which uses an image of another viewpoint as the reference image. For example, the image m(1, 1) of viewpoint image M(1) is motion-compensated using as references the decoded images of m(1, 0), which precedes it, and m(1, 3), which follows it, in the display order of the same viewpoint, and is additionally disparity-compensated using the decoded images of m(0, 1) and m(2, 1) from the other viewpoints as references.

When the image m(1, 1) of viewpoint image M(1) is encoded, its reference images m(1, 0), m(1, 3), m(0, 1), and m(2, 1) must already have been encoded and decoded and must be stored in the decoded image buffer 112. With the reference relationships of this example, the images are encoded in the order m(0,0), m(2,0), m(1,0), m(4,0), m(3,0), m(0,3), m(2,3), m(1,3), m(4,3), m(3,3), m(0,1), m(2,1), m(1,1), m(4,1), m(3,1), m(0,2), m(2,2), m(1,2), m(4,2), m(3,2), m(0,6), m(2,6), m(1,6), m(4,6), m(3,6), m(0,4), m(2,4), m(1,4), .... This encoding order corresponds one-to-one with the decoded image output order number o and is managed by the encoding management unit 101. In this example, the encoding order of the images across viewpoints is the same at every time instant, and the encoding order of the images along the time axis is the same at every viewpoint.
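The listed encoding order follows a regular pattern: at every time instant the viewpoints are coded in the order 0, 2, 1, 4, 3 (so that both neighbours of views 1 and 3 are available for disparity-compensated prediction), and the capture indices are visited as 0, 3, 1, 2, 6, 4, .... The sketch below regenerates that order and checks that the references of m(1, 1) precede it. Continuing the time pattern past the listed entries (..., 5, 9, 7, 8, ...) is an assumption, and the names `VIEW_ORDER`, `time_order`, and `coding_order` are hypothetical.

```python
from typing import List, Tuple

# Views 0, 2, 4 are coded without disparity prediction; views 1 and 3 refer to
# their neighbours, so each is coded after both of them (as in FIG. 15's listing).
VIEW_ORDER = [0, 2, 1, 4, 3]


def time_order(periods: int, interval: int = 3) -> List[int]:
    """Capture indices d in coding order: 0, then each anchor followed by the
    frames between it and the previous anchor (0, 3, 1, 2, 6, 4, 5 for periods=2).
    Extrapolating beyond the listed entries is an assumption."""
    order = [0]
    for p in range(1, periods + 1):
        anchor = p * interval
        order.append(anchor)
        order.extend(range(anchor - interval + 1, anchor))
    return order


def coding_order(periods: int) -> List[Tuple[int, int]]:
    """(v, d) pairs in coding order; the view order is the same at every time instant."""
    return [(v, d) for d in time_order(periods) for v in VIEW_ORDER]
```

Because every reference of a picture appears earlier in this list, each picture can be decoded as soon as it arrives, which is what allows the decoded image buffer to hold the references without any extra reordering stage.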

In addition to motion-compensated prediction, performed as in the conventional AVC/H.264 scheme and similar schemes, the motion/disparity compensation prediction unit 106 performs the disparity-compensated prediction described above. Motion-compensated prediction uses as its reference an image of the same viewpoint that precedes or follows in display order, whereas disparity-compensated prediction can reuse the same processing simply by taking an image of another viewpoint as the reference. Under the control of the encoding management unit 101, the unit performs block matching between the pixel block supplied from the rearrangement buffer 105 and the reference image supplied from the decoded image buffer 112, detects a motion vector (for motion-compensated prediction) or a disparity vector (for disparity-compensated prediction), generates a motion/disparity-compensated prediction block signal, and supplies this block signal together with the motion/disparity vector to the encoding mode determination unit 107.
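Because motion- and disparity-compensated prediction differ only in which decoded picture is passed as the reference, one block-matching routine can serve both. The sketch below is a plain full-search matcher using the sum of absolute differences (SAD) as the matching cost; the exhaustive search, the SAD criterion, the search range, and the function names are illustrative assumptions rather than details from the source.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of luma samples


def sad(cur: Image, ref: Image, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """SAD between the size x size block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def block_match(cur: Image, ref: Image, bx: int, by: int, size: int,
                search: int) -> Tuple[Tuple[int, int], int]:
    """Full search over displacements in [-search, search]^2; returns ((dx, dy), cost).

    With a same-viewpoint reference the vector is a motion vector; with a
    reference from another viewpoint it is a disparity vector."""
    h, w = len(ref), len(ref[0])
    best: Optional[Tuple[Tuple[int, int], int]] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > w or y0 + size > h:
                continue  # candidate block would leave the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is None:
        raise ValueError("no candidate position inside the reference picture")
    return best
```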

The encoding management unit 101 controls the candidate combinations, such as whether motion-compensated or disparity-compensated prediction is used, the number of reference images, which decoded images serve as references, and the pixel block size. Under this control, motion/disparity-compensated prediction is performed for every combination that is a candidate encoding mode, and each resulting prediction block signal and motion/disparity vector is supplied to the encoding mode determination unit 107. The candidate pixel block sizes here are the sub-blocks obtained by further dividing the pixel block. For example, when the pixel block is 16 pixels horizontally by 16 pixels vertically (16 × 16), it is divided into sub-blocks of 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4, and so on, and motion-compensated prediction is performed on each to produce candidates.

The encoding mode determination unit 107 determines which selection and combination of motion-compensated and disparity-compensated prediction, with which reference images and in which pixel block units, yields efficient encoding, and decides the encoding mode accordingly. It supplies the chosen encoding mode and the corresponding motion/disparity vector to the encoded bit string generation unit 113, and the corresponding motion/disparity-compensated prediction block signal to the residual signal calculation unit 108.

For example, when motion-compensated predictions from a preceding and a following reference image on the time axis are combined, a candidate block is generated by averaging, pixel by pixel, the prediction block obtained from the preceding reference image and the one obtained from the following reference image. Motion-compensated and disparity-compensated predictions can also be combined in this way. The pixel values need not be averaged 1:1; weights such as 1:2 or 1:3 may also be applied. Furthermore, when the pixel block is divided into sub-blocks ranging from 4 × 4 to 16 × 16 pixels as encoding mode candidates, a different prediction method can be used for each sub-block.
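The pixel-wise averaging described above can be sketched as an integer weighted mean with round-to-nearest. This is a simple illustration assumed here, not the exact AVC/H.264 weighted-prediction formula (which uses a log2 denominator, shifts, and offsets); `weighted_bipred` is a hypothetical name.

```python
from typing import List

Block = List[List[int]]


def weighted_bipred(p0: Block, p1: Block, w0: int = 1, w1: int = 1) -> Block:
    """Combine two prediction blocks (e.g. preceding/following, or motion/disparity)
    pixel by pixel as (w0*a + w1*b) / (w0 + w1), rounded to nearest."""
    total = w0 + w1
    return [[(w0 * a + w1 * b + total // 2) // total for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]
```

With the default weights this is the 1:1 average; calling it with `w0=1, w1=3` gives the 1:3 weighting mentioned above.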

Various methods exist for determining the encoding mode. One of them calculates a code amount and a distortion amount for each encoding mode and selects the mode that strikes the best balance between the two. In this determination, a residual signal is first calculated for each candidate combination of encoding modes, and the bit length of the sequence obtained by encoding that residual signal, the vectors, and the encoding mode is taken as the code amount. The encoded residual signal is then decoded and added to the prediction signal, and the sum of absolute differences, or the sum of squared differences, between this decoded signal and the image signal before encoding is taken as the distortion amount. The code amount is multiplied by a predetermined multiplier and added to the distortion amount to give an evaluation value, and the candidate combination with the smallest evaluation value is selected as the encoding mode of the pixel block.
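The evaluation value described above is the familiar rate-distortion cost J = D + λ·R, where D is the distortion amount, R the code amount, and λ the predetermined multiplier. A minimal sketch of the selection step, assuming D and R have already been measured for each candidate; the `Candidate` type and the numbers used to exercise it are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    mode: str        # hypothetical label, e.g. "MC 16x16 fwd+bwd"
    bits: int        # code amount R: bits for mode, vectors and residual
    distortion: int  # distortion amount D: SAD or SSD of decoded vs. original


def rd_cost(c: Candidate, lam: float) -> float:
    """Evaluation value J = D + lambda * R."""
    return c.distortion + lam * c.bits


def choose_mode(cands: Sequence[Candidate], lam: float) -> Candidate:
    """Return the candidate with the smallest evaluation value."""
    return min(cands, key=lambda c: rd_cost(c, lam))
```

A larger λ favours cheaper modes: with candidates (R = 100, D = 500) and (R = 40, D = 620), λ = 1 selects the first (600 < 660) while λ = 4 selects the second (780 < 900).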

The residual signal calculation unit 108 obtains a residual signal by subtracting the chosen motion/disparity-compensated prediction block signal, supplied from the encoding mode determination unit 107, from the pixel block signal supplied from the rearrangement buffer 105. The residual signal encoding unit 109 applies residual signal encoding processing such as orthogonal transformation and quantization to the residual signal input from the residual signal calculation unit 108 to produce an encoded residual signal.

The encoding management unit 101 tracks whether the current encoded image will be used as a reference image for motion-compensated prediction of images that follow it in encoding order, or for disparity-compensated prediction of another viewpoint. If it will be used as a reference image, the encoded residual signal is decoded, and the decoded image signal is stored in the decoded image buffer 112 pixel block by pixel block.

First, the residual signal decoding unit 110 applies residual signal decoding processing such as inverse quantization and inverse orthogonal transformation to the encoded residual signal input from the residual signal encoding unit 109, producing a decoded residual signal. The residual signal superimposing unit 111 adds the decoded residual signal supplied from the residual signal decoding unit 110 to the chosen motion/disparity-compensated prediction block signal supplied from the encoding mode determination unit 107 to produce a decoded image signal, which is stored in the decoded image buffer 112 pixel block by pixel block. The decoded image signals stored in the decoded image buffer 112 then serve, as needed, as reference images for motion-compensated prediction of images that follow in encoding order or for disparity-compensated prediction of another viewpoint.
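The residual path through units 108 to 111 can be illustrated end to end. In the sketch below a uniform scalar quantizer stands in for the orthogonal transform plus quantization, which is a simplifying assumption; the point is that the encoder rebuilds each block exactly as a decoder would (prediction plus decoded residual, clipped to the 8-bit range) before storing it as a reference. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def quantize(r: int, qstep: int) -> int:
    """Uniform scalar quantizer, round half away from zero
    (stand-in for orthogonal transform + quantization)."""
    level = (abs(r) + qstep // 2) // qstep
    return level if r >= 0 else -level


def encode_block(orig: Block, pred: Block, qstep: int) -> Block:
    """Units 108/109: residual = original - prediction, then quantize."""
    return [[quantize(o - p, qstep) for o, p in zip(ro, rp)]
            for ro, rp in zip(orig, pred)]


def reconstruct_block(levels: Block, pred: Block, qstep: int) -> Block:
    """Units 110/111: dequantize, add to the prediction, clip to 0..255."""
    return [[min(255, max(0, p + lv * qstep)) for lv, p in zip(rl, rp)]
            for rl, rp in zip(levels, pred)]
```

With a step of 1 the reconstruction equals the original; with coarser steps it differs, which is why the reference buffer must hold the reconstructed block rather than the source block.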

Following the slice information, which contains the decoded image output order number o described above, the encoded bit string generation unit 113 sequentially encodes the chosen encoding mode and the motion or disparity vector input from the encoding mode determination unit 107, the encoded residual signal input from the residual signal encoding unit 109, and so on, using entropy coding such as Huffman coding or arithmetic coding, and thereby generates an encoded bit string.

This pixel-block-level encoding process is repeated until every pixel block in the image being encoded has been encoded, and the image-level encoding process is in turn repeated until all images have been encoded.

The following encoded bit strings are supplied to the multiplexing unit 114 and, as needed, multiplexed into a single encoded bit string: the encoded bit strings of the parameter information for the entire sequence and for each picture, produced by the parameter set encoding unit 102; the encoded bit string of the multi-view image SEI, produced by the multi-view image SEI encoding unit 103; and the encoded bit string produced by the encoded bit string generation unit 113, which contains the slice information including the decoded image output order number o, the encoding mode, the motion or disparity vectors, the encoded residual signal, and so on.

During multiplexing, the data are packetized as needed according to a standard such as MPEG-2 Systems, the MP4 file format, or RTP, and packet headers are added. The encoded bit string (encoded data) multiplexed into a single bit string by the multi-view image encoding device of FIG. 1 is transmitted to the receiving side over a network or the like by a transmission unit (not shown). In systems that provide, separately from the channel carrying the VCL NAL units, a highly reliable channel for NAL units containing important information such as parameter sets, the data are instead multiplexed into several encoded bit strings, one for each channel. This embodiment is not limited to network transmission; it can also be used for recording on storage media such as DVDs, for BS and terrestrial broadcasting, for cable broadcasting, and so on.

Next, the multi-view image encoding procedure performed by the multi-view image encoding device of FIG. 1 is described with reference to the flowchart of FIG. 2. Since the processing in each step is the same as that already described with reference to the block diagram of FIG. 1, only the sequence of steps is described here, with each step mapped to the corresponding part of FIG. 1.

First, in step S101, the parameter information for encoding the entire sequence, the parameter information for encoding each picture, and the like are encoded, generating the corresponding encoded bit strings. Step S101 corresponds to the encoding operation of the parameter set encoding unit 102 in the device of FIG. 1.

Next, in step S102, the encoded bit strings of the sequence-level and picture-level parameter information, and the like, are multiplexed to obtain a multiplexed encoded bit string. Step S102 corresponds to the multiplexing operation of the multiplexing unit 114 in the device of FIG. 1.

In step S103, a multi-view image SEI (supplemental enhancement information) indicating that a multi-view image is encoded is itself encoded, generating the encoded bit string of the multi-view image SEI. Step S103 corresponds to the encoding operation of the multi-view image SEI encoding unit 103 in the device of FIG. 1. Next, in step S104, the multi-view image SEI is multiplexed after the encoded bit string multiplexed in step S102. Step S104 corresponds to the multiplexing operation of the multiplexing unit 114 in the device of FIG. 1.

Next, in step S105, the decoded image output order number o is calculated from the number of viewpoints V of the multi-view image to be encoded, the viewpoint number v identifying each viewpoint, and the output order d of the decoded images at each viewpoint. Step S105 corresponds to the calculation performed by the decoded image output order number calculation unit 104 in the device of FIG. 1.

In step S106, the decoded image output order number o is encoded to generate an encoded bit string. Step S106 corresponds to the encoding of o by the encoded bit string generation unit 113 in the device of FIG. 1. Next, in step S107, motion/disparity-compensated prediction is performed. Step S107 corresponds to the processing of the motion/disparity compensation prediction unit 106 in the device of FIG. 1.

Next, in step S108, the encoding mode is determined. Step S108 corresponds to the processing of the encoding mode determination unit 107 in the device of FIG. 1. In step S109, the motion/disparity-compensated prediction block signal determined in step S108 is subtracted from the pixel block signal to be encoded to obtain a residual signal. Step S109 corresponds to the processing of the residual signal calculation unit 108 in the device of FIG. 1.

Next, in step S110, residual signal encoding processing such as orthogonal transformation and quantization is applied to the residual signal to produce an encoded residual signal. Step S110 corresponds to the processing of the residual signal encoding unit 109 in the device of FIG. 1. In step S111, it is determined whether the current encoded image will be used as a reference image for motion-compensated prediction of images that follow in encoding order, or for disparity-compensated prediction of another viewpoint. Step S111 corresponds to this determination by the encoding management unit 101 in the device of FIG. 1. If the image will be used as a reference image, the process proceeds to step S112; otherwise, it proceeds to step S115.

Next, in step S112, residual signal decoding processing such as inverse quantization and inverse orthogonal transformation is applied to the encoded residual signal to produce a decoded residual signal. Step S112 corresponds to the processing of the residual signal decoding unit 110 in the device of FIG. 1. In step S113, the decoded residual signal is added to the chosen motion/disparity-compensated prediction block signal to produce a decoded image signal. Step S113 corresponds to the processing of the residual signal superimposing unit 111 in the device of FIG. 1. The decoded image signal is stored in the decoded image buffer pixel block by pixel block.

Next, in step S114, the encoding mode determined in step S108, the motion or disparity vector, the encoded residual signal obtained in step S110, and so on are encoded, together with the encoded bit string of the decoded image output order number o from step S106, to generate an encoded bit string. Step S114 corresponds to the processing of the encoded bit string generation unit 113 in the device of FIG. 1.

Next, in step S115, it is determined whether encoding has been completed for all pixel blocks in the image being encoded. If so, the process proceeds to step S116. If not, it returns to step S107, and steps S107 through S114 are repeated until all pixel blocks in the image have been encoded.

Next, in step S116, following the bit strings multiplexed in steps S102 and S104, the encoded bit strings of the decoded image output order number o, the encoding mode, the motion or disparity vectors, the encoded residual signal, and so on are multiplexed, as needed, into one encoded bit string or several. Step S116 corresponds to the multiplexing operation of the multiplexing unit 114 in the device of FIG. 1.

Next, the multiplexing and transmission procedure of the multiplexing unit 114 for transmission over a network is described with reference to FIG. 16. In step S301, the data are packetized as needed according to a standard such as MPEG-2 Systems, the MP4 file format, or RTP. In step S302, packet headers conforming to that standard are added as needed. In step S303, the packets are transmitted over the network.
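As an illustration of steps S301 and S302 for the RTP case, the sketch below builds the 12-byte fixed RTP header defined in RFC 3550 and prepends it to one NAL unit, as in the single NAL unit packet mode of RFC 6184. The dynamic payload type 96, the use of the marker bit for the last packet of a picture, and the function names are assumptions for illustration; encapsulation in MPEG-2 Systems or MP4 would look entirely different.

```python
import struct

RTP_VERSION = 2


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96, marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (RFC 3550): no padding, no extension, no CSRCs."""
    byte0 = RTP_VERSION << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packetize_nal(nal_unit: bytes, seq: int, timestamp: int, ssrc: int,
                  last_of_picture: bool) -> bytes:
    """One NAL unit (without start code) per RTP packet; the marker bit flags
    the last packet of a picture (an assumption of this sketch)."""
    return rtp_header(seq, timestamp, ssrc, marker=last_of_picture) + nal_unit
```

RFC 6184 specifies a 90 kHz RTP clock for H.264, so consecutive pictures at 30 frames/s would differ by 3000 in `timestamp`.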

Returning to FIG. 2: in step S117, it is determined whether encoding has been completed for all images of the multi-view image to be encoded. If so, the multi-view image encoding procedure ends. If not, the process returns to step S105, and steps S105 through S116 are repeated until all images of the multi-view image have been encoded.

Thus, a transmission device that includes the multi-view image encoding device of FIG. 1 encodes and transmits, together with the number of viewpoints V of the multi-view image signal, the decoded image output order number o, which jointly expresses the viewpoint number v identifying the viewpoint of each image signal in the multi-view image signal and the number d indicating the output order of the decoded images at each viewpoint. When the receiving side described below receives and decodes the data encoded in this way, it can output the images appropriately to a multi-view image display device or the like according to the viewpoint number v and the output order number d.
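This excerpt does not give the formula for o. One mapping with the required one-to-one property, assumed here purely for illustration, is o = d·V + v. The sketch below shows how a receiver that knows V could then recover (v, d) from o and group decoded pictures sharing the same d so that all viewpoints of one time instant are output together. The formula and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def output_order_number(v: int, d: int, num_views: int) -> int:
    """Hypothetical mapping o = d * V + v (one-to-one for 0 <= v < V)."""
    if not 0 <= v < num_views:
        raise ValueError("viewpoint number out of range")
    return d * num_views + v


def split_output_order_number(o: int, num_views: int) -> Tuple[int, int]:
    """Inverse of the mapping above: return (v, d)."""
    d, v = divmod(o, num_views)
    return v, d


def group_for_output(decoded_os: Iterable[int],
                     num_views: int) -> List[Tuple[int, List[int]]]:
    """Group decoded pictures by d (ascending), listing their viewpoints v in ascending order."""
    groups: Dict[int, List[int]] = {}
    for o in decoded_os:
        v, d = split_output_order_number(o, num_views)
        groups.setdefault(d, []).append(v)
    return [(d, sorted(vs)) for d, vs in sorted(groups.items())]
```

Pictures arrive in coding order (for example v = 0, 2, 1, 4, 3 at each d), yet the grouping restores viewpoint order within each time instant.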

Next, the multi-view image receiving method, multi-view image receiving device, and multi-view image receiving program of the present invention, which receive and decode the encoded data transmitted by the transmission device including the multi-view image encoding device of FIG. 1, are described with reference to the drawings. FIG. 17 is a block diagram of one embodiment of a multi-view image decoding device that forms the main part of the multi-view image receiving device according to the present invention.

As shown in FIG. 17, the multi-view image decoding apparatus comprises a separation unit 201, a parameter set decoding unit 202, a multi-view image SEI decoding unit 203, an encoded bit string decoding unit 204, a decoded image management information calculation unit 205, a motion/disparity compensation prediction unit 206, a prediction signal synthesis unit 207, a residual signal decoding unit 208, a residual signal superimposing unit 209, a decoded image buffer 210, a decoded image management unit 211, and a decoded image output unit 212. Its input is an encoded bit string obtained by encoding a multi-view image signal, received by a receiving unit (not shown) and demodulated as necessary. The apparatus decodes this bit string and outputs a multi-view image signal.

Next, the operation of the multi-view image decoding apparatus of FIG. 17, which is the main part of the receiving apparatus of the present invention, will be described in relation to the AVC/H.264 coding scheme. First, the separation unit 201 receives an encoded bit string that was encoded by the multi-view image encoding apparatus of FIG. 1 and transmitted over a network. Network reception is not the only way to obtain the encoded bit string in this scheme. It may also be read from a storage medium such as a DVD, or received from a BS or terrestrial broadcast.

The separation unit 201 then removes the packet headers from the supplied encoded bit string and splits it into NAL units. For each NAL unit, it evaluates the identifier in the NAL unit header that indicates the NAL unit type (nal_unit_type) and routes the unit as follows. A NAL unit carrying encoded parameter information for the entire sequence, for pictures, and the like is supplied to the parameter set decoding unit 202. A NAL unit carrying an encoded multi-view image SEI is supplied to the multi-view image SEI decoding unit 203. A VCL NAL unit, that is, an encoded bit string carrying the decoded image output order number o, the coding mode, motion/disparity vectors, the coded residual signal, and so on, is supplied to the encoded bit string decoding unit 204. To decide whether a NAL unit is a multi-view image SEI, the separation unit first uses nal_unit_type to recognize it as SEI. It then evaluates the identifier of the SEI payload type and treats the unit as a multi-view image SEI if that identifier indicates one.
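The routing rule above can be sketched in Python for an H.264 Annex B byte stream. This is a minimal sketch, not the patent's implementation. The NAL unit type values are the standard AVC/H.264 ones: 1 to 5 for VCL slice data, 6 for SEI, and 7 and 8 for SPS and PPS. The payload type of the multi-view image SEI is not given in the source, so `MULTIVIEW_SEI_PAYLOAD_TYPE` is a hypothetical placeholder. Emulation-prevention bytes are not removed, only the first SEI message in a NAL unit is examined, and the destination labels are illustrative.

```python
NAL_SEI, NAL_SPS, NAL_PPS = 6, 7, 8   # standard AVC/H.264 nal_unit_type values
MULTIVIEW_SEI_PAYLOAD_TYPE = 200      # hypothetical: not specified in the source


def split_annex_b(stream: bytes) -> list[bytes]:
    """Return the NAL units found between 0x000001 start codes."""
    units = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # Drop zero bytes that belong to a following 4-byte start code.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        i = nxt
    return units


def sei_payload_type(nal: bytes) -> int:
    """payloadType of the first SEI message: each 0xFF byte adds 255."""
    value, pos = 0, 1  # skip the one-byte NAL unit header
    while nal[pos] == 0xFF:
        value += 255
        pos += 1
    return value + nal[pos]


def route(nal: bytes) -> str:
    """Pick the destination unit for one NAL unit (labels are illustrative)."""
    nal_unit_type = nal[0] & 0x1F  # low 5 bits of the NAL header
    if nal_unit_type in (NAL_SPS, NAL_PPS):
        return "parameter_set_decoder_202"
    if nal_unit_type == NAL_SEI and sei_payload_type(nal) == MULTIVIEW_SEI_PAYLOAD_TYPE:
        return "multiview_sei_decoder_203"
    if 1 <= nal_unit_type <= 5:  # VCL NAL units
        return "bitstream_decoder_204"
    return "other"
```

Routing on the low five bits of the first header byte follows AVC's NAL header layout. SEI messages that are not the multi-view image SEI fall through to "other".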

Next, the parameter set decoding unit 202 decodes the encoded bit strings separated by the separation unit 201 that carry the parameter information for the entire sequence, for pictures, and the like, and thereby obtains that parameter information. (This parameter information is used throughout the multi-view image decoding apparatus, but because it is not needed to explain the present invention, it is not shown in the drawings.)

The multi-view image SEI decoding unit 203 decodes the encoded multi-view image SEI separated by the separation unit 201 according to the syntax structure described with reference to FIG. 3. From this it learns that the content of the encoded bit string supplied to the decoder is a multi-view image, and it obtains the number of viewpoints. Because "num_views_minus1" in the syntax of FIG. 3 carries the number of viewpoints V minus 1, V is obtained by adding 1 to the value of "num_views_minus1".

If the encoded bit string supplied to the multi-view image decoding apparatus contains no multi-view image SEI, its content is judged not to be a multi-view image. The bit string is then treated as one encoded by the conventional single-view AVC/H.264 scheme and is decoded with the number of viewpoints set to 1.

The encoded bit string decoding unit 204 decodes the encoded bit strings separated by the separation unit 201. These bit strings carry the decoded image output order number o, the coding mode, motion/disparity vectors, the coded residual signal (the encoded prediction residual), and so on. Decoding them yields o, the coding mode, the motion/disparity vectors, the coded residual signal, and related data.

The decoded image management information calculation unit 205 takes the number of viewpoints V supplied by the multi-view image SEI decoding unit 203 and the decoded image output order number o supplied by the encoded bit string decoding unit 204. From these it calculates, so as to satisfy Equation (1), two values for each image: the viewpoint number v identifying its viewpoint (an integer with 0 ≤ v < V) and the integer d indicating the output order of decoded images within that viewpoint. Specifically, d is the quotient of the integer division of o by V, and v is the remainder of that division. Alternatively, v may be calculated from Equation (2) after d has been obtained.
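The quotient/remainder rule can be written directly with Python's `divmod`. Equations (1) and (2) are not reproduced in this section. Consistent with the description, the sketch assumes that Equation (1) is o = d × V + v and that Equation (2) is v = o − d × V. The function names are hypothetical.

```python
def split_output_order(o: int, num_views: int) -> tuple[int, int]:
    """Return (v, d) for decoded image output order number o and V viewpoints."""
    if num_views < 1:
        raise ValueError("the number of viewpoints V must be at least 1")
    d, v = divmod(o, num_views)  # d: quotient, v: remainder with 0 <= v < V
    return v, d


def viewpoint_from_d(o: int, d: int, num_views: int) -> int:
    """Alternative route to v once d is known (assumed form of Equation (2))."""
    return o - d * num_views
```

For example, with V = 3 the value o = 7 gives d = 2 and v = 1. When the stream has no multi-view image SEI, V = 1, so every image has v = 0 and d = o, which matches the single-view fallback. Python's floor division keeps v in the range 0 to V − 1 even for negative o. C's truncating `/` and `%` would not, so a C port would need an explicit adjustment.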

Calculating v and d from V and o in this way lets the decoder correctly identify which image of which viewpoint is missing. This works when o is assigned as shown in FIGS. 5 and 6, and also when packets are lost because of transmission errors over the network.
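The following sketch illustrates how v and d can locate a lost image. It rests on a hypothetical assumption: every viewpoint contributes exactly one image for each d, so the o values actually sent are consecutive integers. Under that assumption, each gap in the received o values maps directly to a (v, d) pair. Losses before the first received image or after the last one cannot be detected this way.

```python
def find_missing(received_o: list[int], num_views: int) -> list[tuple[int, int]]:
    """Return (v, d) for every o inside the received range that never arrived.

    Assumes (hypothetically) that the sender used every o in that range.
    """
    if not received_o:
        return []
    got = set(received_o)
    missing = []
    for o in range(min(got), max(got) + 1):
        if o not in got:
            d, v = divmod(o, num_views)  # same split as for received images
            missing.append((v, d))
    return missing
```

With V = 3, receiving o = 0, 1, 2, 4, 5, 7 reports that viewpoint 0 is missing its images for d = 1 and d = 2.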

The viewpoint number v and the output order number d calculated by the decoded image management information calculation unit 205 are supplied to the decoded image management unit 211, described later. They are supplied together with the number of viewpoints V from the multi-view image SEI decoding unit 203 and the decoded image output order number o from the encoded bit string decoding unit 204. The decoded image management unit 211 uses them to manage the decoded images stored in the decoded image buffer 210.

As on the encoding side, a conventional single-view decoding scheme can be extended into a multi-view decoding scheme while remaining compatible with it. To do so, the multi-view image decoding apparatus treats the decoded image output order number o of this scheme as the number that indicates the output order of decoded images in the conventional single-view scheme.

For example, when AVC/H.264 is extended to multi-view coding, o is treated as the picture order count, the number that indicates the output order of decoded images in AVC/H.264. Because the multi-view image decoding apparatus decodes data in the order in which they were encoded, the decoding order equals the encoding order. The encoded bit string decoding unit 204 also obtains, for each pixel block to be decoded, information such as its coding mode, its motion vector or disparity vector, and its coded residual signal (the encoded prediction residual).

Next, the motion/disparity compensation prediction unit 206 performs motion-compensated or disparity-compensated prediction according to the coding mode and the motion/disparity vector decoded by the encoded bit string decoding unit 204. In this prediction, the unit refers, according to the coding mode, to an image supplied from the decoded image buffer 210. It takes the pixel block at the position indicated by the decoded motion/disparity vector as the motion/disparity-compensated prediction block. The pixel block may be divided into smaller blocks, each with its own prediction method and motion/disparity vector, and it may also be predicted from multiple reference pictures. In such cases, multiple motion/disparity-compensated predictions are performed, yielding multiple prediction blocks.
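A minimal sketch of fetching one prediction block follows. It handles integer-sample vectors only (AVC/H.264 also interpolates fractional sample positions, which is omitted here) and clamps coordinates at the picture edges. For disparity-compensated prediction, the reference would be a decoded picture of another viewpoint; the function itself does not distinguish the two cases. Pictures are plain lists of rows, and all names are illustrative.

```python
def predict_block(ref, x, y, mv, w, h):
    """Copy the w x h block at (x + mvx, y + mvy) from reference picture ref.

    Integer-sample vectors only; out-of-picture coordinates are clamped to
    the nearest edge sample.
    """
    mvx, mvy = mv
    height, width = len(ref), len(ref[0])
    block = []
    for j in range(h):
        row_idx = min(max(y + mvy + j, 0), height - 1)
        block.append([ref[row_idx][min(max(x + mvx + i, 0), width - 1)]
                      for i in range(w)])
    return block
```

Whether `ref` is an earlier picture of the same viewpoint (motion) or a same-time picture of another viewpoint (disparity) is decided by the coding mode before this call.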

When the pixel block has been divided into smaller blocks or predicted from multiple reference pictures, the prediction signal synthesis unit 207 combines the multiple prediction blocks to generate the prediction signal for the pixel block. Meanwhile, the residual signal decoding unit 208 applies residual decoding, such as inverse quantization and inverse orthogonal transform, to the coded residual signal supplied by the encoded bit string decoding unit 204, producing a decoded residual signal.

The residual signal superimposing unit 209 adds the decoded residual signal from the residual signal decoding unit 208 to the prediction signal from the prediction signal synthesis unit 207, producing the decoded image signal. It stores the decoded image signal in the decoded image buffer 210 one pixel block at a time. As needed, decoded image signals stored in the decoded image buffer 210 serve as reference images for decoding later images in coding order.
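The synthesis and superposition steps can be sketched as follows. The rounding average (a + b + 1) >> 1 is AVC/H.264's default combination of two prediction blocks; weighted prediction is not shown. The 8-bit default sample range is an assumption about the bit depth, and the function names are hypothetical.

```python
def bipred_average(p0, p1):
    """Combine two prediction blocks sample by sample: (a + b + 1) >> 1."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


def reconstruct(pred, resid, bit_depth=8):
    """Add the decoded residual to the prediction and clip to the sample range."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, resid)]
```

The clipped result is what would be written into the decoded image buffer and later used as a reference picture.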

This block-by-block decoding is repeated until every pixel block in the image has been decoded.

The decoded image management unit 211 manages the decoded image signals stored in the decoded image buffer 210 in association with the parameters supplied by the decoded image management information calculation unit 205. These parameters are the number of viewpoints V, the decoded image output order number o, the viewpoint number v identifying each image's viewpoint, and the number d indicating the output order of decoded images within each viewpoint. Based on these parameters, the decoded image management unit 211 decides whether to output each decoded image stored in the decoded image buffer 210.

When the decoding order differs from the output order of decoded images, the output timings differ and a delay is required, and in some cases a decoded image is not output. For each decoded image signal stored in the decoded image buffer 210, the decoded image management unit 211 identifies the viewpoint from v and manages the output order within that viewpoint using d. It synchronizes the viewpoints with one another so that the decoded images of all viewpoints having the same d are output simultaneously or consecutively, and it controls output in ascending order of d. Under this control, the decoded image output unit 212 outputs the decoded images stored in the decoded image buffer 210 to a multi-view image display device or the like, with the decoded image signals of the viewpoints synchronized with one another. The viewpoint number v and the number d are associated with each viewpoint's decoded image signal and, as necessary, output together with it.
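The output control described here can be sketched as a small buffer manager. It makes several hypothetical assumptions: d starts at `first_d` and increases by one, every viewpoint eventually supplies one image per d, and o = d × V + v. Loss handling and the retention of reference pictures are omitted, and the class name is illustrative. Each group it returns is ready to be sent simultaneously (parallel channels) or consecutively (one channel).

```python
class OutputManager:
    """Hold decoded images until all V viewpoints of the next d are present."""

    def __init__(self, num_views: int, first_d: int = 0):
        if num_views < 1:
            raise ValueError("V must be at least 1")
        self.num_views = num_views
        self.next_d = first_d
        self.pending: dict[int, dict[int, object]] = {}  # d -> {v: image}

    def store(self, o: int, image) -> list[tuple[int, list]]:
        """Register one decoded image; return the (d, [image for v = 0..V-1])
        groups that are now complete, in ascending order of d."""
        d, v = divmod(o, self.num_views)
        self.pending.setdefault(d, {})[v] = image
        ready = []
        while len(self.pending.get(self.next_d, {})) == self.num_views:
            views = self.pending.pop(self.next_d)
            ready.append((self.next_d, [views[k] for k in range(self.num_views)]))
            self.next_d += 1
        return ready
```

A group is released only when every viewpoint for that d has arrived. This mirrors the rule that output starts once all viewpoints have been decoded, so no viewpoint is dropped, and it needs no separate resynchronization buffer afterwards.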

There are two ways to output the decoded viewpoint images M(v) with the viewpoints synchronized. The first outputs each viewpoint's image signal in parallel on its own independent channel. The second interleaves the viewpoints' image signals and outputs them serially on a single channel. When the decoded image output unit 212 outputs the viewpoints in parallel on independent channels, it proceeds in ascending order of d. For each d, it outputs the image signals of all viewpoints at the same time instant, that is, the decoded image signals of all viewpoints having the same d, synchronized with one another.

Refer to FIG. 4, used earlier in describing the multi-view image encoding apparatus. For each viewpoint image M(v), outputting its images m(v, d) in ascending order of d delivers the decoded images in the order desired for display on the destination multi-view image display device or the like. Outputting the decoded image signals of all viewpoints having the same d at the same time synchronizes the viewpoints with one another. Starting output only after the image signals of all viewpoints have been decoded ensures that no viewpoint's decoded image signal is missing from the output.

When the decoded image output unit 212 outputs the viewpoints serially on one channel as an interleaved signal, it also proceeds in ascending order of d. For each d, it interleaves the images of all viewpoints at the same time instant, that is, the decoded image signals of all viewpoints having the same d, so that they are output in synchronization. The viewpoints can be interleaved for serial output per pixel, per group of pixels, per horizontal line, per image, or per group of images.

The interleave structures of the output decoded image signal are the same as those of the viewpoint images input to the multi-view image encoding apparatus described above, shown in FIGS. 7 to 14. As with parallel output on independent channels, output proceeds in ascending order of d when interleaving per pixel, per group of pixels, or per horizontal line. For each d, the image signals of all viewpoints having that d are interleaved per pixel, per pixel group, or per horizontal line and output, as shown for example in FIGS. 7 to 12. The image signals of the viewpoints are thereby output at the same time, and the viewpoints are output in synchronization with one another. Starting interleaving and output only after the image signals of all viewpoints to be interleaved have been decoded ensures that no viewpoint's decoded image signal is missing from the output.
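Line and pixel interleaving of one same-d group can be sketched as below. The arrangement shown (view 0 first, then view 1, and so on, for each row or each pixel) is one plausible layout; the exact layouts of FIGS. 7 to 12 are not reproduced in this section. Images are lists of rows of equal size, and the function names are hypothetical.

```python
def interleave_lines(views):
    """Per-line interleave: row 0 of every view, then row 1 of every view, ..."""
    height = len(views[0])
    return [views[v][y] for y in range(height) for v in range(len(views))]


def interleave_pixels(views):
    """Per-pixel interleave inside each row: view 0 sample, view 1 sample, ..."""
    height, width = len(views[0]), len(views[0][0])
    return [[views[v][y][x] for x in range(width) for v in range(len(views))]
            for y in range(height)]
```

Both functions need every view of the group before they can produce the first output row. That is why the text starts interleaving only after all of the group's viewpoints have been decoded.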

When interleaving per image, as shown in FIG. 13, the image signals of all viewpoints having the same d form one group. Within the group, the images are output consecutively in ascending order of v. The groups are then output in ascending order of d, which keeps the viewpoints synchronized with one another on a per-image basis.
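The per-image ordering of FIG. 13 amounts to sorting by d first and by v second. The sketch takes decoded images keyed by (v, d), and the function name is hypothetical. If o = d × V + v (an assumption about Equation (1)), the same order is simply ascending o.

```python
def picture_interleave(images: dict[tuple[int, int], object]) -> list:
    """Serialize images keyed by (v, d): groups in ascending d, ascending v inside."""
    order = sorted(images, key=lambda vd: (vd[1], vd[0]))
    return [images[vd] for vd in order]
```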

As described above, the multi-view image decoding apparatus of FIG. 17 outputs the decoded images stored in the decoded image buffer 210 to a multi-view image display device or the like in a format matching the input of the destination device. It uses either parallel output, with each viewpoint on its own independent channel, or serial output on one channel, with the viewpoints interleaved. The conventional stereoscopic image decoding method and apparatus arrange the decoded images sequentially frame by frame or field by field, output them, and then resynchronize them. The present apparatus has no such need, requires no image buffer for resynchronization, and can therefore shorten the delay.

The above decoding process is repeated until the entire encoded bit string to be decoded has been decoded.

In the above description, the decoded image management unit 211 manages the decoded image signals stored in the decoded image buffer 210 in association with the parameters supplied by the decoded image management information calculation unit 205: the number of viewpoints V, the decoded image output order number o, the viewpoint number v, and the number d. Alternatively, a viewpoint ID for managing decoded images may be provided in place of the calculated viewpoint number v. The viewpoints of decoded images can then be managed, and decoded images output, on the basis of that viewpoint ID.

However, the calculated viewpoint number v and the viewpoint ID used for managing decoded images must correspond one to one. Their values may be the same or different. The viewpoint ID may be generated by the decoding apparatus as needed. Alternatively, the encoding side may encode the viewpoint ID and the decoder may use the decoded value. In that case, the encoding management unit 101 of FIG. 1 manages the one-to-one correspondence between viewpoint IDs and viewpoint numbers v. The multi-view image SEI encoding unit 103 then encodes the viewpoint IDs so that the decoding side can determine this correspondence.

FIG. 19 shows an example of the coding syntax of the multi-view image SEI encoded by the multi-view image SEI encoding unit 103 in this case. In FIG. 19(a), "num_views_minus1" is, as described with reference to FIG. 3, the parameter indicating the number of viewpoints of the multi-view image to be encoded. Because the number of viewpoints is at least 1, the value obtained by subtracting 1 from it is encoded. The code may be a variable-length code, such as the Exponential-Golomb code provided in AVC/H.264, or a fixed-length code of predefined length. view_id[v] is the viewpoint ID used to manage the decoded images of the viewpoint indicated by the viewpoint number v. It is likewise encoded with a variable-length code such as the Exponential-Golomb code of AVC/H.264, or with a fixed-length code of predefined length. The for loop indicates that the viewpoint IDs are encoded in ascending order of v, starting from the viewpoint with v = 0.
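The FIG. 19(a) syntax could be parsed with an unsigned Exp-Golomb (ue(v)) reader like the one below. This sketch makes two assumptions: num_views_minus1 and every view_id[v] use ue(v) rather than the fixed-length alternative the text also allows, and the SEI payload has already been extracted with emulation-prevention bytes removed. The class and function names are hypothetical.

```python
class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        self.pos += 1
        return (byte >> (7 - ((self.pos - 1) & 7))) & 1

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """ue(v): count leading zeros, then read that many suffix bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_multiview_sei_19a(payload: bytes) -> tuple[int, list[int]]:
    """Return (V, view_ids) where view_ids[v] is view_id[v], v = 0..V-1."""
    r = BitReader(payload)
    num_views = r.read_ue() + 1  # V = num_views_minus1 + 1
    view_ids = [r.read_ue() for _ in range(num_views)]
    return num_views, view_ids
```

For example, the bits 011 | 1 | 00110 | 010 (bytes 0x73, 0x20 after zero padding) decode to num_views_minus1 = 2, so V = 3, and to view_id = [0, 5, 1].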

On the decoding side, the multi-view image SEI decoding unit 203 of FIG. 17 decodes according to the syntax structure of FIG. 19(a). In addition to the number of viewpoints V, it obtains view_id[v], the viewpoint ID for managing the decoded images of the viewpoint indicated by each viewpoint number v. The one-to-one correspondence between viewpoint numbers v and viewpoint IDs is passed through the decoded image management information calculation unit 205 to the decoded image management unit 211, which manages it. As needed, the decoded image output unit 212 associates the viewpoint ID, rather than v, with each viewpoint's decoded image signal and outputs the two together.

The description of FIG. 19(a) assumes that the IDs are encoded in ascending order of the viewpoint number v, starting from v = 0. The viewpoint IDs may instead be encoded in the encoding/decoding order of the viewpoints at the same time instant. FIG. 19(b) shows an example of the coding syntax of the multi-view image SEI encoded by the multi-view image SEI encoding unit 103 in this case. Here i denotes the encoding/decoding order of the viewpoints at the same time instant. It starts at 0 and increases by one for each viewpoint in encoding/decoding order. view_id[i] is the viewpoint ID for managing the decoded images of the viewpoint indicated by i. It is encoded with a variable-length code, such as the Exponential-Golomb code provided in AVC/H.264, or with a fixed-length code of predefined length. The for loop indicates that the viewpoint IDs are encoded in ascending order of i, starting from the viewpoint with i = 0.

On the decoding side, the multi-view image SEI decoding unit 203 of FIG. 17 decodes according to the syntax structure of FIG. 19(b). In addition to the number of viewpoints V, it obtains view_id[i], the viewpoint ID for managing the decoded images of the viewpoint indicated by the encoding/decoding order number i. Viewpoint IDs correspond one to one with i, and viewpoint numbers v also correspond one to one with i. Consequently, the viewpoint number v and the viewpoint ID correspond one to one as well. These one-to-one correspondences are passed through the decoded image management information calculation unit 205 to the decoded image management unit 211, which manages them. As needed, the decoded image output unit 212 associates the viewpoint ID, rather than v, with each viewpoint's decoded image signal and outputs the two together.
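To resolve the FIG. 19(b) list to viewpoint numbers, the decoder needs to know which v is coded at each position i. The source states only that this correspondence is one to one. The sketch therefore takes it as an input list, `v_of_i` (a hypothetical name), and checks that both mappings are bijective before composing them.

```python
def view_id_by_v(view_id_by_i: list[int], v_of_i: list[int]) -> dict[int, int]:
    """Compose i -> view_id with i -> v to obtain v -> view_id."""
    n = len(view_id_by_i)
    if len(v_of_i) != n or sorted(v_of_i) != list(range(n)):
        raise ValueError("v_of_i must be a permutation of 0..V-1")
    if len(set(view_id_by_i)) != n:
        raise ValueError("viewpoint IDs must be distinct (one-to-one)")
    return {v: view_id_by_i[i] for i, v in enumerate(v_of_i)}
```

For the FIG. 19(a) syntax the composition is unnecessary, because view_id is already listed in order of v and `dict(enumerate(view_ids))` gives the same mapping.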

Next, the multi-view image decoding procedure of the multi-view image receiving method according to the present invention will be described with reference to the flowchart of FIG. 18. The processing in each step is the same as that described with the block diagram of FIG. 17. Only the procedure is therefore described here, with each step tied to the corresponding part of FIG. 17.

First, in step S201, the received encoded bitstream is separated into NAL units. In a receiving apparatus that receives multi-view image encoded data transmitted over a network, step S201 is carried out by the reception and separation procedure shown in FIG. 20. In step S401 of FIG. 20, a receiving unit (not shown in FIG. 17) receives the encoded bitstream over the network. In step S402, the receiving unit decodes and removes the packet headers that were added according to the standard used to carry the bitstream, such as MPEG-2 Systems, the MP4 file format, or RTP. The separation unit 201 of FIG. 17 then separates the resulting encoded bitstream into NAL units (step S403).
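For the NAL unit separation of step S403, one common case is a stream framed with start codes. The sketch below assumes that, after the packet headers are removed in step S402, the payload is an H.264 Annex B byte stream (as when AVC is carried in MPEG-2 Systems). MP4 files store length-prefixed NAL units and RTP payloads carry NAL units directly, so those paths would need a different splitter. The function names are illustrative.

```python
from typing import List


def _strip_trailing_zeros(data: bytes) -> bytes:
    # A NAL unit never ends in 0x00, so trailing zeros belong to the next
    # start code (0x00000001) or to trailing_zero_8bits padding.
    end = len(data)
    while end > 0 and data[end - 1] == 0:
        end -= 1
    return data[:end]


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed).

    Emulation-prevention bytes (0x03) are left in place.
    """
    nals: List[bytes] = []
    start = -1  # first byte of the current NAL unit; -1 until a start code is seen
    i = 0
    while i + 2 < len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            if start >= 0:
                nals.append(_strip_trailing_zeros(stream[start:i]))
            i += 3
            start = i
        else:
            i += 1
    if 0 <= start < len(stream):
        nals.append(_strip_trailing_zeros(stream[start:]))
    return [nal for nal in nals if nal]  # drop empties from back-to-back start codes


stream = b"\x00\x00\x00\x01\x67\xAA\x00\x00\x01\x68\xBB\x00\x00\x00\x01\x65\xCC\xDD"
print(split_annexb(stream))  # [b'g\xaa', b'h\xbb', b'e\xcc\xdd']
```

Both 3-byte and 4-byte start codes are handled, because the extra leading zero of a 4-byte code is stripped from the end of the preceding NAL unit.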

Returning to FIG. 18, the identifier that distinguishes the NAL unit type (nal_unit_type), contained in the header of the NAL unit separated in step S201, is evaluated to determine whether the NAL unit is a parameter set, such as parameter information related to encoding of the entire sequence or parameter information related to picture encoding (step S202). If it is a parameter set, the process proceeds to step S206; if it is not a parameter set but is determined to be SEI (step S203), the process proceeds to step S205. In step S205, the identifier that distinguishes the type of the SEI payload is evaluated, and if the SEI is the multi-view image SEI, the process proceeds to step S207.

If the NAL unit is neither a parameter set nor SEI, the process proceeds to step S204. Step S204 determines whether the NAL unit is a VCL NAL unit, that is, an encoded bitstream in which the decoded image output order number o, the coding mode, motion or disparity vectors, the coded residual signal, and the like are encoded; if it is a VCL NAL unit, the process proceeds to step S208. Steps S201 to S205 correspond to the operation of the separation unit 201 in the multi-view image decoding apparatus of FIG. 17.
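The branching of steps S202 to S205 can be illustrated with the H.264/AVC NAL unit types. This is a minimal sketch, assuming AVC-style one-byte NAL headers (nal_unit_type in the low five bits; SEI = 6, SPS = 7, PPS = 8, coded slices = 1 to 5). The payloadType of the multi-view image SEI is not given in this passage, so it is passed in as a hypothetical parameter, and the "skip" return for other NAL units is an assumption about what the flowchart does with them.

```python
# nal_unit_type values from H.264/AVC (ITU-T H.264, Table 7-1)
NAL_SEI = 6
NAL_SPS = 7
NAL_PPS = 8
VCL_TYPES = range(1, 6)  # coded slice NAL unit types 1..5


def read_first_sei_payload_type(sei_rbsp: bytes) -> int:
    """Read the first payloadType of an SEI message (each 0xFF byte adds 255)."""
    value, pos = 0, 0
    while sei_rbsp[pos] == 0xFF:
        value += 255
        pos += 1
    return value + sei_rbsp[pos]


def route_nal(nal: bytes, multiview_sei_type: int) -> str:
    """Return the step a NAL unit is routed to (S206/S207/S208) or "skip"."""
    nal_unit_type = nal[0] & 0x1F             # low 5 bits of the 1-byte NAL header
    if nal_unit_type in (NAL_SPS, NAL_PPS):   # S202: parameter set
        return "S206"
    if nal_unit_type == NAL_SEI:              # S203: SEI
        if read_first_sei_payload_type(nal[1:]) == multiview_sei_type:  # S205
            return "S207"
        return "skip"                         # other SEI: assumed ignored here
    if nal_unit_type in VCL_TYPES:            # S204: VCL NAL unit
        return "S208"
    return "skip"                             # e.g. access unit delimiter (type 9)
```

Only the first SEI message in the NAL unit is inspected, which keeps the sketch short; a full parser would loop over every message in the SEI RBSP.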

If the NAL unit is a parameter set, step S206 decodes the encoded bitstreams of the parameter information related to encoding of the entire sequence and of the parameter information related to picture encoding, obtaining the sequence-level parameter information, the picture-level parameter information, and so on. Step S206 corresponds to the decoding operation of the parameter set decoding unit 202 in the multi-view image decoding apparatus of FIG. 17. The process then returns to the separation processing of step S201.

If step S205 determines that the NAL unit is the multi-view image SEI, the process proceeds to step S207, where the encoded bitstream of the multi-view image SEI is decoded to obtain the information that the content of the encoded bitstream is a multi-view image, together with the number of viewpoints V. Step S207 corresponds to the decoding operation of the multi-view image SEI decoding unit 203 in the multi-view image decoding apparatus of FIG. 17. The process then returns to the separation processing of step S201.

If step S204 determines that the NAL unit is a VCL NAL unit, the processing of steps S208 through S218 below is performed. First, in step S208, the encoded bitstream containing information such as the decoded image output order number o is decoded to obtain o. Step S208 corresponds to the decoding of the decoded image output order number o by the encoded bitstream decoding unit 204 in the multi-view image decoding apparatus of FIG. 17.

Next, in step S209, the viewpoint number v (an integer of 0 or more and less than V) identifying the viewpoint of each image and the number d (an integer) indicating the output order of the decoded images at each viewpoint are calculated from the number of viewpoints V obtained in step S207 and the decoded image output order number o obtained in step S208, d being the quotient and v the remainder of the integer division of o by V. Step S209 corresponds to the operation of the decoded image management information calculation unit 205 in the multi-view image decoding apparatus of FIG. 17.
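The calculation of step S209 reduces to a single integer division. The sketch below follows the quotient/remainder rule stated in the claims; the function name `split_output_order` is illustrative.

```python
from typing import Tuple


def split_output_order(o: int, V: int) -> Tuple[int, int]:
    """Step S209: return (v, d) for decoded image output order number o."""
    if V < 1:
        raise ValueError("V must be at least 1")
    if o < 0:
        raise ValueError("o must be non-negative")
    d, v = divmod(o, V)  # d = quotient, v = remainder (0 <= v < V)
    return v, d


# With V = 3 viewpoints, o = 0, 1, 2 share d = 0, and o = 3, 4, 5 share d = 1.
for o in range(6):
    print(o, split_output_order(o, 3))
```

On the encoding side the same relationship runs in reverse, o = d × V + v, which is why a single counter such as a picture order count can carry both the viewpoint and the per-viewpoint output order.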

Next, in step S210, the encoded bitstream containing information such as the coding mode, motion/disparity vectors, and the coded residual signal (the encoded prediction residual signal) is decoded to obtain the coding mode, the motion/disparity vectors, the coded residual signal, and so on. Step S210 corresponds to the decoding of the coding mode, motion/disparity vectors, coded residual signal, and so on by the encoded bitstream decoding unit 204 in the multi-view image decoding apparatus of FIG. 17.

Next, in step S211, motion-compensated prediction or disparity-compensated prediction is performed using the coding mode and the motion/disparity vectors obtained in step S210 to obtain a prediction block. When the pixel block is divided into smaller blocks, or is predicted from multiple reference pictures, multiple motion/disparity-compensated predictions are performed to obtain multiple prediction blocks. Step S211 corresponds to the operation of the motion/disparity compensation prediction unit 206 in the multi-view image decoding apparatus of FIG. 17.

Next, in step S212, when the pixel block is divided into smaller blocks or is predicted from multiple reference pictures, the prediction blocks obtained in step S211 are combined into the prediction signal of the pixel block. Step S212 corresponds to the operation of the prediction signal synthesis unit 207 in the multi-view image decoding apparatus of FIG. 17.

Next, in step S213, residual signal decoding such as inverse quantization and inverse orthogonal transform is applied to the coded residual signal obtained in step S210 to generate a decoded residual signal. Step S213 corresponds to the operation of the residual signal decoding unit 208 in the multi-view image decoding apparatus of FIG. 17.
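As one concrete instance of the inverse orthogonal transform in step S213, the sketch below implements the 4x4 inverse integer transform of H.264/AVC. This assumes an AVC-based codec (the document cites AVC/H.264 only as an example), and the inverse quantization (scaling) step is omitted: the input is assumed to be already-scaled coefficients.

```python
from typing import List

Block = List[List[int]]


def _inverse_butterfly(c: List[int]) -> List[int]:
    # One 1-D pass of the H.264 4x4 inverse core transform.
    e0 = c[0] + c[2]
    e1 = c[0] - c[2]
    e2 = (c[1] >> 1) - c[3]
    e3 = c[1] + (c[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(coeffs: Block) -> Block:
    """Map scaled 4x4 coefficients (row-major) to a 4x4 residual block."""
    rows = [_inverse_butterfly(r) for r in coeffs]  # horizontal pass
    # Vertical pass; cols[x][y] holds column x after the transform.
    cols = [_inverse_butterfly([rows[y][x] for y in range(4)]) for x in range(4)]
    return [[(cols[x][y] + 32) >> 6 for x in range(4)] for y in range(4)]  # rounding shift


dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(inverse_transform_4x4(dc_only))  # every residual sample equals 1
```

The transform uses only additions and shifts, so encoder and decoder reconstruct bit-identical residuals.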

Next, in step S214, the decoded residual signal obtained in step S213 is added to the prediction signal obtained in step S212 to obtain the decoded image signal. Step S214 corresponds to the operation of the residual signal superimposing unit 209 in the multi-view image decoding apparatus of FIG. 17. In the next step S215, the decoded image signal obtained in step S214 is stored in the decoded image buffer. Step S215 corresponds to the storing operation into the decoded image buffer 210 in the multi-view image decoding apparatus of FIG. 17.
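Steps S212 and S214 can be sketched for the case of prediction from two reference pictures. The combination rule is an assumption: this passage does not specify it, so the sketch uses the rounded average that H.264/AVC applies for default (unweighted) bi-prediction, and it does not cover the sub-block case, where prediction blocks are placed side by side rather than averaged. Samples are assumed to be 8-bit by default.

```python
from typing import List

Block = List[List[int]]


def synthesize_prediction(blocks: List[Block]) -> Block:
    """Step S212 (sketch): combine one or two prediction blocks of equal size."""
    if len(blocks) == 1:
        return [row[:] for row in blocks[0]]
    if len(blocks) == 2:
        a, b = blocks
        # Rounded average, as in default (unweighted) AVC bi-prediction.
        return [[(x + y + 1) >> 1 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    raise ValueError("this sketch handles one or two prediction blocks only")


def reconstruct(pred: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Step S214 (sketch): add the decoded residual and clip to the sample range."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, residual)]


pred = synthesize_prediction([[[10, 20]], [[11, 20]]])
print(reconstruct(pred, [[250, -30]]))  # [[255, 0]]
```

Clipping keeps the decoded image signal within the valid sample range before it is stored in the decoded image buffer in step S215.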

Next, step S216 determines whether decoding has been completed for all pixel blocks in the VCL NAL unit. If so, the process proceeds to step S217. If not, the process returns to step S210, and steps S210 to S215 are repeated until decoding is completed for all pixel blocks in the VCL NAL unit.

Next, step S217 determines whether to output decoded images. If so, the process proceeds to step S218; if not, it proceeds to step S219. Step S217 corresponds to the management operation of the decoded image management unit 211 in the multi-view image decoding apparatus of FIG. 17. In step S218, the decoded images of the viewpoints are output in synchronization with one another. Step S218 corresponds to the operation of the decoded image output unit 212 in the multi-view image decoding apparatus of FIG. 17.
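The decision of step S217 and the synchronized output of step S218 can be sketched as a buffer keyed by d. This assumes a simple hypothetical output rule: a set of pictures sharing the same d is released once all V viewpoints for that d have been decoded, and d starts at 0 and increases by 1. The class and method names are illustrative, not the patent's.

```python
from typing import Any, Dict, List, Optional, Tuple


class DecodedImageManager:
    """Sketch of units 211/212: group decoded pictures by d, emit synchronized sets."""

    def __init__(self, V: int) -> None:
        self.V = V
        self.pending: Dict[int, Dict[int, Any]] = {}  # d -> {v: picture}
        self.next_d = 0  # hypothetical: d assumed to start at 0 and step by 1

    def store(self, o: int, picture: Any) -> None:
        # Step S215 plus the S209 calculation: file the picture under (d, v).
        d, v = divmod(o, self.V)
        self.pending.setdefault(d, {})[v] = picture

    def pop_ready(self) -> Optional[List[Tuple[int, Any]]]:
        """Steps S217/S218: return [(v, picture), ...] once all V views of next_d exist."""
        group = self.pending.get(self.next_d)
        if group is None or len(group) < self.V:
            return None  # S217: not ready, do not output yet
        del self.pending[self.next_d]
        self.next_d += 1
        return sorted(group.items())  # output all viewpoints together, ordered by v


manager = DecodedImageManager(V=2)
manager.store(1, "R0")          # right view decoded first
print(manager.pop_ready())      # None: the left view for d = 0 is still missing
manager.store(0, "L0")
print(manager.pop_ready())      # [(0, 'L0'), (1, 'R0')]
```

Because decoding order need not match output order, buffering by d lets the output unit release each time instant only when every viewpoint is available.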

If step S217 determines not to output a decoded image, the process proceeds to step S219, which determines whether decoding of the entire encoded bitstream to be decoded has been completed. If so, this multi-view image decoding procedure ends. If not, the process returns to the first step, S201, and steps S201 to S218 are repeated until the entire encoded bitstream has been decoded.

In addition, the multi-view image encoding apparatus of FIG. 1 encodes, together with the number of viewpoints V of the multi-view image signal, the decoded image output order number o, which collectively represents the viewpoint number v identifying the viewpoint of each image constituting the multi-view image signal and the number d indicating the output order of the decoded images at each viewpoint. The multi-view image decoding apparatus can therefore reliably decode data encoded in this way based on the decoded value of o, and can appropriately output the images to a multi-view image display device or the like according to the viewpoint number v of each viewpoint image and the number d indicating the output order at each viewpoint.

Furthermore, when a conventional single-view image encoding/decoding scheme is extended to a multi-view scheme, the decoded image output order number o of this scheme, which collectively represents the viewpoint number v and the output-order number d, can be handled and encoded/decoded as the output-order number of decoded images in the conventional single-view scheme (for example, the picture order count of AVC/H.264). Compatibility with the conventional single-view scheme can thus be achieved with only minor modifications.

The present invention is not limited to the embodiment described above. For example, in the multi-view image encoding apparatus of FIG. 1, input images are stored directly in the rearrangement buffer 105 without delay. However, when pixels must be rearranged during storage in the rearrangement buffer 105, for example because the format of the input images differs from the storage format of the rearrangement buffer 105, a line buffer for temporary storage may be provided. The input image signal may then be written temporarily into the line buffer and have its pixels rearranged as it is stored in the rearrangement buffer 105, or its pixels may be rearranged as it is written into the line buffer and then stored in the rearrangement buffer 105.

Similarly, in the multi-view image decoding apparatus of FIG. 17, when pixels must be rearranged while the decoded images stored in the decoded image buffer 210 are output to a multi-view image display device or the like, a line buffer for temporary storage may be provided inside or outside the decoded image output unit 212. The decoded image signal read from the decoded image buffer 210 may be written temporarily into the line buffer and then output with its pixels rearranged, or its pixels may be rearranged as it is written into the line buffer and then output.

In the above description of the multi-view image decoding apparatus of FIG. 17, the decoded image output unit 212 outputs the decoded images of the respective viewpoints stored in the decoded image buffer 210 in synchronization with one another, under the control of the decoded image management unit 211, based on the viewpoint number v and the output-order number d that the decoded image management information calculation unit 205 calculates from the number of viewpoints V and the decoded image output order number o. However, when the viewpoints are interleaved picture by picture and output serially on a single channel, the images can instead be output based on o. In this case, the decoded images are output in ascending order of their decoded image output order number o, each associated with, and output together with, the viewpoint number v or viewpoint ID identifying its viewpoint; the decoded image signals of the viewpoints are thereby interleaved picture by picture and output in synchronization with one another. This picture-by-picture output order is equivalent to the picture-by-picture order described with reference to FIG. 13. Thus, even when decoded images are output according to o, associating each image with the viewpoint number v (calculated by the decoded image management information calculation unit 205 from V and o) or with the viewpoint ID, and outputting it together with these parameters, allows the decoded image signal of each viewpoint to be output appropriately and lets the destination, such as a multi-view image display device, determine which image belongs to which viewpoint.
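The single-channel, picture-interleaved output described above can be sketched as a sort on o followed by tagging each picture with its viewpoint. For brevity, this sketch assumes the decoded pictures are available as a batch of (o, picture) pairs; a real decoder would release them incrementally from the decoded image buffer. The function name and the optional v-to-view-ID table are illustrative.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def serial_output(decoded: Iterable[Tuple[int, Any]], V: int,
                  v_to_view_id: Optional[Dict[int, int]] = None) -> List[Tuple[int, Any]]:
    """Interleave decoded pictures on one channel in ascending o.

    decoded: (o, picture) pairs in any order (e.g. decoding order).
    Each output item is (tag, picture), where tag is the viewpoint number v,
    or the viewpoint ID when a v -> view ID table is supplied.
    """
    out: List[Tuple[int, Any]] = []
    for o, picture in sorted(decoded, key=lambda item: item[0]):
        v = o % V  # remainder of o / V identifies the viewpoint
        tag = v_to_view_id[v] if v_to_view_id is not None else v
        out.append((tag, picture))
    return out


decoded = [(2, "L1"), (0, "L0"), (3, "R1"), (1, "R0")]
print(serial_output(decoded, V=2))  # [(0, 'L0'), (1, 'R0'), (0, 'L1'), (1, 'R1')]
```

Sorting on o alone yields the picture-by-picture interleaving directly, since o = d × V + v orders first by time instant and then by viewpoint.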

In the above description, the multi-view images used for encoding and decoding may be multi-view images actually captured from different viewpoints. Viewpoint images that have been converted or generated, for example by interpolating the image at a virtual viewpoint position that was not actually captured from the surrounding viewpoints, can also be encoded and decoded, and this is included in the present invention. Multi-view images such as computer graphics can likewise be encoded and decoded, and this is also included in the present invention.

For example, a multi-view image signal containing the image signals of four viewpoints A, B, C, and D covers three possible cases: (1) all four image signals are obtained by actually shooting at the respective viewpoints; (2) all four image signals are generated as if virtually shot at the respective viewpoints; and (3) actually shot and virtually generated image signals are mixed, for example the image signals of viewpoints A and B are obtained by actual shooting while those of viewpoints C and D are generated as if virtually shot.

The viewpoints of the multi-view images used in the present invention may be arranged in any manner, as described with reference to FIGS. 21 to 24. The numbers shown in FIGS. 21 to 24 are the viewpoint numbers v (v = 0, 1, 2, ...) indicating the viewpoint positions. FIG. 21 shows an example in which the viewpoints are arranged horizontally, with the images captured by cameras lined up horizontally. FIG. 22 shows an example in which the viewpoints are arranged vertically, with the images captured by cameras lined up vertically. FIGS. 23 and 24 show examples in which the viewpoints are arranged two-dimensionally, horizontally and vertically, with the images captured by cameras laid out in a two-dimensional grid. The values of the viewpoint number v must correspond one-to-one to the viewpoints, each viewpoint being assigned an integer of 0 or more and less than the number of viewpoints V, but the values may be assigned in any order as long as the encoding side and the decoding side agree on it, for example by means of camera parameters.
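For a two-dimensional camera grid, any numbering works as long as it is a bijection onto 0 to V−1 that both sides agree on. The two orders below (row-by-row raster and alternating-direction serpentine) are hypothetical examples for illustration; they are not claimed to reproduce the numbering drawn in FIGS. 23 and 24.

```python
from typing import Dict, Tuple

Grid = Dict[Tuple[int, int], int]  # (row, column) -> viewpoint number v


def raster_numbering(rows: int, cols: int) -> Grid:
    # Left to right within a row, rows top to bottom.
    return {(r, c): r * cols + c for r in range(rows) for c in range(cols)}


def serpentine_numbering(rows: int, cols: int) -> Grid:
    # Alternate direction on every row, so consecutive v stay physically adjacent.
    mapping: Grid = {}
    for r in range(rows):
        order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for k, c in enumerate(order):
            mapping[(r, c)] = r * cols + k
    return mapping


def is_valid_numbering(mapping: Grid, V: int) -> bool:
    # v must map one-to-one onto the integers 0..V-1.
    return sorted(mapping.values()) == list(range(V))


grid = serpentine_numbering(2, 3)
print(grid[(1, 2)], grid[(1, 0)], is_valid_numbering(grid, 6))  # 3 5 True
```

Either mapping satisfies the constraint stated above; which one is used only needs to be conveyed consistently, for instance alongside the camera parameters.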

The multi-view image encoding and decoding processes described above can, of course, be implemented in hardware as transmission, storage, and receiving apparatuses, and can also be implemented by firmware stored in ROM (read-only memory), flash memory, or the like, or by software on a computer or the like. When implemented as software on a computer, the general-purpose RAM (random access memory) of the computer can be used as the rearrangement buffer and the decoded image buffer. When the encoded bitstream is transmitted over a network, it can be transmitted through a network interface or the like installed in the computer. The firmware or software program may be provided recorded on a computer-readable recording medium, supplied from a server over a wired or wireless network, or delivered as data broadcasting over terrestrial or satellite digital broadcasts.

FIG. 1 is a block diagram of an example of a multi-view image encoding apparatus that generates the encoded multi-view image data received according to the present invention.
FIG. 2 is a flowchart for explaining the multi-view image encoding process of FIG. 1.
FIG. 3 is a diagram illustrating an example of the coding syntax structure of a first multi-view image SEI.
FIG. 4 is a diagram illustrating an example of values assigned to the viewpoint number v identifying the viewpoint of each image and to the number d indicating the output order of the decoded images at each viewpoint.
FIG. 5 is a diagram illustrating a first example of values assigned to the decoded image output order number o of each image.
FIG. 6 is a diagram illustrating a second example of values assigned to the decoded image output order number o of each image.
FIG. 7 is a diagram illustrating an example in which the signals of the viewpoints are interleaved pixel by pixel.
FIG. 8 is a diagram illustrating an example in which the signals of the viewpoints are interleaved in units of groups of pixels.
FIG. 9 is a diagram illustrating an example in which the signals of the viewpoints are interleaved in pixel-block units such as 16×16 or 16×8 pixels.
FIG. 10 is a diagram illustrating an example in which the signals of the viewpoints are interleaved in units of horizontal lines.
FIG. 11 is a diagram illustrating an example in which the signals of the viewpoints are interleaved in a format that combines them into a single image.
FIG. 12 is a diagram illustrating an example in which the signals of the viewpoints are interleaved in slice units, each slice grouping multiple lines.
FIG. 13 is a diagram illustrating an example in which the signals of the viewpoints are interleaved picture by picture.
FIG. 14 is a diagram illustrating an example in which the signals of the viewpoints are interleaved in units of groups of pictures.
FIG. 15 is a diagram illustrating an example of the coding order of the images and the reference relationships for motion compensation/disparity compensation.
FIG. 16 is a flowchart for explaining the multiplexing process of FIG. 2.
FIG. 17 is a block diagram of an embodiment of a multi-view image decoding apparatus, which is the main part of the multi-view image receiving apparatus of the present invention.
FIG. 18 is a flowchart for explaining the multi-view image decoding process of FIG. 17.
FIG. 19 is a diagram illustrating an example of the coding syntax structure of a second multi-view image SEI.
FIG. 20 is a flowchart for explaining the separation process of FIG. 18.
FIG. 21 is a diagram illustrating an example in which the viewpoints are arranged horizontally.
FIG. 22 is a diagram illustrating an example in which the viewpoints are arranged vertically.
FIG. 23 is a diagram illustrating an example in which the viewpoints are arranged two-dimensionally, horizontally and vertically.
FIG. 24 is a diagram illustrating another example in which the viewpoints are arranged two-dimensionally, horizontally and vertically.
FIG. 25 is a block diagram of an example of a conventional stereoscopic multi-view image encoding apparatus.
FIG. 26 is a block diagram of an example of a conventional stereoscopic image decoding apparatus.

Reference Signs List

101 Encoding management unit
102 Parameter set encoding unit
103 Multi-view image SEI encoding unit
104 Decoded image output order number calculation unit
105 Rearrangement buffer
106 Motion/disparity compensation prediction unit
107 Coding mode determination unit
108 Residual signal calculation unit
109 Residual signal encoding unit
110 Residual signal decoding unit
111 Residual signal superimposing unit
112 Decoded image buffer
113 Encoded bitstream generation unit
114 Multiplexing unit
201 Separation unit
202 Parameter set decoding unit
203 Multi-view image SEI decoding unit
204 Encoded bitstream decoding unit
205 Decoded image management information calculation unit
206 Motion/disparity compensation prediction unit
207 Prediction signal synthesis unit
208 Residual signal decoding unit
209 Residual signal superimposing unit
210 Decoded image buffer
211 Decoded image management unit
212 Decoded image output unit

Claims

A multi-view image receiving method for receiving encoded data obtained by encoding a multi-view image signal and decoding the encoded data, the multi-view image signal including image signals of respective viewpoints obtained at a plurality of set viewpoints, the image signal of each viewpoint being either an image signal obtained by actually shooting from that viewpoint or an image signal generated as if virtually shot from that viewpoint,
A first step of receiving encoded data in which supplementary additional information indicating that the multi-view image signal is encoded, the number V of viewpoints of the multi-view image signal, a decoded image output order number o (o being an integer of 0 or more) that collectively indicates a number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint, and the multi-view image signal are each encoded;
A second step of separating the encoded data received in the first step into first encoded data, in which the supplementary additional information and the number V of viewpoints of the multi-view image signal are encoded, and second encoded data, in which the decoded image output order number o and the multi-view image signal are encoded;
A third step of decoding the first encoded data to generate the supplementary additional information including the number of viewpoints V of the multi-view image signal;
A fourth step of decoding the second encoded data to generate the decoded image output order number o;
A fifth step of calculating the quotient (an integer of 0 or more) of the integer division of the decoded image output order number o by the decoded number of viewpoints V as the number d indicating the output order of the decoded images at each viewpoint, and calculating the remainder of the division (an integer of 0 or more and less than V) as the number v identifying each viewpoint;
A sixth step of decoding the second encoded data to generate a decoded multi-viewpoint image signal;
A seventh step of storing the decoded multi-viewpoint image signal in an image buffer;
An eighth step of extracting the decoded multi-view image signal from the image buffer according to the number d indicating the output order of the decoded images at each viewpoint and the number v identifying each viewpoint, both calculated in the fifth step, and outputting the decoded image signals of the viewpoints constituting the decoded multi-view image signal in synchronization with one another;
A multi-viewpoint image receiving method.

A multi-view image receiving method for receiving encoded data obtained by encoding a multi-view image signal and decoding the encoded data, the multi-view image signal including image signals of respective viewpoints obtained at a plurality of set viewpoints, the image signal of each viewpoint being either an image signal obtained by actually shooting from that viewpoint or an image signal generated as if virtually shot from that viewpoint,
A first step of receiving encoded data in which supplementary additional information indicating that a multi-view image is encoded, the number V of viewpoints of the multi-view image signal, a correspondence between a number v identifying each viewpoint of the multi-view image and a viewpoint ID that identifies the viewpoint for management of the decoded images, a decoded image output order number o (o being an integer of 0 or more) that collectively indicates the number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint, and the multi-view image signal are each encoded;
A second step of separating, from the encoded data received in the first step, first encoded data in which the supplementary additional information, the number V of viewpoints of the multi-view image signal, and the correspondence between the number v identifying each viewpoint of the multi-view image and the viewpoint ID are encoded, and second encoded data in which the decoded image output order number o and the multi-view image signal are respectively encoded;
A third step of decoding the first encoded data to generate the supplementary additional information, including the number V of viewpoints of the multi-view image signal and the correspondence between the number v identifying each viewpoint of the multi-view image and the viewpoint ID;
A fourth step of decoding the second encoded data to generate the decoded image output order number o;
A fifth step of calculating the quotient (an integer of 0 or more) obtained by integer division of the decoded image output order number o by the decoded number of viewpoints V as a number d indicating the output order of the decoded images at each viewpoint, and calculating the remainder of the division (an integer of 0 or more and less than V) as a number v identifying each of the viewpoints;
A sixth step of decoding the second encoded data to generate a decoded multi-viewpoint image signal;
A seventh step of storing the decoded multi-viewpoint image signal in an image buffer;
An eighth step of extracting the decoded multi-view image signal from the image buffer according to the number d indicating the output order of the decoded images at each viewpoint, calculated in the fifth step, and the viewpoint ID, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with each other;
A multi-viewpoint image receiving method.
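In the viewpoint-ID variant, the remainder v is translated into a viewpoint ID through the correspondence carried in the first encoded data. The sketch below assumes that correspondence has already been decoded into a Python dict; `to_view_id_key` and `validate_mapping` are hypothetical names, and the dict is an assumed in-memory representation, not a bitstream syntax.

```python
from typing import Dict, Tuple


def validate_mapping(num_views: int, v_to_view_id: Dict[int, int]) -> None:
    """Check that the decoded correspondence covers v = 0..V-1 with distinct IDs."""
    if sorted(v_to_view_id) != list(range(num_views)):
        raise ValueError("correspondence must define exactly v = 0 .. V-1")
    if len(set(v_to_view_id.values())) != num_views:
        raise ValueError("viewpoint IDs must be distinct")


def to_view_id_key(o: int, num_views: int, v_to_view_id: Dict[int, int]) -> Tuple[int, int]:
    """Return (d, viewpoint ID) for the picture with output order number o."""
    d, v = divmod(o, num_views)  # fifth step: quotient d, remainder v
    return d, v_to_view_id[v]    # map v to the viewpoint ID used for buffer management


# Hypothetical correspondence in which the index v differs from the viewpoint ID.
mapping = {0: 0, 1: 2, 2: 1}
validate_mapping(3, mapping)
print(to_view_id_key(5, 3, mapping))  # (1, 1)
```

Keying the extraction by (d, viewpoint ID) lets the output stage address each view by the identifier used for decoded image management, independently of the order in which the viewpoints are numbered by v.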

A multi-view image receiving device that receives and decodes encoded data obtained by encoding a multi-view image signal, the multi-view image signal including image signals of the respective viewpoints obtained at a plurality of set viewpoints, wherein the image signal of one viewpoint is either an image signal actually captured from that viewpoint or an image signal generated as a virtual image taken from that viewpoint, the device comprising:
Receiving means for receiving encoded data in which each of the following is encoded: supplementary additional information indicating that a multi-view image is encoded; the number V of viewpoints of the multi-view image signal; a decoded image output order number o (o is an integer of 0 or more) that collectively indicates the number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint; and the multi-view image signal;
Separating means for separating, from the encoded data received by the receiving means, first encoded data in which the supplementary additional information indicating that a multi-view image is encoded and the number V of viewpoints of the multi-view image signal are encoded, and second encoded data in which the decoded image output order number o and the multi-view image signal are encoded;
First decoding means for decoding the separated first encoded data and generating the supplementary additional information including the number of viewpoints V of the multi-view image signal;
Second decoding means for decoding the separated second encoded data and generating the decoded image output order number o;
Calculating means for calculating the quotient (an integer of 0 or more) obtained by integer division of the decoded image output order number o by the decoded number of viewpoints V as a number d indicating the output order of the decoded images at each viewpoint, and calculating the remainder of the division (an integer of 0 or more and less than V) as a number v identifying each of the viewpoints;
Third decoding means for decoding the second encoded data and generating a decoded multi-viewpoint image signal;
Storage means for storing the decoded multi-viewpoint image signal in an image buffer;
Output means for extracting the decoded multi-view image signal from the image buffer according to the number d indicating the output order of the decoded images at each viewpoint and the number v identifying each of the viewpoints, both supplied from the calculating means, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with each other;
A multi-viewpoint image receiving device comprising the above means.
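The storage means and output means can be pictured as a buffer indexed by (d, v). The claims do not say when a set of pictures is released; the sketch below assumes, as one possible policy, that the V pictures sharing a value of d are output together as soon as all of them have been decoded, in ascending order of d. `SyncOutputBuffer` is a hypothetical name, and the stored strings are placeholders for decoded image signals.

```python
from collections import defaultdict
from typing import Any, Dict, List


class SyncOutputBuffer:
    """Toy decoded image buffer keyed by (d, v) with synchronized release."""

    def __init__(self, num_views: int) -> None:
        self.num_views = num_views
        self._pending: Dict[int, Dict[int, Any]] = defaultdict(dict)
        self._next_d = 0  # next per-viewpoint output order number to release

    def store(self, o: int, picture: Any) -> List[List[Any]]:
        """Store one decoded picture, then return every complete set that can
        now be output, each set ordered by v and the sets ordered by d."""
        d, v = divmod(o, self.num_views)
        if d < self._next_d:
            raise ValueError("picture arrived after its set was already output")
        self._pending[d][v] = picture
        released = []
        # .get() does not insert into the defaultdict, so missing d stays absent.
        while len(self._pending.get(self._next_d, {})) == self.num_views:
            views = self._pending.pop(self._next_d)
            released.append([views[i] for i in range(self.num_views)])
            self._next_d += 1
        return released


buf = SyncOutputBuffer(num_views=2)
print(buf.store(1, "B0"))  # []
print(buf.store(0, "A0"))  # [['A0', 'B0']]
```

In this sketch each picture is placed directly by its (d, v) key, so no separate sorting by viewpoint is performed before output; the abstract identifies such sorting buffers as the source of delay.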

A multi-view image receiving device that receives and decodes encoded data obtained by encoding a multi-view image signal, the multi-view image signal including image signals of the respective viewpoints obtained at a plurality of set viewpoints, wherein the image signal of one viewpoint is either an image signal actually captured from that viewpoint or an image signal generated as a virtual image taken from that viewpoint, the device comprising:
Receiving means for receiving encoded data in which each of the following is encoded: supplementary additional information indicating that a multi-view image is encoded; the number V of viewpoints of the multi-view image signal; the correspondence between the number v identifying each viewpoint of the multi-view image and a viewpoint ID that identifies each viewpoint for management of the decoded images; a decoded image output order number o (o is an integer of 0 or more) that collectively indicates the number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint; and the multi-view image signal;
Separating means for separating, from the received encoded data, first encoded data in which the supplementary additional information, the number V of viewpoints of the multi-view image signal, and the correspondence between the number v identifying each viewpoint of the multi-view image and the viewpoint ID are encoded, and second encoded data in which the decoded image output order number o and the multi-view image signal are respectively encoded;
First decoding means for decoding the separated first encoded data to generate the supplementary additional information, including the number V of viewpoints of the multi-view image signal and the correspondence between the number v identifying each viewpoint of the multi-view image and the viewpoint ID that identifies each viewpoint for management of the decoded images;
Second decoding means for decoding the separated second encoded data and generating the decoded image output order number o;
Calculating means for calculating the quotient (an integer of 0 or more) obtained by integer division of the decoded image output order number o by the decoded number of viewpoints V as a number d indicating the output order of the decoded images at each viewpoint, and calculating the remainder of the division (an integer of 0 or more and less than V) as a number v identifying each of the viewpoints;
Third decoding means for decoding the second encoded data and generating a decoded multi-viewpoint image signal;
Storage means for storing the decoded multi-viewpoint image signal in an image buffer;
Output means for extracting the decoded multi-view image signal from the image buffer according to the number d indicating the output order of the decoded images at each viewpoint, supplied from the calculating means, and the viewpoint ID, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with each other;
A multi-viewpoint image receiving device comprising the above means.
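The separating means and the viewpoint-ID output path can be combined in a toy receiver. Everything about the data model here is an assumption: `HeaderUnit` stands in for the first encoded data (supplementary information, V, and the correspondence from v to viewpoint ID), `PictureUnit` stands in for the second encoded data with its picture already decoded, and no real bitstream parsing is attempted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class HeaderUnit:
    """Hypothetical container for the first encoded data."""
    num_views: int                # V
    v_to_view_id: Dict[int, int]  # correspondence v -> viewpoint ID


@dataclass
class PictureUnit:
    """Hypothetical container for the second encoded data (picture already decoded)."""
    o: int        # decoded image output order number
    picture: str  # placeholder for the decoded image signal


Unit = Union[HeaderUnit, PictureUnit]


def separate(units: List[Unit]) -> Tuple[List[HeaderUnit], List[PictureUnit]]:
    """Separating means: split received units into first and second encoded data."""
    first = [u for u in units if isinstance(u, HeaderUnit)]
    second = [u for u in units if isinstance(u, PictureUnit)]
    return first, second


def frames_by_view_id(units: List[Unit]) -> Dict[int, Dict[int, str]]:
    """Group decoded pictures into output sets keyed by d, then by viewpoint ID."""
    first, second = separate(units)
    if len(first) != 1:
        raise ValueError("this toy model expects exactly one header unit")
    header = first[0]
    frames: Dict[int, Dict[int, str]] = {}
    for unit in second:
        d, v = divmod(unit.o, header.num_views)  # calculating means
        view_id = header.v_to_view_id[v]
        frames.setdefault(d, {})[view_id] = unit.picture
    return frames


stream = [
    HeaderUnit(num_views=2, v_to_view_id={0: 10, 1: 20}),
    PictureUnit(0, "L0"), PictureUnit(1, "R0"),
    PictureUnit(2, "L1"), PictureUnit(3, "R1"),
]
print(frames_by_view_id(stream))
# {0: {10: 'L0', 20: 'R0'}, 1: {10: 'L1', 20: 'R1'}}
```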

A multi-view image receiving program that causes a computer to receive encoded data obtained by encoding a multi-view image signal and to decode the received encoded data, the multi-view image signal including image signals of the respective viewpoints obtained at a plurality of set viewpoints, wherein the image signal of one viewpoint is either an image signal actually captured from that viewpoint or an image signal virtually generated as seen from that viewpoint, the program causing the computer to execute:
A first step of receiving encoded data in which each of the following is encoded: supplementary additional information indicating that the multi-view image signal is encoded; the number V of viewpoints of the multi-view image signal; a decoded image output order number o (o is an integer of 0 or more) that collectively indicates the number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint; and the multi-view image signal;
A second step of separating, from the encoded data received in the first step, first encoded data in which the supplementary additional information and the number V of viewpoints of the multi-view image signal are encoded, and second encoded data in which the decoded image output order number o and the multi-view image signal are respectively encoded;
A third step of decoding the first encoded data to generate the supplementary additional information including the number of viewpoints V of the multi-view image signal;
A fourth step of decoding the second encoded data to generate the decoded image output order number o;
A fifth step of calculating the quotient (an integer of 0 or more) obtained by integer division of the decoded image output order number o by the decoded number of viewpoints V as a number d indicating the output order of the decoded images at each viewpoint, and calculating the remainder of the division (an integer of 0 or more and less than V) as a number v identifying each of the viewpoints;
A sixth step of decoding the second encoded data to generate a decoded multi-viewpoint image signal;
A seventh step of storing the decoded multi-viewpoint image signal in an image buffer;
An eighth step of extracting the decoded multi-view image signal from the image buffer according to the number d indicating the output order of the decoded images at each viewpoint and the number v identifying each of the viewpoints, both calculated in the fifth step, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with each other;
A multi-viewpoint image receiving program characterized by causing the computer to execute the above steps.

A multi-view image receiving program that causes a computer to receive encoded data obtained by encoding a multi-view image signal and to decode the received encoded data, the multi-view image signal including image signals of the respective viewpoints obtained at a plurality of set viewpoints, wherein the image signal of one viewpoint is either an image signal actually captured from that viewpoint or an image signal virtually generated as seen from that viewpoint, the program causing the computer to execute:
A first step of receiving encoded data in which each of the following is encoded: supplementary additional information indicating that a multi-view image is encoded; the number V of viewpoints of the multi-view image signal; the correspondence between the number v identifying each viewpoint of the multi-view image and a viewpoint ID that identifies each viewpoint for management of the decoded images; a decoded image output order number o (o is an integer of 0 or more) that collectively indicates the number v identifying each viewpoint and a number d indicating the output order of the decoded images at each viewpoint; and the multi-view image signal;
A second step of separating, from the encoded data received in the first step, first encoded data in which the supplementary additional information, the number V of viewpoints of the multi-view image signal, and the correspondence between the number v identifying each viewpoint of the multi-view image and the viewpoint ID are encoded, and second encoded data in which the decoded image output order number o and the multi-view image signal are respectively encoded;
A third step of decoding the first encoded data to generate the supplementary additional information, including the number V of viewpoints of the multi-view image signal and the correspondence between the number v identifying each viewpoint of the multi-view image and the viewpoint ID;
A fourth step of decoding the second encoded data to generate the decoded image output order number o;
A fifth step of calculating the quotient (an integer of 0 or more) obtained by integer division of the decoded image output order number o by the decoded number of viewpoints V as a number d indicating the output order of the decoded images at each viewpoint, and calculating the remainder of the division (an integer of 0 or more and less than V) as a number v identifying each of the viewpoints;
A sixth step of decoding the second encoded data to generate a decoded multi-viewpoint image signal;
A seventh step of storing the decoded multi-viewpoint image signal in an image buffer;
An eighth step of extracting the decoded multi-view image signal from the image buffer according to the number d indicating the output order of the decoded images at each viewpoint, calculated in the fifth step, and the viewpoint ID, and outputting the decoded image signals of the respective viewpoints constituting the decoded multi-view image signal in synchronization with each other;
A multi-viewpoint image receiving program characterized by causing the computer to execute the above steps.