JP4944046B2

JP4944046B2 - Video encoding method, decoding method, encoding device, decoding device, program thereof, and computer-readable recording medium

Info

Publication number: JP4944046B2
Application number: JP2008000263A
Authority: JP
Inventors: Shinya Shimizu; Hideaki Kimata; Kazuto Kamikura; Yoshiyuki Yashima
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-01-07
Filing date: 2008-01-07
Publication date: 2012-05-30
Anticipated expiration: 2028-01-07
Also published as: JP2009164865A

Description

The present invention relates to a method that, for multi-view images and multi-view video, uses known video signals and distance information to generate distance information and video signals for another viewpoint. It also relates to techniques for encoding and decoding multi-view images and multi-view video using that method.

A multi-view image is a set of images of the same subject and background captured by multiple cameras, and a multi-view video (multi-view moving image) is the moving-image counterpart. In the following, a video captured by a single camera is called a “two-dimensional video”, and a group of two-dimensional videos capturing the same subject and background is called a multi-view video.

A two-dimensional video has strong correlation in the time direction, and coding efficiency is improved by exploiting that correlation. In multi-view images and multi-view video, on the other hand, when the cameras are synchronized, the frames of the cameras at the same time instant show the subject and background in exactly the same state from different positions, so there is strong correlation between cameras. Coding of multi-view images and multi-view video can exploit this correlation to improve coding efficiency.

First, prior art on two-dimensional video coding is described. Many conventional two-dimensional video coding schemes, including the international coding standards H.264, MPEG-2, and MPEG-4, achieve high coding efficiency by using motion compensation, orthogonal transform, quantization, and entropy coding. Motion compensation is the technique that exploits temporal correlation between frames.

Details of the motion compensation used in H.264 are given in Non-Patent Document 1 below; an outline follows. In H.264 motion compensation, the frame to be encoded is divided into blocks of various sizes. For each block, an already encoded frame, called a reference frame, is selected, and the video is predicted using vector information that indicates corresponding points, called a motion vector. Seven block partitions are allowed: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, so that the prediction can follow differences in the direction and magnitude of subject motion at fine granularity. This reduces the residual to be encoded, which is the difference between the predicted image and the original image, and thus achieves high coding efficiency.

Next, conventional coding schemes for multi-view images and multi-view video are described. The difference between multi-view image coding and multi-view video coding is that multi-view video has correlation in the time direction in addition to the correlation between cameras. The method of exploiting inter-camera correlation, however, can be the same in both cases. The methods used in multi-view video coding are therefore described here.

For multi-view video coding, schemes have long existed that exploit inter-camera correlation and encode multi-view video efficiently by “disparity compensation”, in which motion compensation is applied between images from different cameras at the same time instant. Here, disparity is the difference between the positions at which the same point on the subject is projected onto the image planes of cameras placed at different positions.

FIG. 1 shows a conceptual diagram of the disparity that arises between cameras. The diagram looks straight down on the image planes of cameras whose optical axes are parallel. The positions at which the same point on the subject is projected onto the image planes of different cameras are generally called corresponding points. Based on this correspondence, disparity compensation predicts each pixel value of the frame to be encoded from the reference frame, and encodes the prediction residual together with the disparity information that indicates the correspondence.

Many methods express disparity as a vector on the image plane. For example, Non-Patent Document 2 includes a mechanism for block-based disparity compensation in which the disparity of each block is expressed as a two-dimensional vector, that is, by two parameters (an x component and a y component). In other words, this method encodes disparity information consisting of two parameters together with the prediction residual.

In contrast, Patent Document 1 (video encoding method, video decoding method, video encoding program, video decoding program, and computer-readable recording medium on which these programs are recorded) uses camera parameters for encoding and expresses the disparity vector as one-dimensional information based on the epipolar geometry constraint, thereby encoding the prediction information efficiently.

FIG. 2 shows a conceptual diagram of the epipolar geometry constraint. According to this constraint, for two cameras (camera 1 and camera 2), the point in one image that corresponds to a point in the other image is constrained to lie on a straight line called the epipolar line. In the method of Patent Document 1, the position on the epipolar line is indicated by a single parameter, the distance from the camera that captured the reference frame to the subject, and this one parameter expresses the disparity for all frames to be encoded.
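The one-parameter description above can be illustrated with a minimal sketch. It assumes a deliberately simplified setup that is not taken from the patent: two pinhole cameras with identical orientation, a shared focal length `f`, the principal point at the origin, and camera 2 displaced by `baseline` along the x axis. Under those assumptions the epipolar line of a pixel is a horizontal line, and the distance alone fixes the position on it.

```python
def project_to_camera2(u, v, depth, f=500.0, baseline=0.1):
    """Map pixel (u, v) of camera 1, seen at the given depth, into camera 2.

    Hypothetical setup: both cameras share focal length f, principal point
    (0, 0) and orientation; camera 2 sits `baseline` to the right of camera 1.
    """
    # Back-project the pixel to a 3-D point in camera-1 coordinates.
    X = u * depth / f
    Y = v * depth / f
    Z = depth
    # Express the point in camera-2 coordinates (pure shift along x).
    X2 = X - baseline
    # Project onto the image plane of camera 2.
    return (f * X2 / Z, f * Y / Z)


# The same pixel at four candidate distances: only the x coordinate changes,
# so all candidates lie on one (here horizontal) epipolar line.
points = [project_to_camera2(40.0, 25.0, z) for z in (1.0, 2.0, 5.0, 50.0)]
```

With general camera orientations the line is no longer horizontal, but the correspondence is still indexed by the single distance parameter.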

Non-Patent Document 3 describes a technique, usable in implementing the present invention, for generating a disparity-compensated image while assuming continuity between adjacent pixels.

Non-Patent Document 4 describes a method that uses a disparity-compensated image (synthesized image) as a reference image for the image to be encoded, and encodes while adding vector information that indicates the prediction target.

Patent Document 1: JP 2007-036800 A
Non-Patent Document 1: ITU-T Rec. H.264 / ISO/IEC 14496-10, “Advanced Video Coding”, Final Committee Draft, Document JVT-E022, September 2002.
Non-Patent Document 2: Hideaki Kimata and Masaki Kitahara, “Preliminary results on multiple view video coding (3DAV)”, document M10976, MPEG Redmond Meeting, July 2004.
Non-Patent Document 3: Shinya Shimizu, Masaki Kitahara, Hideaki Kimata, Kazuto Kamikura and Yoshiyuki Yashima, “View Scalable Multiview Video Coding using 3-D Warping with Depth Map”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 11, pp. 1485-1495, 2007.
Non-Patent Document 4: Masaki Kitahara, Hideaki Kimata, Shinya Shimizu, Kazuto Kamikura and Yoshiyuki Yashima, “Multi-view Video Coding using View Interpolation and Reference Picture Selection”, IEEE International Conference on Multimedia and Expo, pp. 97-100, 2006.

According to the conventional multi-view video coding method, when the camera parameters are known, the epipolar geometry constraint makes it possible to realize disparity compensation for the frames to be encoded of all cameras, regardless of the number of cameras, merely by encoding one-dimensional information for the reference frame: the distance from the camera to the subject. The disparity information can thus be encoded efficiently.

However, as the camera that captured the reference frame and the camera of the frame to be encoded move farther apart, the quality of the predicted video degrades because of illumination differences and occlusion. As a result, a large prediction residual signal must be encoded, and efficient coding cannot be achieved.

A similar problem arises in ordinary video coding, where prediction efficiency drops as the time interval between the reference frame and the frame to be encoded grows. General video coding addresses this by setting a different reference frame for each frame to be encoded.

Therefore, an obvious way to avoid the loss of prediction quality caused by a larger camera separation is to use, for each camera, frames from a different camera as reference frames.

In the conventional multi-view video coding scheme, however, the distance to the subject must be encoded for the reference frame, so using a different reference frame for each camera requires encoding multiple sets of distance information. This lowers coding efficiency.

The present invention has been made in view of these circumstances. Its object is to realize efficient disparity compensation without encoding distance information for multiple cameras, even when a different reference camera is used for each camera in multi-view video coding, by generating distance information from known distance information and video signals, and thereby to achieve higher coding efficiency than before.

To solve the problem described above, when a frame captured from a viewpoint different from the one for which distance information is given is used as a reference frame, the present invention converts, as part of both encoding and decoding, the distance information given for one viewpoint into distance information for another viewpoint. As a result, even when video (images) captured by various cameras is used as reference frames, there is no need to transmit multiple sets of distance information, and each camera can use as its reference a camera whose video signal is more similar. In other words, prediction efficiency improves without changing the amount of code, so efficient coding can be realized.

Because this conversion follows the physical process by which a camera captures objects in real space, it can be performed with very high accuracy provided the projective transformation of the camera can be modeled adequately.

Between videos captured by different cameras there are regions that are captured by only one of them. The conversion above converts distance information given for an image captured by one camera into distance information for an image captured from another viewpoint, so it cannot provide valid distance information for regions that are captured by the destination camera but not by the source camera. The present invention therefore provides, in addition to the conversion, means for generating distance information for such regions.

Specifically, distance information is generated on the assumption that adjacent pixels in the destination image have the same camera-to-subject distance. Because this processing rests on the fact that subjects exist continuously in real space, it can generate accurate distance information. Continuity does not hold across different subjects, however, so when several subjects are present in the region for which distance information is to be generated, the accuracy of this means may drop. In that case, the subjects can be extracted from the destination image by edge extraction or by region segmentation using color information, and distance information for the unknown region can be generated from distance information given on the same subject nearby, which yields more accurate results. For something like scenery that lies sufficiently far from the camera, different subjects can be treated as having the same distance information. This is because the sampling performed when the camera captures the scene is discrete.

When distance information for the destination camera at another time instant is available, it can also be used to generate the distance information of the unknown region at the current time. This means rests on the fact that a subject cannot move instantaneously in real space, so it can generate accurate distance information. The simplest method is to copy the distance information at the same pixel position. This is effective when the positional relationship between the subject and the camera does not change, for example when a fixed camera captures a background that changes little over time. Even when the subject changes over time, accurate distance information can be generated by extracting the subject's motion from the image at the time for which distance information is known and the image at the current time, and then using the distance information of the corresponding pixels for the unknown region. When corresponding points are found from the image signals, using the factorization method makes it possible to generate more accurate distance information even when the camera itself is moving.
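The simplest temporal method mentioned above, copying the distance at the same pixel position, might be sketched as follows. The function name, the list-of-rows layout, and the use of `None` as the marker for an unknown distance are assumptions made for this illustration.

```python
def fill_from_previous(depth_now, depth_prev, hole=None):
    """Fill unknown entries of depth_now with the co-located value of depth_prev.

    Both maps are lists of rows for the same (fixed) camera at two time
    instants; entries that are `hole` in depth_now are treated as unknown.
    """
    return [
        [prev if cur is hole else cur for cur, prev in zip(row_now, row_prev)]
        for row_now, row_prev in zip(depth_now, depth_prev)
    ]
```

This only makes sense under the condition stated in the text, namely that the camera is fixed and the copied region changes little over time.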

The distance information generation means based on spatial continuity and the one based on temporal continuity, both introduced here, may be used in combination. Exploiting continuity along two different axes makes it possible to generate more accurate distance information.

When distance information is generated, or when it is corrected at the time of storage, the condition that a subject in a region whose distance information was unknown cannot be seen from the source camera may be used to avoid generating incorrect distance information. That is, consider converting the generated distance information back into distance information for the camera for which distance information was originally given. If the generated value belongs to a region at the edge of the image, its corresponding point in the converted image falls outside the frame. If it belongs to an occlusion region, the input distance information at the corresponding pixel holds a value indicating a subject closer to the camera.
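The check described above might be sketched as follows for one generated value. The sketch assumes a simplified rectified setup that is not taken from the patent: the source camera A and the destination camera B share orientation and focal length `f`, and B is displaced by `baseline` along x, so a pixel of B maps into A by a horizontal shift of `f * baseline / z`. All names are hypothetical.

```python
def is_plausible(x_b, y, z, depth_a, f, baseline, width, tol=1e-6):
    """Check a generated distance z at pixel (x_b, y) of camera B.

    The value is plausible only if the 3-D point it implies is invisible
    from camera A: either it projects outside A's frame, or A already
    records a nearer subject at that pixel (the point is occluded).
    """
    # Corresponding column in camera A under the assumed rectified geometry.
    x_a = round(x_b + f * baseline / z)
    if not (0 <= x_a < width):
        return True  # outside camera A's field of view
    # Visible in A's frame: A must see something strictly closer there.
    return depth_a[y][x_a] < z - tol
```

A value that fails the check contradicts the input distance information, since camera A would have observed that point, so it can be rejected or regenerated.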

The past (or future) distance information used when generating distance information based on temporal continuity may be the distance information generated by the processing above. When an image from yet another camera becomes available, however, the distance information can also be re-estimated, for example by stereo matching, and updated to a more accurate value.

For example, suppose there is a camera for which distance information is given (camera 1), a camera that captured the video used as the reference frame (camera 2), and a camera that captures the video to be encoded (camera 3). At the time instant being encoded, only the images of camera 1 and camera 2 are available, so the part of the distance information for camera 2 that covers regions not captured by camera 1 can only be generated using the spatial and temporal continuity of the subject. For time instants that are in the past in coding order, however, the decoded image of camera 3 is available in addition to the images of camera 1 and camera 2. Therefore, even for a region not captured by camera 1, if it is captured by camera 3, corresponding points can be computed by stereo matching against the image captured by camera 2, which yields more accurate information than distance information generated by assuming continuity.

According to the present invention, disparity compensation using a different reference camera for each camera is realized while preventing a large increase in the information required for disparity compensation, so that highly efficient coding of the multi-view image or multi-view video as a whole can be achieved.

The present invention is described in detail below according to embodiments. The embodiments assume that a multi-view video captured by three cameras is encoded. The video of camera A is encoded without disparity compensation, and information on the distance from the camera to the captured subject is encoded in addition to the video. The video of camera B is encoded with disparity compensation, using the encoded distance information of camera A and with camera A as the reference camera. The video of camera C is encoded with disparity compensation, using the encoded distance information of camera A and with camera B as the reference camera. Since conventional methods can be used as they are for encoding camera A and camera B, the embodiments describe how the video of camera C is encoded. FIG. 3 shows a conceptual diagram of the camera configuration used in the embodiments.

First, the first embodiment (hereinafter, Embodiment 1) is described. FIG. 4 shows the configuration of a video encoding device according to Embodiment 1 of the present invention. As shown in FIG. 4, the video encoding device 100 of Embodiment 1 comprises: an encoding target image input unit 101 that inputs the original image of camera C to be encoded; an encoding target image memory 102 that stores the input encoding target image; a reference camera image input unit 103 that inputs the decoded image of camera B, which serves as the reference image for disparity compensation; a reference image memory 104 that stores the input reference image; a distance information input unit 105 that inputs information on the camera-to-subject distance for camera A, used when generating the disparity-compensated image; a distance information conversion unit 106 that converts the distance information for camera A into distance information for camera B, the reference camera; a background distance information generation unit 107 that generates background distance information for regions where valid distance information could not be obtained because of occlusion or the like; a distance information memory 108 that stores distance information; a disparity-compensated image generation unit 109 that generates a disparity-compensated image for the encoding target image from the reference image and the distance information; an image encoding unit 110 that actually encodes the encoding target image; and a decoded image memory 111 that stores the image obtained by decoding the encoded image.

FIG. 5 shows the processing flow executed by the video encoding device 100 configured in this way. The processing executed by the video encoding device 100 of Embodiment 1 is described in detail according to this flow.

First, the image of camera C to be encoded is input by the encoding target image input unit 101 and stored in the encoding target image memory 102 [step S11]. In addition, the image of camera B, the reference camera, captured at the same time as the encoding target image and then encoded once and decoded, is input by the reference camera image input unit 103 and stored in the reference image memory 104. Further, information on the camera-to-subject distance for camera A at the same time as the encoding target image is input from the distance information input unit 105. Not all of this information has to be input simultaneously: it may be input before the encoding target image is input, or after. In the latter case, the processing from step S14 onward, which generates the disparity-compensated image, is not performed until all the information for the same time as the encoding target image is available.

The distance information input here does not have to represent the exact distance from the camera to the subject; it suffices that the disparity between cameras can be computed from it with the required accuracy. Also, using the same information as is available on the decoding device side avoids the coding distortion in decoded images known as drift. Accordingly, if the distance information is encoded lossily and transmitted to the decoding device, the distance information obtained by decoding the encoded distance information should be used. If drift distortion is acceptable, the distance information input here may of course differ from what is available on the decoding device side.

The input distance information is converted by the distance information conversion unit 106 from distance information for camera A into distance information for camera B [step S12]. In the simplest conversion method, the conversion follows formula (1) below.

Here A, R, and t denote the camera's intrinsic parameter matrix, rotation matrix, and translation vector, respectively. The subscript indicates which camera the parameter belongs to. A and R are 3×3 matrices, and t is a three-dimensional vector. Camera parameters can be expressed in various ways; the formula above assumes the representation in which the correspondence between image coordinates m and world coordinates M is given by M^* = R A^{-1} m^* + t. The superscript * on M and m denotes homogeneous coordinates defined up to a scalar factor. d_X(a, b) denotes the distance from the camera to the subject at pixel (a, b) of the image captured by camera X.
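Formula (1) itself is not reproduced in this text. Under the convention just stated, a conversion consistent with the definitions can be written as below. This is a reconstruction offered as a sketch, not the published formula; the pixel names (u, v) for camera A and (x, y) for camera B are chosen here for illustration, and d is taken to be the scale factor of the homogeneous pixel coordinate.

```latex
% World point seen at pixel (u, v) of camera A:
M = R_A A_A^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} d_A(u, v) + t_A

% Its pixel (x, y) and distance in camera B:
d_B(x, y) \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
  = A_B R_B^{-1} \left( R_A A_A^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} d_A(u, v) + t_A - t_B \right)
```

The third component of the right-hand side gives d_B(x, y), and dividing the first two components by it gives the pixel position (x, y).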

The conversion is performed by first initializing a distance information buffer for camera B, then computing, for every pixel of camera A, the pixel position and distance information in camera B using formula (1), and storing the resulting distance information in the buffer entry corresponding to that pixel position. If a value other than the initial value is already stored at that entry, the two values are compared according to the front-to-back order they imply relative to the camera, and the one indicating the point closer to camera B is stored.
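The buffer procedure above might be sketched as follows. The sketch assumes the camera convention M = R A^{-1} m d + t stated earlier, treats the distance as the scale factor of the homogeneous pixel coordinate, uses `None` as the buffer's initial value, and rounds to the nearest pixel; the function names and argument layout are hypothetical.

```python
def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; equals the inverse for a rotation matrix."""
    return [list(col) for col in zip(*m)]


def warp_depth(depth_a, inv_A_a, R_a, t_a, A_b, R_b, t_b, width_b, height_b):
    """Convert a distance map of camera A into one for camera B."""
    buf = [[None] * width_b for _ in range(height_b)]  # initialized buffer
    R_b_inv = transpose(R_b)
    for v, row in enumerate(depth_a):
        for u, d in enumerate(row):
            # Back-project pixel (u, v) of camera A to a world point.
            ray = matvec(R_a, matvec(inv_A_a, [u, v, 1.0]))
            world = [ray[i] * d + t_a[i] for i in range(3)]
            # Project the world point into camera B.
            rel = [world[i] - t_b[i] for i in range(3)]
            p = matvec(A_b, matvec(R_b_inv, rel))
            d_b = p[2]
            if d_b <= 0:
                continue  # behind camera B
            x, y = round(p[0] / d_b), round(p[1] / d_b)
            if 0 <= x < width_b and 0 <= y < height_b:
                # On conflict keep the value closer to camera B.
                if buf[y][x] is None or d_b < buf[y][x]:
                    buf[y][x] = d_b
    return buf
```

Entries that remain `None` after the loop are the pixels for which the conversion produced no valid distance information.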

In this conversion, several pixels of camera A may be mapped to a single pixel of camera B, so a large number of pixels without valid distance information can result. In reality, however, distance information should be obtainable everywhere except in regions that are captured by camera B but not by camera A. A method that obtains the converted distance information for camera B while taking continuity between pixels of camera A into account may therefore be used. To eliminate drift distortion, however, the distance information conversion method used here must be the same as the one used on the decoding side.

連続性を考慮して変換する方法としては，例えば，カメラＡの画像上で隣接する画素に対して，距離情報の違いがある一定範囲内であれば被写体が実空間で連続していると仮定して，カメラＢの画像上での対応画素を求めた後に，求まった画素の間をそれぞれに対して与えられた距離情報を用いて補間しながら，カメラＢにおける距離情報へと変換する方法がある。 One conversion method that takes continuity into account works as follows: for pixels that are adjacent in the image of camera A, the subject is assumed to be continuous in real space if the difference in their distance information is within a certain range; the corresponding pixels in the image of camera B are then determined, and the pixels lying between them are filled by interpolating the distance information assigned to each, yielding the distance information for camera B.
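The continuity-aware conversion can be sketched for a single scanline as follows. This is a minimal sketch under stated assumptions: `project_x` is a hypothetical one-dimensional stand-in for equation (1), `max_gap` is an assumed threshold for "within a certain range", and linear interpolation is one possible choice that the text does not prescribe.

```python
def warp_row_with_continuity(depth_row, project_x, width_b, max_gap):
    """Warp one scanline of camera A's distance map into camera B.

    project_x(u, d) -> (x_b, d_b) is a hypothetical 1-D stand-in for equation (1).
    """
    buf = [None] * width_b

    def store(x, d):
        # Keep the value nearer to camera B when two samples collide.
        if 0 <= x < width_b and (buf[x] is None or d < buf[x]):
            buf[x] = d

    mapped = [project_x(u, d) for u, d in enumerate(depth_row)]
    for u, (x0, d0) in enumerate(mapped):
        store(int(round(x0)), d0)
        if u + 1 == len(mapped):
            break
        if abs(depth_row[u] - depth_row[u + 1]) > max_gap:
            continue  # large jump: assume an object boundary, do not bridge
        x1, d1 = mapped[u + 1]
        lo, hi = sorted((int(round(x0)), int(round(x1))))
        for x in range(lo + 1, hi):
            # Linearly interpolate the distance between the two mapped pixels.
            w = min(1.0, max(0.0, (x - x0) / (x1 - x0)))
            store(x, d0 + w * (d1 - d0))
    return buf
```

Neighbouring pixels whose distances differ by more than `max_gap` are not bridged, so the gap between a foreground object and the background stays empty and is left for the later generation step.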

ここで，カメラＡとカメラＢは異なるカメラであるため，カメラＢでは撮影されるがカメラＡでは撮影されない領域が存在する。従って，そのような領域に対応するカメラＢの画像上の領域では，有効な距離情報が得られていないことになる。 Here, since the camera A and the camera B are different cameras, there is an area that is captured by the camera B but not captured by the camera A. Therefore, effective distance information is not obtained in an area on the image of the camera B corresponding to such an area.

そこで，次にそのような有効な値がない領域における距離情報を生成する［ステップＳ１３］。ドリフト歪みを発生させない場合には，復号側で同じ方法を用いる必要はあるが，どのような方法を用いても構わない。 Next, distance information is generated for such regions that have no valid value [step S13]. Any method may be used, although the same method must be used on the decoding side if drift distortion is to be avoided.

例えば，実空間では被写体は連続しているため，隣接する画素の距離情報をコピーする方法がある。その領域に写っているものが立体的に単純な壁などのものや，カメラから十分に遠いような背景の場合，隣接画素間でのカメラから被写体までの距離はほぼ等しいので，このコピーによる距離情報生成は高い精度を持つ。また，その領域に複数のオブジェクトがあるような場合でも，参照画像メモリ１０４に格納されている対応するカメラＢの復号画像から，エッジを抽出したり領域分割をしたりすることで被写体を判別し，同じ被写体上で距離情報が既知の隣接画素からコピーを行うことで，より正確な距離情報の生成を実現できる。 For example, because subjects are continuous in real space, one method is to copy the distance information of an adjacent pixel. When the region shows something with a simple three-dimensional shape, such as a wall, or a background sufficiently far from the camera, the camera-to-subject distance is nearly equal between adjacent pixels, so distance information generated by copying is highly accurate. Even when the region contains multiple objects, more accurate distance information can be generated by identifying the subjects through edge extraction or region segmentation of the corresponding decoded image of camera B stored in the reference image memory 104, and copying from adjacent pixels that lie on the same subject and whose distance information is known.
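The copy-based generation can be sketched for one scanline as below. This is only an illustration: `fill_holes_row` is a hypothetical name, `None` marks pixels without valid distance information, and `labels` is an assumed per-pixel segment label that would come from edge extraction or region segmentation of the decoded camera B image. Preferring the left neighbour on ties is an arbitrary choice.

```python
def fill_holes_row(depth_row, labels=None):
    """Fill None entries of one scanline by copying the nearest known neighbour.

    If labels is given, only neighbours carrying the same label (assumed to
    mean "same subject") are eligible as a copy source.
    """
    n = len(depth_row)
    out = list(depth_row)
    for x in range(n):
        if depth_row[x] is not None:
            continue
        for offset in range(1, n):
            for nx in (x - offset, x + offset):  # left neighbour is tried first
                if not 0 <= nx < n or depth_row[nx] is None:
                    continue
                if labels is not None and labels[nx] != labels[x]:
                    continue  # different subject: do not copy across it
                out[x] = depth_row[nx]
                break
            if out[x] is not None:
                break
    return out
```

For example, with the row `[5.0, None, None, 1.0]` the second hole is filled from the near object on its right, whereas supplying labels that mark both holes as background makes it take the background value instead.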

また，時間的な相関を用いて距離情報を生成することも可能である。背景に静止したものやカメラの距離が十分遠いようなものを撮影していた場合，それらの距離情報は時間的に変化しないため，過去の距離情報を距離情報メモリ１０８に蓄積しておき，距離情報が未知の領域に蓄積されていた値をコピーすることで距離情報を生成することができる。背景に写っている被写体がカメラから十分に遠くなく，時間的に変化するような場合には，対応する時刻の参照画像を用いて被写体の動きを取得し，距離情報メモリ１０８に蓄積された距離情報を平行移動させてコピーすることで，より正確な距離情報を生成できる。この場合においても，画像のエッジ情報や色情報から写っている被写体を区別することができるため，それらの情報を用いることで信頼性の高い距離情報が生成可能である。 Distance information can also be generated using temporal correlation. When the background contains stationary objects or objects sufficiently far from the camera, their distance information does not change over time; past distance information can therefore be accumulated in the distance information memory 108, and distance information can be generated by copying the accumulated values into the regions whose distance information is unknown. When a subject in the background is not sufficiently far from the camera and changes over time, more accurate distance information can be generated by obtaining the motion of the subject from the reference images at the corresponding times and copying the distance information accumulated in the distance information memory 108 after translating it accordingly. In this case as well, the subjects in the image can be distinguished from its edge and color information, so using that information makes it possible to generate highly reliable distance information.
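The temporal generation can be sketched as follows. This is a minimal sketch assuming a single global integer translation `motion` for the whole frame; estimating that motion from the reference images, and handling per-subject motion, are outside the sketch. `fill_from_previous` is a hypothetical name.

```python
def fill_from_previous(current, previous, motion=(0, 0)):
    """Fill None entries of the current distance map from a stored earlier map.

    motion = (dx, dy) is an assumed translation of the subject between the two
    times; (0, 0) corresponds to copying the value at the same position.
    """
    dx, dy = motion
    height, width = len(current), len(current[0])
    out = [row[:] for row in current]
    for y in range(height):
        for x in range(width):
            if out[y][x] is not None:
                continue  # valid distance information is never overwritten
            sx, sy = x - dx, y - dy  # where this point was in the earlier map
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = previous[sy][sx]
    return out
```

Pixels whose source position falls outside the stored map remain `None` and would need one of the spatial methods instead.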

なお，カメラＢの位置や向きが変化しているような場合には，それに伴って距離情報メモリ１０８上の距離情報を変更する必要がある。カメラの運動がカメラパラメータの変化として明示的に与えられている場合には，過去のカメラパラメータを持つカメラＢ’と現在のカメラパラメータを持つカメラＢが存在すると考え，カメラＢ’に対して与えられている距離情報をカメラＢに対する距離情報へとステップＳ１２と同様の方法で変換すればよい。一方，カメラの運動がカメラパラメータの変化として与えられない場合，距離情報を求めたい時刻の参照画像と距離情報が既知であるような時刻の参照画像との間でマッチングを取り，因子分解法を用いて被写体の運動を分離しながら，カメラの運動を取得することができる。 When the position or orientation of camera B changes, the distance information in the distance information memory 108 must be updated accordingly. If the camera motion is given explicitly as a change in camera parameters, it suffices to regard the situation as involving a camera B′ with the past camera parameters and a camera B with the current camera parameters, and to convert the distance information given for camera B′ into distance information for camera B by the same method as in step S12. If the camera motion is not given as a change in camera parameters, the camera motion can be obtained by matching the reference image at the time for which distance information is wanted against a reference image at a time for which distance information is known, using the factorization method to separate out the motion of the subject.

距離情報メモリ１０８に蓄積しておく過去の距離情報は，実際に使用した距離情報をそのまま格納しておいても構わないし，参照画像と復号画像との間でステレオマッチングを取り，より正しいものになるように修正したものを用いても構わない。また，本実施例では入力されないが，カメラＡの復号画像を利用してより正確な距離情報を求め直しても構わない。 As the past distance information stored in the distance information memory 108, the actually used distance information may be stored as it is, and stereo matching is performed between the reference image and the decoded image to make it more correct. You may use what was corrected so that it might become. Although not input in this embodiment, more accurate distance information may be obtained again using the decoded image of the camera A.

なお，距離情報を生成する際や蓄積の際に距離情報を修正する際に，この距離情報が未知であった領域における被写体はカメラＡから見ることができないという条件を用いて，誤った距離情報が生成されるのを防ぐことが可能である。つまり，式（１）のカメラＡとカメラＢとを入れ替えた式を用いて，生成（修正）されたカメラＢに対する距離情報をカメラＡに対する距離情報に変換した際に，変換前が画像の端の距離情報であれば画像の外側に対応画素が求まり，オクルージョン領域であれば変換後の画素においてよりカメラに近いことを示す距離情報が入力された距離情報上に存在しなくてはならない。この条件を用いて生成（修正）後の距離情報の値を制限することで，ある程度誤った距離情報が生成されるのを防ぐことが可能である。 When generating distance information, or when correcting it for accumulation, erroneous distance information can be prevented by using the condition that a subject in a region whose distance information was unknown cannot be seen from camera A. Specifically, when the generated (corrected) distance information for camera B is converted into distance information for camera A using equation (1) with cameras A and B interchanged, the corresponding pixel must fall outside the image if the source pixel lies at the edge of the image, and, if the source pixel lies in an occlusion region, the input distance information must contain, at the converted pixel, a value indicating a position closer to the camera. Restricting the generated (corrected) distance values with this condition prevents erroneous distance information from being generated to some extent.
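The visibility condition can be expressed as a simple predicate. In this sketch `project_b_to_a` is a hypothetical callable standing in for equation (1) with cameras A and B interchanged, `depth_a` is the input distance map of camera A, and `tol` is an assumed tolerance; how a rejected value is subsequently adjusted is not covered.

```python
def is_plausible(u_b, v_b, d_b, project_b_to_a, depth_a, tol=0.0):
    """Check a generated camera B distance against the visibility condition.

    The point must be invisible from camera A: either it projects outside
    A's image, or A already records something nearer at that pixel.
    """
    ua, va, da = project_b_to_a(u_b, v_b, d_b)
    x, y = int(round(ua)), int(round(va))
    height, width = len(depth_a), len(depth_a[0])
    if not (0 <= x < width and 0 <= y < height):
        return True  # outside camera A's field of view
    return depth_a[y][x] < da - tol  # hidden behind a nearer subject in A
```

A candidate distance for which the predicate is false would place the subject where camera A should have seen it, which contradicts the absence of input distance information for that region.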

距離情報の生成が終了し，全ての画素に対して有効な値の得られた距離情報は，距離情報メモリ１０８に格納される。そして同時刻の符号化対象画像と参照画像，距離情報がそれぞれ符号化対象画像メモリ１０２，参照画像メモリ１０４，距離情報メモリ１０８に格納されていれば，それらの情報を用いて視差補償画像生成部１０９で視差補償画像を生成する［ステップＳ１４］。 The generation of the distance information is completed, and the distance information for which valid values are obtained for all the pixels is stored in the distance information memory 108. If the encoding target image, the reference image, and the distance information at the same time are stored in the encoding target image memory 102, the reference image memory 104, and the distance information memory 108, respectively, the disparity compensation image generation unit is used using the information. A parallax compensation image is generated at 109 [step S14].

距離情報と画像信号とを用いて行うものであれば，視差補償画像を生成する手法としてどのような手法を用いても構わない。例えば，式（１）のカメラＡをカメラＢに，カメラＢをカメラＣに変更した次の式（２）に従って，カメラＢの画素に対応するカメラＣの画素を決定し，カメラＢの画素値を対応画素位置における視差補償画像の画素値とすることで生成することができる。 Any method may be used to generate the parallax compensation image, as long as it uses the distance information and the image signal. For example, the image can be generated by determining the pixel of camera C that corresponds to each pixel of camera B according to the following equation (2), which is equation (1) with camera A replaced by camera B and camera B replaced by camera C, and using the pixel value of camera B as the pixel value of the parallax compensation image at the corresponding pixel position.

座標を表すアルファベットをカメラごとに変えているが，その他の記号の表す意味は式（１）と同じである。また，カメラＢの距離情報は式（１）で求められたものではなく，背景距離情報生成部１０７で修正されたものであるため，区別するためにダッシュ（′）記号を付けた。 The alphabet representing coordinates is changed for each camera, but the meanings of the other symbols are the same as in equation (1). Further, since the distance information of the camera B is not obtained by the equation (1) but is corrected by the background distance information generation unit 107, a dash (′) symbol is added to distinguish the information.
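The pixel-copying approach to generating the parallax compensation image can be sketched as follows. Equation (2) is not reproduced in this passage, so `project_b_to_c` is a hypothetical callable standing in for it. Resolving collisions by keeping the nearer sample is an assumption carried over from the distance conversion step.

```python
def synthesize_view(image_b, depth_b, project_b_to_c, width_c, height_c):
    """Build a parallax compensation image for camera C from camera B.

    project_b_to_c(u, v, d) -> (u_c, v_c, d_c) stands in for equation (2).
    Pixels of C that receive no sample stay None.
    """
    pixels = [[None] * width_c for _ in range(height_c)]
    zbuf = [[None] * width_c for _ in range(height_c)]
    for v, row in enumerate(image_b):
        for u, value in enumerate(row):
            uc, vc, dc = project_b_to_c(u, v, depth_b[v][u])
            x, y = int(round(uc)), int(round(vc))
            if not (0 <= x < width_c and 0 <= y < height_c):
                continue
            # When several pixels of B land on one pixel of C, keep the nearer one.
            if zbuf[y][x] is None or dc < zbuf[y][x]:
                zbuf[y][x] = dc
                pixels[y][x] = value
    return pixels
```

The structure mirrors the distance conversion, except that the value written to the output is the pixel value of camera B rather than the converted distance.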

非特許文献３のように隣接画素間で連続性を仮定しながら視差補償画像を生成しても構わない。この場合，距離情報を変換するときと異なり，対応画素間で補間される値は画素値となる。もちろん，一度カメラＣにおける距離情報に変換し，カメラＣで撮影された画像の各画素における視差情報を獲得してから，視差補償画像を生成しても構わない。また，各画素を頂点と見立てて，与えられた距離情報から三次元のポリゴンオブジェクトを生成して，それをカメラＣに投影することで視差補償画像を生成する手法もある。 As in Non-Patent Document 3, a parallax compensation image may be generated while assuming continuity between adjacent pixels. In this case, unlike the distance information conversion, the value interpolated between the corresponding pixels is the pixel value. Of course, once converted into the distance information in the camera C and the parallax information in each pixel of the image captured by the camera C is acquired, the parallax compensation image may be generated. There is also a method of generating a parallax compensation image by generating a three-dimensional polygon object from given distance information by regarding each pixel as a vertex and projecting it to the camera C.

ここで行われる視差補償画像を生成する処理も復号画像においてドリフト歪みを発生させないのであれば，復号側と同じ処理を用いる必要がある。 The process for generating the parallax compensation image must likewise be the same as the one used on the decoding side if drift distortion in the decoded image is to be avoided.

視差補償画像が生成されたならば，画像符号化部１１０で符号化対象画像が符号化される［ステップＳ１５］。符号化の結果生成されるビットストリームは，映像符号化装置１００の出力となる。なお，視差補償画像を用いる符号化手法であれば，どのような符号化手法でも用いることができる。 When the parallax compensation image is generated, the image to be encoded is encoded by the image encoding unit 110 [step S15]. The bit stream generated as a result of encoding is an output of the video encoding device 100. Note that any coding method can be used as long as the coding method uses a parallax compensation image.

例えば，最も単純な方法は，生成された視差補償画像をそのまま予測画像の候補として符号化する方法であるが，非特許文献４のように視差補償画像（合成画像）を符号化対象画像に対する参照画像として用いて，予測先を示すベクトル情報を付加しながら符号化する方法や，特許文献１のように，符号化対象画像と視差補償画像との差分を取り，その差分画像を予測符号化することで符号化対象画像を符号化する方法がある。なお，視差補償画像を使用した映像予測以外に，動き補償やベクトルを用いた視差補償，画面内予測などを組み合わせて用いても構わない。 For example, the simplest method is to encode using the generated parallax compensation image directly as a candidate predicted image. Other methods include, as in Non-Patent Document 4, using the parallax compensation image (synthesized image) as a reference image for the encoding target image and encoding while adding vector information that indicates the prediction target, and, as in Patent Document 1, taking the difference between the encoding target image and the parallax compensation image and predictively encoding that difference image. In addition to video prediction using the parallax compensation image, motion compensation, vector-based parallax compensation, intra prediction, and the like may be used in combination.

符号化された符号化対象画像は，復号されて復号画像メモリ１１１に格納される［ステップＳ１６］。蓄積された復号画像は，距離情報メモリ１０８に格納されている距離情報を修正するために用いられる［ステップＳ１７］。この修正は別のフレームにおける距離情報を生成する際に，より正確な情報を予測するために行われる。例えば，カメラＢの画像であるところの参照画像と，カメラＣの画像であるところの復号画像との間でステレオマッチングを行うことで，ステップＳ１３の処理で連続性等を仮定して生成された距離情報を，実際に複数のカメラでどこに撮影されたかという事実に基づいて，より信頼度の高い距離情報へと修正する。すなわち，視差探索手段（図示省略）によって参照カメラ画像と復号画像との間で対応点探索を行い，得られた視差情報をもとに距離情報メモリ１０８に格納されている距離情報を補正する。 The encoded image to be encoded is decoded and stored in the decoded image memory 111 [step S16]. The accumulated decoded image is used to correct the distance information stored in the distance information memory 108 [step S17]. This correction is performed in order to predict more accurate information when generating distance information in another frame. For example, by performing stereo matching between a reference image that is an image of camera B and a decoded image that is an image of camera C, the image is generated assuming continuity in the process of step S13. The distance information is corrected to more reliable distance information based on the fact that the images were actually taken by a plurality of cameras. That is, a corresponding point search is performed between the reference camera image and the decoded image by a disparity search unit (not shown), and the distance information stored in the distance information memory 108 is corrected based on the obtained disparity information.
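The correction by stereo matching can be illustrated with a one-dimensional block-matching sketch. It relies on assumptions that the text does not state: the two cameras are treated as rectified, so that corresponding points lie on the same scanline, and distance is recovered as focal length times baseline divided by disparity. `best_disparity` and `depth_from_disparity` are hypothetical names.

```python
def best_disparity(ref_row, dec_row, x, half, max_disp):
    """Find the disparity minimising the sum of absolute differences.

    Compares a window of ref_row centred at x with windows of dec_row
    shifted left by 0..max_disp pixels.
    """
    best, best_cost = 0, float("inf")
    for disp in range(max_disp + 1):
        cost = 0
        for k in range(-half, half + 1):
            xr, xd = x + k, x + k - disp
            if not (0 <= xr < len(ref_row) and 0 <= xd < len(dec_row)):
                cost = float("inf")  # window leaves the image: reject
                break
            cost += abs(ref_row[xr] - dec_row[xd])
        if cost < best_cost:
            best, best_cost = disp, cost
    return best


def depth_from_disparity(disp, focal_px, baseline):
    """Convert a disparity in pixels to a distance (rectified-camera assumption)."""
    return focal_px * baseline / disp if disp > 0 else float("inf")
```

The distance obtained this way could then replace or be blended with the stored value. The same correction rule would have to be applied by encoder and decoder alike.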

本発明では，空間的に隣接する領域の距離情報や時間的に隣接する距離情報を用いて，入力として距離情報の与えられなかった領域に対する距離情報を生成する。このことは，被写体が実空間上で連続である事実や，瞬間的に移動することができないという事実に基づいているため，ある程度正確な距離情報を生成することが可能である。 In the present invention, distance information for an area for which distance information is not given as input is generated using distance information of spatially adjacent areas or distance information of temporally adjacent areas. This is based on the fact that the subject is continuous in real space and the fact that the subject cannot move instantaneously, so that it is possible to generate accurate distance information to some extent.

もちろん未知のものを生成するため，画像の端等では距離情報の生成が非常に困難な場合もある。最悪の場合，実際の距離情報からかけ離れたものが生成されてしまう。しかしながら，一般的な映像符号化装置のように，画像符号化部が複数の予測モードを持ち，最適なものを選択しながら符号化するような場合，正しくない距離情報が生成されても符号化効率を低下させることにはならない。なぜならば，距離情報が未知の領域では視差補償画像を用いる予測モードでは効率的な予測ができないため，従来手法においても別のモードを選択して符号化を行う。つまり誤った距離情報であっても，従来手法と同様に別のモードが選択されるだけであり，符号化する情報は増大しないからである。 Of course, since unknown information is generated, it may be very difficult to generate distance information at the edge of the image. In the worst case, something far from the actual distance information is generated. However, if the image encoding unit has multiple prediction modes and encodes while selecting the optimum one as in a general video encoding device, encoding is performed even if incorrect distance information is generated. It does not reduce efficiency. This is because in a region where the distance information is unknown, efficient prediction cannot be performed in the prediction mode using the parallax compensation image. Therefore, in the conventional method, another mode is selected and encoding is performed. That is, even if the distance information is incorrect, only a different mode is selected as in the conventional method, and the information to be encoded does not increase.
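The mode-selection argument can be illustrated with a toy example. The sum of absolute differences is used here as an assumed cost. Practical encoders typically weigh distortion against bit rate. `sad` and `choose_mode` are hypothetical names.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def choose_mode(block, candidates):
    """Pick the prediction mode with the lowest cost.

    candidates maps a mode name to the predicted block that mode produces.
    """
    return min(candidates, key=lambda name: sad(block, candidates[name]))


# A badly predicted view-synthesis block simply loses to another mode.
block = [10, 12, 14, 16]
modes = {"view_synthesis": [90, 90, 90, 90], "intra": [10, 12, 13, 16]}
selected = choose_mode(block, modes)
```

When the generated distance information is wrong, the view-synthesis candidate has a high cost and is not selected, so the outcome matches the case where no such candidate existed.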

次に，第２の実施例（以下，実施例２）について説明する。本発明の実施例２に係る映像復号装置の構成図を図６に示す。図６に示すように，実施例２の映像復号装置２００は，復号対象となるカメラＣの符号化データを入力する符号化データ入力部２０１と，入力された符号化データを格納する符号化データメモリ２０２と，視差補償をする際の参照画像となるカメラＢの復号画像を入力する参照カメラ画像入力部２０３と，入力された参照画像を格納する参照画像メモリ２０４と，視差補償画像を生成する際に用いるカメラＡにおけるカメラから被写体までの距離に関する情報を入力する距離情報入力部２０５と，カメラＡに対する距離情報を参照カメラであるカメラＢに対する距離情報に変換する距離情報変換部２０６と，オクルージョン等によって有効な距離情報が得られなかった領域に対して背景の距離情報を生成する背景距離情報生成部２０７と，距離情報を蓄積する距離情報メモリ２０８と，参照画像と距離情報とから復号対象画像に対する視差補償画像を生成する視差補償画像生成部２０９と，復号対象画像を実際に符号化データから復号する画像復号部２１０と，復号した画像を格納する復号画像メモリ２１１とを備える。 Next, a second embodiment (hereinafter, Embodiment 2) will be described. FIG. 6 shows the configuration of a video decoding device according to Embodiment 2 of the present invention. As shown in FIG. 6, the video decoding device 200 of Embodiment 2 includes: an encoded data input unit 201 that inputs the encoded data of camera C to be decoded; an encoded data memory 202 that stores the input encoded data; a reference camera image input unit 203 that inputs the decoded image of camera B, which serves as the reference image for parallax compensation; a reference image memory 204 that stores the input reference image; a distance information input unit 205 that inputs information on the camera-to-subject distance for camera A, used when generating the parallax compensation image; a distance information conversion unit 206 that converts the distance information for camera A into distance information for camera B, the reference camera; a background distance information generation unit 207 that generates background distance information for regions where valid distance information could not be obtained because of occlusion or the like; a distance information memory 208 that accumulates distance information; a parallax compensation image generation unit 209 that generates a parallax compensation image for the decoding target image from the reference image and the distance information; an image decoding unit 210 that actually decodes the decoding target image from the encoded data; and a decoded image memory 211 that stores the decoded image.

図７に，このようにして構成される映像復号装置２００が実行する処理フローを示す。この処理フローに従って，実施例２の映像復号装置２００が実行する処理を説明する。 FIG. 7 shows a processing flow executed by the video decoding apparatus 200 configured as described above. Processing executed by the video decoding apparatus 200 according to the second embodiment will be described according to this processing flow.

まず，符号化データ入力部２０１により，復号対象のカメラＣの画像を符号化した符号化データが入力され，符号化データメモリ２０２に格納される［ステップＳ２１］。また，参照カメラとなるカメラＢの復号した画像が，参照カメラ画像入力部２０３より入力され参照画像メモリ２０４に格納される。さらに，カメラＡにおけるカメラから被写体までの距離に関する情報が距離情報入力部２０５から入力される。なお，ここで全ての情報が同時に入力される必要はなく，復号対象画像の符号化データが入力される以前に入力されていても構わないし，復号対象画像の符号化データが入力された後に入力されても構わない。ただし，以下に示すステップＳ２３，Ｓ２４の処理は同時刻の参照画像と距離情報が揃うまで実行されず，ステップＳ２５，Ｓ２６の処理は同時刻の符号化データと参照画像，距離画像が揃うまでは実行されない。 First, the encoded data obtained by encoding the image of camera C to be decoded is input through the encoded data input unit 201 and stored in the encoded data memory 202 [step S21]. The decoded image of camera B, which serves as the reference camera, is input through the reference camera image input unit 203 and stored in the reference image memory 204. In addition, information on the camera-to-subject distance for camera A is input through the distance information input unit 205. These inputs need not all arrive at the same time; the reference image and distance information may be input before or after the encoded data of the decoding target image. However, the processing of steps S23 and S24 described below is not executed until the reference image and distance information for the same time are both available, and the processing of steps S25 and S26 is not executed until the encoded data, reference image, and distance image for the same time are all available.

なお，実施例１と同様に，ここで入力される距離情報は正確なカメラから被写体までの距離を表すような情報である必要はなく，この情報を用いてカメラ間の視差を求められた精度で計算できるものであればよい。ただし，ドリフト歪みの発生を防ぐためには，映像符号化装置で用いられた距離情報と同じ距離情報が入力されなければならない。 As in the first embodiment, the distance information input here need not represent the exact distance from the camera to the subject; it suffices that the parallax between cameras can be computed from it with the required accuracy. To prevent drift distortion, however, the same distance information as that used in the video encoding device must be input.

入力された距離情報は距離情報変換部２０６でカメラＡに対する距離情報からカメラＢに対する距離情報へと変換される［ステップＳ２２］。最も単純な方法としては，前述の計算式（１）に従って変換することが可能であり，その場合の詳細の手順は実施例１のステップＳ１２の処理の部分に記載済みである。なお，変換にはどのような方法を用いても構わないが，ドリフト歪みの発生を抑えるためには映像符号化装置で用いた方法と同じ方法を用いる必要がある。 The input distance information is converted from distance information for the camera A to distance information for the camera B by the distance information conversion unit 206 [step S22]. As the simplest method, conversion can be performed according to the above-described calculation formula (1), and the detailed procedure in this case is already described in the processing part of step S12 in the first embodiment. Note that any method may be used for the conversion, but in order to suppress the occurrence of drift distortion, it is necessary to use the same method as that used in the video encoding apparatus.

次に変換後の距離情報において，有効な値がない参照画像上の領域における距離情報を生成する［ステップＳ２３］。ここでもドリフト歪みを発生させないためには，映像符号化装置と同じ方法を用いる必要があるが，どのような方法を用いても構わない。 Next, distance information in a region on the reference image having no valid value is generated in the converted distance information [step S23]. Again, in order not to generate drift distortion, it is necessary to use the same method as the video encoding device, but any method may be used.

例えば，実空間上では被写体は連続しているため，距離情報には空間的な相関があると考えられるため，隣接の既知の距離情報をコピーして生成する方法や，周辺の距離情報からフィルタ処理によって生成する方法がある。また，被写体が瞬間的に移動しないことから距離情報メモリ２０８に格納されている近接時刻の距離情報から推定する方法もある。この場合，被写体が動かないとして同じ位置の距離情報をコピーする方法や，参照画像メモリ２０４に蓄えられている参照画像から被写体の動きを抽出し，距離情報を動き補償して生成する方法がある。なお，前述の実施例１で説明した通り，カメラＢの位置や向きが変化しているような場合には，それに伴って距離情報メモリ２０８上の距離情報を変更する必要がある。 For example, because subjects are continuous in real space, distance information can be expected to be spatially correlated; it can therefore be generated by copying adjacent known distance information or by filtering the surrounding distance information. Because subjects do not move instantaneously, it can also be estimated from the distance information at nearby times stored in the distance information memory 208. In that case, the distance information at the same position may be copied on the assumption that the subject does not move, or the motion of the subject may be extracted from the reference images stored in the reference image memory 204 and the distance information generated by motion compensation. As described in the first embodiment, when the position or orientation of camera B changes, the distance information in the distance information memory 208 must be updated accordingly.

なお，距離情報メモリ２０８に蓄積しておく過去の距離情報は，実際に使用した距離情報をそのまま格納しておいても構わないし，本実施例で説明するように参照画像と復号画像との間でステレオマッチングを取り，より正しいものになるように修正したものを用いても構わない。また，本実施例では入力されないが，カメラＡの復号画像を利用してより正確な距離情報を求め直しても構わない。ただし，ドリフト歪みを発生させないためには，映像符号化装置で行った処理と同じ処理を用いて格納されるべき距離情報を求めなければならない。 Note that the past distance information stored in the distance information memory 208 may store the actually used distance information as it is, and as described in the present embodiment, between the reference image and the decoded image. You can use stereo matching that has been modified to be more correct. Although not input in this embodiment, more accurate distance information may be obtained again using the decoded image of the camera A. However, in order not to cause drift distortion, it is necessary to obtain distance information to be stored using the same processing as that performed by the video encoding device.

距離情報の生成が終了し，全ての画素に対して有効な値の得られた距離情報は，一度距離情報メモリ２０８に格納される。そして復号対象画像の符号化データ，それと同時刻の参照画像，距離情報がそれぞれ符号化データメモリ２０２，参照画像メモリ２０４，距離情報メモリ２０８に格納されていれば，それらの情報を用いて視差補償画像生成部２０９で視差補償画像を生成する［ステップＳ２４］。 The generation of the distance information is completed, and the distance information for which valid values are obtained for all the pixels is once stored in the distance information memory 208. If the encoded data of the decoding target image, the reference image at the same time, and the distance information are stored in the encoded data memory 202, the reference image memory 204, and the distance information memory 208, respectively, the disparity compensation is performed using these information. The image generation unit 209 generates a parallax compensation image [step S24].

実施例１と同様に視差補償画像を生成する方法としては，どのようなものを用いても構わない。ただし，ドリフト歪みの発生を防ぐには，映像符号化装置で用いられた手法と同じ手法を用いる必要がある。 As in the first embodiment, any method may be used as a method for generating a parallax compensation image. However, in order to prevent the occurrence of drift distortion, it is necessary to use the same method as that used in the video encoding device.

視差補償画像が生成されたら，これを用いて画像復号部２１０で符号化データが復号され，復号画像が得られる［ステップＳ２５］。この復号画像は復号画像メモリ２１１に格納されると共に，映像復号装置２００の出力となる。この実施例では復号順に出力されるが，復号画像メモリ２１１でバッファリングを行い，撮影された順に出力しても構わない。なお，ここでは映像符号化装置で用いられた符号化手法を用いて生成された符号化データを正しく復号できる手法を用いる必要がある。 When the parallax compensated image is generated, the encoded data is decoded by the image decoding unit 210 using this image, and a decoded image is obtained [step S25]. The decoded image is stored in the decoded image memory 211 and is output from the video decoding device 200. In this embodiment, the images are output in the order of decoding. However, buffering may be performed in the decoded image memory 211 and the images may be output in the order of shooting. Here, it is necessary to use a method that can correctly decode the encoded data generated by using the encoding method used in the video encoding device.
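The need for the decoder to reproduce the encoder's prediction can be illustrated with a toy residual scheme. This is only a sketch: it assumes plain difference coding without transform or quantization, which the text does not prescribe. `encode_residual` and `decode_residual` are hypothetical names.

```python
def encode_residual(target, prediction):
    """Encoder side: keep only the difference from the prediction."""
    return [t - p for t, p in zip(target, prediction)]


def decode_residual(residual, prediction, max_value=255):
    """Decoder side: add the residual back onto the decoder's own prediction."""
    return [min(max_value, max(0, p + r)) for p, r in zip(prediction, residual)]


target = [100, 120, 130]
encoder_prediction = [98, 121, 128]
residual = encode_residual(target, encoder_prediction)

# Same prediction on both sides: exact reconstruction.
matched = decode_residual(residual, encoder_prediction)
# A decoder whose parallax compensation image differs reconstructs wrong pixels.
mismatched = decode_residual(residual, [97, 121, 128])
```

The mismatched case shows the kind of error that, once fed back through stored images or distance information, accumulates as drift distortion.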

また，蓄積された復号画像は距離情報メモリ２０８に格納されている距離情報を修正するために用いられる［ステップＳ２６］。この修正は別のフレームにおける距離情報を生成する際に，より正確な情報を予測するために行われる。例えば，カメラＢの画像であるところの参照画像と，カメラＣの画像であるところの復号画像との間でステレオマッチングを行うことで，ステップＳ２３の処理で連続性等を仮定して生成された距離情報を，実際に複数のカメラでどこに撮影されたかという事実に基づいて，より信頼度の高い距離情報へと修正する。すなわち，視差探索手段（図示省略）によって参照カメラ画像と復号画像との間で対応点探索を行い，得られた視差情報をもとに距離情報メモリ２０８に格納されている距離情報を補正する。なお，ここでの処理もドリフト歪みを防ぐためには，映像符号化装置と同じ処理を用いる必要がある。 The accumulated decoded image is used to correct the distance information stored in the distance information memory 208 [step S26]. This correction is performed in order to predict more accurate information when generating distance information in another frame. For example, by performing stereo matching between a reference image that is an image of camera B and a decoded image that is an image of camera C, the image is generated assuming continuity in the process of step S23. The distance information is corrected to more reliable distance information based on the fact that the images were actually taken by a plurality of cameras. That is, a corresponding point search is performed between the reference camera image and the decoded image by a parallax search unit (not shown), and the distance information stored in the distance information memory 208 is corrected based on the obtained parallax information. In this case, the same processing as that of the video encoding device needs to be used in order to prevent drift distortion.

ここで説明した実施例では，映像符号化装置，映像復号装置として説明を行ったが，複数カメラで取られた多視点画像の符号化及び復号に適用することも可能である。その場合には，時間方向の連続性は利用できず，空間方向の連続性のみを利用して，有効な距離情報が得られなかった領域の距離情報を生成することになる。また，距離情報の変換・無効領域における距離情報の生成を用いて視差補償画像を生成する処理を抽出し，入力映像（上記実施例の中ではカメラＡの映像）とそれに対する距離情報から，その他のカメラで撮影された映像（上記実施例ではカメラＢの映像）を合成する処理に利用することも可能である。この場合においても，別の時刻の距離情報生成のための生成した距離情報の更新（上記実施例では，処理ステップＳ１７，Ｓ２６に相当）を行うことで，より高品質な映像合成が可能となる。 The embodiments described here concern a video encoding device and a video decoding device, but the invention can also be applied to the encoding and decoding of multi-view images captured by a plurality of cameras. In that case, continuity in the time direction cannot be used, and the distance information for regions where valid distance information could not be obtained is generated using only continuity in the spatial direction. The process of generating a parallax compensation image by converting distance information and generating distance information for invalid regions can also be extracted and used to synthesize video captured by another camera (the video of camera B in the above embodiments) from an input video (the video of camera A in the above embodiments) and its distance information. In this case as well, updating the generated distance information for use in generating distance information at another time (corresponding to processing steps S17 and S26 in the above embodiments) enables higher-quality video synthesis.

以上説明した処理は，コンピュータとソフトウェアプログラムとによっても実現することができ，そのプログラムをコンピュータ読み取り可能な記録媒体に記録して提供することも，ネットワークを通して提供することも可能である。 The processing described above can be realized by a computer and a software program, and the program can be provided by being recorded on a computer-readable recording medium or can be provided through a network.

また，以上の実施の形態では，映像符号化装置及び映像復号装置を中心に説明したが，これら映像符号化装置及び映像復号装置の各部の動作に対応したステップによって本発明の映像符号化方法及び映像復号方法を実現することができる。 In the above embodiments, the video encoding device and the video decoding device have been mainly described, but the video encoding method and the video encoding method according to the present invention are performed according to the steps corresponding to the operations of the respective units of the video encoding device and the video decoding device. A video decoding method can be realized.

本実施の形態による映像符号化方法及び映像復号方法の特徴を列挙すると，以下のとおりである。 The characteristics of the video encoding method and video decoding method according to this embodiment are listed as follows.

（１）本実施の形態による映像符号化方法は，１つの基準視点カメラを含む複数のカメラにより撮影された多視点画像または多視点動画像における符号化対象画像を，上記基準視点カメラから被写体までの距離を表す距離情報を用いて，カメラ間の画像予測を行うことにより符号化する映像符号化方法であって，符号化対象画像を符号化するにあたってカメラ間の画像信号の予測に用いる，既に符号化済みのカメラ画像を復号した参照カメラ画像を設定する参照カメラ画像設定ステップと，上記距離情報を基準視点カメラで撮影された基準視点カメラ画像に対するものから，上記基準視点カメラ画像と上記参照カメラ画像との対応関係をもとに上記参照カメラ画像におけるカメラから被写体までの距離を表す距離情報へと変換する距離情報変換ステップと，上記変換された距離情報と上記参照カメラ画像とから，上記符号化対象画像に対する視差補償画像を生成する視差補償画像生成ステップと，上記視差補償画像を用いて画像予測を行い，上記符号化対象画像を符号化する画像符号化ステップとを有することを特徴とする。 (1) The video encoding method according to the present embodiment encodes an encoding target image in a multi-view image or multi-view video captured by a plurality of cameras including one reference viewpoint camera, by performing image prediction between cameras using distance information that represents the distance from the reference viewpoint camera to the subject. The method is characterized by comprising: a reference camera image setting step of setting a reference camera image, obtained by decoding an already encoded camera image, that is used for predicting the image signal between cameras when encoding the encoding target image; a distance information conversion step of converting the distance information from distance information for the reference viewpoint camera image captured by the reference viewpoint camera into distance information representing the distance from the camera to the subject in the reference camera image, based on the correspondence between the reference viewpoint camera image and the reference camera image; a parallax compensation image generation step of generating a parallax compensation image for the encoding target image from the converted distance information and the reference camera image; and an image encoding step of performing image prediction using the parallax compensation image and encoding the encoding target image.

（２）また，上記（１）の映像符号化方法において，上記距離情報の変換の際に上記基準視点カメラ画像との対応関係が得られなかった上記参照カメラ画像の領域に対して，距離情報を生成することで上記変換された距離情報を修正する距離情報修正ステップを有し，上記視差補償画像生成ステップでは，上記修正された距離情報を用いることを特徴とする。 (2) In the video encoding method of (1), distance information is obtained for a region of the reference camera image for which a correspondence relationship with the reference viewpoint camera image has not been obtained when the distance information is converted. A distance information correction step for correcting the converted distance information by generating the distance information, wherein the corrected distance information is used in the parallax compensation image generation step.

（３）また，上記（２）の映像符号化方法において，上記距離情報修正ステップでは，上記対応関係が得られなかった領域の距離情報を，その領域の周辺領域における距離情報との連続性を仮定して，その領域の周辺領域における距離情報をもとに生成することを特徴とする。 (3) In the video encoding method of (2) above, in the distance information correction step, the distance information of the area for which the correspondence relationship is not obtained is converted into continuity with the distance information in the peripheral area of the area. It is assumed that it is generated based on distance information in the peripheral area of the area.

（４）また，上記（２）の映像符号化方法において，上記参照カメラ画像において同じ被写体が撮影されている領域を検出し，上記参照カメラ画像を上記検出された領域で分割する領域分割ステップを有し，上記距離情報修正ステップでは，上記領域分割ステップにおいて同じ分割に属する周辺領域における距離情報との連続性を仮定して距離情報を生成することを特徴とする。 (4) In the video encoding method of (2), an area dividing step of detecting an area where the same subject is photographed in the reference camera image and dividing the reference camera image by the detected area. And the distance information correcting step generates distance information on the assumption that continuity with distance information in surrounding areas belonging to the same division is assumed in the area dividing step.

(5) The video encoding method of (2) further comprises a distance information storage step of storing the distance information used in the parallax compensation image generation step. The distance information correction step generates distance information using the stored distance information.

(6) The video encoding method of (2) further comprises a distance information storage step of storing the distance information used in the parallax compensation image generation step, a reference camera image storage step of storing the reference camera image used in the parallax compensation image generation step, and a motion search step of searching for corresponding points between the stored reference camera image and the current reference camera image. The distance information correction step uses the temporal motion information of the subject obtained by this corresponding point search to generate distance information at the current time from the stored distance information.

(7) The video encoding method of (5) or (6) further comprises an image decoding step of decoding the image encoded in the image encoding step, a parallax search step of searching for corresponding points between the decoded image and the reference camera image, and a distance information refinement step of refining the distance information used in the parallax compensation image generation step on the basis of the parallax information obtained in the parallax search step. The distance information storage step stores the distance information refined in the distance information refinement step.

(8) In the video encoding method of any one of (2) to (7), the distance information correction step generates the distance information so that it does not become valid distance information when converted into distance information for the reference viewpoint camera.

(9) In the video encoding method of any one of (1) to (8), the image encoding step encodes the encoding target image while using the parallax compensation image as a candidate predicted image.

(10) In the video encoding method of any one of (1) to (8), the image encoding step encodes the encoding target image while using the parallax compensation image as a candidate reference image for video prediction.

(11) In the video encoding method of any one of (1) to (8), the image encoding step generates a difference image between the parallax compensation image and the encoding target image, and encodes the encoding target image by encoding that difference image.

(12) The video decoding method according to this embodiment decodes encoded data, obtained by encoding a multi-view image or multi-view video captured by a plurality of cameras including one reference viewpoint camera, by performing inter-camera image prediction using distance information that represents the distance from the reference viewpoint camera to the subject. The method comprises: a reference camera image setting step of setting an already decoded reference camera image that was used for inter-camera prediction of the image signal when the decoding target image was encoded; a distance information conversion step of converting the distance information, based on the correspondence between the reference viewpoint camera image and the reference camera image, from distance information for the reference viewpoint camera image captured by the reference viewpoint camera into distance information representing the distance from the camera to the subject in the reference camera image; a parallax compensation image generation step of generating a parallax compensation image for the decoding target image from the converted distance information and the reference camera image; and an image decoding step of performing image prediction using the parallax compensation image and decoding the decoding target image from the encoded data.

(13) The video decoding method of (12) further comprises a distance information correction step of correcting the converted distance information by generating distance information for any region of the reference camera image for which no correspondence with the reference viewpoint camera image was obtained during the conversion. The parallax compensation image generation step uses the corrected distance information.

(14) In the video decoding method of (13), the distance information correction step generates the distance information of a region for which no correspondence was obtained from the distance information of the surrounding region, assuming continuity with that surrounding distance information.

(15) The video decoding method of (13) further comprises a region division step of detecting regions of the reference camera image in which the same subject is captured and dividing the reference camera image into the detected regions. The distance information correction step generates distance information assuming continuity with the distance information of surrounding regions that belong to the same segment in the region division step.

(16) The video decoding method of (13) further comprises a distance information storage step of storing the distance information used in the parallax compensation image generation step. The distance information correction step generates distance information using the stored distance information.

(17) The video decoding method of (13) further comprises a distance information storage step of storing the distance information used in the parallax compensation image generation step, a reference camera image storage step of storing the reference camera image used in the parallax compensation image generation step, and a motion search step of searching for corresponding points between the stored reference camera image and the current reference camera image. The distance information correction step uses the temporal motion information of the subject obtained by this corresponding point search to generate distance information at the current time from the stored distance information.

(18) The video decoding method of (16) or (17) further comprises a parallax search step of searching for corresponding points between the image decoded in the image decoding step and the reference camera image, and a distance information refinement step of refining the distance information used in the parallax compensation image generation step on the basis of the parallax information obtained in the parallax search step. The distance information storage step stores the distance information refined in the distance information refinement step.

(19) In the video decoding method of any one of (13) to (18), the distance information correction step generates the distance information so that it does not become valid distance information when converted into distance information for the reference viewpoint camera.

(20) In the video decoding method of any one of (12) to (19), the image decoding step decodes the decoding target image from the encoded data while using the parallax compensation image as a candidate predicted image.

(21) In the video decoding method of any one of (12) to (19), the image decoding step decodes the decoding target image from the encoded data while using the parallax compensation image as a candidate reference image for video prediction.

(22) In the video decoding method of any one of (12) to (19), the image decoding step decodes, from the encoded data, a difference image between the parallax compensation image and the decoding target image, and reconstructs the decoding target image by adding that difference image to the parallax compensation image.
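The difference-image variant in aspects (11) and (22) can be illustrated with a minimal sketch. It works on a single row of pixel values and omits the transform, quantization, and entropy coding a real codec would apply to the difference image, so the round trip here is lossless. The function names `encode_difference` and `decode_from_difference` are hypothetical and do not appear in the patent.

```python
from typing import List


def encode_difference(target: List[int], compensated: List[int]) -> List[int]:
    """Encoder side: difference image = encoding target minus parallax compensation image."""
    if len(target) != len(compensated):
        raise ValueError("images must have the same size")
    return [t - p for t, p in zip(target, compensated)]


def decode_from_difference(difference: List[int], compensated: List[int]) -> List[int]:
    """Decoder side: add the decoded difference image back to the parallax compensation image."""
    if len(difference) != len(compensated):
        raise ValueError("images must have the same size")
    return [d + p for d, p in zip(difference, compensated)]


# One row of an encoding target image and of its parallax compensation image (made-up values).
target_row = [100, 102, 98]
compensated_row = [101, 100, 98]
difference_row = encode_difference(target_row, compensated_row)
restored_row = decode_from_difference(difference_row, compensated_row)
```

The better the parallax compensation image predicts the target, the closer the difference image is to zero, which is what makes it cheap to encode.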

The effects of this embodiment are described below in comparison with the prior art.

The prior art described in Non-Patent Document 2, which encodes the information needed for parallax compensation for each encoding target image (prior art 1), has the advantage that the reference camera best suited for prediction can be selected for each encoding target image. Its drawback is that the amount of additional information (the information needed for parallax compensation) becomes very large.

The prior art described in Patent Document 1, which uses a common reference camera and encodes the distance information needed for parallax compensation for the reference image (prior art 2), has the advantage of a very small amount of additional information. However, because the reference camera best suited for prediction cannot be selected for each encoding target image, the video prediction capability of parallax compensation degrades as the camera spacing increases.

In contrast, this embodiment has the following effects.
[Effect 1] With the same amount of additional information as prior art 2, the reference camera best suited for prediction can be selected for each encoding target image when performing video prediction by parallax compensation.
[Effect 2] Without increasing the amount of additional information, video prediction by parallax compensation becomes available in occlusion regions, where the conventional methods could not predict at all.
[Effect 3] Distance information (the information needed for parallax compensation) whose accuracy is highly reliable can be generated.

Effect 1 is obtained because this embodiment includes means for converting the distance information input at encoding and decoding time into distance information for the reference camera. The conversion relies on the principle by which a camera captures an object and projects it onto a two-dimensional image. This is the processing performed in step S12 of FIG. 5 and step S22 of FIG. 7; a conversion using, for example, equation (1) corresponds to this processing.
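As a rough illustration of this kind of distance information conversion, the sketch below forward-projects one row of a distance map from the reference viewpoint camera to another camera. It is not the patent's equation (1): it assumes rectified, parallel cameras separated by `baseline` along the x axis, so that a point at distance Z shifts horizontally by `focal_px * baseline / Z` pixels and keeps the same Z in both views. The function name and parameters are hypothetical.

```python
from typing import List, Optional


def convert_distance_row(
    base_distance: List[float], focal_px: float, baseline: float
) -> List[Optional[float]]:
    """Convert one row of distance information to the viewpoint of another camera.

    Assumption (not from the patent): parallel cameras, so the shift is purely
    horizontal and the distance value itself is unchanged by the conversion.
    Pixels that receive no projected sample stay None (no correspondence found).
    """
    width = len(base_distance)
    converted: List[Optional[float]] = [None] * width
    for x, z in enumerate(base_distance):
        if z <= 0:
            continue  # skip invalid distance samples
        target_x = x - round(focal_px * baseline / z)
        if 0 <= target_x < width:
            current = converted[target_x]
            # When several samples land on the same pixel, keep the nearest surface.
            if current is None or z < current:
                converted[target_x] = z
    return converted


# Background at distance 10 with a foreground object at distance 5 (made-up values).
converted_row = convert_distance_row([10, 10, 10, 5, 5, 10, 10, 10], focal_px=10, baseline=1)
```

In this example the entries that remain `None` are exactly the kind of regions discussed next: one is uncovered behind the foreground object (an occlusion region) and one lies at the image edge.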

Effect 2 is obtained because this embodiment includes means for generating the distance information needed for parallax compensation in occlusion regions and image-edge regions, where no valid distance information is obtained when the distance information is converted. Between images captured by different cameras, there are regions captured by only one of them. Distance information conversion converts distance information given for an image captured by one camera into distance information for an image captured from another viewpoint, so it cannot provide valid distance information for regions that are captured by the destination camera but not by the source camera. In such cases, this embodiment generates distance information based on the fact that, in real space, objects are spatially continuous and cannot move or change instantaneously. A parallax compensation image can therefore be generated using the generated distance information. This is the processing performed in step S13 of FIG. 5 and step S23 of FIG. 7. In general, distance information has higher spatial and temporal continuity than the image signal. Generating distance information under a continuity assumption and using it for parallax compensation therefore yields a higher-quality predicted image than generating the image signal under a continuity assumption on the generated parallax compensation image.
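The spatial-continuity idea can be sketched for one row of converted distance information as follows. Each run of missing values is filled from its valid neighbours. Choosing the farther of the two neighbours is an assumption of this sketch (uncovered regions usually belong to the background), not a rule stated in the text above, and the function name `fill_missing_distance` is hypothetical.

```python
from typing import List, Optional


def fill_missing_distance(row: List[Optional[float]]) -> List[float]:
    """Fill runs of missing distance values (None) from the surrounding valid values.

    Assumption of this sketch: each hole continues the farther (larger-distance)
    of its left and right neighbours, i.e. the background surface.
    """
    filled = list(row)
    width = len(filled)
    x = 0
    while x < width:
        if filled[x] is not None:
            x += 1
            continue
        start = x
        while x < width and filled[x] is None:
            x += 1  # advance to the end of this run of holes
        left = filled[start - 1] if start > 0 else None
        right = filled[x] if x < width else None
        candidates = [v for v in (left, right) if v is not None]
        if not candidates:
            raise ValueError("row contains no valid distance to extend")
        value = max(candidates)
        for i in range(start, x):
            filled[i] = value
    return filled


# A converted row with one occlusion hole and one image-edge hole (made-up values).
filled_row = fill_missing_distance([10, 5, 5, None, 10, 10, 10, None])
```

A temporal variant would instead take the missing values from distance information stored for an earlier frame, which is the approach of aspects (5) and (6).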

Effect 3 is obtained because this embodiment includes means for updating the generated distance information, after encoding, into distance information that indicates accurate corresponding points, using the reference image and the locally decoded image. The generated distance information is accurate to a degree but contains errors. Because the same locally decoded image is obtained at both the encoder and the decoder, updating the distance information with it makes accurate distance information available for frames encoded later in encoding order. The update can use, for example, a stereo matching method that estimates distance information from image signals. This is the processing performed in step S17 of FIG. 5 and step S26 of FIG. 7.
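One simple form of stereo matching is block matching along a row, sketched below. For each pixel of the reference camera row it searches the locally decoded row for the horizontal shift with the smallest sum of absolute differences, then turns that shift into a distance with Z = focal_px * baseline / shift. The parallel-camera geometry, the search direction, the window size, and all names are assumptions of this sketch; the text above does not prescribe a particular matching method.

```python
from typing import List, Optional


def estimate_distance_row(
    reference: List[int],
    decoded: List[int],
    focal_px: float,
    baseline: float,
    max_shift: int,
    half_window: int = 1,
) -> List[Optional[float]]:
    """Estimate distance per pixel of `reference` by block matching against `decoded`.

    Assumption: rectified parallel cameras, with the matching pixel in `decoded`
    located `shift` pixels to the right (0 <= shift <= max_shift).
    Pixels with no usable match (border pixels or zero shift) stay None.
    """
    width = len(reference)
    distances: List[Optional[float]] = [None] * width
    for x in range(half_window, width - half_window):
        best_cost: Optional[int] = None
        best_shift = 0
        for shift in range(max_shift + 1):
            if x + shift + half_window >= width:
                break  # window would leave the decoded row
            cost = sum(
                abs(reference[x + k] - decoded[x + shift + k])
                for k in range(-half_window, half_window + 1)
            )
            if best_cost is None or cost < best_cost:
                best_cost, best_shift = cost, shift
        if best_shift > 0:
            distances[x] = focal_px * baseline / best_shift
    return distances


# The decoded row is the reference row shifted right by two pixels (made-up values).
reference_row = [0, 0, 9, 5, 7, 0, 0, 0]
decoded_row = [0, 0, 0, 0, 9, 5, 7, 0]
estimated = estimate_distance_row(reference_row, decoded_row, focal_px=10.0, baseline=1.0, max_shift=2)
```

Because the decoder can run the same search on the same locally decoded image, such an update needs no extra information in the bitstream.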

The embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and the present invention is clearly not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise modified without departing from the spirit and scope of the present invention.

FIG. 1 is a conceptual diagram of the parallax that arises between cameras.
FIG. 2 is a conceptual diagram of the epipolar geometry constraint.
FIG. 3 is a conceptual diagram of an example camera configuration in the examples.
FIG. 4 is a diagram showing the video encoding device of Example 1 of the present invention.
FIG. 5 is a video encoding flowchart for Example 1.
FIG. 6 is a diagram showing the video decoding device of Example 2 of the present invention.
FIG. 7 is a video decoding flowchart for Example 2.

Explanation of symbols

100 Video encoding device
101 Encoding target image input unit
102 Encoding target image memory
103 Reference camera image input unit
104 Reference image memory
105 Distance information input unit
106 Distance information conversion unit
107 Background distance information generation unit
108 Distance information memory
109 Parallax compensation image generation unit
110 Image encoding unit
111 Decoded image memory
200 Video decoding device
201 Encoded data input unit
202 Encoded data memory
203 Reference camera image input unit
204 Reference image memory
205 Distance information input unit
206 Distance information conversion unit
207 Background distance information generation unit
208 Distance information memory
209 Parallax compensation image generation unit
210 Image decoding unit
211 Decoded image memory

Claims

A video encoding method for encoding an encoding target image in a multi-view image or multi-view video captured by a plurality of cameras including one reference viewpoint camera, by performing inter-camera image prediction using distance information representing the distance from the reference viewpoint camera to the subject, the method comprising:
a reference camera image setting step of setting a reference camera image that is used for inter-camera prediction of the image signal when encoding the encoding target image, the reference camera image being obtained by decoding an already encoded camera image different from the camera image of the reference viewpoint camera;
a distance information conversion step of converting the distance information, based on the correspondence between the reference viewpoint camera image and the reference camera image, from distance information for the reference viewpoint camera image captured by the reference viewpoint camera into distance information representing the distance from the camera to the subject in the reference camera image;
a parallax compensation image generation step of generating a parallax compensation image for the encoding target image from the converted distance information and the reference camera image; and
an image encoding step of performing image prediction using the parallax compensation image and encoding the encoding target image.

The video encoding method according to claim 1, further comprising
a distance information correction step of correcting the converted distance information by generating distance information for a region of the reference camera image for which no correspondence with the reference viewpoint camera image was obtained during the conversion of the distance information,
wherein the parallax compensation image generation step uses the corrected distance information.

The video encoding method according to claim 2,
wherein the distance information correction step generates the distance information of the region for which no correspondence was obtained from the distance information of the surrounding region of that region, assuming continuity with the distance information of the surrounding region.

The video encoding method according to claim 2, further comprising
a distance information storage step of storing the distance information used in the parallax compensation image generation step,
wherein the distance information correction step generates distance information using the stored distance information.

The video encoding method according to claim 4, further comprising:
an image decoding step of decoding the image encoded in the image encoding step;
a parallax search step of searching for corresponding points between the decoded image and the reference camera image; and
a distance information refinement step of refining the distance information used in the parallax compensation image generation step on the basis of the parallax information obtained in the parallax search step,
wherein the distance information storage step stores the distance information refined in the distance information refinement step.

A video decoding method for decoding encoded data, obtained by encoding a multi-view image or multi-view video captured by a plurality of cameras including one reference viewpoint camera, by performing inter-camera image prediction using distance information representing the distance from the reference viewpoint camera to the subject, the method comprising:
a reference camera image setting step of setting an already decoded reference camera image that was used for inter-camera prediction of the image signal when the decoding target image was encoded and that is different from the camera image of the reference viewpoint camera;
a distance information conversion step of converting the distance information, based on the correspondence between the reference viewpoint camera image and the reference camera image, from distance information for the reference viewpoint camera image captured by the reference viewpoint camera into distance information representing the distance from the camera to the subject in the reference camera image;
a parallax compensation image generation step of generating a parallax compensation image for the decoding target image from the converted distance information and the reference camera image; and
an image decoding step of performing image prediction using the parallax compensation image and decoding the decoding target image from the encoded data.

The video decoding method according to claim 6, further comprising
a distance information correction step of correcting the converted distance information by generating distance information for a region of the reference camera image for which no correspondence with the reference viewpoint camera image was obtained during the conversion of the distance information,
wherein the parallax compensation image generation step uses the corrected distance information.

The video decoding method according to claim 7,
wherein the distance information correction step generates the distance information of the region for which no correspondence was obtained from the distance information of the surrounding region of that region, assuming continuity with the distance information of the surrounding region.

The video decoding method according to claim 7, further comprising
a distance information storage step of storing the distance information used in the parallax compensation image generation step,
wherein the distance information correction step generates distance information using the stored distance information.

The video decoding method according to claim 9, further comprising:
a parallax search step of searching for corresponding points between the image decoded in the image decoding step and the reference camera image; and
a distance information refinement step of refining the distance information used in the parallax compensation image generation step on the basis of the parallax information obtained in the parallax search step,
wherein the distance information storage step stores the distance information refined in the distance information refinement step.

A video encoding device for encoding an encoding target image in a multi-view image or multi-view video captured by a plurality of cameras including one reference viewpoint camera, by performing inter-camera image prediction using distance information representing the distance from the reference viewpoint camera to the subject, the device comprising:
reference camera image setting means for setting a reference camera image that is used for inter-camera prediction of the image signal when encoding the encoding target image, the reference camera image being obtained by decoding an already encoded camera image different from the camera image of the reference viewpoint camera;
distance information conversion means for converting the distance information, based on the correspondence between the reference viewpoint camera image and the reference camera image, from distance information for the reference viewpoint camera image captured by the reference viewpoint camera into distance information representing the distance from the camera to the subject in the reference camera image;
parallax compensation image generation means for generating a parallax compensation image for the encoding target image from the converted distance information and the reference camera image; and
image encoding means for performing image prediction using the parallax compensation image and encoding the encoding target image.

The video encoding device according to claim 11, further comprising
distance information correction means for correcting the converted distance information by generating distance information for a region of the reference camera image for which no correspondence with the reference viewpoint camera image was obtained during the conversion of the distance information,
wherein the parallax compensation image generation means uses the corrected distance information.

The video encoding device according to claim 12,
wherein the distance information correction means generates the distance information of the region for which no correspondence was obtained from the distance information of the surrounding region of that region, assuming continuity with the distance information of the surrounding region.

The video encoding device according to claim 12, further comprising
distance information storage means for storing the distance information used by the parallax compensation image generation means,
wherein the distance information correction means generates distance information using the stored distance information.

The video encoding device according to claim 14, further comprising:
image decoding means for decoding the image encoded by the image encoding means;
parallax search means for searching for corresponding points between the decoded image and the reference camera image; and
distance information refinement means for refining the distance information used by the parallax compensation image generation means on the basis of the parallax information obtained by the parallax search means,
wherein the distance information storage means stores the distance information refined by the distance information refinement means.

A video decoding device for decoding encoded data, obtained by encoding a multi-view image or multi-view video captured by a plurality of cameras including one reference viewpoint camera, by performing inter-camera image prediction using distance information representing the distance from the reference viewpoint camera to the subject, the device comprising:
reference camera image setting means for setting an already decoded reference camera image that was used for inter-camera prediction of the image signal when the decoding target image was encoded and that is different from the camera image of the reference viewpoint camera;
distance information conversion means for converting the distance information, based on the correspondence between the reference viewpoint camera image and the reference camera image, from distance information for the reference viewpoint camera image captured by the reference viewpoint camera into distance information representing the distance from the camera to the subject in the reference camera image;
parallax compensation image generation means for generating a parallax compensation image for the decoding target image from the converted distance information and the reference camera image; and
image decoding means for performing image prediction using the parallax compensation image and decoding the decoding target image from the encoded data.

The video decoding device according to claim 16, further comprising
distance information correction means for correcting the converted distance information by generating distance information for a region of the reference camera image for which no correspondence with the reference viewpoint camera image was obtained during the conversion of the distance information,
wherein the parallax compensation image generation means uses the corrected distance information.

The video decoding device according to claim 17,
wherein the distance information correction means generates the distance information of the region for which no correspondence was obtained from the distance information of the surrounding region of that region, assuming continuity with the distance information of the surrounding region.

The video decoding device according to claim 17, further comprising
distance information storage means for storing the distance information used by the parallax compensation image generation means,
wherein the distance information correction means generates distance information using the stored distance information.

The video decoding device according to claim 19, further comprising:
parallax search means for searching for corresponding points between the image decoded by the image decoding means and the reference camera image; and
distance information refinement means for refining the distance information used by the parallax compensation image generation means on the basis of the parallax information obtained by the parallax search means,
wherein the distance information storage means stores the distance information refined by the distance information refinement means.

A video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 5.

A computer-readable recording medium on which a video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 5 is recorded.

A video decoding program for causing a computer to execute the video decoding method according to any one of claims 6 to 10.

A computer-readable recording medium on which a video decoding program for causing a computer to execute the video decoding method according to any one of claims 6 to 10 is recorded.