JP4851564B2

JP4851564B2 - Video encoding method, video decoding method, video encoding program, video decoding program, and computer-readable recording medium on which these programs are recorded

Info

Publication number: JP4851564B2
Application number: JP2009141891A
Authority: JP
Inventors: 信哉志水; 正樹北原; 一人上倉; 由幸八島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-06-15
Filing date: 2009-06-15
Publication date: 2012-01-11
Anticipated expiration: 2025-07-28
Also published as: JP2009213161A

Description

The present invention relates to a video encoding method for encoding images captured by a plurality of cameras that photograph a subject; a video encoding program used to implement that method and a computer-readable recording medium on which the program is recorded; a video decoding method for decoding data encoded by that video encoding method; and a video decoding program used to implement that decoding method and a computer-readable recording medium on which the program is recorded.

A multi-view video consists of multiple videos of the same subject and background captured by multiple cameras. In the following, a video captured by one camera is called a "two-dimensional video," and a group of two-dimensional videos of the same subject and background is called a multi-view video.

The two-dimensional video from each camera in a multi-view video is strongly correlated in the time direction. In addition, when the cameras are synchronized, the frames from the different cameras at the same instant show the subject and background in exactly the same state from different positions, so there is also a strong correlation between cameras. In the following, this inter-camera correlation is called spatial correlation.

Video encoding exploits these correlations to improve encoding efficiency.

First, the prior art in two-dimensional video encoding is described.

Two-dimensional video encoding exploits temporal correlation by encoding only the difference between the image to be encoded and an already encoded image, which improves encoding efficiency.

Many conventional two-dimensional video encoding schemes, including the international standards H.264, MPEG-2, and MPEG-4, further improve encoding efficiency by using a technique called motion compensation when computing the difference. Motion compensation divides an image into smaller blocks and selects, for each block, the reference block against which the difference is taken. This keeps the difference small, and the encoding efficiency high, even when the subject or the camera moves.
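
The block-wise difference described above can be illustrated with a minimal block-matching sketch in Python. It is not taken from any of the standards named; the function names, the sum-of-absolute-differences (SAD) cost, and the exhaustive search window are assumptions chosen for illustration. Images are plain lists of rows of integers.

```python
def block_sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the bs x bs block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy). Hypothetical helper for illustration."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(bs)
        for x in range(bs)
    )


def best_motion_vector(cur, ref, bx, by, bs, search):
    """Exhaustively search displacements within +/- `search` pixels and
    return (dx, dy, cost) for the reference block with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block would leave the image.
            if by + dy < 0 or by + dy + bs > h:
                continue
            if bx + dx < 0 or bx + dx + bs > w:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Only the motion vector and the remaining per-block difference would then need to be encoded; a real encoder would also weigh the cost of encoding the vector itself, which this sketch ignores.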

Next, conventional multi-view video encoding schemes are described.

To exploit both temporal and spatial correlation, conventional multi-view video encoding combines prediction in the time direction with compensation between cameras. One example is the method of Non-Patent Document 1 below.

The inter-camera compensation used in the method of Non-Patent Document 1 is called parallax direction prediction; it performs motion compensation using an image from another camera as the reference image. With encoding efficiency in mind, the method lets the encoder choose, for each macroblock, which of the two correlations to use for compensation. Because both temporal and spatial correlation are used in encoding, it can achieve higher encoding efficiency than a method that uses temporal correlation alone.

Hideaki Kimata and Masaki Kitahara, "Preliminary results on multiple view video coding (3DAV)," document M10976, MPEG Redmond Meeting, July 2004.

The method of Non-Patent Document 1 does achieve higher encoding efficiency than a method that uses only temporal correlation. However, because the videos from multiple cameras reference one another, decoding requires the video of every camera in the reference relationship.

For example, with the reference relationships shown in FIG. 19, the video from every camera must be delivered to the decoding side. In free-viewpoint video synthesis, which is one application of multi-view video, this means that even video from cameras that are not needed must be decoded. When the data is sent over a network, information that is not actually needed must also be sent to the receiver.

Free-viewpoint video synthesis, as the term is used here, is a technique that synthesizes the view from a position where no camera is placed by using the video from adjacent cameras. An encoding function that allows the video of each camera to be extracted independently is therefore required.

Furthermore, when multi-view video is used for free-viewpoint video synthesis, the synthesis itself is computationally very expensive, so performing decoding and synthesis at the same time is a heavy load. A function by which the encoding and decoding process can assist the synthesis process is therefore also required.

One conceivable way to let the video of each camera be extracted independently as needed is to restrict the reference relationships in a method like that of Non-Patent Document 1 so that only the video of certain specific cameras can be referenced. Restricting the reference relationships reduces the number of camera videos needed to decode the data.

With such a method, however, camera information can be extracted individually only in the units set on the encoding side. Also, since encoding each camera's video as one or more bitstreams is clearly not efficient, restricting the reference relationships can make a strong correlation between certain cameras impossible to exploit even when it exists. In other words, the correlation can no longer be removed, which makes it difficult to achieve sufficient encoding efficiency overall.

According to the reference below, methods for synthesizing free-viewpoint video from multi-view video fall into two groups: methods that use geometric information about the subject, and methods that synthesize from the video without using such geometric information.

Reference: Heung-Yeung Shum, Sing Bing Kang, and Shing-Chow Chan, "Survey of image-based representations and compression techniques," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 11, Nov. 2003, pp. 1020-1037.

For a method that does not use geometric information to synthesize video of the same quality as a method that does, video from more cameras is required as input. Using the video of many cameras means that a very large amount of computation is required.

In this respect, in a method like that of Non-Patent Document 1, the information held for each pixel is expressed only as a residual signal together with either a motion vector from the previous frame or a parallax vector from another camera. Consequently, synthesizing free-viewpoint video from multi-view video encoded by such a method either requires computing geometric information, which increases the amount of computation and the synthesis time, or requires video from more cameras.

The present invention has been made in view of these circumstances. For a configuration that improves the encoding efficiency of multi-view video by using parallax direction prediction and temporal direction prediction together, its object is to provide a new video encoding and decoding technique that allows the required video to be decoded without using the video of every camera in the reference relationship, and thereby allows free-viewpoint video to be synthesized with a small load.

[1] Basic configuration of the video encoding method of the present invention
To achieve this object, the video encoding method of the present invention, in order to encode images captured by a plurality of cameras that photograph a subject, has the following steps:
(a) a reference viewpoint image encoding step of encoding a reference viewpoint image captured by the camera serving as the reference viewpoint;
(b) a distance image generation step of generating a distance image indicating the estimated distance from the camera that captured the reference viewpoint image to the subject;
(c) a distance image encoding step of encoding the generated distance image;
(d) a parallax compensation image estimation step of estimating a parallax compensation image (the image that serves as the parallax-based predicted image) at a viewpoint other than the reference viewpoint, based on the reference viewpoint image, the distance image, and the camera positional relationship, which defines the installation positions and orientations of the cameras;
(e) a parallax difference image calculation step of calculating a parallax difference image indicating the difference between the estimated parallax compensation image and the encoding target image captured by the camera associated with the viewpoint being estimated;
(f) a parallax difference prediction value generation step of generating, using already encoded parallax difference images, a parallax difference prediction value that predicts the calculated parallax difference image temporally or spatially; and
(g) a difference data encoding step of encoding data corresponding to the difference between the calculated parallax difference image and the generated parallax difference prediction value.
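
The parallax compensation image estimation of step (d) can be sketched in Python under a strong simplifying assumption: the two cameras are rectified and parallel, so a point at distance Z shifts horizontally by `focal_px * baseline / Z` pixels between views. The source allows general camera positions and orientations; this restricted geometry, the function name, and the parameters `focal_px` and `baseline` are hypothetical and serve only to show the idea of warping the reference viewpoint image with the distance image.

```python
def estimate_parallax_compensation(ref_img, depth, focal_px, baseline):
    """Forward-warp `ref_img` to a second viewpoint displaced by `baseline`
    along the x axis (assumed rectified, parallel cameras).

    `depth[y][x]` is the estimated distance (> 0) for each reference pixel.
    Pixels of the result that no reference pixel maps to are left as None.
    """
    h, w = len(ref_img), len(ref_img[0])
    comp = [[None] * w for _ in range(h)]
    nearest = [[float("inf")] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            shift = round(focal_px * baseline / z)  # disparity in pixels
            tx = x - shift
            # When several pixels land on the same target, keep the nearest
            # surface, since it would occlude the ones behind it.
            if 0 <= tx < w and z < nearest[y][tx]:
                nearest[y][tx] = z
                comp[y][tx] = ref_img[y][x]
    return comp
```

The `None` entries are the pixels whose values cannot be estimated from the reference viewpoint, for example regions occluded there; they have to be filled by some other means before the difference of step (e) is taken.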

On top of this basic configuration, the video encoding method of the present invention may further adopt the following configurations.

[1-1]
In the parallax compensation image estimation step, when the decoding side can obtain the camera positional relationship information from a source other than the encoded data, the parallax compensation image at a viewpoint other than the reference viewpoint may be estimated based on the reference viewpoint image obtained by decoding the encoded data of the reference viewpoint image, the distance image obtained by decoding the encoded data of the distance image, and the camera positional relationship, which is not encoded.

[1-2]
When the decoding side is to obtain the camera positional relationship information from the encoded data, the method has, in addition to the steps of the basic configuration above, (g) a camera positional relationship setting step of setting the camera positional relationship either by acquiring it from externally supplied information or by estimating it from the images of all the cameras, and (h) a camera positional relationship information encoding step of encoding the information on the camera positional relationship thus set.

[1-3]
In the parallax compensation image estimation step, when the decoding side is to obtain the camera positional relationship information from the encoded data, the parallax compensation image at a viewpoint other than the reference viewpoint may be estimated based on the reference viewpoint image obtained by decoding the encoded data of the reference viewpoint image, the distance image obtained by decoding the encoded data of the distance image, and the camera positional relationship obtained by decoding the encoded data of the camera positional relationship information.

[1-4]
When the camera serving as the reference viewpoint is determined automatically, the method may have, in addition to the steps of the basic configuration above, (i) a step of setting, as the reference viewpoint camera, the camera whose captured space overlaps most with the space captured by the other cameras.

[1-5]
In the distance image generation step, the distance image may be generated by dividing the image into blocks and estimating the distance for each block.

[1-6]
In the distance image generation step, when the distance image is generated according to a prescribed algorithm, the difference between the evaluation value of the distance image generated at the current time and the evaluation value of the distance image generated at the immediately preceding time may be obtained and its magnitude compared with a predetermined threshold. If the difference is judged to be large, the distance image generated at the current time is used as is; if it is judged to be small, the distance image generated at the immediately preceding time is used in its place.
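
The threshold decision of [1-6] amounts to a small selection rule, sketched below in Python. The passage does not say how the evaluation value is computed or how a difference exactly equal to the threshold is treated, so the `score` arguments, the function name, and the strict `>` comparison are assumptions.

```python
def choose_distance_image(cur_img, cur_score, prev_img, prev_score, threshold):
    """Return the distance image to use at the current time.

    `cur_score` and `prev_score` are the evaluation values of the distance
    images generated at the current and the previous time (how they are
    computed is outside this sketch).
    """
    if prev_img is None:
        # First frame: there is no earlier distance image to fall back on.
        return cur_img
    if abs(cur_score - prev_score) > threshold:
        # Large change in the evaluation value: keep the new estimate.
        return cur_img
    # Small change: reuse the previous distance image in place of the new one.
    return prev_img
```

Reusing the previous distance image when little has changed keeps the distance video stable over time, which should make it cheaper to encode as a two-dimensional video.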

[1-7]
In the parallax compensation image estimation step, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on the reference viewpoint image, the distance image, and the camera positional relationship. For a pixel whose value cannot be estimated in this way, the pixel value may be estimated from the values of the surrounding pixels.
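
One simple way to estimate a missing pixel from its surroundings, as in [1-7], is sketched below in Python. The passage does not specify the estimator; averaging the available 4-connected neighbours and repeating until no hole remains is an assumption made for illustration, as are the function name and the use of `None` to mark pixels that could not be estimated.

```python
def fill_holes_from_neighbours(img):
    """Replace None pixels with the mean of their already known 4-neighbours,
    sweeping repeatedly so that larger holes are filled from the edges in."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    while any(v is None for row in out for v in row):
        nxt = [row[:] for row in out]
        progressed = False
        for y in range(h):
            for x in range(w):
                if out[y][x] is not None:
                    continue
                known = [
                    out[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] is not None
                ]
                if known:
                    nxt[y][x] = sum(known) / len(known)
                    progressed = True
        if not progressed:
            break  # nothing known anywhere: leave the remaining holes as None
        out = nxt
    return out
```

Because the decoder can run the same rule on the same decoded data, no extra information has to be transmitted for pixels filled this way.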

[1-8]
In the parallax compensation image estimation step, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on the reference viewpoint image, the distance image, and the camera positional relationship. For a pixel whose value cannot be estimated in this way, the motion information of that pixel may be estimated from the motion information of the surrounding pixels, and the pixel value may then be estimated from the estimated motion information and the pixel values of already encoded images.

[1-9]
In the parallax compensation image estimation step, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on the reference viewpoint image, the distance image, and the camera positional relationship. For pixels whose values cannot be estimated in this way, the code amount obtained with the parallax compensation image estimated by the method of [1-7] may be compared with the code amount obtained with the parallax compensation image estimated by the method of [1-8], and the method that encodes more efficiently may be selected for each parallax compensation image. When this configuration is adopted, information indicating which prediction mode was used is also encoded.
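
The per-image mode choice of [1-9] can be sketched in Python as follows. Measuring the true code amount requires running the actual encoder, so this sketch substitutes the sum of absolute residuals as a stand-in cost; that proxy, the mode labels, the function names, and the tie-break in favour of the spatial method are all assumptions.

```python
def residual_cost(target, comp):
    """Stand-in for the code amount: sum of absolute prediction residuals."""
    return sum(
        abs(t - c)
        for t_row, c_row in zip(target, comp)
        for t, c in zip(t_row, c_row)
    )


def select_fill_mode(target, comp_spatial, comp_motion):
    """Pick, for one parallax compensation image, the hole-filling variant
    that leaves the cheaper residual against the encoding target image.

    Returns (mode, chosen_image); `mode` is the flag that would be written
    to the bitstream so the decoder can repeat the same choice.
    """
    cost_spatial = residual_cost(target, comp_spatial)
    cost_motion = residual_cost(target, comp_motion)
    if cost_spatial <= cost_motion:
        return "spatial", comp_spatial
    return "motion", comp_motion
```

The decoder cannot make this comparison itself because it does not have the target image, which is why the selected mode has to be signalled.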

[1-10]
In the distance image encoding step, the distance image may be encoded using the motion vectors that were used when encoding the reference viewpoint image.

[1-11]
In the parallax difference prediction value generation step, the parallax difference prediction value may be generated by selecting whichever of the following gives the better encoding efficiency: a motion vector estimated from the motion vector used when encoding the reference viewpoint image, the distance image, and the camera positional relationship; or a motion vector estimated from the viewpoint's own reference image.

[1-12]
In the parallax difference prediction value generation step, the parallax difference prediction value may be generated by selecting whichever of the following gives the better encoding efficiency: a motion vector estimated from the motion vector used when encoding the reference viewpoint image, the distance image, and the camera positional relationship; or a motion vector estimated from the viewpoint's own reference image. When the former motion vector is estimated, the distance image obtained by decoding the encoded data of the distance image may be used.

The video encoding method of the present invention configured in this way can also be implemented as a computer program. The program is provided on a suitable computer-readable recording medium or over a network, is installed when the invention is put into practice, and realizes the invention by running on a control means such as a CPU.

[2] Basic configuration of the video decoding method of the present invention
The video decoding method of the present invention, in order to restore images captured by a plurality of cameras that photograph a subject by decoding the encoded data generated by the video encoding method of the present invention, has the following steps:
(a) a reference viewpoint image decoding step of decoding the encoded data of the reference viewpoint image captured by the camera serving as the reference viewpoint;
(b) a distance image decoding step of decoding the encoded data of the distance image indicating the estimated distance from the camera that captured the reference viewpoint image to the subject;
(c) a parallax compensation image estimation step of estimating a parallax compensation image at a viewpoint other than the reference viewpoint, based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship, which defines the installation positions and orientations of the cameras;
(d) a difference data decoding step of decoding the encoded data of the difference data, that is, the difference between the parallax difference image, which indicates the difference between the estimated parallax compensation image and the image captured by the camera associated with the viewpoint being estimated, and the parallax difference prediction value, which predicts that parallax difference image temporally or spatially using already restored parallax difference images;
(e) a parallax difference image restoration step of restoring, based on the decoded difference data and the parallax difference prediction value, the parallax difference image indicating the difference from the image captured by the camera associated with the viewpoint other than the reference viewpoint; and
(f) an image restoration step of restoring the image captured by the camera associated with the viewpoint other than the reference viewpoint, based on the estimated parallax compensation image and the restored parallax difference image.
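
The relationship between encoder steps (e) to (g) and decoder steps (d) to (f) can be checked with a small Python round trip. This sketch assumes lossless coding of the difference data and omits entropy coding, quantization, and clipping entirely; the function names are hypothetical, and images are lists of rows of integers with no unfilled pixels.

```python
def _combine(a, b, sign):
    """Element-wise a + sign * b for two images of equal size."""
    return [[p + sign * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode_view(target, comp, predicted_diff):
    """Encoder side: parallax difference image, then the difference data
    that is left after subtracting its temporal/spatial prediction."""
    parallax_diff = _combine(target, comp, -1)         # step (e)
    return _combine(parallax_diff, predicted_diff, -1)  # input to step (g)


def decode_view(diff_data, comp, predicted_diff):
    """Decoder side: restore the parallax difference image, then the view."""
    parallax_diff = _combine(diff_data, predicted_diff, +1)  # step (e)
    return _combine(comp, parallax_diff, +1)                 # step (f)
```

For the decoder to reproduce the encoder's result, both sides must form the same parallax compensation image and the same prediction, which is why the configurations above base them on decoded rather than original data.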

On top of this basic configuration, the video decoding method of the present invention may further adopt the following configurations.

[2-1]
When the camera positional relationship information is to be obtained from the encoded data, the method has, in addition to the steps of the basic configuration above, (g) a camera positional relationship information decoding step of decoding the encoded data of the information on the positional relationship of the cameras that captured the images.

[2-2]
In the parallax compensation image estimation step, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship. For a pixel whose value cannot be estimated in this way, the pixel value may be estimated from the values of the surrounding pixels.

[2-3]
In the parallax compensation image estimation step, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship. For a pixel whose value cannot be estimated in this way, the motion information of that pixel may be estimated from the motion information of the surrounding pixels, and the pixel value may then be estimated from the estimated motion information and the pixel values of already decoded images.

[2-4]
In the parallax compensation image estimation step, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship. For pixels whose values cannot be estimated in this way, either the estimation method of [2-2] or that of [2-3] may be selected according to the prediction mode information embedded in the encoded data, so that pixel values are estimated while switching between the two methods for each parallax compensation image.

The video decoding method of the present invention configured in this way can also be implemented as a computer program. The program is provided on a suitable computer-readable recording medium or over a network, is installed when the invention is put into practice, and realizes the invention by running on a control means such as a CPU.

[3] Processing of the present invention
When the decoding side can obtain the camera positional relationship information from a source other than the encoded data, the video encoding method of the present invention encodes three kinds of images: the reference viewpoint image captured by the camera serving as the reference viewpoint; the distance image indicating the estimated distance from the camera that captured the reference viewpoint image to the subject; and the parallax difference image indicating the difference between the encoding target image and the parallax compensation image estimated from the reference viewpoint image, the distance image, and the camera positional relationship. When the decoding side is to obtain the camera positional relationship information from the encoded data, that information is encoded in addition to these three kinds of images.

Given this encoded data, the video decoding method of the present invention obtains the reference viewpoint image, the distance image, and the parallax difference image by decoding; estimates the parallax compensation image at a viewpoint other than the reference viewpoint based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship; and restores the encoding target image captured by the camera associated with that viewpoint based on the estimated parallax compensation image and the decoded parallax difference image.

In this way, the present invention uses the multi-view video being encoded to determine the distance from one of its viewpoints (the reference viewpoint) to the subject, and thereby creates a distance image. Next, the video at each other viewpoint is predicted using the video at the reference viewpoint and the video of depth information (the distance video). The difference between this predicted video and the video to be encoded then gives a difference video, and the reference viewpoint video, the depth information video at the reference viewpoint (the distance video), and the difference videos at the viewpoints other than the reference viewpoint are each encoded as two-dimensional video.

In other words, because the present invention references only the reference viewpoint, it can extract the required video while keeping information from cameras other than the required ones to a minimum. In addition, because the depth information video (distance video) is created from the information of all viewpoints, using it for prediction exploits spatial correlation.

Furthermore, because the prediction residual is information that cannot be predicted from the reference viewpoint, what remains is information that is not contained in the reference viewpoint image and is contained only in that viewpoint. Applying motion prediction to this residual exploits temporal correlation and improves encoding efficiency.

更に、奥行き情報の動画像（距離動画像）は基準視点と他の全ての視点との間の視差情報を１つの表現で表すことができているだけでなく、それを２次元動画像として符号化することで視差の時間方向の冗長性を取り除くことができることになる。 Furthermore, the moving image of depth information (distance moving image) not only expresses the parallax information between the reference viewpoint and all other viewpoints in a single representation; encoding it as a two-dimensional moving image also removes the temporal redundancy of the parallax.

また、多視点動画像から自由視点動画像を合成する場合、被写体の幾何情報が存在することによって、ある品質の映像を作り出すのに必要な画像の個数が少なくて済んだり、合成の処理が簡略化されたりすることになるのだが、本発明で求める奥行きの動画像（距離動画像）は一種の幾何情報を表しているため、合成画像の品質を向上させたり、幾何情報の推定処理を簡略化できることになる。 Also, when a free-viewpoint moving image is synthesized from a multi-view moving image, the availability of geometric information about the subject reduces the number of images needed to produce video of a given quality and simplifies the synthesis process. Because the depth moving image (distance moving image) obtained in the present invention represents a kind of geometric information, it can improve the quality of the synthesized image and simplify the estimation of geometric information.

このように、本発明によれば、受け手が必要とする任意のカメラの映像を、必要ないカメラの情報を送信することをできるだけ抑えて取り出すという機能を実現できるようになる。そして、ある視点からの被写体の推定距離情報を用いて符号化されているため、被写体の幾何情報が提供されることとなり、自由視点動画像を合成する前処理が省略できるという利点を持ち、なお、かつ空間方向の相関と時間方向の相関とを同時に利用することによって多視点動画像を効率的に符号化できるようになる。 As described above, according to the present invention, it is possible to realize a function of extracting an image of an arbitrary camera required by the receiver while suppressing transmission of unnecessary camera information as much as possible. And since it is encoded using the estimated distance information of the subject from a certain viewpoint, the geometric information of the subject is provided, and there is an advantage that the preprocessing for synthesizing the free viewpoint moving image can be omitted. In addition, by simultaneously using the correlation in the spatial direction and the correlation in the temporal direction, the multi-view video can be efficiently encoded.

次に、本発明の映像符号化方法が持つ上述の〔１−１〕〜〔１−１２〕に記載した各処理機能の持つ意味について説明する。 Next, the meaning of each processing function described in the above [1-1] to [1-12] of the video encoding method of the present invention will be described.

（Ａ）〔１−１〕に記載した処理機能の持つ意味 (A) Meaning of the processing function described in [1-1]

復号側が入手できる距離画像と基準視点画像とは符号化されたものを復号した画像であるため、符号化側で、符号化・復号を行わないオリジナルの距離画像と基準視点画像とを用いて推定した視差補償画像を用いると、復号側で推定される視差補償画像との間に誤差が存在することになり、基準視点以外の画像が、視差差分画像の誤差と基準視点画像の誤差と距離画像の誤差とが重なったものになってしまう。 The distance image and the reference viewpoint image available to the decoding side are images decoded from encoded data. If the encoding side therefore used a parallax compensation image estimated from the original distance image and reference viewpoint image, which have not undergone encoding and decoding, it would differ from the parallax compensation image estimated on the decoding side, and the images at viewpoints other than the reference viewpoint would accumulate the error of the parallax difference image, the error of the reference viewpoint image, and the error of the distance image.

そこで、本発明では、〔１−１〕に記載するように、復号側がカメラの位置関係の情報を符号化データからではなくて得ることができる場合には、符号化側は、基準視点画像の符号化データを復号することで得られる基準視点画像と、距離画像の符号化データを復号することで得られる距離画像と、符号化されることのないカメラの位置関係とに基づいて、基準視点以外の視点における視差補償画像を推定するように処理するのである。 Therefore, in the present invention, as described in [1-1], when the decoding side can obtain the camera positional relationship information from a source other than the encoded data, the encoding side estimates the parallax compensation image at each viewpoint other than the reference viewpoint based on the reference viewpoint image obtained by decoding the encoded data of the reference viewpoint image, the distance image obtained by decoding the encoded data of the distance image, and the camera positional relationship, which is not encoded.

この処理機能に従って、符号化側で符号化歪みの入った距離画像と基準視点画像とを用いることになるため、基準視点以外の画像における符号化歪みは視差差分画像における符号化歪みの影響だけにすることができるようになる。すなわち、視差差分画像の符号化において、復号側における距離画像と基準視点画像の符号化歪みを考慮した符号化を行うことができることで、最高で歪みなしの符号化を達成することができるようになる。 With this processing function, the encoding side uses the distance image and the reference viewpoint image that contain coding distortion, so the coding distortion in images other than the reference viewpoint is limited to the effect of the coding distortion in the parallax difference image. That is, the parallax difference image can be encoded in a way that takes into account the coding distortion of the distance image and the reference viewpoint image on the decoding side, so that, at best, distortion-free encoding can be achieved.
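
The effect of predicting from decoded data can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `quantize` is a toy uniform quantizer standing in for a real codec, the "images" are short lists of pixel values, and the parallax compensation step is replaced by an identity mapping.

```python
def quantize(values, step):
    """Toy lossy codec (assumption): uniform quantization with the given step."""
    return [step * round(v / step) for v in values]

def encode_closed_loop(reference, target, ref_step, res_step):
    # The encoder decodes its own reference, exactly as the decoder will.
    decoded_ref = quantize(reference, ref_step)
    prediction = decoded_ref  # stand-in for parallax compensation
    residual = [t - p for t, p in zip(target, prediction)]
    return decoded_ref, quantize(residual, res_step)

def encode_open_loop(reference, target, ref_step, res_step):
    # The residual is computed against the ORIGINAL reference instead.
    residual = [t - p for t, p in zip(target, reference)]
    return quantize(reference, ref_step), quantize(residual, res_step)

def decode(decoded_ref, decoded_residual):
    return [p + r for p, r in zip(decoded_ref, decoded_residual)]

reference = [10, 23, 37, 52]
target = [12, 25, 35, 50]

closed = decode(*encode_closed_loop(reference, target, ref_step=8, res_step=1))
opened = decode(*encode_open_loop(reference, target, ref_step=8, res_step=1))
```

In this toy setting the residual is coded losslessly (`res_step=1`), so the closed-loop reconstruction equals the target even though the reference is coarsely quantized, while the open-loop reconstruction inherits the reference distortion.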

（Ｂ）〔１−２〕に記載した処理機能の持つ意味 (B) Meaning of the processing function described in [1-2]

カメラの位置関係が明示的に与えられなかった場合、多視点動画像から距離画像を求め、視差補償画像を求める処理の精度は著しく低下する。そのため視差差分画像により多くの残差が存在することにより符号化効率が悪くなる。 When the positional relationship of the camera is not explicitly given, the accuracy of the process of obtaining the distance image from the multi-viewpoint moving image and obtaining the parallax compensation image is significantly reduced. For this reason, there are many residuals in the parallax difference image, resulting in poor coding efficiency.

そこで、本発明では、〔１−２〕に記載するように、復号側がカメラの位置関係の情報を符号化データから得ることになる場合にあって、カメラの位置関係が外部から与えられる場合には、それを取得して符号化し、一方、カメラの位置関係が外部から与えられない場合には、全カメラの画像に基づいてカメラの位置関係を推定して、それを符号化するように処理するのである。 Therefore, in the present invention, as described in [1-2], when the decoding side is to obtain the camera positional relationship information from the encoded data, the encoding side acquires and encodes the camera positional relationship if it is given from the outside; if it is not given from the outside, the encoding side estimates the camera positional relationship from the images of all cameras and encodes the estimate.

この処理機能に従って、カメラの位置関係を符号化する必要があるときに、カメラの位置関係が外部から与えられない場合には、与えられた多視点動画像からカメラの位置関係を推定するため、より正確な距離画像と視差補償画像とを求めることができることで、視差差分画像に残る信号を小さくすることができるため、カメラの位置関係が明示的に与えられなくても符号化効率が悪くなることを防ぐことができる。すなわち、カメラの位置関係が明示的に与えられなくても、さまざまなカメラ配置の多視点動画像に対して、より柔軟に対応して符号化が行うことができるようになる。 With this processing function, when the camera positional relationship must be encoded but is not given from the outside, it is estimated from the given multi-view moving image. A more accurate distance image and parallax compensation image can therefore be obtained, which reduces the signal remaining in the parallax difference image and prevents coding efficiency from deteriorating even when the camera positional relationship is not explicitly given. In other words, even without an explicitly given camera positional relationship, multi-view moving images with various camera arrangements can be encoded more flexibly.

（Ｃ）〔１−３〕に記載した処理機能の持つ意味 (C) Meaning of the processing function described in [1-3]

〔１−３〕に記載した処理機能の持つ意味は、〔１−１〕に記載した処理機能の持つ意味と基本的に同じである。ただし、〔１−３〕に記載した処理機能では、カメラの位置関係の情報も符号化することを想定しているので、視差補償画像を推定する際に必要となるカメラの位置関係についても、カメラ位置関係情報の符号化データを復号することで得られるものを使用するように処理している。 The meaning of the processing function described in [1-3] is basically the same as the meaning of the processing function described in [1-1]. However, since the processing function described in [1-3] assumes that information on the positional relationship of the camera is also encoded, the positional relationship of the camera necessary for estimating the parallax compensation image is also Processing is performed such that data obtained by decoding the encoded data of the camera positional relationship information is used.

（Ｄ）〔１−４〕に記載した処理機能の持つ意味 (D) Meaning of the processing function described in [1-4]

各カメラはほぼ同じ被写体と背景を撮影しているといっても、全く同じ視点から撮影を行なっているわけではないので、各カメラが撮影している空間は異なる。そのため、どのカメラ対で視差予測をするかによって、精度の高い視差予測が可能な領域の大きさが異なる。精度の高い視差予測が可能な領域が小さくなるカメラを基準視点としてしまうと符号化効率が悪くなる。 Although the cameras capture almost the same subject and background, they do not shoot from exactly the same viewpoint, so the space captured by each camera differs. The size of the region in which highly accurate parallax prediction is possible therefore depends on which camera pair is used for parallax prediction. If a camera for which this region is small is chosen as the reference viewpoint, coding efficiency deteriorates.

そこで、本発明では、〔１−４〕に記載するように、他のカメラが撮影する空間と最も重複する空間を撮影しているカメラを基準視点となるカメラとして設定するように処理するのである。 Therefore, in the present invention, as described in [1-4], processing is performed so that the camera that captures the space most overlapping with the space captured by the other camera is set as the reference viewpoint camera. .

この処理機能に従って、各カメラから撮影される映像の共通部分がより多く撮影されているカメラを基準視点に選ぶことになるので、全カメラにおける精度の高い視差予測が可能な領域の合計を最大にすることができるため、符号化効率が悪くなることを防ぐことができる。すなわち、多くの領域に関して精度の高い視差補償画像を生成できるため、より効率的な符号化を行うことができるようになる。 According to this processing function, the camera that picks up more common parts of the images taken from each camera is selected as the reference viewpoint, so the total of the areas that can be accurately predicted for all cameras is maximized. Therefore, it is possible to prevent the encoding efficiency from deteriorating. That is, since a highly accurate parallax compensation image can be generated for many regions, more efficient encoding can be performed.
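
As a rough illustration of this selection rule, the following Python sketch picks the camera whose captured space overlaps most with that of the other cameras. The representation of each camera's captured space as a set of scene cell indices, and the function name `select_reference_camera`, are hypothetical; the patent text does not specify how the overlap is measured.

```python
def select_reference_camera(coverage):
    """Return the camera whose coverage overlaps most with all other cameras.

    coverage: dict mapping a camera name to the set of scene cells it sees
    (a toy model of the captured space, assumed for illustration).
    """
    def total_overlap(cam):
        return sum(len(coverage[cam] & coverage[other])
                   for other in coverage if other != cam)
    return max(coverage, key=total_overlap)

# Three cameras in a row; the middle one shares cells with both neighbours.
coverage = {
    "C1": set(range(0, 6)),
    "C2": set(range(3, 9)),
    "C3": set(range(6, 12)),
}
reference = select_reference_camera(coverage)
```

Under this toy model the middle camera C2 is chosen, since it shares three cells with each neighbour while C1 and C3 share none with each other.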

（Ｅ）〔１−５〕に記載した処理機能の持つ意味 (E) Meaning of the processing function described in [1-5]

本発明においては、もともとの符号化対象である多視点動画像のほかに距離画像を符号化する必要が生じている。そのため、距離画像を符号化するのに必要な符号量はできる限り少なくする必要がある。距離画像は視差情報を提供して視差補償画像を作成するために存在することから、そのために必要な情報が含まれていればよいということになる。つまり、距離画像は画素単位の視差情報を提供できる程度のものでかまわないということを示している。そのような精度の距離推定では隣接する画素での推定距離が同じになる場合が多い。 In the present invention, a distance image must be encoded in addition to the multi-view moving image that is the original encoding target. The code amount required to encode the distance image therefore needs to be kept as small as possible. Because the distance image exists to provide parallax information for creating the parallax compensation image, it only needs to contain the information required for that purpose. In other words, the distance image only has to be accurate enough to provide parallax information in units of pixels. With distance estimation at that level of accuracy, adjacent pixels often have the same estimated distance.

そこで、本発明では、〔１−５〕に記載するように、画像をブロックに分割して、ブロックごとに距離を推定することで距離画像を生成するように処理するのである。 Therefore, in the present invention, as described in [1-5], the image is divided into blocks, and the distance image is generated by estimating the distance for each block.

この処理機能に従って、精度をある程度保ったまま、距離画像の符号化において、より効率的な符号化を行うことができるようなる。 According to this processing function, more efficient encoding can be performed in encoding of a distance image while maintaining a certain degree of accuracy.
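
A minimal Python sketch of per-block distance estimation follows. It assumes a hypothetical per-pixel matching cost `pixel_cost(x, y, d)` (for example, a photo-consistency error between views for candidate distance `d`) and a finite list of candidate distances; neither is taken from the patent text.

```python
def estimate_block_depths(width, height, block, candidates, pixel_cost):
    """Choose one distance per block by minimizing the summed pixel cost."""
    depth_blocks = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            def block_cost(d):
                # Sum the (assumed) matching cost over all pixels of the block.
                return sum(pixel_cost(x, y, d)
                           for y in range(by, min(by + block, height))
                           for x in range(bx, min(bx + block, width)))
            row.append(min(candidates, key=block_cost))
        depth_blocks.append(row)
    return depth_blocks

# Toy scene: the left half lies at distance 2, the right half at distance 5.
def true_depth(x, y):
    return 2 if x < 2 else 5

block_depths = estimate_block_depths(
    width=4, height=4, block=2, candidates=[1, 2, 3, 4, 5],
    pixel_cost=lambda x, y, d: abs(true_depth(x, y) - d))
```

The 4x4 image is represented by four distance values instead of sixteen, which is the kind of reduction in distance image data that block-wise estimation aims at.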

（Ｆ）〔１−６〕に記載した処理機能の持つ意味 (F) Meaning of the processing function described in [1-6]

視差予測された画像と符号化対象の画像との誤差が単純に小さくなるように距離の予測を行った場合、距離の予測には時間方向の相関が考慮に入れられていないことから、視差予測された視差補償画像における時間方向の相関が失われてしまう。視差予測された視差補償画像において時間方向の相関がない場合、それと符号化対象の画像との差分で求められる視差差分画像においても時間方向の相関がなくなってしまうことになる。そのような場合、視差差分画像の符号化効率は非常に悪くなる。 When the distance prediction is performed so that the error between the parallax-predicted image and the encoding target image is simply reduced, the correlation in the time direction is not taken into account in the distance prediction. Correlation in the time direction in the parallax compensated image is lost. If there is no temporal correlation in the parallax-predicted parallax-compensated image, the temporal correlation will be lost in the parallax difference image obtained from the difference between it and the encoding target image. In such a case, the encoding efficiency of the parallax difference image becomes very poor.

そこで、本発明では、〔１−６〕に記載するように、規定のアルゴリズムに従って距離画像を生成する場合に、その距離画像に基づいて算出されることになる視差差分画像の符号化効率が向上するようにと、その生成した距離画像をそのまま用いるのか時間的に変化しないものに変更して用いるのかを決定するように処理するのである。ここで、この決定については、距離画像がブロックを単位にして符号化されるような場合には、そのブロックを単位にして行うことになる。 Therefore, in the present invention, as described in [1-6], when a distance image is generated according to a prescribed algorithm, it is decided whether to use the generated distance image as is or to replace it with one that does not change over time, so that the coding efficiency of the parallax difference image calculated from the distance image improves. When the distance image is encoded in units of blocks, this decision is made for each block.

この処理機能に従って、視差差分画像における時間方向の相関を考慮に入れて距離を推定するため、視差差分画像において時間方向の相関が失われることを防ぐことが可能となり、視差差分画像の符号化において、より効率的な符号化を行うことができるようになる。 According to this processing function, the distance is estimated in consideration of the correlation in the time direction in the parallax difference image, so that it is possible to prevent the time direction correlation from being lost in the parallax difference image. Thus, more efficient encoding can be performed.
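
The per-block decision can be sketched in Python as below. The cost function `coding_cost(block_index, depth)` is a hypothetical stand-in for whatever measure of parallax difference image coding efficiency an encoder would use, and breaking ties in favour of the unchanged distance is likewise an assumption.

```python
def choose_block_depths(new_depths, prev_depths, coding_cost):
    """For each block, keep the previous distance unless the new one is cheaper."""
    chosen = []
    for index, (new_d, prev_d) in enumerate(zip(new_depths, prev_depths)):
        if coding_cost(index, prev_d) <= coding_cost(index, new_d):
            chosen.append(prev_d)  # temporally unchanged distance
        else:
            chosen.append(new_d)   # freshly estimated distance
    return chosen

prev_depths = [10, 18, 30]
new_depths = [10, 20, 31]
ideal = [10, 20, 31]  # distances that minimize the disparity error (toy data)

def coding_cost(index, depth):
    # Toy cost: disparity error plus a penalty of 1 when the distance changes,
    # modelling the loss of temporal correlation in the residual.
    change_penalty = 0 if depth == prev_depths[index] else 1
    return abs(depth - ideal[index]) + change_penalty

chosen = choose_block_depths(new_depths, prev_depths, coding_cost)
```

In the toy data only the second block changes its distance, because only there does the gain in prediction accuracy outweigh the penalty for breaking temporal continuity.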

（Ｇ）〔１−７〕に記載した処理機能の持つ意味 (G) Meaning of the processing function described in [1-7]

基準視点画像と他の視点における画像とは、視差補償をする場合、必ずしも１対１に対応しない。そのため視差補償画像において予測値の存在しない画素が現れることになる。そうなると視差差分画像は画素間で大きく値の異なる画像となる。そのような画像は自然な画像と異なる性質を持つようになることから、一般的な符号化手法を用いたときに符号化効率が低下する。 When the parallax compensation is performed, the reference viewpoint image and the image at another viewpoint do not necessarily correspond one-to-one. Therefore, pixels having no predicted value appear in the parallax compensation image. In this case, the parallax difference image becomes an image having a greatly different value between pixels. Since such an image has a different property from a natural image, the encoding efficiency decreases when a general encoding method is used.
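
Why some pixels end up without a predicted value can be seen in a small Python sketch of forward warping along one scanline. It assumes rectified, parallel cameras, so that the disparity is inversely proportional to distance; the parameter `focal_times_baseline`, the sign convention of the shift, and the nearest-surface rule for collisions are illustrative assumptions rather than details from the patent.

```python
def warp_scanline(ref_pixels, depths, focal_times_baseline):
    """Forward-warp one scanline of the reference view into another view.

    Positions that receive no pixel are left as None (no predicted value).
    """
    width = len(ref_pixels)
    warped = [None] * width
    nearest = [float("inf")] * width
    for x, (value, z) in enumerate(zip(ref_pixels, depths)):
        target_x = x - round(focal_times_baseline / z)  # disparity shift
        # When two pixels land on the same position, keep the nearer surface.
        if 0 <= target_x < width and z < nearest[target_x]:
            warped[target_x] = value
            nearest[target_x] = z
    return warped

ref_pixels = [50, 60, 70, 80, 90, 100]
depths = [4, 4, 4, 1, 1, 4]  # two near pixels in front of a far background
warped = warp_scanline(ref_pixels, depths, focal_times_baseline=4)
```

Here two reference pixels compete for position 0 and three positions receive nothing, showing that the mapping between the views is not one-to-one.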

そこで、本発明では、〔１−７〕に記載するように、視差補償画像を推定するときに画素値を推定できない画素について、周辺の画素の画素値から、その画素の画素値を推定するように処理するのである。 Therefore, in the present invention, as described in [1-7], for a pixel whose pixel value cannot be estimated when the parallax compensation image is estimated, the pixel value is estimated from the pixel values of the surrounding pixels.

この処理機能に従って、視差補償値のない画素に対して、隣接画素から予測値を作り出すことで、画素間で値が大きく異ならない視差差分画像を生成することが可能となる。隣接画素の値に相関があることはよく知られており、２次元動画像の符号化でも用いられているため、この予測が大きく外れることは少ない。したがって、視差差分画像における符号化対象情報を減らすことで符号化効率を上げることができるようになる。 According to this processing function, by generating a predicted value from an adjacent pixel for a pixel having no parallax compensation value, it is possible to generate a parallax difference image whose value does not differ greatly between pixels. It is well known that there is a correlation between adjacent pixel values, and since this is also used in the encoding of a two-dimensional moving image, this prediction is unlikely to deviate greatly. Therefore, the encoding efficiency can be increased by reducing the encoding target information in the parallax difference image.
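
One simple way to derive such predicted values is sketched below in Python for a single scanline. Averaging the nearest valid pixels on the left and right is an assumed rule chosen for brevity; the patent text only states that surrounding pixel values are used.

```python
def fill_holes_spatially(row):
    """Fill None entries with the mean of the nearest valid left/right pixels."""
    valid = [i for i, v in enumerate(row) if v is not None]
    if not valid:
        return list(row)  # nothing to predict from
    filled = []
    for x, value in enumerate(row):
        if value is not None:
            filled.append(value)
            continue
        left = max((i for i in valid if i < x), default=None)
        right = min((i for i in valid if i > x), default=None)
        neighbours = [row[i] for i in (left, right) if i is not None]
        filled.append(sum(neighbours) / len(neighbours))
    return filled

# A parallax compensation scanline with three pixels lacking a predicted value.
compensated = [90, 70, None, None, 100, None]
filled = fill_holes_spatially(compensated)
```

After filling, neighbouring values differ by at most a modest step, so the difference against the input image no longer contains isolated large jumps at the hole positions.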

（Ｈ）〔１−８〕に記載した処理機能の持つ意味 (H) Meaning of the processing function described in [1-8]

〔１−７〕に記載した処理機能では、視差補償画像を推定するときに画素値を推定できない画素について、空間的な予測を行うことでその画素値を推定するようにしている。 In the processing function described in [1-7], for a pixel whose pixel value cannot be estimated when the parallax compensation image is estimated, the pixel value is estimated by spatial prediction.

これに対して、〔１−８〕に記載した処理機能では、視差補償画像を推定するときに画素値を推定できない画素について、周辺の画素の動き情報からその画素の動き情報を推定して、その推定した動き情報と符号化済みの画像の画素値とに基づいて、その画素の画素値を推定するように処理している。 On the other hand, in the processing function described in [1-8], for a pixel whose pixel value cannot be estimated when estimating a parallax compensation image, the motion information of the pixel is estimated from the motion information of the surrounding pixels. Based on the estimated motion information and the pixel value of the encoded image, processing is performed to estimate the pixel value of the pixel.

この処理機能では、一般的な動画像の符号化で用いられる隣接画素における動きベクトルの相関を利用していることから、この予測が大きく外れることは少ない。したがって、〔１−７〕に記載した処理機能と同様に、視差差分画像における符号化対象情報を減らすことで符号化効率を上げることができるようになる。 In this processing function, since the correlation of motion vectors in adjacent pixels used in general video encoding is used, this prediction is unlikely to deviate greatly. Therefore, similarly to the processing function described in [1-7], the encoding efficiency can be increased by reducing the encoding target information in the parallax difference image.

（Ｉ）〔１−９〕に記載した処理機能の持つ意味 (I) Meanings of the processing functions described in [1-9]

〔１−７〕に記載した処理機能では、視差補償画像を推定するときに画素値を推定できない画素について、空間的な予測を行うことでその画素値を推定するようにし、一方、〔１−８〕に記載した処理機能では、視差補償画像を推定するときに画素値を推定できない画素について、時間的な予測を行うことでその画素値を推定するようにしている。 In the processing function described in [1-7], for a pixel whose pixel value cannot be estimated when the parallax compensation image is estimated, the pixel value is estimated by spatial prediction, whereas in the processing function described in [1-8], the pixel value of such a pixel is estimated by temporal prediction.

このどちらの予測方法を用いるのかについては、符号化効率の観点から選択することが望ましい。 It is desirable to select which prediction method to use from the viewpoint of coding efficiency.

そこで、本発明では、〔１−９〕に記載するように、〔１−７〕の推定方法に従って推定した視差補償画像を用いる場合の符号量と、〔１−８〕の推定方法に従って推定した視差補償画像を用いる場合の符号量とを比較して、視差補償画像ごとに効率的な符号化を行える方法を選択することで、その画素の画素値を推定するように処理するのである。この構成を採るときには、どちらの予測モードを用いたのかを示す情報についても符号化することになる。 Therefore, in the present invention, as described in [1-9], the code amount in the case of using the parallax compensation image estimated according to the estimation method of [1-7] and the estimation based on the estimation method of [1-8] By comparing the amount of code when using a parallax compensated image and selecting a method that allows efficient coding for each parallax compensated image, processing is performed to estimate the pixel value of that pixel. When this configuration is adopted, information indicating which prediction mode is used is also encoded.

この処理機能に従って、視差補償値のない画素に対して予測値を作り出すことで視差補償画像を推定し、これにより、視差差分画像における符号化対象情報を減らすことで符号化効率を上げることができるようにすることを実現するときにあって、その符号化効率の向上をさらに確実なものにすることができるようになる。 This processing function estimates the parallax compensation image by creating predicted values for pixels that have no parallax compensation value, which reduces the encoding target information in the parallax difference image and raises coding efficiency; selecting the better prediction method makes this improvement in coding efficiency more reliable.
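
The selection between the two prediction methods can be sketched in Python as follows. The patent text compares code amounts; this sketch substitutes the sum of absolute differences (SAD) of the resulting residual as a crude proxy, which is an assumption, and the mode names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_fill_mode(target, spatial_candidate, temporal_candidate):
    """Pick the candidate parallax compensation image with the smaller residual.

    The returned mode label corresponds to the side information that would
    have to be encoded so that the decoder can repeat the same choice.
    """
    costs = {
        "spatial": sad(target, spatial_candidate),
        "temporal": sad(target, temporal_candidate),
    }
    mode = min(costs, key=costs.get)
    return mode, costs

target = [90, 70, 80, 95, 100, 110]
spatial_candidate = [90, 70, 85, 85, 100, 100]   # holes filled from neighbours
temporal_candidate = [90, 70, 79, 96, 100, 108]  # holes filled via motion
mode, costs = choose_fill_mode(target, spatial_candidate, temporal_candidate)
```

A real encoder would measure the actual code amount after entropy coding rather than SAD, but the structure of the decision stays the same.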

（Ｊ）〔１−１０〕に記載した処理機能の持つ意味 (J) Meaning of the processing function described in [1-10]

基準視点画像と距離画像とはどちらも同じ視点からの画像であるため、動きに関しては非常に強い相関があると言える。したがって、両者を独立に符号化してしまうと、かなりの冗長性が残ることになる。 Since both the reference viewpoint image and the distance image are images from the same viewpoint, it can be said that there is a very strong correlation with respect to motion. Therefore, if both are encoded independently, considerable redundancy remains.

そこで、本発明では、〔１−１０〕に記載するように、基準視点画像を符号化する際に使われた動きベクトルを用いて距離画像を符号化するように処理するのである。 Therefore, in the present invention, as described in [1-10], processing is performed so that the distance image is encoded using the motion vector used when the reference viewpoint image is encoded.

この処理機能に従って、同じ動きを表すベクトルを重複して符号化することをなくして、符号化効率を向上させることができるようになる。しかも、基準視点以外の視点の画像を復号する際には、基準視点も距離画像も必ず必要であるため、両者に参照関係があっても、本発明により実現される必要なカメラ以外の情報をできるだけ抑えて取り出すという機能を損なうことにはならない。したがって、基準視点画像と距離画像との相関を用いて、基準視点画像と距離画像とをあわせた符号化効率を上げることができるようになる。 With this processing function, vectors representing the same motion are no longer encoded twice, which improves coding efficiency. Moreover, because both the reference viewpoint image and the distance image are always required when decoding an image at a viewpoint other than the reference viewpoint, a reference relationship between the two does not impair the function, realized by the present invention, of extracting the required video while including as little information from other cameras as possible. The correlation between the reference viewpoint image and the distance image can therefore be used to raise the combined coding efficiency of the two.

（Ｋ）〔１−１１〕に記載した処理機能の持つ意味 (K) Meaning of the processing function described in [1-11]

同じ被写体を撮影しているということは、被写体の実際の３次元的動きは１つである。そのため、カメラの位置関係と距離画像と基準視点における動きベクトルとから、他視点における動きをある程度予測することができる。しかし、基準視点における動きベクトルは２次元であり、距離画像も完全な３次元ではないので、予測ベクトルは常に正しいわけではない。また、符号化効率という面で言えば、実際の動きを表した動きベクトルを用いることが常に最も高い符号化効率を達成するとは限らない。 The fact that the same subject is being photographed means that the actual three-dimensional movement of the subject is one. Therefore, it is possible to predict the motion at another viewpoint to some extent from the positional relationship of the camera, the distance image, and the motion vector at the reference viewpoint. However, since the motion vector at the reference viewpoint is two-dimensional and the distance image is not perfect three-dimensional, the prediction vector is not always correct. In terms of coding efficiency, the use of motion vectors representing actual motion does not always achieve the highest coding efficiency.

そこで、本発明では、〔１−１１〕に記載するように、視差差分画像を符号化するときに、基準視点画像を符号化する際に使われた動きベクトルと距離画像とカメラの位置関係とに基づいて推定される動きベクトルか、自身の参照画像から推定される動きベクトルの内の符号化効率のよい方を選択して視差差分予測値を生成するように処理するのである。 Therefore, in the present invention, as described in [1-11], when the parallax difference image is encoded, the parallax difference prediction value is generated by selecting whichever gives the better coding efficiency: the motion vector estimated from the motion vector used to encode the reference viewpoint image, the distance image, and the camera positional relationship, or the motion vector estimated from the viewpoint's own reference image.

この処理機能に従って、各視点に閉じて求められる動きベクトルと視差補償で求められる動きベクトルのうち符号化効率のよい方を選ぶことで、視差補償で求められる動きベクトルの間違いを許容して高い符号化効率を達成することが可能となる。すなわち、同じ被写体を撮影していることからくる、各視点における動画像に含まれる動き成分の相関を利用することで、視差差分画像の符号化効率を上げることができるようになる。 With this processing function, choosing whichever of the motion vector obtained within each viewpoint and the motion vector obtained through parallax compensation gives the better coding efficiency makes it possible to tolerate errors in the latter and still achieve high coding efficiency. That is, the coding efficiency of the parallax difference image can be raised by exploiting the correlation between the motion components contained in the moving images at each viewpoint, which arises because the same subject is being captured.
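
A one-dimensional Python sketch of this choice is given below. The inherited vector stands for the one derived from the reference viewpoint, the distance image, and the camera positional relationship; how it is derived is not shown. The SAD cost, the clamped border handling, and the `side_info_penalty` charged to a searched vector (modelling the bits needed to transmit it) are all assumptions for illustration.

```python
def motion_compensate(prev_row, vector):
    """Predict a row by reading the previous row at offset `vector` (clamped)."""
    width = len(prev_row)
    return [prev_row[min(max(x + vector, 0), width - 1)] for x in range(width)]

def choose_motion_vector(current, prev_row, inherited_vector,
                         search_range, side_info_penalty):
    def sad(vector):
        predicted = motion_compensate(prev_row, vector)
        return sum(abs(c - p) for c, p in zip(current, predicted))

    # Vector found within this viewpoint by exhaustive search.
    searched_vector = min(range(-search_range, search_range + 1), key=sad)
    if sad(inherited_vector) <= sad(searched_vector) + side_info_penalty:
        return "inherited", inherited_vector
    return "searched", searched_vector

prev_row = [0, 0, 5, 9, 5, 0, 0, 0]
current = [0, 0, 0, 5, 9, 5, 0, 0]  # content moved one pixel to the right

good = choose_motion_vector(current, prev_row, inherited_vector=-1,
                            search_range=2, side_info_penalty=2)
bad = choose_motion_vector(current, prev_row, inherited_vector=2,
                           search_range=2, side_info_penalty=2)
```

When the inherited vector is accurate it is used at no extra cost; when it is wrong, the locally searched vector takes over, which is how errors in the derived vector are tolerated.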

（Ｌ）〔１−１２〕に記載した処理機能の持つ意味 (L) Meaning of the processing function described in [1-12]

〔１−１１〕に記載した処理機能では、基準視点画像を符号化する際に使われた動きベクトルと距離画像とカメラの位置関係とに基づいて動きベクトルを推定することになるが、〔１−１２〕に記載した処理機能では、この動きベクトルの推定にあたって、復号側の処理に合わせて、距離画像の符号化データを復号することで得られる距離画像を用いて動きベクトルを推定するように処理する。 In the processing function described in [1-11], the motion vector is estimated based on the motion vector used to encode the reference viewpoint image, the distance image, and the camera positional relationship. In the processing function described in [1-12], this motion vector is estimated using the distance image obtained by decoding the encoded data of the distance image, so as to match the processing on the decoding side.

この処理機能に従って、復号側と同じ方法で動きベクトルを推定するようにすることから、距離画像に含まれる符号化歪みの影響を取り除いて動きベクトルの推定が可能になることから、復号において歪みが蓄積するのを防ぐことができるようになる。 With this processing function, the motion vector is estimated in the same way as on the decoding side, so it can be estimated with the influence of the coding distortion contained in the distance image removed, which prevents distortion from accumulating during decoding.

本発明によれば、視差方向予測と時間方向予測とを同時に使うことによって多視点動画像の符号化効率を向上させることができるようになるとともに、この符号化にあたって、共通情報として、基準視点の映像と基準視点における距離画像とを利用することによって、受け手が必要とする任意のカメラの映像を、必要ないカメラの情報を送信することをできるだけ抑えて取り出すという機能を実現できるようになる。 According to the present invention, it is possible to improve the encoding efficiency of a multi-view video by using the parallax direction prediction and the temporal direction prediction at the same time. By using the image and the distance image at the reference viewpoint, it is possible to realize a function of extracting an image of an arbitrary camera required by the receiver while suppressing transmission of unnecessary camera information as much as possible.

そして、ある視点からの被写体の推定距離情報を用いて符号化を行うことから、被写体の幾何情報が提供されることとなり、自由視点動画像を生成するのに必要な幾何情報を推定する処理量を削減できるようになるとともに、必要とされる品質の自由視点動画像を生成するのに必要な映像の数を減少させたりすることができるようになる。 Then, since encoding is performed using estimated distance information of the subject from a certain viewpoint, geometric information of the subject is provided, and the amount of processing for estimating the geometric information necessary to generate the free viewpoint moving image Can be reduced, and the number of videos required to generate a free viewpoint moving image with the required quality can be reduced.

カメラ構成の一例を示す図である。 A diagram showing an example of a camera configuration.
本発明の映像符号化装置の一実施形態例である。 An embodiment of the video encoding device of the present invention.
カメラ情報初期設定部の構成の詳細を示す図である。 A diagram showing details of the configuration of the camera information initial setting unit.
基準視点動画像処理部の構成の詳細を示す図である。 A diagram showing details of the configuration of the reference viewpoint moving image processing unit.
距離画像処理部の構成の詳細を示す図である。 A diagram showing details of the configuration of the distance image processing unit.
非基準視点動画像処理部の構成の詳細を示す図である。 A diagram showing details of the configuration of the non-reference viewpoint moving image processing unit.
本発明の映像符号化装置の実行する処理フローである。 A processing flow executed by the video encoding device of the present invention.
本発明の映像符号化装置の実行する処理フローである。 A processing flow executed by the video encoding device of the present invention.
本発明の映像符号化装置の実行する処理フローである。 A processing flow executed by the video encoding device of the present invention.
本発明の映像符号化装置の実行する処理フローである。 A processing flow executed by the video encoding device of the present invention.
本発明の映像符号化装置の実行する処理フローである。 A processing flow executed by the video encoding device of the present invention.
基準視点画像の投影処理の説明図である。 An explanatory diagram of the projection processing of the reference viewpoint image.
本発明の映像符号化装置が符号化するデータの種類とその符号化の順番を示す図である。 A diagram showing the types of data encoded by the video encoding device of the present invention and the order in which they are encoded.
本発明の映像復号装置の一実施形態例である。 An embodiment of the video decoding device of the present invention.
非基準視点動画像復号部の構成の詳細を示す図である。 A diagram showing details of the configuration of the non-reference viewpoint moving image decoding unit.
本発明の映像復号装置の実行する処理フローである。 A processing flow executed by the video decoding device of the present invention.
本発明の映像復号装置の実行する処理フローである。 A processing flow executed by the video decoding device of the present invention.
本発明の映像復号装置の実行する処理フローである。 A processing flow executed by the video decoding device of the present invention.
従来技術の説明図である。 An explanatory diagram of the prior art.

以下、実施の形態に従って本発明を詳細に説明する。 Hereinafter, the present invention will be described in detail according to embodiments.

ここで、以下に説明する実施形態例では、３つのカメラで撮影された多視点動画像を符号化する場合を想定している。 Here, in the exemplary embodiment described below, it is assumed that a multi-view video captured by three cameras is encoded.

図１に、本実施形態例で使用するカメラ構成の概念図を示す。図中の四角型の図形は各カメラのフレームを表す。 FIG. 1 shows a conceptual diagram of a camera configuration used in this embodiment. The square figure in the figure represents the frame of each camera.

このカメラ構成の場合、本発明では、まず、カメラＣ１，Ｃ２，Ｃ３の位置関係を求め、いずれか１つのカメラを基準視点と定める。次に、その基準視点からの距離画像（被写体及び背景までの距離の大きさを示す画像）を生成する。そして、同一時刻のフレームに関しては、基準視点に設定されたカメラの画像、距離画像、基準視点以外のカメラの画像の順に符号化を行い、この符号化にあたって、距離画像については基準視点のカメラの画像を参照しながら符号化を行うとともに、基準視点以外のカメラの画像については基準視点のカメラの画像と距離画像とを参照しながら符号化を行う。 With this camera configuration, the present invention first obtains the positional relationship of cameras C1, C2, and C3 and designates one of them as the reference viewpoint. Next, it generates a distance image from the reference viewpoint (an image indicating the distance to the subject and the background). For frames at the same time, encoding proceeds in the order of the image of the camera set as the reference viewpoint, the distance image, and the images of the cameras other than the reference viewpoint. The distance image is encoded with reference to the reference viewpoint camera's image, and the images of the other cameras are encoded with reference to both the reference viewpoint camera's image and the distance image.
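
The per-time encoding order described above can be written out as a short Python sketch. The function name, the tuple layout, and the order among the non-reference cameras are assumptions; the text only fixes the order reference image, distance image, then non-reference images.

```python
def encoding_order(cameras, reference, num_frames):
    """List (time, camera, data kind) tuples in the order they are encoded."""
    order = []
    for t in range(num_frames):
        # 1. Reference viewpoint image, coded as ordinary 2-D video.
        order.append((t, reference, "reference image"))
        # 2. Distance image at the reference viewpoint, coded with reference
        #    to the reference viewpoint image.
        order.append((t, reference, "distance image"))
        # 3. Remaining cameras, coded with reference to both of the above.
        for cam in cameras:
            if cam != reference:
                order.append((t, cam, "parallax difference image"))
    return order

order = encoding_order(["C1", "C2", "C3"], reference="C2", num_frames=2)
```

Because every non-reference camera depends only on the reference image and the distance image of the same time, a receiver that wants a single non-reference camera needs just three of these streams.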

説明を簡単にするために、図１の中でフレームの図に記述されてある順番で符号化をしていくこととする。 To simplify the description, encoding proceeds in the order written on the frames in FIG. 1.

図２ないし図６に、本発明の映像符号化装置１の一実施形態例を示す。 2 to 6 show an embodiment of the video encoding apparatus 1 according to the present invention.

図２に示すように、本発明の映像符号化装置１は、カメラＣ１，Ｃ２，Ｃ３のフレームを図１に示す順番で入力する画像情報入力部１１と、全てのカメラのある時刻のフレームを蓄積する画像メモリ１２と、カメラの位置関係（カメラの設置位置及びカメラの向き）を推定し、基準視点となるカメラを選出するカメラ情報初期設定部１３と、基準視点の画像を符号化する基準視点動画像処理部１４と、同一時刻の全てのカメラのフレームに基づいて、基準視点からの距離画像を生成し符号化する距離画像処理部１５と、距離画像と基準視点画像とカメラの位置関係とに基づいて基準視点以外のカメラのフレームを推定し、そのカメラの入力画像との差分を符号化する非基準視点動画像処理部１６とを備える。 As shown in FIG. 2, the video encoding device 1 of the present invention includes: an image information input unit 11 that inputs the frames of cameras C1, C2, and C3 in the order shown in FIG. 1; an image memory 12 that stores the frames of all cameras at a given time; a camera information initial setting unit 13 that estimates the camera positional relationship (camera installation positions and orientations) and selects the camera to serve as the reference viewpoint; a reference viewpoint moving image processing unit 14 that encodes the image of the reference viewpoint; a distance image processing unit 15 that generates and encodes a distance image from the reference viewpoint based on the frames of all cameras at the same time; and a non-reference viewpoint moving image processing unit 16 that estimates the frame of each camera other than the reference viewpoint based on the distance image, the reference viewpoint image, and the camera positional relationship, and encodes the difference from that camera's input image.

図３は、カメラ情報初期設定部１３の構成の詳細を示す図である。 FIG. 3 is a diagram showing details of the configuration of the camera information initial setting unit 13.

この図に示すように、カメラ情報初期設定部１３は、カメラの位置関係を推定するカメラ位置関係推定部１３０と、入力カメラ群の中から基準視点を選択する基準視点設定部１３１と、カメラの位置関係の情報を符号化するカメラ位置関係符号化部１３２とを備える。 As shown in this figure, the camera information initial setting unit 13 includes a camera positional relationship estimation unit 130 that estimates the positional relationship of the cameras, a reference viewpoint setting unit 131 that selects a reference viewpoint from the input camera group, A camera positional relationship encoding unit 132 that encodes positional relationship information.

ここで、カメラ位置関係推定部１３０が推定したカメラの位置関係の情報については、距離画像処理部１５及び非基準視点動画像処理部１６に通知されることになる。 Here, the camera positional relationship information estimated by the camera positional relationship estimation unit 130 is notified to the distance image processing unit 15 and the non-reference viewpoint moving image processing unit 16.

図４は、基準視点動画像処理部１４の構成の詳細を示す図である。 FIG. 4 is a diagram showing details of the configuration of the reference viewpoint moving image processing unit 14.

この図に示すように、基準視点動画像処理部１４は、基準視点の画像を通常の２次元動画像として符号化する基準視点動画像符号化部１４０と、符号化された基準視点動画像を復号する基準視点動画像復号部１４１とを備える。 As shown in this figure, the reference viewpoint moving image processing unit 14 includes a reference viewpoint moving image encoding unit 140 that encodes the reference viewpoint image as an ordinary two-dimensional moving image, and a reference viewpoint moving image decoding unit 141 that decodes the encoded reference viewpoint moving image.

ここで、基準視点動画像符号化部１４０が基準視点の画像を符号化する際に生成したブロック分割タイプや動きベクトルなどの符号化対象情報については、距離画像処理部１５及び非基準視点動画像処理部１６に通知されることになる。また、基準視点動画像復号部１４１が復号した基準視点動画像については、非基準視点動画像処理部１６に通知されることになる。 Here, regarding the encoding target information such as the block division type and the motion vector generated when the reference viewpoint moving image encoding unit 140 encodes the image of the reference viewpoint, the distance image processing unit 15 and the non-reference viewpoint moving image The processing unit 16 is notified. The reference viewpoint moving image decoded by the reference viewpoint moving image decoding unit 141 is notified to the non-reference viewpoint moving image processing unit 16.

FIG. 5 is a diagram showing the detailed configuration of the distance image processing unit 15.

As shown in this figure, the distance image processing unit 15 includes a distance image generation unit 150 that estimates and generates a distance image as seen from the reference viewpoint, based on the multi-viewpoint moving images and the positional relationship of the cameras that captured them; a distance moving image encoding unit 151 that encodes the generated distance images as an ordinary two-dimensional moving image; and a distance moving image decoding unit 152 that decodes the encoded distance moving image.

The distance moving image decoded by the distance moving image decoding unit 152 is passed to the non-reference viewpoint moving image processing unit 16.

FIG. 6 is a diagram showing the detailed configuration of the non-reference viewpoint moving image processing unit 16.

As shown in this figure, the non-reference viewpoint moving image processing unit 16 includes: a parallax compensation image generation unit 160 that generates an image at the viewpoint of the camera concerned (hereinafter, a parallax compensation image) based on the distance moving image that has been encoded and then decoded (hereinafter, the decoded distance moving image) and the reference viewpoint moving image that has been encoded and then decoded (hereinafter, the decoded reference viewpoint moving image) for the same time; a parallax difference image generation unit 161 that generates the difference image (hereinafter, a parallax difference image) between the parallax compensation image and the input image of the same viewpoint and the same time; a parallax difference moving image encoding unit 162 that encodes the generated parallax difference images as an ordinary two-dimensional moving image; a parallax difference moving image decoding unit 163 that decodes the encoded parallax difference moving image; and a non-reference viewpoint image memory 164 that stores the non-reference viewpoint image restored from the encoded and decoded reference viewpoint moving image, distance moving image, and parallax difference moving image.

FIGS. 7 to 11 show the processing flows executed by the video encoding device 1 of the present invention configured in this way.

Next, the processing executed by the video encoding device 1 of the present invention is described in detail with reference to these processing flows.

In the video encoding device 1 of the present invention, as shown in the processing flow of FIG. 7, which outlines the entire encoding process, the frames of cameras C1, C2, and C3 are input one after another to the image information input unit 11 in the order shown in FIG. 1 [A1]. The input images are accumulated in the image memory 12 until the images from all cameras for the same time have been input [A2, A3]. In this embodiment, the image memory 12 therefore has a capacity of three frames.

Next, if no frame has been encoded so far, the camera information initial setting unit 13 examines the positional relationship of the cameras and selects one reference viewpoint from among all the cameras, after which the input frames are encoded [A4, A5, A6]. If at least one frame has already been encoded, the input frames are encoded using the camera positional relationship and reference viewpoint already obtained [A4, A6].

FIG. 8 shows the processing flow of the camera information initial setting unit 13.

As shown in this processing flow, in the camera information initial setting unit 13, the camera positional relationship estimation unit 130 first estimates the relative positional relationship of the cameras from the frames of the three cameras at the same given time [B1].

Any camera parameter estimation method can be used for this estimation, a representative example being the one described in the following reference.

Reference: Olivier Faugeras, Three-Dimensional Computer Vision, MIT Press, 1993, ISBN 0-262-06158-9.

Because the estimated camera positional relationship is also needed exactly on the decoding side, the camera positional relationship encoding unit 132 encodes it losslessly [B2].

The reference viewpoint setting unit 131 then computes, from this camera positional relationship, the space that the cameras can capture in common, and takes the camera whose capturable space overlaps most with that of the other cameras as the reference viewpoint [B3].

That is, the camera C_b that satisfies the following expression is taken as the reference viewpoint camera, where S_j(k) denotes the area within the image of camera j that can also be captured by camera k.
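The selection expression itself is not reproduced in this text. The following is a minimal sketch of step B3, assuming the criterion is to pick the camera j whose total shared area, summed over the other cameras k, is largest. The function name, the `overlap` table, and the area values are hypothetical.

```python
def select_reference_camera(overlap):
    """Return the index b of the camera with the largest total overlap.

    overlap[j][k] plays the role of S_j(k): the area inside camera j's
    image that camera k can also capture. Summing over k != j is an
    assumption; the text only says "largest overlap with other cameras".
    """
    best_index, best_total = None, float("-inf")
    for j, row in enumerate(overlap):
        total = sum(area for k, area in enumerate(row) if k != j)
        if total > best_total:
            best_index, best_total = j, total
    return best_index


# Hypothetical shared areas (in pixels) for three cameras C1, C2, C3.
overlap = [
    [0, 800, 300],   # C1 shares 800 px with C2 and 300 px with C3
    [800, 0, 750],   # C2 sits between the others and overlaps both
    [300, 750, 0],   # C3
]
reference_index = select_reference_camera(overlap)  # index 1, i.e. C2
```

With these made-up numbers the middle camera is chosen, which matches the intuition that a central camera shares the most scene content with its neighbours.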

FIG. 9 shows the processing flow of the input frame encoding process.

As shown in this processing flow, the distance image generation unit 150 in the distance image processing unit 15 generates a distance image giving the distance from the reference viewpoint to the subject and background, using the frames of all cameras stored in the image memory 12 together with the camera positional relationship and reference viewpoint information obtained by the camera information initial setting unit 13 [C1].

Here, the reference viewpoint image is divided into blocks, and the distance image is generated by estimating a distance for each block. The distance for a block of the reference viewpoint image is estimated as follows.

For the pixels contained in the block and the M pixels surrounding it, a distance d is assumed, the corresponding pixels in the non-reference viewpoint images are obtained from the camera positional relationship, and the d that gives the smallest value of the following evaluation function is taken as the distance of the block.

In this expression, B denotes the set consisting of the pixels of the reference viewpoint image contained in the block whose distance is being estimated and the M pixels surrounding them; r(b, d) denotes the pixel in the non-reference viewpoint image that corresponds to a pixel b ∈ B when its distance is d; and I_b denotes the pixel value of pixel b.
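The evaluation function is not reproduced in this text, so the sketch below only illustrates the search over d. It assumes a sum of squared differences between I_b and the value at r(b, d), a single non-reference image, and parallel cameras for which r(b, d) is a horizontal shift of focal_length * baseline / d. All function and parameter names, the cost, and the geometry are assumptions for illustration.

```python
def corresponding_pixel(pixel, d, focal_length, baseline):
    """r(b, d) for a simplified parallel-camera setup (an assumption)."""
    x, y = pixel
    return x + round(focal_length * baseline / d), y


def estimate_block_distance(ref_img, other_img, pixels, candidates,
                            focal_length, baseline):
    """Return (distance, cost) minimising an assumed squared-difference cost.

    pixels is the set B: the block's pixels plus its M surrounding pixels.
    Images are lists of rows, indexed as img[y][x].
    """
    height, width = len(other_img), len(other_img[0])
    best_d, best_cost = None, float("inf")
    for d in candidates:
        cost = 0.0
        for (x, y) in pixels:
            u, v = corresponding_pixel((x, y), d, focal_length, baseline)
            if not (0 <= u < width and 0 <= v < height):
                cost = float("inf")  # candidate maps outside the image
                break
            cost += (ref_img[y][x] - other_img[v][u]) ** 2
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Toy one-row images: the second view is the first shifted right by 2 pixels.
ref_img = [[10, 20, 30, 40, 0, 0, 0, 0]]
other_img = [[0, 0, 10, 20, 30, 40, 0, 0]]
block = [(0, 0), (1, 0), (2, 0), (3, 0)]
distance, cost = estimate_block_distance(
    ref_img, other_img, block, candidates=[1, 2, 4],
    focal_length=4, baseline=1)
```

In this toy case a shift of 2 pixels corresponds to d = 2, so that candidate yields zero cost and is selected.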

Furthermore, when the minimum evaluation value E_min is obtained at distance D_cur (the value determined for d), the evaluation value E_pre is computed with the above evaluation function using the distance D_pre that the same block had at the immediately preceding time. As the following expression shows, if the difference between the two evaluation values does not exceed the threshold D_th, the past distance is used as the distance of the block.

The threshold D_th serves to decide whether the distance estimated by [Equation 2] above is used as is or replaced with the temporally unchanged one, so that whichever gives the better coding efficiency for the parallax difference image is used. With this threshold, the distance is estimated with the temporal correlation of the parallax difference image taken into account, which prevents that correlation from being lost and allows the parallax difference image to be encoded more efficiently.
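The threshold rule can be sketched as follows. The expression is not reproduced in this text, so the comparison below (E_pre minus E_min compared against D_th, with ties keeping the past distance) is an assumption based on the description; the function name is hypothetical.

```python
def choose_block_distance(d_cur, e_min, d_pre, e_pre, d_th):
    """Keep the previous distance unless the new one is clearly better.

    e_min is the evaluation value at the newly estimated distance d_cur,
    e_pre the value obtained when the previous distance d_pre is reused.
    """
    if e_pre - e_min <= d_th:
        return d_pre   # difference within the threshold: stay temporally stable
    return d_cur       # the new estimate is better by more than the threshold


stable = choose_block_distance(d_cur=5, e_min=10, d_pre=7, e_pre=12, d_th=3)
updated = choose_block_distance(d_cur=5, e_min=10, d_pre=7, e_pre=20, d_th=3)
```

A larger D_th favours distances that stay constant over time, trading some matching accuracy for a parallax difference image that is easier to predict from the previous frame.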

In this way, in [C1] of the processing flow of FIG. 9, a distance image is generated by obtaining the distance for every pixel of the reference viewpoint.

Next, the reference viewpoint moving image encoding unit 140 in the reference viewpoint moving image processing unit 14 encodes the reference viewpoint image as an ordinary two-dimensional moving image (the reference viewpoint moving image) [C2]. Then, using the motion vectors used to encode this reference viewpoint moving image (the motion vectors obtained between frames of the reference viewpoint image), the distance moving image encoding unit 151 in the distance image processing unit 15 encodes the generated distance image as a two-dimensional moving image (the distance moving image) [C3].

Because the reference viewpoint moving image and the distance moving image share the same viewpoint position, they can share motion vectors, coding block sizes, and the like. Specifically, the encoder chooses between reusing, for the distance moving image, the block size and motion vector information used to encode the reference viewpoint moving image, and setting a new block size and motion vector.

To make this choice, using a constant λ, the cost COST_new is computed by the following expression from the code amount R_new needed to set and encode a new block size and motion vector and the amount of residual D_new obtained with that information. The cost COST_old is computed by the same expression from the code amount R_old needed to signal that the block size and motion vector used to encode the reference viewpoint moving image are reused and the amount of residual D_old obtained with the reused information. The option with the smaller cost is selected.
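The cost expression is not reproduced in this text. A minimal sketch, assuming the usual Lagrangian form COST = D + λ·R (residual amount plus λ times code amount), is shown below. The function names, the tie-break in favour of reuse, and the numbers are hypothetical.

```python
def rd_cost(residual, code_amount, lam):
    """Assumed cost form: residual amount plus lambda times code amount."""
    return residual + lam * code_amount


def choose_motion_info(d_new, r_new, d_old, r_old, lam):
    """Return "reuse" or "new", whichever has the smaller cost."""
    cost_new = rd_cost(d_new, r_new, lam)
    cost_old = rd_cost(d_old, r_old, lam)
    return "reuse" if cost_old <= cost_new else "new"


# Reuse costs only a flag (small R_old) but may leave a larger residual.
decision_a = choose_motion_info(d_new=100, r_new=40, d_old=120, r_old=1, lam=1.0)
decision_b = choose_motion_info(d_new=50, r_new=40, d_old=200, r_old=1, lam=1.0)
```

In the first case the saving in code amount outweighs the slightly larger residual, so the reference viewpoint's block size and motion vector are reused; in the second the residual penalty is too large and new values are encoded.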

The motion vectors used to encode the reference viewpoint moving image can also be exploited when encoding the distance moving image by weighting the predicted value, or by using a residual vector relative to the motion vector used to encode the reference viewpoint moving image.

Weighting the predicted value means the following: when the motion vector used to encode the reference viewpoint moving image is (V_x, V_y), the value computed by the following expression is made available as a motion compensation option for the distance moving image by adding only the information of a constant Weight.

Here, estimatedValue(i,j) denotes the predicted value at position (i, j), and previousValue(i,j) denotes the value at position (i, j) in the reference frame.

Similarly, using a residual vector means using the value computed by the following expression by adding only the information of a constant vector (MV_x, MV_y).
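Neither expression is reproduced in this text. The sketch below shows one plausible reading, assuming that the weighted option scales the motion-compensated reference value by Weight and that the residual-vector option offsets the inherited vector by (MV_x, MV_y). The exact formulas, the sign convention of the displacement, and the frame[j][i] indexing are assumptions.

```python
def previous_value(reference_frame, i, j):
    """Value at position (i, j) of the reference frame (row j, column i)."""
    return reference_frame[j][i]


def weighted_prediction(reference_frame, i, j, v, weight):
    """Assumed form: Weight * previousValue(i + V_x, j + V_y)."""
    vx, vy = v
    return weight * previous_value(reference_frame, i + vx, j + vy)


def residual_vector_prediction(reference_frame, i, j, v, mv):
    """Assumed form: previousValue(i + V_x + MV_x, j + V_y + MV_y)."""
    vx, vy = v
    mvx, mvy = mv
    return previous_value(reference_frame, i + vx + mvx, j + vy + mvy)


frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
weighted = weighted_prediction(frame, 0, 0, v=(1, 1), weight=2)
refined = residual_vector_prediction(frame, 0, 0, v=(1, 1), mv=(1, 0))
```

In both options the only extra information to transmit is the constant (Weight or the residual vector), since (V_x, V_y) is already known from the reference viewpoint moving image.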

The reference viewpoint moving image encoded in this way is decoded by the reference viewpoint moving image decoding unit 141 in the reference viewpoint moving image processing unit 14, and the distance moving image encoded in this way is decoded by the distance moving image decoding unit 152 in the distance image processing unit 15 [C4, C5].

The decoded reference viewpoint moving image and distance moving image are sent to the non-reference viewpoint moving image processing unit 16.

First, for each camera other than the reference viewpoint, the parallax compensation image generation unit 160 in the non-reference viewpoint moving image processing unit 16 generates the parallax compensation image estimated from the decoded reference viewpoint moving image and distance moving image [C6]. This generation process is described later with reference to the processing flow of FIG. 10.

Next, the parallax difference image generation unit 161 in the non-reference viewpoint moving image processing unit 16 generates a parallax difference image by taking the difference between the generated parallax compensation image and the input frame [C7]. The parallax difference moving image encoding unit 162 in the non-reference viewpoint moving image processing unit 16 encodes this parallax difference image as an ordinary two-dimensional moving image [C8].

Then, the parallax difference moving image decoding unit 163 in the non-reference viewpoint moving image processing unit 16 decodes the encoded parallax difference moving image and adds it to the parallax compensation image of the same time stored in the non-reference viewpoint image memory 164 to generate the decoded non-reference viewpoint moving image, which is stored back in the non-reference viewpoint image memory 164 for use in encoding the next frame [C9].
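Steps C7 and C9 amount to a per-pixel subtraction and addition, sketched below on small list-of-rows images. The function names are hypothetical, and the encode/decode of the difference (C8) is skipped, so the round trip here is exact; with a lossy codec in between, the reconstruction would only approximate the input. Clipping to the valid pixel range is also omitted.

```python
def parallax_difference(input_frame, compensation):
    """C7: per-pixel difference between the input frame and its prediction."""
    return [[a - b for a, b in zip(row_in, row_pc)]
            for row_in, row_pc in zip(input_frame, compensation)]


def reconstruct_view(compensation, decoded_difference):
    """C9: add the decoded difference back onto the parallax compensation image."""
    return [[p + d for p, d in zip(row_pc, row_df)]
            for row_pc, row_df in zip(compensation, decoded_difference)]


input_frame = [[12, 20], [7, 9]]
compensation = [[10, 25], [7, 8]]
difference = parallax_difference(input_frame, compensation)
restored = reconstruct_view(compensation, difference)
```

Because the encoder reconstructs the view from the decoded difference rather than from the original input, the image it stores in memory matches what the decoder will later have available.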

Next, the parallax compensation image generation process executed by the parallax compensation image generation unit 160 in the non-reference viewpoint moving image processing unit 16 is described with reference to the processing flow of FIG. 10.

To generate a parallax compensation image, the parallax compensation image generation unit 160 projects the values of all pixels of the reference viewpoint image onto the image of the target camera, based on the distance image and the camera positional relationship [D1].

This projection is performed according to the expression shown in FIG. 12(a). Here, H_Cn is the transformation matrix from points on the image captured at the reference viewpoint to points on the image captured by camera C_n, (i, j) are coordinates in the reference viewpoint, d_ij is the value of the distance image at those coordinates, (I, J) is the position on the image captured by camera C_n that corresponds to position (i, j) on the reference viewpoint image, f_Cn is the focal length of camera C_n, and A, X, and Y are arbitrary real numbers that make the equation hold. H_Cn can be constructed from the distance image and the camera positional relationship.

The projection performed in [D1] is generally not one-to-one, so values are not necessarily assigned to every pixel of the frame.

For example, suppose that every viewpoint uses a camera with focal length f to which the ordinary pinhole model applies, that the reference viewpoint camera and camera C_n face the same direction, and that their positions are Δx apart in the horizontal direction. H_Cn then takes the form shown in FIG. 12(b).

In this case, if two pixels (i1, j1) and (i2, j2) of the reference viewpoint satisfy the expression shown in FIG. 12(c), both correspond to the same pixel in camera C_n, which confirms that H_Cn is not a one-to-one projection. Here, d_i1j1 is the value of the distance image at pixel (i1, j1), and d_i2j2 is the value of the distance image at pixel (i2, j2).
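The many-to-one behaviour can be seen in a small forward-projection sketch. The matrices of FIG. 12 are not reproduced in this text, so the code assumes the parallel-camera example above reduces to a horizontal shift of f·Δx/d per pixel. The rule that the nearer surface wins when two pixels collide is also an assumption (the text does not say how collisions are resolved), and all names are hypothetical.

```python
def project_to_camera(ref_img, dist_img, focal_length, delta_x):
    """Forward-project reference pixels into the target camera's image (D1).

    Returns an image of the same size in which pixels that received no
    value are None. Images are lists of rows, indexed as img[j][i].
    """
    height, width = len(ref_img), len(ref_img[0])
    warped = [[None] * width for _ in range(height)]
    nearest = [[float("inf")] * width for _ in range(height)]
    for j in range(height):
        for i in range(width):
            d = dist_img[j][i]
            target = i + round(focal_length * delta_x / d)
            # Assumed collision rule: keep the pixel with the smaller distance.
            if 0 <= target < width and d < nearest[j][target]:
                warped[j][target] = ref_img[j][i]
                nearest[j][target] = d
    return warped


ref_img = [[10, 20, 30, 40]]
dist_img = [[4, 2, 4, 4]]   # the second pixel is closer to the cameras
warped = project_to_camera(ref_img, dist_img, focal_length=4, delta_x=1)
```

Here the second and third reference pixels both land on the last column, while the first and third columns of the target image receive nothing; those unassigned pixels are the ones handled in the steps that follow.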

Because the projection is not one-to-one, the following processing is performed for every pixel to which the projection assigned no value [D2].

First, spatial prediction is performed [D3]. Spatial prediction obtains the value of a pixel from the values of adjacent pixels to which values have already been assigned. Next, temporal prediction is performed [D4]. Temporal prediction estimates the pixel's own motion vector from the motion vectors that are obtained, for surrounding blocks to which values have already been assigned, by referring to the past frames stored in the non-reference viewpoint image memory 164 in the non-reference viewpoint moving image processing unit 16, and then uses that motion vector to find the corresponding pixel in a past frame and fill in the value.

The resulting difference values against the frame to be encoded are then compared, and the prediction that allows efficient encoding is chosen from the relationship between the amount of difference and the code amount needed to indicate the type of prediction [D5]. The prediction mode used here is sent to the next step, together with the parallax compensation image, as encoding target information.
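As an illustration of the spatial prediction in D3 only, the sketch below fills each unassigned pixel with the average of its already assigned left and right neighbours. Averaging horizontal neighbours is an assumption; the text only states that the value is derived from adjacent pixels that already have values. Temporal prediction (D4) and the mode decision (D5) are not shown.

```python
def spatial_prediction(image):
    """Fill None pixels from horizontally adjacent assigned pixels.

    Only neighbours assigned by the projection are used; a hole with no
    assigned horizontal neighbour is left as None.
    """
    result = [row[:] for row in image]
    for j, row in enumerate(image):
        for i, value in enumerate(row):
            if value is not None:
                continue
            neighbours = [row[k] for k in (i - 1, i + 1)
                          if 0 <= k < len(row) and row[k] is not None]
            if neighbours:
                result[j][i] = sum(neighbours) / len(neighbours)
    return result


filled = spatial_prediction([[None, 10, None, 20]])
```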

Once the parallax compensation image generation unit 160 in the non-reference viewpoint moving image processing unit 16 has generated the parallax compensation image in this way [C6], then, as described for the processing flow of FIG. 9, its difference from the input frame is taken to generate the parallax difference image [C7], the parallax difference image is encoded as a two-dimensional moving image [C8], and the result is decoded and added to the parallax compensation image of the same time stored in the non-reference viewpoint image memory 164 to generate the decoded non-reference viewpoint moving image, which is stored back in the non-reference viewpoint image memory 164 for use in encoding the next frame [C9].

In the encoding performed in [C8], motion vector information is shared where possible, using the encoding target information that was used when encoding the distance moving image and the reference viewpoint moving image.

FIG. 11 shows the processing flow for this use of motion vectors.

As shown in this processing flow, first, the block (I, J) of the reference viewpoint moving image that corresponds to the coding block (i, j) of the parallax difference moving image being encoded is obtained from the camera positional relationship and the distance image [E1].

Next, the two-dimensional vector (V_I, V_J) used when encoding block (I, J) of the reference viewpoint moving image is extracted. Calling the start point of this vector the reference block and its end point the encoding target block, a three-dimensional motion vector (V_I, V_J, D_now − D_pre) is defined by taking as the depth of the end point the estimated distance D_now of the block corresponding to the encoding target block in the distance image of the same time, and as the depth of the start point the estimated distance D_pre of the block corresponding to the reference block in the distance image of the same time as the reference frame [E2].

Then, using the camera positional relationship, this three-dimensional motion vector is converted into a two-dimensional vector on the camera plane of the non-reference viewpoint being encoded [E3].
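Steps E2 and E3 can be sketched for the simplified parallel-camera case used earlier, in which a reference-view point at depth D appears shifted horizontally by f·Δx/D in the non-reference view. This geometry, the treatment of (V_I, V_J) as the displacement from the reference block to the encoding target block, and all names are assumptions for illustration; a general camera arrangement would use the full camera positional relationship.

```python
def to_non_reference_view(point, focal_length, delta_x):
    """Project (x, y, depth) from the reference view, parallel cameras assumed."""
    x, y, depth = point
    return x + focal_length * delta_x / depth, y


def estimated_motion_vector(block, v, d_now, d_pre, focal_length, delta_x):
    """Convert the 3D vector (V_I, V_J, D_now - D_pre) into a 2D vector.

    block is the encoding target block position (I, J) in the reference
    view; the reference block is taken to lie at (I - V_I, J - V_J).
    """
    block_x, block_y = block
    v_i, v_j = v
    start = (block_x - v_i, block_y - v_j, d_pre)   # reference block, past depth
    end = (block_x, block_y, d_now)                 # target block, current depth
    start_x, start_y = to_non_reference_view(start, focal_length, delta_x)
    end_x, end_y = to_non_reference_view(end, focal_length, delta_x)
    return end_x - start_x, end_y - start_y


vector = estimated_motion_vector(block=(10, 5), v=(2, 1), d_now=2, d_pre=4,
                                 focal_length=4, delta_x=1)
```

In this example the object moved closer (depth 4 to depth 2), so its horizontal motion in the non-reference view is larger than the (2, 1) seen from the reference viewpoint.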

The code amount needed to represent the distortion (prediction error) when this vector is used is compared with the code amount needed to represent the distortion when a motion vector obtained as in ordinary two-dimensional moving image encoding is used, and the vector giving the better coding efficiency is adopted [E4].

If the motion vector obtained as in ordinary two-dimensional moving image encoding is used, that motion vector is encoded; if the two-dimensional vector obtained by the conversion is used as the motion vector (the estimated motion vector), information indicating this is encoded.

In this way, the video encoding device 1 of the present invention encodes the images captured by cameras C1, C2, and C3.

FIG. 13 illustrates the types of data encoded by the video encoding device 1 of the present invention, and the order in which they are encoded, when the camera configuration shown in FIG. 1 is used.

As shown in this figure, the video encoding device 1 of the present invention first encodes the camera information for cameras C1, C2, and C3 (information such as which camera is the reference viewpoint and the camera positional relationship). It then encodes, for time T1, the reference viewpoint image, the distance image, and the parallax difference images for the cameras other than the reference viewpoint; then the same three kinds of data for time T2; then the same for time T3; and so on.
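The ordering described for FIG. 13 can be written out as a short generator of stream entries. The function name and the label strings are hypothetical, and the choice of which cameras are non-reference is only an example, since the text here does not say which camera is selected as the reference viewpoint.

```python
def encoding_order(times, non_reference_cameras):
    """List the encoded units in the order described for FIG. 13."""
    order = ["camera information"]
    for t in times:
        order.append(f"{t}: reference viewpoint image")
        order.append(f"{t}: distance image")
        for camera in non_reference_cameras:
            order.append(f"{t}: parallax difference image ({camera})")
    return order


# Example only: assumes C2 was chosen as the reference viewpoint.
stream = encoding_order(["T1", "T2"], ["C1", "C3"])
```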

Next, the video decoding device of the present invention, which decodes the encoded data generated in this way, is described.

FIGS. 14 and 15 show an embodiment of the video decoding device 2 of the present invention.

As shown in FIG. 14, the video decoding device 2 of the present invention includes a camera information decoding unit 21 that decodes the camera information, a camera information memory 22 that stores the decoded camera information, a reference viewpoint moving image decoding unit 23 that decodes the video of the camera selected as the reference viewpoint, an encoding target information memory 24 that stores encoding target information such as block division types and motion vectors generated when the reference viewpoint moving image is decoded, a distance moving image decoding unit 25 that decodes the distance moving image, a parallax difference moving image decoding unit 26 that decodes the parallax difference moving image, a non-reference viewpoint moving image decoding unit 27 that decodes the video of the cameras other than the reference viewpoint, and an image output unit 28 that outputs the decoded images.

FIG. 15 is a diagram showing the detailed configuration of the non-reference viewpoint moving image decoding unit 27.

As shown in this figure, the non-reference viewpoint moving image decoding unit 27 includes a distance image memory 270 that stores the distance image of the current time and the distance image of the same time as the image used as the reference image, a parallax compensation image generation unit 271 that generates a parallax compensation image, a parallax compensation image complementing unit 272 that fills in pixel values of the parallax compensation image generated by the parallax compensation image generation unit 271 from the temporal and spatial directions, a non-reference viewpoint moving image generation unit 273 that generates the final non-reference viewpoint moving image from the parallax compensation image and the decoded parallax difference moving image, and a non-reference viewpoint image memory 274 that stores the generated non-reference viewpoint image for use in decoding subsequent frames.

図１６ないし図１８に、このように構成される本発明の映像復号装置２の実行する処理フローを示す。 FIGS. 16 to 18 show processing flows executed by the video decoding apparatus 2 of the present invention configured as described above.

次に、これらの処理フローに従って、このように構成される本発明の映像復号装置２の実行する処理について詳細に説明する。 Next, processing executed by the video decoding apparatus 2 of the present invention configured as described above will be described in detail according to these processing flows.

ここで、本実施形態例では、まずカメラ情報が入力され、その後、時刻毎に基準視点動画像、距離動画像、視差差分動画像の順番で符号化データが入力されることを想定している。 Here, in the present embodiment, it is assumed that camera information is input first, and then encoded data is input in order of the reference viewpoint moving image, the distance moving image, and the parallax difference moving image for each time. .

これから、本発明の映像復号装置２では、図１６の処理フローに示すように、まずカメラ情報が入力され、その後、時刻毎に基準視点動画像、距離動画像、視差差分動画像の順番で符号化データがされてくるので、その符号化データを入力する〔Ｆ１〕。 From now on, in the video decoding apparatus 2 of the present invention, as shown in the processing flow of FIG. 16, first, camera information is input, and thereafter, in order of the reference viewpoint moving image, the distance moving image, and the parallax difference moving image for each time. Encoded data is input, and the encoded data is input [F1].

入力した符号化データがカメラ情報であれば、カメラ情報復号部２１で、それを復号して、カメラ情報メモリ２２に蓄える〔Ｆ２，Ｆ３〕。 If the input encoded data is camera information, the camera information decoding unit 21 decodes it and stores it in the camera information memory 22 [F2, F3].

一方、入力した符号化データが基準視点動画像であれば、基準視点動画像復号部２３で、それを復号して、復号の際に得た動きベクトルや符号化ブロック分割タイプなどの符号化対象情報については符号化対象情報メモリ２４に格納し、復号した基準視点の画像については非基準視点動画像復号部２７と画像出力部２８とに送る〔Ｆ４，Ｆ５〕。 On the other hand, if the input encoded data is the reference viewpoint moving image, the reference viewpoint moving image decoding unit 23 decodes the input encoded data, and the encoding target such as the motion vector and the encoded block division type obtained at the time of decoding is decoded. Information is stored in the encoding target information memory 24, and the decoded reference viewpoint image is sent to the non-reference viewpoint moving image decoding unit 27 and the image output unit 28 [F4, F5].

一方、入力した符号化データが距離動画像であれば、距離動画像復号部２５で、符号化対象情報メモリ２４に格納されている符号化対象情報を用いて、それを復号して、視差差分動画像復号部２６と非基準視点動画像復号部２７とに送る〔Ｆ６，Ｆ７〕。 On the other hand, if the input encoded data is a distance moving image, the distance moving image decoding unit 25 decodes the encoded object information using the encoding target information stored in the encoding target information memory 24, and generates a disparity difference. It is sent to the moving picture decoding unit 26 and the non-reference viewpoint moving picture decoding unit 27 [F6, F7].

一方、入力した符号化データが視差差分動画像であれば、視差差分動画像復号部２６で、符号化対象情報メモリ２４に格納されている符号化対象情報と距離動画像復号部２５で復号された距離画像とカメラ情報メモリ２２に格納されているカメラ位置関係とを用いて、それを復号して、非基準視点動画像復号部２７に送る〔Ｆ８〕。 On the other hand, if the input encoded data is a parallax differential video, the parallax differential video decoding unit 26 decodes the encoded target information stored in the encoding target information memory 24 and the distance video decoding unit 25. The obtained distance image and the camera positional relationship stored in the camera information memory 22 are decoded and sent to the non-reference viewpoint moving image decoding unit 27 [F8].

At this time, as in ordinary moving image decoding, the parallax difference moving image is decoded while performing motion compensation from a reference frame using motion vectors. The motion vector used here is either included in the encoded data of the parallax difference moving image, or estimated from the encoding target information stored in the encoding target information memory 24, the distance moving image decoded by the distance moving image decoding unit 25, and the camera positional relationship stored in the camera information memory 22.

Then the non-reference viewpoint moving image decoding unit 27 decodes the non-reference viewpoint moving image using the reference viewpoint moving image decoded by the reference viewpoint moving image decoding unit 23, the distance moving image decoded by the distance moving image decoding unit 25, and the camera positional relationship stored in the camera information memory 22, and sends it to the image output unit 28 [F9].

The image output unit 28 receives the reference viewpoint moving image sent from the reference viewpoint moving image decoding unit 23 and the non-reference viewpoint moving image sent from the non-reference viewpoint moving image decoding unit 27, and finally outputs the decoded images sent to it [F10].
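The routing of input data described in steps F2 to F8 can be summarized as a small lookup table. The following Python sketch is only an illustration: the string labels for the kinds of encoded data and the function names `route` and `trace` are hypothetical, and only the unit numbers are taken from the description.

```python
# Hypothetical labels for the four kinds of encoded data.
# Unit numbers (21 to 28) follow the description of the decoding device.
ROUTES = {
    "camera_info": {"decoder": 21, "sends_to": [22]},
    "reference_view": {"decoder": 23, "sends_to": [24, 27, 28]},
    "distance": {"decoder": 25, "sends_to": [26, 27]},
    "parallax_difference": {"decoder": 26, "sends_to": [27]},
}


def route(kind):
    """Return the decoding unit and its destinations for one kind of input."""
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"unknown encoded-data kind: {kind!r}") from None


def trace(kinds):
    """List (decoder, destination) hops for a sequence of input kinds."""
    return [(route(k)["decoder"], dst)
            for k in kinds
            for dst in route(k)["sends_to"]]
```

For example, `trace(["camera_info", "distance"])` lists the hop from unit 21 to memory 22, followed by the hops from unit 25 to units 26 and 27.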

Next, the process of deriving the estimated motion vector, executed by the parallax difference moving image decoding unit 26, will be described according to the processing flow of FIG. 17.

In the estimated motion vector derivation process executed by the parallax difference moving image decoding unit 26, first, the block (X, Y) of the reference viewpoint moving image that corresponds to the block (x, y) of the parallax difference moving image to be decoded is determined using the camera positional relationship and the decoded distance image at that time [G1].

Next, the motion vector (U_X, U_Y) that was used when encoding block (X, Y) of the reference viewpoint moving image, and that is stored as encoding target information, is extracted [G2].

Subsequently, taking this vector to indicate that "block (X, Y) is the image of block (X - U_X, Y - U_Y) in the reference frame after it has moved", a three-dimensional motion vector (U_X, U_Y, D_n - D_p) is defined [G3]. Here, the value D_n at the position corresponding to block (X, Y) of the reference viewpoint image, in the distance image at the same time as the decoding target frame, is assumed to be the depth at the end point of the vector, and the value D_p at the position corresponding to block (X - U_X, Y - U_Y) of the reference viewpoint image, in the distance image at the same time as the reference frame, is assumed to be the depth at the start point of the vector.

Finally, using the camera positional relationship, this three-dimensional motion vector is converted into a two-dimensional vector on the camera plane of the non-reference viewpoint corresponding to the parallax difference moving image being decoded, which yields the estimated motion vector [G4].
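Steps G3 and G4 can be illustrated with a minimal Python sketch under a strong simplifying assumption that is not stated in the patent: the two cameras are rectified and parallel, separated by a horizontal `baseline`, share a focal length `focal` (in pixels), and the distance image stores metric depth along the optical axis. Under that assumption a point seen at column X with depth D in the reference view appears at column X - focal * baseline / D in the non-reference view. The function and parameter names are hypothetical.

```python
def project_to_other_view(x, y, depth, focal, baseline):
    """Map a reference-view position with known depth to the other view.

    Assumes rectified, parallel cameras (an illustrative assumption):
    only the column changes, by the disparity focal * baseline / depth.
    """
    return (x - focal * baseline / depth, y)


def estimate_motion_vector(X, Y, U, depth_now, depth_prev, focal, baseline):
    """Convert the 3D motion (U_X, U_Y, D_n - D_p) to a 2D vector [G3, G4].

    (X, Y) is the reference-view block, U = (U_X, U_Y) its motion vector,
    depth_now is D_n (end point) and depth_prev is D_p (start point).
    """
    U_X, U_Y = U
    end = project_to_other_view(X, Y, depth_now, focal, baseline)
    start = project_to_other_view(X - U_X, Y - U_Y, depth_prev, focal, baseline)
    return (end[0] - start[0], end[1] - start[1])
```

When the depth does not change between the two frames, the estimated vector equals (U_X, U_Y). When the depth changes, the horizontal component is corrected by the difference between the two disparities. A general camera arrangement would replace `project_to_other_view` with a full projection based on the camera positional relationship.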

Next, the decoding process for the non-reference viewpoint moving image, executed by the non-reference viewpoint moving image decoding unit 27, will be described according to the processing flow of FIG. 18.

In the decoding process for the non-reference viewpoint moving image executed by the non-reference viewpoint moving image decoding unit 27, first, the decoded distance moving image, the decoded reference viewpoint moving image, and the camera positional relationship are input to the parallax compensation image generation unit 271, which generates a parallax compensation image [H1].
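As a rough illustration of step H1, the following Python sketch forward-warps a reference viewpoint image into another view. It assumes rectified, parallel cameras with a horizontal baseline, a shared focal length in pixels, and a distance image holding metric depth. None of these assumptions comes from the patent, and the function name is hypothetical. Target pixels that receive no value are left as `None`.

```python
def warp_to_other_view(ref_image, depth_image, focal, baseline):
    """Forward-warp ref_image into the other view, row by row.

    Images are lists of rows. When two source pixels land on the same
    target pixel, the one with the smaller depth (nearer to the camera)
    wins. Pixels that nothing maps to stay None.
    """
    h, w = len(ref_image), len(ref_image[0])
    out = [[None] * w for _ in range(h)]
    nearest = [[float("inf")] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            z = depth_image[y][x]
            # Column shift equals the disparity under the rectified assumption.
            tx = round(x - focal * baseline / z)
            if 0 <= tx < w and z < nearest[y][tx]:
                out[y][tx] = ref_image[y][x]
                nearest[y][tx] = z
    return out
```

With a constant depth every pixel shifts by the same amount, so one border column is left without a value. With varying depth, holes also appear where the background is uncovered.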

Next, for pixels in the parallax compensation image to which no value was assigned in the preceding process, the parallax compensation image complementing unit 272 performs spatial complementation or temporal complementation [H2, H3, H4, H5]. Which of the two is performed is indicated by information embedded in the encoded data of the parallax difference moving image.

Here, spatial complementation is a process that predicts a pixel from the values of the surrounding pixels. Temporal complementation predicts the motion vectors of the surrounding pixels from the images stored in the non-reference viewpoint image memory 274, predicts the motion vector of the pixel in question from those motion vectors, and takes as the value of the pixel the value of the pixel that this motion vector designates in the image stored in the non-reference viewpoint image memory 274. This complementation is repeated until no pixels without an assigned value remain [H2].
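The spatial complementation loop can be sketched in Python as follows. The description only states that a pixel is predicted from the values of surrounding pixels, so the choice of averaging the already assigned 4-neighbours is an assumption made for illustration, as is the function name. `None` marks a pixel without a value.

```python
def fill_holes_spatially(image):
    """Fill None pixels from assigned 4-neighbours until none remain."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    while any(v is None for row in out for v in row):
        updates = {}
        for y in range(h):
            for x in range(w):
                if out[y][x] is not None:
                    continue
                neigh = [out[ny][nx]
                         for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                         if 0 <= ny < h and 0 <= nx < w and out[ny][nx] is not None]
                if neigh:
                    # Assumed predictor: mean of the assigned neighbours.
                    updates[(y, x)] = sum(neigh) / len(neigh)
        if not updates:
            raise ValueError("image has no assigned pixels to predict from")
        # Apply after the scan so each pass uses only values from earlier passes.
        for (y, x), v in updates.items():
            out[y][x] = v
    return out
```

Each pass fills the holes that border assigned pixels, so wide holes are closed from their edges inward over several passes.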

Then the non-reference viewpoint moving image generation unit 273 generates the non-reference viewpoint moving image by adding together the generated parallax compensation image and the parallax difference moving image decoded by the parallax difference moving image decoding unit 26 [H6]. Because this generated non-reference viewpoint image is used in subsequent temporal complementation, the non-reference viewpoint moving image generation unit 273 stores it in the non-reference viewpoint image memory 274 [H7].
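Steps H6 and H7 amount to a per-pixel addition followed by storing the result for later use. A minimal Python sketch is shown below. The clipping to the range 0 to `max_value` and the use of a plain list as the image memory are assumptions for illustration and are not specified in the description.

```python
def reconstruct(compensated, difference, max_value=255):
    """Add the parallax difference to the parallax compensation image.

    Both inputs are lists of rows of integers. Results are clipped to
    [0, max_value], which is an assumed 8-bit sample range.
    """
    return [[min(max(c + d, 0), max_value) for c, d in zip(crow, drow)]
            for crow, drow in zip(compensated, difference)]


def decode_frame(compensated, difference, image_memory):
    """Reconstruct one frame [H6] and keep it for temporal complementation [H7]."""
    frame = reconstruct(compensated, difference)
    image_memory.append(frame)
    return frame
```

Here `image_memory` plays the role of the non-reference viewpoint image memory 274.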

In this way, the video decoding device 2 of the present invention decodes the images captured by the cameras C1, C2, and C3 by decoding the encoded data generated by the video encoding device 1 of the present invention.

In the embodiment described above, the positional relationship of the cameras is assumed not to change. If it does change, the positional relationship can be recalculated, encoded, and transmitted each time. Alternatively, a certain amount of deviation can be tolerated and the camera positional relationship information updated in units of GOPs.

Also, in the embodiment described above, the positional relationship of the cameras and the reference viewpoint are determined from the camera frames, but this information can also be supplied from outside. In that case, the estimation of the camera positional relationship and the selection of the reference viewpoint described above can be omitted.

Although not described in the embodiment above, the method used to determine the positional relationship of the cameras affects neither the encoding process nor the decoding process, so any method can be used.

The embodiment also includes several processes that encode or decode two-dimensional moving images. Any moving image encoding and decoding method that performs motion prediction and motion compensation can be applied to them. Many coding techniques that use motion prediction and motion compensation exist, such as the international standards H.264 and MPEG-4.

Description of symbols
1 Video encoding device
2 Video decoding device
11 Image information input unit
12 Image memory
13 Camera information initial setting unit
14 Reference viewpoint moving image processing unit
15 Distance image processing unit
16 Non-reference viewpoint moving image processing unit
21 Camera information decoding unit
22 Camera information memory
23 Reference viewpoint moving image decoding unit
24 Encoding target information memory
25 Distance moving image decoding unit
26 Parallax difference moving image decoding unit
27 Non-reference viewpoint moving image decoding unit
28 Image output unit

Claims

A video encoding method for encoding images captured by a plurality of cameras that capture a subject, the method comprising:
a step of encoding a reference viewpoint image captured by a camera serving as a reference viewpoint;
a step of generating a distance image indicating an estimated distance from the camera that captured the reference viewpoint image to the subject;
a step of encoding the distance image;
a step of estimating a parallax compensation image at a viewpoint other than the reference viewpoint based on the reference viewpoint image, the distance image, and a camera positional relationship that defines the installation positions and orientations of the cameras;
a step of calculating a parallax difference image indicating the difference between the estimated parallax compensation image and an encoding target image captured by the camera associated with the viewpoint being estimated;
a step of generating a parallax difference prediction value by temporally or spatially predicting the calculated parallax difference image using an already encoded parallax difference image; and
a step of encoding data corresponding to the difference between the calculated parallax difference image and the parallax difference prediction value.

The video encoding method according to claim 1, wherein
in the step of estimating the parallax compensation image, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated using a reference viewpoint image obtained by decoding the encoded data of the reference viewpoint image and a distance image obtained by decoding the encoded data of the distance image.

The video encoding method according to claim 1, further comprising:
a step of setting the camera positional relationship either by acquiring it from external information or by estimating it based on the images of all the cameras; and
a step of encoding information on the set camera positional relationship.

The video encoding method according to claim 3, wherein
in the step of estimating the parallax compensation image, the parallax compensation image at a viewpoint other than the reference viewpoint is estimated based on a reference viewpoint image obtained by decoding the encoded data of the reference viewpoint image, a distance image obtained by decoding the encoded data of the distance image, and a camera positional relationship obtained by decoding the encoded data of the camera positional relationship information.

The video encoding method according to any one of claims 1 to 4, further comprising
a step of setting, as the reference viewpoint camera, the camera that captures the space overlapping most with the spaces captured by the other cameras.

The video encoding method according to any one of claims 1 to 5, wherein
in the step of generating the distance image, the distance image is generated by dividing the image into blocks and estimating the distance for each block.

The video encoding method according to any one of claims 1 to 6, wherein
in the step of generating the distance image, when a distance image is generated according to a prescribed algorithm, the magnitude of the difference between the evaluation value of the distance image generated at the current time and the evaluation value of the distance image generated at the previous time is judged by comparing it with a predetermined threshold; when the difference is judged to be large, the distance image generated at the current time is used as it is, and when the difference is judged to be small, it is replaced with the distance image generated at the previous time.

The video encoding method according to any one of claims 1 to 7, wherein
in the step of estimating the parallax compensation image, for a pixel whose pixel value cannot be estimated based on the reference viewpoint image, the distance image, and the camera positional relationship, the pixel value is estimated from the pixel values of surrounding pixels.

The video encoding method according to any one of claims 1 to 7, wherein
in the step of estimating the parallax compensation image, for a pixel whose pixel value cannot be estimated based on the reference viewpoint image, the distance image, and the camera positional relationship, motion information of the pixel is estimated from the motion information of surrounding pixels, and the pixel value of the pixel is estimated based on the estimated motion information and the pixel values of an already encoded image.

The video encoding method according to any one of claims 1 to 9, wherein
in the step of encoding the distance image, the distance image is encoded using the motion vectors used in encoding the reference viewpoint image.

The video encoding method according to any one of claims 1 to 10, wherein
in the step of generating the parallax difference prediction value, the parallax difference prediction value is generated by selecting whichever of the following gives the better coding efficiency: a motion vector estimated based on the motion vector used when encoding the reference viewpoint image, the distance image, and the camera positional relationship, or a motion vector estimated from the parallax difference image's own reference image.

The video encoding method according to claim 11, wherein
in the step of generating the parallax difference prediction value, a distance image obtained by decoding the encoded data of the distance image is used when estimating the motion vector.

A video decoding method for decoding encoded data of images captured by a plurality of cameras that capture a subject, the method comprising:
a step of decoding encoded data for a reference viewpoint image captured by a camera serving as a reference viewpoint;
a step of decoding encoded data for a distance image indicating an estimated distance from the camera that captured the reference viewpoint image to the subject;
a step of estimating a parallax compensation image at a viewpoint other than the reference viewpoint based on the decoded reference viewpoint image, the decoded distance image, and a camera positional relationship that defines the installation positions and orientations of the cameras;
a step of decoding encoded data for difference data between a parallax difference image, which indicates the difference between the estimated parallax compensation image and the image captured by the camera associated with the viewpoint being estimated, and a parallax difference prediction value obtained by temporally or spatially predicting that parallax difference image using an already restored parallax difference image;
a step of restoring the parallax difference image based on the decoded difference data and the parallax difference prediction value; and
a step of restoring the image captured by the camera associated with the viewpoint other than the reference viewpoint based on the estimated parallax compensation image and the restored parallax difference image.

The video decoding method according to claim 13, further comprising
a step of decoding encoded data for the camera positional relationship information.

The video decoding method according to claim 13 or 14, wherein
in the step of estimating the parallax compensation image, for a pixel whose pixel value cannot be estimated based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship, the pixel value of the pixel is estimated based on the pixel values of surrounding pixels.

The video decoding method according to claim 13 or 14, wherein
in the step of estimating the parallax compensation image, for a pixel whose pixel value cannot be estimated based on the decoded reference viewpoint image, the decoded distance image, and the camera positional relationship, motion information of the pixel is estimated from the motion information of surrounding pixels, and the pixel value of the pixel is estimated based on the estimated motion information and the pixel values of an already decoded image.

A video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 12.

A computer-readable recording medium on which a video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 12 is recorded.

A video decoding program for causing a computer to execute the video decoding method according to any one of claims 13 to 16.

A computer-readable recording medium on which a video decoding program for causing a computer to execute the video decoding method according to any one of claims 13 to 16 is recorded.