JP7417388B2

JP7417388B2 - Encoding device, decoding device, and program

Info

Publication number: JP7417388B2
Application number: JP2019164370A
Authority: JP
Inventors: 一宏原; 健介久富; 智之三科
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-09-10
Filing date: 2019-09-10
Publication date: 2024-01-18
Anticipated expiration: 2039-09-10
Also published as: JP2021044658A

Description

The present invention relates to an encoding device, a decoding device, and a program, and in particular to an encoding device, a decoding device, and a program for the multi-view images needed to display integral 3D (three-dimensional) video and free-viewpoint video.

Light field cameras, in which a lens array is placed in front of the image sensor, have been commercialized as cameras capable of capturing the group of elemental images used to display integral 3D video. However, light field cameras are generally designed for refocusing after capture. When integral 3D video is displayed from images captured with a light field camera, the diameter of the camera's main lens is small relative to the distance to the subject, so motion parallax is small and the depth of the 3D video cannot be reproduced sufficiently. In theory this problem can be solved by enlarging the main lens or shortening the distance between the camera and the subject, but neither measure is practical.

Capturing multi-view video with a camera array, in which ordinary cameras are arranged in a two-dimensional horizontal and vertical grid, has therefore been considered. In this case, the elemental image group is generated by first applying viewpoint interpolation to the videos captured by the camera array to generate viewpoint videos between the cameras, and then converting the captured videos and the interpolated viewpoint videos into an elemental image group (Patent Document 1). It is known that the inter-camera distance of the array can be designed from the camera-to-subject distance, the distance over which viewpoint interpolation is practically feasible, and the viewing-zone angle that the display device can reproduce. In viewpoint interpolation, highly accurate interpolated images are generated by using a depth map (depth image) that expresses the distance from the camera to the subject in relative terms. Depth maps are generated by depth estimation using image processing or by optically measuring distance with infrared light. Improving the accuracy of depth map generation also improves the accuracy of viewpoint interpolation.

In integral 3D video display, the depth range over which 3D video can be reproduced depends on the parallax between adjacent multi-view images, the focal length of the lens array, and the number of pixels in each elemental image. Of these, increasing the number of pixels in the elemental images is known to be effective for extending the reproducible depth range. Because the number of pixels in an elemental image equals the number of viewpoints in the multi-view image, generating 3D video with depth requires multi-view images with many viewpoints to be encoded, and the amount of information needed to display the 3D video becomes enormous.

Transmitting or recording integral 3D video involves encoding this enormous amount of information. Known encoding methods include converting the elemental image group into a multi-view image group and then applying multi-view video coding, and thinning out the converted multi-view video at encoding time and interpolating viewpoints at decoding time. The latter method reduces the number of images to be encoded by thinning out the multi-view images, which lowers the amount of information required to display the 3D video.

FIG. 9 shows an example of conventional encoding target images. FIG. 9 represents one frame of a multi-view video and of the depth maps corresponding to it; N such frames make up the video. In each frame, for example, multi-view images of 13 × 13 = 169 viewpoints are thinned out at equal intervals to 5 × 5 = 25 viewpoints, and the depth maps of 13 × 13 = 169 viewpoints are likewise thinned out at equal intervals to 5 × 5 = 25 viewpoints. The images drawn with broken lines are multi-view images that are not encoded. In the encoding target images, the multi-view images and the depth maps of the same viewpoints are each arranged at equal intervals.

These encoding target images are encoded using methods such as computing and encoding the difference between a reference viewpoint image and the viewpoint image to be encoded, which exploits the high correlation between adjacent viewpoint images (the image of one viewpoint among the multi-view images is referred to as a "viewpoint image"), and encoding that exploits the high correlation between a viewpoint image and its corresponding depth map. The multi-view images and depth maps are encoded together as one frame, which achieves a high compression ratio.

On the decoding side, after the encoding target images are decoded, viewpoint interpolation is performed for the thinned-out (missing) multi-view images. Viewpoint interpolation first interpolates the depth maps, for example reconstructing depth maps for 13 × 13 = 169 viewpoints from depth maps for 5 × 5 = 25 viewpoints. Viewpoint images between the decoded viewpoints are then interpolated from the depth maps of all viewpoints and the decoded viewpoint images.

Patent Document 1: Japanese Patent Application Publication No. 2016-158213

Displaying integral 3D video with greater depth requires an elemental image group with more pixels, and therefore multi-view images with more viewpoints. To compress the data volume of multi-view images with many viewpoints, one conceivable approach is to thin out more viewpoints, widening the spacing between encoded viewpoints and reducing the number of images to be encoded.

However, while thinning out many viewpoints reduces the amount of information needed to display 3D video in transmission and recording, it degrades image quality through viewpoint interpolation after decoding. One cause is that, when the depth maps needed for viewpoint interpolation are generated, reducing the number of encoding target images increases the distance between adjacent images and lowers the accuracy of the generated depth maps. Another is that the larger parallax in viewpoint interpolation prediction lowers the accuracy of the inpainting used to fill occlusion regions. Image quality degradation due to viewpoint interpolation includes degradation from interpolating the depth maps and degradation from interpolating the multi-view images. Furthermore, in encoding, the larger parallax between the multi-view images to be encoded lowers the accuracy of viewpoint-compensated prediction and worsens coding efficiency.

Accordingly, an object of the present invention, made in view of the above problems, is to provide an encoding device, a decoding device, and a program that efficiently reduce the amount of transmitted data for multi-view video and depth maps in encoding, and that suppress image quality degradation of viewpoint-interpolated images in decoding.

To solve the above problems, an encoding device according to the present invention encodes a multi-view video and depth maps corresponding to the multi-view video, and comprises: a viewpoint thinning unit that thins out viewpoints of the multi-view video and the depth maps in each frame; a frame thinning unit that thins out frames of the multi-view video and the depth maps; and an encoding unit that encodes the viewpoint-thinned and frame-thinned multi-view video and depth maps. The frame thinning unit thins out one of the viewpoint image and the depth map at each viewpoint of each frame, and switches which of the two is thinned out from frame to frame.

In the encoding device, it is desirable that the frame thinning unit, in each frame, thins out one of the viewpoint image and the depth map at one viewpoint and thins out the other of the viewpoint image and the depth map at at least one viewpoint adjacent to that viewpoint.

In the encoding device, it is also desirable that the frame thinning unit thins out each viewpoint-thinned frame so that the viewpoint images or the depth maps form a checkerboard pattern.

In the encoding device, it is also desirable that the frame thinning unit thins out each viewpoint-thinned frame so that the viewpoint images and the depth maps are arranged in alternating columns or alternating rows.

To solve the above problems, a decoding device according to the present invention comprises: a decoding unit that decodes a multi-view video and depth maps from encoded video data; a frame interpolation unit that interpolates the thinned-out frames of the multi-view video and the depth maps; and a viewpoint interpolation unit that, in each frame, interpolates the thinned-out viewpoint images and depth maps. The reliability of the viewpoint images and depth maps decoded from the encoded video data is set higher than the reliability of the interpolated viewpoint images and depth maps, and the viewpoint interpolation unit predicts viewpoint images based on the reliability weights of the reference images.

To solve the above problems, a program according to the present invention causes a computer to function as the above encoding device.

To solve the above problems, a program according to the present invention causes a computer to function as the above decoding device.

The encoding device, decoding device, and program of the present invention efficiently reduce the amount of transmitted data for multi-view video and depth maps in encoding, and suppress image quality degradation of viewpoint-interpolated images in decoding.

FIG. 1 is an example of a block diagram of the encoding device and the decoding device of the present invention.
FIG. 2 is a diagram showing an example of the encoding target images in an even-numbered frame.
FIG. 3 is a diagram showing an example of the encoding target images in an odd-numbered frame.
FIG. 4 is a diagram showing another example of the encoding target images in an even-numbered frame.
FIG. 5 is a diagram showing another example of the encoding target images in an odd-numbered frame.
FIG. 6 is a diagram showing yet another example of the encoding target images in an even-numbered frame.
FIG. 7 is a diagram showing yet another example of the encoding target images in an odd-numbered frame.
FIG. 8 is a diagram illustrating the viewpoint interpolation processing of the present invention.
FIG. 9 is a diagram showing an example of conventional encoding target images.

Embodiments of the present invention are described below.

FIG. 1 shows an example of a block diagram of the encoding device and the decoding device of the present invention. The encoding device 10 and the decoding device 20 together constitute an encoding/decoding system. The encoding device 10 and the decoding device 20 may be connected by any transmission path capable of carrying information, in which case they function as a transmitting device 10 and a receiving device 20, respectively. Broadcasting systems, radio communication, wired or wireless networks, and the like can be used for transmission and reception. Alternatively, the two may be independent devices, with data passed from the encoding device 10 to the decoding device 20 on a recording medium or the like.

The encoding device 10 and the decoding device 20 are each described in detail below.

[Encoding device]
The encoding device 10 includes a viewpoint thinning unit 11, a frame thinning unit 12, and an encoding unit 13.

The input images are, for example, a multi-view video captured by a multi-view camera in which cameras (for example, CMOS sensors) are arranged in a 13 × 13 grid (169 cameras), together with the depth maps corresponding to that multi-view video. Each single-viewpoint image (viewpoint image) of the multi-view video is a color texture image, and the depth maps form a multi-view depth video with the same viewpoints and frames as the multi-view video. The input images are input to the viewpoint thinning unit 11.

In this embodiment the depth maps are generated externally and input to the encoding device 10. Alternatively, only the multi-view video may be input to the encoding device 10, and the depth maps may be generated inside the encoding device 10 by depth estimation or similar processing based on the multi-view video.

The viewpoint thinning unit 11 performs viewpoint thinning on the input multi-view video and its corresponding depth maps, thinning out viewpoints at equal intervals in each frame. For example, multi-view images of 13 × 13 = 169 viewpoints are thinned out at equal intervals to 5 × 5 = 25 viewpoints. The depth maps of 13 × 13 = 169 viewpoints are likewise thinned out to 5 × 5 = 25 viewpoints. The multi-view video and depth maps obtained by viewpoint thinning are the same as the conventional encoding target images (encoding target video) of FIG. 9: in each frame, the multi-view images and the depth maps are arranged at equal intervals at the same viewpoints. The viewpoint-thinned multi-view video and depth maps are output to the frame thinning unit 12.
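The equal-interval thinning described above can be sketched as follows. This is only an illustrative sketch: the function name `thin_viewpoints`, the 1-based coordinates, and the assumption that the outermost viewpoints are kept (giving a step of 3 for 13 → 5) are not stated in the text, although they are consistent with the coordinates (1, 1) and (4, 4) used for FIG. 8.

```python
def thin_viewpoints(grid_size=13, kept=5):
    """Return the 1-based row/column indices kept by equal-interval thinning.

    Assumption (hypothetical): the first and last viewpoints are kept and the
    remaining ones are spaced by a constant step.
    """
    if kept < 2 or (grid_size - 1) % (kept - 1) != 0:
        raise ValueError("grid_size and kept do not give an integer step")
    step = (grid_size - 1) // (kept - 1)  # 12 // 4 = 3 for the 13 -> 5 example
    return [1 + i * step for i in range(kept)]


# Viewpoints retained in every frame (for both texture and depth).
indices = thin_viewpoints()
kept_views = [(row, col) for row in indices for col in indices]
```

Under these assumptions, 25 of the 169 viewpoints remain, and the same index set would be applied to the viewpoint images and to the depth maps.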

The frame thinning unit 12 thins out frames of the input (viewpoint-thinned) multi-view video and depth maps. An example of the frame thinning processing is described with reference to FIGS. 2 and 3.

FIG. 2 shows an even-numbered frame (2n, where n is an integer) of the frame-thinned multi-view video and depth maps, and FIG. 3 shows an odd-numbered frame (2n+1). That is, FIG. 2 shows an example of the encoding target images in an even-numbered frame, and FIG. 3 shows an example of the encoding target images in an odd-numbered frame. In each frame, at each of the 5 × 5 = 25 viewpoints remaining after viewpoint thinning, either the viewpoint image or the depth map is thinned out (frame thinning). As a comparison of FIGS. 2 and 3 makes clear, at a viewpoint where the depth map is thinned out in even-numbered frames, the viewpoint image is thinned out in odd-numbered frames. At a viewpoint where the viewpoint image is thinned out in even-numbered frames, the depth map is thinned out in odd-numbered frames. In other words, the thinning target alternates between the viewpoint image and the depth map from frame to frame.

Furthermore, in each viewpoint-thinned frame, when the depth map is thinned out at a certain viewpoint, it is desirable to thin out the viewpoint image at at least one viewpoint adjacent to it (after viewpoint thinning). That is, when one of the viewpoint image and the depth map is thinned out at one viewpoint, it is desirable to thin out the other at at least one viewpoint adjacent to that viewpoint. As a result, a viewpoint image and a depth map are each encoded at adjacent viewpoints. When viewpoint interpolation is performed on the decoding side, at least one of the adjacent viewpoint images is then a decoded, high-accuracy image.

In the example of FIG. 2, when the depth map is thinned out at a certain viewpoint and the viewpoint image is the encoding target, the viewpoints adjacent to it above, below, left, and right (after viewpoint thinning) have their viewpoint images thinned out and their depth maps as encoding targets. In the odd-numbered frame of FIG. 3 the arrangement of viewpoint images and depth maps is reversed, but viewpoint images and depth maps still alternate as encoding targets between a given viewpoint and its upper, lower, left, and right neighbors. Thus, in FIGS. 2 and 3, each viewpoint-thinned frame is frame-thinned so that the viewpoint images (or the depth maps) form a checkerboard pattern.
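The checkerboard arrangement and its frame-by-frame switching can be illustrated with a simple parity rule. This is a minimal sketch under stated assumptions: the names `keeps_texture` and `frame_layout` are hypothetical, and which parity carries the viewpoint image in even-numbered frames is chosen arbitrarily, since the figures are not reproduced here.

```python
def keeps_texture(row, col, frame):
    """Return True if the viewpoint image (texture) is encoded at this viewpoint.

    False means the depth map is encoded instead. Rows and columns index the
    5 x 5 viewpoints left after viewpoint thinning. The parity convention is an
    assumption made for illustration only.
    """
    return (row + col + frame) % 2 == 0


def frame_layout(frame, size=5):
    """Build the per-viewpoint layout of one frame: 'T' = texture, 'D' = depth."""
    return [
        ["T" if keeps_texture(r, c, frame) else "D" for c in range(size)]
        for r in range(size)
    ]


# Print an even-numbered and an odd-numbered frame for comparison.
for frame in (0, 1):
    print(f"frame {frame}")
    for row in frame_layout(frame):
        print(" ".join(row))
```

With this rule, every viewpoint's four neighbors carry the other image type, and each viewpoint swaps between texture and depth on every frame, matching the behavior described for FIGS. 2 and 3.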

Another example of the frame thinning processing is described with reference to FIGS. 4 and 5.

FIG. 4 shows an even-numbered frame (2n, where n is an integer) of the frame-thinned multi-view video and depth maps, and FIG. 5 shows an odd-numbered frame (2n+1). That is, FIG. 4 shows another example of the encoding target images in an even-numbered frame, and FIG. 5 shows another example of the encoding target images in an odd-numbered frame. In each frame, at each of the 5 × 5 = 25 viewpoints remaining after viewpoint thinning, either the viewpoint image or the depth map is thinned out (frame thinning). As a comparison of FIGS. 4 and 5 makes clear, at a viewpoint where the depth map is thinned out in even-numbered frames, the viewpoint image is thinned out in odd-numbered frames, and at a viewpoint where the viewpoint image is thinned out in even-numbered frames, the depth map is thinned out in odd-numbered frames.

In the example of FIG. 4, when the depth map is thinned out at a certain viewpoint and the viewpoint image is the encoding target, the viewpoints adjacent to it on the left and right (after viewpoint thinning) have their viewpoint images thinned out and their depth maps as encoding targets. In the odd-numbered frame of FIG. 5 the arrangement of viewpoint images and depth maps is reversed, but viewpoint images and depth maps still alternate as encoding targets between a given viewpoint and its left and right neighbors. Thus, as in FIGS. 4 and 5, it is desirable to thin out each viewpoint-thinned frame so that the viewpoint images and the depth maps are arranged in alternating columns.

Yet another example of the frame thinning processing is described with reference to FIGS. 6 and 7.

FIG. 6 shows an even-numbered frame (2n, where n is an integer) of the frame-thinned multi-view video and depth maps, and FIG. 7 shows an odd-numbered frame (2n+1). That is, FIG. 6 shows yet another example of the encoding target images in an even-numbered frame, and FIG. 7 shows yet another example of the encoding target images in an odd-numbered frame. In each frame, at each of the 5 × 5 = 25 viewpoints remaining after viewpoint thinning, either the viewpoint image or the depth map is thinned out (frame thinning), and, as a comparison of FIGS. 6 and 7 makes clear, the thinning target at each viewpoint alternates between the viewpoint image and the depth map from frame to frame.

In the example of FIG. 6, when the depth map is thinned out at a certain viewpoint and the viewpoint image is the encoding target, the viewpoints adjacent to it above and below (after viewpoint thinning) have their viewpoint images thinned out and their depth maps as encoding targets. In the odd-numbered frame of FIG. 7 the arrangement of viewpoint images and depth maps is reversed, but viewpoint images and depth maps still alternate as encoding targets between a given viewpoint and its upper and lower neighbors. Thus, as in FIGS. 6 and 7, it is desirable to thin out each viewpoint-thinned frame so that the viewpoint images and the depth maps are arranged in alternating rows.
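The column arrangement of FIGS. 4 and 5 and the row arrangement of FIGS. 6 and 7 differ only in which coordinate drives the alternation. A minimal sketch, assuming a hypothetical helper `encoded_kind` and an arbitrary choice of which column or row starts with the viewpoint image:

```python
def encoded_kind(row, col, frame, pattern):
    """Return 'T' if the viewpoint image is encoded, 'D' if the depth map is.

    pattern "columns": left/right neighbors alternate (as described for FIGS. 4, 5).
    pattern "rows":    upper/lower neighbors alternate (as described for FIGS. 6, 7).
    The starting parity is an assumption made for illustration only.
    """
    if pattern == "columns":
        key = col
    elif pattern == "rows":
        key = row
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    # Adding the frame index swaps texture and depth on every frame.
    return "T" if (key + frame) % 2 == 0 else "D"


def layout(frame, pattern, size=5):
    """Per-viewpoint layout of one viewpoint-thinned frame."""
    return [[encoded_kind(r, c, frame, pattern) for c in range(size)]
            for r in range(size)]
```

In the column pattern every column holds a single image type and adjacent columns differ; in the row pattern the same holds for rows. In both cases each viewpoint switches type between consecutive frames.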

Returning to FIG. 1, the frame-thinned multi-view video and depth maps (the encoding target images, consisting of multi-view images and depth maps) are output to the encoding unit 13.

The encoding unit 13 encodes the frame-thinned multi-view video and depth maps (the input encoding target images). Compression and encoding are performed with a conventional coding tool (for example, an extension of the H.265/HEVC (High Efficiency Video Coding) standard). Prediction that exploits the high correlation between the multi-view video and the depth maps to be encoded enables highly efficient data compression.

The encoded video data generated by the encoding unit 13 is output as the output of the encoding device 10.

With the encoding device of this embodiment, the distance between viewpoints of the encoding target images (FIGS. 2 to 7) is the same as in the conventional example (FIG. 9), while the viewpoint image and depth map frames at each viewpoint are thinned out alternately, so the amount of information to be encoded can be halved.
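As a rough check of the halving, the number of images per frame can be counted for the 5 × 5 example. This counts images only; treating every image as contributing equally to the information amount is a simplifying assumption, and the symbols $N_{\text{conv}}$ and $N_{\text{prop}}$ are introduced here for illustration.

```latex
\begin{align*}
N_{\text{conv}} &= \underbrace{25}_{\text{viewpoint images}} + \underbrace{25}_{\text{depth maps}} = 50 \\
N_{\text{prop}} &= 25 \quad \text{(one image per viewpoint: a viewpoint image or a depth map)} \\
\frac{N_{\text{prop}}}{N_{\text{conv}}} &= \frac{25}{50} = \frac{1}{2}
\end{align*}
```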

[Decoding device]
The decoding device 20 includes a decoding unit 21, a frame interpolation unit 22, a viewpoint interpolation unit 23, and a multi-view image / elemental image conversion unit 24.

The encoded video data produced by the encoding device 10 is input to the decoding unit 21. The decoding unit 21 decodes the input encoded video data using the decoding method corresponding to the encoding. Decoding yields the frame-thinned multi-view video and depth maps (the encoding target images). The decoded video (image) data is output to the frame interpolation unit 22.

The frame interpolation unit 22 performs frame interpolation on the frame-thinned multi-view video and depth maps. For example, at a viewpoint whose depth map was thinned out in frame 2n, the depth map of frame 2n is predicted by interpolation or similar processing using the depth map of the same viewpoint in frame 2n-1 and the depth map of the same viewpoint in frame 2n+1, and the depth map frame is thereby interpolated. Intra prediction using the adjacent depth maps within frame 2n may also be used. Depth maps thinned out in odd-numbered (2n+1) frames are interpolated in the same way.
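The temporal interpolation of a missing depth map can be sketched as below. The text only says "interpolation or similar processing", so the per-pixel average used here is an assumption chosen for simplicity, not the method of the embodiment; the name `interpolate_frame` and the list-of-lists image representation are also hypothetical.

```python
def interpolate_frame(prev_frame, next_frame):
    """Estimate the missing frame 2n from frames 2n-1 and 2n+1.

    Both inputs are 2D lists of depth values for the same viewpoint.
    Assumption: a plain per-pixel mean stands in for the unspecified
    interpolation; a real decoder could use motion-compensated prediction.
    """
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of rows")
    result = []
    for prev_row, next_row in zip(prev_frame, next_frame):
        if len(prev_row) != len(next_row):
            raise ValueError("frames must have the same number of columns")
        result.append([(p + q) / 2 for p, q in zip(prev_row, next_row)])
    return result


# Tiny 2 x 2 depth maps of one viewpoint in frames 2n-1 and 2n+1.
depth_prev = [[10, 20], [30, 40]]
depth_next = [[20, 40], [30, 60]]
depth_mid = interpolate_frame(depth_prev, depth_next)
```

Because the thinning target alternates every frame, the two neighboring frames used here are always frames in which that viewpoint's depth map was actually encoded.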

For frame interpolation of the multi-view video, at a viewpoint whose viewpoint image was thinned out in frame 2n, for example, it is desirable to predict the viewpoint image of frame 2n, and thereby interpolate the viewpoint image frame, by using the depth map of that viewpoint in frame 2n in addition to interpolation based on the viewpoint image of the same viewpoint in frame 2n-1 and the viewpoint image of the same viewpoint in frame 2n+1. The depth map of that viewpoint is a high-accuracy depth map obtained by decoding the encoded data, so it can improve the accuracy of the frame-interpolated viewpoint image. Viewpoint images thinned out in odd-numbered (2n+1) frames are interpolated in the same way.

In this way, frame interpolation is performed on the frame-thinned multi-view video and depth maps at each viewpoint of each frame. As a result, the amount of information and the positional relationship of the viewpoints of the multi-view video and depth maps after frame interpolation are the same as in FIG. 9. The frame-interpolated multi-view video and depth maps are output to the viewpoint interpolation unit 23.

The viewpoint interpolation unit 23 uses the frame-interpolated multi-view video and depth maps to generate, by viewpoint interpolation, the multi-view images between viewpoints that were not encoded. That is, from the 5 × 5 multi-view images and depth maps after frame interpolation (the viewpoint-thinned images; see FIG. 9), it predicts and interpolates the viewpoint images between viewpoints and reproduces the multi-view images of the 13 × 13 = 169 viewpoints that existed before viewpoint thinning (the whole of FIG. 9).

The viewpoint interpolation processing in the viewpoint interpolation unit 23 is described with reference to FIG. 8. In FIG. 8, each viewpoint is expressed in two-dimensional coordinates with the upper-left viewpoint as the origin. For example, assume that the viewpoint images at coordinates (1,1) and (4,4) were decoded from the video encoded data, and that the viewpoint images at coordinates (1,4) and (4,1) were generated by frame interpolation. Likewise, assume that the depth maps at coordinates (1,4) and (4,1) were decoded from the video encoded data, and that the depth maps at coordinates (1,1) and (4,4) were generated by frame interpolation.
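The decoded/interpolated assignment in this example can be reproduced with a small parity rule. The sketch below is an assumption-laden illustration: it supposes that the retained viewpoints sit at coordinates 1, 4, 7, 10, 13 in each direction (consistent with 5 × 5 viewpoints kept out of 13 × 13 and with the coordinates (1,1), (1,4), (4,1), (4,4) above), that the thinning follows a checkered pattern that swaps every frame, and that frame index 0 corresponds to the state described for FIG. 8. The function names are hypothetical.

```python
def kept_coordinates(grid_size=13, step=3):
    """1-based coordinates of the viewpoints retained by viewpoint thinning."""
    return list(range(1, grid_size + 1, step))


def decoded_component(x, y, frame_index, step=3):
    """Return which component was decoded at retained viewpoint (x, y).

    'image' means the viewpoint image was decoded and the depth map must be
    frame-interpolated; 'depth' means the opposite. The parity convention
    (which frame starts with 'image' at (1, 1)) is an assumption.
    """
    i = (x - 1) // step  # column index within the 5 x 5 retained grid
    j = (y - 1) // step  # row index within the 5 x 5 retained grid
    return "image" if (i + j + frame_index) % 2 == 0 else "depth"


frame = 0
layout = {(x, y): decoded_component(x, y, frame) for x in (1, 4) for y in (1, 4)}
# layout maps (1, 1) and (4, 4) to 'image', and (1, 4) and (4, 1) to 'depth'
```

Incrementing `frame` by one swaps the two roles at every retained viewpoint, which is the per-frame switching of the thinning target.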

In this embodiment, a reliability weight (coefficient) is set for the multi-view images and depth maps that serve as reference images. Multi-view images and depth maps generated by frame interpolation are given a low reliability, and multi-view images and depth maps decoded from transmitted or recorded video encoded data are given a high reliability.

As in conventional methods, viewpoint interpolation starts with the depth maps. That is, all depth maps between viewpoints are generated and interpolated from the depth maps of the 5 × 5 = 25 viewpoints. In the example of FIG. 8, to generate the depth map of a given viewpoint (for example, coordinates (3,3)), the surrounding decoded or frame-interpolated depth maps (coordinates (1,1), (1,4), (4,1), and (4,4)) are used as reference images, and the images are interpolated according to the distance from each reference image (for example, with the reciprocal of the distance as the reliability) to generate the depth map of that viewpoint.
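One way to write this distance-based interpolation down is shown below. The text specifies only that the reciprocal of the distance may serve as the reliability; the Euclidean distance, the normalization by the sum of reliabilities, and the symbols $d_i$, $r_i$, $D_i$, and $\hat{D}$ are assumptions introduced for illustration.

```latex
d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}, \qquad
r_i = \frac{1}{d_i}, \qquad
\hat{D}(x, y) = \frac{\sum_i r_i \, D_i}{\sum_i r_i}
```

Here $(x_i, y_i)$ is the coordinate of the $i$-th reference viewpoint and $D_i$ its depth map. Under these assumptions, the target viewpoint (3,3) has distances $2\sqrt{2}$, $\sqrt{5}$, $\sqrt{5}$, and $\sqrt{2}$ to the references at (1,1), (1,4), (4,1), and (4,4), so the nearest reference, (4,4), contributes most.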

In this embodiment, the reliability weight of each reference depth map is added as a factor in this calculation. That is, the distance-based reliability is corrected downward for depth maps generated by frame interpolation (coordinates (1,1) and (4,4)) and corrected to a high value for depth maps decoded from the encoded data (coordinates (1,4) and (4,1)). For example, the reliability of an interpolated depth map is corrected with a weighting coefficient of 0.5, and the reliability of a decoded depth map with a weighting coefficient of 1.0. The coefficients are adjustable. Interpolation from the reference images is then performed using the corrected reliabilities to generate and interpolate the depth maps between viewpoints. The same processing is applied to the depth maps of the other viewpoints.
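The corrected reliability can be sketched as a product of the source coefficient and the reciprocal distance. This is only an illustration under assumptions: Euclidean distance between viewpoint coordinates, normalization of the weights to sum to one, and a plain per-pixel blend without any disparity compensation are choices made here, and the names `corrected_weights` and `blend_depth_maps` are hypothetical. The coefficients 0.5 and 1.0 are the example values from the text.

```python
import math

DECODED_COEFF = 1.0       # example coefficient for decoded references
INTERPOLATED_COEFF = 0.5  # example coefficient for frame-interpolated references


def corrected_weights(target, references):
    """Return normalized weights for references given as ((x, y), is_decoded)."""
    raw = []
    for (x, y), is_decoded in references:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0:
            raise ValueError("target coincides with a reference viewpoint")
        coeff = DECODED_COEFF if is_decoded else INTERPOLATED_COEFF
        raw.append(coeff / distance)  # reciprocal distance, corrected by source
    total = sum(raw)
    return [w / total for w in raw]


def blend_depth_maps(depth_maps, weights):
    """Per-pixel weighted sum of equally sized depth maps (lists of rows)."""
    height, width = len(depth_maps[0]), len(depth_maps[0][0])
    return [
        [sum(w * d[r][c] for w, d in zip(weights, depth_maps)) for c in range(width)]
        for r in range(height)
    ]


# Depth-map references around target (3, 3), following the FIG. 8 example.
references = [((1, 1), False), ((1, 4), True), ((4, 1), True), ((4, 4), False)]
weights = corrected_weights((3, 3), references)
maps = [[[8.0, 8.0]], [[8.0, 8.0]], [[8.0, 8.0]], [[8.0, 8.0]]]  # hypothetical data
depth_33 = blend_depth_maps(maps, weights)
```

With these example coefficients, the decoded depth maps at (1,4) and (4,1) outweigh the frame-interpolated one at (4,4), even though (4,4) is the closest reference to (3,3).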

Next, after viewpoint interpolation of all depth maps, viewpoint interpolation of the multi-view images is performed. For example, the viewpoint images between viewpoints are generated and interpolated from the multi-view images of the 5 × 5 = 25 viewpoints after frame interpolation and from all of the depth maps. To generate the viewpoint image of a given viewpoint (for example, coordinates (3,3)), the surrounding decoded or frame-interpolated viewpoint images (coordinates (1,1), (1,4), (4,1), and (4,4)) are used as reference images, the images are interpolated according to the distance from each reference image (for example, with the reciprocal of the distance as the reliability), and the depth map of each viewpoint is additionally used to generate the viewpoint image of that viewpoint.

In this embodiment, the reliability weight of each reference viewpoint image is added as a factor in this calculation. That is, the distance-based reliability is corrected downward for viewpoint images generated by frame interpolation (coordinates (1,4) and (4,1)) and corrected to a high value for viewpoint images decoded from the encoded data (coordinates (1,1) and (4,4)). For example, the reliability of an interpolated viewpoint image is corrected with a weighting coefficient of 0.5, and the reliability of a decoded viewpoint image with a weighting coefficient of 1.0. The coefficients are adjustable. The depth maps are treated in the same way when they are used: depth maps generated by frame interpolation or viewpoint interpolation are given a low reliability, and depth maps decoded from the encoded data are given a high reliability. Interpolation from the reference images is then performed using the corrected reliabilities to generate and interpolate the multi-view images. The same processing is applied to the viewpoint images of the other viewpoints.

The multi-view images are interpolated in this way, and the reproduced multi-view image group is output to the multi-view image element image conversion unit 24.

The multi-view image element image conversion unit 24 converts the input multi-view images into an element image group. An element image group can be created by rearranging the multi-view images. Specifically, one pixel at the same coordinate position is extracted from each viewpoint image of the multi-view images, and these pixels are assembled while preserving the overall arrangement of the viewpoints, which yields one element image. For example, from one pixel of each viewpoint image in a multi-view image group captured by 13 × 13 cameras, an element image of 13 × 13 pixels is generated. By generating the other element images in the same way, the multi-view images are converted into an element image group that contains as many 13 × 13-pixel element images as there are pixels in a viewpoint image. This element image group is the output image of the decoding device 20. Integral 3D video can be displayed from this output image.
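The rearrangement described above amounts to swapping the viewpoint indices with the pixel indices. The following sketch assumes the multi-view images are held as nested lists indexed `views[v][u][y][x]` (viewpoint row, viewpoint column, pixel row, pixel column); that layout and the function name `to_element_images` are assumptions for illustration, and any flipping that a particular lens array might require is not modeled.

```python
def to_element_images(views):
    """Rearrange views[v][u][y][x] into elements[y][x][v][u].

    Each element image collects the pixel at the same position (y, x) from
    every viewpoint, keeping the viewpoints in their original arrangement.
    """
    n_v, n_u = len(views), len(views[0])
    height, width = len(views[0][0]), len(views[0][0][0])
    return [
        [
            [[views[v][u][y][x] for u in range(n_u)] for v in range(n_v)]
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical toy input: 2 x 2 viewpoints, each 2 x 3 pixels.
# Every pixel stores its own indices so the rearrangement is easy to check.
views = [
    [[[(v, u, y, x) for x in range(3)] for y in range(2)] for u in range(2)]
    for v in range(2)
]
elements = to_element_images(views)
```

For the 13 × 13 viewpoint case in the text, each entry `elements[y][x]` would be one 13 × 13-pixel element image, and there would be one such entry per pixel of a viewpoint image.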

In this embodiment, the element image group is used as the output image on the assumption that integral 3D video is displayed from it. However, to display multi-view video, for example, the multi-view image element image conversion unit 24 may be omitted and the multi-view images after viewpoint interpolation may be used as the output image of the decoding device.

According to the decoding device of this embodiment, using the reliability of the image at each viewpoint enables highly accurate viewpoint interpolation. In addition, keeping the distance between viewpoints relatively short after viewpoint thinning makes it possible to suppress degradation of the image quality of the viewpoint-interpolated images.

Although the above embodiment describes the configuration and operation of the encoding device 10, the present invention is not limited to this and may be configured as an encoding method for encoding multi-view images, that is, an encoding method that generates video encoded data from multi-view images in accordance with the data flow of FIG. 1. Similarly, although the configuration and operation of the decoding device 20 have been described, the present invention is not limited to this and may be configured as a decoding method for decoding video encoded data, that is, a decoding method that generates an output image of an element image group from video encoded data in accordance with the data flow of FIG. 1.

A computer can suitably be used to function as the encoding device 10 or the decoding device 20 described above. Such a computer can be realized by storing, in a storage unit of the computer, a program that describes the processing for each function of the encoding device 10 or the decoding device 20, and by having the CPU of the computer read and execute the program. The program can be recorded on a computer-readable recording medium.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions can be made within the spirit and scope of the present invention. Accordingly, the present invention should not be construed as being limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, a plurality of the constituent blocks described in the embodiments may be combined into one, or one constituent block may be divided.

10 Encoding device
11 Viewpoint thinning unit
12 Frame thinning unit
13 Encoding unit
20 Decoding device
21 Decoding unit
22 Frame interpolation unit
23 Viewpoint interpolation unit
24 Multi-view image element image conversion unit

Claims

An encoding device that encodes a multi-view video and depth maps corresponding to the multi-view video, comprising:
a viewpoint thinning unit that thins out viewpoints of the multi-view video and the depth maps in each frame;
a frame thinning unit that thins out frames of the multi-view video and the depth maps; and
an encoding unit that encodes the multi-view video and the depth maps that have undergone viewpoint thinning and frame thinning,
wherein the frame thinning unit thins out either the viewpoint image or the depth map at each viewpoint of each frame, and switches the thinning target from frame to frame.

The encoding device according to claim 1,
wherein, in each frame, the frame thinning unit thins out one of the viewpoint image and the depth map at one viewpoint, and thins out the other of the viewpoint image and the depth map at at least one viewpoint adjacent to the one viewpoint.

The encoding device according to claim 1 or 2,
wherein, in each frame that has undergone viewpoint thinning, the frame thinning unit thins out the viewpoint images or the depth maps so that they form a checkered pattern.

The encoding device according to claim 1 or 2,
wherein, in each frame that has undergone viewpoint thinning, the frame thinning unit thins out the viewpoint images and the depth maps so that they alternate by column or by row.

A decoding device comprising:
a decoding unit that decodes a multi-view video and depth maps from video encoded data;
a frame interpolation unit that interpolates the thinned-out frames into the multi-view video and the depth maps; and
a viewpoint interpolation unit that, in each frame, interpolates the thinned-out viewpoint images and depth maps,
wherein the reliability of the viewpoint images and depth maps decoded from the video encoded data is set to a higher value than the reliability of the interpolated viewpoint images and depth maps, and
the viewpoint interpolation unit predicts a viewpoint image based on reliability weights of the reference images.

A program that causes a computer to function as the encoding device according to any one of claims 1 to 4.

A program that causes a computer to function as the decoding device according to claim 5.