JP2020195093A

JP2020195093A - Encoder, decoder, and program

Info

Publication number: JP2020195093A
Application number: JP2019100591A
Authority: JP
Inventors: Kazuhiro Hara; Tomoyuki Mishina
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2020-12-03
Anticipated expiration: 2039-05-29
Also published as: JP7389565B2

Abstract

To provide an encoder, a decoder, and a program that can suppress deterioration in the image quality of a multi-viewpoint image after encoding and decoding while reducing the amount of information to be transmitted and recorded.
SOLUTION: An encoder includes: an encoding processing unit that receives a multi-viewpoint image as an input image, down-samples the input image, and then encodes the image; and a learning model creation processing unit that generates, based on the input image, a learning model for the machine learning used to up-sample the image. A decoder includes: a decoding processing unit that decodes the multi-viewpoint image from coded image data and converts it into an element image group; a machine learning processing unit that generates interpolation element images with a machine learning function based on the input learning model; and an output image generation unit that interpolates the interpolation element images into the element image group to generate an output image.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a coding device, a decoding device, and a program. More specifically, it relates to a coding device, a decoding device, and a program for the multi-viewpoint images required to display integral images and free-viewpoint images.

Light field cameras place a lens array in front of the image sensor. They have been commercialized as cameras that can capture an element image group for displaying integral images. However, light field cameras are generally intended for refocusing after shooting. When an integral image is displayed using images captured with a light field camera, the diameter of the camera's main lens is small compared with the distance to the subject. As a result, the motion parallax is small, and the depth of the three-dimensional image cannot be reproduced sufficiently. In theory, this problem can be solved by enlarging the main lens or by shortening the distance between the camera and the subject, but neither measure is practical.

For this reason, capturing multi-viewpoint video with a camera array has been considered. In such an array, ordinary cameras are arranged in a two-dimensional horizontal and vertical grid. The element image group is then generated in two steps (Patent Document 1):
1. Viewpoint interpolation is applied to the images captured by the camera array to create viewpoint images between the cameras.
2. The captured images and the interpolated viewpoint images are converted into an element image group.
It is known that the inter-camera spacing of the array can be designed according to three factors: the distance from the cameras to the subject, the distance over which viewpoint interpolation is practically possible, and the viewing-zone angle that the display device can reproduce. In viewpoint interpolation, highly accurate interpolated images are generated using a depth map that expresses the relative distance from the camera to the subject. Depth maps are generated either by depth estimation using image processing or by optical distance measurement using infrared light. The more accurate the depth map, the more accurate the viewpoint interpolation.
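The following is a rough illustration of depth-based viewpoint interpolation. The sketch forward-warps one scanline of a left camera image to a virtual viewpoint between two horizontally adjacent cameras. It makes these assumptions:
- The cameras are rectified.
- A per-pixel disparity map is available (a depth map converted to pixel shifts).
- The z-buffer rule lets the larger disparity (the nearer surface) win.

The function name and the z-buffer rule are illustrative choices, not the method of Patent Document 1.

```python
def interpolate_scanline(left, disparity, alpha):
    """Forward-warp one scanline of the left view to a virtual viewpoint.

    left:      pixel values of one row of the left camera image.
    disparity: per-pixel disparity in pixels between the left and right
               cameras (assumed convention: a left pixel at x appears at
               x - disparity[x] in the right view).
    alpha:     0.0 = left camera position, 1.0 = right camera position.
    Returns the warped row; None marks holes (disoccluded pixels).
    """
    width = len(left)
    out = [None] * width
    nearest = [float("-inf")] * width  # z-buffer: larger disparity = nearer
    for x in range(width):
        d = disparity[x]
        target = round(x - alpha * d)
        # Keep the nearest surface when several pixels land on one position.
        if 0 <= target < width and d > nearest[target]:
            nearest[target] = d
            out[target] = left[x]
    return out
```

Positions left as `None` are disocclusion holes. These are the occlusion regions whose prediction becomes harder as the spacing between cameras grows.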

For integral image display, the depth range over which a three-dimensional image can be reproduced depends on three factors:
- the parallax between adjacent multi-viewpoint images,
- the focal length of the lens array, and
- the number of pixels in each element image.

Of these, increasing the number of pixels per element image is known to be effective for extending the reproducible depth. The number of pixels in an element image equals the number of viewpoints of the multi-viewpoint image. Generating a three-dimensional image with large depth therefore requires encoding a multi-viewpoint image with many viewpoints, and the amount of information needed to display it becomes enormous.

Transmitting or recording integral video requires encoding this enormous amount of information. Two coding approaches are known:
- converting the element image group into a multi-viewpoint image group and then applying multi-view video coding, and
- thinning out the viewpoints of the converted multi-view video during coding and restoring them by viewpoint interpolation during decoding.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-158213

However, reducing the number of viewpoints of the multi-view video during coding has a cost. It reduces the amount of information transmitted or recorded, but viewpoint interpolation after decoding degrades image quality. There are two causes:
1. The depth map used to improve the accuracy of viewpoint interpolation becomes less accurate, because fewer viewpoints are available as references when it is generated.
2. The parallax between the remaining viewpoints becomes larger. This enlarges the occlusion regions (regions hidden behind objects) that must be predicted during viewpoint interpolation.

There is therefore a limit to how much information can be removed by reducing the number of viewpoints. Furthermore, when many viewpoints are removed, the parallax between the multi-viewpoint images to be coded becomes large. This lowers the accuracy of disparity-compensated prediction and degrades coding efficiency.

In view of the above problems, an object of the present invention is to provide a coding device, a decoding device, and a program for multi-viewpoint images. These should suppress degradation in image quality after coding and decoding while reducing the amount of information to be transmitted or recorded.

To solve the above problems, the present invention works as follows:
- On the coding side, it reduces the amount of information by downsampling the multi-viewpoint image (reducing the number of pixels, that is, lowering the screen resolution).
- On the decoding side, it converts the multi-viewpoint image into an element image group and interpolates the element images. This upsamples the image (restores the screen resolution).
- Machine learning is used to interpolate the element images.

In this specification, an "image" includes moving images and may be so-called "video".

To solve the above problems, a coding device according to the present invention includes:
- a coding processing unit that takes a multi-viewpoint image as an input image, downsamples the input image, and then encodes the image; and
- a learning model creation processing unit that generates, based on the input image, a learning model and/or learning parameters for the machine learning used to upsample the image.

In the coding device, the coding processing unit preferably also performs viewpoint thinning, which thins out the viewpoints of the multi-viewpoint image.

In the coding device, the coding processing unit preferably generates a depth map from the input image.

In the coding device, the learning model creation processing unit preferably converts the multi-viewpoint image into an element image group. It then performs machine learning in which, for each element image to be interpolated, the element images adjacent to it serve as input data.

To solve the above problems, a decoding device according to the present invention includes:
- a decoding processing unit that decodes a multi-viewpoint image from coded image data and converts it into an element image group;
- a machine learning processing unit that generates interpolated element images with a machine learning function based on an input learning model and/or learning parameters; and
- an output image generation unit that interpolates the interpolated element images into the element image group to generate an output image.

In the decoding device, the decoding processing unit preferably performs viewpoint interpolation on the decoded multi-viewpoint image.

In the decoding device, the output image generation unit preferably works in three steps. It interpolates the interpolated element images, converts the element image group back into a multi-viewpoint image, and then performs viewpoint interpolation on the converted multi-viewpoint image.

The decoding device preferably holds several learning models and/or sets of learning parameters, each corresponding to a depth of the image to be displayed. It switches between them according to the image depth obtained from the depth map.
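Below is a minimal sketch of depth-based model switching. The text does not specify how depth is mapped to a model, so this sketch assumes:
- depth values are normalized to a common range;
- there is one model per depth band;
- the mean depth of a block is the selection statistic.

The thresholds, the block-mean rule, and the function name are all hypothetical.

```python
from bisect import bisect_right


def select_model(depth_block, thresholds, models):
    """Return the model whose depth band contains the block's mean depth.

    depth_block: 2D list of depth values taken from the decoded depth map.
    thresholds:  ascending band boundaries, len(models) - 1 of them.
    models:      learning models or parameter sets, one per depth band.
    """
    values = [d for row in depth_block for d in row]
    mean_depth = sum(values) / len(values)
    # bisect_right counts the boundaries that are <= mean_depth,
    # which is the index of the band the mean depth falls in.
    return models[bisect_right(thresholds, mean_depth)]
```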

To solve the above problems, a program according to the present invention causes a computer to function as the coding device described above.

To solve the above problems, a program according to the present invention also causes a computer to function as the decoding device described above.

The coding device, decoding device, and program of the present invention make it possible to greatly reduce the amount of information transmitted or recorded when coding multi-viewpoint images. They also suppress image quality degradation after decoding.

FIG. 1 is a diagram showing an example configuration of the coding device and decoding device of the present invention.
FIG. 2 is an example block diagram of the coding device of the present invention.
FIG. 3 is a diagram explaining the relationship between a multi-viewpoint image and element images.
FIG. 4 is a diagram showing examples of the images used for machine learning.
FIG. 5 is an example block diagram of the decoding device of the present invention.
FIG. 6 is another example block diagram of the decoding device of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

(Embodiment)
FIG. 1 shows an example configuration of the coding device and decoding device of the present invention. The coding device 100 and the decoding device 200 together constitute a coding/decoding system. They can be connected in two ways:
- By any transmission path capable of carrying data. In this case they function as a transmitting device 100 and a receiving device 200. Broadcasting systems, radio communication, wired or wireless networks, and the like can be used for transmission and reception.
- As independent devices, with data passed from the coding device 100 to the decoding device 200 via a recording medium or the like.

The coding device 100 includes a coding processing unit 110 and a learning model creation processing unit 120. The coding processing unit 110 receives the input image (multi-viewpoint image) supplied to the coding device 100. It at least downsamples the image to reduce the amount of information, then encodes it and outputs the coded image data.

The learning model creation processing unit 120 generates a learning model and/or learning parameters based on the input image and outputs them. These are used by the machine learning that upsamples the image on the decoding side.

The decoding device 200 includes a decoding processing unit 210, a machine learning processing unit 220, and an output image generation unit 230. The decoding processing unit 210 performs these steps on the coded image data of the multi-viewpoint image supplied to the decoding device 200:
1. It decodes the multi-viewpoint image.
2. It converts the image into an element image group.
3. It outputs the element image group to the machine learning processing unit 220 and the output image generation unit 230.

The machine learning processing unit 220 reconstructs the trained machine learning function from the learning model and/or learning parameters supplied to the decoding device 200. It then generates interpolated images (interpolated element images) from the element image data received from the decoding processing unit 210.

The output image generation unit 230 then interpolates the interpolated element images from the machine learning processing unit 220 into the element image group from the decoding processing unit 210 to generate the output image. Interpolating element images in this way is equivalent to upsampling the multi-viewpoint image. The present invention uses element images for forming an integral image as the output image. However, the output image can also be, for example, a multi-viewpoint image for forming free-viewpoint video.
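The equivalence between inserting element images and upsampling can be sketched as follows. The sketch assumes 1/4 downsampling (1/2 per axis), so the kept element images occupy the even row and column positions of the full-resolution grid. `predict` is a hypothetical stand-in for the trained machine learning function.

```python
def upsample_element_images(known, predict):
    """Rebuild a full element image group from the kept element images.

    known[j][i]:   element image kept after 1/4 downsampling; it belongs at
                   full-resolution position (2*j, 2*i).
    predict(y, x): returns the interpolated element image for a missing
                   position (y, x), i.e. one with an odd row or column.
    """
    height, width = 2 * len(known), 2 * len(known[0])
    return [
        [
            known[y // 2][x // 2] if y % 2 == 0 and x % 2 == 0 else predict(y, x)
            for x in range(width)
        ]
        for y in range(height)
    ]
```

In this geometry, three of every four element images in the result come from `predict`. This is why the quality of the learned interpolator largely determines the output quality.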

The coding device 100 and the decoding device 200 are each described in detail below.

[Encoding device]
FIG. 2 is an example block diagram of the coding device 100 of the present invention. Its blocks map onto FIG. 1 as follows:
- The viewpoint thinning unit 111, depth map generation unit 112, downsampling unit 113, and coding unit 114 correspond to the coding processing unit 110 of FIG. 1.
- The multi-viewpoint image element image conversion unit 121, learning image generation unit 122, and learning model generation unit 123 correspond to the learning model creation processing unit 120 of FIG. 1.

Each block is described below.

The input image is, for example, a multi-viewpoint image acquired by a multi-view camera in which 25 × 25 (= 625) cameras (for example, CMOS sensors) are arranged vertically and horizontally. Each single-viewpoint image is a color texture image. The input image may also include a depth map. The input image is supplied to three units: the viewpoint thinning unit 111, the depth map generation unit 112, and the multi-viewpoint image element image conversion unit 121.

The viewpoint thinning unit 111 thins out the viewpoints of the input multi-viewpoint image at equal intervals, for example reducing 25 × 25 viewpoints to 5 × 5. The depth map is also thinned out as necessary. The thinned multi-viewpoint image is output to the downsampling unit 113.
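Viewpoint thinning at equal intervals can be sketched as below. The text does not say which viewpoints survive. This sketch assumes evenly spaced indices that keep the outermost cameras. For 25 viewpoints per axis reduced to 5, it selects indices 0, 6, 12, 18, and 24.

```python
def thin_viewpoints(views, keep):
    """Thin a 2D camera grid views[v][u] to keep x keep viewpoints.

    Assumption: indices are evenly spaced and include both outermost
    cameras, e.g. 25 -> 5 keeps indices 0, 6, 12, 18, 24.
    Returns the thinned grid and the kept row and column indices.
    """
    if keep < 2:
        raise ValueError("keep must be at least 2")

    def pick(n):
        return [round(i * (n - 1) / (keep - 1)) for i in range(keep)]

    rows, cols = pick(len(views)), pick(len(views[0]))
    thinned = [[views[v][u] for u in cols] for v in rows]
    return thinned, rows, cols
```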

The depth map generation unit 112 creates a depth map from the input multi-viewpoint image. It can use the multi-viewpoint image before viewpoint thinning (for example, the 25 × 25 viewpoint image), so the depth information of the image is reflected accurately in the map. The generated depth map is output to the viewpoint thinning unit 111. If the depth map is not to be thinned, it may instead be output directly to the downsampling unit 113. If a depth map is captured and included in the input image, the depth map generation unit 112 can be omitted.

The downsampling unit 113 downsamples the input multi-viewpoint image, for example reducing the screen resolution of each image to 1/4 (1/2 vertically and 1/2 horizontally). The degree of reduction can be chosen as needed. The depth map can be downsampled in the same way. The downsampled images are output to the coding unit 114.
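The text leaves the downsampling filter open. The sketch below assumes plain decimation: it keeps every second pixel in each direction, with no anti-aliasing prefilter, which gives the 1/4 reduction described above. Applying a low-pass prefilter before decimation is a common alternative.

```python
def downsample(image, factor=2):
    """Keep every `factor`-th pixel in each direction (decimation, no prefilter).

    image: 2D list of pixel values (a list of rows).
    factor=2 keeps 1/2 of the rows and 1/2 of the columns, i.e. 1/4 of the pixels.
    """
    return [row[::factor] for row in image[::factor]]
```

Apply this to every viewpoint image and keep the pixels at even coordinates. The result is the same as keeping only the element images at even positions, because an element image gathers the pixel at one coordinate from every viewpoint.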

The coding unit 114 encodes the downsampled images. Any image coding method can be used, for example a well-known image (video) coding method such as:
- MPEG (Moving Picture Experts Group),
- H.264/AVC (Advanced Video Coding), or
- H.265/HEVC (High Efficiency Video Coding).

The depth map is encoded together with the multi-viewpoint image. The coded image data generated by the coding unit 114 is output as the output of the coding device 100.

In this embodiment, the amount of information to be transmitted or recorded is reduced by both viewpoint thinning and downsampling. Both the viewpoint thinning rate and the image sampling rate are adjustable, and they are preferably chosen according to the characteristics of the image. For example:
- A multi-viewpoint image that degrades little (is easy to restore) when viewpoints are thinned out can use a higher thinning rate to reduce the amount of information.
- An image that degrades little when its screen resolution is lowered (an image with few high spatial-frequency components) can use a lower sampling rate to reduce the amount of information.
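Take the example figures given above: 25 × 25 viewpoints thinned to 5 × 5, and each remaining image reduced to 1/2 vertically and 1/2 horizontally. The number of raw samples handed to the coding unit 114 shrinks as shown below. This counts pixels before compression and says nothing about the final bit rate.

```latex
\frac{5 \times 5}{25 \times 25} \times \frac{1}{2} \times \frac{1}{2}
  = \frac{1}{25} \times \frac{1}{4}
  = \frac{1}{100}
```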

In the present invention, the high-frequency image components lost by downsampling can be restored with good quality on the decoding side. Downsampling is therefore preferably the main means of reduction, and the amount of information can even be reduced by downsampling alone.

To upsample the multi-viewpoint image on the decoding side, the present invention interpolates the element images obtained by converting the multi-viewpoint image. It uses machine learning to generate the interpolated element images. The learning model creation processing unit 120 therefore creates a learning model and/or learning parameters for interpolating element images.

The machine learning is supervised learning in which several adjacent element images are arranged side by side to form a single input image. Because the relationship between element images follows optical rules, the learning result can be used across a wide range of content.

The multi-viewpoint image element image conversion unit 121 converts the input multi-viewpoint image into element images. FIG. 3 illustrates the relationship between multi-viewpoint images and element images.

FIG. 3(A) shows a multi-viewpoint image group (sometimes simply called a multi-viewpoint image). The images of two viewpoints near the center of the group are enlarged at the top. The group is, for example, a set of images captured by a camera array, and the two enlarged images correspond to images of the object captured by adjacent cameras. Each image in the group corresponds to one viewpoint, and the viewpoint images exhibit parallax with respect to one another. The group may include not only actually captured images but also images created by viewpoint interpolation or similar methods.

FIG. 3(B) shows an element image group, with several element images near the center enlarged at the top. The element image group can be created by converting the multi-viewpoint image group. One element image is generated by two steps:
1. Collect the pixels at the same coordinate position from every viewpoint image in the group.
2. Arrange them in the same layout as the viewpoints in the group.

For example, a multi-viewpoint image group (A) captured by a 22 × 22 camera array yields element images of 22 × 22 pixels. Generating the other element images in the same way converts the whole multi-viewpoint image group into an element image group.
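The conversion described above amounts to transposing four indices: viewpoint row, viewpoint column, pixel row, and pixel column. The sketch below assumes `mv[v][u][y][x]` indexing and keeps the camera-array orientation unchanged. Whether a display also flips each element image depends on its optics and is not covered by the text.

```python
def multiview_to_element_images(mv):
    """Convert a multi-viewpoint image group into an element image group.

    mv[v][u][y][x]: pixel (y, x) of the viewpoint image at camera row v,
                    camera column u.
    Returns ei[y][x][v][u]: element image (y, x) collects pixel (y, x)
    from every viewpoint, laid out like the camera array (v, u).
    """
    n_v, n_u = len(mv), len(mv[0])
    h, w = len(mv[0][0]), len(mv[0][0][0])
    return [
        [
            [[mv[v][u][y][x] for u in range(n_u)] for v in range(n_v)]
            for x in range(w)
        ]
        for y in range(h)
    ]
```

The function only swaps index pairs. Applying it to an element image group therefore returns the multi-viewpoint image group, so the same routine can also serve for the reverse conversion.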

Through this processing, the multi-viewpoint image element image conversion unit 121 converts the multi-viewpoint image group (A) into the element image group (B).

As described above, the present invention uses machine learning to generate interpolated element images. The learning image generation unit 122 therefore generates images for training the machine learning device. The machine learning device may have any configuration capable of image recognition. Examples include a computer running a multilayer neural network algorithm or an SVM (support vector machine).

FIG. 4 schematically shows examples of the images used for machine learning:
- FIG. 4(A) shows the input data (input image) for machine learning, created from the element image group of FIG. 3(B) as an example.
- FIG. 4(B) shows the correct image (teacher data) for that input data.

FIG. 4(C) shows the output image of the machine learning device. This is the image to be interpolated into the element image group. It is used as an interpolated image (interpolated element image) when the decoding device 200 generates its output image. It therefore has the same number of pixels as the original element images (22 × 22 pixels in the example of FIG. 4).

The input data is a set of element images adjacent to the element image to be interpolated. In the example of FIG. 4, the element image to be interpolated is indicated by an arrow. The nine adjacent element images marked with white circles (adjacent element images 1 to 9) are arranged side by side to form the input image.

A matrix value indicating the position of the element image to be interpolated is also used as input metadata. In FIG. 4 this value is [2, 3], with adjacent element image 1 as the reference position. The input metadata may take any form. For example, the position may be indicated by a flag set in a 5 × 5 matrix or expressed as binary data.

The correct image is the element image to be interpolated (arrow) itself. The nine white-circled images are chosen as input data on an assumption: the coding side has applied 1/4 downsampling, and the entire element image group is to be restored from the remaining 1/4 of the element images. The training data is preferably created to match the processing performed on the coding side.
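One way to assemble such a training sample is sketched below. The sketch makes these assumptions:
- The kept element images sit at even (row, column) positions.
- The position [2, 3] is a 1-based (row, column) inside a 5 × 5 window whose top-left corner is adjacent element image 1.
- The window is placed so that the target lies near its center.

These conventions, the tiling order, and the function name are interpretations for illustration, not details given in the text.

```python
def make_training_sample(ei, ty, tx):
    """Build (input_image, position, target) for the element image at (ty, tx).

    ei[y][x]: full-resolution element image group; each ei[y][x] is a 2D list.
    Assumes the element images kept after 1/4 downsampling are those at even
    (y, x), so the target must have an odd row or column.
    """
    if ty % 2 == 0 and tx % 2 == 0:
        raise ValueError("target must be a missing (odd-coordinate) element image")
    # Top-left corner of the 5x5 window; it always lands on a kept (even) position.
    oy = ty - 1 if ty % 2 else ty - 2
    ox = tx - 1 if tx % 2 else tx - 2
    if oy < 0 or ox < 0 or oy + 4 >= len(ei) or ox + 4 >= len(ei[0]):
        raise ValueError("window extends beyond the element image group")
    rows_per_image = len(ei[0][0])
    tiled = []
    for j in range(3):                    # neighbour row in the 3x3 set
        for r in range(rows_per_image):   # pixel row inside an element image
            row = []
            for i in range(3):            # neighbour column in the 3x3 set
                row.extend(ei[oy + 2 * j][ox + 2 * i][r])
            tiled.append(row)
    position = [ty - oy + 1, tx - ox + 1]  # 1-based metadata, neighbour 1 = [1, 1]
    return tiled, position, ei[ty][tx]
```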

Returning to FIG. 2, the learning model generation unit 123 generates the learning model and learning parameters for the machine learning used to create interpolated images (interpolated element images). For example, the machine learning device is trained on pairs of images from the learning image generation unit 122: the input images (input data) of FIG. 4(A) and the correct images of FIG. 4(B). This produces a learning model and/or learning parameters optimal for creating interpolated images.

Training gives the machine learning device an optimal learning model and learning parameters. Given an input image such as that of FIG. 4(A), it then outputs an interpolated image (FIG. 4(C)) that approximates the correct image. When the machine learning is complete, the learning model generation unit 123 outputs the trained model and learning parameters. The learning model and/or learning parameters may also be encoded and modulated before being transmitted to the decoding device.
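The text allows any learner, such as a multilayer neural network or an SVM. As a deliberately small stand-in, the sketch below fits a linear interpolator by ordinary least squares. Each target pixel is predicted from the co-located pixels of the nine adjacent element images plus a bias. The ten fitted weights play the role of the "learning parameters" sent to the decoder. The linear model and all names are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_linear_interpolator(samples):
    """Least-squares fit of weights + bias via the normal equations.

    samples: list of (features, target) pairs, where features are the
    co-located pixels from the nine adjacent element images.
    """
    k = len(samples[0][0]) + 1
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for feats, y in samples:
        x = list(feats) + [1.0]  # append the bias input
        for i in range(k):
            atb[i] += x[i] * y
            for j in range(k):
                ata[i][j] += x[i] * x[j]
    return solve(ata, atb)


def predict_pixel(weights, features):
    """Apply the fitted weights (last entry = bias) to one feature vector."""
    return sum(w * f for w, f in zip(weights, list(features) + [1.0]))
```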

The learning model and/or learning parameters are not limited to a single type. Several types may be prepared according to the characteristics of the image. For example, multiple learning models can be prepared for different depths of the three-dimensional image to be displayed.

The decoding device is described next.

[Decoding device]
The decoding device 200 reproduces images from the downsampled data. One conceivable approach is to restore the original screen resolution on the decoding side using super-resolution techniques. However, existing super-resolution techniques cannot fully restore a multi-viewpoint image to its original size. When the image was reduced, high-frequency components were removed in accordance with the sampling theorem, and they cannot be recovered on the decoding side.

In contrast, the present invention converts the multi-viewpoint image group into an element image group and then interpolates element images. This exploits the fact that the distance between the viewpoints of the multi-viewpoint images making up an integral image is generally short, so the high-frequency components lost under the sampling theorem are still contained in adjacent viewpoint images. Because each element image is composed of pixels taken from many viewpoint images, interpolating element images makes it possible to recover the high-frequency information lost through the size reduction (downsampling) performed during encoding.
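
The conversion this argument relies on is only a rearrangement of pixels. A minimal sketch, assuming views are stored as nested lists views[s][t][y][x] (viewpoint (s, t), pixel (y, x)) and ignoring any flip that a particular lens array requires: element image (y, x) gathers pixel (y, x) from every viewpoint, so each element image mixes samples from all neighbouring views. The function name is hypothetical.

```python
def swap_view_and_pixel_axes(group):
    """Return out[c][d][a][b] = group[a][b][c][d].

    With group = views[s][t][y][x] this yields the element image group
    elems[y][x][s][t]: element image (y, x) holds pixel (y, x) of every
    viewpoint (s, t). Applying the function again inverts the mapping.
    """
    A, B = len(group), len(group[0])
    C, D = len(group[0][0]), len(group[0][0][0])
    return [[[[group[a][b][c][d] for b in range(B)] for a in range(A)]
             for d in range(D)] for c in range(C)]


# Both directions are the same index swap.
views_to_elements = swap_view_and_pixel_axes
elements_to_views = swap_view_and_pixel_axes
```

Each element image therefore has S x T pixels, one per viewpoint, which is why the decoder's output element images have as many pixels as there are viewpoints.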

Furthermore, the present invention uses machine learning to interpolate element images. By configuring a trained machine learning device on the decoding side from the learning model and/or learning parameters created on the encoding side, interpolated images can be created with high accuracy.

FIG. 5 is an example block diagram of the decoding device 200 of the present invention. The embodiment of FIG. 5 implements a decoding method in which upsampling is performed after viewpoint interpolation. The decoding unit 211, the viewpoint interpolation processing unit 212, the multi-viewpoint image to element image conversion unit 213, and the input image generation unit for element image interpolation 214 correspond to the decoding processing unit 210 of FIG. 1; the machine learning unit 221 and the interpolated element image generation unit 222 correspond to the machine learning processing unit 220 of FIG. 1; and the element image insertion unit 231 corresponds to the output image generation unit 230 of FIG. 1. Each block is described below.

The image-encoded data produced by the encoding device 100 is input to the decoding unit 211, which decodes it using the decoding method corresponding to the encoding. The decoded image data is a multi-viewpoint image that has undergone viewpoint thinning and downsampling. If the image-encoded data includes a depth map, the depth map is also decoded; if not, a depth map is created from the decoded multi-viewpoint image. The decoded image data is output to the viewpoint interpolation processing unit 212 and the multi-viewpoint image to element image conversion unit 213.

The viewpoint interpolation processing unit 212 performs viewpoint interpolation on the input image data (the viewpoint-thinned, downsampled multi-viewpoint image) to restore the thinned-out viewpoints. Viewpoint interpolation is preferably made more accurate by using the depth map, whether it was decoded from the image-encoded data or created at the decoder. The viewpoint images generated by viewpoint interpolation are output to the multi-viewpoint image to element image conversion unit 213.

The multi-viewpoint image to element image conversion unit 213 combines the decoded multi-viewpoint image with the interpolated viewpoint images to form a multi-viewpoint image whose full number of viewpoints is restored (although the images are still downsampled), and converts it into element images (an element image group). The conversion from multi-viewpoint image to element images is the same as the process described for the encoding device. If the multi-viewpoint image was downsampled to 1/4 of the original, the converted element image group contains 1/4 of the original number of element images (that is, only the element images marked with white circles in FIG. 4(A)). The converted element image group is output to the input image generation unit for element image interpolation 214 and the element image insertion unit 231.
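
Why downsampling leaves only 1/4 of the element images can be checked directly. Assuming the 1/4 downsampling is plain 2x decimation per axis with no anti-aliasing filter (the patent does not state the filter), decimating every view and then converting gives exactly the element images at even grid positions, i.e. the white-circled ones. The helper names are hypothetical.

```python
def views_to_elements(views):
    """views[s][t][y][x] -> elems[y][x][s][t] (same rearrangement as on the encoder)."""
    S, T = len(views), len(views[0])
    H, W = len(views[0][0]), len(views[0][0][0])
    return [[[[views[s][t][y][x] for t in range(T)] for s in range(S)]
             for x in range(W)] for y in range(H)]


def decimate_views(views, step=2):
    """Keep every `step`-th row and column of every view (no prefilter assumed)."""
    return [[[row[::step] for row in img[::step]] for img in view_row]
            for view_row in views]
```

With a real low-pass prefilter the surviving element images would be filtered versions rather than exact copies, but their positions and count are the same.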

The input image generation unit for element image interpolation 214 creates the input data and input metadata used to generate interpolated element images. From the converted element image group, it selects the element images adjacent to the interpolated element image to be obtained (the element image to be interpolated) and generates the input image shown in FIG. 4(A). Since the input element image group contains only 1/4 of the original number of element images (only those marked with white circles), a partial region of it can be used directly as the input image (input data). The unit also creates data indicating the position of the element image to be obtained, which serves as the input metadata. The unit 214 creates the input data and input metadata for each interpolated element image in turn and outputs them to the interpolated element image generation unit 222.
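
One possible way to assemble the input data and input metadata is sketched below. It assumes the kept element images sit at even positions of a (2A-1) x (2B-1) grid, takes the 2x2 block of kept element images surrounding each missing position as the input data, and uses the target's row and column parity as the metadata. FIG. 4(A) may use a different neighbourhood, and all names here are hypothetical.

```python
def interpolation_inputs(kept):
    """Yield (target position, input data, input metadata) for each missing element image.

    kept[a][b] is the decoded element image at full-grid position (2a, 2b); the
    full grid is assumed to be (2A-1) x (2B-1), so every missing position has
    kept neighbours on both sides and no border extrapolation is needed.
    """
    A, B = len(kept), len(kept[0])
    for y in range(2 * A - 1):
        for x in range(2 * B - 1):
            if y % 2 == 0 and x % 2 == 0:
                continue  # decoded element image, nothing to interpolate
            top, bottom = y // 2, (y + 1) // 2   # equal when y is even
            left, right = x // 2, (x + 1) // 2   # equal when x is even
            input_data = [kept[top][left], kept[top][right],
                          kept[bottom][left], kept[bottom][right]]
            metadata = (y % 2, x % 2)  # hypothetical position code of the target
            yield (y, x), input_data, metadata
```

Because the element images are passed through untouched, the same generator works whether an element image is a nested list, an array, or any other object.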

The machine learning unit 221 reconstructs the machine learning device (machine learning function) from the input learning model and/or learning parameters. If the learning model or learning parameters have been encoded or modulated, they are first demodulated and decoded. Because they are the optimized learning model and learning parameters obtained on the encoding side by supervised learning on the input and ground-truth images of FIG. 4, reconstructing the machine learning device from them reproduces the trained machine learning function.
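
How transmitted parameters might be turned back into a working predictor is illustrated below, assuming the model is a 4-tap linear predictor per target phase and that the parameters travel as UTF-8 JSON. The payload format, the identifier linear4tap-v1, and both function names are invented for illustration; the modulation and channel coding mentioned above are not modelled.

```python
import json

MODEL_ID = "linear4tap-v1"  # hypothetical identifier for a 4-tap linear model


def pack_parameters(weights):
    """Encoder side: serialise {(row_parity, col_parity): [w0, w1, w2, w3]} to bytes."""
    spec = {"model": MODEL_ID,
            "weights": {f"{dy}{dx}": w for (dy, dx), w in weights.items()}}
    return json.dumps(spec).encode("utf-8")


def rebuild_predictor(payload):
    """Decoder side: reconstruct a callable predictor from the received bytes."""
    spec = json.loads(payload.decode("utf-8"))
    if spec["model"] != MODEL_ID:
        raise ValueError("unsupported model: " + str(spec["model"]))
    weights = {(int(k[0]), int(k[1])): w for k, w in spec["weights"].items()}

    def predictor(input_data, phase):
        # Weighted sum of the four neighbouring element images, pixel by pixel.
        w = weights[phase]
        return [sum(wk * n[p] for wk, n in zip(w, input_data))
                for p in range(len(input_data[0]))]

    return predictor
```

Sending a model identifier alongside the parameters lets the decoder reject parameters for an architecture it cannot rebuild, rather than silently misinterpreting them.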

The learning model and/or learning parameters need not be of a single type: several types, or several machine learning devices (learning models), may be prepared according to the characteristics of the image. For example, since the amount of parallax varies greatly with image depth, this can be reflected in the learning models by preparing multiple learning models and/or learning parameters for different depths of the three-dimensional image to be displayed. The learning model and/or learning parameters can then be switched according to the depth of the display area obtained from the depth map, so that the interpolated element image generation unit 222 uses the machine learning function based on the most suitable learning model and/or learning parameters.
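
The depth-dependent switching can be as simple as a table lookup. In this sketch the depth of the display area is summarized by the mean of its depth-map values and compared against ascending band boundaries; the statistic, the boundaries, and the function name are assumptions, since the patent only says that the model is switched according to depth.

```python
from bisect import bisect_right


def select_model(depth_region, boundaries, models):
    """Pick the model for the depth band that contains the region's mean depth.

    depth_region: rows of depth-map values for the display area.
    boundaries:   ascending band limits; models needs len(boundaries) + 1 entries.
    """
    if len(models) != len(boundaries) + 1:
        raise ValueError("need exactly one model per depth band")
    values = [d for row in depth_region for d in row]
    mean_depth = sum(values) / len(values)
    return models[bisect_right(boundaries, mean_depth)]
```

A depth exactly on a boundary falls into the upper band because bisect_right is used; a median or a histogram of the region would be equally reasonable summaries.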

The interpolated element image generation unit 222 generates interpolated element images from the input data and input metadata produced by the input image generation unit for element image interpolation 214, using the trained machine learning function reproduced by the machine learning unit 221. The generated interpolated element images are output to the element image insertion unit 231. Although the trained machine learning function reproduced by the machine learning unit 221 is described here as being transferred to the interpolated element image generation unit 222, the two units may instead be effectively integrated, with the machine learning unit generating the interpolated element images itself.

The element image insertion unit 231 inserts the element images generated by the interpolated element image generation unit 222 into the element image group received from the multi-viewpoint image to element image conversion unit 213. In this embodiment, the missing 3/4 of the element images are created by interpolation and inserted among the decoded 1/4. This insertion of element images is equivalent to upsampling the multi-viewpoint image (restoring its screen resolution). The element image insertion unit 231 also arranges the interpolated and decoded element images and applies smoothing so that the boundaries between element images are less conspicuous. The output image is an element image group in which each element image has as many pixels as there are viewpoints in the multi-viewpoint image group of the input image.
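
The insertion step can be sketched as follows, assuming a (2A-1) x (2B-1) layout with decoded element images at even positions and a predictor callable that receives the four surrounding kept element images plus the target's parity. The boundary smoothing described above is omitted and the function name is hypothetical; with this odd-sized grid the interpolated share is close to, rather than exactly, 3/4.

```python
def insert_interpolated(kept, predictor):
    """Rebuild the full element image group from decoded and predicted element images.

    kept[a][b] is the decoded element image at full-grid position (2a, 2b) of a
    (2A-1) x (2B-1) grid; predictor(input_data, phase) returns one element image.
    """
    A, B = len(kept), len(kept[0])
    full = [[None] * (2 * B - 1) for _ in range(2 * A - 1)]
    for a in range(A):
        for b in range(B):
            full[2 * a][2 * b] = kept[a][b]  # decoded element images are kept unchanged
    for y in range(2 * A - 1):
        for x in range(2 * B - 1):
            if full[y][x] is None:
                top, bottom = y // 2, (y + 1) // 2
                left, right = x // 2, (x + 1) // 2
                neigh = [kept[top][left], kept[top][right],
                         kept[bottom][left], kept[bottom][right]]
                full[y][x] = predictor(neigh, (y % 2, x % 2))
    return full
```

Plugging in a plain average of the four neighbours reproduces any element image group that varies linearly across the grid; a trained model is what is needed for realistic content.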

In this way, an element image group equivalent to the multi-viewpoint image group that was encoded (the input image to the encoding device 100) can be generated and output as the output image of the decoding device 200.

In this embodiment, the element image group is used as the output image on the assumption that an integral 3D image is displayed from it. To display multi-view video instead, the element image group may be converted into a multi-viewpoint image by an element image to multi-viewpoint image conversion means, and that multi-viewpoint image used as the output image.

FIG. 6 is another example block diagram of the decoding device 200 of the present invention. The embodiment of FIG. 6 implements a decoding method in which viewpoint interpolation is performed after upsampling. The decoding unit 211, the multi-viewpoint image to element image conversion unit 213, and the input image generation unit for element image interpolation 214 correspond to the decoding processing unit 210 of FIG. 1; the machine learning unit 221 and the interpolated element image generation unit 222 correspond to the machine learning processing unit 220 of FIG. 1; and the element image insertion unit 231, the element image to multi-viewpoint image conversion unit 232, the viewpoint interpolation processing unit 233, and the multi-viewpoint image to element image conversion unit 234 correspond to the output image generation unit 230 of FIG. 1.

Each block is described below. Blocks identical to those in FIG. 5 carry the same reference numerals, and descriptions that overlap with FIG. 5 are abbreviated.

The decoding unit 211 decodes the input image-encoded data using the decoding method corresponding to the encoding. The decoded image data is a viewpoint-thinned, downsampled multi-viewpoint image. If the image-encoded data includes a depth map, the depth map is also decoded; otherwise, a depth map is created from the decoded multi-viewpoint image. The decoded image data is output to the multi-viewpoint image to element image conversion unit 213.

The multi-viewpoint image to element image conversion unit 213 converts the decoded multi-viewpoint image into element images (an element image group). In this embodiment, no viewpoint interpolation is performed at this stage. The converted element image group therefore consists of element images whose pixel count is reduced according to the viewpoint thinning rate, and whose number is reduced according to the downsampling (for example, to 1/4 of the original). The converted element image group is output to the input image generation unit for element image interpolation 214.
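
The difference between the FIG. 5 and FIG. 6 orders shows up in the data shapes. The sketch below only tracks the 4-D shape (S x T viewpoints, H x W pixels per view, equivalently an H x W grid of element images with S x T pixels each). The per-axis thinning and downsampling factors of 2 are assumed defaults, and the function is hypothetical.

```python
def ceil_div(n, k):
    """Integer ceiling of n / k."""
    return -(-n // k)


def decoder_stages(S, T, H, W, view_step=2, pixel_step=2, order="fig5"):
    """Trace the 4-D shape (S, T, H, W) through the two decoder orders.

    view_step and pixel_step are the assumed per-axis viewpoint-thinning and
    downsampling factors applied by the encoder.
    """
    Sd, Td = ceil_div(S, view_step), ceil_div(T, view_step)
    Hd, Wd = ceil_div(H, pixel_step), ceil_div(W, pixel_step)
    stages = [("decoded", (Sd, Td, Hd, Wd))]
    if order == "fig5":
        # Restore viewpoints first, then interpolate full-size element images.
        stages.append(("viewpoint interpolation", (S, T, Hd, Wd)))
        stages.append(("element image insertion", (S, T, H, W)))
    elif order == "fig6":
        # Interpolate small element images first, then restore viewpoints.
        stages.append(("element image insertion", (Sd, Td, H, W)))
        stages.append(("viewpoint interpolation", (S, T, H, W)))
    else:
        raise ValueError("order must be 'fig5' or 'fig6'")
    return stages
```

For 8 x 8 viewpoints of 6 x 6 pixels, FIG. 5 interpolates element images of 8 x 8 pixels, whereas FIG. 6 interpolates element images of only 4 x 4 pixels; both end at (8, 8, 6, 6).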

The input image generation unit for element image interpolation 214 creates the input data and input metadata used to generate interpolated element images. From the converted element image group, it selects the element images adjacent to each interpolated element image to be obtained, generates the input image (input data) as in FIG. 4(A), and creates input metadata indicating the position of the element image to be obtained. The input data and input metadata are output to the interpolated element image generation unit 222. The converted element image group is also output to the element image insertion unit 231; alternatively, it may be output directly from the multi-viewpoint image to element image conversion unit 213 to the element image insertion unit 231.

The machine learning unit 221 reconstructs the machine learning device (machine learning function) from the input learning model and/or learning parameters. Multiple learning models may be prepared for different depths of the three-dimensional image to be displayed and switched according to the depth of the display area obtained from the depth map.

The interpolated element image generation unit 222 generates interpolated element images from the input data and input metadata produced by the input image generation unit for element image interpolation 214, using the trained machine learning function reproduced by the machine learning unit 221. In this embodiment, the generated interpolated element images, like the element images converted by the multi-viewpoint image to element image conversion unit 213, have a pixel count reduced in accordance with the viewpoint thinning. The generated interpolated element images are output to the element image insertion unit 231. The machine learning unit 221 and the interpolated element image generation unit 222 may be effectively integrated into a single processing unit.

The element image insertion unit 231 inserts the element images generated by the interpolated element image generation unit 222 into the element image group converted by the multi-viewpoint image to element image conversion unit 213. In this embodiment, this insertion upsamples the multi-viewpoint image (restores its screen resolution), but each element image still has a pixel count reduced according to the viewpoint thinning rate. The element image group after insertion is therefore output to the element image to multi-viewpoint image conversion unit 232.

The element image to multi-viewpoint image conversion unit 232 converts the input element image group into a multi-viewpoint image. At this stage, the result is a multi-viewpoint image whose viewpoints are still thinned out. This multi-viewpoint image is output to the viewpoint interpolation processing unit 233.

The viewpoint interpolation processing unit 233 performs viewpoint interpolation on the input image (the viewpoint-thinned multi-viewpoint image) to restore the thinned-out viewpoints. Viewpoint interpolation is preferably made more accurate by using the depth map. The interpolated viewpoint images are output to the multi-viewpoint image to element image conversion unit 234.
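
A depth-aware viewpoint interpolation can be sketched as disparity-compensated blending of the two kept neighbouring views. The sketch assumes horizontally arranged, rectified views and a per-pixel disparity map for the middle view already derived from the depth map (for rectified cameras, disparity equals focal length times baseline divided by depth). It uses nearest-pixel sampling, has no occlusion handling, is not the patent's specific method, and its function name is hypothetical.

```python
def interpolate_middle_view(left, right, disparity):
    """Synthesise the view halfway between two kept, horizontally adjacent views.

    left, right: images as lists of rows. disparity[y][x]: horizontal disparity
    in pixels between left and right, given on the middle view's pixel grid.
    A point at column x of the middle view is assumed to appear at x + d/2 in
    the left view and at x - d/2 in the right view.
    """
    out = []
    for y, (lrow, rrow) in enumerate(zip(left, right)):
        width = len(lrow)
        row = []
        for x in range(width):
            half = disparity[y][x] / 2
            xl = min(max(round(x + half), 0), width - 1)  # nearest pixel, clamped
            xr = min(max(round(x - half), 0), width - 1)
            row.append((lrow[xl] + rrow[xr]) / 2)  # blend the two warped samples
        out.append(row)
    return out
```

Averaging the two warped samples suppresses noise where both views see the point; where one of them is occluded, a real implementation would fall back to the other view instead of blending.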

The multi-viewpoint image to element image conversion unit 234 converts the multi-viewpoint image with restored viewpoints into element images. In this way, an element image group equivalent to the encoded multi-viewpoint image group (the input image to the encoding device 100) can be generated and output as the output image of the decoding device 200.

In this embodiment, the element image group is used as the output image on the assumption that an integral 3D image is displayed. When multi-view video is to be displayed from the output image instead, the multi-viewpoint image to element image conversion unit 234 may be omitted and the multi-viewpoint image used as the output image.

Although the above embodiments describe the configuration and operation of the encoding device 100, the present invention is not limited to this and may also be configured as an encoding method for encoding an input image, that is, a method that generates image-encoded data and a learning model and/or learning parameters from a multi-viewpoint input image according to the data flow of FIG. 2. Likewise, although the configuration and operation of the decoding device 200 have been described, the present invention may also be configured as a decoding method for decoding image-encoded data, that is, a method that decodes images from the image-encoded data of a multi-viewpoint image and the learning model and/or learning parameters according to the data flow of FIG. 5 or FIG. 6, and generates an output image consisting of an element image group.

A computer can suitably be used to function as the encoding device 100 or the decoding device 200 described above. Such a computer is realized by storing, in its storage unit, a program describing the processing that implements each function of the encoding device 100 or the decoding device 200, and having its CPU read and execute the program. The program can be recorded on a computer-readable recording medium.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions can be made within the spirit and scope of the present invention. The present invention should therefore not be construed as limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, several constituent blocks described in the embodiments may be combined into one, or a single constituent block may be divided into several.

100 Encoding device
110 Encoding processing unit
111 Viewpoint thinning unit
112 Depth map generation unit
113 Downsampling unit
114 Encoding unit
120 Learning model creation processing unit
121 Multi-viewpoint image to element image conversion unit
122 Learning image generation unit
123 Learning model generation unit
200 Decoding device
210 Decoding processing unit
211 Decoding unit
212 Viewpoint interpolation processing unit
213 Multi-viewpoint image to element image conversion unit
214 Input image generation unit for element image interpolation
220 Machine learning processing unit
221 Machine learning unit
222 Interpolated element image generation unit
230 Output image generation unit
231 Element image insertion unit
232 Element image to multi-viewpoint image conversion unit
233 Viewpoint interpolation processing unit
234 Multi-viewpoint image to element image conversion unit

Claims

An encoding device comprising:
an encoding processing unit that takes a multi-viewpoint image as an input image, downsamples the input image, and then encodes the image; and
a learning model creation processing unit that generates, based on the input image, a learning model and/or learning parameters for the machine learning used to upsample the image.

The encoding device according to claim 1, wherein the encoding processing unit further performs viewpoint thinning to thin out the viewpoints of the multi-viewpoint image.

The encoding device according to claim 1 or 2, wherein the encoding processing unit generates a depth map from the input image.

The encoding device according to any one of claims 1 to 3, wherein the learning model creation processing unit converts the multi-viewpoint image into an element image group and performs machine learning using, as input data, the element images adjacent to an element image to be interpolated.

A decoding device comprising:
a decoding processing unit that decodes a multi-viewpoint image from image-encoded data and further converts the multi-viewpoint image into an element image group;
a machine learning processing unit that generates interpolated element images using a machine learning function based on an input learning model and/or learning parameters; and
an output image generation unit that inserts the interpolated element images into the element image group to generate an output image.

The decoding device according to claim 5, wherein the decoding processing unit performs viewpoint interpolation on the decoded multi-viewpoint image.

The decoding device according to claim 5, wherein the output image generation unit, after inserting the interpolated element images, converts the element image group into a multi-viewpoint image and performs viewpoint interpolation on the converted multi-viewpoint image.

The decoding device according to any one of claims 5 to 7, wherein a plurality of the learning models and/or learning parameters are prepared according to the depth of the image to be displayed, and the learning model and/or learning parameters are switched according to the depth of the image obtained from a depth map.

A program for causing a computer to function as the encoding device according to any one of claims 1 to 4.

A program for causing a computer to function as the decoding device according to any one of claims 5 to 8.