JP2021044659A

JP2021044659A - Encoding device, decoding device and program

Info

Publication number: JP2021044659A
Application number: JP2019164372A
Authority: JP
Inventors: Kazuhiro Hara (原 一宏); Tomoyuki Mishina (三科 智之)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-09-10
Filing date: 2019-09-10
Publication date: 2021-03-18
Anticipated expiration: 2039-09-10
Also published as: JP7382186B2

Abstract

To provide an encoding device, a decoding device and a program that can compress and encode multi-viewpoint video with high encoding efficiency. SOLUTION: An encoding device comprises: a viewpoint thinning-out unit that performs viewpoint thinning-out processing on a multi-viewpoint image group; a multi-viewpoint image element image conversion unit that converts the thinned-out multi-viewpoint image group into an encoding element image group; and an encoding unit that encodes the encoding element image group. A decoding device comprises: a decoding unit that decodes encoded image data and creates the encoding element image group; an element image multi-viewpoint image conversion unit that converts the encoding element image group into the multi-viewpoint image group; a depth estimation unit that estimates the depth of each multi-viewpoint image on the basis of the multi-viewpoint image group and creates a depth map; and a viewpoint interpolation processing unit that interpolates between the viewpoints of the multi-viewpoint image group on the basis of the multi-viewpoint image group and the created depth map. SELECTED DRAWING: Figure 1

Description

The present invention relates to a coding device, a decoding device, and a program, and more particularly to a coding device, a decoding device, and a program for the multi-viewpoint images required to display integral 3D (three-dimensional) video and free-viewpoint video.

Light field cameras, in which a lens array is placed in front of the image sensor, have been commercialized as cameras capable of capturing the element image group used to display integral 3D video. In general, however, a light field camera is designed for refocusing after shooting. When integral 3D video is displayed from images taken with a light field camera, the diameter of the camera's main lens is small relative to the distance to the subject, so the motion parallax is small and the depth of the three-dimensional image cannot be reproduced sufficiently. In theory this problem can be solved by enlarging the main lens or shortening the distance between the camera and the subject, but neither measure is practical.

It has therefore been proposed to capture multi-viewpoint video with a camera array in which ordinary cameras are arranged in a two-dimensional horizontal and vertical grid. In this case the element image group is generated as follows: viewpoint images between the cameras are generated by viewpoint interpolation from the images captured by the camera array, and the captured images and the interpolated images are then converted into an element image group (Patent Document 1). It is known that the inter-camera distance of the camera array can be designed from the distance between the cameras and the subject, the distance over which viewpoint interpolation is practically possible, and the viewing-zone angle that the display device can reproduce. Viewpoint interpolation produces highly accurate interpolated images by using a depth map (depth image) that expresses the relative distance from the camera to the subject. A depth map is generated either by depth estimation based on image processing or by measuring distance optically with infrared light. Improving the accuracy of depth map generation also improves the accuracy of viewpoint interpolation.

In integral 3D video display, the depth range over which a three-dimensional image can be reproduced depends on the parallax between adjacent multi-viewpoint images, the focal length of the lens array, and the number of pixels in each element image. Of these, increasing the number of pixels in each element image is known to be effective for extending the reproducible depth range. Because the number of pixels in an element image equals the number of viewpoints in the multi-viewpoint image group, generating a three-dimensional image with depth requires a large number of viewpoints to be encoded, and the amount of information needed to display the three-dimensional image becomes enormous.

Transmitting or recording integral 3D video requires encoding this enormous amount of information. Known coding methods include converting the element image group into a multi-viewpoint image group and then applying multi-viewpoint video coding, and thinning out the converted multi-viewpoint video at encoding time and interpolating the viewpoints at decoding time.

Television broadcasting of integral 3D video is also envisioned in the future. Such broadcasting must remain compatible with conventional television broadcasting (hereinafter referred to as 2D (two-dimensional) broadcast video).

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-158213

In conventional multi-viewpoint video coding, however, the accuracy of viewpoint-compensated prediction falls as the parallax between the multi-viewpoint images to be coded grows, and coding efficiency deteriorates. Furthermore, when the number of viewpoints is increased, the amount of information in the three-dimensional video to be encoded becomes so large that highly efficient compression is not possible.

Consider also the simultaneous distribution of integral 3D television broadcasts and conventional 2D broadcast video. With distribution over the Internet, the receiver requests data in the image format suited to its display terminal and the sender transmits only the requested data, so simultaneous distribution is not a major problem. With transmission over broadcast waves, however, the information for displaying both the integral 3D video and the 2D broadcast video must be transmitted in order to remain compatible with conventional broadcasting. Because the enormous amount of data comprising the integral 3D video plus the 2D broadcast video must be encoded, coding with high efficiency is required.

In view of these problems, an object of the present invention is to provide a coding device, a decoding device, and a program capable of compressing and coding multi-viewpoint video with high coding efficiency.

To solve the above problems, the present invention, on the coding side, converts the multi-viewpoint images into an element image group and then performs coding. The amount of information is reduced by compressing the element image group with a video coding scheme that has an intra-block copy function. On the decoding side, the decoded element image group is converted into multi-viewpoint images, viewpoint interpolation is performed, and the result is converted into an element image group. In addition, pixels of the 2D broadcast video are introduced into part of each element image. In this specification, an "image" includes a moving image and may be a so-called "video".

To solve the above problems, a coding device according to the present invention comprises: a viewpoint thinning unit that performs viewpoint thinning on an input multi-viewpoint image group; a multi-viewpoint image element image conversion unit that converts the thinned multi-viewpoint image group into a coding element image group; and a coding unit that encodes the coding element image group.

In the coding device, the coding process preferably uses a coding tool having an intra-block copy function.

In the coding device, the pixel size of each element image in the coding element image group is preferably made equal to the coding block unit.

Preferably, a 2D (two-dimensional) image is additionally input to the coding device, and the multi-viewpoint image element image conversion unit embeds the corresponding pixel or pixel block of the 2D image in each element image of the coding element image group.

To solve the above problems, a decoding device according to the present invention comprises: a decoding unit that decodes input coded image data and creates a coding element image group; an element image multi-viewpoint image conversion unit that converts the coding element image group into a multi-viewpoint image group; a depth estimation unit that estimates the depth of each multi-viewpoint image based on the multi-viewpoint image group and generates a depth map; and a viewpoint interpolation processing unit that interpolates between the viewpoints of the multi-viewpoint image group based on the multi-viewpoint image group and the generated depth map.

Preferably, the decoding device further comprises a multi-viewpoint image element image conversion unit that converts the viewpoint-interpolated multi-viewpoint image group into an element image group, and outputs that element image group.

Preferably, in the decoding device, a 2D (two-dimensional) image is embedded in the coding element image group, and the element image multi-viewpoint image conversion unit extracts the pixel or pixel block of the 2D image from each element image of the coding element image group, assembles them, and reproduces and outputs the 2D image.

To solve the above problems, a program according to the present invention causes a computer to function as the coding device.

To solve the above problems, a program according to the present invention causes a computer to function as the decoding device.

According to the coding device, the decoding device, and the program of the present invention, multi-viewpoint video can be compressed and coded with high coding efficiency.

FIG. 1 is an example of a block diagram of the coding device and the decoding device of the first embodiment.
FIG. 2 is a diagram explaining the conversion from a multi-viewpoint image group to a coding element image group.
FIG. 3 is a diagram explaining the conversion from a multi-viewpoint image group to an element image group in the decoding device.
FIG. 4 is an example of a block diagram of the coding device and the decoding device of the second embodiment.
FIG. 5 is a diagram explaining the conversion from a multi-viewpoint image group to a coding element image group in which a 2D image is embedded.

Embodiments of the present invention are described below.

(First Embodiment)
FIG. 1 shows an example of a block diagram of the coding device and the decoding device according to the first embodiment of the present invention. The coding device 10 and the decoding device 20 together constitute a coding/decoding system. The coding device 10 and the decoding device 20 may be connected by any transmission path capable of carrying information, in which case they function as a transmitting device 10 and a receiving device 20. A broadcasting system, radio communication, a wired or wireless network, or the like can be used for transmission and reception. Alternatively, the two may be independent devices, with data passed from the coding device 10 to the decoding device 20 on a recording medium or the like.

The coding device 10 and the decoding device 20 are each described in detail below.

[Coding device]
The coding device 10 includes a viewpoint thinning unit 11, a multi-viewpoint image element image conversion unit 12, and a coding unit 13.

The input image is, for example, a multi-viewpoint image group acquired by a multi-viewpoint camera in which 22 × 22 (= 484) cameras (for example, CMOS sensors) are arranged vertically and horizontally. Each single-viewpoint image is a color texture image. The input image is input to the viewpoint thinning unit 11.

The viewpoint thinning unit 11 thins out the viewpoints of the input multi-viewpoint image group at equal intervals. For example, the 22 × 22 viewpoints are thinned out to 8 × 8 viewpoints. The thinned multi-viewpoint image group is output to the multi-viewpoint image element image conversion unit 12.
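The thinning step can be sketched as follows. The text states only that viewpoints are thinned at equal intervals from 22 × 22 to 8 × 8; keeping every third viewpoint (step 3, starting at index 0) is an assumption that happens to yield exactly 8 of 22 positions per axis. The array layout `(V, U, H, W, C)` and the function name `thin_viewpoints` are hypothetical.

```python
import numpy as np

def thin_viewpoints(views, step=3):
    """Keep every `step`-th viewpoint along both axes of the camera grid.

    views: array of shape (V, U, H, W, C), i.e. a V x U grid of H x W images.
    The step of 3 is an assumption inferred from the 22 -> 8 example.
    """
    return views[::step, ::step]

# A 22 x 22 grid of tiny dummy images (4 x 6 pixels, 3 colour channels).
views = np.zeros((22, 22, 4, 6, 3), dtype=np.uint8)
thinned = thin_viewpoints(views)
print(thinned.shape[:2])  # grid size after thinning
```

With this choice the retained viewpoint indices along each axis are 0, 3, 6, ..., 21, so two viewpoints are dropped between each retained pair.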

The multi-viewpoint image element image conversion unit 12 converts the input (thinned) multi-viewpoint image group into a coding element image group. This conversion is described with reference to FIG. 2.

FIG. 2(A) shows a multi-viewpoint image group (sometimes simply called a multi-viewpoint image); the images of two viewpoints at the center of the group are shown enlarged at the top. The multi-viewpoint image group is, for example, a set of images captured by a camera array, and the two enlarged images correspond to images of the object captured by adjacent cameras. Each image in the group corresponds to one viewpoint, and the viewpoint images exhibit parallax with respect to one another for the same object. The viewpoint images in the group need not all be captured images; they may include images created by viewpoint interpolation or the like. The whole of FIG. 2(A) (22 × 22 = 484 viewpoints) is the input image. Of these, the images marked with circles are the multi-viewpoint images used for conversion into the coding element image group (in this example, 8 × 8 = 64 viewpoints). The unmarked viewpoint images are thinned out by the viewpoint thinning unit 11 to reduce the amount of data and are not used in subsequent processing.

FIG. 2(B) shows the coding element image group; several element images at the center are shown enlarged at the top. In general, an element image group can be created by conversion from a multi-viewpoint image group. One element image is generated by extracting, from every viewpoint image in the group, the single pixel at the same coordinate position and assembling those pixels while preserving the overall arrangement of the multi-viewpoint image group. For example, one pixel from each viewpoint image captured by the 8 × 8 cameras in FIG. 2(A) yields an 8 × 8-pixel element image (a coding element image). Generating the other element images in the same way converts the multi-viewpoint image group into a coding element image group that contains as many 8 × 8-pixel element images as there are pixels in one viewpoint image.

Through this processing, the multi-viewpoint image element image conversion unit 12 converts the thinned multi-viewpoint image group (A) into the coding element image group (B) and outputs it to the coding unit 13.
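The conversion described above amounts to an axis rearrangement. The following is a minimal sketch, assuming the viewpoint images are held in a NumPy array of shape `(V, U, H, W, C)` (grid rows, grid columns, image height, image width, channels); this layout and the function name `views_to_element_images` are illustrative assumptions, not taken from the source.

```python
import numpy as np

def views_to_element_images(views):
    """Convert a V x U grid of H x W views into H x W element images of V x U pixels.

    views: (V, U, H, W, C). Returns one tiled image of shape (H*V, W*U, C) in which
    the element image for pixel position (y, x) occupies rows y*V..y*V+V-1 and
    columns x*U..x*U+U-1, and its pixel (v, u) is taken from view (v, u).
    """
    V, U, H, W, C = views.shape
    # (V, U, H, W, C) -> (H, V, W, U, C), then merge (H, V) and (W, U).
    return views.transpose(2, 0, 3, 1, 4).reshape(H * V, W * U, C)

# 8 x 8 viewpoints, each a 2 x 3 single-channel image with distinct values.
views = np.arange(8 * 8 * 2 * 3).reshape(8, 8, 2, 3, 1)
elem = views_to_element_images(views)
print(elem.shape)  # (16, 24, 1): 2 x 3 element images, each 8 x 8 pixels
```

Because neighbouring viewpoints see nearly the same scene, each 8 × 8 tile tends to resemble the tiles next to it, which is the correlation the coding step relies on.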

The coding unit 13 encodes the input coding element image group. Compression and coding are performed with an existing coding tool that has an intra-block copy function (for example, an extension of H.265/HEVC (High Efficiency Video Coding)). The intra-block copy function, which predicts the block being coded from adjacent blocks, enables prediction that exploits the high correlation between the element images of the coding element image group, and the amount of coded data can be greatly reduced. To achieve highly efficient compression with intra-block copy, the pixel size of each element image (coding element image) in the coding element image group is preferably made equal to the coding block unit.

The coded image data generated by the coding unit 13 is the output of the coding device 10.

[Decoding device]
The decoding device 20 includes a decoding unit 21, an element image multi-viewpoint image conversion unit 22, a depth estimation unit 23, a viewpoint interpolation processing unit 24, and a multi-viewpoint image element image conversion unit 25.

The coded image data produced by the coding device 10 is input to the decoding unit 21. The decoding unit 21 decodes it using the decoding method that corresponds to the coding method. The decoded image data is the (coding) element image group, which is output to the element image multi-viewpoint image conversion unit 22.

The element image multi-viewpoint image conversion unit 22 converts the input element image group into a multi-viewpoint image group. The input element image group is essentially the coding element image group that was encoded (FIG. 2(B)), so applying the exact inverse of the multi-viewpoint-to-element-image conversion described above converts it into the 8 × 8 multi-viewpoint images of FIG. 2(A) (the thinned multi-viewpoint image group). The element image multi-viewpoint image conversion unit 22 outputs the converted multi-viewpoint image group to the depth estimation unit 23 and the viewpoint interpolation processing unit 24.
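The inverse conversion can be sketched in the same way. As before, the `(V, U, H, W, C)` layout for the viewpoint images, the tiled `(H*V, W*U, C)` layout for the element image group, and the function name `element_images_to_views` are assumptions made for illustration.

```python
import numpy as np

def element_images_to_views(elem, V, U):
    """Split a tiled element image group back into a V x U grid of views.

    elem: (H*V, W*U, C), where each V x U tile is one element image.
    Returns an array of shape (V, U, H, W, C).
    """
    HV, WU, C = elem.shape
    H, W = HV // V, WU // U
    # (H*V, W*U, C) -> (H, V, W, U, C) -> (V, U, H, W, C)
    return elem.reshape(H, V, W, U, C).transpose(1, 3, 0, 2, 4)

# A decoded element image group: 2 x 3 element images of 8 x 8 pixels each.
elem = np.arange(16 * 24).reshape(16, 24, 1)
views = element_images_to_views(elem, V=8, U=8)
print(views.shape)  # (8, 8, 2, 3, 1)
```

Pixel (v, u) of the element image at position (y, x) ends up at pixel (y, x) of view (v, u), which undoes the encoder-side rearrangement exactly when no coding loss is present.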

The depth estimation unit 23 performs depth estimation on the input multi-viewpoint image group and generates depth maps for it. Each depth map reflects the depth information of the corresponding image. The generated depth maps are output to the viewpoint interpolation processing unit 24.

The viewpoint interpolation processing unit 24 uses the decoded multi-viewpoint image group and the generated depth maps to generate, by viewpoint interpolation, the viewpoint images between the decoded viewpoints (cameras). That is, from the 8 × 8 decoded multi-viewpoint images (the circled viewpoints in FIG. 2(A)), it predicts and interpolates the viewpoint images between them and reproduces the multi-viewpoint image group of 22 × 22 = 484 viewpoints that existed before thinning (the whole of FIG. 2(A)). The reproduced multi-viewpoint image group is output to the multi-viewpoint image element image conversion unit 25.
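The source does not specify the interpolation algorithm. As a rough illustration only, the sketch below uses a simple disparity-based forward warp in the horizontal direction, which is a stand-in technique and not the method of this publication. It assumes a per-pixel disparity map (in pixels, toward the neighbouring decoded view) has already been derived from the depth map; the function name `interpolate_view`, the sign convention, and the absence of hole filling are all assumptions.

```python
import numpy as np

def interpolate_view(left, disparity, alpha):
    """Forward-warp `left` a fraction `alpha` (0..1) of the way to its right neighbour.

    left:      (H, W) or (H, W, C) image of a decoded viewpoint.
    disparity: (H, W) horizontal shift, in pixels, between the two decoded views
               (assumed convention: a pixel at x appears at x - disparity in the neighbour).
    Returns the warped image and a mask of the pixels that received a value.
    """
    H, W = disparity.shape
    out = np.zeros_like(left)
    best = np.full((H, W), -np.inf)  # disparity of the pixel currently stored at each target
    for y in range(H):
        for x in range(W):
            d = disparity[y, x]
            tx = int(round(x - alpha * d))
            # Nearer surfaces (larger disparity) win when two pixels land on the same target.
            if 0 <= tx < W and d > best[y, tx]:
                out[y, tx] = left[y, x]
                best[y, tx] = d
    return out, np.isfinite(best)

left = np.tile(np.arange(6), (2, 1))   # 2 x 6 test image whose values equal the column index
disparity = np.full((2, 6), 2.0)       # constant 2-pixel disparity
mid, filled = interpolate_view(left, disparity, alpha=0.5)
```

If every third viewpoint is retained, as in the 22 to 8 example, two intermediate viewpoints would be synthesised between each decoded pair, for instance with `alpha` set to 1/3 and 2/3. A practical implementation would also blend warps from both neighbours and fill disocclusion holes.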

The multi-viewpoint image element image conversion unit 25 converts the input multi-viewpoint image group into an element image group. FIG. 3 shows an example of the input multi-viewpoint image group (FIG. 3(A)) and the resulting element image group (FIG. 3(B)). The multi-viewpoint image group input to the multi-viewpoint image element image conversion unit 25 consists of the decoded multi-viewpoint images and the interpolated viewpoint images generated between the cameras, 22 × 22 = 484 viewpoints in total, and corresponds to a reconstruction of the input image given to the coding device 10. Converting it into an element image group yields element images with a large pixel size (22 × 22 pixels); this element image group is the output image of the decoding device 20. Integral 3D video can be displayed from this output image.

In the element image group of the output image, each element image can contain more pixels than in the element image group of the transmitted coded image data, so integral 3D video with a wider depth range can be reproduced.

In this embodiment the output image is an element image group, on the premise that integral 3D video is displayed from it. To display multi-viewpoint video instead, for example, the multi-viewpoint image element image conversion unit 25 may be omitted and the viewpoint-interpolated multi-viewpoint images used as the output image of the decoding device.

According to this embodiment, the amount of coded data can be greatly reduced by thinning out viewpoints from the multi-viewpoint image group and by coding that exploits the high correlation between the element images of the coding element image group. In addition, the decoding device can perform highly accurate viewpoint interpolation using the depth map.

(Second Embodiment)
FIG. 4 shows an example of a block diagram of the coding device and the decoding device according to the second embodiment of the present invention. The second embodiment encodes a 2D image together with the multi-viewpoint image group in a single data format, giving a coding/decoding system applicable to a broadcasting system that transmits the information for both integral 3D video and 2D broadcast video. The coding device 10-1 and the decoding device 20-1 may be connected by any transmission path capable of carrying information, in which case they function as a transmitting device 10-1 and a receiving device 20-1. A broadcasting system, radio communication, a wired or wireless network, or the like can be used for transmission and reception. Alternatively, the two may be independent devices, with data exchanged on a recording medium or the like.

The coding device 10-1 and the decoding device 20-1 are each described in detail below. Parts shared with FIG. 1 are described only briefly.

[Coding device]
The coding device 10-1 includes a viewpoint thinning unit 11, a multi-viewpoint image element image conversion unit 14, and a coding unit 13, and receives a multi-viewpoint image group and a 2D image as inputs.

One input is, for example, a multi-viewpoint image group acquired by a multi-viewpoint camera, and is input to the viewpoint thinning unit 11. This input may be the same as in the first embodiment: for example, a multi-viewpoint image group for integral 3D video from 22 × 22 (= 484) cameras arranged vertically and horizontally. Each single-viewpoint image is a color texture image.

The other input is a 2D image, for example 2D broadcast video. The 2D image is input to the multi-viewpoint image element image conversion unit 14. In this embodiment, the 2D image is assumed to have three times as many pixels, both vertically and horizontally, as a single-viewpoint image of the multi-viewpoint image group.

The viewpoint thinning unit 11 thins out the viewpoints of the input multi-viewpoint images at equal intervals. In addition, the viewpoints in a predetermined region are removed in advance to make room for embedding the 2D image. For example, the 22 × 22 viewpoints are thinned out to 8 × 8 viewpoints, and the 3 × 3 viewpoints in the predetermined region (the region in which the 2D image is embedded) are then excluded from those 8 × 8 viewpoints. The thinned multi-viewpoint image group is output to the multi-viewpoint image element image conversion unit 14.

The multi-viewpoint image element image conversion unit 14 combines the thinned multi-viewpoint image group with the input 2D image and converts them into a coding element image group. The conversion in this embodiment is described with reference to FIG. 5.

The upper image in FIG. 5(A) is the 2D image input to the coding device, for example 2D broadcast video. In this embodiment, the 2D image has three times as many pixels, both vertically and horizontally, as a single-viewpoint image of the multi-viewpoint image group. Note that the screen size of the 2D image is drawn exaggerated in FIG. 5.

The entire lower image in FIG. 5(A) is the multi-viewpoint image group (also referred to simply as the multi-viewpoint images) input to the coding device; here it is the same as the multi-viewpoint image group shown in FIG. 2(A). For example, the images are captured by a camera array of 22 × 22 = 484 viewpoints. Each image in the multi-viewpoint image group corresponds to one viewpoint, and the viewpoint images exhibit parallax with respect to the object relative to one another. Within the multi-viewpoint image group of FIG. 5(A), the images marked with circles are the multi-viewpoint images that remain after thinning by the viewpoint thinning unit 11 and are used for the conversion into the coding element image group. In this example, 8 × 8 = 64 viewpoints are extracted at equal intervals, the 3 × 3 = 9 viewpoints in the central portion are then removed, and the remaining 55 viewpoints are used for the conversion into the coding element image group.

FIG. 5(B) shows the coding element image group, with several element images from the central portion shown enlarged at the top. From each of the 55 viewpoint images constituting the multi-viewpoint image group, the one pixel at the same coordinate position is extracted, and these pixels are assembled while preserving the overall arrangement of the multi-viewpoint image group. Furthermore, because in the present embodiment the 2D image has three times the resolution of a multi-viewpoint image both vertically and horizontally, the 3 × 3 pixels of the 2D image at the position corresponding to one pixel of the multi-viewpoint image are extracted as one pixel block and placed in the 3 × 3 pixel region at the center of the coding element image, thereby generating one element image. As a result, as shown in FIG. 5(C), one pixel from each of the 55 multi-viewpoint images is combined with 3 × 3 pixels of the 2D image to generate an element image of 8 × 8 (= 64) pixels (a coding element image). By generating the other element images in the same way, the multi-viewpoint image group and the 2D image can be converted into a coding element image group (a coding element image group with the 2D image embedded) in which as many 8 × 8 pixel element images are assembled as there are pixels in one viewpoint image.
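The interleaving described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `pack_element_images`, `GRID`, `HOLE`, and `BLOCK` are hypothetical, images are represented as plain nested lists, and the reserved 3 × 3 region is assumed to start at cell (3, 3) of the 8 × 8 element image (the exact position is given only in FIG. 5(C)).

```python
GRID = 8       # thinned viewpoint grid is 8 x 8, so each element image is 8 x 8 pixels
HOLE = (3, 3)  # assumed top-left cell of the 3 x 3 region reserved for the 2D image
BLOCK = 3      # the 2D image has 3x the pixels of one view in each direction


def pack_element_images(views, image2d, height, width):
    """Interleave thinned views and a 2D image into 8 x 8 pixel element images.

    views   : dict {(i, j): H x W nested list} for the 55 retained grid cells
    image2d : 3H x 3W nested list
    Returns an 8H x 8W nested list (the coding element image group).
    """
    out = [[None] * (width * GRID) for _ in range(height * GRID)]
    for y in range(height):
        for x in range(width):
            top, left = y * GRID, x * GRID  # origin of this element image
            # One pixel from each view, placed at the view's grid position.
            for (i, j), view in views.items():
                out[top + i][left + j] = view[y][x]
            # The 3 x 3 block of the 2D image that covers view pixel (y, x).
            for dy in range(BLOCK):
                for dx in range(BLOCK):
                    pixel = image2d[y * BLOCK + dy][x * BLOCK + dx]
                    out[top + HOLE[0] + dy][left + HOLE[1] + dx] = pixel
    return out


# Tiny synthetic example: 2 x 2 pixel views, labelled so positions can be traced.
height, width = 2, 2
views = {
    (i, j): [[("view", i, j, y, x) for x in range(width)] for y in range(height)]
    for i in range(GRID)
    for j in range(GRID)
    if not (HOLE[0] <= i < HOLE[0] + BLOCK and HOLE[1] <= j < HOLE[1] + BLOCK)
}
image2d = [[("2d", r, c) for c in range(width * BLOCK)] for r in range(height * BLOCK)]
packed = pack_element_images(views, image2d, height, width)
```

Because the 55 view pixels and the 9 pixels of the 2D block together fill all 64 cells, every position of each 8 × 8 element image is written exactly once.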

Through this processing, the multi-viewpoint image element image conversion unit 14 converts the thinned-out multi-viewpoint image group and the 2D image into the coding element image group (B) and outputs it to the coding unit 13. The coding element image group (B) is only an example. For instance, when the 2D image has twice as many pixels, both vertically and horizontally, as the image of one viewpoint in the multi-viewpoint image group, the 2D image is embedded in each element image as a pixel block of 2 × 2 pixels. When the 2D image has the same number of pixels as the image of one viewpoint in the multi-viewpoint image group, the 2D image is likewise embedded in each element image as a single pixel. The position at which the 2D image is embedded is not limited to the center of the element image; it may be embedded at any position in the element image.

The coding unit 13 encodes the input coding element image group. The coding unit 13 is identical to the coding unit 13 of FIG. 1, and compression and coding are performed by a conventional coding tool that has an intra block copy function. Using the intra block copy function also significantly reduces the amount of coded data for the coding element image group in which the 2D image is embedded. The pixel size of a coding element image is preferably equal to the block unit used for coding.

The coded image data generated by the coding unit 13 is output as the output of the coding device 10-1.

[Decoding device]
The decoding device 20-1 includes a decoding unit 21, an element image multi-viewpoint image conversion unit 26, a depth estimation unit 23, a viewpoint interpolation processing unit 24, and a multi-viewpoint image element image conversion unit 25.

The coded image data produced by the coding device 10-1 is input to the decoding unit 21. The decoding unit 21 decodes the input coded image data using the decoding method that corresponds to the coding. The decoded image data is the coding element image group in which the 2D image is embedded. The decoded coding element image group is output to the element image multi-viewpoint image conversion unit 26.

The element image multi-viewpoint image conversion unit 26 extracts the pixels of the 2D image from the input coding element image group and converts the coding element image group into a multi-viewpoint image group. Specifically, because each element image of the decoded coding element image group has the pixel arrangement shown in FIG. 5(C), the 2D image portion (3 × 3 pixels) at the center of each element image is extracted, and these blocks are assembled according to the position of each element image within the coding element image group to restore the original 2D image. The remaining pixels of each element image (the pixels derived from the multi-viewpoint images) undergo exactly the inverse of the conversion from the multi-viewpoint image group to the element image group described above, so that the coding element image group is converted into the 55 multi-viewpoint images of FIG. 5(A). The element image multi-viewpoint image conversion unit 26 outputs the restored 2D image as an output image of the decoding device 20-1 and outputs the converted multi-viewpoint image group to the depth estimation unit 23 and the viewpoint interpolation processing unit 24.
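The inverse conversion can be sketched as follows. As before, this is only an illustration: `unpack_element_images`, `GRID`, `HOLE`, and `BLOCK` are hypothetical names, images are nested lists, and the 3 × 3 region holding the 2D image is assumed to start at cell (3, 3) of each 8 × 8 element image.

```python
GRID = 8       # each element image is 8 x 8 pixels
HOLE = (3, 3)  # assumed top-left cell of the embedded 3 x 3 2D-image block
BLOCK = 3      # size of the embedded block


def unpack_element_images(packed):
    """Split a decoded coding element image group into views and a 2D image.

    packed : 8H x 8W nested list
    Returns (views, image2d), where views maps each retained grid cell (i, j)
    to an H x W image and image2d is a 3H x 3W image.
    """
    height, width = len(packed) // GRID, len(packed[0]) // GRID
    views = {}
    for i in range(GRID):
        for j in range(GRID):
            in_hole = (HOLE[0] <= i < HOLE[0] + BLOCK
                       and HOLE[1] <= j < HOLE[1] + BLOCK)
            if in_hole:
                continue  # these cells carry the 2D image, not a viewpoint
            # Pixel (y, x) of view (i, j) sits at cell (i, j) of element image (y, x).
            views[(i, j)] = [[packed[y * GRID + i][x * GRID + j]
                              for x in range(width)] for y in range(height)]
    image2d = [[None] * (width * BLOCK) for _ in range(height * BLOCK)]
    for y in range(height):
        for x in range(width):
            for dy in range(BLOCK):
                for dx in range(BLOCK):
                    image2d[y * BLOCK + dy][x * BLOCK + dx] = \
                        packed[y * GRID + HOLE[0] + dy][x * GRID + HOLE[1] + dx]
    return views, image2d


# Synthetic 16 x 16 input whose pixels record their own coordinates.
packed = [[(r, c) for c in range(16)] for r in range(16)]
views, image2d = unpack_element_images(packed)
```

Labelling each input pixel with its own coordinates makes it easy to check where every output pixel was taken from.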

The depth estimation unit 23 performs depth estimation on the input multi-viewpoint image group, creates a depth map of the multi-viewpoint image group, and outputs the created depth map to the viewpoint interpolation processing unit 24.

The viewpoint interpolation processing unit 24 generates the thinned-out viewpoints by viewpoint interpolation, using the decoded multi-viewpoint image group and its depth map. That is, from the 55 decoded multi-viewpoint images (the viewpoints marked with circles in FIG. 5(A)), viewpoint interpolation reconstructs the multi-viewpoint image group of 22 × 22 = 484 viewpoints that existed before thinning (the whole of FIG. 5(A)). The reconstructed multi-viewpoint image group is output to the multi-viewpoint image element image conversion unit 25.
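The text does not specify the interpolation algorithm. Purely as an illustration, the sketch below uses generic forward warping of a single scanline, a common form of depth-image-based rendering, as a stand-in. The function name `interpolate_scanline`, the use of a per-pixel disparity derived from the depth map, the sign convention (a point at x in the left view appears at x minus its disparity in the right view), and the nearest-neighbour hole filling are all assumptions.

```python
def interpolate_scanline(left, disparity, t):
    """Synthesize one scanline of an intermediate view by forward warping.

    left      : pixel values of a retained view
    disparity : per-pixel shift (in pixels) between this view and the next
                retained view, assumed to be derived from the depth map
    t         : position of the new view between the two retained views (0..1)
    """
    width = len(left)
    out = [None] * width
    winner = [None] * width  # disparity of the pixel currently stored at each target
    for x in range(width):
        target = round(x - t * disparity[x])
        if not 0 <= target < width:
            continue  # warped outside the image
        # On collisions, the pixel with the larger disparity (nearer object) wins.
        if winner[target] is None or disparity[x] > winner[target]:
            out[target] = left[x]
            winner[target] = disparity[x]
    # Fill disocclusion holes from the nearest filled neighbour (left, then right).
    for x in range(1, width):
        if out[x] is None:
            out[x] = out[x - 1]
    for x in range(width - 2, -1, -1):
        if out[x] is None:
            out[x] = out[x + 1]
    return out


# A uniform disparity of 2 pixels, rendered halfway, shifts the line by 1 pixel.
halfway = interpolate_scanline([10, 20, 30, 40, 50, 60], [2] * 6, 0.5)
```

A practical interpolator would typically warp from several neighbouring retained views and blend the results; this single-view version only shows how a depth map lets a missing viewpoint be generated from a retained one.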

The multi-viewpoint image element image conversion unit 25 converts the input multi-viewpoint image group into an element image group. Converting the viewpoint-interpolated multi-viewpoint image group into an element image group yields an element image group made up of element images with a large pixel size (22 × 22 pixels, FIG. 3(B)), which is taken as the output image of the decoding device 20-1. This output image is used to display integral 3D video.

In the present embodiment, the element image group is used as the output image of the decoding device 20-1 on the premise that integral 3D video is to be displayed. However, when, for example, multi-viewpoint video is to be displayed based on the output image, the multi-viewpoint image element image conversion unit 25 may be omitted and the multi-viewpoint images may be used as the output image.

According to the present embodiment, integral 3D video and conventional 2D broadcast video can be compressed, coded, and broadcast together. Consequently, a display monitor that supports only conventional 2D broadcast video can perform display processing using the 2D image, while a display monitor that supports integral 3D video can perform display processing using the element image group.

According to the present embodiment, the coding device can code and transmit the multi-viewpoint image group and the 2D image together, and can compress images for integral 3D video with high efficiency while applying a conventional video coding scheme. The decoding device can output both the element image group and the 2D image, so that the image can be selected according to the application.

The above embodiments describe the configuration and operation of the coding devices 10 and 10-1, but the present invention is not limited to this and may be configured as a coding method for coding multi-viewpoint images. That is, following the data flow of FIG. 1 or FIG. 4, it may be configured as a coding method that generates coded image data from multi-viewpoint images, or as a coding method that generates coded image data from multi-viewpoint images and a 2D image. Likewise, although the configuration and operation of the decoding devices 20 and 20-1 have been described, the present invention is not limited to this and may be configured as a decoding method for decoding coded image data. That is, following the data flow of FIG. 1 or FIG. 4, it may be configured as a decoding method that generates an output image of an element image group from coded image data, or as a decoding method that generates output images of an element image group and a 2D image from coded image data.

A computer can suitably be used to function as the coding device 10 or 10-1 or the decoding device 20 or 20-1 described above. Such a computer can be realized by storing, in its storage unit, a program that describes the processing for implementing each function of the coding device 10 or 10-1 or the decoding device 20 or 20-1, and by having the CPU of the computer read and execute this program. The program can be recorded on a computer-readable recording medium.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions can be made within the spirit and scope of the present invention. Accordingly, the present invention should not be construed as being limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, a plurality of the constituent blocks described in the embodiments may be combined into one, or one constituent block may be divided.

10 Coding device
11 Viewpoint thinning unit
12 Multi-viewpoint image element image conversion unit
13 Coding unit
14 Multi-viewpoint image element image conversion unit
20 Decoding device
21 Decoding unit
22 Element image multi-viewpoint image conversion unit
23 Depth estimation unit
24 Viewpoint interpolation processing unit
25 Multi-viewpoint image element image conversion unit
26 Element image multi-viewpoint image conversion unit

Claims

A coding device comprising:
a viewpoint thinning unit that performs viewpoint thinning processing on an input multi-viewpoint image group;
a multi-viewpoint image element image conversion unit that converts the viewpoint-thinned multi-viewpoint image group into a coding element image group; and
a coding unit that encodes the coding element image group.

The coding device according to claim 1, wherein the coding process uses a coding tool having an intra block copy function.

The coding device according to claim 1 or 2, wherein the pixel size of the element image of the coding element image group is equal to the coding block unit.

The coding device according to any one of claims 1 to 3, wherein a 2D (two-dimensional) image is further input, and the multi-viewpoint image element image conversion unit embeds the corresponding pixel or pixel block of the 2D image in each element image of the coding element image group.

A decoding device comprising:
a decoding unit that decodes input coded image data to generate a coding element image group;
an element image multi-viewpoint image conversion unit that converts the coding element image group into a multi-viewpoint image group;
a depth estimation unit that estimates the depth of each multi-viewpoint image based on the multi-viewpoint image group and generates a depth map; and
a viewpoint interpolation processing unit that performs viewpoint interpolation between the viewpoints of the multi-viewpoint image group based on the multi-viewpoint image group and the generated depth map.

The decoding device according to claim 5, further comprising a multi-viewpoint image element image conversion unit that converts the viewpoint-interpolated multi-viewpoint image group into an element image group and outputs the element image group.

The decoding device according to claim 5 or 6, wherein a 2D (two-dimensional) image is embedded in the coding element image group, and the element image multi-viewpoint image conversion unit extracts the pixels or pixel blocks of the 2D image from each element image of the coding element image group, assembles them to reconstruct the 2D image, and outputs the 2D image.

A program that causes a computer to function as the coding device according to any one of claims 1 to 4.

A program that causes a computer to function as the decoding device according to any one of claims 5 to 7.