JP6979290B2

JP6979290B2 - Image coding device and image decoding device, as well as image coding program and image decoding program.

Info

Publication number: JP6979290B2
Application number: JP2017116481A
Authority: JP
Inventors: 一宏原; 和博千田; 敦郎市ヶ谷; 美和片山; 真宏河北; 智之三科; 宏菊池
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2017-06-14
Filing date: 2017-06-14
Publication date: 2021-12-08
Anticipated expiration: 2037-06-14
Also published as: JP2019004271A

Description

The present invention relates to an image coding device and an image decoding device, and to an image coding program and an image decoding program, for stereoscopic video based on the integral method.

Conventionally, the integral method, which uses a group of lenses (a lens array) arranged on a plane or a spherical surface, has been used as one of the imaging and display methods for stereoscopic television that can be viewed from an arbitrary viewpoint. The output signal of a camera using the integral method is an aggregate of the stereoscopic imaging signals obtained from the lens group.
In stereoscopic image capture by the integral method according to its basic principle, as shown in FIG. 15, a subject 1 is photographed by an imaging camera 4 via a lens 2 and through a lens array 3 in which a large number of minute lenses are arranged. To display the stereoscopic image, an element image group, in which the images captured through the individual lenses (hereinafter referred to as "element images") are arranged, is displayed on a display device 5 such as a liquid crystal panel, and a lens array 6 is placed in front of it. This reproduces the light rays emitted by the original subject 1, so that a stereoscopic image 7 is reproduced in space.

It is known that the resolution of the reproduced stereoscopic image depends on the imaging camera or image sensor and on the number of lenses (hereinafter referred to as "element lenses") constituting the lens group (lens array). As the number of element lenses increases and the number of pixels assigned to each lens (hereinafter referred to as "element pixels") increases, the resolution of the reproduced stereoscopic image also increases.

Consider the transmission and recording of an element image group composed of element images that can reproduce a natural stereoscopic image (that is, one whose image quality lets the viewer perceive the image as three-dimensional without difficulty). Because the screen resolution is high, the data size remains large even after encoding.

AVC (Advanced Video Coding) and HEVC (High Efficiency Video Coding), which are conventional coding methods used to compress data size, apply an orthogonal transform such as the discrete cosine transform to the image being coded. This transform separates the image into its frequency components. Because the human eye does not readily notice changes in high-frequency picture detail, the AVC and HEVC coding processes reduce the data size by quantizing the high-frequency components.
Consider now the case of encoding an element image group such as that shown in FIG. 16(a). FIG. 16(b) is an enlarged view of part of the element image group showing the individual element images. Each pixel in an element image represents the luminance information of a light ray arriving from a specific direction determined by the pixel position.

In encoding an element image group such as that of FIG. 16(a), when the screen resolution of an element image does not match the image size on which the coding process performs the orthogonal transform, a given element image is affected by the adjacent element image, so the image after the orthogonal transform contains many high-frequency components. Because these high-frequency components are rounded by the quantization process, the coarser the rounding, the more noticeable the image degradation caused by coding.
As a method for solving this problem, a technique that matches the size of the element image to the size of the image on which the orthogonal transform is performed has been developed (see, for example, Patent Document 1).
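The effect of a transform block straddling two element images can be illustrated with a small sketch. The code below is an illustration only, not the coding process of AVC or HEVC: it uses an unnormalized one-dimensional DCT-II on an 8-sample block, and the sample values (100 and 20) and the helper names `dct2` and `ac_energy` are assumptions chosen for the example. It compares the non-DC energy of a block lying inside one flat element image with that of a block spanning the boundary between two element images.

```python
import math

def dct2(x):
    """Unnormalized 1-D DCT-II of a list of samples."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]

def ac_energy(x):
    """Sum of squared DCT coefficients excluding the DC term (k = 0)."""
    return sum(c * c for c in dct2(x)[1:])

# Hypothetical sample values: a block inside one flat element image,
# and a block whose halves come from two different element images.
inside = [100] * 8
straddle = [100] * 4 + [20] * 4

print(ac_energy(inside) < 1e-9)   # flat block: essentially no non-DC energy
print(ac_energy(straddle) > 1e3)  # boundary block: large non-DC energy
```

The boundary block carries energy in the coefficients that quantization rounds, which is consistent with the degradation described above when block size and element image size do not match.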

Further, the technique described in Patent Document 2 has been developed as a technique for encoding an element image group efficiently. It relies on the fact that each pixel constituting an element image represents the intensity of a light ray arriving from a specific direction determined by the pixel position. By extracting and collecting the pixels at the same pixel position from all of the element images, the technique generates an image as seen from a specific direction (viewpoint) (hereinafter referred to as a "multi-viewpoint image"). That is, the element image group is converted into sets whose members are the pixels at the same position in each element image, producing an image group made up of multi-viewpoint images (hereinafter referred to as a "multi-viewpoint image group"). In the multi-viewpoint image group, the multi-viewpoint images are arranged horizontally and vertically according to their viewpoint positions. A method that applies the coding process to this multi-viewpoint image group has been reported. FIG. 17 shows an example of a multi-viewpoint image group converted from an element image group. In FIG. 17, the positional relationship between the foreground character and the background fish differs from viewpoint to viewpoint.

Two methods have been reported for encoding after conversion into a multi-viewpoint image group: a method that encodes all of the multi-viewpoint images, and a method that generates depth image information (hereinafter referred to as "depth") from the multi-viewpoint image group and encodes a subset of the multi-viewpoint images together with the generated depth.

The method of encoding a subset of the multi-viewpoint images together with the generated depth is described with reference to FIG. 18.
In FIG. 18, from the multi-viewpoint image group converted from the element image group, the multi-viewpoint images at the top, bottom, left, and right edges of the viewing zone and the multi-viewpoint image at the center of the viewing zone are extracted, and the extracted multi-viewpoint images and the depth corresponding to them are encoded. On the encoding side, fewer images are encoded for transmission or recording than when all images are encoded, so the data can be compressed efficiently. On the decoding side, however, the multi-viewpoint images that were not transmitted must be interpolated (hereinafter referred to as "viewpoint interpolation"). By using the depth generated on the encoding side, viewpoint interpolation can generate images close to the original.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-148885
Patent Document 2: Japanese Unexamined Patent Publication No. 2013-251663

With the above method of encoding a subset of the multi-viewpoint images together with the generated depth, at present, a multi-viewpoint image group with little quality degradation can be obtained on the decoding side for video that has high correlation between viewpoints and was generated by CG (Computer Graphics). For live-action video, however, the decoding side cannot yet generate video of sufficient quality for a practical service, and there is room for improvement in this respect.

The present invention has been made in view of the above points. Its object is to provide an image coding device and an image decoding device, as well as an image coding program and an image decoding program, that can restore a stereoscopic image with suppressed quality degradation while compressing the data of stereoscopic video efficiently.

To solve the above problem, the image coding device of the present invention is an image coding device that takes in and encodes a multi-viewpoint image group generated from a plurality of element images obtained via a lens array, and comprises a sweeping processing means and an image coding means.

With this configuration, the image coding device, by means of the sweeping processing means, takes in a multi-viewpoint image group. This group is generated from a plurality of element images, obtained from the light flux passing through the plurality of element lenses constituting the lens array, by extracting the partial image at the same position in each element image and arranging the extracted partial images according to the arrangement of the element images to form sets of partial images. The sweeping processing means determines, for every predetermined number of multi-viewpoint images in a preset direction within the group, the multi-viewpoint image that serves as a reference point to be encoded. Then, for each frame of the captured images, it performs a sweeping process that switches the multi-viewpoint image to be encoded in the next frame to the multi-viewpoint image adjacent, in the preset direction, to the one encoded in the current frame. It repeats this process for as many frames as are needed for every multi-viewpoint image in the group to become an encoding target.
As a result, unlike the prior art, which always encodes the multi-viewpoint images at the same positions, the image coding device switches the multi-viewpoint image to be encoded to an adjacent one at every frame. Encoding a plurality of frames therefore encodes the multi-viewpoint images of all viewpoints. In other words, no multi-viewpoint image is left unencoded, and the multi-viewpoint images of more viewpoints can be used to make the decoded image quality uniform.

Further, the image coding device, by means of the image coding means, generates depth image information for the multi-viewpoint images that the sweeping processing means has determined to be encoding targets, and compresses and encodes those multi-viewpoint images together with the depth image information generated for them to produce coded data.
As a result, of all the multi-viewpoint images, the image coding device compresses and encodes only those determined to be encoding targets by the sweeping processing means, together with their depth image information. Compared with encoding all of the multi-viewpoint images, the number of multi-viewpoint images to be encoded is reduced, so the size of the coded data can be reduced.

In this way, the image coding device of the present invention generates coded data while switching, at every frame, the multi-viewpoint image to be encoded to the adjacent multi-viewpoint image in a preset direction. Compared with receiving coded data in which the multi-viewpoint images at the same positions are always encoded, an image decoding device that receives the coded data from this image coding device can make the image quality uniform when restoring the missing images and can suppress degradation of the stereoscopic image.

Further, the image decoding device of the present invention comprises an image decoding means, which acquires the coded data generated by the above image coding device, and a multi-viewpoint image generation means.

With this configuration, the image decoding device, by means of the image decoding means, acquires the coded data generated by the image coding device and decodes it into the multi-viewpoint images that were determined to be encoding targets and the depth image information generated for those multi-viewpoint images.
Further, the image decoding device, by means of the multi-viewpoint image generation means, handles the missing multi-viewpoint images, that is, all multi-viewpoint images of each frame other than the decoded ones. For each missing image, it extracts as reference multi-viewpoint images the decoded multi-viewpoint images adjacent within the same frame and the decoded multi-viewpoint images adjacent across frames. It then executes viewpoint interpolation using the reference multi-viewpoint images and their depth image information to interpolate the missing multi-viewpoint images, and generates and outputs the multi-viewpoint image group of each frame.
As a result, the image decoding device generates the multi-viewpoint image group by performing viewpoint interpolation using multi-viewpoint images from other frames in addition to viewpoint interpolation using multi-viewpoint images within one frame. Degradation of the quality of the interpolated multi-viewpoint images can therefore be suppressed.
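The selection of reference images for one missing viewpoint can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual procedure: viewpoints are modeled as `(x, y)` grid positions, "adjacent" is assumed to mean within one grid step horizontally, vertically, or diagonally, "adjacent across frames" is assumed to mean the immediately preceding and following frames, and the function name `reference_views` and the `decoded` mapping are hypothetical. The interpolation itself, which uses the depth image information, is not shown.

```python
def reference_views(missing, frame, decoded, max_dist=1):
    """Pick decoded viewpoints to use as references for one missing viewpoint.

    missing  -- (x, y) position of the viewpoint to interpolate
    frame    -- frame number containing the missing viewpoint
    decoded  -- dict: frame number -> set of decoded (x, y) viewpoints
    Returns (intra, inter): references in the same frame, and
    (frame, viewpoint) pairs from the neighboring frames.
    """
    x, y = missing

    def near(view):
        # Assumed adjacency rule: within max_dist steps on both axes.
        return max(abs(view[0] - x), abs(view[1] - y)) <= max_dist

    intra = sorted(v for v in decoded.get(frame, ()) if near(v))
    inter = sorted(
        (f, v)
        for f in (frame - 1, frame + 1)
        for v in decoded.get(f, ())
        if near(v)
    )
    return intra, inter

# Hypothetical one-row example: frame 1 carries viewpoints 0 and 2,
# frame 2 carries viewpoints 1 and 3.
decoded = {1: {(0, 0), (2, 0)}, 2: {(1, 0), (3, 0)}}
intra, inter = reference_views((1, 0), 1, decoded)
print(intra)  # [(0, 0), (2, 0)]
print(inter)  # [(2, (1, 0))]
```

In this example the missing viewpoint has two spatial neighbors in its own frame and, because of the sweep, the same viewpoint position is available from the next frame.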

In this way, the image decoding device of the present invention acquires the coded data generated by the image coding device and generates the missing multi-viewpoint images by viewpoint interpolation within one frame and between frames. Further, because the sweeping process of the image coding device makes multi-viewpoint image information from more viewpoints available, the quality of the multi-viewpoint image group can be made uniform and degradation of the stereoscopic image can be suppressed. Consequently, even for live-action video with a high screen resolution and a large data size, stereoscopic video of sufficient quality for a practical service can be generated.

According to the present invention, it is possible to provide an image coding device and an image decoding device, as well as an image coding program and an image decoding program, that can restore a stereoscopic image with suppressed quality degradation while compressing the data of stereoscopic video efficiently.

FIG. 1 is a diagram for explaining the concept of the sweeping process executed by the image coding device according to the present embodiment.
FIG. 2 is a functional block diagram showing the configuration of the image coding device and the image decoding device according to the present embodiment.
FIG. 3 is a diagram showing a multi-viewpoint image group in which each multi-viewpoint image and its depth are represented in simplified form.
FIG. 4 is a diagram showing the multi-viewpoint images to be encoded within the multi-viewpoint image group.
FIG. 5 is a diagram showing the direction of the sweeping process that switches the multi-viewpoint images to be encoded at every frame.
FIG. 6 is a diagram showing an example of the multi-viewpoint images encoded in the first frame.
FIG. 7 is a diagram showing an example of the multi-viewpoint images encoded in the second frame.
FIG. 8 is a diagram showing an example of the multi-viewpoint images encoded in the third frame.
FIG. 9 is a diagram showing an example of the multi-viewpoint images encoded in the fourth frame.
FIG. 10 is a diagram showing an example of the multi-viewpoint images encoded in the fifth frame.
FIG. 11 is a diagram for explaining the multi-viewpoint images referred to in viewpoint interpolation.
FIG. 12 is a diagram showing another example of the multi-viewpoint images to be encoded within the multi-viewpoint image group.
FIG. 13 is a diagram showing another example of the direction of the sweeping process that switches the multi-viewpoint images to be encoded at every frame.
FIG. 14 is a functional block diagram showing the configuration of the image coding device and the image decoding device according to the present embodiment.
FIG. 15 is a diagram for explaining the method of capturing and displaying a stereoscopic image by the integral method.
FIG. 16(a) is a diagram showing an example of an element image group. FIG. 16(b) is an enlarged view of element images.
FIG. 17 is a diagram showing an example of a multi-viewpoint image group converted from an element image group.
FIG. 18 is a diagram for explaining the method of encoding a subset of the multi-viewpoint images and their depth.

Hereinafter, a mode for carrying out the present invention (hereinafter referred to as the "embodiment") is described with reference to the drawings.
First, an outline is given of the processing executed by the image coding device 10 and the image decoding device 20 according to the present embodiment (both described later; see FIG. 2).

The image coding device 10 according to the present embodiment improves on the above method of encoding a subset of the multi-viewpoint images together with the generated depth, and has the following features. (1) The number of multi-viewpoint images to be encoded is increased, and encoding is performed within the range that current viewpoint synthesis technology can handle. That is, more of the multi-viewpoint images are encoded than in the conventional technique described with reference to FIG. 18, which encodes the multi-viewpoint images at the top, bottom, left, and right edges of the viewing zone and the one at its center. (2) A sweeping process is performed in which, at every frame of the moving image, the multi-viewpoint image to be encoded is switched to an adjacent multi-viewpoint image (for example, one adjacent in the horizontal or vertical direction), thereby selecting a new multi-viewpoint image to encode.

The concept of the sweeping process executed by the image coding device 10 according to the present embodiment is described with reference to FIG. 1.
In conventional coding of a multi-viewpoint image group, the positions of the encoded multi-viewpoint images within the whole image were fixed in advance. For example, as shown in FIG. 18, the images at the top, bottom, left, and right edges of the viewing zone and at its center were selected in advance, and the multi-viewpoint images at those fixed positions were encoded in every frame. That is, the multi-viewpoint images to be encoded were fixed both in position on the spatial axis within one frame and in position on the time axis across frames.
In contrast, the image coding device 10 of the present embodiment switches the multi-viewpoint images to be encoded at every frame. For example, as shown in FIG. 1, in the first frame of the moving image, "multi-viewpoint image 1" and "multi-viewpoint image 3" are extracted and encoded. In the second frame, "multi-viewpoint image 2" and "multi-viewpoint image 4" are extracted and encoded. In the third and subsequent frames, the multi-viewpoint images to be encoded are likewise switched at every frame. In the present embodiment, this process of switching and determining the multi-viewpoint images to be encoded at every frame is referred to as the sweeping process.
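The FIG. 1 example can be written as a short selection rule. The sketch below is an illustration under assumptions: it models four multi-viewpoint images in a row with every second image encoded, and it assumes that the selection wraps around so that the third frame repeats the first. The source states only that later frames are switched in the same manner, so the wrap-around, the function name `views_for_frame`, and its parameters are hypothetical.

```python
def views_for_frame(frame, num_views=4, stride=2):
    """Return the 1-based numbers of the multi-viewpoint images encoded
    in the given 1-based frame, shifting the selection by one per frame.

    Assumption: the shift wraps around after `stride` frames.
    """
    offset = (frame - 1) % stride
    return [v + 1 for v in range(offset, num_views, stride)]

for frame in (1, 2, 3):
    print(frame, views_for_frame(frame))
# 1 [1, 3]
# 2 [2, 4]
# 3 [1, 3]
```

Over any two consecutive frames, every one of the four images is encoded once, which is the property the decoder relies on for inter-frame viewpoint interpolation.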

Performing this sweeping process allows the image decoding device 20 to perform, in addition to viewpoint interpolation within the space of one frame (using images that are close on the spatial axis), viewpoint interpolation using images from other frames (using images that are close on the time axis); details are given later with reference to FIG. 11. The quality of the generated images can therefore be improved. Moreover, if viewpoint interpolation is always applied at the same positions, the quality degradation of the images that were not encoded (the interpolated multi-viewpoint images) stands out in comparison with the multi-viewpoint images that were encoded and decoded. With the sweeping process, the generated images contain information from the multi-viewpoint images of all viewpoints, so the multi-viewpoint image group output on the decoding side has uniform quality, and degradation of the stereoscopic image that varies with the viewing position is suppressed.

<Image coding device, image decoding device>
Next, the image coding device 10 and the image decoding device 20 according to the present embodiment are described with reference to FIG. 2.
The image coding device 10 is, for example, connected to or built into an integral-method imaging device (imaging camera); it converts element image group data into a multi-viewpoint image group, encodes it, and outputs the result to the image decoding device 20. The image decoding device 20 acquires the coded data from the image coding device 10 via a transmission device (not shown) or the like, decodes the coded data to generate a multi-viewpoint image group, and restores the original element image group data. The image decoding device 20 is, for example, connected to or built into an integral-method image display device or projector device.

≪Image coding device≫
As shown in FIG. 2, the image coding device 10 includes a multi-viewpoint image conversion means 11, a sweeping processing means 12, and an image coding means 13.

The multi-viewpoint image conversion means 11 takes in a plurality of element images (element image group data) obtained from the light flux passing through the plurality of element lenses constituting a lens array (not shown). It then converts the element images into a multi-viewpoint image group (sets of partial images) in which the partial images at the same position in each element image are arranged according to the arrangement of the element images. A partial image is an image cut out from part of an element image, and may consist of a single pixel or of a plurality of adjacent pixels.
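The rearrangement performed by the multi-viewpoint image conversion means 11 can be sketched as an index mapping. The code below is a minimal illustration under assumptions, not the device's implementation: it takes each partial image to be a single pixel, represents images as nested Python lists, and uses the hypothetical function name `to_multiview`. The pixel at position (u, v) inside the element image of lens (i, j) is moved to pixel (i, j) of the multi-viewpoint image for viewpoint (u, v), and the multi-viewpoint images are tiled by viewpoint position.

```python
def to_multiview(element_group, lenses_x, lenses_y, elem_w, elem_h):
    """Rearrange an element image group into a multi-viewpoint image group.

    element_group -- 2-D list (rows of pixels) of size
                     (lenses_y * elem_h) x (lenses_x * elem_w)
    Returns a 2-D list of the same size in which tile (u, v) holds the
    lenses_x x lenses_y image seen from viewpoint (u, v).
    Assumption: one partial image = one pixel.
    """
    out_h, out_w = lenses_y * elem_h, lenses_x * elem_w
    out = [[None] * out_w for _ in range(out_h)]
    for j in range(lenses_y):          # lens row
        for i in range(lenses_x):      # lens column
            for v in range(elem_h):    # pixel row inside the element image
                for u in range(elem_w):  # pixel column inside the element image
                    pixel = element_group[j * elem_h + v][i * elem_w + u]
                    out[v * lenses_y + j][u * lenses_x + i] = pixel
    return out

# Toy data: 3 x 2 lenses, 2 x 2 pixels per element image. Each pixel is
# labeled (lens column, lens row, pixel column, pixel row).
lx, ly, ew, eh = 3, 2, 2, 2
group = [
    [(x // ew, y // eh, x % ew, y % eh) for x in range(lx * ew)]
    for y in range(ly * eh)
]
mv = to_multiview(group, lx, ly, ew, eh)
print(mv[0][:lx])  # [(0, 0, 0, 0), (1, 0, 0, 0), (2, 0, 0, 0)]
```

The first row of the top-left tile contains pixel (0, 0) from each lens in the first lens row, that is, the image seen from one viewpoint.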

In the present embodiment, the description takes as an example element image group data captured by an integral-method stereoscopic television camera, that is, the element image group used when a stereoscopic image is formed on a display device with a lens array composed of square element lenses.
As an example, the resolution of an element image is 20 × 20 pixels, and the number of effective lenses in the lens array placed on the display device is 240 × 160. In this case, the size of the element image group is 4800 × 3200 pixels, and the multi-viewpoint image group obtained by converting it consists of 20 × 20 viewpoints of multi-viewpoint images of 240 × 160 pixels each.
In the following, as shown in FIG. 3, each multi-viewpoint image and its depth are simplified and drawn as a single rectangle for the purpose of explanation. FIG. 3 shows that the multi-viewpoint image group converted from the element image group (reference sign X in FIG. 3) contains multi-viewpoint images for 20 × 20 viewpoints.
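The sizes in this example follow directly from the element image resolution and the lens count. The short sketch below reproduces the arithmetic; the function name `group_dimensions` and the dictionary keys are hypothetical, and one pixel per partial image is assumed.

```python
def group_dimensions(elem_w, elem_h, lenses_x, lenses_y):
    """Derive image sizes from element image resolution and lens count.

    Assumption: each partial image is a single pixel, so the number of
    viewpoints equals the element image resolution and each
    multi-viewpoint image has one pixel per lens.
    """
    return {
        "element_group_px": (elem_w * lenses_x, elem_h * lenses_y),
        "viewpoint_image_px": (lenses_x, lenses_y),
        "viewpoints": (elem_w, elem_h),
        "total_viewpoints": elem_w * elem_h,
    }

dims = group_dimensions(20, 20, 240, 160)
print(dims["element_group_px"])    # (4800, 3200)
print(dims["viewpoint_image_px"])  # (240, 160)
print(dims["total_viewpoints"])    # 400
```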

As shown in FIG. 2, the sweeping processing means 12 determines, for the multi-viewpoint images of each frame (and their depths; mention of the depth is omitted below where appropriate), which multi-viewpoint images are to be encoded. From frame to frame, the sweeping processing means 12 switches the multi-viewpoint images to be encoded to the multi-viewpoint images at adjacent viewpoint positions (for example, in the horizontal or vertical direction).
Here, for example, out of the 20 × 20 viewpoints (400 viewpoints in total) of the multi-viewpoint images, the sweeping processing means 12 determines in advance the 100 viewpoints shown in FIG. 4 (the multi-viewpoint images drawn as hatched rectangles) as the reference points of the multi-viewpoint images to be encoded. The sweeping processing means 12 then executes a sweeping process that moves the viewpoint positions of these 100 viewpoints in the preset direction shown in FIG. 5 each time the frame changes. As a result, the positions of the multi-viewpoint images to be encoded change from the first frame shown in FIG. 6 to the second frame shown in FIG. 7, the third frame shown in FIG. 8, the fourth frame shown in FIG. 9, the fifth frame shown in FIG. 10, and so on. In this case, four frames are enough to include the components of the multi-viewpoint images (and their depths) of all viewpoints.
Furthermore, because the sweeping processing means 12 selects only 100 viewpoints for encoding rather than all 400, the size of the coded data can be reduced.
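The idea that a 100-viewpoint selection shifted every frame covers all 400 viewpoints in four frames can be checked with a small sketch. The selection pattern and the sweep order used here are assumptions made purely for illustration (every second row and column, with a four-step offset cycle); the actual reference points and direction are those of FIG. 4 and FIG. 5, which are not reproduced here. The names `GRID`, `OFFSETS` and `encoded_viewpoints` are hypothetical.

```python
GRID = 20  # 20 x 20 viewpoints, as in the text

# Hypothetical sweep order of the lattice offset (dx, dy); FIG. 5 may differ.
OFFSETS = [(0, 0), (1, 0), (1, 1), (0, 1)]


def encoded_viewpoints(frame):
    """Viewpoints (column, row) selected for encoding in a 0-based frame index."""
    dx, dy = OFFSETS[frame % len(OFFSETS)]
    return {(x, y) for y in range(dy, GRID, 2) for x in range(dx, GRID, 2)}


covered = set()
for frame in range(4):
    covered |= encoded_viewpoints(frame)
print(len(encoded_viewpoints(0)), len(covered))
```

Under this assumed pattern each frame carries a quarter of the viewpoints, and the selection repeats with a period of four frames.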

The number of viewpoints that the sweeping processing means 12 selects for encoding in one frame, and the sweeping direction in which the selection moves from frame to frame (horizontal, vertical, diagonal, and so on), are set in advance.

As shown in FIG. 2, the image coding means 13 generates depth (depth image information) for the multi-viewpoint images that the sweeping processing means 12 has selected for encoding. The image coding means 13 then compression-encodes the selected multi-viewpoint images and their depths, for example with a conventional coding scheme for two-dimensional images (AVC, HEVC, etc.), to generate coded data, and outputs this coded data. The output coded data is transmitted to the image decoding device 20 by a transmission device or the like (not shown). For generating and encoding the depth of the multi-viewpoint images, the technique described in, for example, JP-A-2007-36800 can be used.

≪Image decoding device≫
As shown in FIG. 2, the image decoding device 20 includes an image decoding means 21, a multi-viewpoint image generation means 22, and an element image conversion means 23.

The image decoding means 21 acquires the coded data generated by the image coding device 10, for example via a transmission device, and decodes it to obtain the multi-viewpoint images and their depths.

The multi-viewpoint image generation means 22 executes viewpoint interpolation processing using the multi-viewpoint images decoded by the image decoding means 21 and their depths, thereby interpolating the multi-viewpoint images that were not transmitted to the decoding side and generating the multi-viewpoint image group.
The multi-viewpoint image generation means 22 performs this viewpoint interpolation by referring to preset multi-viewpoint images that are adjacent on the spatial axis (images close to each other within the same frame) or adjacent on the time axis (images close to each other across frames), and in this way generates the missing multi-viewpoint images.
Once the images to be used for generating a multi-viewpoint image have been determined, the multi-viewpoint image generation means 22 generates the missing multi-viewpoint image using, for example, the viewpoint synthesis technique for viewpoint interpolation described in JP-A-2010-200188.
The process by which the multi-viewpoint image generation means 22 determines the multi-viewpoint images to refer to for viewpoint interpolation is described below using the examples shown in FIGS. 4 to 10 and elsewhere.

FIG. 11 is a diagram for explaining the viewpoint interpolation processing when the sweeping process switches the multi-viewpoint images to be encoded in the frame-by-frame order shown in FIG. 5. For the first to seventh frames, FIG. 11 shows a 5 × 5 block of viewpoints extracted from the 20 × 20 viewpoints (400 in total) of each frame. For example, the first frame shows the multi-viewpoint images contained in the region marked α in FIG. 6. Consider interpolating, in the first to seventh frames shown in FIG. 11, the viewpoint at the center, in the third row and third column.
In the third frame, the multi-viewpoint image numbered "55" already exists, so no viewpoint interpolation is needed. Likewise, no viewpoint interpolation is needed in the seventh frame, where the multi-viewpoint image numbered "54" already exists.

Here, if, for example, the multi-viewpoint image numbered "55" in the third frame and the multi-viewpoint image numbered "54" in the seventh frame are the same (for example, have the same luminance values), the images in the fourth to sixth frames are very likely the same as well, and the multi-viewpoint image generation means 22 can interpolate them by copying that image.
The comparison need not cover the entire multi-viewpoint image numbered "55" in the third frame and the entire multi-viewpoint image numbered "54" in the seventh frame; the images may instead be handled in divided regions. If a partial region of the multi-viewpoint image numbered "55" in the third frame is the same as the corresponding region of the multi-viewpoint image numbered "54" in the seventh frame, the image regions at the same position in the fourth to sixth frames are very likely the same too, so they can be interpolated by copying that region. The blocks into which an image is divided for coding can serve as one example of such partial regions.
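The region-wise copy described above can be illustrated with a short sketch. It is written under simplifying assumptions: images are plain lists of pixel rows, a region is a square block of a hypothetical size `block`, and "the same" is taken to mean exact equality, whereas a practical decoder working on lossily coded images would probably need a tolerance. The function name `copy_static_blocks` is hypothetical.

```python
def copy_static_blocks(earlier, later, block=4):
    """Build a partial image for the frames between `earlier` and `later`.

    Blocks that are identical in both decoded images are copied; every other
    pixel stays None and has to be filled by another interpolation method.
    """
    h, w = len(earlier), len(earlier[0])
    out = [[None] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            rows = range(by, min(by + block, h))
            cols = slice(bx, min(bx + block, w))
            # Compare the co-located block in the two decoded images.
            if all(earlier[y][cols] == later[y][cols] for y in rows):
                for y in rows:
                    out[y][cols] = earlier[y][cols]
    return out


frame3 = [[1, 1, 2, 2], [1, 1, 2, 2]]
frame7 = [[1, 1, 9, 2], [1, 1, 2, 2]]
print(copy_static_blocks(frame3, frame7, block=2))
```

In this toy input only the left block is unchanged between the two frames, so only that block is copied.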

Further, if, for example, there is parallax between the multi-viewpoint image numbered "55" in the third frame and the multi-viewpoint image numbered "54" in the seventh frame, the change along the time axis can be attributed to panning or tilting of the camera or to movement of the subject, and the multi-viewpoint image generation means 22 performs motion prediction before carrying out the viewpoint interpolation processing.

For example, to interpolate the multi-viewpoint image in the third row and third column of the second frame, viewpoint interpolation uses, in addition to the difference information on the time axis described above, the multi-viewpoint images numbered "51" and "55", which are close on the spatial axis.
Similarly, to interpolate the multi-viewpoint image in the third row and third column of the fourth frame, viewpoint interpolation uses, in addition to the difference information on the time axis, the multi-viewpoint images numbered "55" and "60", which are close on the spatial axis.
Similarly, to interpolate the multi-viewpoint image in the third row and third column of the sixth frame, viewpoint interpolation uses, in addition to the difference information on the time axis, the multi-viewpoint images numbered "50" and "54", which are close on the spatial axis.

For the multi-viewpoint image in the third row and third column of the first frame, many multi-viewpoint images are available for reference in the vertical and horizontal directions of the spatial axis. Viewpoint interpolation is therefore performed using, for example, the multi-viewpoint images that lie at equal distances both vertically and horizontally: "46" and "65", and "55" and "56".
Similarly, for the multi-viewpoint image in the third row and third column of the fifth frame, viewpoint interpolation is performed using, for example, the multi-viewpoint images that lie at equal distances both vertically and horizontally: "45" and "64", and "54" and "55".

As another example, multi-viewpoint images adjacent on the spatial axis (images close to each other within the same frame) and multi-viewpoint images adjacent on the time axis (images close to each other across frames) may be used together as follows. The multi-viewpoint image generation means 22 performs viewpoint interpolation using information on the multi-viewpoint images at the positions that form a cube around the multi-viewpoint image to be interpolated: the eight adjacent (surrounding) multi-viewpoint images in the same frame, and the nine multi-viewpoint images in each of the preceding and following frames (18 in total).
For example, to interpolate the multi-viewpoint image in the third row and third column of the second frame, the multi-viewpoint image generation means 22 extracts the multi-viewpoint images numbered "55" and "51" as the multi-viewpoint images adjacent on the spatial axis (images close to each other within the same frame). As the multi-viewpoint images adjacent on the time axis (images close to each other across frames), it extracts the multi-viewpoint images numbered "51" and "61" from the first frame and the multi-viewpoint images numbered "51" and "60" from the third frame.
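The cube of candidate reference positions (8 in the same frame plus 9 in each of the preceding and following frames) can be enumerated as in the sketch below. It only lists candidate positions and filters them by availability; it does not perform the depth-based viewpoint synthesis itself. The function names and the `is_decoded` predicate, which stands in for "this viewpoint was transmitted in this frame", are hypothetical.

```python
def cube_neighbours(x, y, frame):
    """The 26 (x, y, frame) positions around a viewpoint in a 3 x 3 x 3 cube."""
    return [(x + dx, y + dy, frame + df)
            for df in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dx, dy, df) != (0, 0, 0)]


def reference_views(x, y, frame, is_decoded, grid=20):
    """Cube positions that lie inside the viewpoint grid and hold a decoded image."""
    return [(nx, ny, nf) for nx, ny, nf in cube_neighbours(x, y, frame)
            if 0 <= nx < grid and 0 <= ny < grid and nf >= 0
            and is_decoded(nx, ny, nf)]


neighbours = cube_neighbours(5, 5, 3)
same_frame = [n for n in neighbours if n[2] == 3]
print(len(same_frame), len(neighbours) - len(same_frame))
```

The counts match the text: 8 positions in the same frame and 18 in the two adjacent frames. Which of the available positions are actually used would be a preset choice, as in the example with the images numbered "55", "51", "61" and "60".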

In this way, the multi-viewpoint image generation means 22 performs viewpoint interpolation using multi-viewpoint images adjacent on the spatial axis and multi-viewpoint images adjacent on the time axis, and interpolates the multi-viewpoint images that were not encoded.
Which of the multi-viewpoint images adjacent on the spatial axis or the time axis are referred to during the viewpoint interpolation processing is set in the multi-viewpoint image generation means 22 in advance.

As shown in FIG. 2, the element image conversion means 23 converts the multi-viewpoint image group generated by the multi-viewpoint image generation means 22 through viewpoint interpolation into a plurality of element images (an element image group), and restores and outputs it as element image group data.
Specifically, the element image conversion means 23 extracts the pixels at the same position from each partial image in the multi-viewpoint image group generated by the multi-viewpoint image generation means 22, and arranges the extracted pixels according to the position of the partial image indicated by the lens coordinates of the element lenses to form an element image, thereby restoring the element image group.
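The restoration step can be sketched as the inverse rearrangement: pixel (x, y) of every viewpoint image is gathered into the element image under lens (x, y). This is a minimal illustration under the assumption that viewpoint (u, v) corresponds to pixel (u, v) inside each element image, with no optical flipping; the data layout `views[v][u][y][x]` and the name `to_element_group` are hypothetical.

```python
def to_element_group(views):
    """Rebuild the element image group (a list of pixel rows) from viewpoint images.

    views[v][u][y][x] is pixel (x, y) of the image for viewpoint (u, v).
    The element image under lens (x, y) collects that pixel from every viewpoint.
    """
    elem_h, elem_w = len(views), len(views[0])
    lenses_h, lenses_w = len(views[0][0]), len(views[0][0][0])
    return [[views[r % elem_h][c % elem_w][r // elem_h][c // elem_w]
             for c in range(lenses_w * elem_w)]
            for r in range(lenses_h * elem_h)]


# Toy data: 2 x 2 viewpoints, 3 x 2 lenses; each pixel stores its target (row, column).
toy_views = [[[[(y * 2 + v, x * 2 + u) for x in range(3)] for y in range(2)]
              for u in range(2)] for v in range(2)]
toy_group = to_element_group(toy_views)
print(len(toy_group[0]), len(toy_group))
```

In the toy data every pixel ends up at the row and column it was labelled with, which is what restoring the original element image layout requires.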

As described above, with the image coding device 10 and the image decoding device 20 according to the present embodiment, the image coding device 10 can, through the sweeping process, switch the multi-viewpoint images to be encoded in each frame to the adjacent multi-viewpoint images in a preset direction, and encode them. The image decoding device 20 can then perform viewpoint interpolation within one frame (on the spatial axis) and between frames (on the time axis) to generate the missing multi-viewpoint images. Compared with the prior art, in which viewpoint interpolation always starts from multi-viewpoint images at the same positions, the sweeping process therefore makes it possible to generate images that contain information from the multi-viewpoint images of more viewpoints. As a result, the multi-viewpoint image group output on the decoding side has uniform image quality, and degradation of the image quality of the stereoscopic image can be suppressed.
Thus, the image coding device 10 and the image decoding device 20 can restore a stereoscopic image with little loss of image quality while compressing the stereoscopic video data efficiently. This makes it possible to generate stereoscopic video of sufficient quality for practical services even from live-action video with a high screen resolution and a large data size.

The reference points and the direction in which the multi-viewpoint images are switched in the sweeping process executed by the sweeping processing means 12 of the image coding device 10 according to the present embodiment are not limited to the reference points shown in FIG. 4 and the direction shown in FIG. 5.
FIG. 12 shows another example of the reference points of the multi-viewpoint images, and FIG. 13 shows the direction of the sweeping process starting from the reference points of FIG. 12. The sweeping processing means 12 determines in advance the 100 viewpoints shown in FIG. 12 (the multi-viewpoint images drawn as hatched rectangles) as the reference points of the multi-viewpoint images to be encoded. The sweeping processing means 12 then executes a sweeping process that moves the viewpoint positions of these 100 viewpoints in the preset direction shown in FIG. 13 each time the frame changes. In this case too, four frames are enough for the image decoding device 20 to perform viewpoint interpolation that includes the multi-viewpoint images (and their depths) of all viewpoints.

The image coding device 10 and the image decoding device 20 according to the present embodiment may also be configured as the image coding device 10a and the image decoding device 20a shown in FIG. 14.
The image coding device 10a includes the sweeping processing means 12 and the image coding means 13 but, unlike the image coding device 10 shown in FIG. 2, does not include the multi-viewpoint image conversion means 11. In this case, the sweeping processing means 12 of the image coding device 10a takes in the data of a multi-viewpoint image group that has already been obtained by converting a plurality of element images acquired through the lens array.
Like the image coding device 10 of FIG. 2, the image coding device 10a can generate coded data by switching the multi-viewpoint images to be encoded in each frame to the adjacent multi-viewpoint images in a preset direction. Compared with the prior art, in which the decoder receives coded data for multi-viewpoint images that are always at the same positions, receiving the coded data from the image coding device 10a allows the image decoding device 20a to restore the missing images with uniform image quality and to suppress degradation of the image quality of the stereoscopic image.

The image decoding device 20a includes the image decoding means 21 and the multi-viewpoint image generation means 22 but, unlike the image decoding device 20 shown in FIG. 2, does not include the element image conversion means 23. In this case, the multi-viewpoint image generation means 22 of the image decoding device 20a outputs the generated multi-viewpoint image group.
Like the image decoding device 20 of FIG. 2, the image decoding device 20a can acquire the coded data generated by the image coding device 10a, perform viewpoint interpolation within one frame and between frames, and generate the missing multi-viewpoint images. Because the sweeping process of the image coding device 10a makes the information of the multi-viewpoint images of more viewpoints available, the image quality of the multi-viewpoint image group can be made uniform and degradation of the image quality of the stereoscopic image can be suppressed.

In the present embodiment, the image coding device and the image decoding device according to the present invention have been described as independent devices, but the present invention is not limited to this. For example, the present invention can be realized by programs (an image coding program and an image decoding program) that cause the hardware resources of a general-purpose computer to operate as the respective means of the image coding device and the image decoding device. These programs (the image coding program and the image decoding program) can also be distributed via a communication line or recorded on a recording medium such as a CD-ROM and distributed.

10, 10a Image coding device
11 Multi-viewpoint image conversion means
12 Sweeping processing means
13 Image coding means
20, 20a Image decoding device
21 Image decoding means
22 Multi-viewpoint image generation means
23 Element image conversion means

Claims

An image coding device that takes in a multi-viewpoint image group generated by using a plurality of element images, obtained from light rays passing through a plurality of element lenses constituting a lens array, to extract the partial images at the same position in each of the plurality of element images and to arrange the extracted partial images according to the arrangement of the element images so as to form sets of the partial images, the image coding device comprising:
a sweeping processing means that determines, in the multi-viewpoint image group, a multi-viewpoint image serving as a reference point to be encoded for every predetermined number of multi-viewpoint images in a preset direction, and that repeats, for each frame of the captured images, a sweeping process of switching the multi-viewpoint images to be encoded in the next frame from the multi-viewpoint images encoded in the current frame to the adjacent multi-viewpoint images in the preset direction, for as many frames as are needed for all multi-viewpoint images in the multi-viewpoint image group to become encoding targets; and
an image coding means that generates depth image information for the multi-viewpoint images that the sweeping processing means has determined to be encoding targets, and compression-encodes those multi-viewpoint images and the depth image information generated for them to generate coded data.

An image decoding device comprising:
an image decoding means that acquires the coded data generated by the image coding device according to claim 1 and decodes the coded data into the multi-viewpoint images determined to be encoding targets and the depth image information generated for those multi-viewpoint images; and
a multi-viewpoint image generation means that, for the missing multi-viewpoint images, namely all multi-viewpoint images of each frame other than the decoded multi-viewpoint images, extracts as reference multi-viewpoint images the decoded multi-viewpoint images adjacent within the same frame and the decoded multi-viewpoint images adjacent across frames, interpolates the missing multi-viewpoint images by executing viewpoint interpolation processing using the extracted multi-viewpoint images and their depth image information, and generates and outputs a multi-viewpoint image group for each frame.

An image coding device comprising:
a multi-viewpoint image conversion means that takes in a plurality of element images obtained from light rays passing through a plurality of element lenses constituting a lens array, extracts the partial images at the same position in each of the plurality of element images, and arranges the extracted partial images according to the arrangement of the element images to form sets of the partial images, thereby converting the plurality of element images into a multi-viewpoint image group representing the sets of the partial images;
a sweeping processing means that determines, in the converted multi-viewpoint image group, a multi-viewpoint image serving as a reference point to be encoded for every predetermined number of multi-viewpoint images in a preset direction, and that repeats, for each frame of the captured images, a sweeping process of switching the multi-viewpoint images to be encoded in the next frame from the multi-viewpoint images encoded in the current frame to the adjacent multi-viewpoint images in the preset direction, for as many frames as are needed for all multi-viewpoint images in the multi-viewpoint image group to become encoding targets; and
an image coding means that generates depth image information for the multi-viewpoint images that the sweeping processing means has determined to be encoding targets, and compression-encodes those multi-viewpoint images and the depth image information generated for them to generate coded data.

An image decoding device comprising:
an image decoding means that acquires the coded data generated by the image coding device according to claim 3 and decodes the coded data into the multi-viewpoint images determined to be encoding targets and the depth image information generated for those multi-viewpoint images;
a multi-viewpoint image generation means that, for the missing multi-viewpoint images, namely all multi-viewpoint images of each frame other than the decoded multi-viewpoint images, extracts as reference multi-viewpoint images the decoded multi-viewpoint images adjacent within the same frame and the decoded multi-viewpoint images adjacent across frames, interpolates the missing multi-viewpoint images by executing viewpoint interpolation processing using the extracted multi-viewpoint images and their depth image information, and generates a multi-viewpoint image group for each frame; and
an element image conversion means that converts the multi-viewpoint image group generated by the multi-viewpoint image generation means into a plurality of element images by returning the partial images represented by that multi-viewpoint image group to their original positions on the element lenses.

An image coding program for operating a computer as the image coding device according to claim 1.

An image decoding program for operating a computer as the image decoding device according to claim 2.